UNL-NL Memory

From UNLwiki
Latest revision as of 12:12, 7 April 2014 (by Martins)

The UNL<->NL Memory is a set of mappings between a given natural language and UNL. It may be unidirectional (UNL-NL Memory or NL-UNL Memory) or bidirectional (UNL<->NL Memory). It is used to improve and normalize the results of UNLization and NLization, because it contains segments that have already been UNLized or NLized.
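The idea of a memory that is consulted before regular processing can be sketched in Python. The class `UnlNlMemory` and its methods are hypothetical names, not part of any UNL tool. The sketch assumes exact string matching and, for simplicity, keeps one equivalent per segment. A unidirectional memory would keep only one of the two tables.

```python
from typing import Dict, Optional


class UnlNlMemory:
    """Hypothetical bidirectional UNL<->NL Memory backed by two dictionaries."""

    def __init__(self) -> None:
        self.nl_to_unl: Dict[str, str] = {}  # NL-UNL direction, used in UNLization
        self.unl_to_nl: Dict[str, str] = {}  # UNL-NL direction, used in NLization

    def add(self, nl_segment: str, unl_segment: str) -> None:
        # Record a segment pair in both directions; a later pair with the
        # same key overwrites the earlier one.
        self.nl_to_unl[nl_segment] = unl_segment
        self.unl_to_nl[unl_segment] = nl_segment

    def unlize(self, nl_segment: str) -> Optional[str]:
        # None means the segment was never UNLized and must be analyzed
        # by the regular UNLization process.
        return self.nl_to_unl.get(nl_segment)

    def nlize(self, unl_segment: str) -> Optional[str]:
        # None means the segment was never NLized.
        return self.unl_to_nl.get(unl_segment)


memory = UnlNlMemory()
memory.add("a good deal", "400059171")
print(memory.unlize("a good deal"))   # 400059171
print(memory.nlize("400059171"))      # a good deal
print(memory.unlize("a great deal"))  # None
```

A segment found in the memory is reused as is, which is what makes the output consistent across runs; anything else falls through to the regular UNLization or NLization.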

The UNL<->NL Memory may be provided in two different formats:
Extended, in TMX; or
Simplified, as a set of network disambiguation rules.

Extended format

UNL<->NL Memory entries in extended format must comply with the Translation Memory eXchange (TMX) specification, as follows:

   <tu>
       <tuv xml:lang="en"><seg>a good deal</seg></tuv>
       <tuv xml:lang="unl"><seg>400059171</seg></tuv>
   </tu>

Where:
<tu> is the beginning of the translation unit
</tu> is the end of the translation unit
<tuv> is the beginning of the translation unit variant
</tuv> is the end of the translation unit variant
<seg> is the beginning of the translation segment
</seg> is the end of the translation segment
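Entries in this format can be read with Python's standard `xml.etree.ElementTree` module, as in the minimal sketch below. The function `load_tmx_pairs` is hypothetical. It assumes that each `<tu>` holds one `<tuv>` per language and that `en` and `unl` are the language codes, as in the entry above. It also ignores the `<header>` element that a complete TMX file carries. ElementTree reports the predefined `xml:lang` attribute under its namespace-qualified name.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# ElementTree exposes the predefined xml:lang attribute under this name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def load_tmx_pairs(tmx_text: str, source_lang: str = "en",
                   target_lang: str = "unl") -> Dict[str, str]:
    """Map the source-language <seg> of each <tu> to its target-language <seg>."""
    root = ET.fromstring(tmx_text)
    pairs: Dict[str, str] = {}
    for tu in root.iter("tu"):
        segs: Dict[str, str] = {}
        for tuv in tu.findall("tuv"):
            lang = tuv.get(XML_LANG)
            seg = tuv.find("seg")
            if lang is not None and seg is not None:
                segs[lang] = (seg.text or "").strip()
        # Skip translation units that lack either variant.
        if source_lang in segs and target_lang in segs:
            pairs[segs[source_lang]] = segs[target_lang]
    return pairs


# The entry from the example above, wrapped in <tmx>/<body>; the
# <header> element of a real TMX file is omitted for brevity.
SAMPLE = """<tmx version="1.4">
  <body>
    <tu>
      <tuv xml:lang="en"><seg>a good deal</seg></tuv>
      <tuv xml:lang="unl"><seg>400059171</seg></tuv>
    </tu>
  </body>
</tmx>"""

print(load_tmx_pairs(SAMPLE))  # {'a good deal': '400059171'}
```

Swapping the `source_lang` and `target_lang` arguments gives the UNL-to-NL direction from the same file.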

Simplified format

UNL<->NL Memory entries in simplified format must be represented as a set of network disambiguation rules, as follows:

equ(SOURCE;TARGET)=DC;

Where:
equ is the UNL relation for "equivalent";
SOURCE is the source segment;
TARGET is the target segment;
DC is the degree of certainty (i.e., the likelihood of the relation between the SOURCE and the TARGET).
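Rules in this form can be loaded with a regular expression. The sketch below assumes that SOURCE and TARGET contain no semicolons or closing parentheses and that DC is written as a non-negative integer. The function name `load_equ_rules` and the DC value in the example are illustrative and do not come from the Grammar Specs. When one SOURCE appears in several rules, the sketch keeps the TARGET with the highest degree of certainty.

```python
import re
from typing import Dict, Tuple

# Assumed surface form of one rule: equ(SOURCE;TARGET)=DC;
# SOURCE and TARGET are assumed not to contain ';' or ')'.
EQU_RULE = re.compile(r"equ\(\s*([^;)]+?)\s*;\s*([^;)]+?)\s*\)\s*=\s*(\d+)\s*;")


def load_equ_rules(text: str) -> Dict[str, Tuple[str, int]]:
    """Map each SOURCE to its (TARGET, DC), keeping the highest-DC target."""
    memory: Dict[str, Tuple[str, int]] = {}
    for match in EQU_RULE.finditer(text):
        source, target, dc = match.group(1), match.group(2), int(match.group(3))
        # When the same SOURCE has several equivalents, prefer the one
        # with the greater degree of certainty.
        if source not in memory or dc > memory[source][1]:
            memory[source] = (target, dc)
    return memory


# Illustrative rule; the DC value 100 is made up for the example.
RULES = "equ(a good deal;400059171)=100;"
memory = load_equ_rules(RULES)
print(memory)  # {'a good deal': ('400059171', 100)}
```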