CORNELIA
- Open-Class Word List (3,000 word forms)
- Corpus NC-A1
  - Original corpus: 5-10 original Wikipedia articles about culture-specific subjects (minimum of 5,000 words), in separate plain-text files with UTF-8 encoding
  - List of at least 1,000 noun phrases (NPs) appearing in the corpus, with the following characteristics:
    - NPs must be at least 2 words long (one-word NPs must be excluded). Not accepted: "Geneva"
    - NPs must not contain foreign words. Not accepted: "the city of Genève" (note that "the city of Geneva" is OK)
    - NPs must be continuous (there cannot be any extra content, e.g., parentheses, inside the NP). Not accepted: "the second most populous city in Switzerland (after Zurich)" (note that the NP will be "the second most populous city in Switzerland")
    - NPs must not contain verbs, even when used as nouns, adjectives or adverbs. Not accepted: "French-speaking part of Switzerland"; "numerous international organizations, including the headquarters of many of the agencies of the United Nations and the Red Cross" (in the latter case, there will be 2 NPs: "numerous international organizations" and "the headquarters... Red Cross")
    - NPs must be original (no change should be made to the original Wikipedia text)
    - NPs must ignore nesting (only the longest NP is considered): "the headquarters of many of the agencies of the United Nations and the Red Cross" must be treated as a single NP (inner NPs, such as "the agencies of the United Nations and the Red Cross", must not be extracted from the longer NP)
    - NPs must be unique (repetitions must be ignored)
    - NPs must be provided one per line in a plain text file, with UTF-8 encoding.
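
Some of the characteristics above are purely mechanical and can be checked before submission. The following is a minimal Python sketch, not an official tool: `filter_np_candidates` and `write_np_file` are hypothetical helper names, and only the length, continuity (no parentheses), and uniqueness rules are covered. The rules about foreign words, verbs, and nesting require linguistic judgment and are not attempted here.

```python
def filter_np_candidates(candidates):
    """Keep candidates that pass the mechanical checks, in first-seen order."""
    seen = set()
    kept = []
    for raw in candidates:
        np_text = raw.strip()  # trim the line only; the wording stays untouched
        if len(np_text.split()) < 2:
            continue  # one-word NPs are excluded
        if "(" in np_text or ")" in np_text:
            continue  # parenthetical material breaks continuity
        if np_text in seen:
            continue  # repetitions are ignored
        seen.add(np_text)
        kept.append(np_text)
    return kept


def write_np_file(path, nps):
    """Write one NP per line to a plain text file with UTF-8 encoding."""
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        for np_text in nps:
            handle.write(np_text + "\n")


candidates = [
    "Geneva",
    "the city of Geneva",
    "the second most populous city in Switzerland (after Zurich)",
    "the city of Geneva",
    "numerous international organizations",
]
accepted = filter_np_candidates(candidates)
```

With these sample candidates, `accepted` keeps "the city of Geneva" and "numerous international organizations". A candidate rejected for parentheses still has to be re-extracted by hand in its shorter, continuous form.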
 
Completing the post-workshop tasks is not mandatory, but intermediate-level workshops will only accept candidates who have finished all A1 activities described in FoR-UNL.
FOLLOW-UP
The following projects will open upon completion of the post-workshop tasks:
- BRUNO-A1 (open only for languages where the number of subcategorization frames (all languages) > 15 and the number of paradigms (inflectional languages) > 15): 2,000 entries (around 4,000 UNLdots)
- NC-A1: 1,000 entries (3,000 UNLdots)
ADDITIONAL MATERIAL
Open Class Word List
Extracted from the most frequent words in Wikipedia
| Language | File | 
|---|---|
| Arabic | ar_words.xls | 
| Armenian | hy_words.xls | 
| Bulgarian | bg_words.xls | 
| Chinese | zh_words.xls | 
| Kannada | kn_words.xls | 
| Khmer | km_words.xls | 
| Malay | ms_words.xls | 
| Punjabi | pa_words.xls | 
| Ukrainian | uk_words.xls | 
NP Examples
| original text | NP | 
|---|---|
| Geneva is the second most populous city in Switzerland (after Zurich) and is the most populous city of Romandy, the French-speaking part of Switzerland. Situated where the Rhone exits Lake Geneva, it is the capital of the Republic and Canton of Geneva. The municipality (ville de Genève) has a population (as of March 2013) of 194,245, and the canton (République et Canton de Genève, which includes the city) has 472,530 residents. In 2007, the urban area, or agglomération franco-valdo-genevoise (Great Geneva or Grand Genève in French) had 1,240,000 inhabitants in 189 municipalities in both Switzerland and France. | the second most populous city in Switzerland | 
SSS Examples
| sentence | SSS | 
|---|---|
| book | NH(book) | 
| the book | NS(book;the) | 
| beautiful book | NA(book;beautiful) | 
| book of John | NA(book;:01) PC:01(of;John) | 
| the book of John | NS(book;the) NA(book;:01) PC:01(of;John) | 
| the beautiful book of John | NS(book;the) NA(book;beautiful) NA(book;:01) PC:01(of;John) | 
| the book of Math of John | NS(book;the) NA(book;:01) PC:01(of;Math) NA(book;:02) PC:02(of;John) | 
| the book about the construction of Babel | NS(book;the) NA(book;:01) PC:01(about;:02) NS:02(construction;the) NA:02(construction;:03) PC:03(of;Babel) | 
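
The SSS column follows a regular surface pattern: a label, an optional `:NN` suffix, and one or two arguments separated by `;`. The sketch below only tokenizes that pattern as it appears in the table. `parse_sss` is a hypothetical helper, and reading the `:NN` suffix as a scope identifier that matches a `:NN` argument elsewhere is an assumption drawn from the examples, not from a specification.

```python
import re

# Label, optional ":NN" suffix, then a parenthesized argument list.
SSS_RELATION = re.compile(r"([A-Z]+)(?::(\d+))?\(([^()]*)\)")


def parse_sss(text):
    """Return a list of (label, scope, args) tuples found in an SSS string."""
    relations = []
    for match in SSS_RELATION.finditer(text):
        label, scope, body = match.groups()
        args = tuple(part.strip() for part in body.split(";"))
        relations.append((label, scope, args))
    return relations


parsed = parse_sss("NS(book;the) NA(book;:01) PC:01(of;John)")
```

Here `parsed` contains three tuples, and the last one, `("PC", "01", ("of", "John"))`, carries the same number as the `:01` argument of the `NA` relation before it.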
UNL Simplified Examples
| sentence | UNL | 
|---|---|
| book | book | 
| the book | book.@def | 
| beautiful book | mod(book;beautiful) | 
| book of John | pos(book;John) | 
| the book of John | pos(book.@def;John) | 
| the beautiful book of John | mod(book.@def;beautiful) pos(book.@def;John) | 
| the book of Math of John | cnt(book.@def;Math) pos(book.@def;John) | 
| the book about the construction of Babel | cnt(book.@def;:01) obj(construction.@def;Babel) |
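
The simplified UNL strings in the table combine two things: relations of the form `rel(source;target)` and attributes attached to a node with `.@`, as in `book.@def`. The following is a rough sketch of how these could be separated. The names `split_node` and `parse_unl` are hypothetical, and the pattern covers only the forms shown in the table, not the full UNL syntax.

```python
import re

# Lowercase relation name followed by exactly two ";"-separated nodes.
UNL_RELATION = re.compile(r"([a-z]+)\(([^;()]+);([^;()]+)\)")


def split_node(node):
    """Split 'book.@def' into ('book', ['def']); nodes without attributes get []."""
    head, *attrs = node.strip().split(".@")
    return head, attrs


def parse_unl(text):
    """Return (relation, source, target) tuples; a bare node yields one entry."""
    relations = [
        (m.group(1), split_node(m.group(2)), split_node(m.group(3)))
        for m in UNL_RELATION.finditer(text)
    ]
    if relations:
        return relations
    # No relation found: the string is a single node such as "book.@def".
    return [(None, split_node(text), None)]


graph = parse_unl("mod(book.@def;beautiful) pos(book.@def;John)")
```

Keeping the attributes apart from the headword makes it easy to see that both relations in `graph` share the same source node, `book` with the attribute `def`.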