Figure 27. This is what the CAMBIO/TIDES Fact DB knows about South Korea
The lexicon for a given language is a collection of superentries, which are indexed by the citation form of the word or of the phrasal lexical unit (set expression). A superentry includes all the lexemes that have the same base written form, regardless of syntactic category, pronunciation, or sense. Each lexicon entry comprises a number of zones corresponding to the various types of lexical information. The zones containing information for use by an NLP system are: CAT (lexical category), ORTH (orthography: abbreviations and variants), PHON (phonology), MORPH (morphological irregular forms, class or paradigm, and stem variants or “principal parts”), SYN (syntactic features, such as attributive for adjectives), SYN-STRUC (indication of sentence- or phrase-level syntactic dependency, centrally including subcategorization), and SEM-STRUC (lexical semantics, meaning representation). Some additional information is added for human consumption in the ANNOtations zone. The following scheme, in a BNF-like notation, summarizes the basic lexicon structure.
[Lexicon scheme and sample entry: only the fragments below survive extraction.]

| Item | Surviving content |
|---|---|
| ORTHOGRAPHIC-FORM: | |
| buy-v1 | |
| v | |
| stem-v | |
| def | “when A buys T from S, A acquires possession of T previously owned |
| ex | |
| time-stamp | ;the acquirer and the date |
| syn-class | |
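The organization described above (superentries indexed by citation form, each grouping entries that are divided into zones) can be sketched as a small data structure. This is only an illustrative sketch: the zone names come from the text, but the Python class names, field types, and the `add`/`lookup` methods are assumptions, and the entry `buy-n1` is a hypothetical noun sense added to show several lexemes sharing one superentry.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class LexiconEntry:
    """One lexeme, e.g. buy-v1. Field types are assumptions, not the source format."""
    name: str                                            # entry label, e.g. "buy-v1"
    cat: str                                             # CAT: lexical category
    orth: List[str] = field(default_factory=list)        # ORTH: abbreviations, variants
    phon: Optional[str] = None                           # PHON: phonology
    morph: List[str] = field(default_factory=list)       # MORPH: irregular forms, stems
    syn: Dict[str, Any] = field(default_factory=dict)    # SYN: syntactic features
    syn_struc: Optional[Any] = None                      # SYN-STRUC: subcategorization
    sem_struc: Optional[Any] = None                      # SEM-STRUC: meaning representation
    anno: Dict[str, str] = field(default_factory=dict)   # ANNO: notes for human readers


@dataclass
class Superentry:
    """All lexemes sharing one base written form, whatever their category or sense."""
    citation_form: str
    entries: List[LexiconEntry] = field(default_factory=list)


class Lexicon:
    """A collection of superentries indexed by citation form."""

    def __init__(self) -> None:
        self._superentries: Dict[str, Superentry] = {}

    def add(self, citation_form: str, entry: LexiconEntry) -> None:
        # Create the superentry on first use, then append the new lexeme to it.
        superentry = self._superentries.setdefault(
            citation_form, Superentry(citation_form)
        )
        superentry.entries.append(entry)

    def lookup(self, citation_form: str) -> List[LexiconEntry]:
        # Return a copy of the entry list; unknown forms yield an empty list.
        superentry = self._superentries.get(citation_form)
        return list(superentry.entries) if superentry else []


lexicon = Lexicon()
lexicon.add("buy", LexiconEntry(name="buy-v1", cat="v", morph=["stem-v"]))
lexicon.add("buy", LexiconEntry(name="buy-n1", cat="n"))  # hypothetical noun sense
print([e.name for e in lexicon.lookup("buy")])  # ['buy-v1', 'buy-n1']
```

Keying the dictionary on the citation form mirrors the statement that a superentry collects every lexeme with the same base written form, so a single lookup returns the verb and noun entries together.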