Explanatory combinatorial dictionary


An explanatory combinatorial dictionary (ECD) is a type of monolingual dictionary designed to be part of a meaning-text linguistic model of a natural language. It is intended to be a complete record of the lexicon of a given language. As such, it identifies and describes, in separate entries, each of the language's lexemes and phrasemes. Among other things, each entry contains a definition that incorporates the lexeme's semantic actants; complete information on lexical co-occurrence; and an extensive set of examples. The ECD is a production dictionary: it aims to provide all the information needed for a foreign learner or automaton to produce perfectly formed utterances of the language. Since the lexemes and phrasemes of a natural language number in the hundreds of thousands, a complete ECD, in paper form, would occupy the space of a large encyclopaedia. Such a work has yet to be achieved; while ECDs of Russian and French have been published, each describes less than one percent of the vocabulary of its language.
The ECD was proposed in the late 1960s by Aleksandr Žolkovskij and Igor Mel'čuk and was later developed further by Jurij Apresjan. Three ECDs are currently available in print: one for Russian and two for French. A dictionary of Spanish collocations, DICE, is under development.

Characteristics of an ECD

A complete ECD of a language would provide an entry for every lexeme, construction, or idiom in use in the language; these are referred to collectively as "Lexical Units" (LUs). Entries in the ECD are based on the semantic definition of an LU, and each entry also contains a complete list of its collocations and lexical functions.
Entries for historically related Lexical Units that are homophones and share a significant semantic component are grouped into larger units called "vocables," thereby acknowledging polysemy while maintaining the distinct status of the independent items in question. The English vocable improve, for example, includes six Lexical Units, each of which is given a separate lexical entry:
IMPROVE, verb

The lexicographic numbers reflect degrees or levels of semantic distance between Lexical Units within a vocable: Roman numerals mark the highest-level semantic groupings, Arabic numerals mark the next highest level, and letters indicate the lowest-level distances. The four lexemes grouped under IMPROVE I, for example, are considered to be closer to each other than to IMPROVE II or IMPROVE III, because the meanings of both IMPROVE I.1b and IMPROVE I.2 include the meaning of IMPROVE I.1a. IMPROVE I.1a and IMPROVE I.1b are even more closely related because English has many pairs of words (specifically, labile or ambitransitive verbs) that are related by the semantic alternation ‘P’ ~ ‘cause1 to P’.
The subscript and superscript numbers attached to words in the definition refer to subsenses and homophonous entries for a word as given in the Longman Dictionary of Contemporary English; thus, “device¹₁” refers to the first entry for device in this dictionary, first subsense.
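The three-level numbering described above (Roman numeral, Arabic numeral, letter) can be read as a path in a small tree, where two Lexical Units are closer the more leading levels they share. The following Python sketch illustrates that reading only; the label syntax ("I.1a"), the `LexNumber` type, and the `shared_levels` function are assumptions made for this example and are not part of any published ECD.

```python
import re
from typing import NamedTuple, Optional


class LexNumber(NamedTuple):
    """A lexicographic number split into its three levels (hypothetical type)."""
    roman: str              # highest-level grouping, e.g. "I"
    arabic: Optional[int]   # next level, e.g. 1
    letter: Optional[str]   # lowest level, e.g. "a"


# Assumed label syntax: Roman numeral, then optionally ".digits" and a letter.
_LABEL = re.compile(r"([IVXLCDM]+)(?:\.(\d+)([a-z])?)?")


def parse_lex_number(label: str) -> LexNumber:
    """Parse a label such as "I.1a", "I.2", or "II"."""
    match = _LABEL.fullmatch(label)
    if match is None:
        raise ValueError(f"not a lexicographic number: {label!r}")
    roman, arabic, letter = match.groups()
    return LexNumber(roman, int(arabic) if arabic else None, letter)


def shared_levels(a: str, b: str) -> int:
    """Count the leading levels two labels have in common (0 to 3).

    A higher count stands for a smaller semantic distance.
    """
    count = 0
    for left, right in zip(parse_lex_number(a), parse_lex_number(b)):
        if left is None or right is None or left != right:
            break
        count += 1
    return count


if __name__ == "__main__":
    for pair in [("I.1a", "I.1b"), ("I.1a", "I.2"), ("I.1a", "II")]:
        print(pair, shared_levels(*pair))
```

Under these assumptions, I.1a and I.1b share two levels, I.1a and I.2 share one, and I.1a and II share none, which mirrors the ordering of distances given in the text.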

Structure of the ECD entry

An ECD entry for a given Lexical Unit, here called "L", is divided into three major sections or "zones":

The semantic zone

The semantic zone describes the semantic properties of L and consists of two sub-zones:

The phonological/graphematic zone

The phonological/graphematic zone gives all of the data on L’s phonological properties. Here again we find two sub-zones:

The co-occurrence zone

The co-occurrence zone presents all of the data on L’s combinatorial properties. It is organized into five sub-zones—morphological, syntactic, lexical, stylistic, and pragmatic.
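As a rough illustration of the three-zone layout, the entry for a Lexical Unit could be modelled as a record with one field per zone. This is a minimal sketch, not a format used by any existing ECD: every class and field name below is hypothetical, and because the sub-zones of the semantic and phonological/graphematic zones are not enumerated here, those two zones are left as open key-value mappings.

```python
from dataclasses import dataclass, field, fields
from typing import Dict, List


@dataclass
class CooccurrenceZone:
    """The five sub-zones named in the text; list contents are free-form notes."""
    morphological: List[str] = field(default_factory=list)
    syntactic: List[str] = field(default_factory=list)
    lexical: List[str] = field(default_factory=list)
    stylistic: List[str] = field(default_factory=list)
    pragmatic: List[str] = field(default_factory=list)


@dataclass
class Entry:
    """One entry for a Lexical Unit L (hypothetical layout)."""
    headword: str
    lexicographic_number: str
    # Sub-zones are not listed in the text, so these stay as open mappings.
    semantic: Dict[str, str] = field(default_factory=dict)
    phonological: Dict[str, str] = field(default_factory=dict)
    cooccurrence: CooccurrenceZone = field(default_factory=CooccurrenceZone)


def zone_names() -> List[str]:
    """Names of the three zones, in the order the fields are declared."""
    return [f.name for f in fields(Entry)][2:]


def cooccurrence_subzones() -> List[str]:
    """Names of the five co-occurrence sub-zones."""
    return [f.name for f in fields(CooccurrenceZone)]


if __name__ == "__main__":
    entry = Entry(headword="IMPROVE", lexicographic_number="I.1a")
    print(zone_names())
    print(cooccurrence_subzones())
    print(entry)
```

Keeping each zone as its own field means an entry can be filled in zone by zone, and a vocable can then be represented simply as a list of such entries sharing a headword.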