Semantic role labeling


In natural language processing, semantic role labeling is the process that assigns labels to words or phrases in a sentence that indicate their semantic role in the sentence, such as that of an agent, goal, or result.
It consists of the detection of the semantic arguments associated with the predicate or verb of a sentence and their classification into their specific roles. For example, given a sentence like "Mary sold the book to John", the task would be to recognize the verb "to sell" as representing the predicate, "Mary" as representing the seller, "the book" as representing the goods, and "John" as representing the recipient. This is an important step towards making sense of the meaning of a sentence. A semantic analysis of this sort is at a lower-level of abstraction than a syntax tree, i.e. it has more categories, thus groups fewer clauses in each category. For instance, "the book belongs to me" would need two labels such as "possessed" and "possessor" whereas "the book was sold to John" would need two other labels such as "goal" and "receiver" even though these two clauses would be very similar as far as "subject" and "object" functions are concerned.

History

The FrameNet project produced the first major computational lexicon that systematically described many predicates and their corresponding roles. Daniel Gildea and Daniel Jurafsky developed the first automatic semantic role labeling system based on FrameNet. The PropBank corpus added manually created semantic role annotations to the Penn Treebank corpus of Wall Street Journal texts. Many automatic semantic role labeling systems have used PropBank as a training dataset to learn how to annotate new sentences automatically.