Identification of Ambiguous Multiword Expressions Using Sequence Models and Lexical Resources
Abstract
We present a simple and efficient tagger that identifies highly ambiguous multiword expressions (MWEs) in French texts. It is based on conditional random fields (CRFs) and uses local context information as features. We show that this approach can obtain results that, in some cases, approach those of more sophisticated parser-based MWE identification methods, without requiring syntactic trees from a treebank. Moreover, we study how well the CRF can take into account external information coming from a lexicon.
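As a rough illustration of what "local context information" plus a lexicon signal could look like as CRF input, here is a minimal sketch. It is not the authors' feature set: the function names, the window size, the bigram lexicon lookup, and the example expression "bien que" are all assumptions made for this example.

```python
from typing import Dict, List, Set


def token_features(tokens: List[str], i: int, lexicon: Set[str],
                   window: int = 2) -> Dict[str, object]:
    """Build features for token i from its neighbours (hypothetical feature set)."""
    feats: Dict[str, object] = {"bias": 1.0, "w0": tokens[i].lower()}
    # Local context: surface forms of the tokens to the left and right.
    for off in range(1, window + 1):
        left, right = i - off, i + off
        feats[f"w-{off}"] = tokens[left].lower() if left >= 0 else "<BOS>"
        feats[f"w+{off}"] = tokens[right].lower() if right < len(tokens) else "<EOS>"
    # External information (assumed form): is the bigram starting at this
    # token listed in an MWE lexicon?
    bigram = f"{tokens[i]} {tokens[i + 1]}".lower() if i + 1 < len(tokens) else ""
    feats["in_lexicon_bigram"] = bigram in lexicon
    return feats


def sentence_features(tokens: List[str], lexicon: Set[str]) -> List[Dict[str, object]]:
    """One feature dictionary per token, in sentence order."""
    return [token_features(tokens, i, lexicon) for i in range(len(tokens))]


# Illustrative input: "bien que" can be a conjunction or an accidental co-occurrence.
example = sentence_features(["Je", "sais", "bien", "que", "tu", "viens"], {"bien que"})
```

Per-token dictionaries of this kind are the usual input to CRF toolkits, which would then be trained to predict a label per token (for instance a begin/inside/outside scheme). The lexicon flag only signals a candidate; deciding whether the candidate is a true MWE in context is left to the model.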
Domains
Computation and Language [cs.CL]
Origin: Publisher files allowed on an open archive