Journal articles

A two-tier corpus-based approach to robust syntactic annotation of unrestricted corpora

Abstract : This article surveys the state of the art in robust parsing and proposes a more efficient way of automatically annotating corpora with syntactic information, based on a diagnosis of each sentence before specialized grammars are applied. After describing some available systems and showing their limits when parsing certain types of raw corpora, it proposes a two-tier architecture for a robust parser. Splitting the grammar rules into several modules makes it possible to formalize core sentences first and, in a second stage, syntactic phenomena that contain punctuation or involve structural ambiguities. The advantage of this approach is that, for any kind of corpus, a single optimized grammar is applied first, after which the parser adapts to the presence of certain phenomena, which are processed specifically. This strategy guarantees high precision and recall rates for any kind of unrestricted corpora.
Keywords : robust parsers, shallow parsers, constituency grammars vs. dependency grammars, syntactic annotation of unrestricted corpora.

Cited literature : 29 references
Contributor : Núria Gala Pavia
Submitted on : Wednesday, April 4, 2018 - 11:09:18 AM
Last modification on : Wednesday, February 23, 2022 - 3:08:01 PM


Files produced by the author(s)


  • HAL Id : hal-01758031, version 1



Núria Gala. A two-tier corpus-based approach to robust syntactic annotation of unrestricted corpora. Revue TAL, ATALA (Association pour le Traitement Automatique des Langues), 2001. ⟨hal-01758031⟩


