Journal articles

A two-tier corpus-based approach to robust syntactic annotation of unrestricted corpora

Abstract: This article presents a state of the art of robust parsers and proposes a more efficient way of automatically annotating corpora syntactically, based on diagnosing each sentence before specialized grammars are applied. After describing some available systems and showing their limits when parsing certain types of raw corpora, it proposes a two-tier architecture for a robust parser. Splitting the grammar rules into several modules makes it possible to formalize core sentences first and, in a second stage, syntactic phenomena that contain punctuation or involve structural ambiguities. The advantage of this approach is that, for any kind of corpus, a single optimized grammar is applied, after which the parser adapts to the presence of certain phenomena, which are processed specifically. This strategy guarantees high precision and recall rates for any kind of unrestricted corpus.

Keywords: robust parsers, shallow parsers, constituency grammars vs. dependency grammars, syntactic annotation of unrestricted corpora.
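The two-tier control flow described in the abstract (diagnose the sentence, always apply one core grammar, then apply specialized modules only for the phenomena detected) can be sketched in Python. This is a minimal illustration under stated assumptions, not the author's parser: the tokenizer, the three phenomena tested by `diagnose`, and the functions `core_grammar` and `make_module` are hypothetical stand-ins, and real grammar modules would build syntactic annotations rather than merely record which module ran.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class Analysis:
    """Hypothetical container for one sentence and the modules applied to it."""
    sentence: str
    tokens: List[str]
    applied: List[str] = field(default_factory=list)


def tokenize(sentence: str) -> List[str]:
    # Split into words and individual punctuation marks.
    return re.findall(r"\w+|[^\w\s]", sentence)


def diagnose(tokens: List[str]) -> Set[str]:
    """Flag phenomena the core grammar is assumed not to handle (hypothetical tests)."""
    found = set()
    if "(" in tokens or ")" in tokens:
        found.add("parenthetical")
    if '"' in tokens:
        found.add("quotation")
    if tokens.count(",") >= 2:
        found.add("enumeration")
    return found


def core_grammar(analysis: Analysis) -> None:
    # Tier 1: stand-in for the single optimized grammar run on every sentence.
    analysis.applied.append("core")


def make_module(name: str) -> Callable[[Analysis], None]:
    # Tier 2: stand-in for one specialized grammar module.
    def module(analysis: Analysis) -> None:
        analysis.applied.append(name)
    return module


# Application order of the specialized modules (an assumption; dicts keep insertion order).
MODULES: Dict[str, Callable[[Analysis], None]] = {
    name: make_module(name) for name in ("parenthetical", "quotation", "enumeration")
}


def parse(sentence: str) -> Analysis:
    tokens = tokenize(sentence)
    phenomena = diagnose(tokens)          # diagnosis happens before specialized grammars
    analysis = Analysis(sentence, tokens)
    core_grammar(analysis)                # tier 1: always applied
    for name, module in MODULES.items():  # tier 2: only for diagnosed phenomena
        if name in phenomena:
            module(analysis)
    return analysis
```

With this sketch, `parse("The parser works.")` runs only the core grammar, while `parse("The parser (a prototype) works.")` runs the core grammar followed by the parenthetical module alone. Keeping the diagnosis separate from the grammars means the core grammar stays the same for every corpus, and only the set of triggered modules varies.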

Cited literature: 29 references

https://hal-amu.archives-ouvertes.fr/hal-01758031
Contributor: Núria Gala Pavia
Submitted on: Wednesday, April 4, 2018 - 11:09:18 AM
Last modification on: Thursday, April 9, 2020 - 11:50:05 AM

File

gala-tal42_01.pdf
Files produced by the author(s)

Identifiers

  • HAL Id: hal-01758031, version 1

Citation

Núria Gala. A two-tier corpus-based approach to robust syntactic annotation of unrestricted corpora. Traitement Automatique des Langues, Lavoisier (Hermes Science Publications) / ATALA (Association pour le Traitement Automatique des Langues), 2001. ⟨hal-01758031⟩

Metrics

Record views: 121
File downloads: 82