Crowdsourcing Dialect Characterization through Twitter - Aix-Marseille Université
Journal article, PLoS ONE, Year: 2014

Crowdsourcing Dialect Characterization through Twitter

David Sanchez
  • Role: Author

Abstract

We perform a large-scale analysis of diatopic language variation using geotagged microblogging datasets. By collecting all Twitter messages written in Spanish over more than two years, we build a corpus from which a carefully selected list of concepts allows us to characterize Spanish varieties on a global scale. A cluster analysis proves the existence of well-defined macroregions sharing common lexical properties. Remarkably enough, we find that the Spanish language is split into two superdialects, namely, an urban speech used across major American and Spanish cities and a diverse form that encompasses rural areas and small towns. The latter can be further clustered into smaller varieties with a stronger regional character.
Main file
fetchObject.pdf (1.18 MB)
Origin: Publisher files allowed on an open archive

Dates and versions

hal-01242109, version 1 (11-12-2015)

Identifiers

Cite

Bruno Goncalves, David Sanchez. Crowdsourcing Dialect Characterization through Twitter. PLoS ONE, 2014, 9 (e112074), ⟨10.1371/journal.pone.0112074⟩. ⟨hal-01242109⟩