Conference paper. Year: 2022

Automatic Machine Learning-based OLAP Measure Detection for Tabular Data

Abstract

Nowadays, it is difficult for companies and organizations without Business Intelligence (BI) experts to carry out data analyses. Existing automatic data warehouse design methods cannot handle tabular data, which are commonly defined without a schema. Dimensions and hierarchies can still be deduced by detecting functional dependencies, but detecting measures remains a challenge. To address this issue, we propose a machine learning-based method that detects measures by defining three categories of features for numerical columns. The method is tested on real-world datasets with various machine learning algorithms, and the results show that random forest performs best for measure detection.
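As an illustration of describing numerical columns by features, the following minimal Python sketch computes a few per-column statistics. The abstract does not name the three feature categories, so the features used here (`distinct_ratio`, `integer_ratio`, `negative_ratio`, `mean`, `std`), the helper `column_features`, and the toy table are all hypothetical and should not be read as the paper's actual feature set.

```python
from statistics import mean, pstdev


def column_features(values):
    """Compute a few illustrative features for one numerical column.

    These features are hypothetical examples, not the paper's feature set.
    """
    n = len(values)
    return {
        # Share of distinct values in the column
        "distinct_ratio": len(set(values)) / n,
        # Share of values that have no fractional part
        "integer_ratio": sum(float(v).is_integer() for v in values) / n,
        # Share of strictly negative values
        "negative_ratio": sum(v < 0 for v in values) / n,
        "mean": mean(values),
        # Population standard deviation
        "std": pstdev(values),
    }


# Made-up toy table: column name -> numerical values
table = {
    "order_id": [1001, 1002, 1003, 1004],
    "zip_code": [31000, 31000, 69000, 75001],
    "amount": [19.99, 5.5, 120.0, 42.25],
}

features = {name: column_features(col) for name, col in table.items()}
for name, feats in features.items():
    print(name, feats)
```

Feature vectors of this kind, paired with labels stating whether each column is a measure, could then be used to train a classifier such as a random forest (for example scikit-learn's `RandomForestClassifier`).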
Main file
dawak2022yz.pdf (368.47 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-03668454, version 1 (03-10-2022)

License

Attribution

Cite

Yuzhao Yang, Fatma Abdelhedi, Jérôme Darmont, Franck Ravat, Olivier Teste. Automatic Machine Learning-based OLAP Measure Detection for Tabular Data. 24th International Conference on Big Data Analytics and Knowledge Discovery (DaWaK 2022), Aug 2022, Vienna, Austria. pp.173-188, ⟨10.1007/978-3-031-12670-3_15⟩. ⟨hal-03668454⟩