Fondazione GRINS
Growing Resilient,
Inclusive and Sustainable
Galleria Ugo Bassi 1, 40121, Bologna, IT
Tax code / VAT no. (C.F./P.IVA) 91451720378
Funded by the National Recovery and Resilience Plan (PNRR), Mission 4 (Infrastructure and Research), Component 2 (From Research to Business), Investment 1.3 (Extended Partnerships), Theme 9 (Economic and financial sustainability of systems and territories).



The rise of Big Data has shown that classical relational databases are ill-suited to managing a wide variety of heterogeneous datasets: new platforms have become necessary, as the popularity of “data lakes” and “data lakehouses” demonstrates.
A common problem in modern data platforms is detecting pairs of datasets that concern the same topic. A purely syntactic matching is not effective, however: modern AI techniques for Natural Language Processing, such as word embedding and sentence embedding, promise to address the issue in a more semantic way.
The contribution of the paper is a novel methodology, called “TopicRank”, for flexibly querying data platforms in order to find pairs of datasets that concern the same topic, on the basis of the textual descriptions that accompany the datasets as metadata. The paper presents the results of a preliminary experiment conducted on a real pool of datasets.
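The general idea of matching datasets through their textual descriptions can be illustrated with a minimal sketch. This is not the TopicRank methodology itself, whose details are not given here. It assumes a simple bag-of-words vector as a stand-in for a real word or sentence embedding, and the names `embed`, `cosine`, `rank_pairs` and the sample descriptions are all hypothetical.

```python
import math
import re
from collections import Counter
from itertools import combinations


def embed(text):
    """Stand-in for a sentence embedding (assumption): bag-of-words counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_pairs(descriptions, threshold=0.0):
    """Score every pair of datasets and return them, most similar first."""
    vectors = {name: embed(text) for name, text in descriptions.items()}
    scored = [
        (cosine(vectors[a], vectors[b]), a, b)
        for a, b in combinations(sorted(vectors), 2)
    ]
    kept = [s for s in scored if s[0] >= threshold]
    return sorted(kept, key=lambda s: s[0], reverse=True)


# Hypothetical metadata: dataset name -> textual description.
descriptions = {
    "air_quality": "daily air pollution measurements by city",
    "pollution_report": "air pollution levels measured in each city",
    "school_budget": "annual budget of public schools",
}

ranking = rank_pairs(descriptions)
for score, a, b in ranking:
    print(f"{a} / {b}: {score:.3f}")
```

A bag-of-words vector only captures shared words, so it remains a syntactic match. Replacing `embed` with a function that returns sentence embeddings from a trained model is what would move the comparison toward the semantic matching the abstract refers to.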
ACKNOWLEDGEMENTS
This study was funded by the European Union - NextGenerationEU, in the framework of the GRINS - Growing Resilient, INclusive and Sustainable project (GRINS PE00000018). The views and opinions expressed are solely those of the authors and do not necessarily reflect those of the European Union, nor can the European Union be held responsible for them.