Hierarchical clustering for property graph schema discovery

The property graph model is increasingly popular and is currently employed by several open-source and commercial graph database systems. Despite this wide adoption, the underlying schema structure of property graphs remains poorly understood. The schema discovery problem consists of extracting the schema concepts from a property graph. A property graph schema provides a concise description of the data it represents, making that data more digestible for humans and interactive processes and usable for query optimization. In this paper, we address the property graph schema discovery problem and introduce GMMSchema, a method based on hierarchical clustering with a Gaussian Mixture Model that accounts for both label and property information on nodes. We experimentally analyze the accuracy and performance of GMMSchema against those of its closest competitor and showcase its superiority on several commonly used datasets, including real-world ones such as the Covid19 knowledge graph and the Fib25 and Mb6 NeuPrint graphs.
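
To make the phrase "accounts for both label and property information on nodes" concrete, here is a minimal sketch of one way nodes could be turned into feature vectors before clustering. It is not the GMMSchema implementation: the toy nodes, the function names (`build_vocabulary`, `encode`, `hamming`), and the binary encoding are all assumptions made for illustration. Grouping nodes by identical vector is a simple stand-in for the clustering step, which the paper performs with a Gaussian Mixture Model.

```python
from collections import defaultdict

# Hypothetical toy property graph: each node has a label set and a property map.
nodes = [
    {"id": 1, "labels": {"Person"}, "props": {"name": "Ann", "age": 31}},
    {"id": 2, "labels": {"Person"}, "props": {"name": "Bob", "age": 45}},
    {"id": 3, "labels": {"Person", "Student"},
     "props": {"name": "Cy", "age": 20, "school": "X"}},
    {"id": 4, "labels": {"City"}, "props": {"name": "Lyon", "population": 500000}},
    {"id": 5, "labels": set(), "props": {"name": "Paris", "population": 2100000}},
]


def build_vocabulary(nodes):
    """Collect every label and property key seen in the graph, in sorted order."""
    labels = sorted({label for n in nodes for label in n["labels"]})
    keys = sorted({key for n in nodes for key in n["props"]})
    return [("label", label) for label in labels] + [("prop", key) for key in keys]


def encode(node, vocab):
    """Binary vector: 1 if the node carries that label or property key, else 0."""
    return tuple(
        int(name in (node["labels"] if kind == "label" else node["props"]))
        for kind, name in vocab
    )


def hamming(u, v):
    """Number of positions where two binary vectors differ."""
    return sum(a != b for a, b in zip(u, v))


vocab = build_vocabulary(nodes)
vectors = {n["id"]: encode(n, vocab) for n in nodes}

# Stand-in for clustering (an assumption, not the paper's method):
# nodes with identical vectors form one candidate node type.
groups = defaultdict(list)
for node_id, vec in vectors.items():
    groups[vec].append(node_id)
candidate_types = sorted(groups.values())

print(vocab)
print(candidate_types)
print(hamming(vectors[4], vectors[5]))
```

In this toy data, node 5 has no label but shares its property keys with the `City` node 4, so their vectors differ in a single position. This is the kind of case where property information can help place an unlabeled node, and where a soft clustering model is more useful than exact matching.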

Keywords

Graph Schema; Hierarchical Clustering; Property graph

Domains

Databases [cs.DB]; Artificial Intelligence [cs.AI]

Main file

EDBTShort_2022.pdf (936)

Origin: Publisher files allowed on an open archive

Contributor: Stefania Dumbrava

https://hal.science/hal-03692293

Submitted on: Thursday, June 9, 2022, 16:27:16

Last modified on: Tuesday, September 17, 2024, 15:52:04

Long-term archiving on: Saturday, September 10, 2022, 19:34:45

Dates and versions

hal-03692293, version 1 (09-06-2022)

License

Attribution - NonCommercial - NoDerivatives

Identifiers

HAL Id: hal-03692293, version 1

Cite

Angela Bonifati, Stefania Dumbrava, Nicolas Mir. Hierarchical clustering for property graph schema discovery. 25th International Conference on Extending Database Technology (EDBT), Mar 2022, Edinburgh, United Kingdom. pp.449-453. ⟨hal-03692293⟩

Collections

CNRS UNIV-LYON1 UNIV-LYON2 INSA-LYON EC-LYON TELECOM-SUDPARIS LIRIS INSA-GROUPE IP_PARIS UDL ENSIIE INSTITUT-MINES-TELECOM
