Hierarchical clustering for property graph schema discovery - Télécom SudParis
Conference paper. Year: 2022

Hierarchical clustering for property graph schema discovery

Abstract

The property graph model is becoming increasingly popular among users and is currently employed by several open-source and commercial graph database systems. Although property graphs are widely adopted, there is a lack of understanding of their underlying schema structure. In particular, the schema discovery problem consists of extracting the schema concepts from a property graph. A property graph schema helps build a concise description of the data it represents, to make it more digestible for humans and interactive processes, as well as usable for query optimization purposes. In this paper, we address the property graph schema discovery problem and introduce the GMMSchema method based on hierarchical clustering using a Gaussian Mixture Model, which accounts for both label and property information on nodes. We experimentally analyze the accuracy and performance of GMMSchema, compared to those of its closest competitor, and showcase its superiority on several commonly used datasets, including real-world ones, such as the Covid19 knowledge graph, as well as the Fib25 and Mb6 NeuPrint graphs.
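As a rough illustration of what "accounting for both label and property information on nodes" could look like in practice, the sketch below encodes each node as a binary vector over the labels and property keys seen in the graph, then groups nodes that share an identical vector. This is not the GMMSchema method: exact-signature grouping is a naive stand-in for the clustering step, and the toy nodes, the `build_vocabulary`, `encode`, and `group_by_signature` helpers, and the encoding itself are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical toy nodes (not from the paper's datasets): each node has a set
# of labels and a dictionary of properties.
nodes = [
    {"id": 1, "labels": {"Person"}, "props": {"name": "Ann", "age": 31}},
    {"id": 2, "labels": {"Person"}, "props": {"name": "Bob", "age": 45}},
    {"id": 3, "labels": set(), "props": {"name": "Eve", "age": 27}},
    {"id": 4, "labels": {"City"}, "props": {"name": "Lyon", "population": 516000}},
]


def build_vocabulary(nodes):
    """Collect every label and property key, in a fixed (sorted) order."""
    labels = sorted({label for n in nodes for label in n["labels"]})
    keys = sorted({key for n in nodes for key in n["props"]})
    return [("label", l) for l in labels] + [("prop", k) for k in keys]


def encode(node, vocab):
    """Binary vector: 1 if the node carries the label / property key, else 0."""
    return tuple(
        int(name in node["labels"]) if kind == "label" else int(name in node["props"])
        for kind, name in vocab
    )


def group_by_signature(nodes, vocab):
    """Naive stand-in for clustering: nodes with identical vectors share a group."""
    groups = defaultdict(list)
    for n in nodes:
        groups[encode(n, vocab)].append(n["id"])
    return dict(groups)


vocab = build_vocabulary(nodes)
vectors = {n["id"]: encode(n, vocab) for n in nodes}
groups = group_by_signature(nodes, vocab)

for signature, ids in groups.items():
    print(signature, ids)
```

In this toy graph, node 3 has the same property keys as the two `Person` nodes but no label, so exact matching places it in its own group. A softer, similarity-based clustering over such vectors, for example with a mixture model, would be one way to handle nodes like this.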
Main file
EDBTShort_2022.pdf (936.1 KB)
Origin: Publisher files allowed on an open archive

Dates and versions

hal-03692293, version 1 (09-06-2022)

Identifiers

  • HAL Id: hal-03692293, version 1

Cite

Angela Bonifati, Stefania Dumbrava, Nicolas Mir. Hierarchical clustering for property graph schema discovery. 25th International Conference on Extending Database Technology (EDBT), Mar 2022, Edinburgh, United Kingdom. pp. 449-453. ⟨hal-03692293⟩
