Hierarchical clustering for property graph schema discovery - Télécom SudParis
Conference Papers, Year: 2022

Hierarchical clustering for property graph schema discovery


The property graph model is becoming increasingly popular among users and is currently employed by several open-source and commercial graph database systems. Although property graphs are widely adopted, there is a lack of understanding of their underlying schema structure. In particular, the schema discovery problem consists of extracting the schema concepts from a property graph. A property graph schema helps build a concise description of the data it represents, to make it more digestible for humans and interactive processes, as well as usable for query optimization purposes. In this paper, we address the property graph schema discovery problem and introduce the GMMSchema method based on hierarchical clustering using a Gaussian Mixture Model, which accounts for both label and property information on nodes. We experimentally analyze the accuracy and performance of GMMSchema, compared to those of its closest competitor, and showcase its superiority on several commonly used datasets, including real-world ones, such as the Covid19 knowledge graph, as well as the Fib25 and Mb6 NeuPrint graphs.
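The abstract states that GMMSchema accounts for both label and property information on nodes. As a rough illustration of what such an input could look like, the sketch below encodes each node as a binary vector over the labels and property keys observed in the data. The toy nodes, the `label:`/`prop:` feature naming, and the helper functions are all hypothetical, and the encoding actually used in the paper may differ.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical representation: a node is (set of labels, dict of properties).
Node = Tuple[Set[str], Dict[str, object]]


def build_vocabulary(nodes: List[Node]) -> List[str]:
    """Collect every label and property key seen, in a stable sorted order."""
    labels = sorted({label for node_labels, _ in nodes for label in node_labels})
    keys = sorted({key for _, props in nodes for key in props})
    return [f"label:{l}" for l in labels] + [f"prop:{k}" for k in keys]


def encode(node: Node, vocab: List[str]) -> List[int]:
    """Binary vector: 1 if the node carries that label or property key."""
    node_labels, props = node
    present = {f"label:{l}" for l in node_labels} | {f"prop:{k}" for k in props}
    return [1 if feature in present else 0 for feature in vocab]


# Toy data (made up for illustration, not taken from the paper's datasets).
nodes: List[Node] = [
    ({"Person"}, {"name": "Ada", "age": 36}),
    ({"Person"}, {"name": "Alan"}),
    ({"Paper"}, {"title": "On graphs", "year": 2022}),
    (set(), {"title": "Untitled", "year": 2021}),  # unlabeled node
]

vocab = build_vocabulary(nodes)
vectors = [encode(node, vocab) for node in nodes]

for vector in vectors:
    print(vector)
```

Vectors of this kind could then be passed to a mixture model (for example, scikit-learn's `GaussianMixture`) and split recursively to obtain a hierarchy of candidate node types. That pipeline is an assumption made for illustration, not a description of the authors' implementation. Note that the unlabeled node shares its property pattern with the `Paper` node, which is the sort of case where property information can help when labels are missing.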
Main file
EDBTShort_2022.pdf (936.1 KB)
Origin: Publisher files allowed on an open archive

Dates and versions

hal-03692293, version 1 (09-06-2022)


Attribution - NonCommercial - NoDerivatives




Angela Bonifati, Stefania Dumbrava, Nicolas Mir. Hierarchical clustering for property graph schema discovery. EDBT 2022: 25th International Conference on Extending Database Technology, Mar 2022, Edinburgh, United Kingdom. pp. 449-453. ⟨hal-03692293⟩

