Extracting node comparison insights for the interactive exploration of property graphs

Requirements

This code has been tested with Python 3.12 and Neo4j 5.x. To install the required packages, we recommend using a Python virtual environment:

python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Using custom datasets

Sample data are provided in this repository. To use your own data, you need to set up the graph in a local Neo4j database (using Neo4j Desktop, for example) before running indicator extraction. We have only tested with Neo4j 5.x on macOS and Linux.

Indicator extraction

To extract indicators from property graphs, use the collect_indicators.py script. It takes command-line arguments that specify the Neo4j database and the following parameters:

-r: the number of runs
-dh: the threshold for acceptable variance (high)
-dl: the threshold for acceptable variance (low)
-c: the threshold for non-redundancy (Pearson's correlation)
-n: the threshold for acceptable density
-u: a list of suffixes for discarding properties
--pushdown: push the unwanted-property and acceptable-density checks (validation) down into indicator collection
--keep-nulls: to remove nodes with at least one null value
--create-index: to create all indices on numerical properties
config: the JSON file for the Neo4j database configuration (see examples in the configs subdirectory)

Solving the Partition/Clustering problem

To run the clustering and indicator partition heuristics, use the main.py script. This script uses command line arguments to specify datasets and parameters:

--k: the number of clusters desired
--steps: limits the number of local search steps
--method: ls (local search), exp (full tree enumeration), or sls ('simple' start local search)
--dataset: one of iris (debug only), airports, movies, directors, actors, or custom
--path: the path to the dataset (custom only; must be a CSV file with a header)
--delimiter: the field delimiter of the CSV file (custom only)
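Before passing a custom file through --path and --delimiter, it can help to check that it has a header row and a consistent number of fields per line. The following is a small sketch using only the Python standard library. The helper name `read_custom_dataset` and the sample column names are hypothetical, and the sketch does not reproduce how main.py loads data.

```python
import csv
import io


def read_custom_dataset(handle, delimiter=","):
    """Read a delimited file with a header row and check every row's width."""
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader, None)
    if header is None:
        raise ValueError("empty file: a header row is required")
    rows = []
    for line_number, row in enumerate(reader, start=2):
        if len(row) != len(header):
            raise ValueError(
                f"line {line_number}: expected {len(header)} fields, got {len(row)}"
            )
        rows.append(row)
    return header, rows


# Hypothetical semicolon-delimited sample standing in for a custom dataset.
sample = io.StringIO("id;degree;pagerank\n1;3;0.25\n2;5;0.40\n")
header, rows = read_custom_dataset(sample, delimiter=";")
print(header, len(rows))
```

For a file on disk, the same helper can be called with `open(path, newline="")` in place of the in-memory sample.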
