Description
It is often quite tricky, especially for git newbies, to visualize the "space" of git clones and remotes, in particular when different remotes have different names across the clones. Here is e.g. a "hand"-drawn representation of remotes for the spacetop dataset of @jungheejung, which we have done in mermaid (original comment):
flowchart TD
ORIGIN[1076_spacetop]-->|heejung\nremote=rolando|DISCOVERY
DISCOVERY-->|heejung\nremote=rolando-exchange| EXCHANGE
subgraph rolando
EXCHANGE[1076_spacetop.git]-->|yarik| ORIGIN
end
subgraph discovery
DISCOVERY[.../dartmouth]
DISCOVERY-FMRIPREP[.../fmriprep]
end
subgraph laptop
ORIGIN-->|heejung\nremote=origin|laptop-clone-name
laptop-clone-name-->|heejung\nremote=rolando-exchange| EXCHANGE
laptop-clone-name-->|heejung\nadds events| laptop-clone-name
end
subgraph JHU
DISCOVERY-->|patrick| jhu-clone
jhu-clone-->|used-for| JHU-FMRIPREP[.../fmriprep]
JHU-FMRIPREP-->|patrick| DISCOVERY-FMRIPREP
end
which helped to visualize which clones are out there and what their relationships are (on which computers, names of remotes).
In addition we could annotate:
- whether it is carrying git-annex (all did there)
- whether it is a bare git repo (like the `.git` on rolando above)
- in addition to git remotes, there are also git-annex special remotes worth visualizing, but we did not have them here
In addition to the above, there is now already https://github.com/spatialtopology/ds005256/ and soon there will be one among OpenNeuroDatasets, with an S3 special remote in export mode.
Presenting them all would make a super nice visualization to orient anyone in the "space" of available clones.
In the scope of datalad-registry with @candleindark we could then potentially collect/export such visualizations for clones we identify on https://registry.datalad.org/.
So we need
- walker: given a collection of git repositories, a utility to "walk" those and their remotes to collect information about them
  - would align with the effort @mih has on formalizing datalad datasets (https://concepts.datalad.org/) but without all the gory details
  - for hosting portals like github.com, gitlab.com etc. could have a plugin system to discover
    - forks/clones, with option(s) to enable their addition too: by default might just want to add "personal" forks for a specific user
    - upstream repository (might not even be present locally as a remote)
  - might need logic to harmonize URLs, since the same repo could be reached via different protocols (https, ssh, git) and have some optional suffix (like `.git`, which on github makes no difference but might matter for local ones - separate folder). Can use https://github.com/nephila/giturlparse for that
  - if `git-annex` is found on the system, record its version. For each repo collect `git annex info --json` output with all the gory details, and explicitly that clone's `annex.uuid` from config
    - for git-annex ones, we could (somewhat) rely on the `annex.uuid` of those, but I would not be surprised if someone has a hard copy, thus violating a core principle of git-annex, and we had better visualize that error
  - if `rad` is found on the system, get the repo's rad ID via `rad .` (if any is returned), and the node's id via `rad self`
  - walker should be able to continue walking if the host was configured to forward ssh identity or has some other mechanism to auth into remotes from that target system. It should be smart enough to not revisit a host already reached from another node in the network.
- renderer: a utility to render that information as e.g. a mermaid flowchart, including details specific to that rendering. Could optionally show
  - remote names
  - annex UUIDs
  - datalad dataset UUIDs (typically would be the same among clones)
  - highlight errors and warnings:
    - multiple instances with the same annex uuid (especially if anything is different among them, thus not just two different mounts of the same thing)
    - dead known remotes (maybe even for the same description/path)
    - ... ?
  - distinguish in rendering (maybe with badges etc.):
    - tree vs bare git
    - with vs without git-annex
    - special remote:
      - encrypted or not git-annex remote
      - keystore vs exporttree
      - importtree
- collectors: formalize (schema/model) how we collect pertinent metadata and allow for plugins to extend it. Additional properties might be "sensed", e.g. it might make total sense to annotate connections with
  - distance in ping hops or alike between two hosts
  - bandwidth in bytes/sec
  - {shared,unique}-content-size - amount of annexed data shared or unique between any two particular connected clones (valuable for balancing things out etc.)
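A minimal sketch of two pieces mentioned in the list above: harmonizing URLs so that the same repository reached over different protocols maps to one key, and flagging the same `annex.uuid` seen at more than one location. It is stdlib-only Python rather than giturlparse; `PORTAL_HOSTS`, `normalize_git_url` and `find_duplicate_annex_uuids` are hypothetical names, and treating `.git` as optional only on the listed portals is an assumption.

```python
import re
from collections import defaultdict
from urllib.parse import urlsplit

# Assumption: hosts where a trailing ".git" makes no difference.
PORTAL_HOSTS = {"github.com", "gitlab.com"}

# scp-like syntax without a scheme: [user@]host:path
_SCP_LIKE = re.compile(r"^(?:[^@/\s]+@)?(?P<host>[^:/\s]+):(?P<path>.+)$")


def normalize_git_url(url):
    """Reduce a git URL to a (host, path) key, ignoring protocol and user.

    Simplification: for remote URLs the leading "/" is dropped, so an
    scp-like path relative to $HOME is conflated with an absolute one.
    """
    if "://" in url:
        parts = urlsplit(url)
        host, path = parts.hostname or "", parts.path
    else:
        m = _SCP_LIKE.match(url)
        if m:
            host, path = m.group("host"), m.group("path")
        else:
            host, path = "", url  # plain local path
    host = host.lower()
    path = path.rstrip("/")
    if host:
        path = path.lstrip("/")
    # Only drop ".git" where it is known not to matter; locally
    # "repo" and "repo.git" can be two separate folders.
    if host in PORTAL_HOSTS and path.endswith(".git"):
        path = path[: -len(".git")]
    return host, path


def find_duplicate_annex_uuids(clones):
    """Map each annex UUID seen at more than one location to those locations.

    `clones` is an iterable of (location, annex_uuid) pairs; entries
    without a UUID (plain git repos) are skipped.
    """
    by_uuid = defaultdict(set)
    for location, uuid in clones:
        if uuid:
            by_uuid[uuid].add(location)
    return {u: sorted(locs) for u, locs in by_uuid.items() if len(locs) > 1}
```

With this, `https://github.com/spatialtopology/ds005256/` and `git@github.com:spatialtopology/ds005256.git` end up under the same key, while a local `repo` and `repo.git` stay distinct. The duplicate check only says that one UUID shows up at several locations; telling a real hard copy from two mounts of the same thing would still need comparing other collected details.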
I think the metadata to collect and render from could already be the metalad_core (https://github.com/datalad/datalad-metalad/blob/master/datalad_metalad/extractors/core.py#L36) metadata records. The problem is that we need to make the tool ssh into remote locations (remotes) to gather their information etc., and AFAIK we do not have a ready-to-use abstraction for remote instances.
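To make the "ssh into remotes" part more concrete, here is a rough sketch, assuming plain `ssh host 'cd path && cmd'` is good enough and auth (agent forwarding etc.) is already set up. `build_command`, `run_in_repo` and `collect` are hypothetical helpers, not an existing datalad abstraction; the two git commands are the ones named above.

```python
import shlex
import subprocess


def build_command(args, host=None, path="."):
    """Return (argv, cwd) running `args` inside `path`, locally or on `host`.

    Note: the remote path is shell-quoted, so "~" would NOT be expanded
    on the remote side; pass absolute paths.
    """
    if host is None:
        return list(args), path
    remote_cmd = f"cd {shlex.quote(path)} && {shlex.join(args)}"
    return ["ssh", host, remote_cmd], None


def run_in_repo(args, host=None, path="."):
    """Run a command in the repo; return stripped stdout, or None on failure."""
    argv, cwd = build_command(args, host, path)
    proc = subprocess.run(argv, cwd=cwd, capture_output=True, text=True)
    return proc.stdout.strip() if proc.returncode == 0 else None


def collect(host=None, path="."):
    """Collect the annex UUID and raw `git annex info --json` output."""
    uuid = run_in_repo(["git", "config", "--get", "annex.uuid"], host, path)
    # Only ask git-annex for details if the clone has an annex UUID.
    info = run_in_repo(["git", "annex", "info", "--json"], host, path) if uuid else None
    return {"host": host, "path": path, "annex_uuid": uuid, "annex_info_raw": info}
```

Keeping command construction (`build_command`) separate from execution makes the local and remote cases share one code path, and lets the ssh part be swapped for something smarter later.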
Possible (pieces of) solutions:
- `git annex map` (thanks to @matrss for reminding about it) - provides graphviz rendering of git-annex remotes (potentially just local, no spider-like navigation):
here is the graphviz drawing
digraph map {
subgraph "cluster_openneuro.org" {
label="openneuro"
style="filled"
fillcolor="lightblue"
"https://openneuro.org/git/0/ds005256" [ label="openneuro-git" ] [ style="filled" ] [ fillcolor="white" ]
}
subgraph "cluster_typhon.dartmouth.edu" {
label="typhon"
style="filled"
fillcolor="lightblue"
"97b6f5e4-4642-43a7-988a-c483caf553c5" [ label="yoh@typhon:/mnt/DATA/data/yoh/1076_spacetop" ] [ style="filled" ] [ fillcolor="white" ]
}
"97b6f5e4-4642-43a7-988a-c483caf553c5" -> "https://github.com/spatialtopology/ds005256" [ label="gh-spatialtopology" ]
"97b6f5e4-4642-43a7-988a-c483caf553c5" -> "https://openneuro.org/git/0/ds005256"
"97b6f5e4-4642-43a7-988a-c483caf553c5" -> "590b4fd0-0142-4e9d-8964-d1158c242c6a" [ label="origin" ]
"97b6f5e4-4642-43a7-988a-c483caf553c5" -> "40795e62-527c-4d26-ae8c-af42a6e2da5a" [ label="rolando-exchange" ]
subgraph "cluster_typhon.dartmouth.edu" {
label="typhon"
style="filled"
fillcolor="lightblue"
"01ec5571-2578-417a-988d-4c7339930635" [ label="yoh@typhon:/mnt/DATA/data/yoh/1076_spacetop.git" ] [ style="filled" ] [ fillcolor="white" ]
}
"01ec5571-2578-417a-988d-4c7339930635" -> "ssh://typhon.dartmouth.edu/mnt/DATA/data/yoh/1076_spacetop"
subgraph "cluster_smaug.datalad.org" {
label="smaug"
style="filled"
fillcolor="lightblue"
"fa9e758a-01a1-4a55-abee-d70128cb1144" [ label="yoh@smaug:/mnt/btrfs/datasets/incoming/yoh/1076_spacetop" ] [ style="filled" ] [ fillcolor="white" ]
}
"fa9e758a-01a1-4a55-abee-d70128cb1144" -> "590b4fd0-0142-4e9d-8964-d1158c242c6a" [ label="origin" ]
"fa9e758a-01a1-4a55-abee-d70128cb1144" -> "40795e62-527c-4d26-ae8c-af42a6e2da5a" [ label="rolando-exchange" ]
subgraph "cluster_rolando.cns.dartmouth.edu" {
label="rolando"
style="filled"
fillcolor="lightblue"
"40795e62-527c-4d26-ae8c-af42a6e2da5a" [ label="bids@rolando:/inbox/BIDS/Wager/Wager/1076_spacetop.git" ] [ style="filled" ] [ fillcolor="white" ]
}
"40795e62-527c-4d26-ae8c-af42a6e2da5a" -> "590b4fd0-0142-4e9d-8964-d1158c242c6a" [ label="origin" ]
"40795e62-527c-4d26-ae8c-af42a6e2da5a" -> "97b6f5e4-4642-43a7-988a-c483caf553c5" [ label="typhon" ]
subgraph "cluster_rolando.cns.dartmouth.edu" {
label="rolando"
style="filled"
fillcolor="lightblue"
"590b4fd0-0142-4e9d-8964-d1158c242c6a" [ label="bids@rolando:/inbox/BIDS/Wager/Wager/1076_spacetop" ] [ style="filled" ] [ fillcolor="white" ]
}
"590b4fd0-0142-4e9d-8964-d1158c242c6a" -> "40795e62-527c-4d26-ae8c-af42a6e2da5a" [ label="spacetop-rolando-exchange" ]
subgraph "cluster_github.com" {
label="github"
style="filled"
fillcolor="lightblue"
"https://github.com/spatialtopology/ds005256" [ label="gh" ] [ style="filled" ] [ fillcolor="white" ]
}
subgraph "cluster_github.com" {
label="github"
style="filled"
fillcolor="lightblue"
"https://github.com/yarikoptic/ds005256" [ label="gh-yarikoptic" ] [ style="filled" ] [ fillcolor="white" ]
}
subgraph "cluster_github.com" {
label="github"
style="filled"
fillcolor="lightblue"
"https://github.com/OpenNeuroDatasets/ds005256" [ label="gh-openneuro" ] [ style="filled" ] [ fillcolor="white" ]
}
subgraph "cluster_discovery.dartmouth.edu" {
label="discovery"
style="filled"
fillcolor="lightblue"
"ssh://d31548v@discovery.dartmouth.edu/dartfs-hpc/rc/lab/C/CANlab/labdata/data/spacetop/dartmouth" [ label="discovery" ] [ style="filled" ] [ fillcolor="white" ]
}
subgraph "cluster_localhost" {
label="localhost"
style="filled"
fillcolor="lightblue"
"b14a3911-d089-44da-8327-6d2cbbd05871" [ label="yoh@lena:~/datasets/1076_spacetop" ] [ style="filled" ] [ fillcolor="white" ]
}
"b14a3911-d089-44da-8327-6d2cbbd05871" -> "ssh://d31548v@discovery.dartmouth.edu/dartfs-hpc/rc/lab/C/CANlab/labdata/data/spacetop/dartmouth"
"b14a3911-d089-44da-8327-6d2cbbd05871" -> "https://github.com/OpenNeuroDatasets/ds005256"
"b14a3911-d089-44da-8327-6d2cbbd05871" -> "https://github.com/yarikoptic/ds005256"
"b14a3911-d089-44da-8327-6d2cbbd05871" -> "https://github.com/spatialtopology/ds005256"
"b14a3911-d089-44da-8327-6d2cbbd05871" -> "590b4fd0-0142-4e9d-8964-d1158c242c6a" [ label="origin" ]
"b14a3911-d089-44da-8327-6d2cbbd05871" -> "40795e62-527c-4d26-ae8c-af42a6e2da5a" [ label="rolando-exchange" ]
"b14a3911-d089-44da-8327-6d2cbbd05871" -> "fa9e758a-01a1-4a55-abee-d70128cb1144" [ label="smaug" ]
"b14a3911-d089-44da-8327-6d2cbbd05871" -> "01ec5571-2578-417a-988d-4c7339930635" [ label="typhon-exchange" ]
"b14a3911-d089-44da-8327-6d2cbbd05871" -> "ssh://typhon.dartmouth.edu/mnt/DATA/data/yoh/1076_spacetop"
"5977e022-46ee-4c0c-a6ee-5a0e2e2ea442" [ label="yoh@typhon:/tmp/ds005256" ] [ style="filled" ] [ fillcolor="white" ]
"5ded375b-76eb-4a6c-899d-bef65f7b80b2" [ label="openneuro" ] [ style="filled" ] [ fillcolor="white" ]
"620673e7-dbcc-450c-9622-5394ea652632" [ label="f0042x1@discovery7.hpcc.dartmouth.edu:/dartfs-hpc/rc/lab/C/CANlab/labdata/data/spacetop/1076_spacetop" ] [ style="filled" ] [ fillcolor="white" ]
"73e9e7ca-f5d1-4b30-9f04-d343d37a456b" [ label="yoh@lena:~/proj/dbic-datasets/1076_spacetop" ] [ style="filled" ] [ fillcolor="white" ]
"8028ca7a-270c-4be7-b029-cb26d1770d91" [ label="f0042x1@discovery7.hpcc.dartmouth.edu:/dartfs-hpc/rc/lab/C/CANlab/labdata/data/spacetop/dartmouth" ] [ style="filled" ] [ fillcolor="white" ]
"930e5e3f-5d86-4e07-be36-fef0bc528b77" [ label="OpenNeuro" ] [ style="filled" ] [ fillcolor="white" ]
"9441b7fd-3c95-4977-9b4a-7a29b4e598c5" [ label="h@h-MacBook-Pro.local:~/Documents/projects_local/1076_spacetop" ] [ style="filled" ] [ fillcolor="white" ]
"e5f1e780-543c-421e-ad0b-7a270c1ad09b" [ label="s3-PUBLIC" ] [ style="filled" ] [ fillcolor="white" ]
}
Claude-converted mermaid:
graph TB
subgraph openneuro["openneuro.org"]
on_git["openneuro-git<br/>https://openneuro.org/git/0/ds005256"]
end
subgraph typhon["typhon.dartmouth.edu"]
typhon_data["yoh@typhon:/mnt/DATA/data/yoh/1076_spacetop<br/>97b6f5e4-4642-43a7-988a-c483caf553c5"]
typhon_git["yoh@typhon:/mnt/DATA/data/yoh/1076_spacetop.git<br/>01ec5571-2578-417a-988d-4c7339930635"]
end
subgraph smaug["smaug.datalad.org"]
smaug_data["yoh@smaug:/mnt/btrfs/datasets/incoming/yoh/1076_spacetop<br/>fa9e758a-01a1-4a55-abee-d70128cb1144"]
end
subgraph rolando["rolando.cns.dartmouth.edu"]
rolando_exchange["bids@rolando:/inbox/BIDS/Wager/Wager/1076_spacetop.git<br/>40795e62-527c-4d26-ae8c-af42a6e2da5a"]
rolando_origin["bids@rolando:/inbox/BIDS/Wager/Wager/1076_spacetop<br/>590b4fd0-0142-4e9d-8964-d1158c242c6a"]
end
subgraph github["github.com"]
gh_spatial["gh<br/>https://github.com/spatialtopology/ds005256"]
gh_yarik["gh-yarikoptic<br/>https://github.com/yarikoptic/ds005256"]
gh_openneuro["gh-openneuro<br/>https://github.com/OpenNeuroDatasets/ds005256"]
end
subgraph discovery["discovery.dartmouth.edu"]
discovery_ssh["discovery<br/>ssh://d31548v@discovery.dartmouth.edu/dartfs-hpc/rc/lab/C/CANlab/labdata/data/spacetop/dartmouth"]
end
subgraph localhost["localhost"]
lena["yoh@lena:~/datasets/1076_spacetop<br/>b14a3911-d089-44da-8327-6d2cbbd05871"]
end
%% Connections from typhon_data
typhon_data -->|gh-spatialtopology| gh_spatial
typhon_data --> on_git
typhon_data -->|origin| rolando_origin
typhon_data -->|rolando-exchange| rolando_exchange
%% Connections from typhon_git
typhon_git --> typhon_data
%% Connections from smaug_data
smaug_data -->|origin| rolando_origin
smaug_data -->|rolando-exchange| rolando_exchange
%% Connections from rolando_exchange
rolando_exchange -->|origin| rolando_origin
rolando_exchange -->|typhon| typhon_data
%% Connections from rolando_origin
rolando_origin -->|spacetop-rolando-exchange| rolando_exchange
%% Connections from lena
lena --> discovery_ssh
lena --> gh_openneuro
lena --> gh_yarik
lena --> gh_spatial
lena -->|origin| rolando_origin
lena -->|rolando-exchange| rolando_exchange
lena -->|smaug| smaug_data
lena -->|typhon-exchange| typhon_git
lena --> typhon_data
style on_git fill:#fff
style typhon_data fill:#fff
style typhon_git fill:#fff
style smaug_data fill:#fff
style rolando_exchange fill:#fff
style rolando_origin fill:#fff
style gh_spatial fill:#fff
style gh_yarik fill:#fff
style gh_openneuro fill:#fff
style discovery_ssh fill:#fff
style lena fill:#fff
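A renderer producing mermaid of the shape above from collected data could be sketched like this. It assumes the walker hands over nodes keyed by annex UUID or URL, each with a host and a label, plus edges carrying an optional remote name; `render_mermaid` and that input layout are made up for illustration, not an existing API.

```python
from collections import defaultdict


def _esc(text):
    # Mermaid accepts entity codes inside quoted labels.
    return text.replace('"', "#quot;")


def render_mermaid(nodes, edges):
    """Render clones and remotes as a mermaid graph grouped by host.

    nodes: {key: (host, label)}
    edges: [(src_key, dst_key, remote_name_or_None)]
    """
    # Keys (UUIDs, URLs) are not valid mermaid ids, so assign short ones.
    ids = {key: f"n{i}" for i, key in enumerate(nodes)}
    by_host = defaultdict(list)
    for key, (host, label) in nodes.items():
        by_host[host].append((ids[key], label))

    lines = ["graph TB"]
    for i, (host, members) in enumerate(by_host.items()):
        lines.append(f'    subgraph h{i}["{_esc(host)}"]')
        for node_id, label in members:
            lines.append(f'        {node_id}["{_esc(label)}"]')
        lines.append("    end")
    for src, dst, name in edges:
        arrow = f"-->|{name}|" if name else "-->"
        lines.append(f"    {ids[src]} {arrow} {ids[dst]}")
    return "\n".join(lines)


if __name__ == "__main__":
    nodes = {
        "b14a3911-d089-44da-8327-6d2cbbd05871": ("localhost", "yoh@lena:~/datasets/1076_spacetop"),
        "https://github.com/spatialtopology/ds005256": ("github.com", "gh"),
    }
    edges = [("b14a3911-d089-44da-8327-6d2cbbd05871", "https://github.com/spatialtopology/ds005256", None)]
    print(render_mermaid(nodes, edges))
```

Grouping by host gives one `subgraph` per machine, unlike the repeated clusters in the `git annex map` output. Badges for bare vs tree, annex vs plain git, or error highlighting would be extra styling on top of this and are not covered here.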