Uellenberg/esu sampling #25

Closed
uellenberg wants to merge 30 commits into hanpham32:main from
raduba:uellenberg/esu-sampling

Conversation

@uellenberg

No description provided.

Radu and others added 30 commits February 17, 2026 19:01
This adds tests for exampleGraph and random graph generation. I modified
random graph generation slightly to make it easier to test. We might be
better off removing the "mimic graph" parameter entirely and just
specifying the exact properties of the graph we want.
format all files with black
Using ./NetMotif for all paths required always running the server from
outside its root directory, and required that it always lived inside a
directory called "NetMotif". Both requirements make development and
deployment a bit awkward, but we can fix it by always taking paths
relative to one of our source files (which should be constant).
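A minimal sketch of the anchoring idea described above, assuming a hypothetical helper named `resource_path` (the actual helper and file layout in this PR are not shown here). Passing `__file__` as the anchor makes the result independent of the current working directory and of the name of the enclosing directory.

```python
from pathlib import Path


def resource_path(anchor_file: str, *parts: str) -> Path:
    """Return an absolute path built relative to the directory of anchor_file.

    anchor_file is normally __file__ of a source file whose location inside
    the project does not change. The helper name is hypothetical.
    """
    # resolve() makes the anchor absolute, so the caller's working
    # directory no longer influences the result.
    base_dir = Path(anchor_file).resolve().parent
    return base_dir.joinpath(*parts)


# Typical call from inside a module (file names are illustrative only):
#   data_file = resource_path(__file__, "data", "example_graph.txt")
```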
This was causing a warning from streamlit, where mixed types ("NA" and
floats) worked, but weren't meant to be used. I also noticed the
four-decimal formatting was done by streamlit and not written down
anywhere, so I added it here explicitly.
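One way to avoid a mixed-type column is to format every cell into a string before handing the table to streamlit. The sketch below assumes that approach and a hypothetical helper named `format_score`; the PR's actual fix may differ.

```python
import math
from typing import Optional


def format_score(value: Optional[float], missing: str = "NA") -> str:
    """Format a numeric cell with four decimals, or return a placeholder.

    Every cell becomes a string, so the column holds a single type and the
    four-decimal formatting is stated explicitly in our own code.
    """
    if value is None or math.isnan(value):
        return missing
    return f"{value:.4f}"


# Example column mixing real values and missing entries.
column = [format_score(v) for v in (0.123456, None, 2)]
```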
Streamlit was giving a warning about the blank label (and there really
should be one for accessibility anyway).
This creates a labelg process and worker thread, which receive (non-canonical
label, extra data) pairs, send the labels to labelg in batches, and run a
callback function when each canonical label is available. The previous
method, by contrast, required all data to be available upfront and ran
labelg at the end.

This will help the SubgraphProfile and SubgraphCollection
implementations. In particular, if we end up doing a streaming implementation,
we'll need labelg to be streaming as well, otherwise we'd need to keep all of
the node data in memory until ESU is done, then run labelg in one pass.

Technically, this allows labelg to run in parallel with ESU, although
labelg is so fast that this doesn't improve performance. I ran a few
tests and don't see any difference in performance between this and the
old version. I also added a few controls (batch size and max batches), but I
haven't tuned them.
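The batching-plus-callback pattern described above can be sketched roughly as follows. This is not the PR's implementation: the class name `BatchLabeler`, the `max_pending` limit, and the `canonicalize_batch` function are all assumptions, and `canonicalize_batch` stands in for the round trip to the labelg process so the example stays self-contained.

```python
import queue
import threading
from typing import Any, Callable, List

Callback = Callable[[str, Any], None]


class BatchLabeler:
    """Worker thread that canonicalizes labels in batches (hypothetical sketch)."""

    def __init__(self, canonicalize_batch: Callable[[List[str]], List[str]],
                 batch_size: int = 64, max_pending: int = 1024) -> None:
        self._canonicalize_batch = canonicalize_batch
        self._batch_size = batch_size
        # A bounded queue applies backpressure to the producer (e.g. ESU).
        self._queue: queue.Queue = queue.Queue(maxsize=max_pending)
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def submit(self, label: str, extra: Any, callback: Callback) -> None:
        """Queue one (non-canonical label, extra data) pair."""
        self._queue.put((label, extra, callback))

    def close(self) -> None:
        """Flush remaining work and stop the worker."""
        self._queue.put(None)  # sentinel
        self._thread.join()

    def _run(self) -> None:
        done = False
        while not done:
            first = self._queue.get()  # block until work arrives
            if first is None:
                break
            batch = [first]
            # Drain whatever else is already queued, up to batch_size.
            while len(batch) < self._batch_size:
                try:
                    item = self._queue.get_nowait()
                except queue.Empty:
                    break
                if item is None:
                    done = True
                    break
                batch.append(item)
            canonical = self._canonicalize_batch([label for label, _, _ in batch])
            for canon, (_, extra, callback) in zip(canonical, batch):
                callback(canon, extra)  # runs on the worker thread
```

Because callbacks fire as soon as each batch returns, the producer never has to hold every subgraph in memory until enumeration finishes.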
remove the unused ESU progress param
Implement SubgraphProfile and SubgraphCollection download
This is based on FANMOD's sampling mode, and allows us to get good
results in a fraction of the time a full ESU takes. It works by
probabilistically determining whether to explore certain branches of the
graph. I didn't add weighting because it always cancels out in our case.
If it needs to be added in the future, the weight is 1/(product of all
probabilities).

For the probabilities themselves, the formula I used is a bit arbitrary:
it's just one that scales with the depth (we don't want to take out too
many of the early paths) and has nice probabilities. There's probably a
more optimal formula to use, or we may allow users to input their own
probabilities by hand.
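For readers unfamiliar with sampled ESU, here is a rough sketch of the idea, assuming an undirected graph stored as an adjacency dict with integer node IDs and no self-loops. The function names, the `probs` layout (one probability per depth), and the example values are assumptions for illustration; the depth-scaled formula used in this PR is not reproduced here.

```python
import math
import random
from typing import Dict, List, Optional, Sequence, Set, Tuple

Graph = Dict[int, Set[int]]


def rand_esu(graph: Graph, k: int, probs: Sequence[float],
             rng: Optional[random.Random] = None) -> List[Tuple[int, ...]]:
    """Enumerate connected size-k subgraphs, exploring each branch at depth d
    with probability probs[d]. With all probabilities 1.0 this is a full ESU."""
    rng = rng or random.Random()
    found: List[Tuple[int, ...]] = []

    def extend(sub: List[int], ext: Set[int], root: int) -> None:
        if len(sub) == k:
            found.append(tuple(sorted(sub)))
            return
        ext = set(ext)
        while ext:
            w = ext.pop()  # w is removed whether or not the branch is explored
            if rng.random() >= probs[len(sub)]:
                continue  # skip this branch
            # Nodes already in the subgraph or adjacent to it.
            seen = set(sub).union(*(graph[u] for u in sub))
            # Exclusive neighbors of w with a larger ID than the root.
            new_ext = ext | {u for u in graph[w] if u > root and u not in seen}
            extend(sub + [w], new_ext, root)

    for v in graph:
        if rng.random() >= probs[0]:
            continue
        extend([v], {u for u in graph[v] if u > v}, v)
    return found


def sample_weight(probs: Sequence[float]) -> float:
    """Weight of each sampled subgraph: 1 / (product of all probabilities)."""
    return 1.0 / math.prod(probs)


# Path graph 0-1-2-3; with every probability at 1.0 nothing is skipped.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
all_subgraphs = rand_esu(path, 3, [1.0, 1.0, 1.0])
```

Since every sampled subgraph carries the same weight, the weight drops out when only relative frequencies are compared, which matches the note above that it cancels out. Multiplying the sampled count by `sample_weight(probs)` would give an estimate of the absolute count.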
@uellenberg uellenberg closed this Mar 2, 2026
2 participants