Billi

Billi is a tool for identifying bubbles in pangenome or assembly graphs, represented as bidirected graphs or in GFA format. Refer to our preprint for details.


Illustration of nested panbubbles and hairpins. The three red boxes and one blue box highlight the three panbubbles and one hairpin, respectively.

Installation

git clone https://github.com/at-cg/billi.git
cd billi
make 

Usage

Decompose

Enumerates both panbubbles and hairpins in the input graph. Graph compaction is performed internally on the input before bubble detection:

./billi decompose -i inputgraph.gfa > out.txt 

Options:

  • -e, --exact — use the exact (slower) algorithm instead of the default heuristic

Compact

Merges long non-branching paths in the input graph into single vertices, producing a smaller, equivalent GFA:


Visualisation of compaction operation

./billi compact -i inputgraph.gfa -o compactgraph.gfa 

See docs/commands.md for the full list of command-line options. The output format is similar to that of pangene.
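To make the idea of compaction concrete, here is a minimal Python sketch that groups the vertices of a graph into maximal non-branching chains. It is an illustration only, not Billi's implementation: it assumes a plain directed graph without parallel edges, whereas Billi works on bidirected graphs, and the function name `compact_chains` is hypothetical.

```python
def compact_chains(nodes, edges):
    """Group nodes into maximal non-branching chains (simplified, directed graph).

    Assumption: `edges` is a list of distinct (u, v) pairs over `nodes`.
    This is a sketch of the general idea, not Billi's bidirected algorithm.
    """
    succ = {n: [] for n in nodes}
    pred = {n: [] for n in nodes}
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)

    def mergeable(u, v):
        # u -> v can be collapsed when it is u's only out-edge and v's only in-edge
        return u != v and len(succ[u]) == 1 and len(pred[v]) == 1

    chains, seen = [], set()

    def walk(start):
        # Extend the chain forward while the next edge is collapsible
        chain, cur = [start], start
        seen.add(start)
        while len(succ[cur]) == 1:
            nxt = succ[cur][0]
            if nxt in seen or not mergeable(cur, nxt):
                break
            chain.append(nxt)
            seen.add(nxt)
            cur = nxt
        return chain

    # First pass: start from nodes that cannot be merged into a predecessor
    for n in nodes:
        if not (len(pred[n]) == 1 and mergeable(pred[n][0], n)):
            chains.append(walk(n))
    # Second pass: whatever is left lies on an isolated cycle
    for n in nodes:
        if n not in seen:
            chains.append(walk(n))
    return chains


if __name__ == "__main__":
    nodes = ["a", "b", "c", "d", "e", "f"]
    edges = [("a", "b"), ("b", "c"), ("c", "d"), ("c", "e"), ("d", "f"), ("e", "f")]
    print(compact_chains(nodes, edges))
```

In this toy graph the path a, b, c has no branching, so it collapses into one chain, while d, e, and f stay separate because they border the branch at c and the merge at f.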

Example

./billi decompose -i test_files/edge_cases/nested.gfa > out.txt


Bandage visualisation of the nested.gfa test graph

The graph (nested.gfa) contains two bubbles, one nested completely inside the other.

Expected output:

CC	FB	bbID	parID	side1	side2
CC	BB	bbID	parID	side1	side2	#alleles
CC	HP	bbID	side1	side2	#alleles
CC	AL	#hap	walk	hap_id
CC
BB	0	BB:1	<s6	<s4	-1
BB	1	-1	>s1	>s3	-1

  • Every output starts with CC comment lines documenting the column layout, followed by one row per panbubble/hairpin found.
  • BB rows are panbubbles: bbID (unique ID), parID (-1 if top-level, or BB:<id>/HP:<id> if nested inside another panbubble/hairpin), side1/side2 (the two boundary nodes, with </> indicating the strand each is entered on), and #alleles.
  • HP rows are hairpins: same fields as BB, minus parID.
  • #alleles is -1 when allele walks weren't computed (e.g. the input GFA has no W/P lines). When alleles are available, each one is listed on its own AL row directly under the corresponding BB/HP row, and the block is terminated with a // line.
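For downstream scripting, the row layout described above can be read with a few lines of Python. The sketch below is not part of Billi; the names `Record`, `parse_billi_output`, and `split_side` are hypothetical. It assumes tab-separated columns, as in the CC header lines, and keeps AL columns as raw strings. Row types not described above (such as FB, which appears only in the header) are skipped.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Record:
    kind: str                 # "BB" (panbubble) or "HP" (hairpin)
    bb_id: str
    parent: Optional[str]     # None for top-level BB rows and for all HP rows
    side1: str
    side2: str
    n_alleles: int            # -1 when allele walks were not computed
    alleles: List[List[str]] = field(default_factory=list)  # raw AL columns


def split_side(side: str) -> Tuple[str, str]:
    """Split a boundary such as '<s6' into (orientation, node name)."""
    return side[0], side[1:]


def parse_billi_output(text: str) -> List[Record]:
    """Parse decompose output, assuming tab-separated columns."""
    records: List[Record] = []
    for line in text.splitlines():
        cols = line.strip().split("\t")
        tag = cols[0]
        if tag == "BB" and len(cols) >= 6:
            parent = None if cols[2] == "-1" else cols[2]
            records.append(Record("BB", cols[1], parent, cols[3], cols[4], int(cols[5])))
        elif tag == "HP" and len(cols) >= 5:
            records.append(Record("HP", cols[1], None, cols[2], cols[3], int(cols[4])))
        elif tag == "AL" and records:
            # AL rows belong to the most recent BB/HP row
            records[-1].alleles.append(cols[1:])
        # CC comments, '//' terminators, and other row types are ignored
    return records


if __name__ == "__main__":
    sample = "CC\nBB\t0\tBB:1\t<s6\t<s4\t-1\nBB\t1\t-1\t>s1\t>s3\t-1\n"
    for rec in parse_billi_output(sample):
        print(rec.kind, rec.bb_id, rec.parent, split_side(rec.side1), split_side(rec.side2))
```

Applied to the nested.gfa output shown above, this yields two BB records: bubble 0 with parent BB:1, and bubble 1 with no parent.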

See the test folder for other test cases.

Running the test suite

python3 src/test_gfa.py --binary ./billi --test-dir test_files --verbose

This is the same check run on every push/PR (see .github/workflows/test.yml).

Citation

If you use Billi, please cite:

Shreeharsha G Bhat, Daanish Mahajan, and Chirag Jain. Billi: Provably Accurate and Scalable Bubble Detection in Pangenome Graphs. bioRxiv (2025). https://doi.org/10.1101/2025.11.21.689636
