I/O operations #7

Merged
RenzoTale88 merged 39 commits into main from io_speedup on Feb 2, 2026

RenzoTale88 (Owner) commented on Jan 29, 2026

This PR does the following:

  1. rearranges I/O functions in multiple libraries
  2. adds experimental support for PLINK binary files as input
  3. adds a Docker container shipping xpclrs
  4. adds a crates.io publishing action
  5. adds crates-style comments
  6. adds more toy data to test the software
  7. adds more CI tests

RenzoTale88 changed the title from "Attempt speed-up I/O operations" to "I/O operations" on Jan 31, 2026

Copilot AI (Contributor) left a comment

Pull request overview

This PR modularizes genotype I/O, adds experimental PLINK binary support, and introduces packaging/distribution improvements (Docker image and crates.io workflow), along with crate-style documentation comments.

Changes:

  • Extracts VCF/BCF logic into a dedicated xcf module and centralizes shared I/O utilities in io, updating the CLI to work with a more general INPUT argument and optional PLINK mode.
  • Adds a new plink module to read BED/BIM/FAM files into GenoData, plus README updates describing PLINK usage and behavior.
  • Introduces a multi-stage Docker build for xpclrs, bumps the crate version to 1.0.0, and adds GitHub Actions workflows for Docker image publishing and (intended) crates.io publishing.
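To make the PLINK bullet above more concrete, here is a minimal sketch of how the 2-bit genotype codes in a variant-major `.bed` file can be decoded. It follows the published PLINK 1 binary layout (three magic bytes, then four samples per byte, lowest bits first). The names `Call`, `decode_variant`, `decode_bed`, and `a1_count` are hypothetical and are not taken from the `plink` module in this PR; how xpclrs maps these calls into `GenoData` is not shown here.

```rust
/// Genotype call decoded from a PLINK BED 2-bit code.
/// (Hypothetical type for illustration; not the crate's API.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Call {
    HomA1,   // 0b00: homozygous for the first allele listed in the BIM file
    Missing, // 0b01: missing genotype
    Het,     // 0b10: heterozygous
    HomA2,   // 0b11: homozygous for the second allele listed in the BIM file
}

/// Magic bytes that open a variant-major PLINK 1 BED file.
pub const BED_MAGIC: [u8; 3] = [0x6c, 0x1b, 0x01];

/// Decode the byte block of a single variant into one call per sample.
/// Each byte stores four samples, starting from the two lowest bits.
pub fn decode_variant(block: &[u8], n_samples: usize) -> Vec<Call> {
    (0..n_samples)
        .map(|i| match (block[i / 4] >> (2 * (i % 4))) & 0b11 {
            0b00 => Call::HomA1,
            0b01 => Call::Missing,
            0b10 => Call::Het,
            _ => Call::HomA2,
        })
        .collect()
}

/// Number of A1 allele copies carried by a call, or `None` if missing.
pub fn a1_count(call: Call) -> Option<u8> {
    match call {
        Call::HomA1 => Some(2),
        Call::Het => Some(1),
        Call::HomA2 => Some(0),
        Call::Missing => None,
    }
}

/// Validate the header and decode every variant in an in-memory BED buffer.
/// `n_samples` would normally come from the number of lines in the FAM file.
pub fn decode_bed(bytes: &[u8], n_samples: usize) -> Result<Vec<Vec<Call>>, String> {
    if !bytes.starts_with(&BED_MAGIC) {
        return Err("not a variant-major PLINK BED file".to_string());
    }
    if n_samples == 0 {
        return Err("n_samples must be greater than zero".to_string());
    }
    // Each variant occupies ceil(n_samples / 4) bytes; trailing bits are padding.
    let stride = (n_samples + 3) / 4;
    let body = &bytes[3..];
    if body.len() % stride != 0 {
        return Err("BED body length is not a multiple of the variant stride".to_string());
    }
    Ok(body
        .chunks_exact(stride)
        .map(|block| decode_variant(block, n_samples))
        .collect())
}
```

Calling `decode_bed` on the full contents of a `.bed` file, with the sample count taken from the matching `.fam` file, yields one vector of calls per variant in `.bim` order. Reading the whole file into memory is a simplification chosen for the sketch; a real reader may prefer to stream one variant block at a time.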

Reviewed changes

Copilot reviewed 11 out of 12 changed files in this pull request and generated 8 comments.

| File | Description |
| --- | --- |
| src/xcf/mod.rs | New module that encapsulates indexed and streamed VCF/BCF reading into GenoData, with options for phasing, recombination rate, and multi-threading. |
| src/plink/mod.rs | New PLINK BED/BIM/FAM reader that maps binary genotypes to GenoData, applies sample/position filters, and mirrors the XP-CLR filtering logic across two populations. |
| src/io/mod.rs | Refactors shared I/O utilities (e.g., GenoData, gt2gcount, sample list helpers) and provides high-level process_xcf / process_plink, plus documented read_file, write_table, and to_table. |
| src/methods/mod.rs | Adds crate-style docs and slightly refactors compute_complikelihood invocation while keeping the XP-CLR likelihood and windowing logic unchanged functionally. |
| src/main.rs | Updates CLI to use a generic INPUT arg, wires in --plink to choose between process_xcf and process_plink, and propagates new options (start as Option<u64>, n_threads) into I/O. |
| src/lib.rs | Exposes the new plink and xcf modules as part of the public crate API. |
| README.md | Documents PLINK binary support, clarifies how PLINK sample IDs are constructed (FID_IID), and explains how genetic distances are derived for PLINK input. |
| Dockerfile | Adds a multi-stage Docker build: compile xpclrs in a build stage, then copy the release binary into a minimal Ubuntu runtime image. |
| Cargo.toml / Cargo.lock | Bumps version to 1.0.0 and adds a short crate description to prepare for crates.io publishing. |
| .github/workflows/docker.yml | New CI workflow that builds and pushes multi-arch Docker images to Docker Hub on pushes to main and on releases. |
| .github/workflows/crates.yml | New workflow intended to publish to crates.io on releases, logging in with a registry token and invoking cargo publish (currently only in dry-run mode). |
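The README entry above notes that PLINK sample IDs are constructed as FID_IID. As a rough illustration of that convention, the sketch below builds such IDs from the text of a `.fam` file, whose first two whitespace-separated columns are the family ID and the individual ID. The function name `fam_sample_ids` and its error handling are assumptions made for this example, not the code merged in this PR.

```rust
/// Build `FID_IID` sample IDs from the contents of a PLINK FAM file.
/// (Hypothetical helper for illustration; not the crate's API.)
///
/// Blank lines are skipped; a line with fewer than two columns is an error.
pub fn fam_sample_ids(fam: &str) -> Result<Vec<String>, String> {
    fam.lines()
        .enumerate()
        .filter(|(_, line)| !line.trim().is_empty())
        .map(|(idx, line)| {
            // FAM columns: FID, IID, father, mother, sex, phenotype.
            let mut cols = line.split_whitespace();
            match (cols.next(), cols.next()) {
                (Some(fid), Some(iid)) => Ok(format!("{}_{}", fid, iid)),
                _ => Err(format!(
                    "FAM line {}: expected at least FID and IID columns",
                    idx + 1
                )),
            }
        })
        .collect()
}
```

IDs built this way can then be matched against the two per-population sample lists, in the same order as the genotype calls within each variant block of the `.bed` file.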

RenzoTale88 merged commit bc68a62 into main on Feb 2, 2026
5 checks passed