RenzoTale88/xpclrs

xpclrs

A Rust implementation of the XP-CLR method. It produces near-identical results to the original implementation in a fraction of the run time. For example, on chromosome 24 of the VarGoat dataset (777,865 total variants, of which 236,145 are used for the analysis, with two groups of 32 and 22 individuals), the software runs in 26 seconds using 77 MB of memory, versus 55 minutes 20 seconds and 321 MB for the original implementation.

Installation

Compiling the software requires the following packages to be installed:

  1. openblas
  2. libclang
  3. curl
  4. rust

Then, install with cargo:

cargo install xpclrs

Check that the package is successfully installed with:

xpclrs --help

Install with Docker/Singularity

The software is also available as a Docker image on Docker Hub. You can install it by pulling the image with Docker:

docker pull tale88/xpclrs:latest

Or with singularity:

singularity build xpclrs.sif docker://tale88/xpclrs:latest

Input

The software requires the following mandatory options:

  1. Input genotypes in VCF(.GZ)/BCF format with -I/--input.
    • PLINK binary files (BED/BIM/FAM) are also supported: provide the root of the file name with the same -I/--input option and add the --plink flag.
    • Loading PLINK files is substantially faster than loading VCF, but note that it can lead to different results, because variants are coded as major/minor rather than REF/ALT (XP-CLR relies on allele frequencies).
  2. The lists of individuals in each group (one individual per line) with -A/--samplesA and -B/--samplesB.
    • PLINK samples are loaded as FID_IID. For example, if a sample appears in the FAM file as POP1 SAMP1 0 0 0 -9, it should be listed as POP1_SAMP1 in the group of individuals.
  3. The sequence to analyse with -C/--chr.
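For PLINK inputs, the FID_IID naming described above can be derived directly from the FAM file. The following is a minimal sketch, not part of xpclrs itself; the function name `fam_to_sample_ids` and the example FAM content are hypothetical, and it assumes a standard whitespace-separated FAM file with the family ID in the first column and the individual ID in the second.

```python
def fam_to_sample_ids(fam_text):
    """Return one FID_IID identifier per non-empty FAM line."""
    ids = []
    for line in fam_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        fid, iid = fields[0], fields[1]
        ids.append(f"{fid}_{iid}")
    return ids


# Hypothetical FAM content with two samples.
fam = "POP1 SAMP1 0 0 0 -9\nPOP2 SAMP7 0 0 0 -9\n"
print("\n".join(fam_to_sample_ids(fam)))
```

The printed identifiers can then be split into the two files passed to -A/--samplesA and -B/--samplesB.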

The VCF can optionally include a genetic distance key in the INFO field, which can be specified with --gdistkey [NAME]. Alternatively, users can provide the recombination rate with the -R/--rrate option. For PLINK inputs, the software automatically detects whether genetic positions are present in the dataset and uses them; if the value is equal to 0, the software computes the genetic position from the physical position and the recombination rate. Ensure that there are no gaps in the genetic positions (i.e. a 0 following a known genetic position).
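As an illustration of the two points above, the sketch below derives genetic positions from physical positions and flags gaps before running the analysis. It is not taken from the xpclrs source: the function names are hypothetical, and it assumes the simplest conversion, a linear map where the genetic position equals the physical position multiplied by the per-base recombination rate.

```python
def genetic_positions_from_physical(positions, rrate=1e-8):
    """Assumed linear map: genetic position = physical position * rate per base."""
    return [pos * rrate for pos in positions]


def find_gaps(genetic_positions):
    """Return indices where a 0 follows a known (non-zero) genetic position."""
    gaps = []
    seen_known = False
    for i, value in enumerate(genetic_positions):
        if value != 0:
            seen_known = True
        elif seen_known:
            gaps.append(i)
    return gaps


# Hypothetical genetic positions, e.g. the third column of a BIM file.
print(find_gaps([0.0, 0.1, 0.0, 0.3]))
print(genetic_positions_from_physical([1_000_000, 2_000_000]))
```

An empty list from `find_gaps` means no zero follows a known genetic position; leading zeros are not reported.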

The analysis can be further sped up using multithreading with the -t/--threads option, followed by the number of threads to use. If set to 0, the software will try to use all the threads available.
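To show how the mandatory options fit together, here is a small sketch that assembles an xpclrs command line from Python. The helper `build_xpclrs_command` and all file names are hypothetical; only the option names come from this document.

```python
import shlex


def build_xpclrs_command(input_path, out_path, samples_a, samples_b, chrom,
                         threads=1, plink=False):
    """Assemble an xpclrs argument list using the options described in this README."""
    cmd = [
        "xpclrs",
        "--input", input_path,
        "--out", out_path,
        "--samplesA", samples_a,
        "--samplesB", samples_b,
        "--chr", chrom,
        "--threads", str(threads),
    ]
    if plink:
        cmd.append("--plink")  # input_path is then the root of the BED/BIM/FAM files
    return cmd


# Hypothetical file names, for illustration only.
cmd = build_xpclrs_command("goats.vcf.gz", "chr24.xpclr.tsv",
                           "groupA.txt", "groupB.txt", "24", threads=4)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once xpclrs is installed and on the PATH.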

An example command and the list of available options can be seen using --help:

$ xpclrs --help
Compute the XP-CLR for a pair of populations from a VCF file.

Usage: xpclrs [OPTIONS] --input <VCF> --out <OUT> --samplesA <SAMPLES_A> --samplesB <SAMPLES_B> --chr <CHROM>

Options:
  -I, --input <VCF>           input VCF file
  -O, --out <OUT>             Output file name.
  -A, --samplesA <SAMPLES_A>  Samples in population A. Path to file with each ID on a line.
  -B, --samplesB <SAMPLES_B>  Samples in population B. Path to file with each ID on a line.
  -R, --rrate <RECRATE>       Recombination rate per base. [default: 1e-8]
  -L, --ld <LDCUTOFF>         LD cutoff. [default: 0.95]
  -m, --maxsnps <MAXSNPS>     Max SNPs in a window. [default: 200]
  -N, --minsnps <MINSNPS>     Min SNPs in a window. [default: 10]
      --size <SIZE>           Sliding window size. [default: 20000]
      --start <START>         Start position for the sliding windows. [default: 1]
      --stop <STOP>           Stop position for the sliding windows.
      --step <STEP>           Step size for the sliding windows. [default: 20000]
  -C, --chr <CHROM>           Chromosome to analyse.
      --gdistkey <DISTKEYS>   Key in INFO field providing the genetic position of each variant in the VCF file
  -t, --threads <NTHREADS>    Number of threads to use [default: 1]
  -f, --format <OUTFMT>       Format to save the output (csv, tsv, txt) [default: tsv] [possible values: tsv, txt, csv]
  -l, --log <LOG>             Number of threads to use [default: info] [possible values: info, debug]
  -h, --help                  Print help
  -V, --version               Print version
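The --start, --stop, --size and --step options together define the sliding windows. The sketch below shows one plausible way to enumerate them; the function `sliding_windows` is hypothetical, and the exact boundary handling in xpclrs (inclusive ends, treatment of the last window) is an assumption here, not something this document specifies.

```python
def sliding_windows(start, stop, size=20000, step=20000):
    """List (first, last) window coordinates, assuming inclusive 1-based ends."""
    return [(left, left + size - 1) for left in range(start, stop, step)]


# Defaults from the help text: 20 kb windows, 20 kb step (non-overlapping).
for first, last in sliding_windows(1, 60000):
    print(first, last)
```

With a step smaller than the window size (for example `step=10000`), consecutive windows overlap.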

Demo data

You can test the software with the demo data available in the original xpclr repository.

Citation

If you use the tool, please cite:

  1. Chen H, Patterson N, Reich D. Population differentiation as a test for selective sweeps. Genome Res. 2010 Mar;20(3):393-402. doi: 10.1101/gr.100545.109. Epub 2010 Jan 19. PMID: 20086244; PMCID: PMC2840981.
  2. The original xpclr tool.
  3. Talenti A. XPCLRS: Fast Selection Signature Detection Using Cross-Population Composite Likelihood Ratio. bioRxiv 2026.02.27.708459. doi: 10.64898/2026.02.27.708459.
