A Rust implementation of the XP-CLR method. It produces near-identical results to the original implementation in a fraction of the run time. On chromosome 24 of the VarGoat dataset (777,865 total variants, of which 236,145 are used for the analysis, with two groups of 32 and 22 individuals), the software runs in 00m:26s using 77 MB of memory, versus 55m:20s and 321 MB for the original implementation (roughly 128 times faster with about a quarter of the memory).
The compilation of the software requires the following packages to be installed:
Then, install with cargo:
cargo install xpclrs
Check that the package is successfully installed with:
xpclrs --help
The software is also available as a Docker container on Docker Hub. You can install it by pulling the image with Docker:
docker pull tale88/xpclrs:latest
Or with Singularity:
singularity build xpclrs.sif docker://tale88/xpclrs:latest
The software requires the following mandatory options:
- Input genotypes in VCF(.GZ)/BCF format with -I/--input.
  - PLINK binary files (BED/BIM/FAM) are also supported by providing the root of the file name with the same -I/--input option and adding the --plink flag.
  - Loading PLINK files is substantially faster than loading a VCF, but note that it can lead to different results because the variants are coded as major/minor rather than REF/ALT (XP-CLR relies on allele frequencies).
- The lists of individuals in each group (one individual per line) with -A/--samplesA and -B/--samplesB.
  - PLINK samples are loaded as FID_IID. So if your sample in the FAM file is POP1 SAMP1 0 0 0 -9, the sample will be listed as POP1_SAMP1 in the group of individuals.
- The sequence to analyse with -C/--chr.
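For PLINK input, the group files can be derived directly from the FAM file. The sketch below is a minimal example, not part of xpclrs: it assumes a hypothetical file named demo.fam in which the family ID (first column) identifies the population, and writes one FID_IID identifier per line, in the form described above. The file names and the POP1/POP2 labels are illustrative only.

```shell
# Hypothetical FAM file: FID IID father mother sex phenotype
printf '%s\n' \
  'POP1 SAMP1 0 0 0 -9' \
  'POP1 SAMP2 0 0 0 -9' \
  'POP2 SAMP3 0 0 0 -9' > demo.fam

# Join FID and IID with an underscore, one sample per line,
# splitting the samples by population (FID).
awk '$1 == "POP1" { print $1 "_" $2 }' demo.fam > samplesA.txt
awk '$1 == "POP2" { print $1 "_" $2 }' demo.fam > samplesB.txt

cat samplesA.txt samplesB.txt
```

The resulting samplesA.txt and samplesB.txt can then be passed to -A/--samplesA and -B/--samplesB.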
The VCF can optionally include a genetic distance key in the INFO field, which can be specified with --gdistkey [NAME]. Alternatively, users can provide the recombination rate with the -R/--rrate option. For PLINK inputs, the software automatically detects whether genetic positions are present in the dataset and uses them; where the value is 0, it computes the genetic position from the physical position and the recombination rate. Ensure that there are no gaps in the genetic positions (i.e. a 0 following a known genetic position).
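One way to look for such gaps before running the analysis is to scan the BIM file, whose third column holds the genetic position. The following is a rough sketch, not part of xpclrs: it assumes a hypothetical demo.bim and reports any variant whose genetic position is 0 after a non-zero value has already been seen on the same chromosome.

```shell
# Hypothetical BIM file: chrom, variant ID, genetic position, bp position, allele 1, allele 2
printf '%s\n' \
  '24 snp1 0.10 1000 A G' \
  '24 snp2 0 2000 C T' \
  '24 snp3 0.30 3000 G A' > demo.bim

# Report variants with genetic position 0 that follow a known (non-zero)
# genetic position on the same chromosome.
awk '
  $1 != chrom { chrom = $1; seen = 0 }   # reset when the chromosome changes
  $3 + 0 != 0 { seen = 1; next }         # a known genetic position
  seen        { print "gap:", $1, $2, $4 }
' demo.bim > gaps.txt

cat gaps.txt
```

An empty gaps.txt means no such gaps were found; leading zeros at the start of a chromosome are not reported by this check.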
The analysis can be further sped up using multithreading with the --threads/-t option, followed by the number of threads to use. If set to 0, the software will try to use all the threads available.
An example command and the list of available options can be seen using --help:
$ xpclrs --help
Compute the XP-CLR for a pair of populations from a VCF file.
Usage: xpclrs [OPTIONS] --input <VCF> --out <OUT> --samplesA <SAMPLES_A> --samplesB <SAMPLES_B> --chr <CHROM>
Options:
-I, --input <VCF> input VCF file
-O, --out <OUT> Output file name.
-A, --samplesA <SAMPLES_A> Samples in population A. Path to file with each ID on a line.
-B, --samplesB <SAMPLES_B> Samples in population B. Path to file with each ID on a line.
-R, --rrate <RECRATE> Recombination rate per base. [default: 1e-8]
-L, --ld <LDCUTOFF> LD cutoff. [default: 0.95]
-m, --maxsnps <MAXSNPS> Max SNPs in a window. [default: 200]
-N, --minsnps <MINSNPS> Min SNPs in a window. [default: 10]
--size <SIZE> Sliding window size. [default: 20000]
--start <START> Start position for the sliding windows. [default: 1]
--stop <STOP> Stop position for the sliding windows.
--step <STEP> Step size for the sliding windows. [default: 20000]
-C, --chr <CHROM> Chromosome to analyse.
--gdistkey <DISTKEYS> Key in INFO field providing the genetic position of each variant in the VCF file
-t, --threads <NTHREADS> Number of threads to use [default: 1]
-f, --format <OUTFMT> Format to save the output (csv, tsv, txt) [default: tsv] [possible values: tsv, txt, csv]
-l, --log <LOG> Number of threads to use [default: info] [possible values: info, debug]
-h, --help Print help
-V, --version Print version
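As an illustration of how the mandatory options fit together, the sketch below assembles a command from the flags listed in the help text. All file names (demo.vcf.gz, samplesA.txt, samplesB.txt, demo_out), the choice of chromosome 24 and the four threads are placeholders. The command is only executed if xpclrs is found on the PATH; otherwise it is printed.

```shell
# Placeholder inputs; replace with your own files
set -- --input demo.vcf.gz --out demo_out \
       --samplesA samplesA.txt --samplesB samplesB.txt \
       --chr 24 --threads 4

cmd="xpclrs $*"
if command -v xpclrs >/dev/null 2>&1; then
  xpclrs "$@"
else
  echo "$cmd"   # dry run: show the command without executing it
fi
```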
You can test the software with the demo data in the original xpclr repository.
If you use the tool, please cite:
- Chen H, Patterson N, Reich D. Population differentiation as a test for selective sweeps. Genome Res. 2010 Mar;20(3):393-402. doi: 10.1101/gr.100545.109. Epub 2010 Jan 19. PMID: 20086244; PMCID: PMC2840981.
- The original xpclr tool.
- Talenti A. XPCLRS: Fast Selection Signature Detection Using Cross-Population Composite Likelihood Ratio. bioRxiv 2026.02.27.708459. doi: 10.64898/2026.02.27.708459.