Hi
I have received the following error:
[sga::overlap] parsing file reads.pp.ec.filter.pass-thread11.hits.gz
Error: Attempted to insert vertex into graph with a duplicate id: @NS500334:63:HF2WTBGXY:3:13401:17773:1683
All reads must have a unique identifier
when running the following commands:
#SGA-ICE.py `pwd` -t 12
#./runMe.sh
cd ec/
IN1=out_NtC_001879-1.final.ecOv.fq.fasta
IN2=out_NtC_001879-2.final.ecOv.fq.fasta
#
# Parameters
#
# The number of threads to use
CPU=12
# The minimum length of contigs to include in a scaffold
MIN_CONTIG_LENGTH=200
#
# Preprocessing
#
# Preprocess the data to remove ambiguous basecalls
cat out_NtC_001879-*.final.ecOv.fq.fasta > reads.pp.ec.fasta
#
# Primary (contig) assembly
#
# Index the corrected data.
sga index -a ropebwt -t $CPU reads.pp.ec.fasta
# Remove exact-match duplicates and reads with low-frequency k-mers
sga filter --homopolymer-check --low-complexity-check -t $CPU reads.pp.ec.fasta
# Compute the structure of the string graph
sga overlap -t $CPU reads.pp.ec.filter.pass.fa
However, when I checked the FASTQ files, I could not find this duplication within either file:
> grep "@NS500334:63:HF2WTBGXY:3:13401:17773:1683" out_NtC_001879-1.fq
@NS500334:63:HF2WTBGXY:3:13401:17773:1683 1:N:0:GATCAG
> grep "@NS500334:63:HF2WTBGXY:3:13401:17773:1683" out_NtC_001879-2.fq
@NS500334:63:HF2WTBGXY:3:13401:17773:1683 2:N:0:GATCAG
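The two grep calls above look at each FASTQ file separately, while the overlap step works on a file built from both. A check on the concatenated FASTA might look like the sketch below. It assumes that only the header text before the first whitespace is treated as the read identifier (an assumption, not something confirmed here), and `dup_ids` is a hypothetical helper name, not part of SGA.

```shell
# Hypothetical helper: print every read ID that occurs more than once
# in a FASTA file.
# Assumption: the ID is the header text up to the first whitespace,
# so "name 1:N:0:..." and "name 2:N:0:..." count as the same ID.
dup_ids() {
    # Keep header lines only and drop the leading '>'
    sed -n 's/^>//p' "$1" |
        # Keep the first whitespace-delimited token of each header
        awk '{print $1}' |
        # Report each ID that appears at least twice
        sort | uniq -d
}
```

Running `dup_ids reads.pp.ec.fasta` would then list any identifiers shared between the two input files under that assumption; empty output would mean no duplicates were found this way.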
Did I do anything wrong?
Best wishes,
Micha;