All reads must have a unique identifier #4

@mictadlo

Hi,

I get the following error:

```
[sga::overlap] parsing file reads.pp.ec.filter.pass-thread11.hits.gz
Error: Attempted to insert vertex into graph with a duplicate id: @NS500334:63:HF2WTBGXY:3:13401:17773:1683
All reads must have a unique identifier
```

when running the following commands:

```bash
#SGA-ICE.py `pwd` -t 12
#./runMe.sh

cd ec/

IN1=out_NtC_001879-1.final.ecOv.fq.fasta
IN2=out_NtC_001879-2.final.ecOv.fq.fasta

#
# Parameters
#

# The number of threads to use
CPU=12

# The minimum length of contigs to include in a scaffold
MIN_CONTIG_LENGTH=200

#
# Preprocessing
#

# Preprocess the data to remove ambiguous basecalls
cat out_NtC_001879-*.final.ecOv.fq.fasta > reads.pp.ec.fasta

#
# Primary (contig) assembly
#

# Index the corrected data.
sga index -a ropebwt -t $CPU reads.pp.ec.fasta

# Remove exact-match duplicates and reads with low-frequency k-mers
sga filter --homopolymer-check --low-complexity-check -t $CPU reads.pp.ec.fasta

# Compute the structure of the string graph
sga overlap -t $CPU reads.pp.ec.filter.pass.fa
```
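To see how many IDs collide in the concatenated file, rather than only the one named in the error, the FASTA headers can be checked directly. This is a minimal sketch that assumes, as the error message suggests, that the read ID is the first whitespace-delimited token of each header line (without the leading `>`); `find_dup_ids` is a hypothetical helper name, not an SGA command.

```bash
# Hypothetical helper (not part of SGA): print every read ID that occurs more
# than once in a FASTA file.
# Assumption: the ID is the first whitespace-delimited token of the header,
# i.e. ">ID comment..." contributes "ID".
find_dup_ids() {
    local fasta="$1"
    grep '^>' "$fasta" \
        | sed -e 's/^>//' -e 's/[[:space:]].*$//' \
        | sort \
        | uniq -d
    # sed: drop the leading '>' and everything from the first space/tab on;
    # sort | uniq -d: keep only IDs that appear at least twice
}
```

For example, `find_dup_ids reads.pp.ec.fasta | wc -l` would count the colliding IDs, and `find_dup_ids reads.pp.ec.fasta | head` would print a few of them.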

However, when I check the FASTQ files, I cannot find this duplication:

```
> grep "@NS500334:63:HF2WTBGXY:3:13401:17773:1683" out_NtC_001879-1.fq 
@NS500334:63:HF2WTBGXY:3:13401:17773:1683 1:N:0:GATCAG
> grep "@NS500334:63:HF2WTBGXY:3:13401:17773:1683" out_NtC_001879-2.fq 
@NS500334:63:HF2WTBGXY:3:13401:17773:1683 2:N:0:GATCAG
```
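In the grep output above, both mate files contain the same first token, `@NS500334:63:HF2WTBGXY:3:13401:17773:1683`; only the Illumina comment after the space (`1:N:0:GATCAG` vs `2:N:0:GATCAG`) differs. If SGA keys each read on that first token (an assumption, consistent with the error printing only the token), then concatenating the two mate files with `cat` gives both reads of every pair the same ID. One possible workaround, sketched here with a hypothetical `tag_mates` helper, is to append a mate suffix to each ID before concatenating. It discards the header comment and is an untested sketch, not SGA's own paired-end handling.

```bash
# Hypothetical helper (not part of SGA): make FASTA IDs mate-specific by
# appending "/<suffix>" to the first token of each header line, e.g.
#   ">read123 1:N:0:GATCAG"  becomes  ">read123/1"
# Assumption: only the first whitespace-delimited token matters as the read ID,
# so the rest of the header (the Illumina comment) is dropped.
tag_mates() {
    local fasta="$1" suffix="$2"
    awk -v sfx="$suffix" '
        /^>/ { print $1 "/" sfx; next }   # header: keep ">ID", add the suffix
        { print }                         # sequence lines pass through unchanged
    ' "$fasta"
}
```

With it, the `cat` line could become `tag_mates "$IN1" 1 > reads.pp.ec.fasta` followed by `tag_mates "$IN2" 2 >> reads.pp.ec.fasta`, reusing the `IN1` and `IN2` variables the script already defines. Whether later pairing-aware SGA steps expect exactly this naming is an assumption worth checking against the SGA documentation.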

Did I do anything wrong?

Best wishes,

Micha
