Segfault in result2profile when converting a clustering result to a profile #1077

I downloaded the BOLD (https://boldsystems.org/) FASTA release, filtered it to retain only COI-5P sequences over 300 bp, and then clustered it with `mmseqs linclust`:

```
mmseqs createdb bold.COI-5P.rmdup.m300.fa.gz BOLD.COI-5P.dedup/db
mmseqs linclust BOLD.COI-5P.dedup/db BOLD.COI-5P.clustered/db tmp --threads 32 --min-seq-id 0.99 -c 0.95
```
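For reference, the filtering step described above could be sketched roughly as follows. This is not the exact command that produced `bold.COI-5P.rmdup.m300.fa.gz`: the function name `filter_fasta` is hypothetical, and the sketch assumes the marker name (`COI-5P`) appears somewhere in each FASTA header line and that the length threshold is an inclusive minimum. Deduplication is not covered.

```shell
# filter_fasta MARKER MIN_LEN < in.fa > out.fa
# Hypothetical helper: keeps records whose header contains MARKER and whose
# sequence (joined across wrapped lines) has at least MIN_LEN characters.
filter_fasta() {
    awk -v marker="$1" -v min_len="$2" '
        # Print the buffered record if it passes both filters.
        function emit() {
            if (header != "" && index(header, marker) > 0 && length(seq) >= min_len) {
                print header
                print seq
            }
        }
        /^>/ { emit(); header = $0; seq = ""; next }  # new record starts
        { seq = seq $0 }                              # accumulate wrapped sequence lines
        END { emit() }                                # flush the last record
    '
}
```

With a gzipped input it could be used as `gzip -dc input.fa.gz | filter_fasta COI-5P 300 > filtered.fa`, where both file names are placeholders.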

Now I am trying to convert the cluster result to a profile format for searches, but I get a segfault:

```
mmseqs createsubdb BOLD.COI-5P.clustered/db BOLD.COI-5P.dedup/db BOLD.COI-5P.clustered/repSeqdb
mmseqs createsubdb BOLD.COI-5P.clustered/db BOLD.COI-5P.dedup/db_ BOLD.COI-5P.clustered/repSeqdb_h
mmseqs result2profile BOLD.COI-5P.clustered/repSeqdb BOLD.COI-5P.dedup/db BOLD.COI-5P.clustered/db BOLD.COI-5P.clustered.profile/db --threads 16
```
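Before running `result2profile`, it may help to confirm that each database prefix passed to it has its companion files on disk. The sketch below is only a sanity check under the usual MMseqs2 on-disk convention that a database `PREFIX` comes with `PREFIX.index` and `PREFIX.dbtype`; the function name `check_mmseqs_db` is hypothetical, and the data file itself is not checked because it can be split into numbered parts.

```shell
# check_mmseqs_db PREFIX
# Hypothetical helper: reports missing companion files of an MMseqs2 database
# and returns non-zero if any are absent.
check_mmseqs_db() {
    prefix="$1"
    status=0
    for suffix in ".index" ".dbtype"; do
        if [ ! -e "${prefix}${suffix}" ]; then
            echo "missing: ${prefix}${suffix}" >&2
            status=1
        fi
    done
    return "$status"
}

# Example (paths taken from the commands above):
#   check_mmseqs_db BOLD.COI-5P.clustered/repSeqdb
#   check_mmseqs_db BOLD.COI-5P.clustered/repSeqdb_h
```

A missing file here would not by itself explain the segfault, but it rules out an incomplete `createsubdb` step.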

Which gives:

```
result2profile BOLD.COI-5P.clustered/repSeqdb BOLD.COI-5P.dedup/db BOLD.COI-5P.clustered/db BOLD.COI-5P.clustered.profile/db --threads 16

MMseqs Version:           	01683a607f83878e95436632d73e1d7d9ae30955
Substitution matrix       	aa:blosum62.out,nucl:nucleotide.out
E-value threshold         	0.001
Mask profile              	1
Profile E-value threshold 	0.001
Compositional bias        	1
Compositional bias scale  	1
Global sequence weighting 	false
Allow deletions           	false
Filter MSA                	1
Use filter only at N seqs 	0
Maximum seq. id. threshold	0.9
Minimum seq. id.          	0.0
Minimum score per column  	-20
Minimum coverage          	0
Select N most diverse seqs	1000
Pseudo count mode         	0
Pseudo count a            	substitution:1.100,context:1.400
Pseudo count b            	substitution:4.100,context:5.800
Preload mode              	0
Gap open cost             	aa:11,nucl:5
Gap extension cost        	aa:1,nucl:2
Threads                   	16
Compressed                	0
Verbosity                 	3
Profile output mode       	0

Query database size: 6957508 type: Nucleotide
Target database size: 9478899 type: Nucleotide
fish: Job 1, '~/MMseqs2/build/bin/mmseqs resu…' terminated by signal SIGSEGV (Address boundary error)
```

I have tried both the latest version from Conda and the latest commit compiled from source, to no avail. Am I doing something wrong, or is this a bug that needs fixing?
