Compression flag causes error with `--cluster-mode 2` but not `--cluster-mode 0` #1073


To reproduce, run the following on a fresh `c6id.large` instance:

```
wget https://dev.mmseqs.com/latest/mmseqs-linux-avx2.tar.gz
tar -xvzf mmseqs-linux-avx2.tar.gz
export PATH=$PWD/mmseqs/bin:$PATH
rm mmseqs-linux-avx2.tar.gz

wget https://dl.secondarymetabolites.org/mibig/mibig_prot_seqs_4.0.fasta

mmseqs createdb mibig_prot_seqs_4.0.fasta mibig_db

mkdir -p example/

mmseqs cluster mibig_db "example/example" tmp \
        --compressed 1 \
        --cluster-mode 2
```
fails. The full output is under "Error output" below; in brief:
```
Clustering mode: Greedy
9036 ZSTD_decompressStream Corrupted block detected
Error: Pre-clustering step died
Error: linclust died
```


However, the same command with `--cluster-mode 0`

```
mmseqs cluster mibig_db "example/example" tmp \
        --compressed 1 \
        --cluster-mode 0
```
works. 
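To check whether the failure needs both flags together, the two commands above can be extended into a 2x2 comparison. This is only a sketch: it assumes `mibig_db` was created as shown earlier, and the names `MMSEQS`, `DB`, `run_case`, and the `cmp_c*_m*` directories are made up for illustration. `--compressed 0` is the default (uncompressed) setting.

```shell
#!/usr/bin/env bash
# Sketch: run `mmseqs cluster` for each combination of --compressed and
# --cluster-mode and report which combinations fail.
# MMSEQS, DB, run_case and the cmp_* directory names are illustrative only.
MMSEQS="${MMSEQS:-mmseqs}"   # path to the mmseqs binary
DB="${DB:-mibig_db}"         # database created with `mmseqs createdb`

run_case() {
    local compressed="$1"
    local mode="$2"
    local out="cmp_c${compressed}_m${mode}"
    mkdir -p "$out"
    # Each case gets its own result prefix, tmp directory, and log file.
    if "$MMSEQS" cluster "$DB" "$out/example" "$out/tmp" \
            --compressed "$compressed" \
            --cluster-mode "$mode" > "$out/log.txt" 2>&1; then
        echo "compressed=$compressed cluster-mode=$mode: ok"
    else
        echo "compressed=$compressed cluster-mode=$mode: FAILED (see $out/log.txt)"
    fi
}

for compressed in 0 1; do
    for mode in 0 2; do
        run_case "$compressed" "$mode"
    done
done
```

Given the behaviour reported here, `compressed=1 cluster-mode=2` is the case expected to fail, while `compressed=1 cluster-mode=0` should succeed.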

This happens with both the prebuilt binary and a build of mmseqs compiled from source.


Error output:

```
Create directory tmp
cluster mibig_db example/example tmp --compressed 1 --cluster-mode 2
MMseqs Version:                         bd01c2229f027d8d8e61947f44d11ef1a7669212
Substitution matrix                     aa:blosum62.out,nucl:nucleotide.out
Seed substitution matrix                aa:VTML80.out,nucl:nucleotide.out
Sensitivity                             4
k-mer length                            0
Target search mode                      0
k-score                                 seq:2147483647,prof:2147483647
Alphabet size                           aa:21,nucl:5
Max sequence length                     65535
Max results per query                   20
Split database                          0
Split mode                              2
Split memory limit                      0
Coverage threshold                      0.8
Coverage mode                           0
Compositional bias                      1
Compositional bias scale                1
Diagonal scoring                        true
Exact k-mer matching                    0
Mask residues                           1
Mask residues probability               0.9
Mask lower case residues                0
Mask lower letter repeating N times     0
Minimum diagonal score                  15
Selected taxa
Include identical seq. id.              false
Spaced k-mers                           1
Preload mode                            0
Pseudo count a                          substitution:1.100,context:1.400
Pseudo count b                          substitution:4.100,context:5.800
Spaced k-mer pattern
Local temporary path
Threads                                 2
Compressed                              1
Verbosity                               3
Add backtrace                           false
Alignment mode                          3
Alignment mode                          0
Allow wrapped scoring                   false
E-value threshold                       0.001
Seq. id. threshold                      0
Min alignment length                    0
Seq. id. mode                           0
Alternative alignments                  0
Max reject                              2147483647
Max accept                              2147483647
Score bias                              0
Realign hits                            false
Realign score bias                      -0.2
Realign max seqs                        2147483647
Correlation score weight                0
Gap open cost                           aa:11,nucl:5
Gap extension cost                      aa:1,nucl:2
Zdrop                                   40
Rescore mode                            0
Remove hits by seq. id. and coverage    false
Sort results                            0
Cluster mode                            2
Max connected component depth           1000
Similarity type                         2
Weight file name
Cluster Weight threshold                0.9
Set mode                                false
Single step clustering                  false
Cascaded clustering steps               3
Cluster reassign                        false
Remove temporary files                  false
Force restart with latest tmp           false
MPI runner
k-mers per sequence                     21
Scale k-mers per sequence               aa:0.000,nucl:0.200
Adjust k-mer length                     false
Shift hash                              67
Include only extendable                 false
Skip repeating k-mers                   false

Set cluster sensitivity to -s 6.000000
Set cluster iterations to 3
linclust mibig_db tmp/12627170530073326854/clu_redundancy tmp/12627170530073326854/linclust --cluster-mode 2 --max-iterations 1000 --similarity-type 2 --threads 2 --compressed 1 -v 3 --cluster-weight-threshold 0.9 --set-mode 0 --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' -a 0 --alignment-mode 3 --alignment-output-mode 0 --wrapped-scoring 0 -e 0.001 --min-seq-id 0 --min-aln-len 0 --seq-id-mode 0 --alt-ali 0 -c 0.8 --cov-mode 0 --max-seq-len 65535 --comp-bias-corr 1 --comp-bias-corr-scale 1 --max-rejected 2147483647 --max-accept 2147483647 --add-self-matches 0 --db-load-mode 0--pca substitution:1.100,context:1.400 --pcb substitution:4.100,context:5.800 --score-bias 0 --realign 0 --realign-score-bias -0.2 --realign-max-seqs 2147483647 --corr-score-weight 0 --gap-open aa:11,nucl:5 --gap-extend aa:1,nucl:2 --zdrop 40 --alph-size aa:13,nucl:5 --kmer-per-seq 21 --spaced-kmer-mode 1 --kmer-per-seq-scale aa:0.000,nucl:0.200 --adjust-kmer-len 0 --mask 0 --mask-prob 0.9 --mask-lower-case 0 --mask-n-repeat 0 -k 0 --hash-shift 67 --split-memory-limit 0 --include-only-extendable 0 --ignore-multi-kmer 0 --rescore-mode 0 --filter-hits 0 --sort-results 0 --remove-tmp-files 0 --force-reuse 0

kmermatcher mibig_db tmp/12627170530073326854/linclust/7507599336006465408/pref --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' --alph-size aa:13,nucl:5 --min-seq-id 0 --kmer-per-seq 21 --spaced-kmer-mode 1 --kmer-per-seq-scale aa:0.000,nucl:0.200 --adjust-kmer-len 0 --mask 0 --mask-prob 0.9 --mask-lower-case 0 --mask-n-repeat 0 --cov-mode 0 -k 0 -c 0.8 --max-seq-len 65535 --hash-shift 67 --split-memory-limit 0 --include-only-extendable 0 --ignore-multi-kmer 0 --threads 2 --compressed 1 -v 3 --cluster-weight-threshold 0.9

kmermatcher mibig_db tmp/12627170530073326854/linclust/7507599336006465408/pref --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' --alph-size aa:13,nucl:5 --min-seq-id 0 --kmer-per-seq 21 --spaced-kmer-mode 1 --kmer-per-seq-scale aa:0.000,nucl:0.200 --adjust-kmer-len 0 --mask 0 --mask-prob 0.9 --mask-lower-case 0 --mask-n-repeat 0 --cov-mode 0 -k 0 -c 0.8 --max-seq-len 65535 --hash-shift 67 --split-memory-limit 0 --include-only-extendable 0 --ignore-multi-kmer 0 --threads 2 --compressed 1 -v 3 --cluster-weight-threshold 0.9

Database size: 46987 type: Aminoacid
Reduced amino acid alphabet: (A S T) (C) (D B N) (E Q Z) (F Y) (G) (H) (I V) (K R) (L J M) (P) (W) (X)

Generate k-mers list for 1 split
[=================================================================] 100.00% 46.99K 0s 621ms
Sort kmer 0h 0m 0s 97ms
Sort by rep. sequence 0h 0m 0s 18ms
Time for fill: 0h 0m 0s 11ms
Time for merging to pref: 0h 0m 0s 0ms
Time for processing: 0h 0m 0s 779ms
rescorediagonal mibig_db mibig_db tmp/12627170530073326854/linclust/7507599336006465408/pref tmp/12627170530073326854/linclust/7507599336006465408/pref_rescore1 --sub-mat 'aa:blosum62.out,nucl:nucleotide.out' --rescore-mode 0 --wrapped-scoring 0 --filter-hits 0 -e 0.001 -c 0.8 -a 0 --cov-mode 0 --min-seq-id 0.5 --min-aln-len 0 --seq-id-mode 0 --add-self-matches 0 --sort-results 0 --db-load-mode 0 --threads 2 --compressed 1 -v 3

[=================================================================] 100.00% 46.99K 0s 48ms
Time for merging to pref_rescore1: 0h 0m 0s 9ms
Time for processing: 0h 0m 0s 70ms
clust mibig_db tmp/12627170530073326854/linclust/7507599336006465408/pref_rescore1 tmp/12627170530073326854/linclust/7507599336006465408/pre_clust --cluster-mode 2 --max-iterations 1000 --similarity-type 2 --threads 2 --compressed 1 -v 3 --cluster-weight-threshold 0.9 --set-mode 0

Clustering mode: Greedy
9036 ZSTD_decompressStream Corrupted block detected
Error: Pre-clustering step died
Error: linclust died
```
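For anyone reproducing this, the short excerpt quoted near the top can be pulled out of a saved log with a small filter. This is a sketch only: `summarize_log` and `cluster.log` are hypothetical names, and the patterns are simply the ones that appear in the output above.

```shell
# Hypothetical helper: print only the lines of a saved mmseqs log that
# identify the failing step (clustering mode, ZSTD message, error lines).
summarize_log() {
    grep -E '^(Clustering mode:|Error:)|ZSTD_' "$1"
}

# Example usage (file name is an assumption):
#   mmseqs cluster mibig_db "example/example" tmp \
#           --compressed 1 --cluster-mode 2 > cluster.log 2>&1
#   summarize_log cluster.log
```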
