22 changes: 22 additions & 0 deletions README.md
@@ -211,6 +211,28 @@ awk -F"\t" 'NR==FNR {a[$1];next} ($3 in a) {print $1}' /path/to/original/gene/na
unicore gene-tree --realign --threshold 30 --name /path/to/hashed/gene/names tree
```

## Phylogenetic inference with 3Di MSA
If you want to infer the phylogenetic tree from the 3Di MSA, or from the amino acid and 3Di MSAs combined (partitioned), use the `--msa-for-tree` option of the `tree` module to specify which MSA type is used for tree inference.

When `--msa-for-tree 1` or `--msa-for-tree 2` is specified, the module builds the concatenated MSA from the 3Di MSA or from the combined (amino acid and 3Di) MSA, respectively, and runs the phylogenetic inference on that concatenated MSA.

When you use the 3Di or combined MSA for tree inference, make sure to provide the correct path to the NEXUS-format file containing the 3Di rate matrices with the `--rate-matrix-3di` option.

Example command:
```
unicore tree db/proteome_db -t iqtree --msa-for-tree 1 --rate-matrix-3di rate_matrices/matrices.nex result tree
```
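
Since the tree step depends on this file being well-formed, it can help to check it before a long run. Below is a minimal awk sketch; the function name `check_3di_nexus` is hypothetical and not part of unicore. It assumes the layout used by the files under `rate_matrices/`: each model opens with a `model NAME =` line, followed by 19 lower-triangle rows of exchangeabilities (row *i* holding *i* values) and one row of 20 equilibrium frequencies terminated by `;`. The names it reports are the model names defined in the file.

```shell
# check_3di_nexus FILE
# Hypothetical helper (not part of unicore). For every "model NAME =" block in a
# NEXUS models file, check the 20-state layout: 19 lower-triangle rows holding
# 1..19 exchangeabilities, then one row of 20 frequencies ending with ";".
# Prints "NAME: OK" or a description of each problem; exits 1 on any problem.
check_3di_nexus() {
  awk '
    # "model NAME =" opens a new block (keyword matched case-insensitively).
    tolower($1) == "model" { name = $2; row = 0; err = 0; inside = 1; next }
    # Inside a block, each non-blank line is one row of numbers.
    inside && NF > 0 {
      row++
      if (row <= 19 && NF != row) {
        printf "%s: row %d has %d values, expected %d\n", name, row, NF, row
        err = 1
      }
      if ($NF ~ /;$/) {                  # a trailing ";" closes the model
        if (row != 20 || NF != 20) {
          printf "%s: closing row is row %d with %d values, expected row 20 with 20\n", name, row, NF
          err = 1
        }
        if (err) { bad = 1 } else { printf "%s: OK\n", name }
        inside = 0
      }
    }
    END {
      if (inside) { printf "%s: no terminating \";\"\n", name; bad = 1 }
      exit (bad ? 1 : 0)
    }
  ' "$1"
}
```

For example, `check_3di_nexus rate_matrices/matrices.nex` should print a line for each of `GH_LLM_3DI`, `GH_AF_3DI` and `FOLDSEEK_3DI`. Blank lines inside a model (as in `FOLDSEEK_3DI.nexus`) are skipped, so they do not affect the row count.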

You can also choose which 3Di rate matrix to use by passing its name with the `--rate-matrix-3di-name` option. If you are using IQ-TREE, this should be the name of a rate matrix defined in the NEXUS file. If you are using RAxML, this should instead be the path to a rate matrix file in PAML format.

Example command:
```
unicore tree db/proteome_db -t iqtree --msa-for-tree 1 --rate-matrix-3di rate_matrices/matrices.nex --rate-matrix-3di-name GH_LLM_3DI result tree
```
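
For RAxML, a PAML-format file is expected rather than a model name. In the NEXUS files here, the body of each model (19 lower-triangle rows followed by 20 frequencies) appears to already follow the PAML layout, so a PAML file can be cut out of it. A minimal sketch, assuming the same `model NAME =` ... `;` layout; `nexus_model_to_paml` is a hypothetical helper, not part of unicore, and it copies the numbers unchanged, so the state order is whatever the NEXUS file uses.

```shell
# nexus_model_to_paml FILE NAME
# Hypothetical helper (not part of unicore). Prints the body of model NAME from a
# NEXUS models block in PAML layout: the 19 lower-triangle rows, a blank line,
# then the 20 frequencies, with the closing ";" removed. Numbers (and therefore
# the state order) are copied unchanged. Exits 1 if NAME is missing or unterminated.
nexus_model_to_paml() {
  awk -v name="$2" '
    # Copy only inside the "model NAME =" block whose name matches exactly.
    tolower($1) == "model" { inside = ($2 == name); next }
    inside && NF > 0 {
      row++
      if (row == 20) print ""            # conventional blank line before frequencies
      line = $0
      if (sub(/;[ \t]*$/, "", line)) {   # the trailing ";" ends the model
        print line
        found = 1
        exit
      }
      print line
    }
    END {
      if (!found) {
        print "model " name " not found or not terminated" > "/dev/stderr"
        exit 1
      }
    }
  ' "$1"
}
```

For example, `nexus_model_to_paml rate_matrices/matrices.nex GH_LLM_3DI > GH_LLM_3DI.paml` writes a file whose path can then be given to `--rate-matrix-3di-name`. It is worth confirming that your RAxML version reads the resulting file as intended.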

If you use the `GH_AF_3DI` or `GH_LLM_3DI` matrix, please cite:
> Garg, Sriram G., and Georg KA Hochberg. "A general substitution matrix for structural phylogenetics." Molecular Biology and Evolution 42.6 (2025): msaf124. [doi.org/10.1093/molbev/msaf124](https://academic.oup.com/mbe/article/42/6/msaf124/8157654)

## Phylogenetic inference with partition model
After running the `tree` module, you can edit the RAxML-style partition file `combined.fasta.partitions` to run the phylogenetic inference with a partition model.
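
The partition file can be edited by hand or scripted. As a minimal sketch, assuming each line follows the RAxML form `MODEL, NAME = RANGE`, the hypothetical helper `set_partition_model` below (not part of unicore) replaces the model of one named partition and prints the result, leaving the original file untouched. Check `combined.fasta.partitions` for the partition names unicore actually writes.

```shell
# set_partition_model FILE NAME MODEL
# Hypothetical helper (not part of unicore). Assumes RAxML-style partition lines
# of the form "MODEL, NAME = RANGE" and replaces MODEL on the line(s) whose
# partition name equals NAME. Writes the result to stdout; FILE is not modified.
# MODEL is used as an awk replacement string, so "&" and "\" in it would need escaping.
set_partition_model() {
  awk -v name="$2" -v model="$3" '
    {
      split($0, lhs, "=")                  # lhs[1] is "MODEL, NAME "
      n = split(lhs[1], parts, ",")
      pname = parts[n]
      gsub(/[ \t]/, "", pname)             # trim spaces around the name
      if (n >= 2 && pname == name) sub(/^[^,]*,/, model ",")
      print
    }
  ' "$1"
}
```

For example, `set_partition_model combined.fasta.partitions PART MODEL > edited.partitions`, where `PART` and `MODEL` are placeholders for a partition name from the file and the model string you want to assign to it.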

26 changes: 26 additions & 0 deletions rate_matrices/FOLDSEEK_3DI.nexus
@@ -0,0 +1,26 @@
#nexus
begin models;
model 3DI =
0.011992313
0.008437599 0.053509031
0.069554168 0.018636955 0.010684044
0.017044607 0.018636955 0.0215826 0.050022332
0.069554168 0.013112664 0.010684044 0.07109646 0.043632459
0.098856949 0.001590694 0.001296079 0.035194913 0.001843543 0.017864982
0.024225404 0.076052086 0.010684044 0.143620377 0.007522972 0.012569516 0.002813795
0.024225404 0.037648099 0.0215826 0.143620377 0.007522972 0.017864982 0.008078758 0.119514031
0.004176869 0.00922586 0.00184211 0.035194913 0.00045177 0.006222293 0.000485146 0.0416262 0.01108926
0.017044607 0.006491168 0.007517123 0.024762578 0.043632459 0.10361504 0.002813795 0.003552863 0.005489521 0.000645798
0.017044607 0.001119187 0.000317611 0.017422554 0.000317858 0.00884371 0.000980032 0.003552863 0.003862339 0.005323544 0.001269667
0.001454785 0.004567082 0.0009119 0.017422554 0.00022364 0.001524805 8.36473E-05 0.029287528 0.005489521 0.361750658 0.000218912 0.001785052
0.140504827 0.004567082 0.003721205 0.101049 0.005293043 0.10361504 0.016319718 0.014498218 0.015761103 0.005323544 0.021142791 0.042248118 0.00063478
0.034431431 0.0264886 0.0215826 0.143620377 0.0306991 0.0512926 0.003999231 0.020606237 0.031838654 0.00374556 0.030050131 0.003605945 0.00063478 0.013670216
0.004176869 0.053509031 0.061966346 0.024762578 0.0306991 0.012569516 0.000980032 0.014498218 0.015761103 0.001304562 0.01487573 0.000216544 0.000446621 0.001658331 0.038802615
0.008437599 0.0264886 0.005288927 0.050022332 0.001843543 0.00884371 0.000980032 0.169864622 0.031838654 0.254521976 0.002564827 0.005125109 0.04313521 0.009618146 0.019208476 0.007408411
0.0489372 0.003213326 0.001296079 0.050022332 0.002620218 0.025391399 0.003999231 0.010200715 0.007802226 0.005323544 0.003645374 0.172402491 0.001822534 0.112688512 0.019208476 0.001277335 0.003470942
0.024225404 0.00922586 0.003721205 0.050022332 0.001297087 0.00884371 0.011482293 0.020606237 0.129924444 0.001854166 0.002564827 0.001785052 0.00063478 0.006767173 0.013514777 0.002580316 0.003470942 0.003763817
0.00593656 0.018636955 0.0215826 0.035194913 0.043632459 0.036088653 0.000341341 0.005049664 0.007802226 0.000454373 0.0607036 0.000152357 5.41795E-05 0.002356976 0.0783843 0.0610702 0.000850572 0.000922343 0.00061789

0.0489587 0.026488295 0.021582352 0.101047724 0.030698759 0.05129205 0.032966606 0.041625681 0.045251559 0.030875576 0.060702918 0.029724702 0.015023598 0.027614576 0.078383431 0.061069471 0.02013084 0.031026099 0.029541311 0.215995723;

end;
27 changes: 27 additions & 0 deletions rate_matrices/GH_AF_3DI.nexus
@@ -0,0 +1,27 @@

#nexus
begin models;
model GH3DIAF =

0.206045
0.099742 5.400736
1.457346 0.436088 0.128820
0.080698 0.720726 2.116457 0.254751
6.112871 0.628550 0.785210 0.438587 3.269893
6.315523 0.126222 0.005220 0.400088 0.000100 2.247661
0.502964 6.966101 0.053012 0.772862 0.041468 0.131136 0.211629
1.053123 4.588735 4.956251 1.517048 0.099350 0.264919 0.822337 6.003063
0.015142 0.047967 0.004065 0.042708 0.000100 0.012162 0.003143 0.107788 0.039183
0.022323 0.245392 0.887613 0.023929 3.341014 7.242103 0.000100 0.010369 0.019796 0.000100
0.061828 2.101791 0.000100 0.038302 0.000100 0.018519 1.060336 0.000100 0.000100 0.008930 0.000100
0.000615 0.004749 0.000100 0.035820 0.000100 0.000100 0.000100 0.483191 0.040695 10.253329 2.766935 0.022626
6.309383 0.121769 0.078908 0.628010 0.007888 5.346546 0.786018 0.359712 0.340878 0.018811 0.046545 1.261582 0.272626
0.315525 1.031825 1.705683 2.597926 1.657803 1.217182 0.023281 0.157070 0.430942 0.019856 0.613644 0.001549 0.000100 0.276664
0.012345 5.992759 8.823413 0.024534 1.923030 0.349607 0.002102 0.012076 0.000100 0.000100 0.574042 0.000100 0.000100 0.007107 0.449860
0.112710 0.816173 0.018331 0.164966 0.004373 0.039618 0.058434 7.546948 0.254380 9.035512 0.000100 0.003265 2.120022 0.134079 0.059112 0.006286
0.393212 0.000100 0.007109 0.099923 0.002882 0.042536 0.026840 0.074493 0.037627 0.013724 0.005050 8.642995 0.004241 7.119415 0.030238 0.003855 0.028735
1.243085 0.695155 0.308668 0.850697 0.000100 0.140824 1.623621 0.870587 8.785918 0.014752 0.000100 0.000100 0.295570 7.288365 0.014823 0.000100 0.309913 1.376590
0.019982 0.530197 1.622344 0.115724 2.691451 0.767944 0.000100 0.004101 0.020893 0.000100 1.458258 0.000100 0.000100 0.017032 1.828515 1.431409 0.008965 0.000100 0.277948
0.033931 0.029677 0.022718 0.217063 0.024418 0.042391 0.009530 0.030112 0.027921 0.035267 0.054280 0.015301 0.006238 0.022522 0.106582 0.054584 0.025223 0.029034 0.008336 0.204872;

end;
26 changes: 26 additions & 0 deletions rate_matrices/GH_LLM_3DI.nexus
@@ -0,0 +1,26 @@
#nexus
begin models;
model GH3DILLM =

0.231961
0.146826 5.090886
1.388719 0.532458 0.190308
0.110858 0.716943 2.125318 0.313970
6.388705 0.813365 0.828167 0.478477 3.327515
5.951036 0.080207 0.006321 0.335158 0.002561 2.778985
0.607573 6.904352 0.085029 0.875117 0.058443 0.193906 0.214237
1.113791 4.583128 5.234948 1.489843 0.125244 0.337944 0.822784 5.979297
0.009901 0.163928 0.009731 0.095737 0.000100 0.040187 0.001333 0.527276 0.038698
0.185462 0.241867 1.179789 0.053042 3.847074 7.247152 0.000100 0.015759 0.036690 0.000100
0.132103 2.588205 0.000100 0.118692 0.000100 0.080207 1.244845 0.018311 0.000100 0.012998 0.000100
0.000100 0.042084 0.000100 0.112098 0.000100 0.000100 0.004575 0.599114 0.044284 10.092991 3.275657 0.019718
6.179672 0.150465 0.069198 0.539983 0.009550 5.264532 0.769307 0.361928 0.414869 0.026380 0.092123 1.427081 0.417866
0.279075 1.258665 1.808066 1.860684 1.165969 1.082343 0.020568 0.179115 0.504681 0.015790 0.525574 0.009107 0.000100 0.186118
0.027139 6.152354 8.958557 0.063226 2.297381 0.389469 0.000326 0.086986 0.060145 0.000327 0.894820 0.000100 0.000100 0.006553 0.392745
0.113570 1.079416 0.029239 0.233041 0.001100 0.104413 0.076617 7.329984 0.338976 9.020506 0.000100 0.011688 2.219036 0.145406 0.087298 0.005651
0.769457 0.000100 0.017725 0.183073 0.001436 0.208658 0.035521 0.091606 0.049878 0.014484 0.000100 8.451390 0.000100 6.954269 0.043893 0.001126 0.034279
1.254860 0.794688 0.454326 0.673482 0.000100 0.227978 1.482768 0.873264 8.508743 0.024688 0.001574 0.000100 0.360825 7.563736 0.020832 0.003544 0.346383 1.696874
0.002510 0.412164 1.486427 0.162290 2.587540 0.648155 0.000100 0.001325 0.039870 0.000100 1.447960 0.000100 0.000100 0.000100 1.243411 1.340558 0.002979 0.002943 0.277948
0.021206 0.020366 0.016410 0.256866 0.024776 0.029869 0.005103 0.022589 0.019469 0.030696 0.037815 0.013012 0.005616 0.015834 0.120296 0.040434 0.017760 0.025094 0.003983 0.272807;

end;
72 changes: 72 additions & 0 deletions rate_matrices/matrices.nex
@@ -0,0 +1,72 @@
#NEXUS
Begin models;

Model GH_LLM_3DI =
0.231961
0.146826 5.090886
1.388719 0.532458 0.190308
0.110858 0.716943 2.125318 0.313970
6.388705 0.813365 0.828167 0.478477 3.327515
5.951036 0.080207 0.006321 0.335158 0.002561 2.778985
0.607573 6.904352 0.085029 0.875117 0.058443 0.193906 0.214237
1.113791 4.583128 5.234948 1.489843 0.125244 0.337944 0.822784 5.979297
0.009901 0.163928 0.009731 0.095737 0.000100 0.040187 0.001333 0.527276 0.038698
0.185462 0.241867 1.179789 0.053042 3.847074 7.247152 0.000100 0.015759 0.036690 0.000100
0.132103 2.588205 0.000100 0.118692 0.000100 0.080207 1.244845 0.018311 0.000100 0.012998 0.000100
0.000100 0.042084 0.000100 0.112098 0.000100 0.000100 0.004575 0.599114 0.044284 10.092991 3.275657 0.019718
6.179672 0.150465 0.069198 0.539983 0.009550 5.264532 0.769307 0.361928 0.414869 0.026380 0.092123 1.427081 0.417866
0.279075 1.258665 1.808066 1.860684 1.165969 1.082343 0.020568 0.179115 0.504681 0.015790 0.525574 0.009107 0.000100 0.186118
0.027139 6.152354 8.958557 0.063226 2.297381 0.389469 0.000326 0.086986 0.060145 0.000327 0.894820 0.000100 0.000100 0.006553 0.392745
0.113570 1.079416 0.029239 0.233041 0.001100 0.104413 0.076617 7.329984 0.338976 9.020506 0.000100 0.011688 2.219036 0.145406 0.087298 0.005651
0.769457 0.000100 0.017725 0.183073 0.001436 0.208658 0.035521 0.091606 0.049878 0.014484 0.000100 8.451390 0.000100 6.954269 0.043893 0.001126 0.034279
1.254860 0.794688 0.454326 0.673482 0.000100 0.227978 1.482768 0.873264 8.508743 0.024688 0.001574 0.000100 0.360825 7.563736 0.020832 0.003544 0.346383 1.696874
0.002510 0.412164 1.486427 0.162290 2.587540 0.648155 0.000100 0.001325 0.039870 0.000100 1.447960 0.000100 0.000100 0.000100 1.243411 1.340558 0.002979 0.002943 0.277948
0.021206 0.020366 0.016410 0.256866 0.024776 0.029869 0.005103 0.022589 0.019469 0.030696 0.037815 0.013012 0.005616 0.015834 0.120296 0.040434 0.017760 0.025094 0.003983 0.272807;


Model GH_AF_3DI =
0.206045
0.099742 5.400736
1.457346 0.436088 0.128820
0.080698 0.720726 2.116457 0.254751
6.112871 0.628550 0.785210 0.438587 3.269893
6.315523 0.126222 0.005220 0.400088 0.000100 2.247661
0.502964 6.966101 0.053012 0.772862 0.041468 0.131136 0.211629
1.053123 4.588735 4.956251 1.517048 0.099350 0.264919 0.822337 6.003063
0.015142 0.047967 0.004065 0.042708 0.000100 0.012162 0.003143 0.107788 0.039183
0.022323 0.245392 0.887613 0.023929 3.341014 7.242103 0.000100 0.010369 0.019796 0.000100
0.061828 2.101791 0.000100 0.038302 0.000100 0.018519 1.060336 0.000100 0.000100 0.008930 0.000100
0.000615 0.004749 0.000100 0.035820 0.000100 0.000100 0.000100 0.483191 0.040695 10.253329 2.766935 0.022626
6.309383 0.121769 0.078908 0.628010 0.007888 5.346546 0.786018 0.359712 0.340878 0.018811 0.046545 1.261582 0.272626
0.315525 1.031825 1.705683 2.597926 1.657803 1.217182 0.023281 0.157070 0.430942 0.019856 0.613644 0.001549 0.000100 0.276664
0.012345 5.992759 8.823413 0.024534 1.923030 0.349607 0.002102 0.012076 0.000100 0.000100 0.574042 0.000100 0.000100 0.007107 0.449860
0.112710 0.816173 0.018331 0.164966 0.004373 0.039618 0.058434 7.546948 0.254380 9.035512 0.000100 0.003265 2.120022 0.134079 0.059112 0.006286
0.393212 0.000100 0.007109 0.099923 0.002882 0.042536 0.026840 0.074493 0.037627 0.013724 0.005050 8.642995 0.004241 7.119415 0.030238 0.003855 0.028735
1.243085 0.695155 0.308668 0.850697 0.000100 0.140824 1.623621 0.870587 8.785918 0.014752 0.000100 0.000100 0.295570 7.288365 0.014823 0.000100 0.309913 1.376590
0.019982 0.530197 1.622344 0.115724 2.691451 0.767944 0.000100 0.004101 0.020893 0.000100 1.458258 0.000100 0.000100 0.017032 1.828515 1.431409 0.008965 0.000100 0.277948
0.033931 0.029677 0.022718 0.217063 0.024418 0.042391 0.009530 0.030112 0.027921 0.035267 0.054280 0.015301 0.006238 0.022522 0.106582 0.054584 0.025223 0.029034 0.008336 0.204872;

Model FOLDSEEK_3DI =
0.011992313
0.008437599 0.053509031
0.069554168 0.018636955 0.010684044
0.017044607 0.018636955 0.0215826 0.050022332
0.069554168 0.013112664 0.010684044 0.07109646 0.043632459
0.098856949 0.001590694 0.001296079 0.035194913 0.001843543 0.017864982
0.024225404 0.076052086 0.010684044 0.143620377 0.007522972 0.012569516 0.002813795
0.024225404 0.037648099 0.0215826 0.143620377 0.007522972 0.017864982 0.008078758 0.119514031
0.004176869 0.00922586 0.00184211 0.035194913 0.00045177 0.006222293 0.000485146 0.0416262 0.01108926
0.017044607 0.006491168 0.007517123 0.024762578 0.043632459 0.10361504 0.002813795 0.003552863 0.005489521 0.000645798
0.017044607 0.001119187 0.000317611 0.017422554 0.000317858 0.00884371 0.000980032 0.003552863 0.003862339 0.005323544 0.001269667
0.001454785 0.004567082 0.0009119 0.017422554 0.00022364 0.001524805 8.36473E-05 0.029287528 0.005489521 0.361750658 0.000218912 0.001785052
0.140504827 0.004567082 0.003721205 0.101049 0.005293043 0.10361504 0.016319718 0.014498218 0.015761103 0.005323544 0.021142791 0.042248118 0.00063478
0.034431431 0.0264886 0.0215826 0.143620377 0.0306991 0.0512926 0.003999231 0.020606237 0.031838654 0.00374556 0.030050131 0.003605945 0.00063478 0.013670216
0.004176869 0.053509031 0.061966346 0.024762578 0.0306991 0.012569516 0.000980032 0.014498218 0.015761103 0.001304562 0.01487573 0.000216544 0.000446621 0.001658331 0.038802615
0.008437599 0.0264886 0.005288927 0.050022332 0.001843543 0.00884371 0.000980032 0.169864622 0.031838654 0.254521976 0.002564827 0.005125109 0.04313521 0.009618146 0.019208476 0.007408411
0.0489372 0.003213326 0.001296079 0.050022332 0.002620218 0.025391399 0.003999231 0.010200715 0.007802226 0.005323544 0.003645374 0.172402491 0.001822534 0.112688512 0.019208476 0.001277335 0.003470942
0.024225404 0.00922586 0.003721205 0.050022332 0.001297087 0.00884371 0.011482293 0.020606237 0.129924444 0.001854166 0.002564827 0.001785052 0.00063478 0.006767173 0.013514777 0.002580316 0.003470942 0.003763817
0.00593656 0.018636955 0.0215826 0.035194913 0.043632459 0.036088653 0.000341341 0.005049664 0.007802226 0.000454373 0.0607036 0.000152357 5.41795E-05 0.002356976 0.0783843 0.0610702 0.000850572 0.000922343 0.00061789

0.0489587 0.026488295 0.021582352 0.101047724 0.030698759 0.05129205 0.032966606 0.041625681 0.045251559 0.030875576 0.060702918 0.029724702 0.015023598 0.027614576 0.078383431 0.061069471 0.02013084 0.031026099 0.029541311 0.215995723;

End;