cblaster remote error with large clusters #98

@HackenDirker

Hi, I've been trying to run cblaster remote searches with full-sized BGCs (>100 kb). Every time, the search runs for only a few minutes and then raises a ValueError saying no hits were found:

[12:01:29] INFO - Starting cblaster in remote mode
[12:01:29] INFO - Launching new search
[12:01:30] INFO - Request Identifier (RID): AZTWTGEW013
[12:01:30] INFO - Request Time Of Execution (RTOE): 12s
[12:01:42] INFO - Polling NCBI for completion status
[12:01:42] INFO - Checking search status...
[12:02:42] INFO - Checking search status...
[12:03:42] INFO - Checking search status...
[12:04:42] INFO - Checking search status...
[12:05:42] INFO - Checking search status...
Traceback (most recent call last):
  File "/home/hackenbd/.local/bin/cblaster", line 8, in <module>
    sys.exit(main())
  File "/home/hackenbd/.local/lib/python3.8/site-packages/cblaster/main.py", line 432, in main
    cblaster(
  File "/home/hackenbd/.local/lib/python3.8/site-packages/cblaster/main.py", line 318, in cblaster
    rid, results = remote.search(
  File "/home/hackenbd/.local/lib/python3.8/site-packages/cblaster/remote.py", line 368, in search
    poll(rid)
  File "/home/hackenbd/.local/lib/python3.8/site-packages/cblaster/remote.py", line 244, in poll
    if check(rid):
  File "/home/hackenbd/.local/lib/python3.8/site-packages/cblaster/remote.py", line 174, in check
    raise ValueError("Search completed, but found no hits")

A few reasons this is strange:

  1. It often doesn't seem to run long enough to complete an entire search. This makes me think that something is happening with the NCBI API that's cutting the search short and causing it to throw an error.

  2. The source organism for some of the clusters that errored out is in NCBI, so we should always get hits against at least that organism, and we aren't.

  3. I've used cblaster to look for the core biosynthetic proteins in some of these larger clusters, extracted ±100 kb around the hit locations, run antiSMASH, and rerun cblaster locally with that region in the database. When I do it this way, cblaster has no trouble identifying the cluster as homologous. Again, this makes me think it's something to do with the NCBI API.
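To help narrow down whether NCBI is really reporting "finished, no hits" or something else (a failed or expired request), here is a minimal sketch of how the status text for an RID could be interpreted by hand. It assumes the QBlastInfo format that the NCBI BLAST URL API returns for `CMD=Get&FORMAT_OBJECT=SearchInfo`, with `Status=` and `ThereAreHits=` fields. The function names `parse_search_info` and `describe` are hypothetical and are not part of cblaster, and this is not necessarily how cblaster's own `check` works.

```python
import re


def parse_search_info(text):
    """Extract (status, has_hits) from a QBlastInfo status response.

    Hypothetical helper, not part of cblaster. Assumes the response contains
    lines such as 'Status=READY' and 'ThereAreHits=yes'.
    """
    status = re.search(r"^\s*Status=(\w+)", text, re.MULTILINE)
    hits = re.search(r"^\s*ThereAreHits=(\w+)", text, re.MULTILINE)
    return (
        status.group(1) if status else None,
        (hits.group(1).lower() == "yes") if hits else None,
    )


def describe(text):
    """Turn a status response into a short human-readable verdict."""
    status, has_hits = parse_search_info(text)
    if status == "WAITING":
        return "still running"
    if status == "FAILED":
        return "search failed on the server"
    if status == "UNKNOWN":
        return "RID expired or unknown"
    if status == "READY":
        # A READY response without ThereAreHits=yes is treated as no hits.
        return "finished with hits" if has_hits else "finished, no hits"
    return "unrecognised response"


# Made-up sample response for illustration, not taken from a real search.
sample = (
    "QBlastInfoBegin\n\tStatus=READY\nQBlastInfoEnd\n"
    "QBlastInfoBegin\n\tThereAreHits=no\nQBlastInfoEnd\n"
)
print(describe(sample))
```

The text to feed in could be fetched in a browser from `https://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Get&FORMAT_OBJECT=SearchInfo&RID=<rid>`, using the RID from the log while it is still valid. If that reports `FAILED` or `UNKNOWN` rather than `READY`, the "no hits" message would be misleading.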

Would you happen to have any suggestions on how to make cblaster more reliable for remote searches of large clusters?

Thanks for your time!
