cblaster remote error with large clusters #98

Hi, I've been trying to run cblaster remote searches with full-sized BGCs (>100 kb). Every time, after searching for only a few minutes, it raises a ValueError saying that no hits were found:

```
[12:01:29] INFO - Starting cblaster in remote mode
[12:01:29] INFO - Launching new search
[12:01:30] INFO - Request Identifier (RID): AZTWTGEW013
[12:01:30] INFO - Request Time Of Execution (RTOE): 12s
[12:01:42] INFO - Polling NCBI for completion status
[12:01:42] INFO - Checking search status...
[12:02:42] INFO - Checking search status...
[12:03:42] INFO - Checking search status...
[12:04:42] INFO - Checking search status...
[12:05:42] INFO - Checking search status...
Traceback (most recent call last):
  File "/home/hackenbd/.local/bin/cblaster", line 8, in <module>
    sys.exit(main())
  File "/home/hackenbd/.local/lib/python3.8/site-packages/cblaster/main.py", line 432, in main
    cblaster(
  File "/home/hackenbd/.local/lib/python3.8/site-packages/cblaster/main.py", line 318, in cblaster
    rid, results = remote.search(
  File "/home/hackenbd/.local/lib/python3.8/site-packages/cblaster/remote.py", line 368, in search
    poll(rid)
  File "/home/hackenbd/.local/lib/python3.8/site-packages/cblaster/remote.py", line 244, in poll
    if check(rid):
  File "/home/hackenbd/.local/lib/python3.8/site-packages/cblaster/remote.py", line 174, in check
    raise ValueError("Search completed, but found no hits")
```

A couple of reasons this is strange:

1. It often doesn't seem to run long enough to complete an entire search (in the log above, the error comes about four minutes after launch). This makes me think that something on the NCBI API side is cutting the search short and causing the error.

2. The source organism for some of the clusters that errored out is in NCBI, so we should always get hits against at least that organism, and we don't.

3. I've used cblaster to look for the core biosynthetic proteins in some of these larger clusters, extracted ±100 kb around the hit locations, run antiSMASH on those regions, and rerun cblaster locally with each region in the database. Done this way, cblaster has no trouble identifying the cluster as homologous. Again, this makes me think it's something to do with the NCBI API.
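To check whether NCBI itself reports the search as finished with no hits, the RID from the log can be queried directly through the public BLAST URL API, independently of cblaster. Below is a minimal sketch, assuming the response contains the `Status=` and `ThereAreHits=` fields described in the NCBI BLAST URL API documentation. The helper names `parse_search_info` and `fetch_search_info` are made up for illustration and are not part of cblaster.

```python
import re
import urllib.parse
import urllib.request

# Public NCBI BLAST URL API endpoint.
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def parse_search_info(text):
    """Return (status, has_hits) parsed from a SearchInfo response body.

    Assumes the body contains lines such as 'Status=READY' and
    'ThereAreHits=yes'. A missing status is reported as 'UNKNOWN',
    and a missing hits field is treated as no hits.
    """
    status = re.search(r"Status=(\w+)", text)
    hits = re.search(r"ThereAreHits=(\w+)", text)
    return (
        status.group(1).upper() if status else "UNKNOWN",
        bool(hits) and hits.group(1).lower() == "yes",
    )


def fetch_search_info(rid, timeout=30):
    """Download the raw SearchInfo text for one RID (hypothetical helper)."""
    query = urllib.parse.urlencode(
        {"CMD": "Get", "FORMAT_OBJECT": "SearchInfo", "RID": rid}
    )
    with urllib.request.urlopen(f"{BLAST_URL}?{query}", timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


# Example use (needs network access, and the RID must not have expired):
#   status, has_hits = parse_search_info(fetch_search_info("AZTWTGEW013"))
#   print(status, has_hits)
```

If this reports `READY` with no hits for a query that should match its own source organism, the problem is more likely on the NCBI side (or in what was submitted) than in how cblaster parses the response. A `FAILED` or `UNKNOWN` status would instead suggest the search never completed.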

Would you happen to have any suggestions on how to make cblaster more reliable for remote searches of large clusters?

Thanks for your time!
