Hi, I've been trying to run cblaster remote searches with full-sized BGCs (>100kb), and every time I do, it raises a ValueError saying no hits were found after searching for only a few minutes:
[12:01:29] INFO - Starting cblaster in remote mode
[12:01:29] INFO - Launching new search
[12:01:30] INFO - Request Identifier (RID): AZTWTGEW013
[12:01:30] INFO - Request Time Of Execution (RTOE): 12s
[12:01:42] INFO - Polling NCBI for completion status
[12:01:42] INFO - Checking search status...
[12:02:42] INFO - Checking search status...
[12:03:42] INFO - Checking search status...
[12:04:42] INFO - Checking search status...
[12:05:42] INFO - Checking search status...
Traceback (most recent call last):
  File "/home/hackenbd/.local/bin/cblaster", line 8, in <module>
    sys.exit(main())
  File "/home/hackenbd/.local/lib/python3.8/site-packages/cblaster/main.py", line 432, in main
    cblaster(
  File "/home/hackenbd/.local/lib/python3.8/site-packages/cblaster/main.py", line 318, in cblaster
    rid, results = remote.search(
  File "/home/hackenbd/.local/lib/python3.8/site-packages/cblaster/remote.py", line 368, in search
    poll(rid)
  File "/home/hackenbd/.local/lib/python3.8/site-packages/cblaster/remote.py", line 244, in poll
    if check(rid):
  File "/home/hackenbd/.local/lib/python3.8/site-packages/cblaster/remote.py", line 174, in check
    raise ValueError("Search completed, but found no hits")
A few reasons this is strange:
- It often doesn't seem to run long enough to complete an entire search. This makes me think that something is happening with the NCBI API that's cutting the search short and causing it to throw an error.
- The original organism that contains some of the clusters that have errored out is in NCBI. I.e., we should always get hits against at least that organism, and we aren't.
- I've used cblaster to look for the core biosynthetic proteins in some of these larger clusters, extracted +/-100kb from the hit locations, run antismash, and rerun cblaster locally with that region in the db. When I do it this way, cblaster has no trouble identifying the cluster as homologous. Again, this makes me think it's something to do with the NCBI API.
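To narrow down whether NCBI itself reports the search as finished with no hits, the RID from the log can be checked directly against the BLAST URL API, independently of cblaster. The sketch below is only a rough diagnostic, not cblaster's own code: it assumes the `SearchInfo` response contains `Status=...` and `ThereAreHits=...` fields as described in NCBI's URL API documentation, and the function names (`fetch_search_info`, `parse_search_info`, `classify`) are made up for illustration.

```python
import re
import urllib.parse
import urllib.request

BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def fetch_search_info(rid, timeout=30):
    """Fetch the raw SearchInfo text for a RID (needs network access)."""
    query = urllib.parse.urlencode(
        {"CMD": "Get", "FORMAT_OBJECT": "SearchInfo", "RID": rid}
    )
    with urllib.request.urlopen(f"{BLAST_URL}?{query}", timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def parse_search_info(text):
    """Return (status, has_hits) parsed from a SearchInfo response body."""
    status = re.search(r"Status=(\w+)", text)
    hits = re.search(r"ThereAreHits=(\w+)", text)
    return (
        status.group(1).upper() if status else None,
        bool(hits) and hits.group(1).lower() == "yes",
    )


def classify(text):
    """Map a SearchInfo response onto a short, human-readable outcome."""
    status, has_hits = parse_search_info(text)
    if status == "READY":
        return "hits" if has_hits else "no_hits"
    if status in ("WAITING", "FAILED", "UNKNOWN"):
        return status.lower()
    return "unrecognised"


if __name__ == "__main__":
    # RID taken from the log above; RIDs expire on NCBI's side after a while.
    print(classify(fetch_search_info("AZTWTGEW013")))
```

If this prints `no_hits` for a RID that cblaster also rejected, NCBI really is returning an empty result for the query; `failed` or `unknown` would instead point at the search being dropped or expired on the server side.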
Would you happen to have any suggestions on how to make cblaster more reliable for remote searches of large clusters?
Thanks for your time!