
Processing Large Files #50

@CryogenicPlanet


From @rithvikmahin #24

Cause of issue: Some of the sources found are PDF documents and academic papers that are very long, over 1.5 million characters. spaCy takes more than 30 seconds to run nlp(text) and create a document object from such text, which stalls the entire processing system.

Temporary solution: Added a timer that stops processing a document if it takes longer than 30 seconds and moves on to the next one.
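
A minimal sketch of a per-document timeout like the one described above, not the bot's actual code. The names `process_with_timeout` and `fake_nlp` are hypothetical, and `fake_nlp` only simulates a slow nlp(text) call. One caveat of this approach: a Python thread cannot be killed, so the sketch stops waiting for the slow document rather than halting the work itself. Actually terminating the work would need a separate process.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def process_with_timeout(process, texts, timeout_s=30.0):
    """Run process(text) on each text; skip any that exceed timeout_s seconds."""
    results, skipped = [], []
    for text in texts:
        # One single-use executor per document, so a stuck worker
        # cannot hold up the next document.
        executor = ThreadPoolExecutor(max_workers=1)
        future = executor.submit(process, text)
        try:
            results.append(future.result(timeout=timeout_s))
        except FutureTimeout:
            # Give up waiting and move on to the next document.
            skipped.append(text)
        finally:
            # wait=False: do not block on a worker that is still running.
            executor.shutdown(wait=False)
    return results, skipped


def fake_nlp(text):
    # Hypothetical stand-in for nlp(text): cost grows with text length.
    time.sleep(len(text) * 0.001)
    return len(text.split())
```

With the real pipeline, `process` would be the spaCy `nlp` object and `timeout_s` would stay at 30.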

Potential solution / TODO: Add a queue for all tweets that take longer than 30 seconds to process, and reply to the user with a "Will provide the source later" message. Once a queued tweet has been processed, send its result to the user, however much later that is.
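
One way the queue idea could look, as a rough sketch under stated assumptions rather than a finished design. `DeferredProcessor`, `handle`, `collect_finished` and the `tweet_id` key are all hypothetical names. The sketch assumes the slow work keeps running in a background thread after the timeout, so its result can be picked up later.

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

LATER_MESSAGE = "Will provide the source later"


class DeferredProcessor:
    """Answer fast tweets immediately and park slow ones for later delivery."""

    def __init__(self, process, timeout_s=30.0, max_workers=4):
        self._process = process  # e.g. the function that runs nlp(text)
        self._timeout_s = timeout_s
        self._executor = ThreadPoolExecutor(max_workers=max_workers)
        self._pending = {}  # tweet_id -> Future that is still running

    def handle(self, tweet_id, text):
        """Return the result, or LATER_MESSAGE if it takes too long."""
        future = self._executor.submit(self._process, text)
        try:
            return future.result(timeout=self._timeout_s)
        except FutureTimeout:
            # Keep the future: the worker continues in the background.
            self._pending[tweet_id] = future
            return LATER_MESSAGE

    def collect_finished(self):
        """Return {tweet_id: result} for parked tweets that are now done."""
        done = {tid: f for tid, f in self._pending.items() if f.done()}
        for tid in done:
            del self._pending[tid]
        # f.result() re-raises any exception the processing step raised.
        return {tid: f.result() for tid, f in done.items()}
```

The bot would call `collect_finished()` periodically and reply to each returned tweet. A persistent queue would be needed if pending work has to survive a restart, which this in-memory sketch does not cover.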

Labels

bad-nlp: The bot performed bad nlp
bug: Something isn't working
