Crawler Architecture

SB-GitHub Public edited this page Jan 15, 2018 · 6 revisions

The Crawler works as follows:

  1. The Extractor starts the Crawler through the API, using the Crawler's ID.
    a. The API sends a request to the Orchestra.
    b. The Orchestra invokes the Crawler itself.
  2. The Crawler scans job sites for vacancies that contain search_word and writes them to a raw_vacancy document.
  3. A cron job checks the raw_vacancy and parsed_vacancy documents for new vacancies (status: "new").
    a. If there are new records in the raw_vacancy document, the cron job sends a request to the Orchestra to run the Parser and process the vacancy.
    b. If there are new records in the parsed_vacancy document, the cron job sends a request to the Orchestra to run graph_macker and process the skills.
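The dispatch logic in step 3 can be sketched roughly as below. This is only an illustration: the collection contents, the `status` field shape, and the request payloads sent to the Orchestra are assumptions, not the project's actual API.

```python
def find_new(records):
    """Return the records whose status is "new"."""
    return [r for r in records if r.get("status") == "new"]

def cron_check(raw_vacancy, parsed_vacancy, orchestra_requests):
    """One cron tick: dispatch Orchestra requests for new records (steps 3a/3b)."""
    # 3a: new raw vacancies -> ask the Orchestra to run the Parser
    for rec in find_new(raw_vacancy):
        orchestra_requests.append({"task": "parser", "vacancy_id": rec["id"]})
    # 3b: new parsed vacancies -> ask the Orchestra to run graph_macker
    for rec in find_new(parsed_vacancy):
        orchestra_requests.append({"task": "graph_macker", "vacancy_id": rec["id"]})
    return orchestra_requests

# Example tick with stand-in, in-memory documents:
raw = [{"id": 1, "status": "new"}, {"id": 2, "status": "processed"}]
parsed = [{"id": 3, "status": "new"}]
requests = cron_check(raw, parsed, [])
print(requests)
```

In this sketch already-processed records are skipped, so a real implementation would also need to flip each record's status after the Orchestra accepts the request, otherwise the next tick would re-dispatch the same vacancies.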
