Crawler Architecture

SB-GitHub Public edited this page Jan 15, 2018 · 6 revisions

The Crawler works as follows:

  1. The Extractor starts the Crawler through the API, using the Crawler's ID.
    a. The API sends a request to the Orchestra.
    b. The Orchestra invokes the Crawler itself.
  2. The Crawler scans job sites for vacancies that contain search_word and writes them to a raw_vacancy document.
  3. A cron job checks the raw_vacancy and parsed_vacancy documents for new vacancies (status: "new").
    a. If there are new records in the raw_vacancy document, the cron job sends a request to the Orchestra to run the Parser and process the vacancy.
    b. If there are new records in the parsed_vacancy document, the cron job sends a request to the Orchestra to run graph_macker and process the skills.
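The dispatch logic in step 3 can be sketched roughly as below. This is only an illustration: the collection contents, the `status` field shape, and the request payloads sent to the Orchestra are assumptions, not the project's actual API.

```python
def find_new(records):
    """Return the records whose status is "new"."""
    return [r for r in records if r.get("status") == "new"]

def cron_check(raw_vacancy, parsed_vacancy, orchestra_requests):
    """One cron tick: dispatch Orchestra requests for new records (steps 3a/3b)."""
    # 3a: new raw vacancies -> ask the Orchestra to run the Parser
    for rec in find_new(raw_vacancy):
        orchestra_requests.append({"task": "parser", "vacancy_id": rec["id"]})
    # 3b: new parsed vacancies -> ask the Orchestra to run graph_macker
    for rec in find_new(parsed_vacancy):
        orchestra_requests.append({"task": "graph_macker", "vacancy_id": rec["id"]})
    return orchestra_requests

# Example tick with stand-in, in-memory documents:
raw = [{"id": 1, "status": "new"}, {"id": 2, "status": "processed"}]
parsed = [{"id": 3, "status": "new"}]
requests = cron_check(raw, parsed, [])
print(requests)
```

In this sketch already-processed records are skipped, so a real implementation would also need to flip each record's status after the Orchestra accepts the request, otherwise the next tick would re-dispatch the same vacancies.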
