helloworld-crawler

A web crawler that scrapes IT interview questions, ratings, and experiences from helloworld.rs, a popular Serbian IT job market platform.

Built this to help people prep for interviews by seeing what companies actually ask. It pulls company names, positions, interview questions, difficulty ratings, and more, then exports everything to CSV, JSON, or Excel.

Installation

Download

Grab the latest .exe from Releases. No Python or dependencies required.

From source

Requires Python 3.10+.

git clone https://github.com/dulait/helloworld-crawler.git
cd helloworld-crawler
pip install -e .

Run the CLI with python -m helloworld_crawler or the GUI with python entry_gui.py.

GUI

The GUI lets you configure everything visually: output folder, file formats, number of pages, proxies. Hit Start, watch the progress bar, get your files. Hit Stop at any time and whatever has been scraped so far is saved.

CLI

python -m helloworld_crawler                                           # scrape everything with defaults

python -m helloworld_crawler --pages 50 --format json                  # first 50 pages, JSON only

python -m helloworld_crawler --format xlsx --output ./data/results     # Excel output to a custom path

python -m helloworld_crawler --proxy-file proxies.txt --concurrency 20 # use proxies, 20 parallel requests
Option              Default            Description
--pages N           auto-detect        Number of pages to scrape
--output PATH       ./interview_data   Output file path (without extension)
--format FMT        all                csv, json, xlsx, or all
--concurrency N     10                 Parallel requests
--proxy-file PATH   none               Path to a proxy list file
--verbose           off                Debug logging

Proxies

When you scrape a website, every request comes from your IP address. If you send a lot of requests, the site might notice and temporarily block you. A proxy is a middleman server: your request goes to the proxy first, and the proxy forwards it to the website. From the website's perspective, the request came from the proxy's IP, not yours.

This crawler supports rotating proxies, meaning each request can go through a different proxy server. This spreads the load across multiple IPs so no single one gets flagged.
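Round-robin rotation can be sketched in a few lines. This is an illustrative example, not the crawler's actual internals; the proxy URLs and the `next_proxy` helper are made up for the demo:

```python
from itertools import cycle

# Pool of proxies to rotate through (placeholder addresses).
PROXIES = [
    "http://1.2.3.4:8080",
    "socks5://5.6.7.8:1080",
]
_rotation = cycle(PROXIES)

def next_proxy() -> dict:
    """Return a requests-style proxies dict for the next proxy in the pool."""
    url = next(_rotation)
    return {"http": url, "https": url}

# Consecutive requests get consecutive proxies, wrapping around at the end.
first = next_proxy()
second = next_proxy()
```

Each outgoing request would pass the returned dict to the HTTP client, so successive requests leave through different IPs.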

You don't need proxies for small scrapes, but for the full site (~1300 pages) they're recommended.

In the GUI, paste your proxies directly into the text box (one per line).

In the CLI, create a text file:

http://1.2.3.4:8080
socks5://5.6.7.8:1080

Then pass it with --proxy-file proxies.txt.
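Loading such a file is straightforward. A minimal sketch (the `load_proxies` name is hypothetical, not the crawler's exact loader), which also tolerates blank lines and `#` comments:

```python
def load_proxies(path: str) -> list[str]:
    """Read one proxy URL per line, skipping blank lines and # comments."""
    with open(path, encoding="utf-8") as f:
        return [
            line.strip()
            for line in f
            if line.strip() and not line.lstrip().startswith("#")
        ]
```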

Both HTTP and SOCKS5 proxies are supported. The crawler also rotates User-Agent headers on every request automatically, so each request looks like it's coming from a different browser.
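User-Agent rotation amounts to picking a fresh header per request. A rough sketch, assuming a small hand-picked list (the crawler ships its own):

```python
import random

# A few example browser User-Agent strings (illustrative, not the crawler's list).
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:125.0) Gecko/20100101 Firefox/125.0",
]

def random_headers() -> dict:
    """Return request headers with a randomly chosen User-Agent."""
    return {"User-Agent": random.choice(USER_AGENTS)}
```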

What gets scraped

Each interview entry includes:

  • Company name and position
  • Interview questions
  • Date, rating, and recommendation
  • Metadata like difficulty, format (online/in-person), duration, and outcome
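A single exported entry (JSON format) might look roughly like this; field names and values are illustrative, not the exact schema:

```json
{
  "company": "Example d.o.o.",
  "position": "Backend Developer",
  "questions": ["Explain the difference between a list and a tuple."],
  "date": "2024-03-15",
  "rating": 4,
  "recommendation": true,
  "difficulty": "medium",
  "format": "online",
  "duration": "60 min",
  "outcome": "offer"
}
```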

Building executables

If you're working on the project and want to build the .exe files locally:

pip install -e .[build]
python build.py          # builds both CLI and GUI
python build.py cli      # CLI only
python build.py gui      # GUI only

Executables end up in dist/. Pushing to main also triggers a GitHub Actions workflow that builds and publishes a new release automatically (version is bumped based on conventional commits).

Note

This crawler respects the rules set by helloworld.rs. The /iskustva path is allowed by the site's robots.txt. If the site ever asks for this to stop, the repo gets taken down, no questions asked. All data is used for educational purposes only.
