Scrapes SEC 13F filings for a configurable universe of hedge funds, parses holdings, enriches with stock-level data, and surfaces cross-fund analytics through a Streamlit dashboard.
Structure mirrors clo_scraper: multi-module Python, SQLAlchemy ORM, analytics module, Click CLI. Retargeted from CLO trustee reports to institutional equity holdings.
pip install -r requirements.txt
cp .env.example .env # then edit .env and put your name + email in SEC_USER_AGENT
python cli.py init-db
python cli.py seed
python cli.py refresh # pulls 13Fs from EDGAR for every seeded fund
python cli.py map-tickers # OpenFIGI: CUSIP -> ticker
python cli.py enrich-stocks # yfinance: sector, market cap, last price
python cli.py rankings # sanity check
streamlit run dashboard.py
hf_tracker/
├── config.py            # paths, DB URL, SEC user agent
├── db.py                # SQLAlchemy engine + session
├── models.py            # Fund, Filing, Stock, Holding
├── scrapers/
│   ├── edgar.py         # 13F pull + parse
│   └── stock_data.py    # OpenFIGI + yfinance enrichment
├── analytics/
│   └── fund_tracker.py  # FundTracker (analog to ManagerTracker)
├── data/
│   └── funds.yaml       # seed universe
├── cli.py               # Click commands
└── dashboard.py         # Streamlit app
SEC EDGAR ──13F XML──▶ scrapers/edgar.py ──▶ models.Holding (CUSIP only)
                                                   │
                     OpenFIGI ◀──────── scrapers/stock_data.py
                     yfinance ◀──────── scrapers/stock_data.py
                                                   │
                                                   ▼
                                       analytics/fund_tracker.py
                                                   │
                                                   ▼
                                         dashboard.py / cli.py
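The first hop in the flow above, 13F XML to CUSIP-keyed holdings, can be sketched with the standard library. This is a minimal illustration, not the code in scrapers/edgar.py: the function name parse_info_table_xml and the output dict keys are hypothetical, the dollar values and share counts in the sample are made up, and only a few information-table fields are read. Tags are matched by local name so the sketch does not depend on the exact XML namespace prefix a filer uses.

```python
import xml.etree.ElementTree as ET

# Made-up sample in the shape of a 13F information table (values are illustrative).
SAMPLE_XML = """<informationTable xmlns="http://www.sec.gov/edgar/document/thirteenf/informationtable">
  <infoTable>
    <nameOfIssuer>APPLE INC</nameOfIssuer>
    <titleOfClass>COM</titleOfClass>
    <cusip>037833100</cusip>
    <value>1500000</value>
    <shrsOrPrnAmt>
      <sshPrnamt>10000</sshPrnamt>
      <sshPrnamtType>SH</sshPrnamtType>
    </shrsOrPrnAmt>
  </infoTable>
  <infoTable>
    <nameOfIssuer>MICROSOFT CORP</nameOfIssuer>
    <titleOfClass>COM</titleOfClass>
    <cusip>594918104</cusip>
    <value>2500000</value>
    <shrsOrPrnAmt>
      <sshPrnamt>8000</sshPrnamt>
      <sshPrnamtType>SH</sshPrnamtType>
    </shrsOrPrnAmt>
  </infoTable>
</informationTable>
"""


def _local(tag: str) -> str:
    """Drop the '{namespace}' prefix ElementTree attaches to tag names."""
    return tag.rsplit("}", 1)[-1]


def parse_info_table_xml(xml_text: str) -> list[dict]:
    """Return one dict per <infoTable> row (hypothetical helper, not the repo's parser)."""
    root = ET.fromstring(xml_text)
    holdings = []
    for row in root.iter():
        if _local(row.tag) != "infoTable":
            continue
        # Flatten the row, including the nested <shrsOrPrnAmt> children.
        fields = {
            _local(child.tag): (child.text or "").strip()
            for child in row.iter()
            if child is not row
        }
        holdings.append(
            {
                "cusip": fields["cusip"].upper(),
                "issuer": fields.get("nameOfIssuer", ""),
                "value": int(fields["value"]),  # raw reported value, units not normalized
                "shares": int(fields.get("sshPrnamt") or 0),
            }
        )
    return holdings


if __name__ == "__main__":
    for holding in parse_info_table_xml(SAMPLE_XML):
        print(holding)
```

At this stage a holding carries only a CUSIP; ticker, sector, and price arrive later through the OpenFIGI and yfinance steps.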
- consensus_holdings(top_n): most-held stocks across all funds, with fund count + total $ value
- new_positions(fund_id) / closed_positions(fund_id): quarter-over-quarter (QoQ) delta
- concentration(fund_id): HHI + top-10 share
- overlap(fund_a, fund_b): Jaccard + weighted portfolio overlap
- rankings(): funds sorted by 13F-reported value
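The metrics listed above can be sketched on plain dicts mapping a security identifier to its reported value. This is an illustration of the formulas, not FundTracker itself: the function names and signatures below are hypothetical, HHI is computed on a 0 to 1 scale (sum of squared weights; multiply by 10,000 for the percentage-points convention), and "weighted overlap" is assumed to mean the sum of the smaller of the two portfolio weights for each shared holding. The identifiers and values are placeholders.

```python
from collections import Counter


def weights(holdings: dict[str, float]) -> dict[str, float]:
    """Convert reported values to portfolio weights that sum to 1."""
    total = sum(holdings.values())
    return {k: v / total for k, v in holdings.items()} if total else {}


def hhi(holdings: dict[str, float]) -> float:
    """Herfindahl-Hirschman index: sum of squared weights (1.0 = single position)."""
    return sum(w * w for w in weights(holdings).values())


def top_n_share(holdings: dict[str, float], n: int = 10) -> float:
    """Share of the portfolio held in its n largest positions."""
    return sum(sorted(weights(holdings).values(), reverse=True)[:n])


def jaccard(a: dict[str, float], b: dict[str, float]) -> float:
    """Shared names divided by all names held by either fund."""
    union = set(a) | set(b)
    return len(set(a) & set(b)) / len(union) if union else 0.0


def weighted_overlap(a: dict[str, float], b: dict[str, float]) -> float:
    """Sum over shared names of the smaller of the two weights."""
    wa, wb = weights(a), weights(b)
    return sum(min(wa[k], wb[k]) for k in wa.keys() & wb.keys())


def position_changes(prev: dict[str, float], curr: dict[str, float]):
    """Names opened and names fully exited between two quarters."""
    return sorted(set(curr) - set(prev)), sorted(set(prev) - set(curr))


def consensus(funds: dict[str, dict[str, float]], top_n: int = 10):
    """Most widely held names as (identifier, fund count, total value)."""
    count, total = Counter(), Counter()
    for holdings in funds.values():
        for key, value in holdings.items():
            count[key] += 1
            total[key] += value
    ranked = sorted(count, key=lambda k: (-count[k], -total[k], k))
    return [(k, count[k], total[k]) for k in ranked[:top_n]]


if __name__ == "__main__":
    fund_a = {"AAA": 50.0, "BBB": 30.0, "CCC": 20.0}  # placeholder data
    fund_b = {"BBB": 50.0, "CCC": 25.0, "DDD": 25.0}
    print("HHI:", hhi(fund_a), "top-2 share:", top_n_share(fund_a, 2))
    print("Jaccard:", jaccard(fund_a, fund_b))
    print("Weighted overlap:", weighted_overlap(fund_a, fund_b))
    print("New / closed:", position_changes(fund_a, fund_b))
    print("Consensus:", consensus({"a": fund_a, "b": fund_b}, top_n=2))
```

Jaccard ignores position size, so two funds can share most names yet have low weighted overlap if they size those names differently; reporting both gives a fuller picture.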
- SEC rate limits: ≤10 req/s, and a User-Agent identifying you is required. Scraper sleeps 150 ms between calls.
- 13F value units: the value column changed from "thousands of dollars" to "dollars". The SEC adopted the change in mid-2022 and, to our understanding, it applies to filings made on or after January 3, 2023. Current code treats raw value as dollars. If you sync filings from before the switch, multiply by 1000 or add a date-conditional in parse_information_table.
- CIK accuracy: seed CIKs in funds.yaml are best-effort; verify on EDGAR before trusting.
- yfinance: unofficial, occasionally flaky. Swap for Financial Modeling Prep or Alpha Vantage if needed; schema is the same.
- Only long equity: 13F-HR captures long US-listed equity, ADRs, and certain options. It does not capture shorts, fixed income, derivatives notional, or international books. Fund-level "AUM" derived here is therefore reported-13F AUM, not true AUM.
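Two of the caveats above lend themselves to small helpers. The following is a sketch under stated assumptions, not code from this repo: Throttle, sec_headers, and normalize_value are hypothetical names; the 0.15 s interval mirrors the 150 ms sleep mentioned above but enforces a minimum gap instead of a fixed sleep; and the January 3, 2023 cutoff is our understanding of when dollar-denominated values took effect, so it is a parameter you should verify against SEC guidance.

```python
import time
from datetime import date

# Assumption: filings made on or after this date report value in whole dollars.
DOLLAR_REPORTING_START = date(2023, 1, 3)


def normalize_value(raw_value: int, filed_on: date,
                    cutoff: date = DOLLAR_REPORTING_START) -> int:
    """Return the holding value in dollars, scaling older filings from thousands."""
    return raw_value if filed_on >= cutoff else raw_value * 1000


def sec_headers(user_agent: str) -> dict[str, str]:
    """Build request headers; EDGAR expects a User-Agent that identifies you."""
    if not user_agent.strip():
        raise ValueError("SEC_USER_AGENT must contain your name and email")
    return {"User-Agent": user_agent.strip()}


class Throttle:
    """Keep at least min_interval seconds between successive calls to wait()."""

    def __init__(self, min_interval: float = 0.15,
                 clock=time.monotonic, sleep=time.sleep) -> None:
        self._min_interval = min_interval
        self._clock = clock      # injectable so the class can be tested without sleeping
        self._sleep = sleep
        self._last = None        # time of the previous call, None before the first

    def wait(self) -> None:
        now = self._clock()
        if self._last is not None:
            remaining = self._min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


if __name__ == "__main__":
    throttle = Throttle()
    for _ in range(3):
        throttle.wait()  # call once before each HTTP request
    print(normalize_value(1500, date(2022, 8, 15)))  # older filing, scaled by 1000
    print(normalize_value(1500, date(2023, 2, 14)))  # newer filing, unchanged
```

A 0.15 s gap caps a single process at under 7 requests per second, which leaves headroom below the 10 req/s limit; several processes running at once share that limit and would need a lower per-process rate.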
- Add fund to universe: append to data/funds.yaml, run python cli.py seed, then python cli.py refresh --cik <CIK>.
- Add analytics: extend FundTracker in analytics/fund_tracker.py.
- Add stock data source: drop a new module in scrapers/ and call from the CLI.
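One way to keep a new stock data source interchangeable with the existing one is to have every source return the same record shape. The sketch below is purely illustrative: StockInfo, StockDataSource, StaticSource, and enrich are hypothetical names and do not describe how scrapers/stock_data.py is written. The fields (sector, market cap, last price) are the ones the enrich-stocks step is described as filling in, and the sample row is a placeholder.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Protocol


@dataclass(frozen=True)
class StockInfo:
    """Common record every source returns, whatever vendor sits behind it."""
    ticker: str
    sector: Optional[str]
    market_cap: Optional[float]
    last_price: Optional[float]


class StockDataSource(Protocol):
    """Anything with a matching fetch() can be plugged in."""

    def fetch(self, ticker: str) -> Optional[StockInfo]: ...


class StaticSource:
    """Toy in-memory source; a real module would call a vendor API in fetch()."""

    def __init__(self, rows: dict[str, StockInfo]) -> None:
        self._rows = rows

    def fetch(self, ticker: str) -> Optional[StockInfo]:
        return self._rows.get(ticker.upper())


def enrich(tickers: Iterable[str], source: StockDataSource) -> dict[str, StockInfo]:
    """Look up each ticker and keep the ones the source knows about."""
    found = {}
    for ticker in tickers:
        info = source.fetch(ticker)
        if info is not None:
            found[info.ticker] = info
    return found


if __name__ == "__main__":
    source = StaticSource({"XYZ": StockInfo("XYZ", "Technology", 1.0e9, 10.0)})
    print(enrich(["xyz", "NOPE"], source))
```

With this shape, the CLI command only needs to choose which source object to pass in; the code that writes to the database stays the same.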