A Python tool to archive posts from any public subreddit using Reddit's API via PRAW.
- Download Reddit posts with rich metadata
- Organised JSON output with timestamps
- Environment-based configuration
- Modular architecture
git clone https://github.com/Binz120/rarchive.git
cd rarchive
pip install -r requirements.txt
Create a Reddit App:
- Visit Reddit App Preferences
- Click "Create App" > choose "Script"
- Set redirect URI: http://localhost:8080
Copy .env.example to .env and fill in:
REDDIT_CLIENT_ID=your_client_id
REDDIT_CLIENT_SECRET=your_client_secret
REDDIT_USER_AGENT=script:rarchive:v2.0 (by /u/yourusername)
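The three variables above could be read at startup roughly as follows. This is a minimal sketch, not the project's actual config.py: the names `RedditConfig`, `load_config`, and `REQUIRED_VARS` are hypothetical, and it assumes the values are already present in the process environment (for example, exported in the shell or loaded from .env by a helper such as python-dotenv; whether rarchive uses that package is not stated here).

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Variable names taken from .env.example above.
REQUIRED_VARS = ("REDDIT_CLIENT_ID", "REDDIT_CLIENT_SECRET", "REDDIT_USER_AGENT")


@dataclass(frozen=True)
class RedditConfig:
    """Hypothetical container for the three Reddit credentials."""
    client_id: str
    client_secret: str
    user_agent: str


def load_config(env: Optional[Mapping[str, str]] = None) -> RedditConfig:
    """Read credentials from `env` (defaults to os.environ).

    Raises RuntimeError listing every variable that is missing or empty,
    so a misconfigured .env fails early with a clear message.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return RedditConfig(
        client_id=env["REDDIT_CLIENT_ID"],
        client_secret=env["REDDIT_CLIENT_SECRET"],
        user_agent=env["REDDIT_USER_AGENT"],
    )
```

The resulting fields map directly onto PRAW's constructor, e.g. `praw.Reddit(client_id=cfg.client_id, client_secret=cfg.client_secret, user_agent=cfg.user_agent)`.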
python -m src
Or run directly:
python src/__main__.py
Posts are saved to the output/ directory as JSON files:
{
"subreddit": "python",
"sort_type": "new",
"fetched_at": "2024-01-15T10:30:00",
"post_count": 100,
"posts": [...]
}
src/
├── __init__.py # Package init
├── __main__.py # Entry point
├── config.py # Configuration management
├── reddit_client.py # Reddit API wrapper
├── fetcher.py # Post fetching logic
├── formatter.py # JSON output formatting
└── models.py # Data classes
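To illustrate the JSON output format described earlier, here is a rough sketch of how the envelope could be assembled and written to output/. The function names `build_archive` and `save_archive` and the file-naming scheme are assumptions for illustration only; the real formatter.py may differ.

```python
import json
from datetime import datetime
from pathlib import Path


def build_archive(subreddit, sort_type, posts, fetched_at=None):
    """Wrap a list of post dicts in the envelope shown in the README."""
    fetched_at = fetched_at or datetime.now()
    return {
        "subreddit": subreddit,
        "sort_type": sort_type,
        # Second-level precision, e.g. "2024-01-15T10:30:00".
        "fetched_at": fetched_at.replace(microsecond=0).isoformat(),
        "post_count": len(posts),
        "posts": list(posts),
    }


def save_archive(archive, output_dir="output"):
    """Write the archive as pretty-printed JSON and return the file path.

    The file name pattern (subreddit_sort_timestamp.json) is hypothetical.
    """
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    # Colons are not allowed in file names on some platforms.
    stamp = archive["fetched_at"].replace(":", "-")
    path = out / f"{archive['subreddit']}_{archive['sort_type']}_{stamp}.json"
    path.write_text(
        json.dumps(archive, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return path
```

Deriving `post_count` from `len(posts)` rather than passing it in keeps the count and the list from drifting apart.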
- Respects Reddit's rate limits (60 req/min)
- Do not collect personal/private data
- Follow Reddit API Terms
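PRAW normally manages rate limiting itself, so the tool may not need any extra throttling. Purely to illustrate what a 60 requests/minute cap means in practice, the following sketch spaces calls at least one second apart. The `Throttle` class is hypothetical and not part of rarchive or PRAW.

```python
import time


class Throttle:
    """Enforce a minimum interval between calls (illustrative only)."""

    def __init__(self, max_per_minute=60, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / max_per_minute
        self._clock = clock    # injectable for testing
        self._sleep = sleep    # injectable for testing
        self._last = None      # time of the previous call, if any

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `throttle.wait()` before each request would keep the request rate at or below the configured cap.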