Download complete websites from the Wayback Machine for offline viewing.
Wayback-Archive is a Python tool that downloads archived websites from the Wayback Machine and reconstructs them for fully functional offline viewing. It preserves all assets (HTML, CSS, JavaScript, images, and fonts), rewrites URLs to relative paths, and cleans up Wayback Machine artifacts so the result looks like the original site.
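The toolbar cleanup can be pictured with a small sketch. It assumes that archived HTML wraps the injected Wayback toolbar between `<!-- BEGIN WAYBACK TOOLBAR INSERT -->` and `<!-- END WAYBACK TOOLBAR INSERT -->` comments. `strip_wayback_toolbar` is a hypothetical helper for illustration, not Wayback-Archive's own code, and the tool may remove other artifacts (rewrite scripts, banner styles) as well.

```python
import re

# Assumption: the archive wraps its injected toolbar in these marker comments.
# Check your own snapshots before relying on the exact wording.
TOOLBAR_RE = re.compile(
    r"<!--\s*BEGIN WAYBACK TOOLBAR INSERT\s*-->.*?<!--\s*END WAYBACK TOOLBAR INSERT\s*-->",
    re.DOTALL | re.IGNORECASE,
)


def strip_wayback_toolbar(html: str) -> str:
    """Remove every toolbar block delimited by the marker comments."""
    return TOOLBAR_RE.sub("", html)


page = (
    "<body><!-- BEGIN WAYBACK TOOLBAR INSERT -->"
    "<div id='wm-ipp'>toolbar</div>"
    "<!-- END WAYBACK TOOLBAR INSERT --><h1>Hello</h1></body>"
)
print(strip_wayback_toolbar(page))  # <body><h1>Hello</h1></body>
```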
General-purpose tools like wget --mirror or httrack can download live websites, but they do not understand Wayback Machine URL structures, cannot clean up archive artifacts, and lack the specialized asset recovery that Wayback-Archive provides.
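For reference, a snapshot URL embeds a timestamp and the original address, which is the structure a generic mirroring tool does not know how to unpack. The sketch below splits such a URL into its parts. The regex, `Snapshot`, `parse_wayback_url`, and `raw_asset_url` are illustrative names, not the tool's API. The `id_` modifier asks the archive for the original bytes without rewritten links or the injected toolbar.

```python
import re
from typing import NamedTuple, Optional

# Snapshot URLs look like
#   https://web.archive.org/web/<timestamp>[<modifier>]/<original URL>
# <timestamp> is up to 14 digits (YYYYMMDDhhmmss); the optional modifier
# (for example "id_", "im_", "js_", "cs_") selects how the asset is served.
WAYBACK_RE = re.compile(
    r"^https?://web\.archive\.org/web/(\d{1,14})([a-z]{2}_)?/(.+)$"
)


class Snapshot(NamedTuple):
    timestamp: str
    modifier: Optional[str]
    original_url: str


def parse_wayback_url(url: str) -> Optional[Snapshot]:
    """Split a snapshot URL into its parts, or return None if it is not one."""
    match = WAYBACK_RE.match(url)
    if match is None:
        return None
    timestamp, modifier, original = match.groups()
    return Snapshot(timestamp, modifier, original)


def raw_asset_url(snapshot: Snapshot) -> str:
    """Build the id_ form of the URL, which requests the unmodified content."""
    return f"https://web.archive.org/web/{snapshot.timestamp}id_/{snapshot.original_url}"
```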
Installation
Prerequisites
- Python 3.8 or higher
- pip
From Source
```bash
git clone https://github.com/GeiserX/Wayback-Archive.git
cd Wayback-Archive

# Optional: create a virtual environment
python3 -m venv venv
source venv/bin/activate      # macOS/Linux
# venv\Scripts\activate       # Windows

pip install -r config/requirements.txt
```
As a Package
```bash
cd Wayback-Archive
pip install -e .
wayback-archive  # available as a CLI command after installation
```
Configuration
All options are set through environment variables. You can also define them in a `.env` file.
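As an illustration of how these variables might be read, here is a minimal loader. The accepted spellings for true values (`1`, `true`, `yes`, `on`), the `Config` fields, and `load_config` are assumptions for this sketch; only the variable names and defaults come from the tables in this section. A `.env` file is assumed to have been loaded into the environment beforehand, for example with python-dotenv's `load_dotenv()`.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Assumption: these strings (case-insensitive) mean "true"; anything else is false.
TRUE_VALUES = {"1", "true", "yes", "on"}


def env_bool(env: Mapping[str, str], name: str, default: bool) -> bool:
    """Read a true/false flag; unset or empty variables use the default."""
    raw = env.get(name)
    if raw is None or raw.strip() == "":
        return default
    return raw.strip().lower() in TRUE_VALUES


@dataclass
class Config:
    wayback_url: str
    output_dir: str
    optimize_html: bool
    remove_trackers: bool
    remove_clickable_contacts: bool
    fallback_image: Optional[str]


def load_config(env: Mapping[str, str] = os.environ) -> Config:
    """Build a Config from environment variables, using the documented defaults."""
    url = env.get("WAYBACK_URL")
    if not url:
        raise ValueError("WAYBACK_URL is required")
    return Config(
        wayback_url=url,
        output_dir=env.get("OUTPUT_DIR", "./output"),
        optimize_html=env_bool(env, "OPTIMIZE_HTML", True),
        remove_trackers=env_bool(env, "REMOVE_TRACKERS", True),
        remove_clickable_contacts=env_bool(env, "REMOVE_CLICKABLE_CONTACTS", True),
        fallback_image=env.get("FALLBACK_IMAGE") or None,
    )
```

Passing a plain dict instead of `os.environ` keeps the loader easy to test.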
Required

| Variable | Description |
|---|---|
| `WAYBACK_URL` | The Wayback Machine URL to download |
Output

| Variable | Default | Description |
|---|---|---|
| `OUTPUT_DIR` | `./output` | Output directory for downloaded files |
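Downloaded pages need a file name on disk. A common mirroring convention, assumed here rather than taken from Wayback-Archive's source, maps directory-style and extensionless URLs to an `index.html` inside `OUTPUT_DIR` so the pages open directly in a browser. `local_path_for` is a hypothetical helper.

```python
import posixpath
from urllib.parse import urlsplit


def local_path_for(url: str, output_dir: str = "./output") -> str:
    """Map an original site URL to a file path under the output directory.

    Assumed convention: "/" and "/about/" become .../index.html, an
    extensionless path such as "/about" gets "/index.html" appended, and
    query strings and fragments are ignored.
    """
    path = urlsplit(url).path or "/"
    if path.endswith("/"):
        path += "index.html"
    elif "." not in posixpath.basename(path):
        path += "/index.html"
    return posixpath.join(output_dir, path.lstrip("/"))
```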
Optimization

| Variable | Default | Description |
|---|---|---|
| `OPTIMIZE_HTML` | `true` | Minify HTML |
| `OPTIMIZE_IMAGES` | `false` | Compress images |
| `FALLBACK_IMAGE` | None | Path to fallback image for missing images |
| `MINIFY_JS` | `false` | Minify JavaScript |
| `MINIFY_CSS` | `false` | Minify CSS |
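The `FALLBACK_IMAGE` option can be sketched as follows: when an image cannot be downloaded, a copy of the fallback file is written in its place so the page does not show a broken image. `save_image` and its `download` callable (returning the bytes, or `None` on failure) are hypothetical. The real tool may handle missing images differently, for example by rewriting the `src` attribute instead of copying the file.

```python
import shutil
from pathlib import Path
from typing import Callable, Optional


def save_image(
    target: Path,
    download: Callable[[], Optional[bytes]],
    fallback_image: Optional[str] = None,
) -> bool:
    """Write the downloaded image, or a copy of the fallback if the download failed.

    Returns True if a file was written to target.
    """
    data = download()
    if data is None:
        if fallback_image is None or not Path(fallback_image).is_file():
            return False  # nothing to write; the reference stays broken
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(fallback_image, target)
        return True
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)
    return True
```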
Content Removal

| Variable | Default | Description |
|---|---|---|
| `REMOVE_TRACKERS` | `true` | Remove analytics and trackers |
| `REMOVE_ADS` | `true` | Remove advertisements |
| `REMOVE_CLICKABLE_CONTACTS` | `true` | Remove `tel:` and `mailto:` links |
| `REMOVE_EXTERNAL_IFRAMES` | `false` | Remove external iframes |
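Tracker and contact removal come down to classifying URLs before dropping the corresponding elements. The sketch below shows only that classification step. The `TRACKER_HOSTS` blocklist is an illustrative assumption, not the list Wayback-Archive ships, and `is_tracker` and `is_clickable_contact` are hypothetical names.

```python
from urllib.parse import urlsplit

# Illustrative blocklist (an assumption, not Wayback-Archive's real list).
TRACKER_HOSTS = {
    "google-analytics.com",
    "googletagmanager.com",
    "connect.facebook.net",
    "hotjar.com",
}


def is_tracker(src: str) -> bool:
    """True if the URL's host is a blocklisted domain or one of its subdomains."""
    host = (urlsplit(src).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in TRACKER_HOSTS)


def is_clickable_contact(href: str) -> bool:
    """True for tel: and mailto: links, the targets of REMOVE_CLICKABLE_CONTACTS."""
    return href.strip().lower().startswith(("tel:", "mailto:"))
```

Matching on whole host labels (`host == d` or a `"." + d` suffix) avoids false positives such as `notgoogle-analytics.com`.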
Link Handling

| Variable | Default | Description |
|---|---|---|
| `REMOVE_EXTERNAL_LINKS_KEEP_ANCHORS` | `true` | Remove external links, keep anchor text |
| `REMOVE_EXTERNAL_LINKS_REMOVE_ANCHORS` | `false` | Remove external links and anchor elements |
| `MAKE_INTERNAL_LINKS_RELATIVE` | `true` | Convert internal links to relative paths |
| `ORIGINAL_URL_FALLBACK_ENABLED` | `true` | Fall back to the original URL if a file is not found on the Wayback Machine |
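Making internal links relative means computing a path from the page that contains the link to the file it points at, both inside the output directory. A minimal sketch using the standard library's `posixpath.relpath`; `relative_link` is a hypothetical helper and assumes both paths are already relative to `OUTPUT_DIR`.

```python
import posixpath


def relative_link(from_file: str, to_file: str) -> str:
    """Relative href from one saved page to another file in the same output tree.

    Both arguments are paths relative to the output directory, using "/".
    """
    start_dir = posixpath.dirname(from_file) or "."
    return posixpath.relpath(to_file, start_dir)
```

For example, a stylesheet saved as `css/style.css` is referenced as `../css/style.css` from `about/index.html`.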
- Icon groups (social media, contacts) are preserved automatically.
- Button links with `sppb-btn` or `btn` classes are preserved.
- Set `REMOVE_CLICKABLE_CONTACTS=false` to keep `tel:` and `mailto:` links.
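The link-handling options and the button-preservation rule can be combined into one decision per `<a>` element. This sketch rests on several assumptions: a link counts as external when its host differs from `site_host` (so `www.example.com` and `example.com` are treated as different sites here), `REMOVE_EXTERNAL_LINKS_REMOVE_ANCHORS` takes precedence over `REMOVE_EXTERNAL_LINKS_KEEP_ANCHORS`, and icon-group detection is not modeled. `Action` and `external_link_action` are hypothetical names.

```python
from enum import Enum
from typing import Iterable
from urllib.parse import urlsplit

PRESERVED_CLASSES = {"btn", "sppb-btn"}


class Action(Enum):
    KEEP = "keep"
    UNWRAP = "unwrap"  # drop the <a> tag, keep its text
    DELETE = "delete"  # drop the <a> tag and its contents


def external_link_action(
    href: str,
    classes: Iterable[str],
    site_host: str,
    keep_anchors: bool = True,    # REMOVE_EXTERNAL_LINKS_KEEP_ANCHORS
    remove_anchors: bool = False,  # REMOVE_EXTERNAL_LINKS_REMOVE_ANCHORS
) -> Action:
    """Decide what to do with one link (assumed precedence, see lead-in)."""
    host = (urlsplit(href).hostname or "").lower()
    if not host or host == site_host.lower():
        return Action.KEEP  # relative or same-site link: internal
    if PRESERVED_CLASSES.intersection(classes):
        return Action.KEEP  # button-style links survive
    if remove_anchors:
        return Action.DELETE
    if keep_anchors:
        return Action.UNWRAP
    return Action.KEEP
```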
jQuery or Libraries Not Loading
The tool includes automatic CDN fallback for critical libraries. If a file fails to download from the Wayback Machine, it will attempt to fetch it from a CDN.
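The fallback order can be sketched as a list of candidate URLs tried in turn: the raw Wayback snapshot first, then the original URL when `ORIGINAL_URL_FALLBACK_ENABLED` is on, then a CDN copy for known libraries. The `CDN_FALLBACKS` mapping, the jQuery version it pins, and the function names are assumptions for illustration. A CDN copy may also be a different version from the one the archived site used.

```python
from typing import Callable, Dict, Iterable, List, Optional

# Hypothetical mapping from a library filename to a public CDN copy.
CDN_FALLBACKS: Dict[str, str] = {
    "jquery.min.js": "https://code.jquery.com/jquery-3.7.1.min.js",
}


def candidate_urls(timestamp: str, original_url: str, use_original: bool = True) -> List[str]:
    """Ordered download candidates: raw snapshot, original URL, then CDN copy."""
    urls = [f"https://web.archive.org/web/{timestamp}id_/{original_url}"]
    if use_original:
        urls.append(original_url)
    # Filename without any query string, e.g. "jquery.min.js?ver=3" -> "jquery.min.js".
    filename = original_url.rsplit("/", 1)[-1].split("?", 1)[0]
    cdn = CDN_FALLBACKS.get(filename)
    if cdn:
        urls.append(cdn)
    return urls


def fetch_first(urls: Iterable[str], fetch: Callable[[str], Optional[bytes]]) -> Optional[bytes]:
    """Return the first successful download, or None if every candidate fails."""
    for url in urls:
        data = fetch(url)
        if data is not None:
            return data
    return None
```

In practice `fetch` could wrap `urllib.request.urlopen` and return `None` on `urllib.error.URLError`; keeping it injectable makes the ordering easy to test offline.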