A powerful web-based tool for converting DOCX files to HTML/JSON with real-time editing capabilities and fidelity comparison.
- All 3 Panels: Side-by-side comparison of Original DOCX | HTML | JSON
- Paired Views: Compare any two formats (Original vs HTML, Original vs JSON, HTML vs JSON)
- Single Views: Focus on individual formats
- 8 Different View Modes for comprehensive analysis
- Real-time HTML Preview: Edit JSON and see HTML changes instantly
- Live Mode: Auto-update preview as you type (500ms debounce)
- Manual Mode: Update preview on-demand
- JSON Formatting: Beautify/format JSON with one click
- Validation: Real-time JSON syntax validation
- Status Feedback: Visual indicators for success/errors
- Original DOCX Preview: View the document as an image (requires Microsoft Word and Poppler)
- HTML Rendering: See how document renders in browser
- JSON Structure: Explore simplified document structure
- Drag & Drop Upload: Easy file upload
- Responsive Design: Layout adapts to different screen sizes
- Beautiful UI: Modern gradient design with smooth animations
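Live Mode waits for a 500ms pause in typing before refreshing the preview, so a burst of keystrokes produces a single update. The editor does this in JavaScript; the sketch below illustrates the same debounce technique in Python with `threading.Timer`. The `Debouncer` class is hypothetical and is not part of app.py or index.html.

```python
import threading
from typing import Callable, Optional


class Debouncer:
    """Call `func` only once `delay` seconds pass without another call.

    Hypothetical helper that mirrors the editor's 500ms debounce in Python.
    """

    def __init__(self, func: Callable[[], None], delay: float = 0.5) -> None:
        self.func = func
        self.delay = delay
        self._timer: Optional[threading.Timer] = None
        self._lock = threading.Lock()

    def __call__(self) -> None:
        with self._lock:
            # Cancel any pending call, like clearTimeout() in JavaScript.
            if self._timer is not None:
                self._timer.cancel()
            # Start a fresh countdown, like setTimeout() in JavaScript.
            self._timer = threading.Timer(self.delay, self.func)
            self._timer.daemon = True
            self._timer.start()


if __name__ == "__main__":
    import time

    d = Debouncer(lambda: print("update preview"), delay=0.5)
    for _ in range(5):  # five rapid "keystrokes", 0.1s apart
        d()
        time.sleep(0.1)
    time.sleep(1)  # "update preview" is printed once
```

Each call cancels the pending timer and starts a new one, so `func` runs once, `delay` seconds after the last call.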
- Python 3.7 or higher
- pip package manager
- Microsoft Word (optional, for DOCX to image conversion via docx2pdf on Windows or macOS)
- Poppler (optional, for PDF to image conversion)
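A quick way to see which prerequisites are present is a small script like the one below. It is a sketch, not part of the repository: it checks the Python version, whether pip is importable, and whether Poppler's `pdftoppm` tool (which pdf2image invokes) is on PATH. If you follow the Windows instructions, Poppler is extracted into the project folder rather than onto PATH, so this check may report it missing even when the app can find it. Detecting Microsoft Word is platform-specific and is left out.

```python
import importlib.util
import shutil
import sys
from typing import Dict


def check_prerequisites() -> Dict[str, bool]:
    """Return a name -> available mapping for the README's prerequisites."""
    return {
        # The README requires Python 3.7 or higher.
        "python>=3.7": sys.version_info >= (3, 7),
        # pip is usually installed as a module alongside the interpreter.
        "pip": importlib.util.find_spec("pip") is not None,
        # pdf2image calls Poppler tools such as pdftoppm; look for it on PATH.
        "poppler (pdftoppm on PATH)": shutil.which("pdftoppm") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'found' if ok else 'missing'}")
```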
Clone the repository
git clone https://github.com/Tailor-AUS/SimplifyDocx.git
cd SimplifyDocx
Install Python dependencies
pip install -e .
pip install flask flask-cors mammoth docx2pdf pdf2image
Install Poppler (optional, for image preview)
Windows:
curl -L -o poppler.zip https://github.com/oschwartz10612/poppler-windows/releases/download/v24.08.0-0/Release-24.08.0-0.zip
powershell -Command "Expand-Archive -Path poppler.zip -DestinationPath . -Force"
macOS:
brew install poppler
Linux:
sudo apt-get install poppler-utils
Run the application
python app.py
Open your browser and navigate to http://127.0.0.1:5000
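To confirm the server is listening before opening the browser, or to see whether something already occupies port 5000, a plain TCP connection attempt is enough. A minimal sketch using only the standard library; `server_is_up` is a hypothetical helper name.

```python
import socket


def server_is_up(host: str = "127.0.0.1", port: int = 5000, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        # A successful connect means a process is listening on that port.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


if __name__ == "__main__":
    print(server_is_up())
```

Run it after `python app.py` starts; if it already prints `True` before you start the app, another process holds port 5000.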
Upload Document
- Drag & drop a DOCX file or click to browse
- Supports files up to 16MB
View Comparisons
- Switch between view modes using the tabs
- Compare the original formatting with the conversions
Live Editing
- Go to the "✨ Live JSON Editor" tab
- Edit the JSON structure in the left pane
- Click "Enable Live Preview" for automatic updates
- Or click "Update Preview" to update manually
- Watch the HTML preview on the right update in real time
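Before uploading, you can check locally that a file is likely to be accepted. A DOCX file is a ZIP package containing `word/document.xml`, so the sketch below checks the extension, the size against 16 * 1024 * 1024 bytes (matching `MAX_CONTENT_LENGTH`), and that entry. The function name is hypothetical and the server's own checks in app.py may differ; note also that Flask applies `MAX_CONTENT_LENGTH` to the whole request body, so a file just under the limit can still be rejected.

```python
import os
import zipfile

MAX_UPLOAD_BYTES = 16 * 1024 * 1024  # mirrors app.config['MAX_CONTENT_LENGTH']


def check_docx(path: str, max_bytes: int = MAX_UPLOAD_BYTES) -> str:
    """Return "ok", or a short reason the upload is likely to fail."""
    if not path.lower().endswith(".docx"):
        return "not a .docx file"
    size = os.path.getsize(path)
    if size > max_bytes:
        return f"file is {size} bytes; limit is {max_bytes}"
    # DOCX (Office Open XML) files are ZIP archives.
    if not zipfile.is_zipfile(path):
        return "not a ZIP package (DOCX files are ZIP-based)"
    with zipfile.ZipFile(path) as zf:
        # The main document part lives at word/document.xml.
        if "word/document.xml" not in zf.namelist():
            return "ZIP package has no word/document.xml"
    return "ok"
```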
- All 3 Panels: Original | HTML | JSON side-by-side
- Live JSON Editor: Edit JSON with live HTML preview
- Original vs HTML: Compare original document with HTML rendering
- Original vs JSON: Compare original with simplified structure
- HTML vs JSON: Compare HTML output with JSON structure
- Original Only: Full view of original DOCX
- HTML Only: Full view of HTML conversion
- JSON Only: Full view of JSON structure
app.py - Main application server
- File upload handling with security
- DOCX to HTML conversion (Mammoth)
- DOCX to JSON simplification (Simplify-Docx library)
- DOCX to image conversion (docx2pdf + pdf2image + Poppler)
- JSON to HTML live conversion endpoint
- CORS enabled for development
templates/index.html - Single-page application
- Responsive grid layout system
- 8 different view modes
- Real-time JSON editor with syntax validation
- Live preview with debounced updates
- Drag & drop file upload
- Status notifications and error handling
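app.py exposes an endpoint that turns edited JSON back into HTML for the live preview. Its exact mapping is not described here, so the following is only a sketch of the general idea: a recursive walk over nodes shaped like `{"TYPE": ..., "VALUE": ...}` (the shape Simplify-Docx emits), with a hypothetical tag table and HTML escaping of all text.

```python
import html
from typing import Any

# Hypothetical TYPE -> tag mapping; unknown node types fall back to <div>.
TAG_FOR_TYPE = {
    "document": "div",
    "body": "div",
    "paragraph": "p",
}


def render(node: Any) -> str:
    """Recursively convert a simplified-DOCX JSON node into an HTML string."""
    if isinstance(node, list):
        return "".join(render(child) for child in node)
    if not isinstance(node, dict):
        # Bare values: escape so edited JSON cannot inject markup.
        return html.escape(str(node))
    node_type = node.get("TYPE", "")
    value = node.get("VALUE", "")
    if node_type == "text":
        return html.escape(str(value))
    tag = TAG_FOR_TYPE.get(node_type, "div")
    return f"<{tag}>{render(value)}</{tag}>"
```

Escaping matters here because the input is whatever the user typed into the editor, not only what the converter produced.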
- Flask: Web framework
- Mammoth: DOCX to HTML conversion
- python-docx: DOCX parsing
- Simplify-Docx: Document structure simplification
- docx2pdf: DOCX to PDF conversion
- pdf2image: PDF to image conversion
- Poppler: PDF rendering engine
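The image preview chains these tools: docx2pdf turns the DOCX into a PDF by driving Microsoft Word, and pdf2image renders each PDF page through Poppler. A minimal sketch of that pipeline follows, assuming both packages are installed; the function name `docx_to_pngs`, the output file names, and the 100 dpi default are illustrative choices, not necessarily what app.py uses.

```python
import os
import tempfile
from typing import List, Optional


def docx_to_pngs(docx_path: str, out_dir: str,
                 poppler_path: Optional[str] = None, dpi: int = 100) -> List[str]:
    """Render each page of a DOCX file to a PNG and return the PNG paths.

    Requires Microsoft Word (for docx2pdf) and Poppler (for pdf2image).
    """
    # Imported lazily so this module loads even where the optional tools are missing.
    from docx2pdf import convert
    from pdf2image import convert_from_path

    os.makedirs(out_dir, exist_ok=True)
    with tempfile.TemporaryDirectory() as tmp:
        pdf_path = os.path.join(tmp, "document.pdf")
        # Step 1: DOCX -> PDF (docx2pdf automates Word on Windows/macOS).
        convert(docx_path, pdf_path)
        # Step 2: PDF -> in-memory PIL images, one per page, via Poppler.
        pages = convert_from_path(pdf_path, dpi=dpi, poppler_path=poppler_path)
    paths = []
    for i, page in enumerate(pages, start=1):
        png_path = os.path.join(out_dir, f"page-{i}.png")
        page.save(png_path, "PNG")
        paths.append(png_path)
    return paths
```

On Windows, if Poppler is not on PATH, pass the folder containing the Poppler executables as `poppler_path`.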
SimplifyDocx/
├── app.py              # Flask application
├── templates/
│   └── index.html      # Web interface
├── src/                # Simplify-Docx library
│   └── simplify_docx/
├── uploads/            # Temporary file storage (gitignored)
├── poppler-24.08.0/    # Poppler binaries (gitignored)
├── README.md           # This file
├── WEB_APP_README.md   # Additional documentation
├── setup.py            # Python package setup
└── .gitignore          # Git ignore rules
Upload size limit, in app.py:
app.config['MAX_CONTENT_LENGTH'] = 16 * 1024 * 1024  # 16MB
Live preview debounce delay, in templates/index.html:
updateTimeout = setTimeout(() => {
    updatePreview();
}, 500); // 500ms delay
Server options, in app.py:
app.run(
    debug=True,          # Set to False in production
    host='127.0.0.1',    # Change to '0.0.0.0' for network access
    port=5000,           # Change port if needed
    use_reloader=False,  # Prevent double loading
    threaded=True        # Enable threading
)
Image preview not available:
- Cause: Poppler not installed or not in PATH
- Solution: Install Poppler following the instructions above
- Note: Image preview is optional; HTML and JSON views work without it
DOCX to PDF conversion fails:
- Cause: docx2pdf requires Microsoft Word (Windows or macOS)
- Solution: Install Microsoft Word, or continue without image preview
- Alternative: All other features work without Word
Port 5000 already in use:
# Find the process on port 5000 (Windows)
netstat -ano | findstr :5000
# Find the process on port 5000 (Mac/Linux)
lsof -i :5000
# Kill the process (Windows)
taskkill /F /PID <process_id>
# Kill the process (Mac/Linux)
kill -9 <process_id>
JSON editor errors:
- Ensure the JSON is valid
- Use the "Format JSON" button to clean up indentation
- Check the browser console for detailed error messages
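The editor's validation and "Format JSON" button run in the browser, and their messages may be worded differently. As a rough Python equivalent, useful for checking a JSON file outside the app, the sketch below reports the line and column of a syntax error or returns the pretty-printed JSON. The function name is hypothetical.

```python
import json
from typing import Tuple


def validate_and_format(text: str, indent: int = 2) -> Tuple[bool, str]:
    """Return (True, pretty_json) or (False, error message with position)."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        # JSONDecodeError exposes 1-based line and column numbers.
        return False, f"{exc.msg} at line {exc.lineno}, column {exc.colno}"
    # Re-serialize with indentation; keep non-ASCII text readable.
    return True, json.dumps(data, indent=indent, ensure_ascii=False)
```

A trailing comma, a common hand-editing mistake, is reported with its position rather than silently accepted.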
Development:
python app.py
Production (Gunicorn):
pip install gunicorn
gunicorn -w 4 -b 0.0.0.0:5000 app:app
# Dockerfile example
FROM python:3.9
WORKDIR /app
COPY . .
RUN pip install -e .
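RUN pip install flask flask-cors mammoth docx2pdf pdf2image gunicorn
# Note: docx2pdf needs Microsoft Word, which this Linux image lacks, so image preview will not work in the container
EXPOSE 5000
CMD ["gunicorn", "-w", "4", "-b", "0.0.0.0:5000", "app:app"]
This is a private repository. For internal contributions: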
- Create a feature branch
- Make your changes
- Test thoroughly
- Submit a pull request
This project is based on microsoft/Simplify-Docx and includes significant enhancements.
Original Simplify-Docx library: MIT License
Web application enhancements: MIT License
- Original Simplify-Docx library by Microsoft Research
- Mammoth.js for DOCX to HTML conversion
- Flask framework and community
- All open-source contributors
For questions or issues, please contact your development team lead.
- ✨ Live JSON Editor with real-time HTML preview
- 8 different comparison view modes
- 🎨 Modern responsive UI
- 🖼️ Image preview support
- Drag & drop file upload
- Real-time validation and feedback
Built with ❤️ by the Tailor-AUS Team