fix(mdict): resolve database compatibility and non-ASCII filename issues #2
Open
sdy623 wants to merge 3 commits into VimWei:main from
BREAKING CHANGES:
- Force rebuild of incompatible .mdx.db files created by other tools
- Validate table structure before use (9 columns required)
Changes:
- Add database table structure validation on initialization
- Implement automatic index rebuild for incompatible databases
- Fix SQL injection vulnerabilities in lookup_indexes() and get_keys()
- Add comprehensive error handling for sqlite3 operations
- Validate query result tuple length before accessing indices
- Replace string formatting with parameterized queries for security
Database Schema Validation:
- Check MDX_INDEX table exists before use
- Verify 9-column structure: key_text, file_path, file_pos, compressed_size, decompressed_size, record_block_type, record_start, record_end, offset
- Compare actual columns against expected schema
- Auto-rebuild if structure mismatch detected
Security Improvements:
- Replace unsafe SQL string formatting with parameterized queries
- Prevent SQL injection in keyword lookups
- Sanitize wildcard queries (* → %)
Bug Fixes:
- Fix IndexError when accessing result[8] with incompatible databases
- Handle databases created by mdict-utils, GoldenDict, or older versions
- Gracefully handle corrupt or incomplete database entries
- Support MDX files with non-ASCII characters in filename
Error Handling:
- Add try-catch blocks for sqlite3.Error
- Log detailed warnings for incompatible table structures
- Skip incomplete index entries instead of crashing
- Provide informative error messages for debugging
Compatibility:
- Works with databases from multiple MDX tools
- Maintains backward compatibility with existing code
- Automatically upgrades old database formats
Fixes: #<issue-number>
Related: Non-ASCII filename support, database version conflicts
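The schema check and the parameterized lookup described in this commit message could look roughly like the sketch below. It is not the code from this PR: the function names `index_is_compatible` and `lookup` are hypothetical, and comparing the column list in exact order is an assumption (the message only says the actual columns are compared against the expected schema). The table name and the nine column names are taken from the message.

```python
import sqlite3

# Column names as listed in the commit message.
EXPECTED_COLUMNS = [
    "key_text", "file_path", "file_pos", "compressed_size",
    "decompressed_size", "record_block_type", "record_start",
    "record_end", "offset",
]


def index_is_compatible(conn: sqlite3.Connection) -> bool:
    """Return True if MDX_INDEX exists and has the expected 9 columns.

    Hypothetical helper; a caller would rebuild the index when this is False.
    """
    try:
        row = conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table' AND name=?",
            ("MDX_INDEX",),
        ).fetchone()
        if row is None:
            return False
        # PRAGMA table_info yields one row per column; field 1 is the name.
        columns = [info[1] for info in conn.execute("PRAGMA table_info(MDX_INDEX)")]
    except sqlite3.Error:
        return False
    # Assumption: order matters, because rows are later read by position.
    return columns == EXPECTED_COLUMNS


def lookup(conn: sqlite3.Connection, keyword: str) -> list:
    """Look up a keyword with a bound parameter instead of string formatting."""
    if "*" in keyword:
        # The user-facing wildcard '*' becomes the SQL LIKE wildcard '%'.
        sql = "SELECT * FROM MDX_INDEX WHERE key_text LIKE ?"
        param = keyword.replace("*", "%")
    else:
        sql = "SELECT * FROM MDX_INDEX WHERE key_text = ?"
        param = keyword
    try:
        rows = conn.execute(sql, (param,)).fetchall()
    except sqlite3.Error:
        return []
    # Skip incomplete rows rather than failing later on row[8].
    return [row for row in rows if len(row) >= len(EXPECTED_COLUMNS)]
```

Because the keyword is passed as a bound parameter, quote characters in user input are treated as data and cannot alter the statement.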
Add comprehensive headless API for MDX dictionary queries without GUI dependency,
enabling programmatic access to all dictionary features with clean Python interface.
Features:
- Expose core APIs: Dictionary, WordParser, mdx2html, mdx2pdf, mdx2img
- Support both simple queries and batch conversions
- Auto-fallback strategies (case-insensitive, hyphen removal, link following)
- Optional dependency groups: [gui], [conversion], [all]
- Command-line argument support for all example scripts
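As an illustration of the auto-fallback strategies listed above, here is a minimal sketch. It assumes the fallbacks are tried in the order exact, lowercased, hyphen-free, and that links use the `@@@LINK=` prefix common in MDX dictionaries. The names `fallback_candidates` and `lookup_with_fallback` are hypothetical, and `lookup` stands in for whatever function returns an entry's content or `None`.

```python
from typing import Callable, Iterator, Optional

LINK_PREFIX = "@@@LINK="  # redirect marker used by many MDX dictionaries


def fallback_candidates(word: str) -> Iterator[str]:
    """Yield the original word, then progressively looser spellings."""
    seen = set()
    variants = (
        word,
        word.lower(),
        word.replace("-", ""),
        word.lower().replace("-", ""),
    )
    for candidate in variants:
        if candidate and candidate not in seen:
            seen.add(candidate)
            yield candidate


def lookup_with_fallback(
    word: str,
    lookup: Callable[[str], Optional[str]],
    max_hops: int = 5,
) -> Optional[str]:
    """Return the first entry found, following link entries up to max_hops."""
    for candidate in fallback_candidates(word):
        result = lookup(candidate)
        hops = 0
        # Follow redirects; the hop limit guards against link cycles.
        while result is not None and result.startswith(LINK_PREFIX) and hops < max_hops:
            result = lookup(result[len(LINK_PREFIX):].strip())
            hops += 1
        if result is not None:
            return result
    return None
```

With a plain dict as the backing store, `lookup_with_fallback("Colour", entries.get)` would first miss on `Colour`, then find `colour` and follow its link.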
Core APIs:
- Dictionary: Main interface for MDX/MDD queries with automatic resource management
- WordParser: Parse input files with lesson markers and comments
- mdx2html: Convert word lists to HTML with CSS embedding and image support
- mdx2pdf: Generate PDFs with wkhtmltopdf integration
- mdx2img: Export to images (PNG/JPEG/WEBP) with optimization
Query Word Tools:
- query_word.py: Fast single-word query with complete HTML output
* Extract and embed dictionary internal CSS styles
* Auto-embed images as base64 from .mdd files
* Two-layer CSS system (dictionary + custom beautification)
* Standalone HTML files ready for offline use
- single_word_query.py: Advanced queries with multiple output modes
Batch Conversion:
- batch_conversion.py: Batch convert word lists to HTML/PDF/images
* Timestamped output files: YYYYMMDD-HHMMSS_{input_stem}.{ext}
* Prevent file overwrites with automatic naming
* Support custom PDF/image options
* Track invalid words for review
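The timestamped naming scheme above can be sketched as follows. The pattern `YYYYMMDD-HHMMSS_{input_stem}.{ext}` comes from the commit message; the function name `timestamped_output` and the numeric suffix used when a file already exists are assumptions made for illustration.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def timestamped_output(
    input_file: str,
    output_dir: str,
    ext: str,
    now: Optional[datetime] = None,
) -> Path:
    """Build an output path of the form YYYYMMDD-HHMMSS_{input_stem}.{ext}."""
    now = now or datetime.now()
    base = f"{now.strftime('%Y%m%d-%H%M%S')}_{Path(input_file).stem}"
    candidate = Path(output_dir) / f"{base}.{ext}"
    counter = 1
    # Assumption: on a same-second collision, append _1, _2, ... to the name.
    while candidate.exists():
        candidate = Path(output_dir) / f"{base}_{counter}.{ext}"
        counter += 1
    return candidate
```

Passing `now` explicitly keeps the function deterministic, which makes the naming easy to check in a test.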
Example Scripts with CLI Arguments:
- basic_query.py: --mdx <path>
- batch_conversion.py: --mdx-file, --input-file, --output-dir
- custom_styles.py: --mdx <path>
- progress_callback.py: --mdx <path>
- query_word.py: <word> --mdx --output [--no-images]
- single_word_query.py: <word> --mdx [--mode simple|complete|custom-css]
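For reference, the `query_word.py` interface listed above maps naturally onto `argparse`. The sketch below is an assumption about how such a parser might be declared, not the script's actual code; in particular, whether `--mdx` is required and what `--output` defaults to are guesses.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser for: query_word.py <word> --mdx --output [--no-images]."""
    parser = argparse.ArgumentParser(
        prog="query_word.py",
        description="Query a single word and write a standalone HTML file.",
    )
    parser.add_argument("word", help="word to look up")
    # Assumption: the dictionary path is mandatory.
    parser.add_argument("--mdx", required=True, help="path to the .mdx file")
    # Assumption: no default output path is defined here.
    parser.add_argument("--output", help="path of the HTML file to write")
    parser.add_argument(
        "--no-images",
        action="store_true",
        help="skip embedding images from the .mdd file",
    )
    return parser
```

Calling `build_parser().parse_args()` in the script's entry point would then expose `args.word`, `args.mdx`, `args.output`, and `args.no_images`.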
Dependency Management:
- Core: No GUI dependencies required
- Optional [gui]: PySide6, markdown for GUI features
- Optional [conversion]: wkhtmltopdf for PDF/image generation
- Install flexibility: pip install .[gui] or .[conversion] or .[all]
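With optional dependency groups, headless code usually needs to detect at runtime which extras are present. The following sketch shows one way to do that using only the standard library. The function name `available_features` is hypothetical, and it assumes the conversion features rely on a `wkhtmltopdf` executable being on `PATH`.

```python
import importlib.util
import shutil
from typing import Dict


def available_features() -> Dict[str, bool]:
    """Report which optional feature groups appear usable in this environment."""
    return {
        # find_spec returns None when a top-level package is not installed.
        "gui": importlib.util.find_spec("PySide6") is not None,
        # Assumption: PDF/image conversion shells out to the wkhtmltopdf binary.
        "conversion": shutil.which("wkhtmltopdf") is not None,
    }
```

A caller could check `available_features()["conversion"]` before offering PDF export and print an install hint otherwise.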
CSS Integration:
- Automatic extraction of dictionary internal CSS via merge_css()
- Image embedding with embed_images() for standalone files
- Custom beautification styles layered on top
- Responsive design with mobile support
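To make the image-embedding idea concrete, here is a minimal sketch of turning `<img src="...">` references into base64 data URIs. It does not reproduce the `embed_images()` function named above, whose signature is not shown in this PR. The name `inline_images` and the `load` callback (which would fetch bytes from the .mdd file) are hypothetical.

```python
import base64
import mimetypes
import re
from typing import Callable, Optional

# Captures everything up to src=, the quote character, and the src value.
IMG_SRC = re.compile(r'(<img\b[^>]*?\bsrc=)(["\'])(.*?)\2', re.IGNORECASE)


def inline_images(html: str, load: Callable[[str], Optional[bytes]]) -> str:
    """Replace local image sources with data URIs so the HTML is standalone."""

    def replace(match: "re.Match[str]") -> str:
        prefix, quote, src = match.groups()
        # Leave already-embedded and remote images untouched.
        if src.startswith(("data:", "http://", "https://")):
            return match.group(0)
        data = load(src)
        if data is None:
            return match.group(0)  # resource not found; keep the original tag
        mime = mimetypes.guess_type(src)[0] or "application/octet-stream"
        encoded = base64.b64encode(data).decode("ascii")
        return f"{prefix}{quote}data:{mime};base64,{encoded}{quote}"

    return IMG_SRC.sub(replace, html)
```

A regular expression is enough for a sketch; a real implementation handling arbitrary dictionary HTML might prefer an HTML parser.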
Documentation:
- HEADLESS_API.md: Complete API reference
- HEADLESS_LIBRARY_SUMMARY.md: Architecture and design decisions
- QUERY_WORD_UPDATE.md: Detailed query_word.py enhancements
- WORD_QUERY_GUIDE.md: Usage guide with examples
- QUERY_SUMMARY.md: Feature comparison matrix
- AUDIO_IMPLEMENTATION_SUMMARY.md: Audio support details
- DATABASE_COMPATIBILITY.md: Table structure compatibility guide
- NON_ASCII_FILENAME_SUPPORT.md: Non-ASCII filename handling
Breaking Changes:
- Restructure pyproject.toml with optional dependency groups
- Update __init__.py to expose headless APIs
- GUI components now optional, install with [gui] extra
Architecture:
- Clean separation of concerns (core, GUI, conversion)
- Context manager support for proper resource cleanup
- Type hints for better IDE support
- Comprehensive error handling and logging
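The context manager support mentioned above follows the standard `__enter__`/`__exit__` protocol. The class below is a hypothetical stand-in, not the PR's `Dictionary` class; it only illustrates how an index database connection can be released reliably, including when the body of the `with` block raises.

```python
import sqlite3
from typing import Optional


class IndexReader:
    """Hypothetical wrapper that owns a sqlite3 connection for its lifetime."""

    def __init__(self, db_path: str) -> None:
        self._db_path = db_path
        self._conn: Optional[sqlite3.Connection] = None

    def __enter__(self) -> "IndexReader":
        self._conn = sqlite3.connect(self._db_path)
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        # Close the connection whether or not the block raised.
        if self._conn is not None:
            self._conn.close()
            self._conn = None
        return False  # do not suppress exceptions

    @property
    def is_open(self) -> bool:
        return self._conn is not None
```

Usage would be `with IndexReader(path) as reader: ...`, after which the connection is closed without an explicit cleanup call.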
Testing:
- All example scripts support command-line arguments
- Tested with ASCII and non-ASCII filenames
- Validated CSS extraction and image embedding
- Confirmed batch conversion with timestamped outputs
Migration Guide:
- Existing GUI functionality unchanged
- New headless features work alongside GUI
- No code changes required for current GUI users beyond installing the [gui] extra
- Progressive enhancement approach