Break free from static formats. Our platform empowers you to transform fixed content into fully manipulatable assets. Powered by SAM 3 and multimodal large models, it enables high-fidelity reconstruction that preserves the original diagram details and logical relationships.
Visit https://editbanana.anxin6.cn/ to try Edit Banana online! Upload an image or PDF and get editable DrawIO (XML) or PPTX in seconds. Please note: our GitHub repository currently trails behind our web-based service. For the most up-to-date features and performance, we recommend using the web platform.
Join our WeChat group to discuss and exchange ideas! Scan the QR code below:
Scan to join the Edit Banana community
💡 If the QR code has expired, please submit an Issue to request an updated one.
To demonstrate conversion fidelity, we provide side-by-side comparisons of the original static format and the editable reconstruction result for three scenarios. Every element can be individually dragged, styled, and modified.
✨ Conversion Highlights:
- Preserves the layout logic, color matching, and element hierarchy of the original diagram
- 1:1 restoration of shape stroke/fill and arrow styles (dashed lines/thickness)
- Accurate text recognition, supporting direct subsequent editing and format adjustment
- All elements are independently selectable, supporting native DrawIO template replacement and layout optimization
- Advanced Segmentation: Using our fine-tuned SAM 3 (Segment Anything Model 3) for segmentation of diagram elements.
- Fixed Multi-Round VLM Scanning: An extraction process guided by Multimodal LLMs (Qwen-VL/GPT-4V).
- High-Quality OCR:
  - Azure Document Intelligence for precise text localization.
  - Fallback Mechanism: Automatically switches to VLM-based end-to-end OCR if Azure services are unreachable.
  - Mistral Vision/MLLM for correcting text and converting mathematical formulas to LaTeX ($\int f(x) dx$).
  - Crop-Guided Strategy: Extracts text/formula regions and sends high-res crops to LLMs for pixel-perfect recognition.
- User System:
  - Registration: New users receive 10 free credits.
  - Credit System: Pay-per-use model prevents resource abuse.
- Multi-User Concurrency: Built-in support for concurrent user sessions using a Global Lock mechanism for thread-safe GPU access and an LRU Cache (Least Recently Used) to persist image embeddings across requests, ensuring high performance and stability.
- Web Interface: A React-based frontend + FastAPI backend for easy uploading and editing.
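
The concurrency design listed above (a global lock around GPU access plus an LRU cache of image embeddings) can be sketched roughly as follows. This is only an illustration under assumptions, not the project's code: `EmbeddingCache`, `fake_encoder`, and the capacity of 2 are hypothetical, and the encoder is a stand-in for the real SAM 3 image encoder.

```python
import hashlib
import threading
from collections import OrderedDict


class EmbeddingCache:
    """LRU cache of image embeddings guarded by one global lock."""

    def __init__(self, compute_embedding, capacity=2):
        self._compute = compute_embedding   # hypothetical stand-in for the GPU encoder
        self._capacity = capacity
        self._cache = OrderedDict()         # key -> embedding, least recently used first
        self._gpu_lock = threading.Lock()   # serializes cache and GPU access

    def get(self, image_bytes):
        key = hashlib.sha256(image_bytes).hexdigest()
        with self._gpu_lock:
            if key in self._cache:
                self._cache.move_to_end(key)        # mark as most recently used
                return self._cache[key]
            embedding = self._compute(image_bytes)  # only one thread runs this at a time
            self._cache[key] = embedding
            if len(self._cache) > self._capacity:
                self._cache.popitem(last=False)     # evict the least recently used entry
            return embedding

    def __len__(self):
        with self._gpu_lock:
            return len(self._cache)


calls = []


def fake_encoder(image_bytes):
    # Placeholder for the real model call; records each invocation.
    calls.append(image_bytes)
    return len(image_bytes)


cache = EmbeddingCache(fake_encoder, capacity=2)
cache.get(b"a")
cache.get(b"bb")
cache.get(b"a")     # cache hit: the encoder is not called again
cache.get(b"ccc")   # evicts b"bb", the least recently used entry
```

Holding the lock for the whole lookup keeps the sketch simple and thread-safe, at the cost of serializing every request; a real service might use finer-grained locking.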
- Input: Image (PNG/JPG) or PDF.
- Segmentation (SAM3): Using our fine-tuned SAM3 mask decoder.
- Text Extraction (Parallel):
  - Azure OCR detects text bounding boxes.
  - High-res crops of text regions are sent to Mistral/LLM.
  - LaTeX conversion for formulas.
- XML/PPTX Generation: Merging spatial data from our fine-tuned SAM3 and Text OCR.
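
As a rough illustration of the final merge step, the sketch below attaches each OCR text box to the shape that contains its center and writes an uncompressed DrawIO (mxGraph) document. The dictionary layout, the function names `center_inside` and `build_drawio`, and the two style strings are assumptions made for this example; the project's `merge_xml.py` may work differently.

```python
import xml.etree.ElementTree as ET

SHAPE_STYLE = "rounded=0;whiteSpace=wrap;html=1;"
TEXT_STYLE = "text;html=1;align=center;verticalAlign=middle;"


def center_inside(shape, text):
    """True if the center of the text box lies inside the shape box."""
    cx = text["x"] + text["w"] / 2
    cy = text["y"] + text["h"] / 2
    return (shape["x"] <= cx <= shape["x"] + shape["w"]
            and shape["y"] <= cy <= shape["y"] + shape["h"])


def build_drawio(shapes, texts):
    """Merge shape boxes and OCR text into a DrawIO XML string."""
    mxfile = ET.Element("mxfile")
    diagram = ET.SubElement(mxfile, "diagram", {"name": "Page-1"})
    model = ET.SubElement(diagram, "mxGraphModel")
    root = ET.SubElement(model, "root")
    ET.SubElement(root, "mxCell", {"id": "0"})                 # required root cell
    ET.SubElement(root, "mxCell", {"id": "1", "parent": "0"})  # default layer

    cell_id = 2
    used = set()

    def add_vertex(box, value, style):
        nonlocal cell_id
        cell = ET.SubElement(root, "mxCell", {
            "id": str(cell_id), "value": value, "style": style,
            "vertex": "1", "parent": "1",
        })
        ET.SubElement(cell, "mxGeometry", {
            "x": str(box["x"]), "y": str(box["y"]),
            "width": str(box["w"]), "height": str(box["h"]),
            "as": "geometry",
        })
        cell_id += 1

    # Shapes take the text whose center falls inside them as their label.
    for shape in shapes:
        inside = [i for i, t in enumerate(texts)
                  if i not in used and center_inside(shape, t)]
        used.update(inside)
        add_vertex(shape, " ".join(texts[i]["text"] for i in inside), SHAPE_STYLE)

    # Text that belongs to no shape becomes a standalone text cell.
    for i, t in enumerate(texts):
        if i not in used:
            add_vertex(t, t["text"], TEXT_STYLE)

    return ET.tostring(mxfile, encoding="unicode")


shapes = [{"x": 40, "y": 40, "w": 120, "h": 60}]
texts = [
    {"x": 60, "y": 60, "w": 80, "h": 20, "text": "Start"},
    {"x": 300, "y": 10, "w": 50, "h": 12, "text": "Fig. 1"},
]
drawio_xml = build_drawio(shapes, texts)
```

Saving `drawio_xml` to a `.drawio` file should give a diagram with one labeled rectangle and one free-standing text element, each independently selectable.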
sam3_workflow/
├── config/                 # Configuration files
├── flowchart_text/         # OCR & Text Extraction Module
│   ├── src/                # OCR Source Code (Azure, Mistral, Alignment)
│   └── main.py             # OCR Entry point
├── frontend/               # React Web Application
├── input/                  # [Manual] Input images directory
├── models/                 # [Manual] Model weights (SAM3)
├── output/                 # [Manual] Results directory
├── sam3/                   # SAM3 Model Library
├── scripts/                # Utility Scripts
│   └── merge_xml.py        # XML Merging & Orchestration
├── main.py                 # CLI Entry point (Modular Pipeline)
├── server_pa.py            # FastAPI Backend Server (Service-based)
└── requirements.txt        # Python dependencies
Follow these steps to set up the project locally.
- Python 3.10+
- Node.js & npm (for the frontend)
- CUDA-capable GPU (Highly recommended)
git clone https://github.com/BIT-DataLab/Edit-Banana.git
cd Edit-Banana

After cloning, manually create the following resource directories (ignored by Git):
# Create input/output directories
mkdir -p input
mkdir -p output
mkdir -p sam3_output

Download the required models and place them in the correct paths:
| Model | Download | Target Path |
|---|---|---|
| SAM 3 | https://modelscope.cn/models/facebook/sam3 | models/sam3.pt (or as configured) |
Note: For SAM 3 (or the specific segmentation checkpoint used), place the `.pt` file in `models/` and update `config.yaml`.
Backend:
pip install -r requirements.txt

Frontend:
cd frontend
npm install
cd ..

- Config File: Copy the example config.
cp config/config.yaml.example config/config.yaml
- Environment Variables: Create a `.env` file in the root directory.
AZURE_ENDPOINT=your_azure_endpoint
AZURE_API_KEY=your_azure_key
# Add other keys as needed
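
To show how the Azure variables relate to the OCR fallback described in the feature list, here is a minimal sketch. It assumes the `.env` values have already been loaded into the process environment, and it treats missing credentials the same way as an unreachable service, which is an assumption of this example. The names `azure_settings`, `run_ocr`, `broken_azure`, and `fake_vlm` are hypothetical and not part of the project's API.

```python
import os


def azure_settings(env=os.environ):
    """Return (endpoint, key) if both Azure variables are set, else None."""
    endpoint = env.get("AZURE_ENDPOINT")
    key = env.get("AZURE_API_KEY")
    if endpoint and key:
        return endpoint, key
    return None


def run_ocr(image_bytes, azure_ocr, vlm_ocr, env=os.environ):
    """Try Azure OCR first; fall back to VLM OCR if it is unconfigured or unreachable."""
    settings = azure_settings(env)
    if settings is not None:
        try:
            return azure_ocr(image_bytes, *settings)
        except (ConnectionError, TimeoutError) as exc:
            print(f"Azure OCR unreachable ({exc}); falling back to VLM OCR")
    return vlm_ocr(image_bytes)


def broken_azure(image_bytes, endpoint, key):
    # Hypothetical Azure client that simulates a network failure.
    raise ConnectionError("network down")


def fake_vlm(image_bytes):
    # Hypothetical VLM-based OCR backend.
    return ["text from VLM"]


env = {"AZURE_ENDPOINT": "https://example.invalid", "AZURE_API_KEY": "secret"}
result = run_ocr(b"image", broken_azure, fake_vlm, env)
```

Passing the two backends as plain callables keeps the fallback logic easy to exercise without network access.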
Start the Backend:
python server_pa.py
# Server runs at http://localhost:8000

Start the Frontend:
cd frontend
npm install
npm run dev
# Frontend runs at http://localhost:5173

Open your browser, upload an image, and view the result in the embedded DrawIO editor.
To process a single image:
python main.py -i input/test_diagram.png

The output XML will be saved in the output/ directory.
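
Since the CLI takes one image at a time, a small wrapper can loop over a folder. The sketch below relies only on the documented `-i` flag; the helper names `build_commands` and `run_all` and the set of accepted suffixes are assumptions for illustration.

```python
import subprocess
import sys
from pathlib import Path

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def build_commands(paths):
    """Return one `python main.py -i <image>` command per image path."""
    images = sorted(Path(p) for p in paths
                    if Path(p).suffix.lower() in IMAGE_SUFFIXES)
    return [[sys.executable, "main.py", "-i", str(p)] for p in images]


def run_all(input_dir="input"):
    """Run the CLI once per image in input_dir, stopping at the first failure."""
    for cmd in build_commands(Path(input_dir).iterdir()):
        subprocess.run(cmd, check=True)


# Non-image files are skipped and the remaining paths are processed in sorted order.
example = build_commands(["input/b.jpg", "input/notes.txt", "input/a.PNG"])
```

Calling `run_all()` from the repository root would then process every image in `input/` sequentially.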
Customize the pipeline behavior in config/config.yaml:
- sam3: Adjust score thresholds, NMS (Non-Maximum Suppression) thresholds, max iteration loops.
- paths: Set input/output directories.
- dominant_color: Fine-tune color extraction sensitivity.
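
To make the effect of the score and NMS thresholds concrete, the following sketch filters a list of detections with plain greedy non-maximum suppression. It illustrates the general technique only; the actual key names in `config.yaml` and the project's implementation are not shown in this document, so `filter_detections` and its parameters are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_detections(detections, score_threshold, nms_threshold):
    """Drop low-score boxes, then greedily suppress overlapping ones."""
    candidates = sorted(
        (d for d in detections if d["score"] >= score_threshold),
        key=lambda d: d["score"],
        reverse=True,
    )
    kept = []
    for det in candidates:
        # Keep a box only if it does not overlap too much with a better one.
        if all(iou(det["box"], k["box"]) <= nms_threshold for k in kept):
            kept.append(det)
    return kept


detections = [
    {"box": (0, 0, 100, 100), "score": 0.95},
    {"box": (5, 5, 105, 105), "score": 0.80},     # near-duplicate of the first box
    {"box": (200, 200, 260, 240), "score": 0.70},
    {"box": (300, 0, 320, 20), "score": 0.20},    # below the score threshold
]
kept = filter_detections(detections, score_threshold=0.5, nms_threshold=0.5)
```

Raising the score threshold discards more uncertain masks, while raising the NMS threshold tolerates more overlap between the elements that are kept.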
| Feature Module | Status | Description |
|---|---|---|
| Core Conversion Pipeline | ✅ Completed | Full pipeline of segmentation, reconstruction and OCR |
| Intelligent Arrow Connection | | Automatically associate arrows with target shapes |
| DrawIO Template Adaptation | Planned | Support custom template import |
| Batch Export Optimization | Planned | Batch export to DrawIO files (.drawio) |
| Local LLM Adaptation | Planned | Support local VLM deployment, independent of APIs |
Contributions of all kinds are welcome (code submissions, bug reports, feature suggestions):
- Fork this repository
- Create a feature branch (`git checkout -b feature/xxx`)
- Commit your changes (`git commit -m 'feat: add xxx'`)
- Push to the branch (`git push origin feature/xxx`)
- Open a Pull Request
Bug Reports: Issues
Feature Suggestions: Discussions
Thanks to all the developers who have contributed to this project and helped move it forward!
| Name/ID | Email |
|---|---|
| Chai Chengliang | ccl@bit.edu.cn |
| Zhang Chi | zc315@bit.edu.cn |
| Deng Qiyan | |
| Rao Sijing | |
| Yi Xiangjian | |
| Li Jianhui | |
| Shen Chaoyuan | |
| Zhang Junkai | |
| Han Junyi | |
| You Zirui | |
| Xu Haochen | |
| An Minghao | |
| Yu Mingjie | |
| Yu Xinjiang | |
| Chen Zhuofan | |
| Li Xiangkun | |
This project is open-source under the Apache License 2.0, allowing commercial use and secondary development (with copyright notice retained).
If this project helps you, please star it to show your support!
Star history: https://www.star-history.com/#bit-datalab/edit-banana&type=date&legend=top-left








