🍌 Edit Banana

Universal Content Re-Editor: Make the Uneditable, Editable

Break free from static formats. Our platform empowers you to transform fixed content into fully manipulatable assets. Powered by SAM 3 and multimodal large models, it enables high-fidelity reconstruction that preserves the original diagram details and logical relationships.

Try It Now!

👆 Click above or visit https://editbanana.anxin6.cn/ to try Edit Banana online! Upload an image or PDF and get editable DrawIO (XML) or PPTX output in seconds. Please note: our GitHub repository currently trails behind the web service. For the most up-to-date features and performance, we recommend the web platform.

💬 Join WeChat Group

Join our WeChat group to discuss and exchange ideas! Scan the QR code below:

Scan to join the Edit Banana community

💡 If the QR code has expired, please submit an Issue to request an updated one.

📸 Effect Demonstration

High-Definition Input-Output Comparison (3 Typical Scenarios)

To demonstrate the high-fidelity conversion, we provide one-to-one comparisons between "original static formats" and "editable reconstruction results" for 3 scenarios. All elements can be individually dragged, styled, and modified.

Scenario 1: Figures to DrawIO (XML, SVG, PPTX)

Example No.	Original Static Diagram (Input · Non-editable)	DrawIO Reconstruction Result (Output · Fully Editable)
Example 1: Basic Flowchart
Example 2: Multi-level Architecture Diagram
Example 3: Technical Schematic
Example 4: Scientific Formula Diagram

Scenario 2: PDF to PPTX

Scenario 3: Human in the Loop Modification

✨ Conversion Highlights:

Preserves the layout logic, color matching, and element hierarchy of the original diagram

1:1 restoration of shape stroke/fill and arrow styles (dashed lines/thickness)

Accurate text recognition, supporting direct subsequent editing and format adjustment

All elements are independently selectable, supporting native DrawIO template replacement and layout optimization

Key Features

Advanced Segmentation: Using our fine-tuned SAM 3 (Segment Anything Model 3) for segmentation of diagram elements.
Fixed Multi-Round VLM Scanning: An extraction process guided by Multimodal LLMs (Qwen-VL/GPT-4V).
High-Quality OCR:
- Azure Document Intelligence for precise text localization.
- Fallback Mechanism: Automatically switches to VLM-based end-to-end OCR if Azure services are unreachable.
- Mistral Vision/MLLM for correcting text and converting mathematical formulas to LaTeX ($\int f(x) dx$).
- Crop-Guided Strategy: Extracts text/formula regions and sends high-res crops to LLMs for pixel-perfect recognition.
User System:
- Registration: New users receive 10 free credits.
- Credit System: Pay-per-use model prevents resource abuse.
Multi-User Concurrency: Built-in support for concurrent user sessions. A Global Lock mechanism keeps GPU access thread-safe, and an LRU (Least Recently Used) cache persists image embeddings across requests, so they do not have to be recomputed for every request.
Web Interface: A React-based frontend + FastAPI backend for easy uploading and editing.
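
The Global Lock and LRU Cache combination described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the class name `EmbeddingCache`, the method `get_or_compute`, and the `fake_encoder` stand-in for the SAM 3 image encoder are all hypothetical.

```python
import threading
from collections import OrderedDict


class EmbeddingCache:
    """LRU cache for image embeddings guarded by one lock (hypothetical helper)."""

    def __init__(self, capacity=8):
        self.capacity = capacity
        self._items = OrderedDict()      # key -> embedding, oldest first
        self._lock = threading.Lock()    # the single "global" lock

    def get_or_compute(self, key, compute):
        # Holding the lock during compute() also serializes access to the GPU.
        with self._lock:
            if key in self._items:
                self._items.move_to_end(key)     # mark as most recently used
                return self._items[key]
            value = compute(key)
            self._items[key] = value
            if len(self._items) > self.capacity:
                self._items.popitem(last=False)  # evict least recently used
            return value


calls = []

def fake_encoder(image_id):
    """Stand-in for an expensive GPU encoder call (assumption)."""
    calls.append(image_id)
    return f"embedding-of-{image_id}"

cache = EmbeddingCache(capacity=2)
cache.get_or_compute("a.png", fake_encoder)
cache.get_or_compute("b.png", fake_encoder)
cache.get_or_compute("a.png", fake_encoder)  # cache hit, encoder not called
cache.get_or_compute("c.png", fake_encoder)  # evicts b.png
```

Using one lock for both the cache and the computation is the simplest design; it trades throughput for safety, since only one request can use the encoder at a time.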

Architecture Pipeline

Input: Image (PNG/JPG) or PDF.
Segmentation (SAM3): Using our fine-tuned SAM3 mask decoder.
Text Extraction (Parallel):
- Azure OCR detects text bounding boxes.
- High-res crops of text regions are sent to Mistral/LLM.
- LaTeX conversion for formulas.
XML/PPTX Generation: Merging spatial data from our fine-tuned SAM3 and Text OCR.
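
To make the final generation step more concrete, here is a small sketch of how boxes and recognized text might be merged into DrawIO XML. It assumes each element has already been reduced to a dict with the hypothetical keys `x`, `y`, `w`, `h`, and `text`; the function name `build_drawio_xml` and the fixed rectangle style are illustrative only, and the real pipeline likely emits richer styles.

```python
import xml.etree.ElementTree as ET

# Generic draw.io style for a plain rectangle (illustrative; real output varies per shape).
DEFAULT_STYLE = "rounded=0;whiteSpace=wrap;html=1;"


def build_drawio_xml(shapes):
    """Build an uncompressed draw.io document from shape dicts (hypothetical schema)."""
    mxfile = ET.Element("mxfile")
    diagram = ET.SubElement(mxfile, "diagram", name="Page-1")
    model = ET.SubElement(diagram, "mxGraphModel")
    root = ET.SubElement(model, "root")
    # Two structural cells come first: the root (0) and the default layer (1).
    ET.SubElement(root, "mxCell", id="0")
    ET.SubElement(root, "mxCell", id="1", parent="0")
    for cell_id, shape in enumerate(shapes, start=2):
        cell = ET.SubElement(
            root, "mxCell",
            id=str(cell_id), value=shape.get("text", ""),
            style=DEFAULT_STYLE, vertex="1", parent="1",
        )
        geometry = ET.SubElement(
            cell, "mxGeometry",
            x=str(shape["x"]), y=str(shape["y"]),
            width=str(shape["w"]), height=str(shape["h"]),
        )
        geometry.set("as", "geometry")  # "as" is a Python keyword, so set it explicitly
    return ET.tostring(mxfile, encoding="unicode")


shapes = [
    {"x": 40, "y": 40, "w": 120, "h": 60, "text": "Start"},
    {"x": 40, "y": 160, "w": 120, "h": 60, "text": "Process"},
]
xml_text = build_drawio_xml(shapes)
```

Each shape becomes its own `mxCell`, which is what allows every element to be selected and edited independently once the file is opened.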

Project Structure

sam3_workflow/
├── config/                 # Configuration files
├── flowchart_text/         # OCR & Text Extraction Module
│   ├── src/                # OCR Source Code (Azure, Mistral, Alignment)
│   └── main.py             # OCR Entry point
├── frontend/               # React Web Application
├── input/                  # [Manual] Input images directory
├── models/                 # [Manual] Model weights (SAM3)
├── output/                 # [Manual] Results directory
├── sam3/                   # SAM3 Model Library
├── scripts/                # Utility Scripts
│   └── merge_xml.py        # XML Merging & Orchestration
├── main.py                 # CLI Entry point (Modular Pipeline)
├── server_pa.py            # FastAPI Backend Server (Service-based)
└── requirements.txt        # Python dependencies

Installation & Setup

Follow these steps to set up the project locally.

1. Prerequisites

Python 3.10+
Node.js & npm (for the frontend)
CUDA-capable GPU (Highly recommended)

2. Clone Repository

git clone https://github.com/BIT-DataLab/Edit-Banana.git
cd Edit-Banana

3. Initialize Directory Structure

After cloning, you must manually create the following resource directories (ignored by Git):

# Create input/output directories
mkdir -p input
mkdir -p output
mkdir -p sam3_output

4. Download Model Weights

Download the required models and place them in the correct paths:

Model	Download	Target Path
SAM 3	https://modelscope.cn/models/facebook/sam3	`models/sam3.pt` (or as configured)

Note: For SAM 3 (or the specific segmentation checkpoint used), place the .pt file in models/ and update config.yaml.

5. Install Dependencies

Backend:

pip install -r requirements.txt

Frontend:

cd frontend
npm install
cd ..

6. Configuration

Config File: Copy the example config.

cp config/config.yaml.example config/config.yaml

Environment Variables: Create a .env file in the root directory.

AZURE_ENDPOINT=your_azure_endpoint
AZURE_API_KEY=your_azure_key
# Add other keys as needed
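
Before starting the server, it can help to confirm that the Azure settings are actually visible to the process. The sketch below assumes the variables have already been exported or loaded from `.env` by whatever loader the project uses; the helper name `missing_azure_settings` is hypothetical.

```python
import os

# Variable names taken from the .env example above.
REQUIRED_KEYS = ("AZURE_ENDPOINT", "AZURE_API_KEY")


def missing_azure_settings(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key)]


missing = missing_azure_settings()
if missing:
    print("Missing settings:", ", ".join(missing))
```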

Usage

1. Web Interface (Recommended)

Start the Backend:

python server_pa.py
# Server runs at http://localhost:8000

Start the Frontend:

cd frontend
npm install
npm run dev
# Frontend runs at http://localhost:5173

Open your browser, upload an image, and view the result in the embedded DrawIO editor.

2. Command Line Interface (CLI)

To process a single image:

python main.py -i input/test_diagram.png

The output XML will be saved in the output/ directory.
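
To convert several images, one option is a small wrapper that invokes the CLI once per file. This is only a sketch: it relies on the `-i` flag shown above and nothing else, and the helper names `build_commands` and `run_all` are hypothetical rather than part of the repository.

```python
import subprocess
import sys
from pathlib import Path

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def build_commands(input_dir="input"):
    """Return one `python main.py -i <image>` command per image in input_dir."""
    images = sorted(
        path for path in Path(input_dir).iterdir()
        if path.suffix.lower() in IMAGE_SUFFIXES
    )
    return [[sys.executable, "main.py", "-i", str(path)] for path in images]


def run_all(input_dir="input"):
    """Run the CLI for each image in turn, stopping at the first failure."""
    for command in build_commands(input_dir):
        subprocess.run(command, check=True)
```

Calling `run_all()` from the repository root would process every PNG/JPG file in `input/` sequentially.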

Configuration `config.yaml`

Customize the pipeline behavior in config/config.yaml:

sam3: Adjust score thresholds, NMS (Non-Maximum Suppression) thresholds, max iteration loops.
paths: Set input/output directories.
dominant_color: Fine-tune color extraction sensitivity.
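
For readers unfamiliar with the two sam3 thresholds, the following generic sketch shows how a score threshold and an NMS (IoU) threshold typically interact. It is a textbook greedy NMS in plain Python, not the project's implementation; the function names and default values are assumptions.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(detections, score_threshold=0.5, iou_threshold=0.5):
    """Greedy NMS over (box, score) pairs; returns kept pairs, best score first."""
    candidates = sorted(
        (d for d in detections if d[1] >= score_threshold),
        key=lambda d: d[1],
        reverse=True,
    )
    kept = []
    for box, score in candidates:
        # Keep a box only if it does not overlap too much with a better one.
        if all(iou(box, kept_box) <= iou_threshold for kept_box, _ in kept):
            kept.append((box, score))
    return kept


detections = [
    ((0, 0, 10, 10), 0.9),
    ((1, 1, 11, 11), 0.8),    # overlaps the first box heavily, so it is suppressed
    ((20, 20, 30, 30), 0.7),
    ((0, 0, 5, 5), 0.3),      # dropped by the score threshold
]
kept = nms(detections)
```

Raising the score threshold discards more low-confidence masks, while lowering the NMS threshold removes more overlapping duplicates.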

📌 Development Roadmap

Feature Module	Status	Description
Core Conversion Pipeline	✅ Completed	Full pipeline of segmentation, reconstruction and OCR
Intelligent Arrow Connection	⚠️ In Development	Automatically associate arrows with target shapes
DrawIO Template Adaptation	📍 Planned	Support custom template import
Batch Export Optimization	📍 Planned	Batch export to DrawIO files (.drawio)
Local LLM Adaptation	📍 Planned	Support local VLM deployment, independent of APIs

🤝 Contribution Guidelines

Contributions of all kinds are welcome (code submissions, bug reports, feature suggestions):

Fork this repository
Create a feature branch (git checkout -b feature/xxx)
Commit your changes (git commit -m 'feat: add xxx')
Push to the branch (git push origin feature/xxx)
Open a Pull Request

Bug Reports: Issues
Feature Suggestions: Discussions

🤩 Contributors

Thanks to all the developers who have contributed to the project and helped move it forward!

Name/ID	Email
Chai Chengliang	ccl@bit.edu.cn
Zhang Chi	zc315@bit.edu.cn
Deng Qiyan
Rao Sijing
Yi Xiangjian
Li Jianhui
Shen Chaoyuan
Zhang Junkai
Han Junyi
You Zirui
Xu Haochen
An Minghao
Yu Mingjie
Yu Xinjiang
Chen Zhuofan
Li Xiangkun

📄 License

This project is open-source under the Apache License 2.0, allowing commercial use and secondary development (with copyright notice retained).

🌟 Star History

🌟 If this project helps you, please star it to show your support!

Star history chart: https://www.star-history.com/#bit-datalab/edit-banana&type=date&legend=top-left
