| 1 | +== Camel Example Spring Boot and Docling |
| 2 | + |
| 3 | +This example shows how to work with Apache Camel and the Docling component using Spring Boot to convert documents to Markdown format. |
| 4 | + |
| 5 | +The example demonstrates document conversion from various formats (PDF, DOCX, PPTX, HTML, MD) to Markdown using the Docling service. |
| 6 | + |
| 7 | +=== Features |
| 8 | + |
| 9 | +* Document to Markdown Conversion - Automatically converts PDF, DOCX, PPTX, HTML, and MD files to Markdown format |
| 10 | +* Metadata Extraction - Optional route to extract structured data as JSON from documents |
| 11 | +* File-based Processing - Watch directories for new files and process them automatically |
| 12 | +* Java DSL Routes - Routes defined using Camel Java DSL |
| 13 | + |
| 14 | +=== Prerequisites |
| 15 | + |
| 16 | +Before running this example, you need: |
| 17 | + |
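| 18 | +1. A running Docling-Serve instance at http://0.0.0.0:5001 (or http://localhost:5001 where `0.0.0.0` is not accepted as a destination address), started for example with Docker: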
| 19 | + |
| 20 | +---- |
| 21 | +docker run -p 5001:5001 -e DOCLING_SERVE_ENABLE_UI=1 ghcr.io/docling-project/docling-serve:latest |
| 22 | +---- |
| 23 | + |
| 24 | +=== How to run |
| 25 | + |
| 26 | +You can run this example using: |
| 27 | + |
| 28 | +---- |
| 29 | +mvn spring-boot:run |
| 30 | +---- |
| 31 | + |
| 32 | +=== Usage |
| 33 | + |
| 34 | +==== Converting Documents to Markdown |
| 35 | + |
| 36 | +1. Start the application |
| 37 | +2. Place any PDF, DOCX, PPTX, HTML, or MD file in the `src/main/resources/documents/` directory |
| 38 | +3. The application will: |
| 39 | + - Detect the new file |
| 40 | + - Convert it to Markdown using Docling |
| 41 | + - Save the converted file to the `src/main/resources/output/` directory with a `.md` extension
| 42 | + - Delete the original file from `src/main/resources/documents/` |
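
The four steps above can be sketched with plain `java.nio.file` calls. This is not how the example is implemented (the real work is done by Camel routes and the Docling component); it only illustrates the detect, convert, save, delete order. The `converter` function is a hypothetical stand-in for the Docling call, and replacing the original extension with `.md` is an assumption about how the output name is derived.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class ConvertAndDeleteSketch {

    /** Processes each regular file in inputDir once and returns the names of the files written. */
    static List<String> processOnce(Path inputDir, Path outputDir, Function<Path, String> converter) {
        try {
            Files.createDirectories(outputDir);
            List<Path> pending;
            try (Stream<Path> entries = Files.list(inputDir)) {
                // Step 1: detect files; subdirectories such as extract/ are skipped.
                pending = entries.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
            }
            List<String> written = new ArrayList<>();
            for (Path source : pending) {
                // Step 2: convert (hypothetical stand-in for the Docling call).
                String markdown = converter.apply(source);
                // Step 3: save under the same base name with a .md extension (assumed naming).
                String name = source.getFileName().toString();
                int dot = name.lastIndexOf('.');
                String base = dot > 0 ? name.substring(0, dot) : name;
                Path target = outputDir.resolve(base + ".md");
                Files.writeString(target, markdown);
                // Step 4: delete the original only after the output has been written.
                Files.delete(source);
                written.add(target.getFileName().toString());
            }
            return written;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Runs the flow against temporary directories with a fake converter. */
    static String demo() {
        try {
            Path input = Files.createTempDirectory("documents");
            Path output = Files.createTempDirectory("output");
            Path original = input.resolve("report.pdf");
            Files.writeString(original, "placeholder content");
            List<String> written = processOnce(input, output, p -> "# " + p.getFileName() + "\n");
            return "written=" + written + " originalStillExists=" + Files.exists(original);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Deleting the source only after a successful write means a failed conversion leaves the document in place for another attempt. Whether the example's route behaves the same way on failure is not stated here.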
| 43 | + |
| 44 | +==== Extracting Structured Metadata |
| 45 | + |
| 46 | +1. Place documents in `src/main/resources/documents/extract/` |
| 47 | +2. The application will: |
| 48 | + - Extract structured data from the document |
| 49 | + - Save the metadata as JSON to `src/main/resources/output/metadata/` |
| 50 | + |
| 51 | +=== Project Structure |
| 52 | + |
| 53 | +---- |
| 54 | +docling/
| 55 | +└── src/
| 56 | +    └── main/
| 57 | +        ├── java/
| 58 | +        │   └── org/apache/camel/example/springboot/docling/
| 59 | +        │       ├── DoclingServeApplication.java
| 60 | +        │       └── DoclingServeRoute.java
| 61 | +        └── resources/
| 62 | +            ├── application.properties
| 63 | +            ├── documents/        # Input directory - place documents here
| 64 | +            │   └── extract/      # Optional: place documents here for metadata extraction
| 65 | +            └── output/           # Output directory for converted Markdown files
| 66 | +                └── metadata/     # Output directory for extracted metadata JSON files
| 67 | +---- |
| 68 | + |
| 69 | +=== Configuration |
| 70 | + |
| 71 | +Edit `src/main/resources/application.properties` to configure: |
| 72 | + |
| 73 | +* `docling.serve.url` - URL of the Docling service |
| 74 | +* `documents.directory` - Input directory for documents to convert (default: src/main/resources/documents) |
| 75 | +* `output.directory` - Output directory for converted files (default: src/main/resources/output) |
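
As an illustration of how these three keys and their defaults fit together, the sketch below parses properties text with `java.util.Properties`. The application itself resolves the values through Spring Boot and Camel, not through this class. `DoclingExampleSettings` is a hypothetical name, the fallback URL is the address given under Prerequisites, and the override values in `main` are made up.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

final class DoclingExampleSettings {
    final String doclingServeUrl;
    final String documentsDirectory;
    final String outputDirectory;

    private DoclingExampleSettings(Properties p) {
        // Assumed fallback: the Docling-Serve address from the Prerequisites section.
        doclingServeUrl = p.getProperty("docling.serve.url", "http://0.0.0.0:5001");
        // Defaults as documented in the Configuration list.
        documentsDirectory = p.getProperty("documents.directory", "src/main/resources/documents");
        outputDirectory = p.getProperty("output.directory", "src/main/resources/output");
    }

    /** Parses text in .properties format; keys that are absent fall back to the defaults. */
    static DoclingExampleSettings parse(String propertiesText) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new DoclingExampleSettings(p);
    }

    public static void main(String[] args) {
        // Override two keys and leave documents.directory at its default.
        DoclingExampleSettings s = parse(
                "docling.serve.url=http://localhost:5001\n"
              + "output.directory=target/output\n");
        System.out.println(s.doclingServeUrl);
        System.out.println(s.documentsDirectory);
        System.out.println(s.outputDirectory);
    }
}
```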
| 76 | + |
| 77 | +=== Routes |
| 78 | + |
| 79 | +The application includes two routes defined in `src/main/java/org/apache/camel/example/springboot/docling/DoclingServeRoute.java`: |
| 80 | + |
| 81 | +==== document-to-markdown-converter |
| 82 | + |
| 83 | +* Watches: `src/main/resources/documents/` directory |
| 84 | +* Accepts: PDF, DOCX, PPTX, HTML, MD files |
| 85 | +* Action: Converts to Markdown |
| 86 | +* Output: `src/main/resources/output/{filename}.md` |
| 87 | + |
| 88 | +==== document-metadata-extractor |
| 89 | + |
| 90 | +* Watches: `src/main/resources/documents/extract/` directory |
| 91 | +* Accepts: PDF, DOCX, PPTX files |
| 92 | +* Action: Extracts structured data as JSON |
| 93 | +* Output: `src/main/resources/output/metadata/{filename}.json` |
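
The two routes differ in the directory they watch, the file types they accept, and where they write. The sketch below expresses that mapping as a pure function. It does not use Camel and is not taken from `DoclingServeRoute.java`. `RouteOutputSketch` is a hypothetical name, paths are relative to `src/main/resources/`, and two details are assumptions: extensions are matched case-insensitively, and `{filename}` means the file name without its original extension.

```java
import java.util.Locale;
import java.util.Optional;
import java.util.Set;

final class RouteOutputSketch {
    // Accepted types as listed for document-to-markdown-converter.
    static final Set<String> CONVERT_EXTENSIONS = Set.of("pdf", "docx", "pptx", "html", "md");
    // Accepted types as listed for document-metadata-extractor.
    static final Set<String> EXTRACT_EXTENSIONS = Set.of("pdf", "docx", "pptx");

    /**
     * Maps a path relative to documents/ to the output path relative to
     * src/main/resources/, or returns empty if neither route accepts the file.
     */
    static Optional<String> outputFor(String relativePath) {
        boolean extract = relativePath.startsWith("extract/");
        String fileName = extract ? relativePath.substring("extract/".length()) : relativePath;
        int dot = fileName.lastIndexOf('.');
        // Reject nested paths, names without an extension, and names ending in a dot.
        if (fileName.contains("/") || dot <= 0 || dot == fileName.length() - 1) {
            return Optional.empty();
        }
        String base = fileName.substring(0, dot);
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        if (extract) {
            return EXTRACT_EXTENSIONS.contains(ext)
                    ? Optional.of("output/metadata/" + base + ".json")
                    : Optional.empty();
        }
        return CONVERT_EXTENSIONS.contains(ext)
                ? Optional.of("output/" + base + ".md")
                : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(outputFor("report.pdf"));
        System.out.println(outputFor("extract/report.pdf"));
        System.out.println(outputFor("extract/page.html"));
    }
}
```

Under these assumptions, an HTML file placed in `documents/extract/` is ignored, because the extractor route lists only PDF, DOCX, and PPTX.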
| 94 | + |
| 95 | +=== Using Camel components |
| 96 | + |
| 97 | +Apache Camel provides 200+ components which you can use to integrate and route messages between many systems |
| 98 | +and data formats. To use any of these Camel components, add the component as a dependency to your project. |
| 99 | + |
| 100 | +=== Help and contributions |
| 101 | + |
| 102 | +If you hit any problem using Camel or have some feedback, then please |
| 103 | +https://camel.apache.org/support.html[let us know]. |
| 104 | + |
| 105 | +We also love contributors, so |
| 106 | +https://camel.apache.org/contributing.html[get involved] :-) |
| 107 | + |
| 108 | +The Camel riders! |