alexbrandsen/pdf-to-nested-xml
Folders and files
| Name | Name | Last commit date | ||
|---|---|---|---|---|
Repository files navigation
Python script to convert a folder of Dutch archaeological reports (PDFs) to XML files nested by section, chapter, heading. Requires: - pdf-extract (https://github.com/CrossRef/pdfextract) - pdftohtml (http://pdftohtml.sourceforge.net/)