Tessera is a command-line publishing tool for manuscripts written in Word or LibreOffice. It reads DOCX and ODT files, keeps named styles as semantic structure, and builds EPUB, LaTeX, and PDF artifacts from the same source.
The important part is what Tessera refuses to throw away. If an author marks a paragraph as Poem, Letter, or Epigraph, or marks inline text as Foreign - Latin or Direct Thought, that meaning survives the conversion. It does not get flattened into "some italic text with a bit of indentation".
Install the CLI and build the embedded demo:
go install github.com/balyakin/tessera/cmd/tessera@latest
tessera demo --output tessera-demo
That command writes an EPUB to:
tessera-demo/dist/semantic-demo.epub
The demo path is deliberately small: it builds an EPUB without TeX Live, Docker, or network access after the binary is installed.
For an EPUB-only build:
tessera build book.docx --to epub --output dist --lint
For both print and ebook output:
tessera build book.docx --output dist --lint
Local PDF builds use LuaLaTeX by default. If your machine is not set up for TeX, the Docker image includes the runtime pieces:
docker run --rm -v "$PWD:/work" ghcr.io/balyakin/tessera:latest \
  build /work/examples/semantic-demo.docx --output /work/dist --lint
Run this when something on the machine looks suspicious:
tessera doctor
It checks for PDF and EPUB tooling and prints the next useful command instead of leaving you to guess at missing tools.
Most manuscript converters are good at moving text from one container to another. They are much weaker when a publisher has used styles as meaning:
| Manuscript intent | Generic conversion often becomes | Tessera keeps it as |
|---|---|---|
| A poem | indented paragraphs | verse in IR, LaTeX, and EPUB semantics |
| A letter | ordinary body text | a letter block |
| An epigraph | styled quotation | an epigraph role |
| Latin text | italics | language-aware foreign text |
| Direct thought | italics | a distinct thought role |
That difference matters later. A book designer can adjust the LaTeX macro for every poem. An EPUB workflow can lint semantic XHTML. A CI job can compare canonical IR instead of guessing whether two rendered files mean the same thing.
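A minimal sketch of the CI comparison idea, using only the Go standard library. If the IR dumps are already canonical, a plain byte comparison may be enough; this version also tolerates differences in key order and whitespace. The function names and the JSON fragments are hypothetical and do not reflect the real IR schema.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// canonicalize re-encodes a JSON document so that object keys are sorted
// and insignificant whitespace is removed.
func canonicalize(raw []byte) ([]byte, error) {
	var v any
	if err := json.Unmarshal(raw, &v); err != nil {
		return nil, err
	}
	return json.Marshal(v) // encoding/json writes map keys in sorted order
}

// sameIR reports whether two JSON dumps encode the same value.
func sameIR(a, b []byte) (bool, error) {
	ca, err := canonicalize(a)
	if err != nil {
		return false, err
	}
	cb, err := canonicalize(b)
	if err != nil {
		return false, err
	}
	return bytes.Equal(ca, cb), nil
}

func main() {
	// Hypothetical fragments standing in for two IR dumps.
	old := []byte(`{"role": "poem", "lang": "la"}`)
	cur := []byte(`{"lang":"la","role":"poem"}`)
	same, err := sameIR(old, cur)
	if err != nil {
		panic(err)
	}
	fmt.Println(same) // prints "true"
}
```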
Tessera does not try to infer meaning from visual formatting. The source of truth is the style name in the manuscript, plus the explicit mapping in tessera.toml.
In Word or LibreOffice, a manuscript can use ordinary named styles:
Paragraph style: Poem
Character style: Foreign - Latin
Character style: Direct Thought
Tessera preserves those names as roles, then renders them as explicit output semantics.
LaTeX:
\begin{verse}
\textlatin{veritas}
\semThought{a private thought}
\end{verse}
EPUB XHTML:
<blockquote epub:type="z3998:poem">
<i xml:lang="la">veritas</i>
<i epub:type="z3998:thought">a private thought</i>
</blockquote>
Use inspect when you receive a manuscript from an author or editor:
tessera inspect book.docx
It lists paragraph styles, character styles, detected metadata, and whether each style maps to a known role. Unknown styles include suggested TOML snippets, so the fix is usually a small config change rather than a hunt through generated XHTML.
Start a config file with:
tessera init --output tessera.toml
The default mapping covers common English and Russian style names. Project-specific names belong in [paragraph_styles] and [character_styles].
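The layering of project-specific names over a default mapping can be pictured as a two-step lookup. The sketch below is an illustration only: the `Role` type, the `resolve` function, and every map entry are hypothetical, and it does not show how Tessera reads tessera.toml.

```go
package main

import "fmt"

// Role is a semantic label attached to a named style.
type Role string

// defaults stands in for a built-in mapping; these entries are illustrative.
var defaults = map[string]Role{
	"Poem":     "poem",
	"Letter":   "letter",
	"Epigraph": "epigraph",
}

// resolve checks the project mapping first, then the defaults.
// It never inspects formatting: an unmapped style name stays unknown.
func resolve(style string, project map[string]Role) (Role, bool) {
	if r, ok := project[style]; ok {
		return r, true
	}
	r, ok := defaults[style]
	return r, ok
}

func main() {
	// "Verse (house style)" is a hypothetical project-specific style name.
	project := map[string]Role{"Verse (house style)": "poem"}
	for _, s := range []string{"Poem", "Verse (house style)", "Italic Block"} {
		if r, ok := resolve(s, project); ok {
			fmt.Printf("%s -> %s\n", s, r)
		} else {
			fmt.Printf("%s -> unknown, needs a mapping\n", s)
		}
	}
}
```

Returning a boolean alongside the role keeps "unknown style" an explicit outcome, which matches the idea that meaning comes from the style name and the mapping rather than from guesses about appearance.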
tessera build book.docx --output dist --lint
tessera build book.odt --to epub --output dist --dump-ir dist/book.ir.json
tessera inspect book.docx
tessera lint dist/book.epub
tessera doctor
tessera version --format json
tessera completion bash
The build, inspect, lint, and version commands can emit JSON for scripts:
tessera build book.docx --to epub --output dist --format json
GitHub Actions workflow:
name: books
on: [push]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: balyakin/tessera@v1
        with:
          input: examples/semantic-demo.docx
          output: dist
          args: --lint
      - uses: actions/upload-artifact@v4
        with:
          name: tessera-artifacts
          path: dist
Tessera is also a small Go API around the same parser and renderers used by the CLI:
package main

import "github.com/balyakin/tessera/pkg/tessera"

func main() {
	// Parse the manuscript into the document IR.
	doc, issues, err := tessera.ParseFile("book.docx", tessera.Options{})
	if err != nil {
		panic(err)
	}
	_ = issues // ignored here for brevity

	// Render an EPUB with the Reproducible option enabled.
	epubBytes, renderIssues, err := tessera.RenderEPUB(doc, tessera.Options{Reproducible: true})
	if err != nil {
		panic(err)
	}
	_ = renderIssues // ignored here for brevity
	_ = epubBytes
}
The IR can be marshaled as canonical JSON for golden tests, debugging, or external tools:
data, err := tessera.MarshalIR(doc)
Requirements:
- Go 1.22 or newer.
- LuaLaTeX or XeLaTeX for local PDF builds.
- epubcheck if you want external EPUB validation in addition to Tessera's built-in lint pass.
Useful local commands:
make build
make test
make lint
make cover
The example manuscripts are generated from source fixtures:
go run ./internal/demo/generate
Tessera is early, but the core shape is already in place: DOCX and ODT parsing, semantic IR, EPUB output, LaTeX output, CLI commands, Docker packaging, a GitHub Action, and tests around the parser and renderers.
The project is intentionally narrow. It is not a GUI, a SaaS uploader, a Markdown converter, or a visual style inference engine. It is a publishing pipeline for manuscripts where named styles carry the book's structure.
MIT. See LICENSE.