Issue with PDFs containing Arabic script/RTL script #101

@florisre

Description

Current behavior:

The selection highlight does not line up with where the text actually sits in the document. Clicking and dragging to select produces the following selection:
[Screenshot: current behavior]
Right-clicking the selection and copying it to the clipboard results in the following output:

د ه د ا ب ش ب ه ا ی ش ا ع ر ا ن و ن و ی س ن د گ ا ن د ر ا ن ج م ن ف ر ه س گ ی ا ب ر ا ن ۹ آ ل م ا
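
The output above has two visible symptoms: every letter is separated by a space, and the line breaks of the original are gone. A minimal sketch for flagging the first symptom in copied text, assuming Python and only the standard library, is shown here. The function names and the 0.8 threshold are hypothetical choices for illustration and are not part of the viewer.

```python
import unicodedata


def has_rtl_letters(text: str) -> bool:
    """Return True if any character has a right-to-left bidi class (R or AL)."""
    return any(unicodedata.bidirectional(ch) in ("R", "AL") for ch in text)


def looks_letter_spaced(text: str, threshold: float = 0.8) -> bool:
    """Heuristic: True if most whitespace-separated tokens are single characters.

    The threshold is an arbitrary assumption, not a tuned value.
    """
    tokens = text.split()
    if len(tokens) < 2:
        return False
    single = sum(1 for token in tokens if len(token) == 1)
    return single / len(tokens) >= threshold


def collapse_letter_spacing(text: str) -> str:
    """Remove all whitespace. This also discards any real word boundaries."""
    return "".join(text.split())


if __name__ == "__main__":
    # The first three letters of the clipboard output, written as escapes.
    sample = "\u062f \u0647 \u062f"
    print(has_rtl_letters(sample), looks_letter_spaced(sample))
    print(collapse_letter_spacing(sample))
```

Collapsing the spaces cannot restore the original word or line boundaries, so this is only useful for detecting the problem, not for repairing the copied text.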

Correct behavior:

Chromium's pdfium (which I assume is what actually renders PDFs in Chromium), and therefore every Chromium-based browser I have tried, handles this correctly:
[Screenshot: Chromium's behavior]
The selected text copies correctly as:


ده د
شبهای شاعران ونویسندگان اب
درانجمن فرهسگی ابران ۹آلمان 

Bigger scope

This issue is well known and stems from how RTL documents are handled in the PDF standards. Also see this contribution in the Adobe community and this discussion of the issue over at tesseract.

For further evaluation, I have attached the first page of the document shown in the screenshots here: https://bwsyncandshare.kit.edu/s/ZwQ7zyWXmKLHpdH
