
Image is re-rendered before comparison even when image DPI is higher than 72 #101

@dkarsai

Hi, I'm hoping I can get some help with a problem I'm having. The library might be working as intended, but I'm unsure exactly what is happening under the hood, so any input is welcome.

I noticed that regex-based mask placement was sometimes failing, and I found a suspicious line in the logs:

Re-Render document for OCR at 300 DPI as current resolution is only 72 DPI

I found this strange because the images I'm comparing are around 200 DPI.
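One way to double-check the resolution actually stored in an image file is to read its metadata. The sketch below assumes the images are PNGs, which record resolution in the optional `pHYs` chunk as pixels per metre; other formats (JPEG, TIFF) store it differently and are not handled. `png_dpi` is a hypothetical helper written for this issue, not part of DocTest.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
INCH_IN_METERS = 0.0254


def png_dpi(data: bytes):
    """Return (x_dpi, y_dpi) from a PNG's pHYs chunk, or None if no DPI is recorded.

    Hypothetical helper: handles PNG only and does not verify chunk CRCs.
    """
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        # Each chunk: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
        length, chunk_type = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if chunk_type == b"pHYs" and length == 9 and len(body) == 9:
            x_ppu, y_ppu, unit = struct.unpack(">IIB", body)
            if unit != 1:  # 0 = aspect ratio only, 1 = pixels per metre
                return None
            return (round(x_ppu * INCH_IN_METERS), round(y_ppu * INCH_IN_METERS))
        if chunk_type in (b"IDAT", b"IEND"):
            break  # pHYs must come before the first IDAT chunk
        pos += 12 + length
    return None
```

Usage: `with open("reference.png", "rb") as f: print(png_dpi(f.read()))`, where the file name is a placeholder. A file saved at 200 DPI stores about 7874 pixels per metre, which converts back to `(200, 200)`.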

I found where self.DPI is set to 72 in the code (e.g., the load_image_into_array function at DocTest/CompareImage.py:476), but I couldn't work out what its purpose is.
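To make the suspected behaviour concrete, here is a minimal sketch of the re-render check as it appears from the outside. It is not DocTest's implementation: `needs_rerender`, `OCR_RENDER_DPI` and `ASSUMED_IMAGE_DPI` are hypothetical names, the 300 and 72 DPI values come from the log line above, and the thresholds in the tests are placeholders because the real default of MINIMUM_OCR_RESOLUTION is not stated here. The point it illustrates: if the DPI is always assumed to be 72, any threshold above 72 triggers a re-render regardless of what the file actually says.

```python
import logging

logger = logging.getLogger(__name__)

# Hypothetical names; 300 and 72 are the values seen in the log message.
OCR_RENDER_DPI = 300
ASSUMED_IMAGE_DPI = 72  # value reportedly set in load_image_into_array


def needs_rerender(minimum_ocr_resolution: int,
                   increase_resolution: bool = True,
                   current_dpi: int = ASSUMED_IMAGE_DPI) -> bool:
    """Sketch of the observed re-render decision, not DocTest's actual code."""
    if not increase_resolution:
        return False  # increase_resolution=false skips the re-render entirely
    if current_dpi < minimum_ocr_resolution:
        logger.info("Re-Render document for OCR at %d DPI as current resolution is only %d DPI",
                    OCR_RENDER_DPI, current_dpi)
        return True
    return False  # e.g. a threshold of 72: 72 < 72 is False, so no re-render
```

Both workarounds described in this issue fall out of this shape: `increase_resolution=False` short-circuits the check, and a threshold of 72 can never exceed the assumed 72. Whether a correctly detected 200 DPI would pass depends on the real value of MINIMUM_OCR_RESOLUTION.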

I looked at the OCR output (text extracted with the Get Text From Document keyword) and found that the recognized text is indeed incorrect. For example, A70578524 was recognized as A/03/8524.

When extracting the text with the Get Text From Document keyword, the same log line appeared again:

Re-Render document for OCR at 300 DPI as current resolution is only 72 DPI

So I set increase_resolution=false to prevent re-rendering, and the OCR output was then as expected: 'A70578524'.

I experimented further and set MINIMUM_OCR_RESOLUTION to 72 in DocTest/CompareImage.py:34. This prevented re-rendering from being triggered when using the Compare Images keyword, and all masks were placed correctly.

So I think the re-rendering degrades OCR text recognition, which in turn causes the masks not to be applied.

Why is the image DPI set to 72 in the code? Why doesn't it detect the image's actual DPI and skip the re-render?
Making MINIMUM_OCR_RESOLUTION configurable when calling the Compare Images keyword would solve the issue, but I didn't want to suggest any changes until I fully understand the problem.
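A rough sketch of what such an option could look like, combined with preferring the DPI stored in the file over a fixed 72. Everything here is hypothetical: `OcrSettings`, `effective_dpi`, `should_rerender` and the `min_ocr_resolution` parameter do not exist in DocTest, and `DEFAULT_MINIMUM_OCR_RESOLUTION = 300` is a placeholder rather than the library's real default.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

DEFAULT_MINIMUM_OCR_RESOLUTION = 300  # placeholder, not necessarily DocTest's default
FALLBACK_DPI = 72  # used only when the file records no resolution


@dataclass
class OcrSettings:
    """Hypothetical per-call options a Compare Images keyword could accept."""
    min_ocr_resolution: int = DEFAULT_MINIMUM_OCR_RESOLUTION
    increase_resolution: bool = True

    def __post_init__(self):
        if self.min_ocr_resolution <= 0:
            raise ValueError("min_ocr_resolution must be a positive DPI value")


def effective_dpi(embedded_dpi: Optional[Tuple[int, int]]) -> int:
    """Prefer the DPI stored in the file; fall back to 72 only when none is recorded."""
    if embedded_dpi is None:
        return FALLBACK_DPI
    return min(embedded_dpi)  # be conservative if x and y resolutions differ


def should_rerender(embedded_dpi: Optional[Tuple[int, int]], settings: OcrSettings) -> bool:
    """Re-render only if allowed and the real (or fallback) DPI is below the threshold."""
    return settings.increase_resolution and effective_dpi(embedded_dpi) < settings.min_ocr_resolution
```

With this shape, a 200 DPI image compared with `OcrSettings(min_ocr_resolution=150)` would not be re-rendered, while a file without resolution metadata would still fall back to 72 and be re-rendered.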

Thank you in advance!
