Corrections of hidden OCR texts

  • FAQ Corrections of hidden OCR texts

    I have a PDF file with a scanned image of a newspaper article and with hidden text produced by OCR. (This is the same as would be produced by scanning an image and applying OCR, but with the option to not modify the image.) I want to correct the hidden text without modifying the image.

    I used a 30-day trial of Phantom PDF to try this out. I found I could do it with the Opacity tool, but it is fiddly. This is the process I used.

    First I needed to make the hidden text visible and grey out the image so it doesn't obscure the text. To do this, open the file and select Edit, Edit Object, Text (in the drop-down menu). Select all the text objects (the hidden text) by holding Ctrl and click-dragging over the image, then change the opacity of all the text to full opacity. Then select Edit, Edit Object, Image (in the drop-down menu). Select all the image objects the same way and reduce their opacity until the image is light grey, so that it no longer obscures the text.

    Next select Edit, Edit Text and the text can be edited. Edit, Add Text can be used to insert missed words. It might need some manipulation to get the best size and position for the inserted text.

    Next reverse the opacity changes to make the hidden text invisible again and the image fully opaque. Select Edit, Edit Object, Text. Select all the text objects (the hidden text) and change the opacity of all the text to zero (fully transparent). Then select Edit, Edit Object, Image. Select all the image objects and change the opacity of all the images back to full opacity.
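    For anyone who would rather script the reveal/conceal step, here is a rough sketch. It assumes the file hides its OCR layer with text rendering mode 3 (the PDF `3 Tr` operator, which many OCR tools use) rather than with transparency, and that you already have a page's decoded content stream as bytes. I have not checked which method this particular file uses, and the function name is made up for illustration.

```python
import re

# Matches a text-rendering-mode operator such as "3 Tr" standing on its own.
_TR_OPERATOR = re.compile(rb"(?<!\S)([0-7])(\s+)Tr(?!\S)")


def set_text_render_mode(stream: bytes, old: int, new: int) -> bytes:
    """Return the content stream with every '<old> Tr' changed to '<new> Tr'.

    Mode 3 means invisible text; mode 0 means normal filled text.
    Limitation: this is a plain pattern match, so a literal such as
    '(a 3 Tr b)' inside a text string would also be rewritten.
    """
    def swap(match: re.Match) -> bytes:
        if int(match.group(1)) != old:
            return match.group(0)  # a different mode, leave it alone
        return str(new).encode("ascii") + match.group(2) + b"Tr"

    return _TR_OPERATOR.sub(swap, stream)


# Hypothetical content stream with an invisible OCR text run.
hidden = b"BT\n3 Tr\n/F1 10 Tf\n72 700 Td\n(Teh quick brovvn fox) Tj\nET\n"
visible = set_text_render_mode(hidden, old=3, new=0)    # reveal for editing
restored = set_text_render_mode(visible, old=0, new=3)  # conceal again
```

    Reading and writing the streams in a real PDF would still need a PDF library; this only shows the toggle itself.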

    I could have used Edit, SpellCheck, which finds some mistakes and allows easy correction, but it cannot detect missing words, for example. Also, the edited text remains visible, so the opacity manipulation is still required.

    The View, Text Viewer option gives a clear view of the hidden text, but I couldn't find any option to edit the text in that view. All the edit functions are disabled.

    I originally wrote this post to find out how to do it, but now that I have worked this much out I thought I would post it for others to see (since I couldn't see how to delete the post). Should the prefix be FAQ?

    Is there a better way to do this?

    Last edited by Hovic; 09-13-2015, 06:12 AM.

  • #2
    Hi Hovic,
    Sorry, Text Viewer does not allow editing of the text; it is just a viewer.
    I have a question: in Edit Text mode you can only edit text, and the image is not modified. Why do you need to change the image opacity again?
    Maybe it is specific to this file. Would you mind sending us the file that has the issue so we can test it internally?
    If it is inconvenient to upload it here, you may email the PDF document to us instead. Thank you in advance.
    Last edited by lyndi_wu; 09-12-2016, 09:04 AM.


    • #3
      Hi lyndi_wu,
      Thanks for the reply.

      The idea is to reveal the hidden text, edit it, then conceal it again. Changing the opacity is just to improve visibility for the editing process so that the hidden text is visible and not obscured by the image. Does that make it clear?

      Rather than send you the file, here is a link. I don't think there are copyright issues, but this avoids them anyway, I think.

      This file is typical of the files provided by Trove, a digital repository of, amongst other things, Australian newspapers up to 1954. (If you explore the site, you will find a facility to edit the article text on the website, but the changes are not reflected in the PDF, hence the need to modify the hidden text.)
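      Since text corrected on the website does not reach the PDF, one way to work out which words need fixing in the hidden layer is to compare the two texts word by word. This is only a sketch using Python's standard `difflib`; it assumes you have already copied the hidden OCR text and the corrected article text into two strings, and the function name and sample sentences are invented for illustration.

```python
import difflib


def word_corrections(ocr_text: str, corrected_text: str):
    """List the word-level differences between OCR text and a corrected copy.

    Each entry is (kind, ocr_words, corrected_words), where kind is
    'replace', 'insert' (word missing from the OCR text) or 'delete'.
    """
    ocr_words = ocr_text.split()
    good_words = corrected_text.split()
    matcher = difflib.SequenceMatcher(a=ocr_words, b=good_words, autojunk=False)

    changes = []
    for kind, i1, i2, j1, j2 in matcher.get_opcodes():
        if kind != "equal":
            changes.append(
                (kind, " ".join(ocr_words[i1:i2]), " ".join(good_words[j1:j2]))
            )
    return changes


# Invented example: one misread word and one missed word.
ocr = "The quick brovvn fox over the lazy dog"
corrected = "The quick brown fox jumps over the lazy dog"
for change in word_corrections(ocr, corrected):
    print(change)
```

      Unlike a spell check, this also reports words the OCR missed entirely, which are the ones that need Add Text.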



      • #4
        Hi Hovic, thanks for finding a way to change the hidden OCR text and for sharing it with other users by posting the instructions in this thread. I have also added the "FAQ" prefix to your post.
        Last edited by Lisa_lee; 09-20-2015, 08:32 AM.


        • #5
          Hi Hovic, I agree with lyndi_wu's thought, but you could also try this:
          Select "Tools>Advanced Editing>TouchUp Text Tool", then select the text you want to edit.
          Right-click on the selection and open the Properties dialog. Then select black as the stroke and fill color. This will make the text visible.