
Apply ocr


  • Apply ocr

    I have some scanned pages. Some of them contain images with old (medieval) text, and just below each figure there is a caption describing it, as in an art book. When I run OCR, it damages (distorts) the text inside the images.
    I would like to know whether it is possible to apply OCR only to the captions that describe the images and ignore the text inside the images.
    (This is for batch processing.)
    original file:
    after OCR:

    Thank you in advance!

  • #2
    syfysym, I am sorry, but Foxit PhantomPDF currently cannot OCR only part of a scanned page. I have submitted the suggestion "Supports to select part of the scanned page and right-click to OCR" as a new feature request for our product management team's reference, with suggestion ID#PHANTOM-5695.
    As a workaround for now, check the option "Find All suspect (Show all OCR results that may need to be changed.)" in the Select OCR engine dialog box when you OCR the PDF file. Foxit PhantomPDF will then bring up the "OCR Suspects" dialog box. There, set all of the characters that sit on the images to "Not Text" so that they are kept as image-based text.


    • #3
      I would also like to support this feature request: being able to OCR, say, a single image on a page.
      I currently have a PDF with both images and regular vector text, and running OCR does not actually OCR the image (which contains some text).


      • #4
        @SR May I ask how you ran OCR on the PDF file? Please provide detailed steps and sample files for reference.
        Also, please try selecting "Editable Text" mode in the OCR dialog box and see what the result is. Thanks.


        • #5
          amanda_liang I thought I spelt it out pretty clearly already:
          1. Open PDF file
          2. Press Quick OCR
          3. Realize that it did not correctly OCR a picture with words
          4. Give up 😒😞
          5. Make my own manual Typewriter Comments instead
          Also, I do not like using Editable Text mode because then I cannot see the original, and then I cannot tell whether the OCR made a mistake or not.
          FYI, the picture is an assembly diagram drawing with callout balloons. The OCR fails to recognize the balloon numbers, even though their font size is plenty large enough that it should work.


          • #6
            @SR, thanks for your response. For the situation you mentioned, please provide us with the following information so that we can test it more closely on our side.
            1. Click the "Help" tab > "About Foxit PDF Editor" or "About Foxit PhantomPDF" to check the version number.
            2. Send us the original scanned PDF file and the PDF file after OCR has been performed on it.
            If these PDF files are not convenient to upload to the forum, please submit a ticket to send the documents to our Foxit Support team. When you submit a ticket from our ticket submission center, please include a link to this forum thread.