OCR Persian/Farsi PDF files


  • Feature OCR Persian/Farsi PDF files

    Hi, I tried to convert a PDF to Word with Foxit PhantomPDF, but the result is terrible. Google has good OCR, but it requires converting the PDF to image files first and needs an internet connection, so it isn't a good option for PDFs with many pages.

    Do you have an OCR language add-on for Persian/Farsi? Thanks.

    I attached one PDF page for you to try.
    Attached Files

  • #2
    Radmehr

    Sorry, OCR of Persian/Farsi PDF files is not currently supported. I've forwarded your request as a feature request to the Foxit product management team for processing. Suggestion ID#: PHANTOM-9144.

    Thank you.



    • #3
      Originally posted by cherry View Post
      Radmehr

      Sorry, OCR of Persian/Farsi PDF files is not currently supported. I've forwarded your request as a feature request to the Foxit product management team for processing. Suggestion ID#: PHANTOM-9144.

      Thank you.
      Another request from 2016!!:

      I hope this will be solved soon.

      Suggestion:
      If Foxit Phantom adds 4 letters (گ، چ، پ، ژ) to the Arabic OCR add-on, the results will be much better. (The Arabic letters that resemble these Persian letters are: ک،ج،ب،ز)
      Some letters also need to be swapped between Arabic and Persian: (ar: ي / ك to fa: ی / ک)
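
      A minimal sketch of the two-letter swap listed above, written as a post-processing step on text that has already been exported from an Arabic OCR pass. This is an illustration only, not a Foxit feature, and the function name normalize_persian is hypothetical. It covers only the two substitutions named in this post (ي to ی and ك to ک); it cannot recover گ، چ، پ، ژ, which the OCR engine itself would have to recognize.

```python
# Hypothetical post-processing helper: swap Arabic yeh/kaf for the Persian forms.
ARABIC_TO_PERSIAN = str.maketrans({
    "\u064A": "\u06CC",  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
    "\u0643": "\u06A9",  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH
})


def normalize_persian(text: str) -> str:
    """Return text with Arabic yeh and kaf replaced by their Persian counterparts."""
    return text.translate(ARABIC_TO_PERSIAN)


if __name__ == "__main__":
    # A word typed with Arabic kaf and Arabic yeh, escaped to avoid font issues.
    sample = "\u0643\u062A\u0627\u0628\u064A"
    print(normalize_persian(sample))
```

      All other characters pass through unchanged, so the function can be run over a whole exported document.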



      check here:
      Code:
      https://ai.glossika.com/blog/how-to-tell-the-difference-between-arabic-persian-kurdish
      Foxit Phantom also has a problem with BOLD letters in this PDF file. If this is solved, the result will be better.

      Thanks.
      Attached Files
      Last edited by Radmehr; 08-19-2020, 06:59 PM.



      • #4
        Radmehr ID#PHANTOM-9144 and ID#PHANTOM-1745 are similar requests. Both are being reviewed by our team, and the feature is not supported for now. We will inform you of any updates. Thanks.
        The request to add the 4 letters (گ، چ، پ، ژ) to the Arabic OCR add-on has been submitted to our development team as ID#PHANTOM-14888. We will inform you of any updates. Thanks.
        Lastly, could you please provide more details about "foxit phantom has problem with BOLD letters"? We would like to test the original PDF file and see the result. Thanks.



        • #5
          Originally posted by amanda_liang View Post
          Radmehr
          Lastly, could you please provide more details about "foxit phantom has problem with BOLD letters"? We would like to test the original PDF file and see the result. Thanks.
          Please check the original PDF, the Foxit result, and the online service result (Arabic was used for both results):
          Attached Files



          • #6
            Radmehr Thanks for your files. May I know which version of Foxit PhantomPDF you use? I tested your PDF file and ran OCR on it with Foxit PhantomPDF version 10.0. The result is fine and the bold font is preserved, as the attached picture shows. Could you please check and give us more information? Thanks.
            Attached Files



            • #7
              Originally posted by amanda_liang View Post
              Radmehr Thanks for your files. May I know which version of Foxit PhantomPDF you use? I tested your PDF file and ran OCR on it with Foxit PhantomPDF version 10.0. The result is fine and the bold font is preserved, as the attached picture shows. Could you please check and give us more information? Thanks.
              Hi, I use Foxit Phantom V 10.0, too. I checked only some of the BOLD words. I colour-coded each word across the 3 files: the original text, the online result, and your result.
              I found some differences between my result (post #5) and your result. Did you use "Recognize Text" to edit and fix them?
              Attached Files



              • #8
                Radmehr, thanks for your response. I didn't check the "Find All Suspect (Show all OCR results that may need to be changed)" option in the OCR settings. Please refer to the attached screenshot (OCR settings.jpg) to see the OCR settings I used in Foxit PhantomPDF for this PDF file. The OCR result is the file 'OCRed by Foxit PhantomPDF v10.pdf' (unzip the attached 'OCRed by Foxit PhantomPDF v10.zip' to get it).
                Please open the result PDF file and check whether it still has the bold problem you mentioned. If not, please set the OCR settings in Foxit PhantomPDF as shown in the screenshot (OCR settings.jpg) and OCR the PDF file again.

                If you still think the OCR result in 'OCRed by Foxit PhantomPDF v10.pdf' is not good, please point out which content in the OCRed PDF file is still significantly different from the content in the original PDF file. Screenshots illustrating the differences would help.
                Attached Files
                Last edited by Lisa_lee; 09-01-2020, 11:56 AM.



                • #9
                  Originally posted by Lisa_lee View Post
                  Radmehr

                  "You are not authorized to create or remove attachments." Why?!

                  Please check this site for more information:
                  Code:
                  https://www.persianlanguageonline.com/alphabet



                  • #10
                    Radmehr Do you mean that you are unable to upload attachments on the forum? If so, it should now be working on your side. We noticed that your account had been placed in the wrong usergroup, and we have now moved it to the correct one. Please let us know if it works for you.



                    • #11
                      Originally posted by Lisa_lee View Post
                      Radmehr

                      If you still think the OCR result is not good in the PDF file (OCRed by Foxit PhantomPDF v10.pdf), please help to point out which content in the OCRed PDF file is still significantly different from the content in the original PDF file. It's better if you could make screenshots to illustrate the difference between them.
                      There are differences between them. Please check some of them in the docs and PDF files:

                      This site may help you more:
                      Code:
                      https://www.persianlanguageonline.com/alphabet
                      Is it impossible to use the experience of online services to improve Persian/Farsi OCR for Foxit Phantom? (I know that it's not free.)
                      Attached Files



                      • #12
                        Radmehr, I am sorry, but I still do not quite understand the "problem with BOLD letters for this PDF file" that you mentioned, since I don't know Arabic or Arabic characters. Regarding this issue, could you please provide us with the following information:
                        1: By the problem with bold letters, do you mean that the image-based bold letters in the original scanned PDF file are changed into non-bold letters in the OCRed PDF file?
                        If not, please describe in detail exactly what changes were made to those bold letters in the OCRed PDF file.
                        2: Please send us one original scanned PDF file that is to be OCRed.
                        3: Please open the scanned PDF file mentioned in step 2 in Foxit PhantomPDF and click "Convert > OCR > Current file" to bring up the "select an OCR engine" dialog box. Then take a screenshot of this dialog box to show us your OCR settings for this PDF file in Foxit PhantomPDF.
                        4: Please send the OCRed PDF file.
                        5: Please open the original scanned PDF file and the OCRed PDF file in two Foxit PhantomPDF windows side by side to compare them visually (refer to this article to know how to do that), then use the pencil tool under the "Comment" tab in Foxit PhantomPDF to mark the differences between the two PDF files.

                        Regarding your question "is it impossible to use Experiences of online services to improve Persian/Farsi OCR for foxit phantom?", I am sorry that it is not possible, since our Persian/Farsi OCR plugin is built into Foxit PhantomPDF.
                        If you have any problem regarding OCR, please post a thread on the Foxit forum or write to the Foxit support team.

