【Question】 PDFs with poor OCR'd text layer


cadyellow
December 9th, 2008, 10:45 AM
Hi,
Does IFilter's ability to find words within PDFs depend on the quality of the OCR that was done when the PDF's text layer was created, i.e., on the quality of the existing text layer?

I ask because I got a Canon scanner last year and planned to scan a huge number of documents, but the project has not paid off so far, since its value depends on getting good search results. That has been a complete failure: searching generally does NOT find words that are in the PDF document. I found out why when I looked at the hidden text layer created by Canon's OCR utility and discovered that the majority of words either don't appear at all or are so misspelled that no search engine could be expected to find them. So if IFilter depends on such a poor text layer, it won't be able to do any better than the search tools I've already got.
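
To put a rough number on how bad a text layer is, one option is to measure what share of its words appear in a known word list. The following is only a minimal sketch: it assumes the hidden text has already been pulled out of the PDF into a plain string by some other tool, and the function name, the tiny vocabulary, and the sample strings are all hypothetical illustrations, not output from Canon's OCR utility or from IFilter.

```python
import re


def text_layer_hit_rate(layer_text, vocabulary):
    """Return the fraction of alphabetic tokens in layer_text found in vocabulary.

    layer_text: text already extracted from a PDF's hidden text layer (assumed input).
    vocabulary: a set of lowercase words treated as correctly spelled.
    """
    # Split into runs of letters; digits and punctuation act as separators.
    tokens = re.findall(r"[A-Za-z]+", layer_text.lower())
    if not tokens:
        return 0.0
    known = sum(1 for token in tokens if token in vocabulary)
    return known / len(tokens)


# Hypothetical sample data, for illustration only.
vocabulary = {"the", "invoice", "total", "is", "due", "on", "receipt"}
good_layer = "The invoice total is due on receipt"
poor_layer = "Tlie invoce tota1 is duc on rcceipt"

good_rate = text_layer_hit_rate(good_layer, vocabulary)
poor_rate = text_layer_hit_rate(poor_layer, vocabulary)
print(f"good layer: {good_rate:.2f}, poor layer: {poor_rate:.2f}")
```

A low rate on a sample of pages would suggest that any search tool reading the same text layer will miss most words, whichever indexer sits on top of it.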

emily
December 9th, 2008, 07:30 PM
Hello,
Thank you for your interest in Foxit PDF IFilter.
Foxit PDF IFilter does not currently extract the OCR text, but we will address this as a high priority. Thanks.