PDFs with a poor OCR text layer
Hi,
Does IFilter's ability to find words within PDFs depend on the quality of the OCR performed when the PDF's text layer was created, i.e., on the quality of the existing text layer?
I ask because I bought a Canon scanner last year and planned to scan a huge number of documents. So far the project hasn't paid off, because its success depends on getting good search results. Searching has been a complete failure: it generally does NOT find words that are in the PDF document. I found out why when I looked at the hidden text layer created by Canon's OCR utility. Most of the words either don't appear at all or are so badly misspelled that no search engine could be expected to find them. So if IFilter depends on such a poor text layer, it won't be able to do any better than the search tools I already have.