amount of extracted text limits?
robix73
September 4th, 2008, 02:59 AM
Hello
Does the Foxit PDF filter have a limit on the amount of text that can be extracted from a single PDF file? To me it seems that the limit is around 150 MB of text per PDF file.
Am I wrong?
AmyLin
September 8th, 2008, 08:40 PM
Hello,
Thank you for your feedback.
Foxit does not limit the file size. I think the limits are controlled by the Indexing Service.
Most of the registry entries for Indexing Service are found under the key
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\ContentIndex.
The DefaultColumnFile entry is found under the key
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\ContentIndexCommon.
I think "MaxTextFilterBytes" is related to your issue.
Please see the following link for more information about it:
http://msdn.microsoft.com/en-us/library/ms692119(VS.85).aspx
robix73
November 25th, 2008, 09:26 AM
Hello
As explained in another post, the limits do not come from the Foxit PDF filter; they are due to SharePoint registry entries. The Foxit PDF filter is not to blame.
Regards
emily
November 25th, 2008, 06:02 PM
Hello,
This is caused by the default maximum document size in MOSS, which is 16 MB.
Could you try the following steps to change the maximum document size?
You need to add the MAXDOWNLOADSIZE entry for the MOSS search.
1. Run Regedit.exe.
Start -> Run -> type "regedit", click "OK".
2. Locate the following registry subkey:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12.0\Search\Global\Gathering Manager
3. Click Edit -> New -> DWORD Value, and name it "maxdownloadsize".
4. Double-click it, choose "Decimal", and type the value data you want.
5. Restart the server: Start -> Run, type "cmd", then type "iisreset".
6. Start a full crawl before searching.
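For anyone who prefers not to click through Regedit, steps 2 to 4 above could also be sketched as a small script that writes out the text of a .reg file. This is only a sketch: the helper name build_reg_file and the value 64 are made up for illustration, and the output should be reviewed before it is imported on a real server.

```python
# Hypothetical helper (not part of MOSS or Foxit): builds the text of a
# .reg file that sets DWORD values under a single registry key.
def build_reg_file(key_path, dword_values):
    lines = ["Windows Registry Editor Version 5.00", "", "[" + key_path + "]"]
    for name, value in dword_values.items():
        if not 0 <= value <= 0xFFFFFFFF:
            raise ValueError("%s does not fit in a DWORD: %r" % (name, value))
        # .reg files store DWORD data as eight hexadecimal digits
        lines.append('"%s"=dword:%08x' % (name, value))
    return "\r\n".join(lines) + "\r\n"


# Subkey taken from step 2 above.
GATHERING_MANAGER = (
    r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office Server\12.0"
    r"\Search\Global\Gathering Manager"
)

if __name__ == "__main__":
    # 64 is an illustrative value only; pick the size you actually need.
    print(build_reg_file(GATHERING_MANAGER, {"maxdownloadsize": 64}))
```

Steps 5 and 6 (iisreset and a full crawl) still have to be done afterwards.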
Best Regards,
Emily
robix73
November 26th, 2008, 01:18 AM
Emily, you're right, but that's not enough.
With only your suggestion, at best only 10 or maybe 11 MB of text will be indexed, no matter whether maxdownloadsize is set to 4096 (4 GB file size) or maxgrowfactor is set to 16 (the theoretical maximum amount of extracted text allowed in this situation is 4096 x 16).
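To make the theoretical ceiling concrete, here is the multiplication written out as a tiny sketch. It assumes both settings are plain integers and that the download size is in megabytes, as the "4Gb file size" remark implies; the function name is made up for illustration.

```python
# Hypothetical helper: theoretical maximum extracted text, assuming it is
# simply maxdownloadsize (in MB) multiplied by maxgrowfactor.
def theoretical_text_limit_mb(max_download_size_mb, max_grow_factor):
    return max_download_size_mb * max_grow_factor


limit_mb = theoretical_text_limit_mb(4096, 16)
print(limit_mb, "MB =", limit_mb / 1024, "GB")  # prints "65536 MB = 64.0 GB"
```

That is far above the 10 or 11 MB actually indexed, which is why other settings must be the real bottleneck.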
I explained all the steps I took on another forum, and now my box can index over 90 MB of data from a single file.
Here is the link:
http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=2894310&SiteID=1
Other registry keys are involved.
They are:
DedicatedFilterProcessMemoryQuota
FilterProcessMemoryQuota
FolderHighPriority
CB_ChunkBufferSizeInMegaBytes
CB_MinBytesReservedForDoc
RobotThreadsNumber
All of these modifications affect MOSS behaviour not only for PDF files but for all file types (PDF, DOC, DOCX, TXT, RTF, XLS and so on).
I hope the Foxit PDF filter team will test these keys and that my suggestions are valuable for all users.
Sorry for my poor English.
Au revoir