PDF files


Monitor the text content of PDF files

With a special PDF-Plugin, WebSite-Watcher can extract the text content of a PDF file and convert it into an HTML page. PDF files can then be checked and handled like normal web pages, and WebSite-Watcher can highlight changes in the text. This PDF-Plugin is assigned automatically when new PDF documents are added to the bookmark list.

Unfortunately, it's not always possible to extract text from a PDF file. The PDF-Plugins in WebSite-Watcher use different methods to extract text. Should one method fail, there's still a good chance that one of the other methods can extract text from the PDF file.
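
The fallback idea described above can be illustrated with a minimal Python sketch. It is not WebSite-Watcher's actual implementation: the `Extractor` type and the `extract_text` and `text_to_html` functions are hypothetical names, and the extraction methods themselves are assumed to be supplied by the caller. The sketch tries each method in turn, keeps the first usable result, and wraps it in a simple HTML page.

```python
import html
from typing import Callable, Iterable, Optional

# Hypothetical extractor type: takes raw PDF bytes, returns text or raises.
Extractor = Callable[[bytes], str]


def extract_text(pdf_bytes: bytes, extractors: Iterable[Extractor]) -> Optional[str]:
    """Return the text from the first extractor that succeeds, else None."""
    for extractor in extractors:
        try:
            text = extractor(pdf_bytes)
        except Exception:
            continue  # this method failed; try the next one
        if text and text.strip():
            return text
    return None


def text_to_html(text: str) -> str:
    """Wrap extracted text in a minimal HTML page, one <p> per non-empty line."""
    paragraphs = [
        f"<p>{html.escape(line)}</p>"
        for line in text.splitlines()
        if line.strip()
    ]
    return "<html><body>\n" + "\n".join(paragraphs) + "\n</body></html>"
```

If every method fails, `extract_text` returns `None`, which corresponds to the case where no text can be extracted from the PDF file.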

Monitor PDF files without a Plugin

If no Plugin is used to monitor PDF files, they are handled as binary files. Depending on the server log, one or a combination of the following checking methods is used:

Check the PDF document by file date
Check the PDF document by file size
Check parts of the file content

 

In that case, it's not possible to highlight text changes or to use the filter system to ignore unwanted content.
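
To illustrate the three binary checking methods, here is a rough Python sketch under stated assumptions. It does not reproduce WebSite-Watcher's internal logic: `Fingerprint`, `make_fingerprint` and `has_changed` are hypothetical names, and sampling the first and last 4096 bytes is only one possible way to "check parts of the file content".

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Fingerprint:
    last_modified: Optional[str]  # file date as reported by the server, if any
    size: int                     # file size in bytes
    partial_digest: str           # hash over sampled parts of the content


def make_fingerprint(
    content: bytes,
    last_modified: Optional[str] = None,
    size: Optional[int] = None,
    sample: int = 4096,
) -> Fingerprint:
    """Build a fingerprint from file date, file size and parts of the content."""
    # Assumption: hashing the first and last `sample` bytes stands in for
    # "checking parts of the file content".
    head, tail = content[:sample], content[-sample:]
    digest = hashlib.sha256(head + tail).hexdigest()
    return Fingerprint(
        last_modified=last_modified,
        size=size if size is not None else len(content),
        partial_digest=digest,
    )


def has_changed(old: Fingerprint, new: Fingerprint) -> bool:
    """Report a change if any available check differs."""
    # Compare dates only when the server supplied one for both checks.
    if old.last_modified and new.last_modified and old.last_modified != new.last_modified:
        return True
    if old.size != new.size:
        return True
    return old.partial_digest != new.partial_digest
```

A comparison like this can only say that the file changed, not what changed, which is why text highlighting and filters are unavailable without a Plugin.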