PDF files


Monitor the text content of PDF files

With a special PDF-Plugin, WebSite-Watcher can extract the text content of a PDF file and convert it into an HTML page. PDF files can then be checked and handled like normal web pages, and WebSite-Watcher can highlight changes in the text. This PDF-Plugin is assigned automatically when new PDF documents are added to the bookmark list.

 

Unfortunately, it is not always possible to extract text from a PDF file. The PDF-Plugin in WebSite-Watcher therefore uses two different solutions to extract text:

 

1. Internal conversion routines
2. Text extraction via the IFilter system
   IFilter is the system used by Windows Search to search through files in proprietary formats. An IFilter for PDF files is usually installed automatically with your PDF reader, e.g. with Adobe Reader.

 

If one of the two solutions fails, there is still a good chance that the other one can extract text from the PDF file.
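
The fallback idea can be illustrated with a short Python sketch. It is not WebSite-Watcher's actual implementation: the function names `internal_routines` and `ifilter_extraction` are hypothetical stand-ins for the two solutions, and the sketch only shows the principle of trying each one in turn until text is found.

```python
from typing import Callable, Optional, Sequence

# An extractor takes the raw PDF bytes and returns text, or None if it found nothing.
Extractor = Callable[[bytes], Optional[str]]


def extract_with_fallback(pdf_bytes: bytes, extractors: Sequence[Extractor]) -> Optional[str]:
    """Return the first non-empty text produced by any extractor, else None."""
    for extract in extractors:
        try:
            text = extract(pdf_bytes)
        except Exception:
            # A failing solution must not prevent the next one from being tried.
            continue
        if text and text.strip():
            return text
    return None


# Hypothetical stand-ins for the two solutions (assumptions, for illustration only).
def internal_routines(pdf_bytes: bytes) -> Optional[str]:
    raise ValueError("unsupported PDF structure")


def ifilter_extraction(pdf_bytes: bytes) -> Optional[str]:
    return "Text recovered by the second solution"


result = extract_with_fallback(b"%PDF-1.7 ...", [internal_routines, ifilter_extraction])
```

In this example the first extractor fails and the second one supplies the text. If every extractor fails or returns only whitespace, the function returns `None`, which corresponds to a PDF from which no text can be extracted.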

Third party tools to convert PDF files

As an alternative to the internal solution described above, WebSite-Watcher can also use external tools to convert PDF files into a web page. To select a Plugin for an external conversion tool, follow these steps:

 

1. Open the bookmark properties
2. Select the "Advanced" tab
3. Select "Plugin" on the left side
4. Click the "Select Plugin" button
5. Select one of the available PDF-Plugins

 

The external solution requires an additional PDF tool to be installed.
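
To give an idea of what such an external conversion involves, here is a minimal Python sketch. It assumes the Poppler command-line tool `pdftotext` is installed and on the search path. This is an assumption for illustration, not a statement about which tools the WebSite-Watcher Plugins support. The helper names `text_to_html` and `convert_pdf_to_html` are hypothetical.

```python
import html
import subprocess


def text_to_html(text: str, title: str) -> str:
    """Wrap extracted plain text in a minimal HTML page, escaping special characters."""
    body = html.escape(text)
    return (
        "<!DOCTYPE html>\n"
        f'<html><head><meta charset="utf-8"><title>{html.escape(title)}</title></head>\n'
        f"<body><pre>{body}</pre></body></html>\n"
    )


def convert_pdf_to_html(pdf_path: str) -> str:
    """Run pdftotext (assumed to be installed) and return the result as an HTML page."""
    completed = subprocess.run(
        # "-" as the output file makes pdftotext write the text to standard output.
        ["pdftotext", "-layout", "-enc", "UTF-8", pdf_path, "-"],
        capture_output=True,
        check=True,  # raise CalledProcessError if the tool reports a failure
    )
    text = completed.stdout.decode("utf-8", errors="replace")
    return text_to_html(text, pdf_path)
```

Calling `convert_pdf_to_html("report.pdf")` would produce a page that can be compared between two checks like any other web page. The text is placed in a `<pre>` element so that the layout produced by the tool is preserved.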

Monitor PDF files without a Plugin

If no Plugin is used to monitor PDF files, they are handled as binary files. Depending on the server log, one or a combination of the following checking methods is used:

 

Check the PDF document by file date
Check the PDF document by file size
Check parts of the file content

 

In that case, it's not possible to highlight text changes or to use the filter system to ignore unwanted content.
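
The following Python sketch shows how a binary check of this kind could work in principle. It is an illustration under stated assumptions, not WebSite-Watcher's actual logic: the class `BinarySnapshot` and the functions `sample_digest` and `has_changed` are hypothetical, and the file date and size are assumed to come from whatever the server reports (for example `Last-Modified` and `Content-Length` headers).

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BinarySnapshot:
    file_date: Optional[str]       # date reported by the server, if any
    file_size: Optional[int]       # size reported by the server, if any
    content_digest: Optional[str]  # hash over sampled parts of the file, if downloaded


def sample_digest(data: bytes, chunk: int = 4096) -> str:
    """Hash the first and last `chunk` bytes of the file content."""
    h = hashlib.sha256()
    h.update(data[:chunk])
    h.update(data[-chunk:])
    return h.hexdigest()


def has_changed(old: BinarySnapshot, new: BinarySnapshot) -> bool:
    """Report a change if any field that both snapshots provide differs."""
    for field in ("file_date", "file_size", "content_digest"):
        a, b = getattr(old, field), getattr(new, field)
        if a is not None and b is not None and a != b:
            return True
    return False
```

Fields that are missing in either snapshot are skipped, which mirrors the idea that the available checking methods depend on what the server provides. Sampling only the beginning and end of the file is a design shortcut in this sketch, so changes in the middle of a file of unchanged size would go unnoticed. In every case only the fact of a change is detected, not which text changed.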



