Understanding searching and document indexing

You can search for any document archived in your repository (or in any other repository to which you have access), whether the document is an email message or a file. When you search for a document, your query is checked against an index of words that is updated as documents are archived.

Indexing the contents of a document involves cataloging the words in the document to prepare them for later searching. Separators between words (such as punctuation) are ignored during indexing. Note that there is a delay between the time a document is archived and the time it is indexed. Depending on the system’s configuration, documents archived less than an hour ago might not yet appear in search results.
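As a rough illustration of the cataloging described above, the following Python sketch builds a small word index in which separators such as punctuation are ignored. The function names (tokenize, build_index, search), the sample documents, and the decision to lowercase words are assumptions made for this example only; they do not describe how the product's indexer is implemented.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Split text into lowercase words, ignoring separators such as punctuation."""
    return re.findall(r"\w+", text.lower())

def build_index(documents):
    """Map each word to the set of IDs of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index

def search(index, word):
    """Return the sorted IDs of the documents that contain the word."""
    return sorted(index.get(word.lower(), set()))

# Hypothetical sample documents, used only to exercise the sketch.
documents = {
    "msg-1": "Quarterly report: sales, costs, and forecasts.",
    "msg-2": "Forecasts (revised) for the sales team!",
}
index = build_index(documents)
print(search(index, "forecasts"))  # ['msg-1', 'msg-2']
print(search(index, "costs"))      # ['msg-1']
```

A document becomes searchable in this sketch only after build_index has processed it, which mirrors the delay between archiving and indexing.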

You can search the contents of a document only if the contents have been indexed. Files of other types, whose contents are not indexed, can be found only by using external identifying information.

Indexed document types

In addition to email messages, the following files are indexed:

Plain text files

Rich text files (.rtf)

HTML (HyperText Markup Language) files

Files used by the following Microsoft Office programs: Word, Excel, PowerPoint, and Access

PDF (Portable Document Format) files viewed with Adobe Acrobat Reader

Zip files

Embedded messages (RFC 822 messages)

NOTE:

Email message formatting has no bearing on indexing. Only the words you see in your email client are indexing candidates. Invisible source-code words, such as HTML markup tags, are ignored.
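To illustrate the idea that only visible words are indexing candidates, the following Python sketch extracts the text of an HTML message body and discards the markup tags. The class name VisibleText and the function visible_words are hypothetical, and the choice to skip the contents of script and style elements is an assumption of this example, not a documented behavior of the product.

```python
from html.parser import HTMLParser

class VisibleText(HTMLParser):
    """Collect text nodes, skipping markup tags and script/style content."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0  # depth inside script/style elements (assumed invisible)

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style elements.
        if not self._skip and data.strip():
            self.chunks.append(data.strip())

def visible_words(html):
    """Return the words a reader would see, with markup tags removed."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks).split()

print(visible_words("<html><body><p>Meeting at <b>noon</b></p></body></html>"))
# ['Meeting', 'at', 'noon']
```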

NOTE:

For zip files and embedded messages, the content inside the files is expanded and indexed. Indexing of Microsoft Office files is supported for Microsoft Office 2007 and earlier releases.
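The expansion of a zip file can be pictured with the following Python sketch, which reads each file inside an archive so that its text could then be indexed. The function name expand_zip, the sample file names, and the assumption that every member is UTF-8 text are for illustration only; the product's own handling of archive members is not described here.

```python
import io
import zipfile

def expand_zip(data):
    """Return {member name: decoded text} for each file inside a zip archive."""
    contents = {}
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directories have no content to index
            # Assumption: members are text; undecodable bytes are replaced.
            contents[info.filename] = archive.read(info).decode("utf-8", errors="replace")
    return contents

# Build a small archive in memory to exercise the function.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("notes.txt", "budget review")
    archive.writestr("todo.txt", "call vendor")

for name, text in expand_zip(buffer.getvalue()).items():
    print(name, "->", text.split())
```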

Message MIME types (advanced users)

An email message can contain several message parts, each with its own MIME (Multipurpose Internet Mail Extensions) Content-Type. The following Content-Types are indexed, and each corresponds to one of the indexed document types:

text/xml

text/plain

text/html

application/rtf

application/msword

application/vnd.ms-excel

application/vnd.ms-powerpoint

application/msaccess

application/pdf

application/zip
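For advanced users, the following Python sketch shows one way to check which parts of a message have a Content-Type in the list above. It uses the standard library email package. The set name INDEXED_CONTENT_TYPES, the function indexable_parts, and the sample message are hypothetical and are not part of the product.

```python
from email.message import EmailMessage

# Content-Types copied from the list above.
INDEXED_CONTENT_TYPES = {
    "text/xml", "text/plain", "text/html",
    "application/rtf", "application/msword",
    "application/vnd.ms-excel", "application/vnd.ms-powerpoint",
    "application/msaccess", "application/pdf", "application/zip",
}

def indexable_parts(message):
    """Return the Content-Type of each non-container part found in the indexed list."""
    return [
        part.get_content_type()
        for part in message.walk()
        if not part.is_multipart()
        and part.get_content_type() in INDEXED_CONTENT_TYPES
    ]

# Hypothetical message with a text body, a PDF attachment, and an image.
message = EmailMessage()
message["Subject"] = "Report"
message.set_content("See the attached files.")
message.add_attachment(b"%PDF-1.4", maintype="application",
                       subtype="pdf", filename="report.pdf")
message.add_attachment(b"\x89PNG", maintype="image",
                       subtype="png", filename="logo.png")

print(indexable_parts(message))  # ['text/plain', 'application/pdf']
```

In this example the image part is skipped because image/png is not among the indexed Content-Types, so such an attachment could be found only through external identifying information.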
