HP Integrated Archive Platform manual Understanding searching and document indexing

Understanding searching and document indexing

You can search for any documents archived in your repository (or any other repositories to which you have access), whether the documents are email messages or files. When you search for a document, your query is checked against an index of words that is updated each time a document is archived.

Indexing the contents of a document involves cataloging the words in the document to prepare them for later searching. Separators between words (such as punctuation) are ignored during indexing. Note that there is a delay between the time a file is archived and the time it is indexed. Depending on the system’s configuration, documents archived less than an hour ago may or may not appear in query or search results.
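The following minimal Python sketch illustrates the general idea of a word index; it is not the IAP implementation. It assumes that a word is a run of ASCII letters or digits and that matching is case-insensitive. Both are simplifications made for this example, and the names tokenize, build_index, and search are hypothetical.

```python
import re
from collections import defaultdict

# Assumption for this sketch: a word is a run of ASCII letters or digits.
# Everything else (punctuation, whitespace) is treated as a separator.
WORD_RE = re.compile(r"[A-Za-z0-9]+")


def tokenize(text):
    """Split text into lowercase words, ignoring separators."""
    return [word.lower() for word in WORD_RE.findall(text)]


def build_index(documents):
    """Map each word to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, word):
    """Return the sorted IDs of documents that contain the given word."""
    return sorted(index.get(word.lower(), set()))


# Hypothetical documents, keyed by made-up IDs.
documents = {
    "msg-1": "Quarterly report: revenue, costs, and forecast.",
    "msg-2": "Forecast updated; see attached report!",
}
index = build_index(documents)
```

In this sketch, a query is answered from the index alone, so a document that has been archived but not yet added to the index cannot be found by its contents.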

You can search the contents of a document only if those contents have been indexed. Files of types that are not indexed can be found only by using external identifying information.

Indexed document types

In addition to email messages, the following files are indexed:

Plain text files

Rich text files (.rtf)

HTML (HyperText Markup Language) files

Files used by the following Microsoft Office programs: Word, Excel, PowerPoint, and Access

PDF (Portable Document Format) files viewed with Adobe Acrobat Reader

Zip files

Embedded messages (RFC 822 messages)

NOTE:

Email message formatting has no bearing on indexing. Only the words you see in your email client are indexing candidates. Invisible source-code words, such as HTML markup tags, are ignored.

NOTE:

For zip files and embedded messages, the content inside the files is expanded and indexed. Indexing of Microsoft Office files is supported for Office 2007 and prior releases.

Message MIME types (advanced users)

An email message can contain several message parts, each with its own MIME (Multipurpose Internet Mail Extensions) Content-Type. The following Content-Types are indexed, and each corresponds to one of the indexed document types:

text/xml

text/plain

text/html

application/rtf

application/msword

application/vnd.ms-excel

application/vnd.ms-powerpoint

application/msaccess

application/pdf

application/zip
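As an illustration of how message parts map to Content-Types, the following Python sketch uses the standard library email package to list the parts of a message whose Content-Type appears in the list above. It is only a sketch and does not describe how IAP itself processes messages. The function name indexable_parts and the sample message are hypothetical.

```python
from email import message_from_string

# The Content-Types listed in this section.
INDEXED_CONTENT_TYPES = {
    "text/xml",
    "text/plain",
    "text/html",
    "application/rtf",
    "application/msword",
    "application/vnd.ms-excel",
    "application/vnd.ms-powerpoint",
    "application/msaccess",
    "application/pdf",
    "application/zip",
}


def indexable_parts(raw_message):
    """Return the Content-Types of message parts that appear in the list."""
    message = message_from_string(raw_message)
    return [
        part.get_content_type()
        for part in message.walk()  # visits the message and every nested part
        if part.get_content_type() in INDEXED_CONTENT_TYPES
    ]


# Hypothetical three-part message: plain text, a PNG image, and a PDF.
RAW_MESSAGE = (
    "From: sender@example.com\n"
    "Subject: demo\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="XX"\n'
    "\n"
    "--XX\n"
    "Content-Type: text/plain\n"
    "\n"
    "Hello\n"
    "--XX\n"
    "Content-Type: image/png\n"
    "Content-Transfer-Encoding: base64\n"
    "\n"
    "iVBORw0KGgo=\n"
    "--XX\n"
    "Content-Type: application/pdf\n"
    "Content-Transfer-Encoding: base64\n"
    "\n"
    "JVBERi0=\n"
    "--XX--\n"
)

print(indexable_parts(RAW_MESSAGE))
```

For the sample message, the image/png part is skipped because its Content-Type is not in the list, as is the multipart/mixed container itself.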
