HP Integrated Archive Platform manual Understanding searching and document indexing


Understanding searching and document indexing

You can search for any documents archived in your repository (or any other repositories to which you have access), whether the documents are email messages or files. When you search for a document, your query is checked against an index of words that is updated each time a document is archived.

Indexing the contents of a document involves cataloging the document's words to prepare them for later searching. Separators (such as punctuation) between words are ignored during indexing. Note that there is a time delay between when files are archived and when they are indexed. Documents archived less than an hour ago may or may not appear in query or search results, depending on the system’s configuration.
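The idea of cataloging words while ignoring separators can be illustrated with a minimal sketch. This is not the IAP indexer: the word pattern (runs of ASCII letters and digits), the lowercasing step, and the names `tokenize`, `build_index`, and `search` are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Assumption: a "word" is a run of ASCII letters and digits;
# everything else (punctuation, whitespace) is treated as a separator.
WORD_RE = re.compile(r"[A-Za-z0-9]+")


def tokenize(text):
    """Split text into words, dropping separators such as punctuation."""
    # Lowercasing is an assumption of this sketch, not documented IAP behavior.
    return [word.lower() for word in WORD_RE.findall(text)]


def build_index(documents):
    """Map each word to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the IDs of documents that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical documents, keyed by made-up IDs.
documents = {
    "msg-1": "Quarterly report: revenue, costs, and forecast.",
    "msg-2": "Forecast meeting moved to Friday!",
}
index = build_index(documents)
```

In this sketch, `search(index, "revenue; forecast")` matches only `msg-1`, because the semicolon is discarded in the same way as the punctuation in the indexed text.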

You can search the contents of a document only if the contents have been indexed. You can search for other kinds of files only by using external identifying information.

Indexed document types

In addition to email messages, the following files are indexed:

•Plain text files

•Rich text files (.rtf)

•HTML (HyperText Markup Language) files

•Files used by the following Microsoft Office programs: Word, Excel, PowerPoint, and Access

•PDF (Portable Document Format) files viewed with Adobe Acrobat Reader

•Zip files

•Embedded messages (RFC 822 messages)

NOTE:

Email message formatting has no bearing on indexing. Only the words you see in your email client are indexing candidates. Invisible source-code words, such as HTML markup tags, are ignored.

NOTE:

For zip files and embedded messages, the content inside the files is expanded and indexed. We support indexing of MS Office files for MS Office 2007 and prior releases.
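As a rough illustration of what "expanded" means for a zip file, the sketch below lists the files inside an archive so that each one could be passed to an indexer separately. It uses Python's standard `zipfile` module; the function name `expand_zip` and the sample file names are hypothetical, and nothing here reflects how IAP performs this step internally.

```python
import io
import zipfile


def expand_zip(data):
    """Yield (member_name, member_bytes) for each file inside a zip archive."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directory entries carry no content to index
            yield info.filename, archive.read(info)


# Build a small in-memory archive to demonstrate (hypothetical contents).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("notes.txt", "budget review")
    archive.writestr("docs/readme.txt", "archive policy")

members = dict(expand_zip(buffer.getvalue()))
```

Each entry in `members` holds the raw bytes of one contained file, which is the unit that would then be checked against the indexed document types.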

Message MIME types (advanced users)

An email message can contain several message parts, each with its own MIME (Multipurpose Internet Mail Extensions) Content-Type. The following Content-Types are indexed, and each corresponds to one of the indexed document types:

•text/xml

•text/plain

•text/html

•application/rtf

•application/msword

•application/vnd.ms-excel

•application/vnd.ms-powerpoint

•application/msaccess

•application/pdf

•application/zip
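To see which parts of a message carry one of the Content-Types listed above, the parts can be walked with Python's standard `email` package. This is a minimal sketch for checking a message outside IAP; the names `INDEXED_CONTENT_TYPES` and `classify_parts` and the sample message are assumptions for illustration.

```python
from email.message import EmailMessage

# The Content-Types listed in this section.
INDEXED_CONTENT_TYPES = {
    "text/xml",
    "text/plain",
    "text/html",
    "application/rtf",
    "application/msword",
    "application/vnd.ms-excel",
    "application/vnd.ms-powerpoint",
    "application/msaccess",
    "application/pdf",
    "application/zip",
}


def classify_parts(message):
    """Return (content_type, in_list) for each non-container message part."""
    results = []
    for part in message.walk():
        if part.is_multipart():
            continue  # skip containers such as multipart/mixed
        content_type = part.get_content_type()
        results.append((content_type, content_type in INDEXED_CONTENT_TYPES))
    return results


# Hypothetical message with a text body and two attachments.
message = EmailMessage()
message["Subject"] = "Report"
message.set_content("See the attached files.")
message.add_attachment(b"%PDF-1.4", maintype="application",
                       subtype="pdf", filename="report.pdf")
message.add_attachment(b"\x89PNG", maintype="image",
                       subtype="png", filename="logo.png")

parts = classify_parts(message)
```

For this sample message, the text body and the PDF attachment are flagged as being in the list, while the `image/png` part is not.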
