HP RISS Components Letters and digits in different character sets, Word characters and separators


Word characters and separators

Word characters include all uppercase and lowercase letters, digits, and the following additional characters:

_ (underscore)

# (number/pound/hash sign)

& (ampersand)

All other characters are separators. The exceptions, which apply only in queries, are the wildcards ? and * and the special query characters ~, ", -, and !.

However, && by itself is not a word. It is a Boolean operator. When combined with at least one more word character, && can be part of a word. For example, a&&b is a word.

Query analysis and document indexing are not case-sensitive. Uppercase and lowercase letters are treated the same.

Regular expression definition of English word characters

The following regular expression provides, in succinct form, a complete specification of English word characters (except for treatment of && as a non-word):

[A-Za-z0-9_#&]+
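As a rough illustration of the rules in this section, the following Python sketch splits text into words using a regular expression for the English word characters listed above (`[A-Za-z0-9_#&]+`), folds case, and drops a standalone && token. The function name `tokenize` is hypothetical and is not part of RISS. The sketch does not model the query-only wildcards and special characters.

```python
import re

# English word characters from this section: letters, digits, _, #, and &.
WORD_RE = re.compile(r"[A-Za-z0-9_#&]+")


def tokenize(text):
    """Return the lowercase words in text; all other characters separate words.

    Hypothetical helper for illustration only, not the RISS implementation.
    """
    words = []
    for match in WORD_RE.finditer(text):
        token = match.group(0)
        if token == "&&":
            # && by itself is a Boolean operator, not a word.
            continue
        # Indexing and query analysis are not case-sensitive.
        words.append(token.lower())
    return words


print(tokenize("Order #42 for A&&B && c_d!"))
```

Under these assumptions, `A&&B` is kept as the single word `a&&b`, while the standalone `&&` is skipped.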

Letters and digits in different character sets

Topics include:

Letters and digits defined

Letters and digits in files

Letters and digits defined

All letters and digits are word characters. What RISS considers a letter or digit depends on the character set encoding used. For US ASCII encoding, letters are uppercase and lowercase English letters (A-Z, a-z). For ISO 8859-1 (Latin-1) encoding, used for Western European languages, accented letters are included. Most ideographic characters, such as those used in Asian languages, are also considered letters.

Whatever the language and encoding used for a particular document (file or email message), RISS maps encoded characters to the Unicode 2.0 standard. The Unicode 2.0 standard is then used to determine if a given character is a letter or a digit (or neither):

A letter is any Unicode character in one of the following Unicode categories: Ll (lowercase letter), Lu (uppercase letter), Lt (title case letter), Lm (modifier letter), or Lo (other letter).

A digit is any Unicode character whose Unicode name contains the word DIGIT, provided it is not in the range \u2000 (en quad = en space) through \u2FFF (ideographic description - future).
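The two definitions above can be approximated with Python's standard `unicodedata` module. This is only a sketch: `unicodedata` reflects the Unicode version bundled with the Python interpreter, not Unicode 2.0, so results for characters added or reclassified since then may differ from RISS. The function names `is_letter` and `is_digit` are hypothetical.

```python
import unicodedata

# General categories that count as letters in the definition above.
LETTER_CATEGORIES = {"Ll", "Lu", "Lt", "Lm", "Lo"}


def is_letter(ch):
    """True if ch falls in one of the letter categories (Ll, Lu, Lt, Lm, Lo)."""
    return unicodedata.category(ch) in LETTER_CATEGORIES


def is_digit(ch):
    """True if the character name contains the word DIGIT and the code point
    lies outside the excluded range U+2000 through U+2FFF."""
    if 0x2000 <= ord(ch) <= 0x2FFF:
        return False
    name = unicodedata.name(ch, "")  # empty string for unnamed characters
    # Split on spaces and hyphens so DIGIT is matched as a whole word.
    return "DIGIT" in name.replace("-", " ").split()


# A Latin-1 encoded byte is mapped to Unicode first, then classified.
e_acute = b"\xe9".decode("latin-1")
print(is_letter(e_acute), is_digit("7"), is_digit("\u2460"))
```

Here the Latin-1 byte 0xE9 decodes to é, which is a lowercase letter. U+2460 (CIRCLED DIGIT ONE) has DIGIT in its name but lies in the excluded range, so it is not treated as a digit.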

Letters and digits in files

Although all letters and digits are word characters, their treatment in files (including email message attachments) depends on the character encoding used. You can search for any words in email message bodies and headers, regardless of the encoding.

You can search for words in files (including email body, header, attachments, and indexed documents) provided the character encoding is one of the following:

