Fortinet 5.0 Patch 6 Language support

Fortinet FortiWeb 5.0 Patch 6 Administration Guide

Language support

Features such as Recursive URL Decoding, input rules, and attack signatures can detect

attacks and data leaks even when multiple languages are used as an evasion technique.

When configuring FortiWeb, regardless of the display language (see “Global web UI & CLI

settings” on page 51), the simplest case is to configure with only US-ASCII characters. All

features, including queries to external servers, support it.

If you want to configure FortiWeb using another language/encoding, or support clients using

another language or multiple languages, sometimes characters such as ñ, é, symbols, and

ideographs such as 新 are valid input. Support varies by the nature of the item being configured.

For example, by definition, host names cannot contain special characters. DNS standards

predate many standards for internationalization. Because of this, the web UI and CLI will reject

input if it contains non-ASCII characters when configuring the host name. This means

that languages other than English are not supported unless encoded as an RFC 3490

internationalized domain name (IDN) prefixed with xn--. However, other configuration items, such

as names and comments, often support the language of your choice.
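As an illustration of the xn-- form, the following minimal sketch converts a host name to its ASCII-only equivalent. It uses Python's built-in "idna" codec, which implements RFC 3490. The function name to_ascii_hostname and the sample host names are hypothetical, and FortiWeb itself does not perform this conversion for you; the sketch only shows what you would type into the host name field.

```python
def to_ascii_hostname(name: str) -> str:
    """Return an ASCII-only form of a host name, using IDNA where needed."""
    if name.isascii():
        # Plain US-ASCII host names need no conversion.
        return name
    # Python's built-in "idna" codec implements RFC 3490 (IDNA 2003).
    return name.encode("idna").decode("ascii")


print(to_ascii_hostname("www.example.com"))
print(to_ascii_hostname("bücher.example"))
```

The second call yields a label that begins with xn--, which is the form that an ASCII-only host name field can accept.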

To use your preferred languages in those cases, use an encoding that supports them.

For best results:

• for regular expressions that must match HTTP requests, use the same encoding as your

HTTP clients

• for other features, use UTF-8 encoding, or use only the characters whose encoded values

are the same in UTF-8 (for example, US-ASCII characters are usually encoded using the

same byte-wise values in ISO 8859-1, Windows code page 1252, Shift-JIS and others;

however, ideographs such as 新 may be garbled or interpreted as the wrong character when

viewed as another encoding)
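The point about shared byte values can be checked with a short sketch. It assumes Python's standard codecs as stand-ins for the encodings named above; the helper same_bytes_everywhere and the sample strings are hypothetical.

```python
ENCODINGS = ["utf-8", "iso-8859-1", "cp1252", "shift_jis"]


def same_bytes_everywhere(text: str, encodings=ENCODINGS) -> bool:
    """True if text encodes to identical bytes in every listed encoding."""
    forms = set()
    for enc in encodings:
        try:
            forms.add(text.encode(enc))
        except UnicodeEncodeError:
            # The encoding cannot represent this text at all.
            return False
    return len(forms) == 1


print(same_bytes_everywhere("FortiWeb"))  # US-ASCII letters only
print(same_bytes_everywhere("新"))         # ideograph

# Reading the UTF-8 bytes of the ideograph as Windows code page 1252
# produces different characters, not the original one.
garbled = "新".encode("utf-8").decode("cp1252")
print(garbled)
```

The US-ASCII string produces the same bytes in all four encodings, while the ideograph does not, which is why it can appear garbled when viewed in another encoding.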

For example, with Shift-JIS, backslashes ( \ ) could be inadvertently interpreted as yen symbols

( ¥ ) and vice versa. A regular expression intended to match HTTP requests containing money

values with a yen symbol therefore may not work if the symbol is entered using the wrong

encoding. Likewise, simplified Chinese characters might only be understandable if the page is

interpreted as GB2312. Test your expressions. If you enter a regular expression using another

encoding, or if an HTTP client sends a request in an encoding other than UTF-8, remember that

matches may not be what you initially expect.
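The yen symbol case can be sketched as follows. This is only an illustration using Python's re module on raw bytes, not FortiWeb's matching engine. The request bodies are hypothetical, and the sketch assumes that a Shift-JIS client sends the yen symbol as the single byte 0x5C, the same value as a US-ASCII backslash.

```python
import re

# Pattern built from the UTF-8 bytes of the yen symbol (0xC2 0xA5) plus digits.
utf8_only = re.compile(re.escape("¥".encode("utf-8")) + rb"\d+")

# Hypothetical request bodies carrying the same price.
body_utf8 = "price=¥1000".encode("utf-8")
body_sjis = b"price=\x5c1000"  # assumed Shift-JIS form: yen sent as byte 0x5C

# Pattern that lists both byte sequences explicitly.
both = re.compile(rb"(?:\xc2\xa5|\x5c)\d+")

print(bool(utf8_only.search(body_utf8)))  # UTF-8 client
print(bool(utf8_only.search(body_sjis)))  # Shift-JIS client
print(bool(both.search(body_utf8)), bool(both.search(body_sjis)))
```

The UTF-8-only pattern misses the Shift-JIS request, while the pattern that lists both byte sequences matches either client.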

Regular expressions are especially impacted. Matching engines on FortiWeb use the UTF-8

character values. If you need to match multiple possible languages from clients, especially for

attack signatures, make sure you construct a regular expression that matches all alternative

values.

For example, the Latin letter C is not encoded using the same byte-wise value as the

similar-looking Cyrillic letter С. A human being can read a Spanish phrase written with that

Cyrillic character, because they are visually similar. But a regular expression will not match

unless written to match both numerical values: one for the Latin character, and one for the

Cyrillic character.
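A minimal sketch of the look-alike problem, using Python's re module as a stand-in for a matching engine. The Spanish word "Casa" and the variable names are hypothetical. The Cyrillic letter is written as the escape \u0421 so that the two letters can be told apart in the code.

```python
import re

LATIN_C = "C"            # U+0043
CYRILLIC_ES = "\u0421"   # U+0421, looks like the Latin letter C

# The two letters have different byte values in UTF-8.
print(LATIN_C.encode("utf-8"))
print(CYRILLIC_ES.encode("utf-8"))

# A Spanish word written with the Cyrillic look-alike as its first letter.
spoofed = CYRILLIC_ES + "asa"

latin_only = re.compile("Casa")
either = re.compile("[C\u0421]asa")  # accepts both code points

print(bool(latin_only.search(spoofed)))
print(bool(either.search(spoofed)), bool(either.search("Casa")))
```

Only the expression that lists both code points matches the spoofed word as well as the genuine one.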

HTTP clients may send requests in encodings that are not UTF-8. Encodings vary by the

client’s operating system or input language.

If you input the configuration in English, the client’s request may match regardless of encoding.

Because US-ASCII predates most other encodings, English characters tend to have the same

byte values in many encodings. For example, English words

may be readable whether a web page is interpreted as ISO 8859-1 or as GB2312.

For other languages (especially non-Latin alphabets such as Cyrillic and Thai), match the

client’s encoding exactly.
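The difference between English and Cyrillic input can be sketched as below. This assumes a hypothetical client that sends its request in KOI8-R. The helper pattern_for and the request body are illustrative only and do not reflect how FortiWeb stores expressions.

```python
import re


def pattern_for(text: str, client_encoding: str):
    """Compile a byte pattern for text as a client using client_encoding sends it."""
    return re.compile(re.escape(text.encode(client_encoding)))


# English text: the same bytes whether encoded as ISO 8859-1 or GB2312.
print("login".encode("iso-8859-1") == "login".encode("gb2312"))

# Cyrillic text: the bytes depend on the client's encoding.
word = "Привет"
request_koi8 = ("msg=" + word).encode("koi8_r")  # hypothetical request body

utf8_pattern = pattern_for(word, "utf-8")
koi8_pattern = pattern_for(word, "koi8_r")

print(bool(utf8_pattern.search(request_koi8)))  # pattern in the wrong encoding
print(bool(koi8_pattern.search(request_koi8)))  # pattern in the client's encoding
```

Only the pattern built with the client's own encoding finds the word in the request.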