Fortinet 682 FortiWeb 5.0 Patch 6 Administration Guide
Language support
Features such as Recursive URL Decoding, input rules, and attack signatures can detect
attacks and data leaks even when multiple languages are used as an evasion technique.
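The general idea behind recursive URL decoding can be sketched in Python: keep percent-decoding a value until it stops changing, so that double-encoded input such as %253C is normalized to < before any rule or signature is matched. This is a minimal sketch under stated assumptions, not FortiWeb's implementation. The function name recursive_url_decode and its max_rounds limit are hypothetical; only the standard-library urllib.parse.unquote is used.

```python
from urllib.parse import unquote


def recursive_url_decode(value: str, max_rounds: int = 10) -> str:
    """Percent-decode repeatedly until the string stops changing.

    Hypothetical helper illustrating the concept only; it is not
    FortiWeb's recursive URL decoding engine.
    """
    for _ in range(max_rounds):
        # Decode one layer of %XX escapes, treating them as UTF-8 bytes.
        decoded = unquote(value, encoding="utf-8", errors="replace")
        if decoded == value:
            break  # nothing left to decode
        value = decoded
    return value


if __name__ == "__main__":
    # "%253C" is "%3C" encoded again, and "%3C" is "<" encoded once.
    print(recursive_url_decode("%253Cscript%253E"))  # <script>
    # A double-encoded UTF-8 sequence ends up as a non-ASCII character.
    print(recursive_url_decode("caf%25C3%25A9"))  # café
```

The max_rounds cap is a defensive choice in this sketch so that hostile input cannot force unbounded work.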
When configuring FortiWeb, regardless of the display language (see “Global web UI & CLI
settings” on page 51), the simplest case is to configure it using only US-ASCII characters. All
features, including queries to external servers, support US-ASCII.
If you want to configure FortiWeb using another language or encoding, or to support clients
that use another language or multiple languages, characters such as ñ, é, symbols, and
ideographs are sometimes valid input. Support varies by the nature of the item being configured.
For example, by definition, host names cannot contain special characters, because DNS
standards predate many internationalization standards. For this reason, the web UI and CLI
reject host name input that contains non-ASCII characters. This means that host names in
languages other than English are not supported unless they are encoded as an RFC 3490
internationalized domain name (IDN) in its ASCII form, prefixed with xn--. However, other
configuration items, such as names and comments, often support the language of your choice.
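To obtain the xn-- form of a non-English host name before entering it, one option is Python's built-in "idna" codec, which implements IDNA 2003 (RFC 3490). This is a minimal sketch: the helper name to_ascii_hostname and the host name bücher.example are hypothetical examples, and the newer IDNA 2008 rules can give different results for a few characters.

```python
def to_ascii_hostname(hostname: str) -> str:
    """Return the ASCII-compatible (xn--) form of a host name.

    Uses Python's built-in "idna" codec (IDNA 2003 / RFC 3490).
    Labels that are already ASCII pass through unchanged.
    """
    return hostname.encode("idna").decode("ascii")


if __name__ == "__main__":
    print(to_ascii_hostname("bücher.example"))   # xn--bcher-kva.example
    print(to_ascii_hostname("www.example.com"))  # www.example.com
```

Only the label containing non-ASCII characters gets the xn-- prefix; the ASCII label "example" is left as-is.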
To use your preferred language in those cases, use an encoding that supports it.
For best results:
• For regular expressions that must match HTTP requests, use the same encoding as your
HTTP clients.
• For other features, use UTF-8 encoding, or use only the characters whose encoded values
are the same in UTF-8. For example, US-ASCII characters are usually encoded using the
same byte-wise values in ISO 8859-1, Windows code page 1252, Shift-JIS, and others;
however, ideographs may be garbled or interpreted as the wrong character when viewed as
another encoding.
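The difference between US-ASCII and other characters can be checked in Python. This sketch assumes only the standard codecs utf-8, iso-8859-1, cp1252 (Windows code page 1252), and shift_jis. The helper same_bytes_everywhere is hypothetical; it reports whether a string produces identical bytes in all of those encodings.

```python
ENCODINGS = ["utf-8", "iso-8859-1", "cp1252", "shift_jis"]


def same_bytes_everywhere(text: str) -> bool:
    """True if text encodes to identical bytes in every listed encoding."""
    try:
        encoded = {text.encode(enc) for enc in ENCODINGS}
    except UnicodeEncodeError:
        return False  # at least one encoding cannot represent the text
    return len(encoded) == 1


if __name__ == "__main__":
    print(same_bytes_everywhere("FortiWeb policy"))  # True
    print(same_bytes_everywhere("é"))                # False
    # é is one byte (0xE9) in ISO 8859-1 but two bytes (0xC3 0xA9) in
    # UTF-8, so UTF-8 bytes viewed as ISO 8859-1 come out garbled.
    print("é".encode("utf-8").decode("iso-8859-1"))  # Ã©
```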
For example, with Shift-JIS, backslashes ( \ ) could be inadvertently interpreted as yen symbols
( ¥ ) and vice versa. A regular expression intended to match HTTP requests containing monetary
amounts with a yen symbol therefore may not work if the symbol is entered using the wrong
encoding. Likewise, simplified Chinese characters might only be understandable if the page is
interpreted as GB2312. Test your expressions: if you enter a regular expression using another
encoding, or if an HTTP client sends a request in an encoding other than UTF-8, matches may
not be what you initially expect.
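The yen example can be illustrated at the byte level with Python's re module. The request body below is hypothetical: a Shift-JIS client sends byte 0x5C, which Japanese systems commonly display as ¥, while a pattern typed as the UTF-8 yen sign (U+00A5) becomes bytes C2 A5. This is an illustration of the encoding mismatch under those assumptions, not FortiWeb's matching engine or its regular expression syntax.

```python
import re

# Hypothetical request body from a Shift-JIS client: byte 0x5C (shown as
# a yen sign on many Japanese systems) followed by an amount.
shift_jis_body = b"price=\x5c1000"

# Pattern typed with the UTF-8 yen sign: bytes C2 A5, then digits.
utf8_yen_pattern = re.compile("¥".encode("utf-8") + rb"\d+")

# Pattern that accepts either the UTF-8 yen sign or byte 0x5C.
either_pattern = re.compile(rb"(?:\xc2\xa5|\x5c)\d+")

if __name__ == "__main__":
    print(utf8_yen_pattern.search(shift_jis_body))        # None
    print(either_pattern.search(shift_jis_body).group())  # b'\\1000'
```

Listing both byte sequences as alternatives is one way to cover both client encodings; the trade-off is that the pattern now also matches genuine backslashes.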
Regular expressions are especially affected. Matching engines on FortiWeb use UTF-8
character values. If you need to match multiple possible languages from clients, especially for
attack signatures, construct a regular expression that matches all alternative values.
For example, the Latin letter C is not encoded using the same byte-wise value as the
similar-looking Cyrillic letter С. A human being can read a Spanish phrase written with that
Cyrillic character, because the two are visually similar. But a regular expression will not match
unless written to match both numerical values: one for the Latin character, and one for the
Cyrillic character.
HTTP clients may send requests in encodings that are not UTF-8. Encodings vary by the
client’s operating system or input language.
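The Latin C versus Cyrillic С example above can be sketched with Python's re module, which, like the FortiWeb engines described above, compares character values rather than appearance. The Spanish word "Casa" and the variable names are hypothetical; the point is that a character class listing both letters matches either spelling.

```python
import re

LATIN_C = "C"            # U+0043
CYRILLIC_ES = "\u0421"   # U+0421, visually similar to C

# Hypothetical Spanish word spelled with a Cyrillic first letter.
spoofed = CYRILLIC_ES + "asa"

latin_only = re.compile("Casa")
both_letters = re.compile("[C\u0421]asa")  # accepts either first letter

if __name__ == "__main__":
    print(latin_only.search(spoofed))                # None
    print(both_letters.search(spoofed) is not None)  # True
    print(LATIN_C.encode("utf-8"), CYRILLIC_ES.encode("utf-8"))  # b'C' b'\xd0\xa1'
```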
If you enter the configuration in English, the client’s request may match regardless of
encoding: because US-ASCII predates most other encodings, English characters tend to have
identical byte values in many encoding types. For example, English words may be readable
whether a web page is interpreted as ISO 8859-1 or as GB2312.
For other languages (especially non-Latin alphabets such as Cyrillic and Thai), match the
client’s encoding exactly.
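The contrast between English and a non-Latin alphabet can be shown with a byte-level search in Python. The request bodies, the words "login" and "привет", and the helper matches are hypothetical; cp1251 stands in for a Windows Cyrillic client. A pattern only finds the Russian word when it is encoded the same way the client encoded its request.

```python
import re


def matches(pattern_text: str, pattern_encoding: str, body: bytes) -> bool:
    """Search body for pattern_text after encoding it as pattern_encoding."""
    pattern = re.escape(pattern_text.encode(pattern_encoding))
    return re.search(pattern, body) is not None


if __name__ == "__main__":
    # English: identical bytes, so the encoding of the pattern does not matter.
    english_body = "user=login".encode("gb2312")
    print(matches("login", "iso-8859-1", english_body))  # True

    # Russian: bytes differ, so the pattern must use the client's encoding.
    russian_body = "q=привет".encode("cp1251")
    print(matches("привет", "utf-8", russian_body))   # False
    print(matches("привет", "cp1251", russian_body))  # True
```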