Unicode support was introduced in Perl 5.6. Although it still does not adhere completely to the Unicode specification, Unicode support matured significantly in Perl 5.8. You can now use Unicode reliably with file I/O and with regular expressions. In regular expressions, the pattern adapts to the data and automatically switches to the correct Unicode character scheme.
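As a minimal sketch of Unicode file I/O combined with a regular expression, the following writes a short string through a UTF-8 encoding layer, reads it back, and matches against it. The sample text, the use of a temporary file, and the variable names are assumptions made for illustration only. The code relies on the :encoding layer and the File::Temp module, both available from Perl 5.8.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical sample text: "caf", e-acute (U+00E9), a space,
# and WHITE SMILING FACE (U+263A). Six characters in total.
my $text = "caf\x{E9} \x{263A}";

# Write the string through a UTF-8 encoding layer.
my ($out, $path) = tempfile(UNLINK => 1);
binmode($out, ':encoding(UTF-8)') or die "binmode failed: $!";
print {$out} $text;
close($out) or die "close failed: $!";

# Read it back through the same layer so Perl sees characters, not bytes.
open(my $in, '<:encoding(UTF-8)', $path) or die "cannot open $path: $!";
my $read_back = <$in>;
close($in);

# The pattern is written as usual; it matches the decoded character data.
my $matched = $read_back =~ /\x{263A}/ ? 1 : 0;

my $chars = length $read_back;   # number of characters in the string
my $bytes = -s $path;            # number of bytes stored on disk

print "chars=$chars bytes=$bytes matched=$matched\n";
# prints "chars=6 bytes=9 matched=1"
```

The difference between the two counts comes from the encoding: in UTF-8, U+00E9 occupies two bytes and U+263A occupies three, while each counts as one character inside Perl.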
Perl's Unicode support covers the following areas:
Strings and patterns may contain characters that have an ordinal value larger than 255.
Identifiers within a Perl program may contain Unicode alphanumeric characters.
Regular expressions match characters and not bytes.
Character classes in regular expressions match characters and not bytes.
Named Unicode properties and block ranges may be used as character classes with the \p and \P constructs.
\X matches any extended Unicode sequence.
tr// matches characters instead of bytes.
Case translation operators, such as uc( ) and lc( ), use the Unicode case translation tables when given character data.
Most operators that deal with positions or lengths in a string, such as length( ), substr( ), index( ), and pos( ), switch to using character positions.
pack( ) and unpack( ) are exceptions: they do not switch to character positions.
The bit string operators (&, |, ^, and ~) can operate on character data.
scalar reverse( ) reverses characters and not bytes.
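Several of the items above can be illustrated with a short sketch. The sample strings (three lowercase Greek letters, and an "e" followed by a combining acute accent) and all variable names are assumptions chosen for the example. The byte count uses the Encode module, which ships with Perl 5.8 and is included here only to contrast bytes with characters.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical sample: Greek small letters alpha, beta, gamma.
my $greek = "\x{3B1}\x{3B2}\x{3B3}";

# Length and position operators count characters.
my $len    = length $greek;                          # 3 characters
my $bytes  = length Encode::encode('UTF-8', $greek); # 6 bytes once encoded
my $second = substr($greek, 1, 1);                   # beta
my $where  = index($greek, "\x{3B3}");               # gamma is at position 2

# Case translation uses the Unicode case tables.
my $upper = uc $greek;                               # capital alpha, beta, gamma

# scalar reverse( ) reverses by character.
my $reversed = scalar reverse $greek;                # gamma, beta, alpha

# tr/// translates whole characters: alpha becomes omega.
(my $swapped = $greek) =~ tr/\x{3B1}/\x{3C9}/;

# \p and \P test named Unicode properties.
my $all_greek     = $greek =~ /^\p{Greek}+$/ ? 1 : 0;  # every character is Greek
my $all_uppercase = $upper =~ /^\p{Lu}+$/    ? 1 : 0;  # every character is uppercase
my $has_nonletter = "ab1"  =~ /\P{L}/        ? 1 : 0;  # "1" is not a letter

# \X matches a base character together with its combining marks.
my $combined = "e\x{301}x";                # "e", COMBINING ACUTE ACCENT, "x"
my @clusters = $combined =~ /(\X)/g;       # two sequences from three characters

print "len=$len bytes=$bytes where=$where clusters=", scalar(@clusters), "\n";
# prints "len=3 bytes=6 where=2 clusters=2"
```

Only numeric results are printed, so the sketch does not depend on the encoding of the terminal.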
Copyright © 2002 O'Reilly & Associates. All rights reserved.