String(3pm) User Contributed Perl Documentation String(3pm)
NAME
Unicode::String - String of Unicode characters (UTF-16BE)
SYNOPSIS
use Unicode::String qw(utf8 latin1 utf16be);
$u = utf8("string");
$u = latin1("string");
$u = utf16be("\0s\0t\0r\0i\0n\0g");
print $u->utf32be; # 4 byte characters
print $u->utf16le; # 2 byte characters + surrogates
print $u->utf8; # 1-4 byte characters
DESCRIPTION
A "Unicode::String" object represents a sequence of Unicode characters. Methods are provided
to convert between various external formats (encodings) and "Unicode::String" objects, and
methods are provided for common string manipulations.
The functions utf32be(), utf32le(), utf16be(), utf16le(), utf8(), utf7(), latin1(),
uhex() and uchr() can be imported from the "Unicode::String" module. Each works as a
constructor that initializes a string from a value in the corresponding encoding.
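Every constructor function yields the same kind of value, whatever encoding the input arrives in.
Because Unicode::String may not be installed, the sketch below shows the analogous operation with
the core Encode module instead: the UTF-8, Latin-1 and UTF-16BE spellings of "µm" all decode to
one two-character Perl string. The %bytes table is illustrative data, not part of either module.

```perl
use strict;
use warnings;
use Encode qw(decode);

# The same two characters, "µm" (U+00B5 MICRO SIGN, U+006D), in three encodings.
my %bytes = (
    'UTF-8'      => "\xC2\xB5m",
    'ISO-8859-1' => "\xB5m",
    'UTF-16BE'   => "\0\xB5\0m",
);

# Decode each byte string into a native Perl character string.
my %decoded;
$decoded{$_} = decode($_, $bytes{$_}) for keys %bytes;

for my $enc (sort keys %decoded) {
    printf "%-10s -> %d characters\n", $enc, length $decoded{$enc};
}
```

The byte strings differ in length (3, 2 and 4 bytes), but the decoded values are identical.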
The "Unicode::String" objects overload various operators, which means that in most cases
they can be treated like plain strings.
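To show what operator overloading buys, here is a hypothetical class, Demo::UStr (not part of
Unicode::String and not its actual implementation), that keeps its value as UTF-16BE bytes and
overloads stringification and concatenation through the core overload pragma. Plain-string
operands are assumed to hold UTF-8 bytes, matching the documented default.

```perl
use strict;
use warnings;

package Demo::UStr;    # hypothetical class, for illustration only
use Encode qw(encode decode);
use overload
    '""'     => \&as_utf8,    # implicit conversion to a plain string
    '.'      => \&concat,     # string concatenation
    fallback => 1;            # let Perl fall back to '""' for eq, ne and similar operators

# Keep the value as UTF-16BE bytes.
sub new {
    my ($class, $chars) = @_;
    my $utf16 = encode('UTF-16BE', $chars);
    return bless \$utf16, $class;
}

sub as_utf8 {
    my ($self) = @_;
    return encode('UTF-8', decode('UTF-16BE', $$self));
}

# Plain-string operands are assumed to hold UTF-8 bytes.
sub concat {
    my ($self, $other, $swapped) = @_;
    my $other16 = ref $other ? $$other
                             : encode('UTF-16BE', decode('UTF-8', $other));
    my $joined = $swapped ? $other16 . $$self : $$self . $other16;
    return bless \$joined, ref $self;
}

package main;

my $u = Demo::UStr->new("\x{B5}");    # "µ"
my $v = $u . "m";                     # overloaded '.': returns a new Demo::UStr
print $v, "\n";                       # overloaded '""': prints the UTF-8 bytes of "µm"
```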
Internally, a "Unicode::String" object is represented by a string of 2-byte numbers in network
byte order (big-endian). This representation is not visible through the API, but knowing it
helps predict the efficiency of the provided methods.
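One consequence of a 16-bit big-endian representation is that a character outside the Basic
Multilingual Plane needs two 16-bit units (a surrogate pair), that is four bytes instead of two.
The sketch below makes this visible with core Encode and unpack rather than Unicode::String;
which Unicode::String methods count units rather than characters is not stated in this section,
so treat any performance conclusion drawn from it as an assumption to check.

```perl
use strict;
use warnings;
use Encode qw(encode);

# "µm" plus U+1D11E MUSICAL SYMBOL G CLEF, which lies outside the Basic Multilingual Plane.
my $text  = "\x{B5}m\x{1D11E}";
my $utf16 = encode('UTF-16BE', $text);

# Read the result back as 2-byte big-endian unsigned integers.
my @units = unpack 'n*', $utf16;

printf "characters: %d, bytes: %d, 16-bit units: %d\n",
    length $text, length $utf16, scalar @units;
printf "units: %s\n", join ' ', map { sprintf '%04X', $_ } @units;
```

The last character is stored as the surrogate pair D834 DD1E, so three characters occupy four
units.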
METHODS
Class methods
The following class methods are available:
Unicode::String->stringify_as
Unicode::String->stringify_as( $enc )
This method is used to specify which encoding will be used when "Unicode::String"
objects are implicitly converted to and from plain strings.
If an argument is provided, it sets the current encoding. The argument should be one of the
following: "ucs4", "utf32", "utf32be", "utf32le", "ucs2", "utf16", "utf16be", "utf16le",
"utf8", "utf7", "latin1" or "hex". The default is "utf8".
The stringify_as() method returns a reference to the current encoding function.
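The mechanism can be pictured as a class-level setting plus a lookup table. The sketch below is a
hypothetical package, Demo::Stringify, not the module's source. The name-to-Encode mapping, the
treatment of "ucs2" and "utf16" as aliases for UTF-16BE, and returning a closure over a character
string (the real method returns a reference to an encoding method) are all assumptions; "utf7"
and "hex" are left out.

```perl
use strict;
use warnings;

package Demo::Stringify;    # hypothetical package, not Unicode::String's source
use Encode qw(encode);

# Encoding names mapped to Encode names. Treating ucs2/utf16 as UTF-16BE is an
# assumption; "utf7" and "hex" are omitted from this sketch.
my %ENCODING = (
    ucs4    => 'UTF-32BE', utf32 => 'UTF-32BE', utf32be => 'UTF-32BE',
    utf32le => 'UTF-32LE',
    ucs2    => 'UTF-16BE', utf16 => 'UTF-16BE', utf16be => 'UTF-16BE',
    utf16le => 'UTF-16LE',
    utf8    => 'UTF-8',
    latin1  => 'ISO-8859-1',
);

my $current = 'utf8';    # documented default

# Optionally set the encoding, then return a code ref that encodes a
# character string with the (possibly new) current encoding.
sub stringify_as {
    my ($class, $enc) = @_;
    if (defined $enc) {
        die "unknown encoding '$enc'\n" unless exists $ENCODING{$enc};
        $current = $enc;
    }
    my $name = $ENCODING{$current};
    return sub { encode($name, $_[0]) };
}

package main;

my $to_utf8  = Demo::Stringify->stringify_as;             # query only
my $to_utf16 = Demo::Stringify->stringify_as('utf16');    # set, then query
print unpack('H*', $to_utf16->("\x{B5}m")), "\n";         # 00b5006d
```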
$us = Unicode::String->new
$us = Unicode::String->new( $initial_value )
This is the object constructor. Without an argument, it creates an empty "Unicode::String"
object. If an $initial_value argument is given, it is decoded according to the encoding
selected with stringify_as(), which is UTF-8 by default.
In general, it is recommended to import and use one of the encoding-specific constructor
functions instead of invoking this method.
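Decoding with a default that does not match the input silently produces the wrong characters,
which is one reason to name the encoding explicitly. The failure mode is illustrated here with
the core Encode module, not Unicode::String: the UTF-8 bytes of "µm" are decoded once as UTF-8
and once, mistakenly, as Latin-1.

```perl
use strict;
use warnings;
use Encode qw(decode);

my $bytes = "\xC2\xB5m";    # "µm" encoded as UTF-8

my $right = decode('UTF-8', $bytes);         # the encoding the bytes are really in
my $wrong = decode('ISO-8859-1', $bytes);    # a mismatched default

printf "as UTF-8:   %d characters (%vX)\n", length $right, $right;
printf "as Latin-1: %d characters (%vX)\n", length $wrong, $wrong;
```

The Latin-1 reading turns the two UTF-8 bytes of "µ" into two separate characters, so no error
is raised; the data is simply wrong.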
Encoding methods
These methods get or set the value of the "Unicode::String" object by passing strings in
the corresponding encoding. If a new value is passed as an argument, it becomes the value of
the "Unicode::String" object and the previous value is returned. If no argument is passed,
the current value is returned.
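This is the familiar Perl get/set accessor convention. The sketch below shows only the
return-value rule, on a hypothetical class Demo::Value with no encoding involved; it does not
reproduce Unicode::String's implementation.

```perl
use strict;
use warnings;

package Demo::Value;    # hypothetical class illustrating the get/set convention

sub new {
    my ($class, $v) = @_;
    return bless { value => $v }, $class;
}

# With an argument: store it and return the previous value.
# Without one: return the current value unchanged.
sub value {
    my $self = shift;
    my $old  = $self->{value};
    $self->{value} = shift if @_;
    return $old;
}

package main;

my $obj  = Demo::Value->new('old');
my $prev = $obj->value('new');    # sets, returns 'old'
my $now  = $obj->value;           # gets, returns 'new'
print "$prev -> $now\n";
```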
To illustrate the encodings, we show how the two-character sample string "µm" (micrometer)
is encoded in each one.
$us->utf32be
$us->utf32be( $newval )
The string passed should be in the UTF-32 encoding with bytes in big endian order.
The sample "µm" is "\0\0\0\xB5\0\0\0m" in this encoding.
Alternative names for this method are utf32() and ucs4().
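For valid code points, UTF-32BE is simply each code point written as a 32-bit big-endian
integer, so the documented sample can be reproduced with pack's "N" template in core Perl and
cross-checked against the Encode module.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $sample = "\x{B5}m";    # "µm": MICRO SIGN followed by "m"

# One 32-bit big-endian integer per code point.
my $packed  = pack 'N*', map { ord($_) } split //, $sample;
my $encoded = encode('UTF-32BE', $sample);

print $packed eq "\0\0\0\xB5\0\0\0m" ? "matches the documented sample\n" : "mismatch\n";
print $packed eq $encoded ? "pack and Encode agree\n" : "pack and Encode differ\n";
```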
$us->utf32le
$us->utf32le( $newval )
The string passed should be in the UTF-32 encoding with bytes in little endian order.
The sample "µm" is "\xB5\0\0\0m\0\0\0" in this encoding.
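UTF-32LE differs from UTF-32BE only in the byte order within each 4-byte unit. The sketch below
derives the little-endian form from the documented big-endian sample in two equivalent ways,
using only core pack and unpack.

```perl
use strict;
use warnings;

my $be = "\0\0\0\xB5\0\0\0m";    # documented UTF-32BE form of "µm"

# Reverse the byte order inside each 4-byte unit.
my $le = join '', map { scalar reverse $_ } unpack '(a4)*', $be;

# Equivalently: read big-endian 32-bit integers and write them little-endian.
my $le2 = pack 'V*', unpack 'N*', $be;

print unpack('H*', $le), "\n";    # b50000006d000000
```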
perl v5.8.8 2005-10-26 String(3pm)