gnu.java.lang
Interface CharData

- public interface CharData
This contains the information about the Unicode characters that
java.lang.Character needs. It is generated automatically from
../doc/unicode/UnicodeData-4.0.0.txt and
../doc/unicode/SpecialCasing-4.0.0.txt by some
Perl scripts. These Unicode definition files can be found on the
http://www.unicode.org website.
JDK 1.5 uses Unicode version 4.0.0.
The data is stored as string constants, but Character will convert these
Strings to their respective char[] components. The fields
are stored in arrays of 17 elements each, one element per Unicode plane.
BLOCKS stores the offset of a block of 2^SHIFT
characters within DATA. The DATA field, in turn, stores
information about each character in the low order bits, and an offset
into the attribute tables UPPER, LOWER,
NUM_VALUE, and DIRECTION. Notice that the
attribute tables are much smaller than 0xffff entries, because many characters
in Unicode share common attributes. Numbers that are too large to fit
into NUM_VALUE as 16-bit chars are stored in LARGENUMS, and a negative number N is
stored in NUM_VALUE such that (-N - 3) is the offset into LARGENUMS for
the particular character. The DIRECTION table also contains a field for
detecting characters with multi-character uppercase expansions.
Next, there is a listing for TITLE exceptions (most characters
just have the same title case as upper case). Finally, there are two
tables for multi-character capitalization, UPPER_SPECIAL
which lists the characters which are special cased, and
UPPER_EXPAND, which lists their expansion.
| Field Summary | | |
| static java.lang.String[] | BLOCKS | The mapping of character blocks to their location in DATA. |
| static java.lang.String[] | DATA | Information about each character. |
| static java.lang.String[] | DIRECTION | This is the attribute table for computing the directionality class of a character, as well as a marker of characters with a multi-character capitalization. |
| static int[] | LARGENUMS | The array containing the numeric values that are too large to be stored as chars in NUM_VALUE. |
| static java.lang.String[] | LOWER | This is the attribute table for computing the lowercase representation of a character. |
| static java.lang.String[] | NUM_VALUE | This is the attribute table for computing the numeric value of a character. |
| static int[] | SHIFT | The character shift amount to look up the block offset. |
| static java.lang.String | SOURCE | The Unicode definition file that was parsed to build this database. |
| static java.lang.String | TITLE | This is the listing of titlecase special cases (all other characters can use UPPER to determine their titlecase). |
| static java.lang.String[] | UPPER | This is the attribute table for computing the single-character uppercase representation of a character. |
| static java.lang.String | UPPER_EXPAND | This is the listing of special case multi-character uppercase sequences. |
| static java.lang.String | UPPER_SPECIAL | This is a listing of characters with multi-character uppercase sequences. |
| Field Detail |
SOURCE
public static final java.lang.String SOURCE
- The Unicode definition file that was parsed to build this database.
- See Also:
- Constant Field Values
SHIFT
public static final int[] SHIFT
- The character shift amount to look up the block offset. In other words,
(char) (BLOCKS.value[ch >> SHIFT[p]] + ch) is the index where ch is described in DATA if ch is in Unicode plane p. Note that p is simply the integer division of ch by 0x10000.
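The block lookup above can be sketched on a miniature example. Everything in the tables below is made up for illustration (a shift of 4, three blocks, two distinct DATA blocks) and is not the generated CharData content; the sketch also covers only plane 0, so SHIFT, BLOCKS and DATA are plain arrays rather than one entry per plane. It shows why the stored block entries are "adjusted": each one is the block's offset in DATA minus the first character of the block, modulo 2^16, so adding the character itself and truncating to 16 bits lands on the right index.

```java
final class BlockLookupSketch {
    // Hypothetical toy value: each block covers 2^4 = 16 characters.
    static final int SHIFT = 4;

    // Hypothetical DATA: two distinct 16-entry blocks stored back to back.
    static final char[] DATA = new char[32];
    static {
        for (int i = 0; i < 16; i++) {
            DATA[i] = 'a';      // block stored at offset 0
            DATA[16 + i] = 'b'; // block stored at offset 16
        }
    }

    // Each entry is (offset of the block in DATA - first char of the block),
    // truncated to 16 bits. Two blocks may share one offset.
    static final char[] BLOCKS = {
        (char) (0 - 0x00),  // chars 0x00..0x0F use the block at offset 0
        (char) (0 - 0x10),  // chars 0x10..0x1F share the block at offset 0
        (char) (16 - 0x20), // chars 0x20..0x2F use the block at offset 16
    };

    /** Index into DATA for ch, using the 16-bit wrapping sum from the text. */
    static int dataIndex(char ch) {
        return (char) (BLOCKS[ch >> SHIFT] + ch);
    }
}
```

In this toy setup, characters 0x03 and 0x13 resolve to the same DATA index, which is how identical blocks are stored only once.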
BLOCKS
public static final java.lang.String[] BLOCKS
- The mapping of character blocks to their location in
DATA. Each entry has been adjusted so that the 16-bit sum with the desired character gives the actual index into DATA.
LARGENUMS
public static final int[] LARGENUMS
- The array containing the numeric values that are too large to be stored as
chars in NUM_VALUE. NUM_VALUE in this case will contain a negative integer
N such that LARGENUMS[-N - 3] contains the correct numeric value.
DATA
public static final java.lang.String[] DATA
- Information about each character. The low order 5 bits form the
character type, the next bit is a flag for non-breaking spaces, and the
next bit is a flag for mirrored directionality. The high order 9 bits
form the offset into the attribute tables. Note that this limits the
number of unique character attributes to 512, which is not a problem
as of Unicode version 4.0.0, but may soon become one.
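The bit layout described above can be unpacked with a few masks and a shift. This is a minimal sketch derived only from the description; the class and method names are hypothetical and are not part of CharData.

```java
final class DataEntrySketch {
    /** Low-order 5 bits: the character type. */
    static int type(char entry) {
        return entry & 0x1F;
    }

    /** Bit 5: flag for non-breaking spaces. */
    static boolean isNoBreakSpace(char entry) {
        return (entry & 0x20) != 0;
    }

    /** Bit 6: flag for mirrored directionality. */
    static boolean isMirrored(char entry) {
        return (entry & 0x40) != 0;
    }

    /** High-order 9 bits: offset into the attribute tables (0..511). */
    static int attributeOffset(char entry) {
        return entry >> 7; // char is unsigned, so no sign extension occurs
    }
}
```

Since the offset occupies 9 bits, its largest value is 511, which matches the limit of 512 unique attribute sets mentioned above.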
NUM_VALUE
public static final java.lang.String[] NUM_VALUE
- This is the attribute table for computing the numeric value of a
character. The value is -1 if Unicode does not define a value, -2
if the value is not a nonnegative integer, otherwise it is the value.
Note that this is a signed value, but stored as an unsigned char
since this is a String literal.
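Reading a numeric value back therefore takes two steps: reinterpret the stored char as a signed 16-bit number, then follow the LARGENUMS indirection when needed. The sketch below uses made-up toy tables, not the generated constants, and assumes that any stored value of -3 or lower is a LARGENUMS reference, which follows from -1 and -2 being reserved and from the (-N - 3) formula.

```java
final class NumericValueSketch {
    // Hypothetical toy attribute table; negative values wrap to unsigned chars.
    static final char[] NUM_VALUE = {
        (char) -1, // no numeric value defined
        (char) -2, // value exists but is not a nonnegative integer
        7,         // small value stored directly
        (char) -3, // refers to LARGENUMS[0]
        (char) -4, // refers to LARGENUMS[1]
    };

    // Hypothetical values too large for a 16-bit char.
    static final int[] LARGENUMS = { 100000, 1000000 };

    /** Numeric value for the attribute set at the given offset. */
    static int numericValue(int attributeOffset) {
        int n = (short) NUM_VALUE[attributeOffset]; // recover the sign
        if (n <= -3) {
            return LARGENUMS[-n - 3];
        }
        return n;
    }
}
```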
UPPER
public static final java.lang.String[] UPPER
- This is the attribute table for computing the single-character uppercase
representation of a character. The value is the signed difference
between the character and its uppercase version. Note that this is
stored as an unsigned char since this is a String literal. When
capitalizing a String, you must first check if a multi-character uppercase
sequence exists before using this character.
LOWER
public static final java.lang.String[] LOWER
- This is the attribute table for computing the lowercase representation
of a character. The value is the signed difference between the
character and its lowercase version. Note that this is stored as an
unsigned char since this is a String literal.
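Storing a signed difference in an unsigned char works because 16-bit addition wraps around. The sketch below assumes the stored difference is the one that is added to the character to obtain its converted form; the text does not state the sign convention explicitly, so treat that as an assumption. The class and method names are hypothetical.

```java
final class CaseDeltaSketch {
    /** Applies a stored UPPER or LOWER difference to ch with 16-bit wraparound. */
    static char apply(char ch, char storedDelta) {
        return (char) (ch + storedDelta);
    }

    /** Recovers the signed difference from its unsigned char representation. */
    static int signedDelta(char storedDelta) {
        return (short) storedDelta;
    }
}
```

For example, a difference of -32 is stored as 0xFFE0; adding it to 'q' and truncating to 16 bits yields 'Q'.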
DIRECTION
public static final java.lang.String[] DIRECTION
- This is the attribute table for computing the directionality class
of a character, as well as a marker of characters with a multi-character
capitalization. The direction is taken by performing a signed shift
right by 2 (where a result of -1 means an unknown direction, such as
for undefined characters). The lower 2 bits form a count of the
additional characters that will be added to a String when performing
multi-character uppercase expansion. This count is also used, along with
the offset in UPPER_SPECIAL, to determine how much of UPPER_EXPAND to use
when performing the case conversion. Note that this information is stored
as an unsigned char since this is a String literal.
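The two fields packed into a DIRECTION entry can be separated as sketched below. The names are hypothetical, and the pack helper exists only to build sample values for illustration; it is not something CharData provides.

```java
final class DirectionSketch {
    /** Directionality class: signed shift right by 2; -1 means unknown. */
    static int direction(char stored) {
        return ((short) stored) >> 2; // cast to short restores the sign first
    }

    /** Low 2 bits: extra characters added by multi-character uppercasing. */
    static int extraUpperChars(char stored) {
        return stored & 3;
    }

    /** Hypothetical helper that builds an entry from its two fields. */
    static char pack(int direction, int extraUpperChars) {
        return (char) ((direction << 2) | extraUpperChars);
    }
}
```

The cast to short matters: without it the char would be zero-extended and an unknown direction would come out as a large positive number instead of -1.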
TITLE
public static final java.lang.String TITLE
- This is the listing of titlecase special cases (all other characters
can use UPPER to determine their titlecase). The listing is a sorted sequence of character pairs; converting the first character of the pair to titlecase produces the second character.
- See Also:
- Constant Field Values
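A lookup over the TITLE pairs might look like the sketch below. The two-pair string is a made-up excerpt for illustration, not the generated constant, and Character.toUpperCase stands in for the UPPER table lookup that the text describes as the fallback. A linear scan is used for brevity; because the pairs are sorted, a binary search would also work.

```java
final class TitleCaseSketch {
    // Hypothetical toy listing: (character, titlecase form) pairs, sorted by
    // the first element. U+01C4 and U+01C6 both titlecase to U+01C5.
    static final String TITLE = "\u01c4\u01c5\u01c6\u01c5";

    static char toTitleCase(char ch) {
        for (int i = 0; i < TITLE.length(); i += 2) {
            if (TITLE.charAt(i) == ch) {
                return TITLE.charAt(i + 1); // special case found
            }
        }
        // Stand-in for the UPPER table: most characters share their
        // titlecase with their uppercase.
        return Character.toUpperCase(ch);
    }
}
```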
UPPER_SPECIAL
public static final java.lang.String UPPER_SPECIAL
- This is a listing of characters with multi-character uppercase sequences.
A character appears in this list exactly when it has a non-zero entry
in the low-order 2-bit field of DIRECTION. The listing is a sorted
sequence of pairs (hence a binary search on the even elements is an
efficient way to look up a character). The first element of a pair is the
character with the expansion, and the second is the index into
UPPER_EXPAND where the expansion begins. Use the 2-bit field of
DIRECTION to determine where the expansion ends.
- See Also:
- Constant Field Values
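The binary search over the even elements, followed by the slice of UPPER_EXPAND, can be sketched as follows. Both tables here are small made-up examples rather than the generated constants. The sketch also assumes that an expansion is one character longer than the 2-bit count from DIRECTION, since that count is described as the number of additional characters.

```java
final class UpperExpandSketch {
    // Hypothetical toy expansion table: "SS", then U+02BC 'N', then "FFI".
    static final String UPPER_EXPAND = "SS\u02bcNFFI";

    // Hypothetical toy listing of (character, start index) pairs, sorted by
    // character.
    static final String UPPER_SPECIAL = new String(new char[] {
        0x00DF, 0, // sharp s uppercases to "SS"
        0x0149, 2, // n preceded by apostrophe uppercases to U+02BC 'N'
        0xFB03, 4, // ffi ligature uppercases to "FFI"
    });

    /** Index of the pair for ch in UPPER_SPECIAL, or -1 if ch is not listed. */
    static int findPair(char ch) {
        int low = 0;
        int high = UPPER_SPECIAL.length() / 2 - 1; // search over pair numbers
        while (low <= high) {
            int mid = (low + high) >>> 1;
            char key = UPPER_SPECIAL.charAt(2 * mid); // even element of pair
            if (key == ch) {
                return 2 * mid;
            }
            if (key < ch) {
                low = mid + 1;
            } else {
                high = mid - 1;
            }
        }
        return -1;
    }

    /**
     * Uppercase expansion of ch, or null if ch has none. extraChars is the
     * 2-bit count taken from the character's DIRECTION entry.
     */
    static String expand(char ch, int extraChars) {
        int pair = findPair(ch);
        if (pair < 0) {
            return null;
        }
        int start = UPPER_SPECIAL.charAt(pair + 1);
        return UPPER_EXPAND.substring(start, start + extraChars + 1);
    }
}
```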
UPPER_EXPAND
public static final java.lang.String UPPER_EXPAND
- This is the listing of special case multi-character uppercase sequences.
Characters listed in UPPER_SPECIAL index into this table to find their
uppercase expansion. Remember that you must also perform special-casing
on two single-character sequences in the Turkish locale, which are not
covered here in CharData.
- See Also:
- Constant Field Values