gnu.java.lang
Interface CharData


public interface CharData

This interface contains the information about Unicode characters that java.lang.Character needs. It is generated automatically from ../doc/unicode/UnicodeData-4.0.0.txt and ../doc/unicode/SpecialCasing-4.0.0.txt by some Perl scripts. These Unicode definition files can be found on the http://www.unicode.org website. JDK 1.5 uses Unicode version 4.0.0.

The data is stored as String constants, but Character converts these Strings to their respective char[] components. The per-plane fields are stored in arrays of 17 elements each, one element per Unicode plane. BLOCKS stores the offset of a block of 2^SHIFT characters within DATA. The DATA field, in turn, stores information about each character in the low-order bits, and an offset into the attribute tables UPPER, LOWER, NUM_VALUE, and DIRECTION in the high-order bits. Note that the attribute tables have far fewer than 0xffff entries, because many characters in Unicode share common attributes.

Numbers that are too large to fit into NUM_VALUE as 16-bit chars are stored in LARGENUMS, and a number N is stored in NUM_VALUE such that (-N - 3) is the offset into LARGENUMS for the particular character. The DIRECTION table also contains a field for detecting characters with multi-character uppercase expansions. Next, there is a listing of TITLE exceptions (most characters have the same titlecase as uppercase). Finally, there are two tables for multi-character capitalization: UPPER_SPECIAL, which lists the characters that are special-cased, and UPPER_EXPAND, which lists their expansions.


Field Summary
static java.lang.String[] BLOCKS
          The mapping of character blocks to their location in DATA.
static java.lang.String[] DATA
          Information about each character.
static java.lang.String[] DIRECTION
          This is the attribute table for computing the directionality class of a character, as well as a marker of characters with a multi-character capitalization.
static int[] LARGENUMS
          The array containing the numeric values that are too large to be stored as chars in NUM_VALUE.
static java.lang.String[] LOWER
          This is the attribute table for computing the lowercase representation of a character.
static java.lang.String[] NUM_VALUE
          This is the attribute table for computing the numeric value of a character.
static int[] SHIFT
          The character shift amount to look up the block offset.
static java.lang.String SOURCE
          The Unicode definition file that was parsed to build this database.
static java.lang.String TITLE
          This is the listing of titlecase special cases (all other characters can use UPPER to determine their titlecase).
static java.lang.String[] UPPER
          This is the attribute table for computing the single-character uppercase representation of a character.
static java.lang.String UPPER_EXPAND
          This is the listing of special case multi-character uppercase sequences.
static java.lang.String UPPER_SPECIAL
          This is a listing of characters with multi-character uppercase sequences.
 

Field Detail

SOURCE

public static final java.lang.String SOURCE
The Unicode definition file that was parsed to build this database.

See Also:
Constant Field Values

SHIFT

public static final int[] SHIFT
The shift amount applied to a character to look up its block offset. In other words, (char) (BLOCKS.value[ch >> SHIFT[p]] + ch) is the index where ch is described in DATA if ch is in Unicode plane p. Note that p is simply the quotient of the integer division of ch by 0x10000.


BLOCKS

public static final java.lang.String[] BLOCKS
The mapping of character blocks to their location in DATA. Each entry has been adjusted so that the 16-bit sum with the desired character gives the actual index into DATA.
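
The two-level lookup described for SHIFT and BLOCKS can be sketched with toy tables. Everything in this example is an assumption made for illustration: the class name BlockLookupSketch, the single tiny plane, the shift of 2, and the table contents are hypothetical and are not the generated data. The sketch also assumes that the low 16 bits of the code point are used as the offset within the plane.

```java
final class BlockLookupSketch {
    // Hypothetical toy tables for a single plane; the real tables are generated.
    static final int SHIFT = 2;                      // blocks of 2^2 = 4 characters
    static final String DATA = "AAAABBBBCCCC";       // three distinct blocks, stored once each
    // Each entry is (start of the block in DATA) - (first character of the block),
    // reduced modulo 2^16, so that adding the character yields the DATA index.
    static final String BLOCKS = "\u0000\u0000\uFFF8\uFFFC";

    static char describe(int codePoint) {
        char ch = (char) codePoint;                  // assumed: offset within the plane
        int index = (char) (BLOCKS.charAt(ch >> SHIFT) + ch); // 16-bit wraparound sum
        return DATA.charAt(index);
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int cp = 0; cp < 16; cp++) {
            sb.append(describe(cp));
        }
        System.out.println(sb); // prints AAAABBBBAAAACCCC
    }
}
```

In this toy plane, characters 0 to 3 and 8 to 11 share one block of DATA, which is the kind of sharing that lets DATA stay smaller than the plane it describes.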


LARGENUMS

public static final int[] LARGENUMS
The array containing the numeric values that are too large to be stored as chars in NUM_VALUE. NUM_VALUE in this case will contain a negative integer N such that LARGENUMS[-N - 3] contains the correct numeric value.


DATA

public static final java.lang.String[] DATA
Information about each character. The low order 5 bits form the character type, the next bit is a flag for non-breaking spaces, and the next bit is a flag for mirrored directionality. The high order 9 bits form the offset into the attribute tables. Note that this limits the number of unique character attributes to 512, which is not a problem as of Unicode version 4.0.0, but may soon become one.
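
The bit layout above can be unpacked with a few masks and shifts. The following is a minimal sketch; the class and method names are hypothetical, and the sample entry is made up rather than taken from the generated table.

```java
final class DataEntrySketch {
    // Low-order 5 bits: the character type.
    static int type(char entry) {
        return entry & 0x1F;
    }

    // Next bit (bit 5): flag for non-breaking spaces.
    static boolean isNoBreakSpace(char entry) {
        return (entry & 0x20) != 0;
    }

    // Next bit (bit 6): flag for mirrored directionality.
    static boolean isMirrored(char entry) {
        return (entry & 0x40) != 0;
    }

    // High-order 9 bits: offset into the attribute tables (0..511).
    static int attributeOffset(char entry) {
        return entry >>> 7;
    }

    public static void main(String[] args) {
        // Made-up entry: attribute offset 3, mirrored, type 21.
        char entry = (char) ((3 << 7) | 0x40 | 21);
        System.out.println(type(entry) + " " + isNoBreakSpace(entry) + " "
                + isMirrored(entry) + " " + attributeOffset(entry));
    }
}
```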


NUM_VALUE

public static final java.lang.String[] NUM_VALUE
This is the attribute table for computing the numeric value of a character. The value is -1 if Unicode does not define a value, -2 if the value cannot be represented as a nonnegative integer (for example, a fraction), otherwise it is the value. Note that this is a signed value, but stored as an unsigned char since this is a String literal.
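
Decoding a NUM_VALUE entry, including the LARGENUMS indirection, might look like the sketch below. The class name, the toy table contents, and the use of a single String instead of one String per plane are assumptions made for the example.

```java
final class NumericValueSketch {
    // Hypothetical toy attribute table: -1, -2, 7, and two references into LARGENUMS.
    static final String NUM_VALUE = "\uFFFF\uFFFE\u0007\uFFFD\uFFFC";
    // Hypothetical values that do not fit into a 16-bit char.
    static final int[] LARGENUMS = {70000, 100000};

    static int numericValue(int attributeOffset) {
        // Reinterpret the unsigned char as a signed 16-bit value.
        int n = (short) NUM_VALUE.charAt(attributeOffset);
        if (n >= -2) {
            return n;                 // -1, -2, or the value itself
        }
        return LARGENUMS[-n - 3];     // N <= -3 refers to an entry in LARGENUMS
    }

    public static void main(String[] args) {
        for (int i = 0; i < NUM_VALUE.length(); i++) {
            System.out.println(numericValue(i));
        }
    }
}
```

The cast to short is what recovers the sign: the char 0xFFFD becomes -3, which maps to LARGENUMS[0].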


UPPER

public static final java.lang.String[] UPPER
This is the attribute table for computing the single-character uppercase representation of a character. The value is the signed difference between the character and its uppercase version. Note that this is stored as an unsigned char since this is a String literal. When capitalizing a String, you must first check whether a multi-character uppercase sequence exists before falling back to this single-character mapping.


LOWER

public static final java.lang.String[] LOWER
This is the attribute table for computing the lowercase representation of a character. The value is the signed difference between the character and its lowercase version. Note that this is stored as an unsigned char since this is a String literal.
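
Because the differences in UPPER and LOWER are signed but stored as unsigned chars, applying one is a 16-bit wraparound addition. The sketch below assumes that the stored value is (converted character - original character), so that adding it to the character gives the result; the sign convention, the class name, and the toy tables are all assumptions for illustration.

```java
final class CaseDeltaSketch {
    // Hypothetical toy tables indexed by attribute offset:
    // offset 0 = no case mapping, offset 1 = lowercase Latin, offset 2 = uppercase Latin.
    static final String UPPER = "\u0000\uFFE0\u0000"; // 0xFFE0 is -32 as a 16-bit value
    static final String LOWER = "\u0000\u0000\u0020"; // +32

    static char toUpper(char ch, int attributeOffset) {
        // The cast to char discards the carry, which implements the signed addition.
        return (char) (ch + UPPER.charAt(attributeOffset));
    }

    static char toLower(char ch, int attributeOffset) {
        return (char) (ch + LOWER.charAt(attributeOffset));
    }

    public static void main(String[] args) {
        System.out.println(toUpper('a', 1)); // prints A
        System.out.println(toLower('A', 2)); // prints a
    }
}
```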


DIRECTION

public static final java.lang.String[] DIRECTION
This is the attribute table for computing the directionality class of a character, as well as a marker of characters with a multi-character capitalization. The direction is taken by performing a signed shift right by 2 (where a result of -1 means an unknown direction, such as for undefined characters). The lower 2 bits form a count of the additional characters that will be added to a String when performing multi-character uppercase expansion. This count is also used, along with the offset in UPPER_SPECIAL, to determine how much of UPPER_EXPAND to use when performing the case conversion. Note that this information is stored as an unsigned char since this is a String literal.
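
Splitting a DIRECTION entry into its two fields can be sketched as follows. The class and method names are hypothetical and the sample entries are made up; the cast to short is one way to obtain the signed shift the description calls for.

```java
final class DirectionEntrySketch {
    // Signed shift right by 2; -1 means the direction is unknown.
    static int directionality(char entry) {
        return ((short) entry) >> 2;
    }

    // Low-order 2 bits: how many additional characters uppercase expansion adds.
    static int extraUpperChars(char entry) {
        return entry & 3;
    }

    public static void main(String[] args) {
        char known = (char) ((1 << 2) | 1); // made-up entry: direction 1, one extra char
        char unknown = '\uFFFC';            // -1 shifted left by 2, no expansion
        System.out.println(directionality(known) + " " + extraUpperChars(known));
        System.out.println(directionality(unknown) + " " + extraUpperChars(unknown));
    }
}
```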


TITLE

public static final java.lang.String TITLE
This is the listing of titlecase special cases (all other characters can use UPPER to determine their titlecase). The listing is a sorted sequence of character pairs; converting the first character of the pair to titlecase produces the second character.

See Also:
Constant Field Values
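
A lookup in a TITLE-style pair listing can be sketched as below. The two pairs are a toy excerpt chosen for the example (U+01C4 and U+01C6 both titlecase to U+01C5), not the generated table. The class name is hypothetical, and the fallback to java.lang.Character.toUpperCase merely stands in for the UPPER table lookup.

```java
final class TitleCaseSketch {
    // Toy excerpt: (character, titlecase) pairs, sorted by the first element.
    static final String TITLE = "\u01C4\u01C5\u01C6\u01C5";

    static char toTitleCase(char ch) {
        // Scan the even positions; the odd positions hold the titlecase forms.
        for (int i = 0; i < TITLE.length(); i += 2) {
            if (TITLE.charAt(i) == ch) {
                return TITLE.charAt(i + 1);
            }
        }
        // Not a special case: titlecase equals uppercase (stand-in for UPPER).
        return Character.toUpperCase(ch);
    }

    public static void main(String[] args) {
        System.out.println((int) toTitleCase('\u01C6')); // prints 453 (0x01C5)
        System.out.println(toTitleCase('a'));            // prints A
    }
}
```

A linear scan is used here for brevity; since the pairs are sorted, a binary search over the even positions would also work.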

UPPER_SPECIAL

public static final java.lang.String UPPER_SPECIAL
This is a listing of characters with multi-character uppercase sequences. A character appears in this list exactly when it has a non-zero entry in the low-order 2-bit field of DIRECTION. The listing is a sorted sequence of pairs (hence a binary search on the even elements is an efficient way to look up a character). The first element of a pair is the character with the expansion, and the second is the index into UPPER_EXPAND where the expansion begins. Use the 2-bit field of DIRECTION to determine where the expansion ends.

See Also:
Constant Field Values

UPPER_EXPAND

public static final java.lang.String UPPER_EXPAND
This is the listing of special case multi-character uppercase sequences. Characters listed in UPPER_SPECIAL index into this table to find their uppercase expansion. Remember that you must also perform special-casing on two single-character sequences in the Turkish locale, which are not covered here in CharData.

See Also:
Constant Field Values
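
The interplay of UPPER_SPECIAL, UPPER_EXPAND, and the 2-bit count from DIRECTION can be sketched as follows. The tables are a three-entry toy excerpt, the class name is hypothetical, and the count is passed in as a parameter instead of being read from DIRECTION. The sketch assumes that the expansion length is the count plus one, since the count is described as the number of additional characters.

```java
final class UpperExpandSketch {
    // Toy excerpt: (character, start index in UPPER_EXPAND) pairs, sorted by character.
    static final String UPPER_SPECIAL = "\u00DF\u0000\u0149\u0002\u0390\u0004";
    // Toy excerpt: "SS", then U+02BC 'N', then U+0399 U+0308 U+0301.
    static final String UPPER_EXPAND = "SS\u02BCN\u0399\u0308\u0301";

    // extraChars is the low-order 2-bit field of the character's DIRECTION entry.
    static String expand(char ch, int extraChars) {
        int lo = 0;
        int hi = UPPER_SPECIAL.length() / 2 - 1;
        while (lo <= hi) {                       // binary search on the even elements
            int mid = (lo + hi) >>> 1;
            char key = UPPER_SPECIAL.charAt(mid * 2);
            if (key == ch) {
                int start = UPPER_SPECIAL.charAt(mid * 2 + 1);
                // Assumed: the expansion is extraChars + 1 characters long.
                return UPPER_EXPAND.substring(start, start + extraChars + 1);
            }
            if (key < ch) {
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return null;                             // no multi-character expansion
    }

    public static void main(String[] args) {
        System.out.println(expand('\u00DF', 1)); // prints SS
    }
}
```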