org.apache.xml.utils
Class FastStringBuffer

java.lang.Object
  extended by org.apache.xml.utils.FastStringBuffer

public class FastStringBuffer
extends java.lang.Object

Bare-bones, unsafe, fast string buffer. No thread-safety, no parameter range checking, exposed fields. Note that in typical applications, thread-safety of a StringBuffer is a somewhat dubious concept in any case.

Note that Stree and DTM used a single FastStringBuffer as a string pool, by recording start and length indices within this single buffer. This minimizes heap overhead, but of course requires more work when retrieving the data.
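
The string-pool usage described above can be sketched as follows. This is only an illustration under stated assumptions: the class name StringPoolSketch and its methods are hypothetical, and a java.lang.StringBuilder stands in for the shared FastStringBuffer so the example is self-contained.

```java
import java.util.Arrays;

/** Hypothetical sketch of a string pool backed by one shared buffer. */
final class StringPoolSketch {
    // Stand-in for the single shared FastStringBuffer (assumption).
    private final StringBuilder pool = new StringBuilder();
    private int[] starts = new int[8];   // start index of each pooled string
    private int[] lengths = new int[8];  // length of each pooled string
    private int count = 0;

    /** Appends text to the shared buffer and returns a handle for it. */
    int add(String text) {
        if (count == starts.length) {
            starts = Arrays.copyOf(starts, count * 2);
            lengths = Arrays.copyOf(lengths, count * 2);
        }
        starts[count] = pool.length();
        lengths[count] = text.length();
        pool.append(text);
        return count++;
    }

    /** Retrieval has to copy the characters back out of the buffer. */
    String get(int handle) {
        int start = starts[handle];
        return pool.substring(start, start + lengths[handle]);
    }
}
```

Only two ints are kept per string, which is where the heap saving comes from; the cost is the copy performed in get().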

FastStringBuffer operates as a "chunked buffer". Doing so reduces the need to recopy existing information when an append exceeds the space available; we just allocate another chunk and flow across to it. (The array of chunks may need to grow, admittedly, but that's a much smaller object.) Some excess recopying may arise when we extract Strings which cross chunk boundaries; larger chunks make that less frequent.
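
A minimal sketch of the chunked-buffer idea, assuming fixed-size chunks and none of the self-tuning described below. The class ChunkedBufferSketch and its members are hypothetical names chosen for illustration; this is not the Xalan implementation.

```java
import java.util.Arrays;

/** Hypothetical illustration of a chunked character buffer. */
final class ChunkedBufferSketch {
    private final int chunkBits;  // log2 of the chunk size
    private final int chunkSize;  // 1 << chunkBits
    private final int chunkMask;  // chunkSize - 1
    private char[][] chunks = new char[4][];  // the (small) array of chunks
    private int lastChunk = 0;    // index of the chunk being filled
    private int firstFree = 0;    // next free column in that chunk

    ChunkedBufferSketch(int chunkBits) {
        this.chunkBits = chunkBits;
        this.chunkSize = 1 << chunkBits;
        this.chunkMask = chunkSize - 1;
        chunks[0] = new char[chunkSize];
    }

    void append(char c) {
        if (firstFree == chunkSize) {
            // Current chunk is full: flow across to the next chunk.
            // Only the array of chunk references may need to grow;
            // characters already stored are never recopied.
            if (++lastChunk == chunks.length) {
                chunks = Arrays.copyOf(chunks, chunks.length * 2);
            }
            if (chunks[lastChunk] == null) {
                chunks[lastChunk] = new char[chunkSize];
            }
            firstFree = 0;
        }
        chunks[lastChunk][firstFree++] = c;
    }

    void append(String s) {
        for (int i = 0; i < s.length(); i++) {
            append(s.charAt(i));
        }
    }

    int length() {
        return (lastChunk << chunkBits) + firstFree;
    }

    char charAt(int pos) {
        return chunks[pos >>> chunkBits][pos & chunkMask];
    }

    /** Extraction copies characters, possibly across chunk boundaries. */
    String getString(int start, int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = start; i < start + length; i++) {
            sb.append(charAt(i));
        }
        return sb.toString();
    }
}
```

With a chunkBits of 2 (four characters per chunk), appending "hello world" fills two chunks and part of a third, and getString(3, 5) has to read across two chunk boundaries.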

The size values are parameterized, to allow tuning this code. In theory, Result Tree Fragments might want to be tuned differently from the main document's text.

%REVIEW% An experiment in self-tuning is included in the code (using nested FastStringBuffers to achieve variation in chunk sizes), but this implementation has proven to be problematic when data is copied from the FSB into itself. We should either re-architect that to make this safe (if possible) or remove that code and clean up for performance/maintainability reasons.


Field Summary
private static int CARRY_WS
          Manifest constant: Carry trailing whitespace of one chunk as leading whitespace of the next chunk.
(package private) static boolean DEBUG_FORCE_FIXED_CHUNKSIZE
           
(package private) static int DEBUG_FORCE_INIT_BITS
           
(package private)  char[][] m_array
          Field m_array holds the string buffer's text contents, using an array-of-arrays.
(package private)  int m_chunkBits
          Field m_chunkBits sets our chunking strategy, by saying how many bits of index can be used within a single chunk before flowing over to the next chunk.
(package private)  int m_chunkMask
          Field m_chunkMask is m_chunkSize-1 -- in other words, m_chunkBits worth of low-order '1' bits, useful for shift-and-mask addressing within the chunks.
(package private)  int m_chunkSize
          Field m_chunkSize establishes the maximum size of one chunk of the array as 2**chunkbits characters.
(package private)  int m_firstFree
          Field m_firstFree is an index into m_array[m_lastChunk][], pointing to the first character in the Chunked Array which is not part of the FastStringBuffer's current content.
(package private)  FastStringBuffer m_innerFSB
          Field m_innerFSB, when non-null, is a FastStringBuffer whose total length equals m_chunkSize, and which replaces m_array[0].
(package private)  int m_lastChunk
          Field m_lastChunk is an index into m_array[], pointing to the last chunk of the Chunked Array currently in use.
(package private)  int m_maxChunkBits
          Field m_maxChunkBits affects our chunk-growth strategy, by saying what the largest permissible chunk size is in this particular FastStringBuffer hierarchy.
(package private)  int m_rebundleBits
          Field m_rebundleBits affects our chunk-growth strategy, by saying how many chunks should be allocated at one size before we encapsulate them into the first chunk of the next size up.
(package private) static char[] SINGLE_SPACE
           
static int SUPPRESS_BOTH
          Manifest constant: Suppress both leading and trailing whitespace.
static int SUPPRESS_LEADING_WS
          Manifest constant: Suppress leading whitespace.
static int SUPPRESS_TRAILING_WS
          Manifest constant: Suppress trailing whitespace.
 
Constructor Summary
  FastStringBuffer()
          Construct a FastStringBuffer, using a default allocation policy.
private FastStringBuffer(FastStringBuffer source)
          Encapsulation c'tor.
  FastStringBuffer(int initChunkBits)
          Construct a FastStringBuffer, using default maxChunkBits and rebundleBits values.
  FastStringBuffer(int initChunkBits, int maxChunkBits)
          Construct a FastStringBuffer, using a default rebundleBits value.
  FastStringBuffer(int initChunkBits, int maxChunkBits, int rebundleBits)
          Construct a FastStringBuffer, with allocation policy as per parameters.
 
Method Summary
 void append(char value)
          Append a single character onto the FastStringBuffer, growing the storage if necessary.
 void append(char[] chars, int start, int length)
          Append part of the contents of a Character Array onto the FastStringBuffer, growing the storage if necessary.
 void append(FastStringBuffer value)
          Append the contents of another FastStringBuffer onto this FastStringBuffer, growing the storage if necessary.
 void append(java.lang.String value)
          Append the contents of a String onto the FastStringBuffer, growing the storage if necessary.
 void append(java.lang.StringBuffer value)
          Append the contents of a StringBuffer onto the FastStringBuffer, growing the storage if necessary.
 char charAt(int pos)
          Get a single character from the string buffer.
private  void getChars(int srcBegin, int srcEnd, char[] dst, int dstBegin)
          Copies characters from this string into the destination character array.
protected  java.lang.String getOneChunkString(int startChunk, int startColumn, int length)
           
 java.lang.String getString(int start, int length)
           
(package private)  java.lang.StringBuffer getString(java.lang.StringBuffer sb, int start, int length)
           
(package private)  java.lang.StringBuffer getString(java.lang.StringBuffer sb, int startChunk, int startColumn, int length)
          Internal support for toString() and getString().
 boolean isWhitespace(int start, int length)
           
 int length()
          Get the number of characters currently in the buffer.
 void reset()
          Discard the content of the FastStringBuffer, and most of the memory that was allocated by it, restoring the initial state.
static void sendNormalizedSAXcharacters(char[] ch, int start, int length, org.xml.sax.ContentHandler handler)
          Directly normalize and dispatch the character array.
(package private) static int sendNormalizedSAXcharacters(char[] ch, int start, int length, org.xml.sax.ContentHandler handler, int edgeTreatmentFlags)
          Internal method to directly normalize and dispatch the character array.
 int sendNormalizedSAXcharacters(org.xml.sax.ContentHandler ch, int start, int length)
          Sends the specified range of characters as one or more SAX characters() events, normalizing the characters according to XSLT rules.
 void sendSAXcharacters(org.xml.sax.ContentHandler ch, int start, int length)
          Sends the specified range of characters as one or more SAX characters() events.
 void sendSAXComment(org.xml.sax.ext.LexicalHandler ch, int start, int length)
          Sends the specified range of characters as a SAX comment.
 void setLength(int l)
          Directly set how much of the FastStringBuffer's storage is to be considered part of its content.
private  void setLength(int l, FastStringBuffer rootFSB)
          Subroutine for the public setLength() method.
 int size()
          Get the number of characters currently in the buffer.
 java.lang.String toString()
          Note that this operation has been somewhat deoptimized by the shift to a chunked array, as there is no factory method to produce a String object directly from an array of arrays and hence a double copy is needed.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

DEBUG_FORCE_INIT_BITS

static final int DEBUG_FORCE_INIT_BITS
See Also:
Constant Field Values

DEBUG_FORCE_FIXED_CHUNKSIZE

static boolean DEBUG_FORCE_FIXED_CHUNKSIZE

SUPPRESS_LEADING_WS

public static final int SUPPRESS_LEADING_WS
Manifest constant: Suppress leading whitespace. This should be used when normalize-to-SAX is called for the first chunk of a multi-chunk output, or one following unsuppressed whitespace in a previous chunk.

See Also:
sendNormalizedSAXcharacters(char[],int,int,org.xml.sax.ContentHandler,int), Constant Field Values

SUPPRESS_TRAILING_WS

public static final int SUPPRESS_TRAILING_WS
Manifest constant: Suppress trailing whitespace. This should be used when normalize-to-SAX is called for the last chunk of a multi-chunk output; it may have to be or'ed with SUPPRESS_LEADING_WS.

See Also:
Constant Field Values

SUPPRESS_BOTH

public static final int SUPPRESS_BOTH
Manifest constant: Suppress both leading and trailing whitespace. This should be used when normalize-to-SAX is called for a complete string. (I'm not wild about the name of this one. Ideas welcome.)

See Also:
sendNormalizedSAXcharacters(char[],int,int,org.xml.sax.ContentHandler,int), Constant Field Values
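
The way these three constants combine can be sketched as below. The numeric values are assumptions made for illustration (this page does not list them), and EdgeFlagsSketch and flagsFor are hypothetical names. The sketch also ignores the case where leading whitespace must be suppressed because the previous chunk ended in unsuppressed whitespace.

```java
/** Hypothetical sketch of choosing edge-treatment flags per chunk. */
final class EdgeFlagsSketch {
    // Assumed bit values; only their use as or-able bit flags matters here.
    static final int SUPPRESS_LEADING_WS = 0x01;
    static final int SUPPRESS_TRAILING_WS = 0x02;
    static final int SUPPRESS_BOTH = SUPPRESS_LEADING_WS | SUPPRESS_TRAILING_WS;

    /** Flags for chunk number chunkIndex out of chunkCount chunks. */
    static int flagsFor(int chunkIndex, int chunkCount) {
        int flags = 0;
        if (chunkIndex == 0) {
            flags |= SUPPRESS_LEADING_WS;   // first chunk of the output
        }
        if (chunkIndex == chunkCount - 1) {
            flags |= SUPPRESS_TRAILING_WS;  // last chunk of the output
        }
        return flags;
    }
}
```

A complete string delivered as a single chunk is both first and last, so flagsFor(0, 1) yields the combination of both flags.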

CARRY_WS

private static final int CARRY_WS
Manifest constant: Carry trailing whitespace of one chunk as leading whitespace of the next chunk. Used internally; I don't see any reason to make it public right now.

See Also:
Constant Field Values

m_chunkBits

int m_chunkBits
Field m_chunkBits sets our chunking strategy, by saying how many bits of index can be used within a single chunk before flowing over to the next chunk. For example, if m_chunkBits is set to 15, each chunk can contain up to 2^15 (32K) characters.


m_maxChunkBits

int m_maxChunkBits
Field m_maxChunkBits affects our chunk-growth strategy, by saying what the largest permissible chunk size is in this particular FastStringBuffer hierarchy.


m_rebundleBits

int m_rebundleBits
Field m_rebundleBits affects our chunk-growth strategy, by saying how many chunks should be allocated at one size before we encapsulate them into the first chunk of the next size up. For example, if m_rebundleBits is set to 3, then after 8 chunks at a given size we will rebundle them as the first element of a FastStringBuffer using a chunk size 8 times larger (chunk size shifted left three bits, i.e. m_chunkBits increased by 3).
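
The arithmetic in that example can be written out as a small sketch. RebundleSketch and its method names are hypothetical, and the cap imposed by m_maxChunkBits is deliberately left out.

```java
/** Hypothetical sketch of the rebundling arithmetic only. */
final class RebundleSketch {
    /** Number of chunks allocated at one size before rebundling. */
    static int chunksBeforeRebundle(int rebundleBits) {
        return 1 << rebundleBits;
    }

    /** Chunk bits of the enclosing buffer after rebundling. */
    static int nextChunkBits(int chunkBits, int rebundleBits) {
        return chunkBits + rebundleBits;
    }

    /** Total characters held at the moment of rebundling. */
    static int lengthAtRebundle(int chunkBits, int rebundleBits) {
        return 1 << (chunkBits + rebundleBits);
    }
}
```

For chunkBits = 10 and rebundleBits = 3, eight chunks of 1024 characters (8192 in total) become the first chunk of a buffer whose chunk size is 2^13 = 8192.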


m_chunkSize

int m_chunkSize
Field m_chunkSize establishes the maximum size of one chunk of the array as 2**chunkbits characters. (Which may also be the minimum size if we aren't tuning for storage)


m_chunkMask

int m_chunkMask
Field m_chunkMask is m_chunkSize-1 -- in other words, m_chunkBits worth of low-order '1' bits, useful for shift-and-mask addressing within the chunks.
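
As an illustration of shift-and-mask addressing, the sketch below splits a character position into a chunk index and a column within that chunk. ChunkAddressSketch and its methods are hypothetical helpers, not part of this class.

```java
/** Hypothetical helpers showing shift-and-mask addressing. */
final class ChunkAddressSketch {
    /** High-order bits of the position select the chunk. */
    static int chunkIndex(int pos, int chunkBits) {
        return pos >>> chunkBits;
    }

    /** Low-order chunkBits bits select the column within the chunk. */
    static int column(int pos, int chunkBits) {
        int chunkMask = (1 << chunkBits) - 1;
        return pos & chunkMask;
    }

    /** Inverse operation: rebuild the position from chunk and column. */
    static int position(int chunk, int column, int chunkBits) {
        return (chunk << chunkBits) + column;
    }
}
```

With chunkBits = 15, position 70000 falls in chunk 2 at column 4464, since 2 * 32768 + 4464 = 70000.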


m_array

char[][] m_array
Field m_array holds the string buffer's text contents, using an array-of-arrays. Note that this array, and the arrays it contains, may be reallocated when necessary in order to allow the buffer to grow; references to them should be considered to be invalidated after any append. However, the only time these arrays are directly exposed is in the sendSAXcharacters call.


m_lastChunk

int m_lastChunk
Field m_lastChunk is an index into m_array[], pointing to the last chunk of the Chunked Array currently in use. Note that additional chunks may actually be allocated, e.g. if the FastStringBuffer had previously been truncated or if someone issued an ensureSpace request.

The insertion point for append operations is addressed by the combination of m_lastChunk and m_firstFree.


m_firstFree

int m_firstFree
Field m_firstFree is an index into m_array[m_lastChunk][], pointing to the first character in the Chunked Array which is not part of the FastStringBuffer's current content. Since m_array[][] is zero-based, the length of that content can be calculated as (m_lastChunk<<m_chunkBits) + m_firstFree.

m_innerFSB

FastStringBuffer m_innerFSB
Field m_innerFSB, when non-null, is a FastStringBuffer whose total length equals m_chunkSize, and which replaces m_array[0]. This allows building a hierarchy of FastStringBuffers, where early appends use a smaller chunkSize (for less wasted memory overhead) but later ones use a larger chunkSize (for less heap activity overhead).


SINGLE_SPACE

static final char[] SINGLE_SPACE

Constructor Detail

FastStringBuffer

public FastStringBuffer(int initChunkBits,
                        int maxChunkBits,
                        int rebundleBits)
Construct a FastStringBuffer, with allocation policy as per parameters.

For coding convenience, I've expressed both allocation sizes in terms of a number of bits. That's needed for the final size of a chunk, to permit fast and efficient shift-and-mask addressing. It's less critical for the initial size, and may be reconsidered.

An alternative would be to accept integer sizes and round to powers of two; that really doesn't seem to buy us much, if anything.


FastStringBuffer

public FastStringBuffer(int initChunkBits,
                        int maxChunkBits)
Construct a FastStringBuffer, using a default rebundleBits value.


FastStringBuffer

public FastStringBuffer(int initChunkBits)
Construct a FastStringBuffer, using default maxChunkBits and rebundleBits values.

ISSUE: Should this call assert initial size, or fixed size? Now configured as initial, with a default for fixed.


FastStringBuffer

public FastStringBuffer()
Construct a FastStringBuffer, using a default allocation policy.


FastStringBuffer

private FastStringBuffer(FastStringBuffer source)
Encapsulation c'tor. After this is called, the source FastStringBuffer will be reset to use the new object as its m_innerFSB, and will have had its chunk size reset appropriately. IT SHOULD NEVER BE CALLED EXCEPT WHEN source.length()==1<<(source.m_chunkBits+source.m_rebundleBits).

Method Detail

size

public final int size()
Get the number of characters currently in the buffer. Synonym for length().


length

public final int length()
Get the number of characters currently in the buffer. Synonym for size().


reset

public final void reset()
Discard the content of the FastStringBuffer, and most of the memory that was allocated by it, restoring the initial state. Note that this may eventually be different from setLength(0), which see.


setLength

public final void setLength(int l)
Directly set how much of the FastStringBuffer's storage is to be considered part of its content. This is a fast but hazardous operation. It is not protected against negative values, or values greater than the amount of storage currently available... and even if additional storage does exist, its contents are unpredictable. The only safe use for our setLength() is to truncate the FastStringBuffer to a shorter string.


setLength

private final void setLength(int l,
                             FastStringBuffer rootFSB)
Subroutine for the public setLength() method. Deals with the fact that truncation may require restoring one of the innerFSBs.


toString

public final java.lang.String toString()
Note that this operation has been somewhat deoptimized by the shift to a chunked array, as there is no factory method to produce a String object directly from an array of arrays and hence a double copy is needed. By using ensureCapacity we hope to minimize the heap overhead of building the intermediate StringBuffer.

(It really is a pity that Java didn't design String as a final subclass of MutableString, rather than having StringBuffer be a separate hierarchy. We'd avoid a lot of double-buffering.)


append

public final void append(char value)
Append a single character onto the FastStringBuffer, growing the storage if necessary.

NOTE THAT after calling append(), previously obtained references to m_array[][] may no longer be valid, though in this instance they should in fact remain valid.


append

public final void append(java.lang.String value)
Append the contents of a String onto the FastStringBuffer, growing the storage if necessary.

NOTE THAT after calling append(), previously obtained references to m_array[] may no longer be valid.


append

public final void append(java.lang.StringBuffer value)
Append the contents of a StringBuffer onto the FastStringBuffer, growing the storage if necessary.

NOTE THAT after calling append(), previously obtained references to m_array[] may no longer be valid.


append

public final void append(char[] chars,
                         int start,
                         int length)
Append part of the contents of a Character Array onto the FastStringBuffer, growing the storage if necessary.

NOTE THAT after calling append(), previously obtained references to m_array[] may no longer be valid.


append

public final void append(FastStringBuffer value)
Append the contents of another FastStringBuffer onto this FastStringBuffer, growing the storage if necessary.

NOTE THAT after calling append(), previously obtained references to m_array[] may no longer be valid.


isWhitespace

public boolean isWhitespace(int start,
                            int length)

getString

public java.lang.String getString(int start,
                                  int length)

getOneChunkString

protected java.lang.String getOneChunkString(int startChunk,
                                             int startColumn,
                                             int length)

getString

java.lang.StringBuffer getString(java.lang.StringBuffer sb,
                                 int start,
                                 int length)

getString

java.lang.StringBuffer getString(java.lang.StringBuffer sb,
                                 int startChunk,
                                 int startColumn,
                                 int length)
Internal support for toString() and getString(). PLEASE NOTE SIGNATURE CHANGE from earlier versions; it now appends into and returns a StringBuffer supplied by the caller. This simplifies m_innerFSB support.

Note that this operation has been somewhat deoptimized by the shift to a chunked array, as there is no factory method to produce a String object directly from an array of arrays and hence a double copy is needed. By presetting length we hope to minimize the heap overhead of building the intermediate StringBuffer.

(It really is a pity that Java didn't design String as a final subclass of MutableString, rather than having StringBuffer be a separate hierarchy. We'd avoid a lot of double-buffering.)


charAt

public char charAt(int pos)
Get a single character from the string buffer.


sendSAXcharacters

public void sendSAXcharacters(org.xml.sax.ContentHandler ch,
                              int start,
                              int length)
                       throws org.xml.sax.SAXException
Sends the specified range of characters as one or more SAX characters() events. Note that the buffer reference passed to the ContentHandler may be invalidated if the FastStringBuffer is edited; it's the user's responsibility to manage access to the FastStringBuffer to prevent this problem from arising.

Note too that there is no promise that the output will be sent as a single call. As is always true in SAX, one logical string may be split across multiple blocks of memory and hence delivered as several successive events.
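
To illustrate why one logical string may arrive as several events, here is a sketch that walks a chunked array and issues one characters() call per chunk touched. ChunkedSaxSketch, sendRange and collect are hypothetical names, and the chunk layout is passed in explicitly rather than read from a FastStringBuffer. The receiving handler copies the characters immediately, since the arrays it is handed may later be reused or reallocated.

```java
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical sketch: one SAX characters() event per chunk touched. */
final class ChunkedSaxSketch {
    static void sendRange(char[][] chunks, int chunkBits, int start, int length,
                          ContentHandler handler) throws SAXException {
        int chunkSize = 1 << chunkBits;
        int chunkMask = chunkSize - 1;
        int pos = start;
        int remaining = length;
        while (remaining > 0) {
            int chunk = pos >>> chunkBits;
            int column = pos & chunkMask;
            // Never run past the end of the current chunk.
            int run = Math.min(remaining, chunkSize - column);
            handler.characters(chunks[chunk], column, run);
            pos += run;
            remaining -= run;
        }
    }

    /** Usage example: a handler that reassembles the events into a String. */
    static String collect(char[][] chunks, int chunkBits, int start, int length) {
        final StringBuilder copy = new StringBuilder();
        ContentHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int s, int len) {
                copy.append(ch, s, len);  // copy now; do not keep ch
            }
        };
        try {
            sendRange(chunks, chunkBits, start, length, handler);
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return copy.toString();
    }
}
```

With four-character chunks holding "abcd", "efgh" and "ij", a request for 7 characters starting at position 2 is delivered as three events ("cd", "efgh", "i").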


sendNormalizedSAXcharacters

public int sendNormalizedSAXcharacters(org.xml.sax.ContentHandler ch,
                                       int start,
                                       int length)
                                throws org.xml.sax.SAXException
Sends the specified range of characters as one or more SAX characters() events, normalizing the characters according to XSLT rules.
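
A sketch of what such normalization might look like, assuming the XSLT rules referred to are whitespace normalization in the style of XPath's normalize-space(): leading and trailing whitespace dropped, and each internal run collapsed to a single space. NormalizeSketch and its methods are hypothetical, and the sketch works on a single array without the multi-chunk edge handling.

```java
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical sketch of normalize-and-dispatch for one character array. */
final class NormalizeSketch {
    private static final char[] SINGLE_SPACE = {' '};

    static boolean isXmlSpace(char c) {
        return c == ' ' || c == '\t' || c == '\n' || c == '\r';
    }

    static void sendNormalized(char[] ch, int start, int length,
                               ContentHandler handler) throws SAXException {
        int end = start + length;
        int i = start;
        boolean sentText = false;
        while (i < end) {
            while (i < end && isXmlSpace(ch[i])) {
                i++;                       // skip a run of whitespace
            }
            if (i == end) {
                break;                     // trailing whitespace is dropped
            }
            int wordStart = i;
            while (i < end && !isXmlSpace(ch[i])) {
                i++;
            }
            if (sentText) {
                handler.characters(SINGLE_SPACE, 0, 1);  // collapsed separator
            }
            handler.characters(ch, wordStart, i - wordStart);
            sentText = true;
        }
    }

    /** Usage example: gather the emitted events into a String. */
    static String normalize(String text) {
        final StringBuilder out = new StringBuilder();
        ContentHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int s, int len) {
                out.append(ch, s, len);
            }
        };
        try {
            char[] chars = text.toCharArray();
            sendNormalized(chars, 0, chars.length, handler);
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }
}
```

Each word and each separator goes out as its own characters() event, which is consistent with the "one or more events" wording above.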


sendNormalizedSAXcharacters

static int sendNormalizedSAXcharacters(char[] ch,
                                       int start,
                                       int length,
                                       org.xml.sax.ContentHandler handler,
                                       int edgeTreatmentFlags)
                                throws org.xml.sax.SAXException
Internal method to directly normalize and dispatch the character array. This version is aware of the fact that it may be called several times in succession if the data is made up of multiple "chunks", and thus must actively manage the handling of leading and trailing whitespace. Note: The recursion is due to the possible recursion of inner FSBs.


sendNormalizedSAXcharacters

public static void sendNormalizedSAXcharacters(char[] ch,
                                               int start,
                                               int length,
                                               org.xml.sax.ContentHandler handler)
                                        throws org.xml.sax.SAXException
Directly normalize and dispatch the character array.


sendSAXComment

public void sendSAXComment(org.xml.sax.ext.LexicalHandler ch,
                           int start,
                           int length)
                    throws org.xml.sax.SAXException
Sends the specified range of characters as a SAX comment.

Note that, unlike sendSAXcharacters, this has to be done as a single call to LexicalHandler.comment().


getChars

private void getChars(int srcBegin,
                      int srcEnd,
                      char[] dst,
                      int dstBegin)
Copies characters from this string into the destination character array.