Source release: poi-src-3.2-FINAL-20081019
org.apache.poi.hwpf.extractor
public class WordExtractor
java.lang.Object
   org.apache.poi.POITextExtractor
      org.apache.poi.POIOLE2TextExtractor
         org.apache.poi.hwpf.extractor.WordExtractor
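Extracts the text from a Word document. Use either getParagraphText() or getText() unless you have a strong reason to do otherwise, for example a document whose text-piece-to-paragraph mapping is broken, where getTextFromPieces() may still work.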
Fields inherited from org.apache.poi.POITextExtractor:
document
Constructors:
 public WordExtractor(InputStream is) throws IOException
    Creates a new WordExtractor.
    Parameters:
    is - InputStream containing the Word file
 public WordExtractor(POIFSFileSystem fs) throws IOException
    Creates a new WordExtractor.
    Parameters:
    fs - POIFSFileSystem containing the Word file
 public WordExtractor(HWPFDocument doc) throws IOException
    Creates a new WordExtractor.
    Parameters:
    doc - the HWPFDocument to extract from
Method summary for org.apache.poi.hwpf.extractor.WordExtractor:
getParagraphText, getText, getTextFromPieces, main
Methods from org.apache.poi.POIOLE2TextExtractor:
getDocSummaryInformation, getSummaryInformation
Methods from org.apache.poi.POITextExtractor:
getText
Methods from java.lang.Object:
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Method detail for org.apache.poi.hwpf.extractor.WordExtractor:
 public String[] getParagraphText()
    Gets the text from the Word file as an array, with one String per paragraph.
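If you want plain lines rather than raw paragraphs, the array returned by getParagraphText() may need light cleanup. The class below is a minimal sketch and is not part of POI: ParagraphLines and clean() are hypothetical names. It assumes that each element may end with Word's paragraph mark (a carriage return), possibly followed by a line feed, trims those trailing characters, and drops paragraphs that are blank afterwards.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper, not part of Apache POI: turns the String[] from
 * WordExtractor.getParagraphText() into a list of non-blank lines.
 * Assumes a paragraph may end with a carriage return, optionally
 * followed by a line feed.
 */
public final class ParagraphLines {

    private ParagraphLines() {
        // static utility; no instances
    }

    /** Strips trailing CR/LF characters and skips paragraphs that are then blank. */
    public static List<String> clean(String[] paragraphs) {
        List<String> lines = new ArrayList<>();
        for (String p : paragraphs) {
            if (p == null) {
                continue; // defensive: ignore missing entries
            }
            int end = p.length();
            // Walk back over any trailing carriage return / line feed characters.
            while (end > 0 && (p.charAt(end - 1) == '\r' || p.charAt(end - 1) == '\n')) {
                end--;
            }
            String line = p.substring(0, end);
            // Drop paragraphs that contain only whitespace, such as empty lines.
            if (!line.trim().isEmpty()) {
                lines.add(line);
            }
        }
        return lines;
    }
}
```

Assuming the POI and POI scratchpad jars (HWPF lives in the scratchpad) are on the classpath, the input would come from something like new WordExtractor(new FileInputStream("example.doc")).getParagraphText(), where example.doc is a placeholder path.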
 public String getText()
    Gets the text, built from the paragraphs. It should not include any extraneous characters ("crud"), but it is slightly slower than getTextFromPieces().
 public String getTextFromPieces()
    Gets the text directly from the text pieces. The result may include stray extraneous characters ("crud"), but this method works even when the mapping from text pieces to paragraphs is broken, and it is also fast.
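Since getTextFromPieces() can still work when the piece-to-paragraph mapping is broken, a caller might try the paragraph-based getText() first and fall back to getTextFromPieces() only when that fails. The sketch below is hypothetical (TextFallback is not a POI class) and rests on two assumptions: that a broken mapping surfaces as a runtime exception from the primary call, and that removing characters below U+0020 other than tab, carriage return and line feed is an acceptable way to clean the fallback text. Taking Supplier<String> arguments keeps the helper independent of POI, so it compiles and can be tested on its own.

```java
import java.util.function.Supplier;

/**
 * Hypothetical helper, not part of Apache POI: prefer a primary text source
 * (e.g. WordExtractor.getText) and fall back to a secondary one
 * (e.g. WordExtractor.getTextFromPieces) if the primary throws.
 */
public final class TextFallback {

    private TextFallback() {
        // static utility; no instances
    }

    /**
     * Returns primary.get() unchanged. If it throws a RuntimeException
     * (assumed here to signal a broken piece-to-paragraph mapping), returns
     * fallback.get() with control characters removed.
     */
    public static String extract(Supplier<String> primary, Supplier<String> fallback) {
        try {
            return primary.get();
        } catch (RuntimeException e) {
            return stripControlChars(fallback.get());
        }
    }

    /** Removes characters below U+0020 except tab, line feed and carriage return. */
    public static String stripControlChars(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x20 || c == '\t' || c == '\n' || c == '\r') {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

With an existing WordExtractor named extractor, the call would be TextFallback.extract(extractor::getText, extractor::getTextFromPieces). The filter is deliberately crude: it only drops control characters, so any crud made of ordinary printable characters would survive.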
 public static void main(String[] args) throws IOException
    Command-line extractor, so that people can run this class directly instead of writing their own wrapper.