This class extracts text from a PowerPoint file.
It can optionally extract the notes as well.
| Method detail for org.apache.poi.hslf.extractor.PowerPointExtractor: |
public void close() throws IOException {
    _hslfshow.close();
    _hslfshow = null;
    _show = null;
    _slides = null;
}
Shuts down the underlying streams |
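A caller would typically pair extraction with close() in a try/finally, so the streams are released even if extraction fails. The sketch below illustrates that pattern with a hypothetical stand-in class, StubExtractor, rather than the real PowerPointExtractor, so it runs without POI on the classpath. Only the close() contract (it throws IOException and drops internal references) is taken from the listing above; the IllegalStateException on use after close is an assumption of the sketch.

```java
import java.io.IOException;

// Hypothetical stand-in for PowerPointExtractor; it only mimics the close() contract.
class StubExtractor {
    private String[] slides = { "Slide one\n" };
    boolean closed = false;

    String getText() {
        if (slides == null) {
            // Assumption of this sketch: fail loudly when used after close()
            throw new IllegalStateException("extractor already closed");
        }
        return String.join("", slides);
    }

    void close() throws IOException {
        slides = null;   // mirror the listing: drop references once closed
        closed = true;
    }
}

class CloseSketch {
    // Extract the text and always close the extractor, even if getText() throws.
    static String extractAndClose(StubExtractor ex) throws IOException {
        try {
            return ex.getText();
        } finally {
            ex.close();
        }
    }
}
```

After extractAndClose returns, the stand-in is closed and must not be reused, which matches the way the real close() nulls out its fields.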
public String getNotes() {
    return getText(false, true);
}
Fetches all the notes text from the slideshow, but not the slide text |
public String getText() {
    return getText(slidesByDefault, notesByDefault);
}
Fetches all the slide text from the slideshow,
but not the notes, unless you have called
setSlidesByDefault() or setNotesByDefault()
to change these defaults |
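The interaction between the no-argument getText() and the two default flags can be sketched as follows. DefaultsSketch is a hypothetical stand-in, not the POI class: it returns fixed placeholder strings instead of real slide and note text, and it omits the blank separator line that the real two-argument getText() inserts. The initial flag values (slides on, notes off) follow the defaults documented for the setters on this page.

```java
// Hypothetical stand-in showing how the default flags steer getText().
class DefaultsSketch {
    private boolean slidesByDefault = true;   // documented default: slide text is returned
    private boolean notesByDefault = false;   // documented default: notes text is not returned

    void setSlidesByDefault(boolean slidesByDefault) {
        this.slidesByDefault = slidesByDefault;
    }

    void setNotesByDefault(boolean notesByDefault) {
        this.notesByDefault = notesByDefault;
    }

    // The no-argument form simply delegates using the current defaults.
    String getText() {
        return getText(slidesByDefault, notesByDefault);
    }

    // Placeholder text only; the real method walks the slides and notes sheets.
    String getText(boolean getSlideText, boolean getNoteText) {
        StringBuilder ret = new StringBuilder();
        if (getSlideText) {
            ret.append("slide text\n");
        }
        if (getNoteText) {
            ret.append("note text\n");
        }
        return ret.toString();
    }
}
```

Calling setNotesByDefault(true) before getText() would therefore add the notes, and calling setSlidesByDefault(false) would drop the slide text.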
public String getText(boolean getSlideText,
        boolean getNoteText) {
    StringBuilder ret = new StringBuilder();
    if (getSlideText) {
        for (int i = 0; i < _slides.length; i++) {
            Slide slide = _slides[i];
            TextRun[] runs = slide.getTextRuns();
            for (int j = 0; j < runs.length; j++) {
                TextRun run = runs[j];
                if (run != null) {
                    String text = run.getText();
                    ret.append(text);
                    // Restore the trailing newline if it was stripped
                    if (!text.endsWith("\n")) {
                        ret.append("\n");
                    }
                }
            }
        }
        if (getNoteText) {
            // Blank line between the slide text and the notes text
            ret.append("\n");
        }
    }
    if (getNoteText) {
        // Not currently using _notes, as that can have the notes of
        // master sheets in. Grab Slide list, then work from there,
        // but ensure no duplicates
        HashSet<Integer> seenNotes = new HashSet<Integer>();
        for (int i = 0; i < _slides.length; i++) {
            Notes notes = _slides[i].getNotesSheet();
            if (notes == null) { continue; }
            Integer id = Integer.valueOf(notes._getSheetNumber());
            if (seenNotes.contains(id)) { continue; }
            seenNotes.add(id);
            TextRun[] runs = notes.getTextRuns();
            if (runs != null && runs.length > 0) {
                for (int j = 0; j < runs.length; j++) {
                    TextRun run = runs[j];
                    // Skip missing runs, as the slide loop does
                    if (run == null) { continue; }
                    String text = run.getText();
                    ret.append(text);
                    if (!text.endsWith("\n")) {
                        ret.append("\n");
                    }
                }
            }
        }
    }
    return ret.toString();
}
Fetches text from the slideshow, be it slide text or note text.
Because the final block of text in a TextRun normally has its
last \n stripped, we add it back |
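The two ideas in this method, restoring the stripped trailing newline and skipping notes sheets that were already seen, can be tried in isolation with plain strings. The sketch below is a simplified model, not the POI implementation: GetTextSketch, its array-based parameters, and the use of -1 to mean "this slide has no notes sheet" are all assumptions made for illustration. It also skips null blocks in both loops.

```java
import java.util.HashSet;
import java.util.Set;

// Simplified model of getText(boolean, boolean) using plain strings instead of TextRuns.
class GetTextSketch {
    // Append each non-null block, adding back a trailing newline if it is missing.
    static void appendBlocks(StringBuilder out, String[] blocks) {
        for (String text : blocks) {
            if (text == null) {
                continue;
            }
            out.append(text);
            if (!text.endsWith("\n")) {
                out.append("\n");
            }
        }
    }

    // slideBlocks[i]: text blocks of slide i.
    // noteIds[i]: sheet number of slide i's notes, or -1 for no notes (assumption of this sketch).
    // noteBlocks[i]: text blocks of slide i's notes sheet.
    static String getText(String[][] slideBlocks, int[] noteIds, String[][] noteBlocks,
                          boolean getSlideText, boolean getNoteText) {
        StringBuilder ret = new StringBuilder();
        if (getSlideText) {
            for (String[] blocks : slideBlocks) {
                appendBlocks(ret, blocks);
            }
            if (getNoteText) {
                ret.append("\n");   // blank line between slide text and notes
            }
        }
        if (getNoteText) {
            Set<Integer> seenNotes = new HashSet<>();
            for (int i = 0; i < noteIds.length; i++) {
                if (noteIds[i] < 0) {
                    continue;       // slide has no notes sheet
                }
                if (!seenNotes.add(noteIds[i])) {
                    continue;       // notes sheet already emitted
                }
                appendBlocks(ret, noteBlocks[i]);
            }
        }
        return ret.toString();
    }
}
```

Set.add returns false when the element is already present, so one call covers both the contains() check and the add() from the listing.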
public static void main(String[] args) throws IOException {
    if (args.length < 1) {
        System.err.println("Usage:");
        System.err.println("\tPowerPointExtractor [-notes] <file>");
        System.exit(1);
    }
    boolean notes = false;
    String file;
    if (args.length > 1) {
        // Any extra leading argument is treated as -notes
        notes = true;
        file = args[1];
    } else {
        file = args[0];
    }
    PowerPointExtractor ppe = new PowerPointExtractor(file);
    System.out.println(ppe.getText(true, notes));
    ppe.close();
}
Basic extractor. Returns all the text, and optionally all the notes |
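Note that main() enables notes whenever more than one argument is given, without checking that the first one is actually -notes. A stricter parse might look like the sketch below. ArgsSketch and its parse method are hypothetical and not part of POI, and the choice to reject anything other than "[-notes] <file>" is an assumption about the intended behaviour rather than something the listing guarantees.

```java
// Hypothetical, stricter parser for the "[-notes] <file>" command line.
class ArgsSketch {
    final boolean notes;
    final String file;

    ArgsSketch(boolean notes, String file) {
        this.notes = notes;
        this.file = file;
    }

    // Returns null when the arguments do not match "[-notes] <file>".
    static ArgsSketch parse(String[] args) {
        if (args.length == 1 && !args[0].equals("-notes")) {
            return new ArgsSketch(false, args[0]);
        }
        if (args.length == 2 && args[0].equals("-notes")) {
            return new ArgsSketch(true, args[1]);
        }
        return null;
    }
}
```

A caller could print the usage message and exit whenever parse returns null, and otherwise pass the notes flag on to getText(true, notes).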
public void setNotesByDefault(boolean notesByDefault) {
    this.notesByDefault = notesByDefault;
}
Should a call to getText() return notes text?
Default is no |
public void setSlidesByDefault(boolean slidesByDefault) {
    this.slidesByDefault = slidesByDefault;
}
Should a call to getText() return slide text?
Default is yes |