HSLF is the POI Project's pure Java implementation of the PowerPoint file format.
+HSLF provides a way to read PowerPoint presentations and extract text from them. + It also provides some (currently limited) editing capabilities. +
+The quick guide documentation provides + information on using this API. Comments and fixes are gratefully accepted on the POI + dev mailing lists.
+ + +For basic text extraction, make use of
+org.apache.poi.hslf.extractor.PowerPointExtractor
. It accepts a file or an input
+stream. The getText()
method can be used to get the text from the slides,
+from the notes, or from both.
+
To get specific bits of text, first create an org.apache.poi.hslf.usermodel.SlideShow
+(from an org.apache.poi.hslf.HSLFSlideShow
, which accepts a file or an input
+stream). Use getSlides()
and getNotes()
to get the slides and notes.
+These can be queried to get their page ID (though they should be returned
+in the right order). You can also call getTextRuns()
on these, to get their
+blocks of text. From the TextRun
, you can extract the text, and check
+what type of text it is (e.g. Body, Title).
+
It is possible to change the text via TextRun.setText(String)
. However, if
+the length of the text changes, the resulting file is likely to break, because the
+PowerPoint format stores internal references as byte offsets, and not all of
+these are yet updated when the size changes.
+
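The byte-offset problem can be illustrated with a small self-contained model. This is only a sketch and is not POI code: the `OffsetDemo` and `Rec` classes, the `computeOffsets` and `sample` helpers, and the three-record layout are all hypothetical, and the real PowerPoint record structure is considerably more involved. It assumes records are written back to back and that the file keeps a stored table of each record's start offset.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model, not POI code: records are written back to back, and a
// stored table remembers the byte offset at which each record starts.
final class OffsetDemo {

    /** A named record holding raw bytes (illustrative only). */
    static final class Rec {
        final String name;
        byte[] payload;

        Rec(String name, String text) {
            this.name = name;
            this.payload = text.getBytes(StandardCharsets.US_ASCII);
        }

        /** Replaces the record's bytes, possibly changing its length. */
        void setText(String text) {
            this.payload = text.getBytes(StandardCharsets.US_ASCII);
        }
    }

    /** Computes the start offset of every record in write order. */
    static int[] computeOffsets(List<Rec> records) {
        int[] offsets = new int[records.size()];
        int position = 0;
        for (int i = 0; i < records.size(); i++) {
            offsets[i] = position;
            position += records.get(i).payload.length;
        }
        return offsets;
    }

    /** Builds a three-record sample in which every payload is 5 bytes long. */
    static List<Rec> sample() {
        List<Rec> records = new ArrayList<>();
        records.add(new Rec("title", "Hello"));
        records.add(new Rec("body", "World"));
        records.add(new Rec("notes", "Notes"));
        return records;
    }

    public static void main(String[] args) {
        List<Rec> records = sample();
        int[] stored = computeOffsets(records); // offsets as saved in the "file"

        // Same-length edit: the stored offsets still match the record starts.
        records.get(0).setText("HELLO");
        System.out.println(Arrays.equals(stored, computeOffsets(records)));

        // Longer text: later records move, so the stored offsets are stale.
        records.get(0).setText("Hello, there");
        System.out.println(Arrays.equals(stored, computeOffsets(records)));
    }
}
```

In this model a same-length replacement leaves the stored table valid, while a longer replacement shifts every later record. Any stored offset that is not recomputed then points into the wrong place, which is the kind of breakage described above.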
org.apache.poi.hslf.HSLFSlideShow
+ Handles reading in and writing out files. Generates a tree of the records
+ in the file
+ org.apache.poi.hslf.usermodel.SlideShow
+ Builds up model entries from the records, and presents a user facing
+ view of the file
+ org.apache.poi.hslf.extractor.PowerPointExtractor
+ Uses the model code to allow extraction of text from files
+