poi/src/documentation/content/xdocs/hslf/quick-guide.xml

<?xml version="1.0" encoding="UTF-8"?>
<!-- Copyright (C) 2004 The Apache Software Foundation. All rights reserved. -->
<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V1.1//EN" "../dtd/document-v11.dtd">

<document>
    <header>
        <title>POI-HSLF - A Quick Guide</title>
        <subtitle>Overview</subtitle>
        <authors>
            <person name="Nick Burch" email="nick at torchbox dot com"/>
        </authors>
    </header>

    <body>
        <section><title>Basic Text Extraction</title>
        <p>For basic text extraction, use
<code>org.apache.poi.hslf.extractor.PowerPointExtractor</code>. It accepts a file or an input
stream. The <code>getText()</code> method returns the text from the slides, and
the <code>getNotes()</code> method returns the text from the notes. Finally,
<code>getText(true,true)</code> returns the text from both.
		</p>
		</section>
		
		<section><title>Specific Text Extraction</title>
		<p>To get specific bits of text, first create an <code>org.apache.poi.hslf.usermodel.SlideShow</code>
(from an <code>org.apache.poi.hslf.HSLFSlideShow</code>, which accepts a file or an input
stream). Use <code>getSlides()</code> and <code>getNotes()</code> to get the slides and notes.
These can be queried for their page ID, though they should already be returned
in the right order.</p>
		<p>You can then call <code>getTextRuns()</code> on these to get
their blocks of text. (One <code>TextRun</code> normally holds all the text in a
given area of the page, e.g. in the title bar or in a box.)
From the <code>TextRun</code>, you can extract the text and check
what type of text it is (e.g. Body, Title). You can also call
<code>getRichTextRuns()</code>, which will return the
<code>RichTextRun</code>s that make up the <code>TextRun</code>. A
<code>RichTextRun</code> is a sequence of text that all has the
same character and paragraph formatting.
		</p>
		</section>
		
        <section><title>Poor Quality Text Extraction</title>
        <p>If speed is the most important thing for you, and you don't mind
		getting duplicate blocks of text, text from master sheets, and
		old (deleted) text, then
		<code>org.apache.poi.hslf.extractor.QuickButCruddyTextExtractor</code>
		might be of use.</p>
		<p><code>QuickButCruddyTextExtractor</code> doesn't use the normal record
		parsing code. Instead, it does a blind search through the tree
		structure to find all the text-holding records. You will get all the text,
		including lots of text you normally wouldn't want. However,
		you will get it back very fast.</p>
		<p>There are two ways of getting the text back. 
		<code>getTextAsString()</code> will return a single string with all
		the text in it. <code>getTextAsVector()</code> will return a 
		vector of strings, one for each text record found in the file.
		</p>
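		<p>To make the blind search idea concrete, here is a minimal,
		self-contained sketch. It is not the POI implementation: the class
		<code>BlindTextScan</code> and its helper methods are hypothetical names
		used only for illustration. It assumes the PowerPoint record layout of an
		8-byte little-endian header (version and instance, type, length), treats
		records whose version nibble is 0xF as containers, and collects the bodies
		of <code>TextCharsAtom</code> (type 4000) and <code>TextBytesAtom</code>
		(type 4008) records without building a record tree.</p>

<source><![CDATA[
```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Simplified blind scan over PowerPoint-style records (illustrative, not POI code). */
final class BlindTextScan {
    static final int TEXT_CHARS_ATOM = 4000; // body is UTF-16LE text
    static final int TEXT_BYTES_ATOM = 4008; // body holds one byte per character

    /** Walks the byte stream record by record and collects every text atom found. */
    static List<String> findText(byte[] data) {
        List<String> found = new ArrayList<>();
        int pos = 0;
        while (pos + 8 <= data.length) {
            int version = data[pos] & 0x0F;
            int type = readUShort(data, pos + 2);
            long length = readUInt(data, pos + 4);
            int bodyStart = pos + 8;
            if (version == 0x0F) {
                // Container: skip only the header so its children are scanned next.
                pos = bodyStart;
                continue;
            }
            if (length > data.length - bodyStart) {
                break; // truncated or corrupt record, stop scanning
            }
            int len = (int) length;
            if (type == TEXT_CHARS_ATOM) {
                found.add(new String(data, bodyStart, len, StandardCharsets.UTF_16LE));
            } else if (type == TEXT_BYTES_ATOM) {
                found.add(new String(data, bodyStart, len, StandardCharsets.ISO_8859_1));
            }
            pos = bodyStart + len; // jump over the body of this atom
        }
        return found;
    }

    static int readUShort(byte[] b, int off) {
        return (b[off] & 0xFF) | ((b[off + 1] & 0xFF) << 8);
    }

    static long readUInt(byte[] b, int off) {
        return (readUShort(b, off) & 0xFFFFL) | ((long) readUShort(b, off + 2) << 16);
    }

    /** Test helper: builds one record, an 8-byte header followed by the body. */
    static byte[] record(int version, int type, byte[] body) {
        byte[] out = new byte[8 + body.length];
        out[0] = (byte) (version & 0x0F);
        out[2] = (byte) type;
        out[3] = (byte) (type >>> 8);
        out[4] = (byte) body.length;
        out[5] = (byte) (body.length >>> 8);
        out[6] = (byte) (body.length >>> 16);
        out[7] = (byte) (body.length >>> 24);
        System.arraycopy(body, 0, out, 8, body.length);
        return out;
    }

    /** Test helper: joins several byte arrays into one. */
    static byte[] concat(byte[]... parts) {
        int total = 0;
        for (byte[] p : parts) {
            total += p.length;
        }
        byte[] out = new byte[total];
        int pos = 0;
        for (byte[] p : parts) {
            System.arraycopy(p, 0, out, pos, p.length);
            pos += p.length;
        }
        return out;
    }
}
```
]]></source>

		<p>Because a scan like this never consults the slide model, it cannot
		tell current text from old or master text, which is why this style of
		extraction is fast but returns text you would not normally want.</p>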
		</section>

		<section><title>Changing Text</title>
		<p>It is possible to change the text via 
		<code>TextRun.setText(String)</code> or
		<code>RichTextRun.setText(String)</code>. It is not yet possible
		to add additional TextRuns or RichTextRuns.</p>
		<p>When calling <code>TextRun.setText(String)</code>, all
		the text will end up with the same formatting. When calling
		<code>RichTextRun.setText(String)</code>, the text will retain
		the old formatting of that <code>RichTextRun</code>.
		</p>
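		<p>The difference between the two calls can be pictured with a toy
		model. The classes below are stand-ins written for this guide, not the
		HSLF classes: <code>SketchTextRun</code> and <code>SketchRichRun</code>
		are hypothetical names, a plain string stands in for the character and
		paragraph formatting, and reusing the first run's formatting when the
		whole text is replaced is an assumption of the sketch.</p>

<source><![CDATA[
```java
import java.util.ArrayList;
import java.util.List;

/** Stand-in for a RichTextRun: a stretch of text sharing one formatting. */
final class SketchRichRun {
    String text;
    final String style; // stand-in for character and paragraph formatting

    SketchRichRun(String text, String style) {
        this.text = text;
        this.style = style;
    }

    /** Replaces only this run's text; its formatting is left untouched. */
    void setText(String newText) {
        this.text = newText;
    }
}

/** Stand-in for a TextRun: all the text in one area of a slide. */
final class SketchTextRun {
    final List<SketchRichRun> runs = new ArrayList<>();

    /** Used only to build the example; it does not mirror any HSLF call. */
    SketchTextRun add(String text, String style) {
        runs.add(new SketchRichRun(text, style));
        return this;
    }

    /** Returns the text of all runs joined together. */
    String getText() {
        StringBuilder sb = new StringBuilder();
        for (SketchRichRun run : runs) {
            sb.append(run.text);
        }
        return sb.toString();
    }

    /** Replaces the whole text, leaving one run so all text shares one formatting. */
    void setText(String newText) {
        // Assumption of this sketch: the first run's formatting is the one kept.
        String style = runs.isEmpty() ? "default" : runs.get(0).style;
        runs.clear();
        runs.add(new SketchRichRun(newText, style));
    }
}
```
]]></source>

		<p>In this model, calling <code>setText</code> on the second of two
		runs changes only that run's text and keeps both formattings, while
		calling <code>setText</code> on the enclosing text run leaves a single
		run with a single formatting.</p>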
		</section>

		<section><title>Adding Slides</title>
		<p>You may add new slides by calling
		<code>SlideShow.createSlide()</code>, which will add a new slide
		to the end of the SlideShow. It is not currently possible to
		re-order slides or to add new text to slides; only
		adding Escher objects to new slides is supported.
		</p>
		</section>
		
		<section><title>Guide to key classes</title>
		<ul>
		<li><code>org.apache.poi.hslf.HSLFSlideShow</code>
		Handles reading in and writing out files. Calls
		<code>org.apache.poi.hslf.record.Record</code> to build a tree
		of all the records in the file, which it then gives access to.
		</li>
		<li><code>org.apache.poi.hslf.record.Record</code>
		Base class of all records. Also provides the main record generation
		code, which will build up a tree of records for a file.
		</li>
		<li><code>org.apache.poi.hslf.usermodel.SlideShow</code>
		Builds up model entries from the records, and presents a user-facing
		view of the file.
		</li>
		<li><code>org.apache.poi.hslf.model.Slide</code>
		A user-facing view of a slide in a slideshow. Allows you to get at the
		text of the slide, and at any drawing objects on it.
		</li>
  		<li><code>org.apache.poi.hslf.model.TextRun</code>
  Holds all the Text in a given area of the Slide, and will
  contain one or more <code>RichTextRun</code>s.
  		</li>
  		<li><code>org.apache.poi.hslf.usermodel.RichTextRun</code>
  Holds a run of text, all having the same character and
  paragraph stylings. It is possible to modify text, and/or text stylings.
  		</li>
		<li><code>org.apache.poi.hslf.extractor.PowerPointExtractor</code>
		Uses the model code to allow extraction of text from files.
		</li>
		<li><code>org.apache.poi.hslf.extractor.QuickButCruddyTextExtractor</code>
		Uses the record code to extract all the text from files very fast,
		but including deleted text (and other bits of crud).
		</li>
		</ul>
		</section>
	</body>
</document>