<?xml version="1.0" encoding="UTF-8"?>
<!--
   ====================================================================
   Licensed to the Apache Software Foundation (ASF) under one or more
   contributor license agreements.  See the NOTICE file distributed with
   this work for additional information regarding copyright ownership.
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License.  You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
   ====================================================================
-->
<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V1.1//EN" "../dtd/document-v11.dtd">
<document>
<header>
  <title>POI-HWPF - A Quick Guide</title>
  <subtitle>Overview</subtitle>
  <authors>
    <person name="Nick Burch" email="nick at torchbox dot com"/>
  </authors>
</header>
<body>
<p>HWPF is still in early development. It lives in the <link
  href="http://svn.apache.org/viewcvs.cgi/poi/trunk/src/scratchpad/">scratchpad
  section of the SVN repository</link>. To use it, you need either a recent
  SVN checkout or a recent SVN nightly build, including the scratchpad jar.</p>
<section><title>Basic Text Extraction</title>
<p>For basic text extraction, use
  <code>org.apache.poi.hwpf.extractor.WordExtractor</code>. It accepts an input
  stream or a <code>HWPFDocument</code>. The <code>getText()</code> method
  returns the text of all the paragraphs as a single string, while
  <code>getParagraphText()</code> returns the text of each paragraph
  separately. A third option, <code>getTextFromPieces()</code>, is very fast
  but tends to return things that aren't text from the page, so your mileage
  may vary.
</p>
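<p>As a rough sketch of the calls described above, the following reads a
  document through <code>WordExtractor</code>. The file name
  <code>sample.doc</code> and the class name <code>BasicTextExtraction</code>
  are placeholders chosen for illustration, and the scratchpad jar is assumed
  to be on the classpath.</p>
<source><![CDATA[
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.poi.hwpf.extractor.WordExtractor;

public class BasicTextExtraction {
    public static void main(String[] args) throws IOException {
        // "sample.doc" is a placeholder path, not a file shipped with POI
        InputStream in = new FileInputStream("sample.doc");
        try {
            WordExtractor extractor = new WordExtractor(in);

            // All of the text as a single string
            System.out.println(extractor.getText());

            // The text again, one array entry per paragraph
            String[] paragraphs = extractor.getParagraphText();
            for (String paragraph : paragraphs) {
                System.out.println(paragraph);
            }
        } finally {
            in.close();
        }
    }
}
]]></source>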
</section>
<section><title>Specific Text Extraction</title>
<p>To get specific pieces of text, first create an
  <code>org.apache.poi.hwpf.HWPFDocument</code>. Fetch its overall
  <code>Range</code> with <code>getRange()</code>, then get the paragraphs
  from that range. From each paragraph you can read the text and other
  properties.
</p>
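<p>One possible way to walk the paragraphs is sketched below. It assumes a
  placeholder file called <code>sample.doc</code>; the class name
  <code>SpecificTextExtraction</code> is made up for this example, and
  printing whether each character run is bold is only one illustration of the
  properties you can read.</p>
<source><![CDATA[
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.usermodel.CharacterRun;
import org.apache.poi.hwpf.usermodel.Paragraph;
import org.apache.poi.hwpf.usermodel.Range;

public class SpecificTextExtraction {
    public static void main(String[] args) throws IOException {
        // "sample.doc" is a placeholder path
        InputStream in = new FileInputStream("sample.doc");
        try {
            HWPFDocument doc = new HWPFDocument(in);

            // The Range covering the main text of the document
            Range range = doc.getRange();

            for (int i = 0; i < range.numParagraphs(); i++) {
                Paragraph paragraph = range.getParagraph(i);
                System.out.println("Paragraph " + i + ": " + paragraph.text());

                // Each paragraph is made up of character runs,
                // which carry formatting such as bold
                for (int j = 0; j < paragraph.numCharacterRuns(); j++) {
                    CharacterRun run = paragraph.getCharacterRun(j);
                    System.out.println("  bold=" + run.isBold()
                            + " text=" + run.text());
                }
            }
        } finally {
            in.close();
        }
    }
}
]]></source>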
</section>
<section><title>Headers and Footers</title>
<p>To get at the headers and footers of a Word document, first create an
  <code>org.apache.poi.hwpf.HWPFDocument</code>. Next, create an
  <code>org.apache.poi.hwpf.usermodel.HeaderStories</code>, passing it your
  <code>HWPFDocument</code>. The <code>HeaderStories</code> then gives you
  access to the headers and footers, including the first / even / odd page
  ones if they are defined in your document. <code>HeaderStories</code> also
  provides a method for removing any macros in the text, which is helpful
  because many headers and footers end up with macros in them.</p>
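<p>The sketch below shows roughly how this might look, using the class under
  its name in the POI source tree,
  <code>org.apache.poi.hwpf.usermodel.HeaderStories</code>. The file name
  <code>sample.doc</code> and the class name <code>HeadersAndFooters</code>
  are placeholders. The call to <code>setAreFieldsStripped(true)</code> is
  assumed to be the macro-removal switch mentioned above; check it against
  the version of POI you are using.</p>
<source><![CDATA[
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.usermodel.HeaderStories;

public class HeadersAndFooters {
    public static void main(String[] args) throws IOException {
        // "sample.doc" is a placeholder path
        InputStream in = new FileInputStream("sample.doc");
        try {
            HWPFDocument doc = new HWPFDocument(in);
            HeaderStories stories = new HeaderStories(doc);

            // Assumption: this asks HeaderStories to strip fields
            // (the "macros") from the text it returns
            stories.setAreFieldsStripped(true);

            // First / even / odd page headers and footers
            System.out.println("First header: " + stories.getFirstHeader());
            System.out.println("Even header:  " + stories.getEvenHeader());
            System.out.println("Odd header:   " + stories.getOddHeader());
            System.out.println("First footer: " + stories.getFirstFooter());
            System.out.println("Even footer:  " + stories.getEvenFooter());
            System.out.println("Odd footer:   " + stories.getOddFooter());
        } finally {
            in.close();
        }
    }
}
]]></source>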
</section>
<section><title>Changing Text</title>
<p>You can change the text by calling <code>insertBefore()</code> and
  <code>insertAfter()</code> on a <code>Range</code> object (a plain
  <code>Range</code>, a <code>Paragraph</code> or a
  <code>CharacterRun</code>). You can also delete a <code>Range</code>.
  This code works in many, but not all, cases, and patches to improve it
  are gratefully received!
</p>
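<p>A minimal sketch of such an edit follows. The file names
  <code>sample.doc</code> and <code>changed.doc</code>, the class name
  <code>ChangingText</code> and the inserted strings are all placeholders.
  The result is written to a separate file so that the original is left
  untouched if the change does not work for your document.</p>
<source><![CDATA[
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.usermodel.Paragraph;
import org.apache.poi.hwpf.usermodel.Range;

public class ChangingText {
    public static void main(String[] args) throws IOException {
        HWPFDocument doc;

        // "sample.doc" is a placeholder input path
        InputStream in = new FileInputStream("sample.doc");
        try {
            doc = new HWPFDocument(in);
        } finally {
            in.close();
        }

        Range range = doc.getRange();
        Paragraph first = range.getParagraph(0);

        // Add text immediately before and after the first paragraph
        first.insertBefore("Text added before the first paragraph. ");
        first.insertAfter("Text added after the first paragraph. ");

        // Any Range can also be removed, for example:
        //   first.getCharacterRun(0).delete();

        // "changed.doc" is a placeholder output path
        OutputStream out = new FileOutputStream("changed.doc");
        try {
            doc.write(out);
        } finally {
            out.close();
        }
    }
}
]]></source>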
</section>
<section><title>Further Examples</title>
<p>For now, the best source of additional examples is the unit
  tests. <link
  href="http://svn.apache.org/viewvc/poi/trunk/src/scratchpad/testcases/org/apache/poi/hwpf/">
  Browse the HWPF unit tests.</link>
</p>
</section>
</body>
</document>