more HWPF documentation

git-svn-id: https://svn.apache.org/repos/asf/poi/trunk@1155227 13f79535-47bb-0310-9956-ffa450edef68
Sergey Vladimirov 2011-08-09 06:47:01 +00:00
parent f121ef99a2
commit 15b45e1091


@ -48,15 +48,63 @@
either have a recent SVN checkout, or a recent SVN nightly build
(including the scratchpad jar!)</p>
<p>Source in the
<em>org.apache.poi.hwpf.model</em> tree is the old legacy code refactored
into an object model. Source code in the
<em>org.apache.poi.hwpf.extractor</em> tree is a wrapper of this to
facilitate easy extraction of interesting things (eg the Text).
Source code in the <em>org.apache.poi.hdf</em> tree is the old legacy
code.
<p>
Source code in the
<em>org.apache.poi.hdf</em>
tree is the old legacy code. Source code in the
<em>org.apache.poi.hwpf.model</em>
tree is that legacy code refactored into a new object model. These packages contain
Java representations of the internal Word format structures. This code is internal and should not
be used by your code. For backward-compatibility reasons, some APIs still reference
these packages; these are subject to deprecation and removal. The
<em>org.apache.poi.hwpf.usermodel</em>
package is the actual public, user-friendly (as far as possible) API for accessing document
parts. Source code in the
<em>org.apache.poi.hwpf.extractor</em>
tree is a wrapper around this that makes it easy to extract interesting things (e.g. the text),
and the
<em>org.apache.poi.hwpf.converter</em>
package contains Word-to-HTML and Word-to-FO converters (the latter can be used to generate PDF
from Word files when used with
<a href="http://xmlgraphics.apache.org/fop/">Apache FOP</a>
). There is also a small file-structure-dumping utility in the
<em>org.apache.poi.hwpf.dev</em>
package, primarily for development purposes.
</p>
<p>
The main entry point to HWPF is HWPFDocument. It currently has many references both to
internal interfaces (the
<em>org.apache.poi.hwpf.model</em>
package) and to the public API (the
<em>org.apache.poi.hwpf.usermodel</em>
package). It may be split into two different interfaces (such as WordFile
and WordDocument) in later versions.
</p>
<p>A Word document can be regarded as a single, very long text buffer. The HWPF API provides "pointers"
to document parts such as sections, paragraphs and character runs. Typically a user iterates
over the sections of the main document part, the paragraphs of each section and the character runs of each
paragraph. Each such object is a pointer to a subrange of the document text, along with additional
properties (they all extend the same Range parent class). There are additional Range
implementations such as Table, TableRow and TableCell. Some structures, such as Bookmark or Field,
can also provide subrange pointers.
</p>
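<p>The "pointer into one long buffer" idea can be illustrated with a small, self-contained toy model.
This is only a sketch: TextSpan and SpanDemo are hypothetical names invented for this example and are
not part of HWPF, and splitting on a carriage return as the paragraph mark is an assumption made for
illustration. It shows that a parent span and its child spans share the same buffer and differ only in
their start and end offsets.
</p>

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: these classes are hypothetical and are NOT the HWPF Range API.
final class TextSpan {
    final StringBuilder buffer; // the single shared document buffer
    final int start;            // inclusive offset into buffer
    final int end;              // exclusive offset into buffer

    TextSpan(StringBuilder buffer, int start, int end) {
        this.buffer = buffer;
        this.start = start;
        this.end = end;
    }

    // The text this pointer currently covers.
    String text() {
        return buffer.substring(start, end);
    }

    // Child spans, one per chunk terminated by the given mark (the mark stays
    // inside the chunk). Trailing text without a mark forms a final chunk.
    List<TextSpan> split(char mark) {
        List<TextSpan> children = new ArrayList<>();
        int chunkStart = start;
        for (int i = start; i < end; i++) {
            if (buffer.charAt(i) == mark) {
                children.add(new TextSpan(buffer, chunkStart, i + 1));
                chunkStart = i + 1;
            }
        }
        if (chunkStart < end) {
            children.add(new TextSpan(buffer, chunkStart, end));
        }
        return children;
    }
}

final class SpanDemo {
    public static void main(String[] args) {
        // Assumption for illustration: '\r' ends each paragraph.
        StringBuilder doc = new StringBuilder("First paragraph.\rSecond one.\r");
        TextSpan overall = new TextSpan(doc, 0, doc.length());
        for (TextSpan paragraph : overall.split('\r')) {
            // Print the offsets and the paragraph text without its trailing mark.
            System.out.println(paragraph.start + ".." + paragraph.end + ": " + paragraph.text().trim());
        }
    }
}
```

<p>Because every span stores only offsets into the shared buffer, any edit that changes the buffer
length shifts the text under spans that were computed earlier, which is the motivation for the rules
on changing content.
</p>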
<p>Changing file content usually requires many synchronized changes to these structures, such as
updating property boundaries, position handlers, etc. Because of that, the HWPF API should be
considered not thread-safe. In addition, there is a "one pointer" rule for changing
content: you should not use two different Range instances at the same time. More
precisely, if you change file content through one range pointer, all other range
pointers except its parents become invalid. For example, if you obtain the overall range (1),
a paragraph range (2) from the overall range and a character run range (3) from the paragraph range, and
then change the text of the paragraph, the character run range is now invalid and should not be used, but
the overall range pointer is still valid. Each time you obtain a range (pointer), a new instance is
created. This means that if you obtain two range pointers and change the document text through the first
one, the second one becomes invalid.
</p>
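<p>The "one pointer" rule can be made concrete with a toy model. This is a sketch under stated
assumptions, not how HWPF is implemented: ToyDocument, ToyRange, the version counter and the isValid()
check are all hypothetical and exist only in this example. The real API is described above only as
leaving other range pointers invalid after an edit. The sketch mirrors the overall range (1), paragraph
range (2), character run range (3) scenario: editing through the paragraph keeps the paragraph and its
parent usable, while the character run obtained earlier becomes stale.
</p>

```java
// Toy model only: hypothetical classes, NOT the HWPF implementation.
final class ToyDocument {
    final StringBuilder text;
    int version = 0; // bumped on every edit

    ToyDocument(String text) {
        this.text = new StringBuilder(text);
    }

    // Every call creates a new pointer instance covering the whole text.
    ToyRange overall() {
        return new ToyRange(this, null, 0, text.length());
    }
}

final class ToyRange {
    private final ToyDocument doc;
    private final ToyRange parent; // null for the overall range
    private final int start;
    private int end;
    private int seenVersion;       // document version this pointer is in sync with

    ToyRange(ToyDocument doc, ToyRange parent, int start, int end) {
        this.doc = doc;
        this.parent = parent;
        this.start = start;
        this.end = end;
        this.seenVersion = doc.version;
    }

    // New child pointer; offsets are relative to the start of this range.
    ToyRange sub(int from, int to) {
        checkValid();
        if (from < 0 || from > to || to > end - start) {
            throw new IllegalArgumentException("subrange out of bounds");
        }
        return new ToyRange(doc, this, start + from, start + to);
    }

    boolean isValid() {
        return seenVersion == doc.version;
    }

    String text() {
        checkValid();
        return doc.text.substring(start, end);
    }

    // Replaces the text of this range. Only this pointer and its parents are
    // adjusted; every other pointer keeps its old version and becomes stale.
    void replaceText(String replacement) {
        checkValid();
        int delta = replacement.length() - (end - start);
        doc.text.replace(start, end, replacement);
        doc.version++;
        for (ToyRange r = this; r != null; r = r.parent) {
            r.end += delta;
            r.seenVersion = doc.version;
        }
    }

    private void checkValid() {
        if (!isValid()) {
            throw new IllegalStateException("range pointer is stale; obtain a new one");
        }
    }
}

final class OnePointerDemo {
    public static void main(String[] args) {
        ToyDocument doc = new ToyDocument("Hello world.\r");
        ToyRange overall = doc.overall();          // (1)
        ToyRange paragraph = overall.sub(0, 13);   // (2)
        ToyRange run = paragraph.sub(0, 5);        // (3)

        paragraph.replaceText("Goodbye.\r");

        System.out.println("overall valid:   " + overall.isValid());
        System.out.println("paragraph valid: " + paragraph.isValid());
        System.out.println("run valid:       " + run.isValid());
    }
}
```

<p>In this toy model a stale pointer fails loudly. The text above does not promise any such check in
HWPF itself, so the safe habit is to obtain a new range pointer after every change rather than reuse
one obtained earlier.
</p>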
</section>
<section>
<title>XWPF Patches Required!</title>