HDF -> HWPF : HDF directory is obsolete
git-svn-id: https://svn.apache.org/repos/asf/jakarta/poi/trunk@353299 13f79535-47bb-0310-9956-ffa450edef68
parent b0604e9210
commit 8ff11e457e
@@ -1,12 +0,0 @@
<?xml version="1.0"?>
<!DOCTYPE book PUBLIC "-//APACHE//DTD Cocoon Documentation Book V1.0//EN" "../dtd/book-cocoon-v10.dtd">
<book software="POI Project" title="HDF" copyright="@year@ POI Project">
<menu label="Jakarta POI">
<menu-item label="Top" href="../index.html"/>
</menu>
<menu label="HWPF">
<menu-item label="Overview" href="index.html"/>
<menu-item label="HWPF Format" href="docoverview.html"/>
<menu-item label="HWPF Project plan" href="projectplan.html"/>
</menu>
</book>
@@ -1,94 +0,0 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V1.1//EN" "../dtd/document-v11.dtd">

<document>
<header>
<title>HDF</title>
<subtitle>Word file format</subtitle>
<authors>
<person name="S. Ryan Ackley" email="sackley@cfl.rr.com"/>
</authors>
</header>

<body>
<section><title>The Word 97 File Format in semi-plain English</title>
<p>The purpose of this document is to give a brief, high-level overview of the
Word 97 document format as read by HDF. This document does not go into in-depth technical
detail and is only meant as a supplement to the Microsoft Word 97 Binary
File Format specification, freely available at <link href="http://wotsit.org">Wotsit.org</link>.</p>
<p>The OLE file format is not discussed in this document. It is assumed that
the reader has a working knowledge of the POIFS API.</p>

<section><title>Word file structure</title>
<p>A Word file is made up of the document text and data structures
containing formatting information about the text. This is, of course, a
very simplified picture: fields, macros, and other features
are not considered here. At this stage, HDF is mainly
concerned with formatted text.</p>
</section>
<section><title>Reading Word files</title>
<p>The entry point for HDF's reading of a Word file is the File Information
Block (FIB). This structure records the locations and sizes
of a document's text and data structures. The FIB is located at the
beginning of the main stream (the WordDocument stream).</p>
<section><title>Text</title>
<p>The document's text is also located in the main stream. Its starting
location is given by FIB.fcMin and its length, in characters, by
FIB.ccpText. These two values are not very useful for getting the text,
because Unicode text may be intermingled with 8-bit (non-Unicode)
text. That brings us to the piece table.</p>
<p>The piece table divides the text into non-Unicode and Unicode
pieces. Its offset and size are given by FIB.fcClx and FIB.lcbClx
respectively. The piece table may also contain Property Modifiers (prm);
these are used by complex (fast-saved) files and are skipped. Each text piece
records the offset in the main stream of the text for that piece.
The second most significant bit (0x40000000) of that offset tells the two
kinds apart: if it is clear, the value is the real file offset of Unicode
text; if it is set, the piece holds 8-bit text, and you have to clear the
bit and divide by 2 to get the real file offset.</p>
</section>
<section><title>Text Formatting</title>
<section><title>Stylesheet</title>
<p>All text formatting is based on styles contained in the StyleSheet.
The StyleSheet is a data structure containing, among other things, style
descriptions. Each style description contains either a paragraph style and
a character style, or a character style alone. Each style description
is stored on file in compressed form: essentially, as deltas
from another (base) style.</p>
<p>Following each style's base style, the chain eventually ends at the nil
style, which is an imaginary style with certain implied values.</p>
</section>
<section><title>Paragraph and Character styles</title>
<p>Paragraph and character formatting properties for a document's text are
stored on file as deltas from some base style in the StyleSheet. The
deltas are used to create a complete, uncompressed style in memory.</p>
<p>Uncompressed paragraph styles are represented by the Paragraph
Properties (PAP) data structure. Uncompressed character styles are
represented by the Character Properties (CHP) data structure. The styles
for the document text are stored in compressed format in the
corresponding Formatted Disk Pages (FKPs). A compressed PAP is referred
to as a PAPX and a compressed CHP as a CHPX. The FKP locations are
stored in the bin tables; there are separate bin tables for CHPXs and
PAPXs. The bin tables' locations and sizes are stored in the FIB.</p>
<p>An FKP is a 512-byte page in the main stream. It contains the offsets of
the beginning and end of each paragraph or character run in the main stream,
and the compressed properties for that interval. A compressed PAPX is based on
its base style in the StyleSheet. A compressed CHPX is based on the
enclosing paragraph's base style in the StyleSheet.</p>
</section>
<section><title>Uncompressing styles and other data structures</title>
<p>All compressed properties (CHPX, PAPX, SEPX) contain a grpprl. A grpprl
is an array of sprms. A sprm defines a delta from some base property.
There is a table of possible sprms in the Word 97 spec. Each sprm is a
two-byte opcode followed by a parameter whose size depends on
the sprm. Each sprm describes an operation that should be performed on
the base style. After every sprm in the grpprl has been applied to the base
style, you have the style for the paragraph, character run,
section, etc.</p>
</section>
</section>
</section>
</section>
</body>
</document>
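The piece-table rule in the Word format document above can be made concrete. The class below is a minimal, hypothetical helper (it is not HDF or HWPF API) that decodes the file offset stored for one text piece, following the Word 97 binary format specification: if bit 30 (0x40000000) is set, the piece holds 8-bit text and its real offset is the value with that bit cleared, divided by 2; if the bit is clear, the piece holds Unicode (UTF-16LE, two bytes per character) text at exactly that offset. The class and field names are illustrative assumptions.

```java
/**
 * Hypothetical helper: decodes the file offset (fc) stored for one piece in
 * a Word 97 piece table, following the Word 97 binary format specification.
 */
public final class PieceOffsets {

    /** Bit 30, the second most significant bit of the 32-bit fc value. */
    public static final int COMPRESSED_FLAG = 0x40000000;

    /** Where a piece's text starts in the main stream, and how it is encoded. */
    public static final class PieceLocation {
        public final int fileOffset;
        public final boolean unicode;

        PieceLocation(int fileOffset, boolean unicode) {
            this.fileOffset = fileOffset;
            this.unicode = unicode;
        }

        /** Bytes occupied by charCount characters of this piece. */
        public int byteLength(int charCount) {
            return unicode ? charCount * 2 : charCount;
        }
    }

    private PieceOffsets() {}

    public static PieceLocation resolve(int fc) {
        if ((fc & COMPRESSED_FLAG) != 0) {
            // 8-bit (codepage 1252) piece: clear the flag, then divide by 2.
            return new PieceLocation((fc & ~COMPRESSED_FLAG) / 2, false);
        }
        // Unicode piece: fc is already the real offset of UTF-16LE text.
        return new PieceLocation(fc, true);
    }

    public static void main(String[] args) {
        PieceLocation compressed = resolve(0x40000800);
        PieceLocation unicode = resolve(0x00000400);
        System.out.println(compressed.fileOffset + " " + compressed.unicode); // prints 1024 false
        System.out.println(unicode.fileOffset + " " + unicode.unicode);       // prints 1024 true
    }
}
```

Because FIB.ccpText and the piece boundaries count characters rather than bytes, `byteLength` doubles the count only for Unicode pieces.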
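The FKP description in the same document can be sketched in the same spirit. The class below is a hypothetical illustration (not HDF or HWPF API) of reading run boundaries from a character-property (CHPX) FKP. It assumes the layout given in the Word 97 specification: a bin-table entry stores a page number, so the FKP starts at that number times 512; the last byte of the page is crun, the number of runs; the page begins with crun + 1 little-endian 4-byte main-stream offsets bounding the runs; and these are followed by crun one-byte word offsets locating each run's CHPX within the page. PAPX FKPs use a different per-run entry layout and are not covered here.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/**
 * Hypothetical helper: reads run boundaries from a 512-byte CHPX FKP,
 * assuming the Word 97 specification's layout.
 */
public final class ChpxFkpSketch {

    public static final int FKP_SIZE = 512;

    private ChpxFkpSketch() {}

    /** Main-stream offset of the FKP whose page number a bin table stores. */
    public static long pageOffset(int pageNumber) {
        return (long) pageNumber * FKP_SIZE;
    }

    /** crun: the number of runs, stored in the last byte of the page. */
    public static int runCount(byte[] page) {
        return page[FKP_SIZE - 1] & 0xFF;
    }

    /** rgfc[i]: start offset of run i; i == crun gives the end of the last run. */
    public static int boundary(byte[] page, int i) {
        return ByteBuffer.wrap(page).order(ByteOrder.LITTLE_ENDIAN).getInt(i * 4);
    }

    /** Byte offset of run i's CHPX within the page; 0 means the run has no CHPX. */
    public static int chpxOffset(byte[] page, int i) {
        int wordOffset = page[(runCount(page) + 1) * 4 + i] & 0xFF;
        return wordOffset * 2;
    }

    public static void main(String[] args) {
        // Synthetic one-run page: text from 0x400 to 0x410, CHPX at byte 256.
        byte[] page = new byte[FKP_SIZE];
        ByteBuffer buf = ByteBuffer.wrap(page).order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt(0, 0x400);
        buf.putInt(4, 0x410);
        page[8] = (byte) 0x80;
        page[FKP_SIZE - 1] = 1;
        for (int i = 0; i < runCount(page); i++) {
            System.out.println(Integer.toHexString(boundary(page, i)) + "-"
                    + Integer.toHexString(boundary(page, i + 1))
                    + " chpx@" + chpxOffset(page, i)); // prints 400-410 chpx@256
        }
    }
}
```

Since each run's end is the next run's start, the page stores one more offset than it has runs; that is why `boundary` accepts `i == crun`.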
@@ -1,34 +0,0 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V1.1//EN" "../dtd/document-v11.dtd">

<document>
<header>
<title>Jakarta POI - HDF -Java APIs with XML manipulate MS-Word</title>
<subtitle>Overview</subtitle>
<authors>
<person name="Nicola Ken Barozzi" email="barozzi@nicolaken.com"/>
<person name="Andrew C. Oliver" email="acoliver@apache.org"/>
<person name="Ryan Ackley" email="sackley@apache.org"/>
</authors>
</header>

<body>
<section><title>Overview</title>

<p>HDF is the name of our port of the Microsoft Word 97 (through 2002) file format to
pure Java.</p>
<p>HDF is still in early development. It is in the
<link href="http://cvs.apache.org/viewcvs/jakarta-poi/src/scratchpad/">scratchpad section of
CVS</link>. Source code in the <em>org.apache.poi.hdf.extractor</em> tree is
legacy code. Source in the <em>org.apache.poi.hdf.model</em>
tree is the old legacy code refactored into an object model. Check the How-To
page for detailed examples of using HDF.
</p>
<p>
We are looking for developers! If you are interested in helping with HDF,
familiarize yourself with the source code and just start coding. Make sure
you read the guidelines for <link href="http://jakarta.apache.org/poi/getinvolved/index.html">
getting involved</link>.</p>
</section>
</body>
</document>
@@ -1,367 +0,0 @@
<?xml version="1.0"?>
<!-- edited with XMLSPY v5 rel. 4 U (http://www.xmlspy.com) by Ryan Ackley (Myself) -->
<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V1.1//EN" "../dtd/document-v11.dtd">
<document>
<body>
<p>HWPF Milestones</p>
<table>
<tr>
<th>
Milestones
</th>
<th>
Target Date
</th>
<th>
Owner
</th>
</tr>
<tr>
<td>
Read in a Word document
with minimum formatting
(no lists, tables, footnotes,
endnotes, headers, footers)
and write it back out with the
result viewable in Word
97/2000
</td>
<td>
07/11/2003
</td>
<td>
Ryan
</td>
</tr>
<tr>
<td>
Add support for Lists and
Tables
</td>
<td>
8/15/2003
</td>
<td>
</td>
</tr>
<tr>
<td>
HWPF 1.0-alpha release with
documentation and examples
</td>
<td>
8/18/2003
</td>
<td>
Praveen/Ryan
</td>
</tr>
<tr>
<td>
Add support for Headers,
Footers, endnotes, and
footnotes
</td>
<td>
8/31/2003
</td>
<td>
?
</td>
</tr>
<tr>
<td>
Add support for forms and
mail merge
</td>
<td>
September/October 2003
</td>
<td>
?
</td>
</tr>
</table>
<p>HWPF Task Lists</p>
<p>Read in a Word document with minimum formatting (no lists, tables, footnotes,
endnotes, headers, footers) and write it back out with the result viewable in Word 97/2000</p>
<table>
<tr>
<th>
Task
</th>
<th>
Target Date
</th>
<th>
Owner
</th>
</tr>
<tr>
<td>
Create classes to read and
write low level data
structures with test cases
</td>
<td>
7/10/2003
</td>
<td>
Ryan
</td>
</tr>
<tr>
<td>
Create classes to read and
write FontTable and Font
names with test case
</td>
<td>
7/10/2003
</td>
<td>
Praveen
</td>
</tr>
<tr>
<td>
Final test
</td>
<td>
7/11/2003
</td>
<td>
Ryan
</td>
</tr>
</table>
<p>Develop a user-friendly API so it is fun and easy to read and write Word documents
with Java.</p>
<table>
<tr>
<th>
Task
</th>
<th>
Target Date
</th>
<th>
Owner
</th>
</tr>
<tr>
<td>
Develop a way for SPRMS to
be compressed and
uncompressed
</td>
<td>
</td>
<td>
</td>
</tr>
<tr>
<td>
Override CHPAbstractType
with a concrete class that
exposes attributes with
human readable names
</td>
<td>
</td>
<td>
</td>
</tr>
<tr>
<td>
Override PAPAbstractType
with a concrete class that
exposes attributes with
human readable names
</td>
<td>
</td>
<td>
</td>
</tr>
<tr>
<td>
Override SEPAbstractType
with a concrete class that
exposes attributes with
human readable names
</td>
<td>
</td>
<td>
</td>
</tr>
<tr>
<td>
Override DOPAbstractType
with a concrete class that
exposes attributes with
human readable names
</td>
<td>
</td>
<td>
</td>
</tr>
<tr>
<td>
Override TAPAbstractType
with a concrete class that
exposes attributes with
human readable names
</td>
<td>
</td>
<td>
</td>
</tr>
<tr>
<td>
Override TCAbstractType
with a concrete class that
exposes attributes with
human readable names
</td>
<td>
</td>
<td>
</td>
</tr>
<tr>
<td>
Develop a VerifyIntegrity
class for testing so it is easy
to determine if a Word
Document is well-formed.
</td>
<td>
</td>
<td>
</td>
</tr>
<tr>
<td>
Develop general intuitive
API to tie everything together
</td>
<td>
</td>
<td>
</td>
</tr>
</table>
<p>Add support for lists and tables</p>
<table>
<tr>
<th>
Task
</th>
<th>
Target Date
</th>
<th>
Owner
</th>
</tr>
<tr>
<td>
Add data structures for
reading and writing list data
with test cases.
</td>
<td>
</td>
<td>
</td>
</tr>
<tr>
<td>
Add data structures for
reading and writing tables
with test cases.
</td>
<td>
</td>
<td>
</td>
</tr>
</table>
<p>HWPF 1.0-alpha release with documentation and examples</p>
<table>
<tr>
<th>
Task
</th>
<th>
Target Date
</th>
<th>
Owner
</th>
</tr>
<tr>
<td>
Document the user model
API
</td>
<td>
</td>
<td>
</td>
</tr>
<tr>
<td>
Document the low level
classes
</td>
<td>
</td>
<td>
</td>
</tr>
<tr>
<td>
Come up with detailed How-Tos
</td>
<td>
</td>
<td>
</td>
</tr>
</table>
</body>
</document>
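The project plan above lists the task "Develop a way for SPRMS to be compressed and uncompressed", and the Word format document in this commit describes a grpprl as an array of sprms whose parameter size depends on the sprm. As a hypothetical starting point for the uncompressing side (not HDF or HWPF API), the sketch below splits a grpprl into sprms. It assumes the Word 97 specification's encoding: opcodes are two bytes, little-endian, and the top three bits of the opcode (spra) select an operand size of 1, 1, 2, 4, 2, 2, variable, or 3 bytes, where in the variable case a one-byte length precedes the operand. A few sprms that the spec encodes specially (for example sprmTDefTable) are not handled, and applying each sprm to a base PAP or CHP is left out.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper: splits a Word 97 grpprl into its sprms, assuming the
 * Word 97 opcode layout in which the top three bits (spra) give the operand size.
 */
public final class GrpprlSketch {

    /** One sprm: its opcode and where its operand lies in the grpprl. */
    public static final class Sprm {
        public final int opcode;
        public final int operandOffset;
        public final int operandSize;

        Sprm(int opcode, int operandOffset, int operandSize) {
            this.opcode = opcode;
            this.operandOffset = operandOffset;
            this.operandSize = operandSize;
        }
    }

    // Operand size in bytes for each spra value; -1 marks variable length (spra 6).
    private static final int[] OPERAND_SIZE = {1, 1, 2, 4, 2, 2, -1, 3};

    private GrpprlSketch() {}

    public static List<Sprm> split(byte[] grpprl) {
        List<Sprm> sprms = new ArrayList<>();
        int pos = 0;
        while (pos + 2 <= grpprl.length) {
            // Opcodes are two bytes, little-endian.
            int opcode = (grpprl[pos] & 0xFF) | ((grpprl[pos + 1] & 0xFF) << 8);
            pos += 2;
            int size = OPERAND_SIZE[opcode >>> 13];
            if (size < 0) {
                // Variable length: one length byte precedes the operand.
                // Simplification: specially encoded sprms (e.g. sprmTDefTable) are not handled.
                if (pos >= grpprl.length) {
                    throw new IllegalArgumentException("truncated sprm at " + (pos - 2));
                }
                size = grpprl[pos] & 0xFF;
                pos += 1;
            }
            if (pos + size > grpprl.length) {
                throw new IllegalArgumentException("truncated operand at " + pos);
            }
            sprms.add(new Sprm(opcode, pos, size));
            pos += size;
        }
        return sprms;
    }

    public static void main(String[] args) {
        // sprmCFBold (0x0835, 1-byte operand) then sprmCHps (0x4A43, 2-byte operand).
        byte[] grpprl = {0x35, 0x08, 0x01, 0x43, 0x4A, 0x18, 0x00};
        for (Sprm s : split(grpprl)) {
            System.out.printf("0x%04X: %d byte(s) at %d%n", s.opcode, s.operandSize, s.operandOffset);
        }
    }
}
```

Returning offsets rather than copied operands keeps the sketch small; a fuller implementation would dispatch on each opcode to update a PAP, CHP, or SEP.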