HDF -> HWPF : HDF directory is obsolete

git-svn-id: https://svn.apache.org/repos/asf/jakarta/poi/trunk@353299 13f79535-47bb-0310-9956-ffa450edef68
Tetsuya Kitahata 2003-08-08 03:00:07 +00:00
parent b0604e9210
commit 8ff11e457e
4 changed files with 0 additions and 507 deletions


@ -1,12 +0,0 @@
<?xml version="1.0"?>
<!DOCTYPE book PUBLIC "-//APACHE//DTD Cocoon Documentation Book V1.0//EN" "../dtd/book-cocoon-v10.dtd">
<book software="POI Project" title="HDF" copyright="@year@ POI Project">
<menu label="Jakarta POI">
<menu-item label="Top" href="../index.html"/>
</menu>
<menu label="HWPF">
<menu-item label="Overview" href="index.html"/>
<menu-item label="HWPF Format" href="docoverview.html"/>
<menu-item label="HWPF Project plan" href="projectplan.html"/>
</menu>
</book>


@ -1,94 +0,0 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V1.1//EN" "../dtd/document-v11.dtd">
<document>
<header>
<title>HDF</title>
<subtitle>Word file format</subtitle>
<authors>
<person name="S. Ryan Ackley" email="sackley@cfl.rr.com"/>
</authors>
</header>
<body>
<section><title>The Word 97 File Format in semi-plain English</title>
<p>The purpose of this document is to give a brief high level overview of the
HDF document format. This document does not go into in-depth technical
detail and is only meant as a supplement to the Microsoft Word 97 Binary
File Format freely available at <link href="http://wotsit.org">Wotsit.org</link>.</p>
<p>The OLE file format is not discussed in this document. It is assumed that
the reader has a working knowledge of the POIFS API. </p>
<section><title>Word file structure</title>
<p>A Word file is made up of the document text and data structures
containing formatting information about the text. Of course, this is a
very simplified illustration. There are fields and macros and other
things that have not been considered. At this stage, HDF is mainly
concerned with formatted text.</p>
</section>
<section><title>Reading Word files</title>
<p>HDF begins reading a Word file at the File Information Block (FIB).
This structure records the locations and sizes of a document's text and
data structures. The FIB is located at the beginning of the main
stream.</p>
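<p>As a rough illustration of reading the FIB, the sketch below pulls three
fields out of the first bytes of the main stream. The class name FibSketch is
hypothetical, and the byte offsets (0x0000 for the magic number, 0x0018 for
fcMin, 0x004C for ccpText) are assumptions recalled from the Word 97
specification; check them against the specification before relying on them.</p>
<source><![CDATA[
```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Sketch only: reads a few FIB fields from the start of the main stream. */
final class FibSketch {
    // Assumed offsets (from the Word 97 spec as remembered); not taken from HDF itself.
    static final int OFFSET_W_IDENT = 0x0000;  // magic number, expected 0xA5EC
    static final int OFFSET_FC_MIN = 0x0018;   // file offset of the first text character
    static final int OFFSET_CCP_TEXT = 0x004C; // length of the main document text

    final int wIdent;
    final int fcMin;
    final int ccpText;

    FibSketch(byte[] mainStream) {
        if (mainStream.length < OFFSET_CCP_TEXT + 4) {
            throw new IllegalArgumentException("main stream too short to hold these FIB fields");
        }
        // Word stores its integers in little-endian byte order.
        ByteBuffer buf = ByteBuffer.wrap(mainStream).order(ByteOrder.LITTLE_ENDIAN);
        wIdent = buf.getShort(OFFSET_W_IDENT) & 0xFFFF; // widen to an unsigned 16-bit value
        fcMin = buf.getInt(OFFSET_FC_MIN);
        ccpText = buf.getInt(OFFSET_CCP_TEXT);
    }

    public static void main(String[] args) {
        byte[] fake = new byte[0x60]; // stand-in for bytes read through POIFS
        ByteBuffer.wrap(fake).order(ByteOrder.LITTLE_ENDIAN).putShort(0, (short) 0xA5EC);
        FibSketch fib = new FibSketch(fake);
        System.out.println(Integer.toHexString(fib.wIdent));
    }
}
```
]]></source>
<p>In practice the byte array would come from the main stream obtained
through the POIFS API; the sketch takes a plain array so that it stays
independent of any particular POIFS call.</p>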
<section><title>Text</title>
<p>The document's text is also located in the main stream. Its starting
location is given as FIB.fcMin and its length is given in characters by
FIB.ccpText. These two values are not enough on their own to extract the
text, because Unicode text may be intermingled with 8-bit (ASCII) text.
That brings us to the piece table.</p>
<p>The piece table is used to divide the text into non-Unicode and Unicode
pieces. Its offset and size are given in FIB.fcClx and FIB.lcbClx
respectively. The piece table may contain Property Modifiers (prm).
These are for complex (fast-saved) files and are skipped. Each text piece
contains an offset into the main stream where the text for that piece is
stored. If the piece does not use Unicode, a flag bit is set in the file
offset. You then have to clear that bit and divide by 2 to get the real
file offset.</p>
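<p>The offset decoding can be sketched as follows. This is a minimal
illustration, not HDF's own code: the class name PieceOffsetSketch is
hypothetical, and it assumes, following the Word 97 specification, that the
flag is bit 30 (0x40000000) and that a set flag marks a piece stored as 8-bit
characters, while a clear flag marks a 16-bit Unicode piece.</p>
<source><![CDATA[
```java
/** Sketch only: decodes the file offset stored in a piece descriptor. */
final class PieceOffsetSketch {
    // Assumption: bit 30 set means the piece is stored as 8-bit characters.
    static final int EIGHT_BIT_FLAG = 0x40000000;

    /** A piece is Unicode (two bytes per character) when the flag bit is clear. */
    static boolean isUnicode(int rawFc) {
        return (rawFc & EIGHT_BIT_FLAG) == 0;
    }

    /** Returns the real offset of the piece's text in the main stream. */
    static int realOffset(int rawFc) {
        if (isUnicode(rawFc)) {
            return rawFc;                      // stored offset is already the real one
        }
        return (rawFc & ~EIGHT_BIT_FLAG) / 2;  // clear the flag, then halve
    }

    /** Number of bytes occupied by a piece holding charCount characters. */
    static int byteLength(int charCount, int rawFc) {
        return isUnicode(rawFc) ? charCount * 2 : charCount;
    }

    public static void main(String[] args) {
        int rawFc = 0x40001000; // hypothetical value read from a piece descriptor
        System.out.println(isUnicode(rawFc) + " " + realOffset(rawFc));
    }
}
```
]]></source>
<p>Keeping the flag test in one method means the same check decides both
where a piece starts and how many bytes it occupies.</p>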
</section>
<section><title>Text Formatting</title>
<section><title>Stylesheet</title>
<p>All text formatting is based on styles contained in the StyleSheet.
The StyleSheet is a data structure containing, among other things, style
descriptions. Each style description contains either a paragraph style and
a character style, or only a character style. Each style description
is stored in the file in compressed form: essentially, as deltas
from another style.</p>
<p>Eventually, you have to chain back to the nil style, which is an
imaginary style with certain implied values.</p>
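<p>The chaining can be pictured with the sketch below. It is a simplified
model, not the HDF data structures: StyleChainSketch, StyleDescription and
the property names are hypothetical, properties are reduced to a map of
integers, and the implied values given to the nil style are placeholders.
The nil index 0x0FFF is assumed from the Word 97 specification.</p>
<source><![CDATA[
```java
import java.util.HashMap;
import java.util.Map;

/** Sketch only: resolves a style by applying deltas on top of its base style. */
final class StyleChainSketch {
    static final int NIL_STYLE = 0x0FFF; // assumed index of the imaginary nil style

    /** Hypothetical stand-in for a style description: a base index plus deltas. */
    static final class StyleDescription {
        final int baseStyle;
        final Map<String, Integer> deltas;

        StyleDescription(int baseStyle, Map<String, Integer> deltas) {
            this.baseStyle = baseStyle;
            this.deltas = deltas;
        }
    }

    /** Walks back to the nil style, then overlays each style's deltas on the way out. */
    static Map<String, Integer> resolve(Map<Integer, StyleDescription> sheet, int index) {
        if (index == NIL_STYLE) {
            Map<String, Integer> implied = new HashMap<>();
            implied.put("fontSizeHalfPoints", 20); // placeholder implied values
            implied.put("bold", 0);
            return implied;
        }
        StyleDescription sd = sheet.get(index);
        if (sd == null) {
            throw new IllegalArgumentException("no style at index " + index);
        }
        Map<String, Integer> result = resolve(sheet, sd.baseStyle);
        result.putAll(sd.deltas); // this style's deltas override its base
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, StyleDescription> sheet = new HashMap<>();
        Map<String, Integer> bigger = new HashMap<>();
        bigger.put("fontSizeHalfPoints", 24);
        sheet.put(0, new StyleDescription(NIL_STYLE, bigger));
        System.out.println(resolve(sheet, 0));
    }
}
```
]]></source>
<p>The sketch does not guard against a style sheet whose base indexes form a
cycle; a real reader would need to detect that case.</p>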
</section>
<section><title>Paragraph and Character styles</title>
<p>Paragraph and Character formatting properties for a document's text are
stored on file as deltas from some base style in the Stylesheet. The
deltas are used to create a complete uncompressed style in memory.</p>
<p>Uncompressed paragraph styles are represented by the Paragraph
Properties (PAP) data structure. Uncompressed character styles are
represented by the Character Properties (CHP) data structure. The styles
for the document text are stored in compressed format in the
corresponding Formatted Disk Pages (FKP). A compressed PAP is referred
to as a PAPX and a compressed CHP as a CHPX. The FKP locations are
stored in the bin table. There are separate bin tables for CHPXs and
PAPXs. The bin tables' locations and sizes are stored in the FIB.</p>
<p>An FKP is a 512-byte OLE page. It contains the offsets of the beginning
and end of each paragraph/character run in the main stream and the
compressed properties for that interval. The compressed PAPX is based on
its base style in the StyleSheet. The compressed CHPX is based on the
enclosing paragraph's base style in the StyleSheet.</p>
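<p>A character FKP could be read along the lines of the sketch below. The
layout is an assumption recalled from the Word 97 specification and should be
verified there: the run count sits in the last byte of the page, the run
boundaries are 4-byte offsets at the start of the page, and each run has a
one-byte word offset pointing at its CHPX, which begins with a length byte.
The class name ChpxFkpSketch is hypothetical.</p>
<source><![CDATA[
```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

/** Sketch only: reads run boundaries and compressed properties from a CHPX FKP. */
final class ChpxFkpSketch {
    static final int PAGE_SIZE = 512;

    /** Assumed layout: the number of runs is stored in the page's last byte. */
    static int runCount(byte[] page) {
        if (page.length != PAGE_SIZE) {
            throw new IllegalArgumentException("an FKP must be exactly 512 bytes");
        }
        return page[PAGE_SIZE - 1] & 0xFF;
    }

    /** Returns runCount + 1 main-stream offsets; run i spans [result[i], result[i + 1]). */
    static int[] runBoundaries(byte[] page) {
        int runs = runCount(page);
        ByteBuffer buf = ByteBuffer.wrap(page).order(ByteOrder.LITTLE_ENDIAN);
        int[] boundaries = new int[runs + 1];
        for (int i = 0; i <= runs; i++) {
            boundaries[i] = buf.getInt(i * 4);
        }
        return boundaries;
    }

    /** Returns the grpprl bytes for a run, or an empty array if the run has none. */
    static byte[] grpprl(byte[] page, int run) {
        int runs = runCount(page);
        int wordOffset = page[(runs + 1) * 4 + run] & 0xFF; // one offset byte per run
        if (wordOffset == 0) {
            return new byte[0];                // no CHPX: the run uses the style as-is
        }
        int start = wordOffset * 2;            // word offset -> byte offset in the page
        int length = page[start] & 0xFF;       // first CHPX byte is the grpprl length
        return Arrays.copyOfRange(page, start + 1, start + 1 + length);
    }

    public static void main(String[] args) {
        byte[] page = new byte[PAGE_SIZE];     // an empty page holds zero runs
        System.out.println(runCount(page));
    }
}
```
]]></source>
<p>A paragraph FKP follows the same idea but stores a larger per-run entry,
so it would need its own reader rather than this one.</p>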
</section>
<section><title>Uncompressing styles and other data structures</title>
<p>All compressed properties (CHPX, PAPX, SEPX) contain a grpprl. A grpprl
is an array of sprms. A sprm defines a delta from some base property.
There is a table of possible sprms in the Word 97 spec. Each sprm is a
two-byte opcode followed by an operand. The operand size depends on
the sprm. Each sprm describes an operation that should be performed on
the base style. After every sprm in the grpprl has been performed on the
base style, you will have the style for the paragraph, character run,
section, etc.</p>
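<p>Applying a grpprl to a base style might look like the sketch below. It is
an illustration under stated assumptions rather than HDF's implementation:
SprmSketch and CharProps are hypothetical, only three character sprms are
handled, and the opcode values (0x0835 bold, 0x0836 italic, 0x4A43 font size
in half-points), the operand-size code in the top three opcode bits, and the
toggle operand values 0x80 and 0x81 are taken from the Word 97 specification
as remembered. Variable-length sprms with special size rules are not
covered.</p>
<source><![CDATA[
```java
/** Sketch only: walks a grpprl and applies a few character sprms to a base style. */
final class SprmSketch {
    // Assumed opcodes from the Word 97 sprm table.
    static final int SPRM_BOLD = 0x0835;
    static final int SPRM_ITALIC = 0x0836;
    static final int SPRM_FONT_SIZE = 0x4A43;

    /** Hypothetical, heavily reduced stand-in for the CHP structure. */
    static final class CharProps {
        boolean bold;
        boolean italic;
        int halfPoints = 20;
    }

    /** Operand size, derived from the top three bits of the opcode (assumed encoding). */
    static int operandSize(int opcode, byte[] grpprl, int operandStart) {
        switch ((opcode >> 13) & 0x7) {
            case 0: case 1: return 1;
            case 2: case 4: case 5: return 2;
            case 3: return 4;
            case 7: return 3;
            default: return 1 + (grpprl[operandStart] & 0xFF); // length byte plus payload
        }
    }

    /** Toggle operands: 0x80 keeps the style's value, 0x81 inverts it, else 0 or 1. */
    static boolean toggle(boolean styleValue, int operand) {
        if (operand == 0x80) return styleValue;
        if (operand == 0x81) return !styleValue;
        return operand != 0;
    }

    static CharProps apply(CharProps base, byte[] grpprl) {
        CharProps out = new CharProps();
        out.bold = base.bold;
        out.italic = base.italic;
        out.halfPoints = base.halfPoints;
        int pos = 0;
        while (pos + 2 < grpprl.length) {      // need an opcode and at least one operand byte
            int opcode = (grpprl[pos] & 0xFF) | ((grpprl[pos + 1] & 0xFF) << 8);
            int operandStart = pos + 2;
            int size = operandSize(opcode, grpprl, operandStart);
            if (operandStart + size > grpprl.length) break; // truncated sprm: stop
            int first = grpprl[operandStart] & 0xFF;
            switch (opcode) {
                case SPRM_BOLD: out.bold = toggle(base.bold, first); break;
                case SPRM_ITALIC: out.italic = toggle(base.italic, first); break;
                case SPRM_FONT_SIZE:
                    out.halfPoints = first | ((grpprl[operandStart + 1] & 0xFF) << 8);
                    break;
                default: break;                // unknown sprm: skip its operand
            }
            pos = operandStart + size;
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] grpprl = {0x35, 0x08, 0x01};    // bold on
        System.out.println(apply(new CharProps(), grpprl).bold);
    }
}
```
]]></source>
<p>Because the operand size is derived from the opcode itself, the loop can
step over sprms it does not understand instead of failing on them.</p>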
</section>
</section>
</section>
</section>
</body>
</document>


@ -1,34 +0,0 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V1.1//EN" "../dtd/document-v11.dtd">
<document>
<header>
<title>Jakarta POI - HDF - Java APIs with XML to manipulate MS-Word</title>
<subtitle>Overview</subtitle>
<authors>
<person name="Nicola Ken Barozzi" email="barozzi@nicolaken.com"/>
<person name="Andrew C. Oliver" email="acoliver@apache.org"/>
<person name="Ryan Ackley" email="sackley@apache.org"/>
</authors>
</header>
<body>
<section><title>Overview</title>
<p>HDF is the name of our port of the Microsoft Word 97(-2002) file format to
pure Java.</p>
<p>HDF is still in early development. It is in the
<link href="http://cvs.apache.org/viewcvs/jakarta-poi/src/scratchpad/">scratchpad section of the
CVS</link>. Source code in the <em>org.apache.poi.hdf.extractor</em> tree is
legacy code. Source in the <em>org.apache.poi.hdf.model</em>
tree is the old legacy code refactored into an object model. Check the How-To
page for detailed examples on using HDF.
</p>
<p>
We are looking for developers! If you are interested in helping with HDF,
familiarize yourself with the source code and just start coding. Make sure
you read the guidelines for <link href="http://jakarta.apache.org/poi/getinvolved/index.html">
getting involved</link>.</p>
</section>
</body>
</document>


@ -1,367 +0,0 @@
<?xml version="1.0"?>
<!-- edited with XMLSPY v5 rel. 4 U (http://www.xmlspy.com) by Ryan Ackley (Myself) -->
<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V1.1//EN" "../dtd/document-v11.dtd">
<document>
<body>
<p>HWPF Milestones</p>
<table>
<tr>
<th>
Milestones
</th>
<th>
Target Date
</th>
<th>
Owner
</th>
</tr>
<tr>
<td>
Read in a Word document
with minimum formatting
(no lists, tables, footnotes,
endnotes, headers, footers)
and write it back out with the
result viewable in Word
97/2000
</td>
<td>
07/11/2003
</td>
<td>
Ryan
</td>
</tr>
<tr>
<td>
Add support for Lists and
Tables
</td>
<td>
8/15/2003
</td>
<td>
&#160;
</td>
</tr>
<tr>
<td>
HWPF 1.0-alpha release with
documentation and examples
</td>
<td>
8/18/2003
</td>
<td>
Praveen/Ryan
</td>
</tr>
<tr>
<td>
Add support for Headers,
Footers, endnotes, and
footnotes
</td>
<td>
8/31/2003
</td>
<td>
?
</td>
</tr>
<tr>
<td>
Add support for forms and
mail merge
</td>
<td>
September/October 2003
</td>
<td>
?
</td>
</tr>
</table>
<p>HWPF Task Lists</p>
<p>Read in a Word document with minimum formatting (no lists, tables, footnotes,
endnotes, headers, footers) and write it back out with the result viewable in Word 97/2000</p>
<table>
<tr>
<th>
Task
</th>
<th>
Target Date
</th>
<th>
Owner
</th>
</tr>
<tr>
<td>
Create classes to read and
write low level data
structures with test cases
</td>
<td>
7/10/2003
</td>
<td>
Ryan
</td>
</tr>
<tr>
<td>
Create classes to read and
write FontTable and Font
names with test case
</td>
<td>
7/10/2003
</td>
<td>
Praveen
</td>
</tr>
<tr>
<td>
Final test
</td>
<td>
7/11/2003
</td>
<td>
Ryan
</td>
</tr>
</table>
<p>Develop a user-friendly API so it is fun and easy to read and write Word documents
with Java.</p>
<table>
<tr>
<th>
Task
</th>
<th>
Target Date
</th>
<th>
Owner
</th>
</tr>
<tr>
<td>
Develop a way for SPRMS to
be compressed and
uncompressed
</td>
<td>
</td>
<td>
</td>
</tr>
<tr>
<td>
Override CHPAbstractType
with a concrete class that
exposes attributes with
human readable names
</td>
<td>
</td>
<td>
</td>
</tr>
<tr>
<td>
Override PAPAbstractType
with a concrete class that
exposes attributes with
human readable names
</td>
<td>
</td>
<td>
</td>
</tr>
<tr>
<td>
Override SEPAbstractType
with a concrete class that
exposes attributes with
human readable names
</td>
<td>
</td>
<td>
</td>
</tr>
<tr>
<td>
Override DOPAbstractType
with a concrete class that
exposes attributes with
human readable names
</td>
<td>
</td>
<td>
</td>
</tr>
<tr>
<td>
Override TAPAbstractType
with a concrete class that
exposes attributes with
human readable names
</td>
<td>
</td>
<td>
</td>
</tr>
<tr>
<td>
Override TCAbstractType
with a concrete class that
exposes attributes with
human readable names
</td>
<td>
</td>
<td>
</td>
</tr>
<tr>
<td>
Develop a VerifyIntegrity
class for testing so it is easy
to determine if a Word
Document is well-formed.
</td>
<td>
</td>
<td>
</td>
</tr>
<tr>
<td>
Develop general intuitive
API to tie everything together
</td>
<td>
</td>
<td>
</td>
</tr>
</table>
<p>Add support for lists and tables</p>
<table>
<tr>
<th>
Task
</th>
<th>
Target Date
</th>
<th>
Owner
</th>
</tr>
<tr>
<td>
Add data structures for
reading and writing list data
with test cases.
</td>
<td>
</td>
<td>
</td>
</tr>
<tr>
<td>
Add data structures for
reading and writing tables
with test cases.
</td>
<td>
</td>
<td>
</td>
</tr>
</table>
<p>HWPF 1.0-alpha release with documentation and examples</p>
<table>
<tr>
<th>
Task
</th>
<th>
Target Date
</th>
<th>
Owner
</th>
</tr>
<tr>
<td>
Document the user model
API
</td>
<td>
</td>
<td>
</td>
</tr>
<tr>
<td>
Document the low level
classes
</td>
<td>
</td>
<td>
</td>
</tr>
<tr>
<td>
Come up with detailed How-To&#8217;s
</td>
<td>
</td>
<td>
</td>
</tr>
</table>
</body>
</document>