<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V1.1//EN" "document-v11.dtd">

<document>
<header>
<title>POI 1.0 Vision Document</title>
<authors>
<person name="Andrew C. Oliver" email="acoliver@apache.org"/>
<person name="Marcus W. Johnson" email="mjohnson@apache.org"/>
</authors>
</header>

<body>

<section><title>Preface</title>
<p>
(21-Jan-02) While this document is full of useful introductory
information about the project, and I suggest that those interested in
getting involved read it, it is woefully out of date.
</p>
<p>
We deliberately allowed this document to run out of date because it
is a good reflection of what the original vision was for POI 1.0.
You'll note that some of the terminology is no longer used in quite
the same way. I've made some minor corrections where reading this
confused me. An example: in some places this document may refer to
the POI API instead of the POIFS API. When this vision was written we
had an incomplete understanding of the project.
</p>
<p>
Lastly, the scope of the project expanded dramatically near the end
of the 1.0 cycle. Our vision at the time was to focus merely on the
Excel port (having no idea how the project would grow or be received)
and provide the OLE 2 Compound Document port so that others could port
later formats. We now plan to spearhead these ports under the umbrella
of the POI project. So, you've been warned. Read on, but just realize
that we had a fuzzy view of things to come, and hindsight is 20-20.
</p>
<p>
If I recall, the major holes were a complete understanding of the
OLE 2 Compound Document format, the Excel file format, and exactly how
Cocoon 2 Serializers worked. (That just about covers the whole range,
huh?)
</p>
</section>

<section><title>1. Introduction</title>
<section><title>1.1 Purpose of this document</title>
<p>
The purpose of this document is to
collect, analyze and define high-level requirements, user needs and
features of the HSSF Serializer for Cocoon 2 and related libraries.
The HSSF Serializer is a Java class that supports the Serializer
interface from the Cocoon 2 project and outputs a format compatible
with that used by the spreadsheet program Microsoft Excel '97.
The HSSF Serializer will be responsible for converting XML
spreadsheet-like documents into Excel-compatible XLS spreadsheets.
</p>
</section>

<section><title>1.2 Project Overview</title>
<p>
Many web apps today hit a brick wall when users ask to be able to
easily manipulate their reports and data extracts in the popular
Microsoft Excel spreadsheet format. This often causes inferior
technologies to be chosen for the project simply because they easily
support this format. This project seeks to extend existing XML, Java
and Apache Cocoon 2 project technologies by:
</p>

<ul>
<li>
providing an extensible library (POIFS) which reads and writes a
format compatible with the OLE 2 Compound Document Format (aka
Structured Storage Format) for easy implementation of other document
types;
</li>
<li>
providing a library (HSSF) for manipulating spreadsheet data and
outputting it in a format compatible with the Microsoft Excel XLS
format;
</li>
<li>
and providing a Cocoon 2 Serializer (HSSFSerializer) for serializing
XML documents as Excel-compatible spreadsheets.
</li>
</ul>

</section>
</section>
<section><title>2. User Description</title>
<section><title>2.1 User/Market Demographics</title>
<p>
First, there are a number of enthusiastic users of XML, UNIX and Java
technology. Second, the Microsoft solution for outputting Office
document formats often involves manipulating the software itself as
an OLE Server. This method performs extremely poorly, carries
extremely high overhead, and can handle only one document at a time.
</p>
<ol>
<li>
Our intended audience for the HSSF Serializer portion of this project
is developers writing reports or data extracts in XML format.
</li>
<li>
Our intended audience for the HSSF library portion of this project is
ourselves, as we are developing the Serializer, and anyone who needs
to write to Excel spreadsheets in a non-XML Java environment or who
has specific needs not addressed by the Serializer.
</li>
<li>
Our intended audience for the "POIFS" OLE 2 Compound Document format
reader/writer is ourselves, as we are writing the HSSF library, and
secondly anyone wishing to provide other libraries for reading and
writing OLE 2 Compound Document Format in Java.
</li>
</ol>
</section>
<section><title>2.2. User environment</title>
<p>
The users of this software shall be
developers in a Java environment on any Operating System or power
users who are capable of XML document generation/deployment.
</p>
</section>
<section><title>2.3. Key User Needs</title>
<p>
The OLE 2 Compound Document format is
undocumented for all practical purposes and cryptic for all
impractical purposes. Developer needs in this area include
documentation and an easy-to-use library for reading and writing in
this format without requiring the developer to have intimate
knowledge of the format.
</p>
<p>
There is currently no good way to write to Microsoft Excel documents
from Java or, for that matter, from any platform not based on
Microsoft Windows. Developers need an easy-to-use library that
supports a reasonable feature set and allows separation of data from
formatting/stylistic concerns.
</p>
<p>
There is currently no good way to
transform XML data to Microsoft Excel. Apache's Cocoon 2 project
supplies a complete framework for XML, but nothing for outputting in
Excel's XLS format. Developers and power users alike need a simple
method to output XML documents to Excel through server-side
processing.
</p>

</section>
<section><title>2.4. Alternatives and Competition</title>
<p>
Originally there weren't any decent <link href="../hssf/alternatives.html">alternatives</link> for reading or writing
Excel files. This has changed somewhat.
</p>
</section>
</section>
<section><title>3. Project Overview</title>
<section><title>3.1. Project Perspective</title>
<p>
The produced code shall be licensed under
the Apache License as used by the Cocoon 2 project and maintained on
a project page until such time as the Cocoon 2 developers accept it
as a donation (at which time the copyright will be turned over to
them).
</p>
</section>
<section><title>3.2. Project Position Statement</title>
<p>
For developers in a Java and/or XML
environment, this project will provide all the tools necessary for
outputting XML data in the Microsoft Excel format. This project seeks
to make the use of Microsoft Windows based servers unnecessary for
file format considerations and to fully document the OLE 2 Compound
Document format. The project aims not only to provide the tools for
serializing XML to Excel's file format and the tools for writing to
that file format from Java, but also to provide the tools for later
projects to port other OLE 2 Compound Document formats to pure
Java APIs.
</p>
</section>
<section><title>3.3. Summary of Capabilities</title>
<p>
HSSF Serializer for Apache Cocoon 2
</p>
<table>
<tr>
<td>Benefit</td>
<td>Supporting Features</td>
</tr>
<tr>
<td>Standard XML tag language for sheet data</td>
<td>Serializer will transform documents utilizing a defined tag
language</td>
</tr>
<tr>
<td>Utilize XML to output in Excel</td>
<td>Serializer will output in Excel</td>
</tr>
<tr>
<td>Java API to output in Excel on any platform</td>
<td>The project will develop an API that outputs in Excel using
pure Java.</td>
</tr>
<tr>
<td>Make it easy for developers to port other OLE 2 Compound
Document-based formats to Java.</td>
<td>The POIFS library will contain both a high-level abstraction
along with low-level constructs. The project will fully document
the OLE 2 Compound Document Format.</td>
</tr>
</table>
</section>
<section><title>3.4. Assumptions and Dependencies</title>
<ul>
<li>
The HSSF Serializer will run on
any Java 2 supporting platform with Apache Cocoon 2 installed along
with the HSSF and POIFS APIs.
</li>
<li>
The HSSF API requires a Java 2
implementation and the POIFS API.
</li>
<li>
The POIFS API requires a Java 2
implementation.
</li>
</ul>
</section>
</section>
<section><title>4. Project Features</title>
<p>
The POIFS API will include:
</p>
<ul>
<li>
Low-level structures representing
the structures in a POI filesystem.
</li>
<li>
A low-level API for
creating/manipulating POI filesystems.
</li>
<li>
A set of high-level interfaces
abstracting the user from the POI filesystem constructs and
representing it as a standard filesystem (files, directories, etc.).
</li>
</ul>
<p>
The HSSF API will include:
</p>
<ul>
<li>
Low-level structures representing
the structures in an Excel file.
</li>
<li>
A low-level API for creating and
manipulating Excel files and writing them into POI filesystems.
</li>
<li>
A high-level model and style
interface for manipulating spreadsheet data without knowing anything
about the Excel format itself.
</li>
</ul>
<section><title>4.1 POI Filesystem API</title>
<p>
The POI Filesystem API includes:
</p>
<ul>
<li>An implementation of Big Blocks</li>
<li>An implementation of Small Blocks</li>
<li>An implementation of Header Blocks</li>
<li>An implementation of Block Allocation Tables</li>
<li>An implementation of Property Sets</li>
<li>An implementation of the POI
filesystem, including functions to get and set the above constructs
and compound functions for reading/writing files/directories.
</li>
<li>An abstraction of the POI
filesystem providing interfaces representing Files, Directories and
FileSystems in normal terminology and encapsulating the above
constructs.
</li>
<li>Full documentation of the POI file
format.
</li>
<li>Full documentation of the APIs and
interfaces, provided through Javadoc and user documentation (aimed at
developers using the APIs).
</li>
<li>Examples aimed at teaching the
user to write code using POI (titled: recipes for POI).
</li>
<li>Performance specifications.
(Example POI filesystems rated by some measure of complexity along
with system specifications and execution times for given operations)
</li>
</ul>
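<p>
As a rough illustration of how Big Blocks and Small Blocks relate, the
sketch below picks a block size for a file and counts the blocks it
needs. The sizes used (512-byte big blocks, 64-byte small blocks and a
4096-byte cutoff) are the values commonly documented for OLE 2 Compound
Documents and are assumptions here, not figures taken from this vision.
The class and method names are hypothetical and are not part of POIFS.
</p>
<source><![CDATA[
```java
/** Hypothetical sketch of block arithmetic; not the POIFS API. */
class BlockMath {
    // Assumed sizes, as commonly documented for OLE 2 Compound Documents.
    static final int BIG_BLOCK_SIZE = 512;
    static final int SMALL_BLOCK_SIZE = 64;
    // Files shorter than this are assumed to be stored in small blocks.
    static final int SMALL_BLOCK_CUTOFF = 4096;

    /** Returns true when a file of the given length would use small blocks. */
    static boolean usesSmallBlocks(int fileLength) {
        return fileLength < SMALL_BLOCK_CUTOFF;
    }

    /** Returns the number of blocks needed to hold fileLength bytes. */
    static int blocksNeeded(int fileLength) {
        if (fileLength < 0) {
            throw new IllegalArgumentException("negative length");
        }
        int blockSize = usesSmallBlocks(fileLength) ? SMALL_BLOCK_SIZE : BIG_BLOCK_SIZE;
        // Round up: a partially filled final block still occupies a whole block.
        return (fileLength + blockSize - 1) / blockSize;
    }

    public static void main(String[] args) {
        System.out.println(blocksNeeded(100));   // stored in small blocks
        System.out.println(blocksNeeded(5000));  // stored in big blocks
    }
}
```
]]></source>
<p>
Under these assumed sizes a 100-byte file needs 2 small blocks, while a
5000-byte file needs 10 big blocks.
</p>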
</section>
<section><title>4.2 HSSF API</title>
<p>
The HSSF API includes:
</p>
<ul>
<li>An implementation of Record
(a binary 2-byte type, followed by a 2-byte size (n), followed by n bytes)</li>
<li>Implementations of many standard
record types, mapping the data bytes to fields, along with methods to
reserialize those fields</li>
<li>An implementation of the HSSF File,
including functions to get and set the above constructs, to create a
blank file with the minimum required record types, to map getting and
setting data and style in a workbook to the creation of record types,
and to read HSSF files.</li>
<li>An abstraction of the HSSF file
format providing interfaces representing the HSSF File, HSSF
Workbook, HSSF Sheet, HSSF Column and HSSF Formulas in a manner
separating the data from the styling and encapsulating the above
constructs.</li>
<li>Full documentation of the HSSF
file format (which will be a subset of the Excel '97 File format).
This must be done with care for legal reasons.</li>
<li>Full documentation of the APIs and
interfaces, provided through Javadoc and user documentation (aimed at
developers using the APIs).</li>
<li>Examples aimed at teaching
developers to use the APIs.
</li>
<li>Performance specifications.
(Example files rated by some measure of complexity along with system
specifications and execution times for given operations, possibly
the same files used for POI's tests)</li>
</ul>
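<p>
The Record layout named in the first item above (2-byte type, 2-byte
size n, then n bytes) can be sketched as follows. This is a minimal
illustration, not the HSSF implementation: the class RecordSketch and
its methods are hypothetical, and the little-endian byte order is an
assumption that this document does not state.
</p>
<source><![CDATA[
```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the record layout; not the HSSF API. */
class RecordSketch {
    final int type;     // 2-byte record type
    final byte[] data;  // n bytes of record data

    RecordSketch(int type, byte[] data) {
        this.type = type;
        this.data = data;
    }

    /** Splits a byte array into records: 2-byte type, 2-byte size n, n data bytes. */
    static List<RecordSketch> readAll(byte[] bytes) {
        // Assumption: multi-byte fields are stored little-endian.
        ByteBuffer in = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
        List<RecordSketch> records = new ArrayList<RecordSketch>();
        while (in.hasRemaining()) {
            if (in.remaining() < 4) {
                throw new IllegalArgumentException("truncated record header");
            }
            int type = Short.toUnsignedInt(in.getShort());
            int size = Short.toUnsignedInt(in.getShort());
            if (size > in.remaining()) {
                throw new IllegalArgumentException("truncated record of type " + type);
            }
            byte[] data = new byte[size];
            in.get(data);
            records.add(new RecordSketch(type, data));
        }
        return records;
    }

    /** Reserializes this record into the same type/size/data layout. */
    byte[] serialize() {
        ByteBuffer out = ByteBuffer.allocate(4 + data.length).order(ByteOrder.LITTLE_ENDIAN);
        out.putShort((short) type);
        out.putShort((short) data.length);
        out.put(data);
        return out.array();
    }
}
```
]]></source>
<p>
A concrete record type would then wrap the data bytes in named fields
and override the reserialization step, which is the role the second
item above gives to the standard record type implementations.
</p>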
</section>
<section><title>4.3 HSSF Serializer</title>
<p>
The HSSF Serializer subproject includes:
</p>
<ul>
<li>A class supporting the Cocoon 2
Serializer Interface.</li>
<li>An interface between the SAX
events and the HSSF APIs.</li>
<li>A specified tag language for use
with the Serializer.</li>
<li>Documentation on the tag language
for the HSSF Serializer.</li>
<li>Normal Javadocs.</li>
<li>Example XML files.</li>
<li>Performance specifications.
(Example XML docs and stylesheets rated by some measure of
complexity along with system specifications and execution times)</li>
</ul>
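<p>
One way to picture the interface between SAX events and a spreadsheet
model is the handler sketched below. It is only an illustration: the
row and cell element names are made up for this example and are not
the tag language this project will define, the class SheetHandler is
hypothetical, and a plain list of rows stands in for the HSSF API.
</p>
<source><![CDATA[
```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical sketch; the tag names and this class are not the HSSF Serializer's. */
class SheetHandler extends DefaultHandler {
    // Stand-in for the spreadsheet model: one list of cell values per row.
    final List<List<String>> rows = new ArrayList<List<String>>();
    private StringBuilder cellText;  // non-null only while inside a cell element

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (qName.equals("row")) {
            rows.add(new ArrayList<String>());
        } else if (qName.equals("cell")) {
            cellText = new StringBuilder();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // SAX may deliver one cell's text in several chunks, so accumulate it.
        if (cellText != null) {
            cellText.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("cell") && cellText != null) {
            // A cell that appears outside any row is ignored in this sketch.
            if (!rows.isEmpty()) {
                rows.get(rows.size() - 1).add(cellText.toString());
            }
            cellText = null;
        }
    }

    /** Parses an XML string and returns the collected rows. */
    static List<List<String>> parse(String xml) {
        SheetHandler handler = new SheetHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse sheet XML", e);
        }
        return handler.rows;
    }
}
```
]]></source>
<p>
In a real Serializer the handler would call the high-level HSSF model
at the points where this sketch appends to its lists, so that the SAX
side never needs to know about Excel records.
</p>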
</section>
</section>
<section><title>5. Other Product Requirements</title>
<section><title>5.1. Applicable Standards</title>
<p>
All Java code will be 100% pure Java.
</p>
</section>
<section><title>5.2. System Requirements</title>
<p>
The minimum system requirements for POIFS are:
</p>
<ul>
<li>64 Mbytes of memory</li>
<li>Java 2 environment</li>
<li>Pentium or better processor (or equivalent on other platforms)</li>
</ul>
<p>
The minimum system requirements for HSSF are:
</p>
<ul>
<li>64 Mbytes of memory</li>
<li>Java 2 environment</li>
<li>Pentium or better processor (or equivalent on other platforms)</li>
<li>POIFS API</li>
</ul>
<p>
The minimum system requirements for the HSSF Serializer are:
</p>
<ul>
<li>64 Mbytes of memory</li>
<li>Java 2 environment</li>
<li>Pentium or better processor (or equivalent on other platforms)</li>
<li>Cocoon 2</li>
<li>HSSF API</li>
<li>POIFS API</li>
</ul>
</section>
<section><title>5.3. Performance Requirements</title>
<p>
All components must perform well enough
to be practical for use in a webserver environment (especially the
Cocoon 2/Tomcat/Apache combo).
</p>
</section>
<section><title>5.4. Environmental Requirements</title>
<p>
The software will run primarily in
developer environments. We should make some allowances for
not-highly-technical users to write XML documents for the HSSF
Serializer. All other components will assume intermediate Java 2
knowledge. No XML knowledge will be required except for using the
HSSF Serializer. As much documentation as is practical shall be
provided for all components, as XML is relatively new and the
concepts introduced for writing spreadsheets and POI filesystems
will be brand new to Java and to many Java developers.
</p>
</section>
</section>
<section><title>6. Documentation Requirements</title>
<section><title>6.1 POI Filesystem</title>
<p>
The filesystem as read and written by
POI shall be fully documented and explained so that the average Java
developer can understand it.
</p>
</section>
<section><title>6.2. POI API</title>
<p>
The POI API will be fully documented
through Javadoc. A walkthrough of using the high-level POI API shall
be provided. No documentation outside of the Javadoc shall be
provided for the low-level POI APIs.
</p>
</section>
<section><title>6.3. HSSF File Format</title>
<p>
The HSSF File Format as implemented by
the HSSF API will be fully documented. No documentation will be
provided for features that the Excel 97 File Format supports but the
HSSF API does not. Care will be taken not to
infringe on any "legal stuff".
</p>
</section>
<section><title>6.4. HSSF API</title>
<p>
The HSSF API will be documented by
Javadoc. A walkthrough of using the high-level HSSF API shall be
provided. No documentation outside of the Javadoc shall be provided
for the low-level HSSF APIs.
</p>
</section>

<section><title>6.5. HSSF Serializer</title>
<p>
The HSSF Serializer will be documented
by Javadoc.
</p>
</section>

<section><title>6.6 HSSF Serializer Tag language</title>
<p>
The XML tag language, along with its
function and usage, shall be fully documented. Examples will be
provided as well.
</p>
</section>
</section>
<section><title>7. Terminology</title>
<section><title>7.1 Filesystem</title>
<p>
The term filesystem shall refer only to the POI-formatted archive.
</p>
</section>
<section><title>7.2 File</title>
<p>
The term file shall refer to the embedded data stream within a
POI filesystem. This will be the actual embedded document.
</p>
</section>
</section>
</body>
</document>