7397304eed
git-svn-id: https://svn.apache.org/repos/asf/jakarta/poi/trunk@543466 13f79535-47bb-0310-9956-ffa450edef68
595 lines
22 KiB
XML
595 lines
22 KiB
XML
<?xml version="1.0" encoding="UTF-8"?>
|
|
<!--
|
|
====================================================================
|
|
Licensed to the Apache Software Foundation (ASF) under one or more
|
|
contributor license agreements. See the NOTICE file distributed with
|
|
this work for additional information regarding copyright ownership.
|
|
The ASF licenses this file to You under the Apache License, Version 2.0
|
|
(the "License"); you may not use this file except in compliance with
|
|
the License. You may obtain a copy of the License at
|
|
|
|
http://www.apache.org/licenses/LICENSE-2.0
|
|
|
|
Unless required by applicable law or agreed to in writing, software
|
|
distributed under the License is distributed on an "AS IS" BASIS,
|
|
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
|
See the License for the specific language governing permissions and
|
|
limitations under the License.
|
|
====================================================================
|
|
-->
|
|
<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V1.1//EN" "document-v11.dtd">
|
|
|
|
<document>
|
|
<header>
|
|
<title>POI 2.0 Vision Document</title>
|
|
<authors>
|
|
<person name="Andrew C. Oliver" email="acoliver2@users.sourceforge.net"/>
|
|
<person name="Marcus W. Johnson" email="mjohnson@apache.org"/>
|
|
<person name="Glen Stampoultzis" email="user@poi.apache.org"/>
|
|
<person name="Nicola Ken Barozzi" email="barozzi@nicolaken.com"/>
|
|
</authors>
|
|
</header>
|
|
|
|
<body>
|
|
|
|
<section><title>Preface</title>
|
|
<p>
|
|
This is the POI 2.0 cycle vision document. Although the vision
|
|
has not changed and this document is certainly not out of date and
|
|
the vision has not changed, the structure of the project has
|
|
changed a bit. We're not going to change the vision document to
|
|
reflect this (however proper that may be) because it would only
|
|
involve deletion. There is no purpose in providing less
|
|
information provided we give clarification.
|
|
</p>
|
|
<p>
|
|
This document was created before the POI components for
|
|
<link href="http://xml.apache.org/cocoon">Apache Cocoon</link>
|
|
were accepted into the Cocoon project itself. It was also
|
|
written before POI was accepted into Jakarta. So while the
|
|
vision hasn't changed some of the components are actually now
|
|
part of other projects. We'll still be working on them on the
|
|
same timeline roughly (minus the overhead of coordination with
|
|
other groups), but they are no longer technically part of the
|
|
POI project itself.
|
|
</p>
|
|
</section>
|
|
|
|
<section><title>1. Introduction</title>
|
|
<section><title>1.1 Purpose of this document</title>
|
|
<p>
|
|
The purpose of this document is to
|
|
collect, analyze and define high-level requirements, user needs,
|
|
and features of the second release of the POI project software.
|
|
The POI project currently consists of the following components:
|
|
the HSSF Serializer, the HSSF library and the POIFS library.
|
|
</p>
|
|
<ul>
|
|
<li>
|
|
The HSSF Serializer is a set of Java classes whose main
|
|
class supports the Serializer interface from the Cocoon
|
|
2 project and outputs the serialized data in a format
|
|
compatible with the spreadsheet program Microsoft Excel
|
|
'97.
|
|
</li>
|
|
<li>
|
|
The HSSF library is a set of classes for reading and
|
|
writing Microsoft Excel 97 file format using pure Java.
|
|
</li>
|
|
<li>
|
|
The POIFS library is a set of classes for reading and
|
|
writing Microsoft's OLE 2 Compound Document format using
|
|
pure Java.
|
|
</li>
|
|
</ul>
|
|
<p>By the completion of this release cycle the POI project will also
|
|
include the HSSF Generator and the HWPF library.
|
|
</p>
|
|
<ul>
|
|
<li>The HSSF Generator will be responsible for using HSSF to read
|
|
in the XLS (Excel 97) file format and create SAX events. The HSSF
|
|
Generator will support the applicable interfaces specified by the
|
|
Apache Cocoon 2 project.
|
|
</li>
|
|
<li>The HWPF library will provide a set of high level interfaces
|
|
for reading and writing Microsoft Word 97 file format using pure
|
|
Java.</li>
|
|
</ul>
|
|
|
|
</section>
|
|
|
|
|
|
<section><title>1.2 Project Overview</title>
|
|
<p>
|
|
The first release of the POI project
|
|
was an astounding success. This release seeks to build on that
|
|
success by:
|
|
</p>
|
|
<ul>
|
|
<li>
|
|
Refactoring POIFS into imput and
|
|
output classes as well as an event-driven API for reading.
|
|
</li>
|
|
<li>
|
|
Refactor HSSF for greater
|
|
performance as well as an event-driven API for reading
|
|
</li>
|
|
<li>
|
|
Extend HSSF by adding the ability to read and write formulas.
|
|
</li>
|
|
<li>
|
|
Extend HSSF by adding the ability to read and write
|
|
user-defined styles.
|
|
</li>
|
|
<li>
|
|
Create a Cocoon 2 Generator for HSSF using the same tags
|
|
as the HSSF Serializer.
|
|
</li>
|
|
<li>
|
|
Create a new library (HWPF) for reading and writing
|
|
Microsoft Word DOC format.
|
|
</li>
|
|
<li>
|
|
Refactor the HSSFSerializer into a separate extensible
|
|
POIFSSerializer and HSSFSerializer
|
|
</li>
|
|
<li>
|
|
Providing the create excel charts. (write only)
|
|
</li>
|
|
</ul>
|
|
</section>
|
|
</section>
|
|
<section><title>2. User Description</title>
|
|
<section><title>2.1 User/Market Demographics</title>
|
|
<p>
|
|
There are a number of enthusiastic
|
|
users of XML, UNIX and Java technology. Furthermore, the Microsoft
|
|
solution for outputting Office Document formats often involves
|
|
actually manipulating the software as an OLE Server. This method
|
|
provides extremely low performance, extremely high overhead and is
|
|
only capable of handing one document at a time.
|
|
</p>
|
|
<ol>
|
|
<li>
|
|
Our intended audience for the HSSF
|
|
Serializer portion of this project are developers writing reports or
|
|
data extracts in XML format.
|
|
</li>
|
|
<li>
|
|
Our intended audience for the HSSF
|
|
library portion of this project is ourselves as we are developing
|
|
the HSSF serializer and anyone who needs to read and write Excel
|
|
spreadsheets in a non-XML Java environment, or who has specific
|
|
needs not addressed by the Serializer
|
|
</li>
|
|
<li>
|
|
Our intended audience for the
|
|
POIFS library is ourselves as we are developing the HSSF and HWPF
|
|
libraries and anyone wishing to provide other libraries for
|
|
reading/writing other file formats utilizing the OLE 2 Compound
|
|
Document Format in Java.
|
|
</li>
|
|
<li>
|
|
Our intended audience for the HSSF
|
|
generator are developers who need to export Excel spreadsheets to
|
|
XML in a non-proprietary environment.
|
|
</li>
|
|
<li>
|
|
Our intended audience for the HWPF
|
|
library is ourselves, as we will be developing a HWPF Serializer in a
|
|
later release, and anyone wishing to add .DOC file processing and
|
|
creation to their projects.
|
|
</li>
|
|
</ol>
|
|
</section>
|
|
<section><title>2.2. User environment</title>
|
|
<p>
|
|
The users of this software shall be
|
|
developers in a Java environment on any operating system, or power
|
|
users who are capable of XML document generation/deployment.
|
|
</p>
|
|
</section>
|
|
<section><title>2.3. Key User Needs</title>
|
|
<p>
|
|
The HSSF library currently requires a
|
|
full object representation to be created before reading values. This
|
|
results in very high memory utilization. We need to reduce this
|
|
substantially for reading. It would be preferable to do this for
|
|
writing, but it may not be possible due to the constraints imposed by
|
|
the file format itself. Memory utilization during read is our top
|
|
user complaint.
|
|
</p>
|
|
<p>
|
|
The POIFS library currently requires a
|
|
full object representation to be created before reading values. This
|
|
results in very high memory utilization. We need to reduce this
|
|
substantially for reading.
|
|
</p>
|
|
<p>
|
|
The HSSF library currently ignores
|
|
formula cells and identifies them as "UnknownRecord" at the
|
|
lower level of the API. We must provide a way to read and write
|
|
formulas. This is now the top requested feature.
|
|
</p>
|
|
<p>
|
|
The HSSF library currently does not support
|
|
charts. This is a key requirement of some users who wish to use HSSF
|
|
in a reporting engine.
|
|
</p>
|
|
<p>
|
|
The HSSF Serializer currently does not
|
|
provide serialization for cell styling. User's will want stylish
|
|
spreadsheets to result from their XML.
|
|
</p>
|
|
<p>
|
|
There is currently no way to generate
|
|
the XML from an XLS that is consistent with the format used by the
|
|
HSSF Serializer.
|
|
</p>
|
|
<p>
|
|
There should be a way to read and write
|
|
the DOC file format using pure Java.
|
|
</p>
|
|
|
|
</section>
|
|
</section>
|
|
<section><title>3. Project Overview</title>
|
|
<section><title>3.1. Project Perspective</title>
|
|
<p>
|
|
The produced code shall be licensed by
|
|
the Apache License as used by the Cocoon 2 project (APL 1.1) and
|
|
maintained on at <link href="http://poi.sourceforge.net/">http://poi.sourceforge.net</link>
|
|
and <link href="http://sourcefoge.net/projects/poi">http://sourcefoge.net/projects/poi</link>.
|
|
It is our hope to at some point integrate with the various Apache
|
|
projects (xml.apache.org and jakarta.apache.org), at which point we'd
|
|
turn the copyright over to them.
|
|
</p>
|
|
</section>
|
|
<section><title>3.2. Project Position Statement</title>
|
|
<p>
|
|
For developers on a Java and/or XML
|
|
environment this project will provide all the tools necessary for
|
|
outputting XML data in the Microsoft Excel format. This project seeks
|
|
to make the use of Microsoft Windows based servers unnecessary for
|
|
file format considerations and to fully document the OLE 2 Compound
|
|
Document format. The project aims not only to provide the tools for
|
|
serializing XML to Excel and Word file formats and the tools for
|
|
writing to those file formats from Java, but also to provide the
|
|
tools for later projects to convert other OLE 2 Compound Document
|
|
formats to pure Java APIs.
|
|
</p>
|
|
</section>
|
|
<section><title>3.3. Summary of Capabilities</title>
|
|
<p>
|
|
HSSF Serializer for Apache Cocoon 2
|
|
</p>
|
|
<table>
|
|
<tr>
|
|
<th>
|
|
Benefit
|
|
</th>
|
|
<th>
|
|
Supporting Features
|
|
</th>
|
|
</tr>
|
|
<tr>
|
|
<td>
|
|
Ability to serialize styles from XML spreadsheets.
|
|
</td>
|
|
<td>
|
|
HSSFSerialzier will support styles.
|
|
</td>
|
|
</tr>
|
|
<tr>
|
|
<td>
|
|
Ability to read and write formulas in XLS files.
|
|
</td>
|
|
<td>
|
|
HSSF will support reading/writing formulas.
|
|
</td>
|
|
</tr>
|
|
<tr>
|
|
<td>
|
|
Ability to output in MS Word on any platform using Java.
|
|
</td>
|
|
<td>
|
|
The project will develop an API that outputs in Word format
|
|
using pure Java.
|
|
</td>
|
|
</tr>
|
|
<tr>
|
|
<td>
|
|
Enhance performance for reading and writing XLS files.
|
|
</td>
|
|
<td>
|
|
HSSF will undergo a number of performance enhancements. HSSF
|
|
will include a new event-based API for reading XLS files. POIFS
|
|
will support a new event-based API for reading OLE2 CDF files.
|
|
</td>
|
|
</tr>
|
|
<tr>
|
|
<td>
|
|
Ability to generate XML from XLS files
|
|
</td>
|
|
<td>
|
|
The project will develop an HSSF Generator.
|
|
</td>
|
|
</tr>
|
|
<tr>
|
|
<td>
|
|
The ability to generate charts
|
|
</td>
|
|
<td>
|
|
HSSF will provide low level support for chart records as well
|
|
as high level API support for generating charts. The ability
|
|
to read chart information will not initially be provided.
|
|
</td>
|
|
</tr>
|
|
|
|
</table>
|
|
</section>
|
|
<section><title>3.4. Assumptions and Dependencies</title>
|
|
<ul>
|
|
<li>
|
|
The HSSF Serializer and Generator
|
|
will support the Gnumeric 1.0 XML tag language.
|
|
</li>
|
|
<li>
|
|
The HSSF Generator and HSSF
|
|
Serializer will be mutually validating. It should be possible to
|
|
have an XLS file created by the Serializer run through the Generator
|
|
and the output back through the Serializer (via the Cocoon pipeline)
|
|
and get the same file or a reasonable facimille (no one cares if it
|
|
differs by the order of the binary records in some minor but
|
|
non-visually recognizable manner).
|
|
</li>
|
|
<li>
|
|
The HSSF Generator will run on any
|
|
Java 2 supporting platform with Apache Cocoon 2 installed along with
|
|
the HSSF and POIFS APIs.
|
|
</li>
|
|
<li>
|
|
The HSSF Serializer will run on
|
|
any Java 2 supporting platform with Apache Cocoon 2 installed along
|
|
with the HSSF and POIFS APIs.
|
|
</li>
|
|
<li>
|
|
The HWPF API requires a Java 2
|
|
implementation and the POIFS API.
|
|
</li>
|
|
<li>
|
|
The HSSF API requires a Java 2
|
|
implementation and the POIFS API.
|
|
</li>
|
|
<li>
|
|
The POIFS API requires a Java 2
|
|
implementation.
|
|
</li>
|
|
|
|
</ul>
|
|
</section>
|
|
</section>
|
|
<section><title>4. Project Features</title>
|
|
<p>
|
|
Enhancements to the POIFS API will
|
|
include:
|
|
</p>
|
|
<ul>
|
|
<li>
|
|
An event driven API for reading
|
|
POIFS Filesystems.
|
|
</li>
|
|
<li>
|
|
A low-level API for
|
|
creating/manipulating POI filesystems.
|
|
</li>
|
|
<li>
|
|
Code improvements supporting
|
|
greater separation between read and write structures.
|
|
</li>
|
|
</ul>
|
|
<p>
|
|
Enhancements to the HSSF API will
|
|
include:
|
|
</p>
|
|
<ul>
|
|
<li>
|
|
An event driven API for reading
|
|
XLS files.
|
|
</li>
|
|
<li>
|
|
Performance improvements.
|
|
</li>
|
|
<li>
|
|
Formula support (read/write)
|
|
</li>
|
|
<li>
|
|
Support for user-defined data
|
|
formats
|
|
</li>
|
|
<li>
|
|
Better documentation of the file
|
|
format and structure.
|
|
</li>
|
|
<li>
|
|
An API for creation of charts.
|
|
</li>
|
|
</ul>
|
|
<p>
|
|
The HSSF Generator will include:
|
|
</p>
|
|
<ul>
|
|
<li>
|
|
A set of classes supporting the
|
|
Cocoon 2 Generator interfaces providing a method for reading XLS
|
|
files and outputting SAX events.
|
|
</li>
|
|
<li>
|
|
The same tag format used by the
|
|
HSSFSerializer in any given release.
|
|
</li>
|
|
</ul>
|
|
<p>
|
|
The HWPF API will include:
|
|
</p>
|
|
<ul>
|
|
<li>
|
|
An event driven API for reading
|
|
DOC files.
|
|
</li>
|
|
<li>
|
|
A set of high and low level APIs
|
|
for reading and writing DOC files.
|
|
</li>
|
|
<li>
|
|
Documentation of the DOC file
|
|
format or enhancements to existing documentation.
|
|
</li>
|
|
</ul>
|
|
</section>
|
|
<section><title>5. Other Product Requirements</title>
|
|
<section><title>5.1. Applicable Standards</title>
|
|
<p>
|
|
All Java code will be 100% pure Java.
|
|
</p>
|
|
</section>
|
|
<section><title>5.2. System Requirements</title>
|
|
<p>
|
|
The minimum system requirements for the POIFS API are:
|
|
</p>
|
|
<ul>
|
|
<li>64 Mbytes memory</li>
|
|
<li>Java 2 environment</li>
|
|
<li>Pentium or better processor (or equivalent on other platforms)</li>
|
|
</ul>
|
|
<p>
|
|
The minimum system requirements for the the HSSF API are:
|
|
</p>
|
|
<ul>
|
|
<li>64 Mbytes memory</li>
|
|
<li>Java 2 environment</li>
|
|
<li>Pentium or better processor (or equivalent on other platforms)</li>
|
|
<li>POIFS API</li>
|
|
</ul>
|
|
<p>
|
|
The minimum system requirements for the the HWPF API are:
|
|
</p>
|
|
<ul>
|
|
<li>64 Mbytes memory</li>
|
|
<li>Java 2 environment</li>
|
|
<li>Pentium or better processor (or equivalent on other platforms)</li>
|
|
<li>POIFS API</li>
|
|
</ul>
|
|
|
|
<p>
|
|
The minimum system requirements for the HSSF Serializer are:
|
|
</p>
|
|
<ul>
|
|
<li>64 Mbytes memory</li>
|
|
<li>Java 2 environment</li>
|
|
<li>Pentium or better processor (or equivalent on other platforms)</li>
|
|
<li>Cocoon 2</li>
|
|
<li>HSSF API</li>
|
|
<li>POI API</li>
|
|
</ul>
|
|
</section>
|
|
<section><title>5.3. Performance Requirements</title>
|
|
<p>
|
|
All components must perform well enough
|
|
to be practical for use in a webserver environment (especially
|
|
the "killer trio": Cocoon2/Tomcat/Apache combo)
|
|
</p>
|
|
</section>
|
|
<section><title>5.4. Environmental Requirements</title>
|
|
<p>
|
|
The software will run primarily in
|
|
developer environments. We should make some allowances for
|
|
not-highly-technical users to write XML documents for the HSSF
|
|
Serializer. All other components will assume intermediate Java 2
|
|
knowledge. No XML knowledge will be required except for using the
|
|
HSSF Serializer. As much documentation as is practical shall be
|
|
required for all components as XML is relatively new, and the
|
|
concepts introduced for writing spreadsheets and to POI filesystems
|
|
will be brand new to Java and many Java developers.
|
|
</p>
|
|
</section>
|
|
</section>
|
|
<section><title>6. Documentation Requirements</title>
|
|
<section><title>6.1 POI Filesystem</title>
|
|
<p>
|
|
The filesystem as read and written by
|
|
POI shall be fully documented and explained so that the average Java
|
|
developer can understand it.
|
|
</p>
|
|
</section>
|
|
<section><title>6.2. POI API</title>
|
|
<p>
|
|
The POI API will be fully documented
|
|
through Javadoc. A walkthrough of using the high level POI API shall
|
|
be provided. No documentation outside of the Javadoc shall be
|
|
provided for the low-level POI APIs.
|
|
</p>
|
|
</section>
|
|
<section><title>6.3. HSSF File Format</title>
|
|
<p>
|
|
The HSSF File Format as implemented by
|
|
the HSSF API will be fully documented. No documentation will be
|
|
provided for features that are not supported by HSSF API that are
|
|
supported by the Excel 97 File Format. Care will be taken not to
|
|
infringe on any "legal stuff". Additionally, we are
|
|
collaborating with the fine folks at OpenOffice.org on
|
|
*free* documentation of the format.
|
|
</p>
|
|
</section>
|
|
<section><title>6.4. HSSF API</title>
|
|
<p>
|
|
The HSSF API will be documented by
|
|
javadoc. A walkthrough of using the high level HSSF API shall be
|
|
provided. No documentation outside of the Javadoc shall be provided
|
|
for the low level HSSF APIs.
|
|
</p>
|
|
</section>
|
|
<section><title>6.5 HWPF API</title>
|
|
<p>
|
|
The HWPF API will be documented by
|
|
javadoc. A walkthrough of using the high level HWPF API shall be
|
|
provided. No documentation outside of the Javadoc shall be provided
|
|
for the low level HWPF APIs.
|
|
</p>
|
|
</section>
|
|
<section><title>6.6 HSSF Serializer</title>
|
|
<p>
|
|
The HSSF Serializer will be documented
|
|
by javadoc.
|
|
</p>
|
|
</section>
|
|
<section><title>6.7 HSSF Generator</title>
|
|
<p>
|
|
The HSSF Generator will be documented
|
|
by javadoc.
|
|
</p>
|
|
</section>
|
|
<section><title>6.8 HSSF Serializer Tag language</title>
|
|
<p>
|
|
The XML tag language along with
|
|
function and usage shall be fully documented. Examples will be
|
|
provided as well.
|
|
</p>
|
|
</section>
|
|
</section>
|
|
<section><title>7. Terminology</title>
|
|
<section><title>7.1 Filesystem</title>
|
|
<p>
|
|
filesystem shall refer only to the POI formatted archive.
|
|
</p>
|
|
</section>
|
|
<section><title>7.2 File</title>
|
|
<p>
|
|
file shall refer to the embedded data stream within a
|
|
POI filesystem. This will be the actual embedded document.
|
|
</p>
|
|
</section>
|
|
</section>
|
|
</body>
|
|
</document>
|