<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<META HTTP-EQUIV="CONTENT-TYPE" CONTENT="text/html; charset=iso-8859-1">
<TITLE></TITLE>
<META NAME="GENERATOR" CONTENT="StarOffice/5.2 (Linux)">
<META NAME="CREATED" CONTENT="20011031;14571450">
<META NAME="CHANGEDBY" CONTENT=" ">
<META NAME="CHANGED" CONTENT="20011230;13132100">
</HEAD>
<BODY>
<H1>POIFS HOW TO</H1>
<H2>How to use POIFS directly</H2>
<H3>Andrew C. Oliver - December 14, 2001</H3>
<DL>
<DD STYLE="margin-bottom: 0.2in">10.31.2001 - initial revision for
build POI 0.12.3
</DD><DD STYLE="margin-bottom: 0.2in">
12.15.2001 - minor revisions - thread safety, entry modification,
name restrictions, and so on.</DD><DD STYLE="margin-bottom: 0.2in">
12.30.2001 - revised for POI 1.0-final - minor revisions
</DD></DL>
<H2>Capabilities</H2>
<DL>
<DD STYLE="margin-bottom: 0.2in">This release of POIFS contains the
full functionality to read, write and modify (by recreation) files
in the format most commonly referred to as OLE 2 Compound Document
Format (probably a trademark of Microsoft).
</DD></DL>
<H2>Target Audience</H2>
<P>This release candidate is intended for general use and is
considered to be production-ready. It has been unit tested quite a
bit, but it has not yet been extensively tested under heavy load
(especially in a multi-threaded server situation). This release is
considered to be "golden": HSSF and other users have used it without
problems for some time, and it has not changed recently.
</P>
<H2>General Use</H2>
<H3>User API</H3>
<H4>High level description and overview</H4>
<P>Files written with the POIFS library are referred to as POIFS file
systems (or sometimes archives). The OLE 2 Compound Document format
is designed to mimic many of the characteristics of a pre-modern file
system (most similar to FAT). We make the distinction between files
written by POIFS and "native" OLE 2 Compound Document Format files
because, while we believe POIFS to be a full, correct and complete
implementation, most of this was accomplished through researching
other open source implementations and flat out guesses.</P>
<P>This overview is in no way intended to be complete (for a more
in-depth discussion please see POIFSFormat.html in this same
directory), but it should give you a good idea of the principles of a
POIFS file system. Please note that specific file formats such as XLS
(HSSF) or DOC use POIFS file systems to contain their data; POIFS
itself does not know how to interpret the archived data.</P>
<P>Every POIFS file system contains a hierarchy of directories
starting with the root (there is always one, and only one, root).
Each directory, including the root, may contain zero or more
directories and/or documents. Every directory and document has a
name. The root directory has a name, but unlike other directories,
its name is fixed and cannot be changed.</P>
<P><STRONG>The POIFS API was not designed to be, and is not,
thread-safe.</STRONG> Only one thread of control should ever
manipulate a specific POIFS file system over that file system's
lifetime. You can, of course, have multiple threads, each
manipulating a distinct POIFS file system instance.</P>
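<P>The "one instance per thread" rule can be illustrated with a small
sketch. It does not use the POIFS classes: <CODE>FakeFileSystem</CODE>
is a hypothetical, deliberately unsynchronized stand-in, and
<CODE>buildInParallel</CODE> is an invented helper. The only point is
that each thread creates and uses its own instance and never shares it.</P>

```java
import java.util.ArrayList;
import java.util.List;

final class ThreadConfinementSketch {

    // Hypothetical stand-in for a file system object; NOT the POIFS API.
    // Like POIFS, it makes no attempt to be thread-safe.
    static final class FakeFileSystem {
        private final List<String> documents = new ArrayList<>();

        void createDocument(String name) {
            documents.add(name);
        }

        int size() {
            return documents.size();
        }
    }

    // Starts `threads` workers; each one builds its own private instance.
    static int[] buildInParallel(int threads, int docsPerThread) {
        int[] sizes = new int[threads];
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                // Created and used by this thread only, never shared.
                FakeFileSystem fs = new FakeFileSystem();
                for (int i = 0; i < docsPerThread; i++) {
                    fs.createDocument("doc" + i);
                }
                sizes[id] = fs.size(); // each worker writes its own slot
            });
            workers[t].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join(); // join makes the workers' writes visible here
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return sizes;
    }

    public static void main(String[] args) {
        for (int size : buildInParallel(4, 100)) {
            System.out.println(size);
        }
    }
}
```

<P>If several threads really must work on the same file system, the
callers would have to provide their own external synchronization; the
sketch above simply avoids sharing altogether.</P>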
<H4>Writing a new file system</H4>
<P>To create a new (from scratch) POIFS file system for writing to,
you simply create an instance of
<CODE>net.sourceforge.poi.poifs.filesystem.Filesystem</CODE> using
the default constructor (no arguments). Initially this POIFS file
system will be empty except for containing the essential root
directory.</P>
<P>From there you can create a directory entry by calling
<CODE>Filesystem.createDirectory(name)</CODE> and passing in the name of
the directory. This will return an instance of
<CODE>net.sourceforge.poi.poifs.filesystem.DirectoryEntry</CODE>.
You can also create a document within the root directory by calling
<CODE>Filesystem.createDocument(inputstream, name)</CODE>, passing an
instance of <CODE>java.io.InputStream</CODE> from which the document's
data can be obtained, followed by the name of the document. Note that
the most commonly used Microsoft file formats, such as DOC and XLS, are
all POIFS-compatible file systems with documents stored in the root
directory.</P>
<P>Supposing the document is to be stored in a directory other than
the root, you take the instance of <CODE>DirectoryEntry</CODE>
that you created and call <CODE>createDocument(name,
inputstream)</CODE> on it instead. You can also create a child
directory by calling <CODE>createDirectory(name)</CODE> on it.
Alternatively you can call <CODE>Filesystem.getRoot()</CODE> and
use the result just like any other directory entry.</P>
<P>When you've finished creating entries in the file system, simply
call <CODE>Filesystem.writeFilesystem(stream)</CODE>, passing in
an instance of <CODE>java.io.OutputStream</CODE>. Be sure to
close the stream when you're done.</P>
<H5><A NAME="Names"></A>Names</H5>
<P>The POIFS file system imposes two limitations on document and
directory names:</P>
<OL>
<LI><P STYLE="margin-bottom: 0in">The names of documents and
directories must be unique within their containing directory. Pretty
obvious.
</P>
<LI><P>Names are restricted to 31 characters. If you create a
directory or document with a name longer than that, it will be
silently truncated. When truncated, it may conflict with the name of
another directory or document, and the create operation will fail.
</P>
</OL>
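<P>The interaction between the two naming rules can be sketched as
follows. This is an illustration only, not the POIFS implementation:
the class <CODE>NameRulesSketch</CODE>, its methods and the exception
type are all hypothetical, and the sketch assumes that names are
compared exactly (case included) and that "characters" means Java
<CODE>char</CODE> values.</P>

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model of the names held by ONE directory; not the POIFS API.
final class NameRulesSketch {

    static final int MAX_NAME_LENGTH = 31; // the limit stated in the text

    private final Set<String> names = new HashSet<>();

    // Silently cuts a name down to the maximum length.
    static String truncate(String name) {
        return name.length() > MAX_NAME_LENGTH
                ? name.substring(0, MAX_NAME_LENGTH)
                : name;
    }

    // Registers a name and returns the (possibly truncated) stored form.
    // Fails if the stored form collides with an existing entry.
    String create(String requested) {
        String stored = truncate(requested);
        if (!names.add(stored)) {
            throw new IllegalArgumentException(
                    "name already in use after truncation: " + stored);
        }
        return stored;
    }

    public static void main(String[] args) {
        NameRulesSketch directory = new NameRulesSketch();
        String prefix = "abcdefghijklmnopqrstuvwxyz12345"; // exactly 31 characters
        System.out.println(directory.create(prefix + "_first"));
        try {
            // Differs only beyond character 31, so it collides once truncated.
            directory.create(prefix + "_second");
        } catch (IllegalArgumentException e) {
            System.out.println("create failed: " + e.getMessage());
        }
    }
}
```

<P>In practice the safest approach is to keep names at 31 characters or
fewer, so that what you ask for is what gets stored.</P>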
<H5>Why not Readers and Writers?</H5>
<P>The POIFS file system uses Streams because HSSF, and virtually all
other applications that would use POIFS, deal with binary files,
which Streams handle correctly. Readers and Writers deal with text
and know how to handle 16-bit characters. If there is a demand for
providing support for Readers and Writers, let us know.</P>
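<P>To see why this matters for binary data, the following sketch copies
the same bytes twice using only <CODE>java.io</CODE> classes: once
through Streams and once through a Reader/Writer pair. The class and
method names are hypothetical, the sample bytes are just arbitrary
non-text data, and UTF-8 is an assumed charset chosen for the
demonstration. The Stream copy returns the bytes unchanged, while the
Reader/Writer copy alters them, because byte sequences that are not
valid text get replaced during decoding.</P>

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class StreamVersusReaderSketch {

    // Byte-for-byte copy through InputStream/OutputStream.
    static byte[] copyWithStreams(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream in = new ByteArrayInputStream(data)) {
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Copy through Reader/Writer: bytes are decoded to chars and re-encoded.
    static byte[] copyWithReaderWriter(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Reader in = new InputStreamReader(
                     new ByteArrayInputStream(data), StandardCharsets.UTF_8);
             Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            char[] buffer = new char[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                writer.write(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray(); // the writer has been flushed and closed
    }

    public static void main(String[] args) {
        // Arbitrary binary sample; not valid UTF-8 text.
        byte[] data = {(byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
                       (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1};
        System.out.println("streams preserve bytes: "
                + Arrays.equals(data, copyWithStreams(data)));
        System.out.println("reader/writer preserve bytes: "
                + Arrays.equals(data, copyWithReaderWriter(data)));
    }
}
```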
<P>Here is some example code for writing a file system (excerpted and
adapted from the net.sourceforge.poi.hssf.usermodel.Workbook class):</P>
<PRE>import java.io.ByteArrayInputStream;
import java.io.FileOutputStream;
import net.sourceforge.poi.poifs.filesystem.Filesystem;

byte[] bytes = getBytes(); // get the bytes for the document (elsewhere in the class)
FileOutputStream stream = new FileOutputStream("/home/reportsys/test/text.xls"); // create a new FileOutputStream
Filesystem fs = new Filesystem(); // create a new POIFS Filesystem object
fs.createDocument(new ByteArrayInputStream(bytes), "Workbook"); // create a new document in the root directory of the POIFS file system
// close on ByteArrayInputStream is a no-op so we don't bother, no real file handle is used
fs.writeFilesystem(stream); // write the file system to the output stream
stream.close(); // close our stream (don't leak file handles, it's bad news)</PRE><H4>Reading or modifying an existing file</H4>
<P>Reading in an existing POIFS file system is equally simple. Create
a new instance of <CODE>net.sourceforge.poi.poifs.filesystem.Filesystem</CODE>
by calling the <CODE>Filesystem(java.io.InputStream)</CODE>
constructor and passing in your file system's data (this would
probably be a <CODE>FileInputStream</CODE>, but it doesn't matter).
From there you can get documents from the root directory by calling
<CODE>Filesystem.createDocumentInputStream(name)</CODE> and passing a
string representing that document's name.</P>
<P>If you wish to walk the file system, the easiest thing to do is
to call <CODE>DirectoryEntry.getEntries()</CODE>. This will give you a
<CODE>java.util.Iterator</CODE> of the <CODE>Entry</CODE> instances
(<CODE>DirectoryEntry</CODE> and <CODE>DocumentEntry</CODE> are
extensions of <CODE>Entry</CODE>) contained by the
<CODE>DirectoryEntry</CODE>. For instance you could call
<CODE>Filesystem.getRoot()</CODE> to retrieve a <CODE>DirectoryEntry</CODE>
instance. From there you could call <CODE>DirectoryEntry.getEntries()</CODE>
and retrieve an <CODE>Iterator</CODE> of those entries. Iterating through
these entries, you'd call <CODE>getName()</CODE> to check the name of the
entry and <CODE>isDocumentEntry()</CODE> or <CODE>isDirectoryEntry()</CODE>
to determine its type. Going the other way, given an <CODE>Entry</CODE>,
you can walk back up the directory chain by calling <CODE>getParent()</CODE>,
which returns the <CODE>Entry</CODE>'s containing <CODE>DirectoryEntry</CODE>
(calling <CODE>getParent()</CODE> on the root directory returns a
<SAMP>null</SAMP> reference).</P>
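<P>The walk described above can be sketched with a small in-memory
model. The nested classes below are hypothetical stand-ins that borrow
only the method names mentioned in the text (<CODE>getEntries()</CODE>,
<CODE>getName()</CODE>, <CODE>isDocumentEntry()</CODE>,
<CODE>isDirectoryEntry()</CODE>, <CODE>getParent()</CODE>); they are not
the POIFS classes, and the constructors, <CODE>walk</CODE>,
<CODE>pathOf</CODE> and the root name "root" are invented for the
illustration.</P>

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

final class WalkSketch {

    // Hypothetical stand-in for an entry; NOT the POIFS Entry type.
    static class Entry {
        private final String name;
        private final DirectoryEntry parent; // null for the root

        Entry(String name, DirectoryEntry parent) {
            this.name = name;
            this.parent = parent;
        }

        String getName() { return name; }
        DirectoryEntry getParent() { return parent; }
        boolean isDirectoryEntry() { return this instanceof DirectoryEntry; }
        boolean isDocumentEntry() { return this instanceof DocumentEntry; }
    }

    static final class DocumentEntry extends Entry {
        DocumentEntry(String name, DirectoryEntry parent) { super(name, parent); }
    }

    static final class DirectoryEntry extends Entry {
        private final List<Entry> children = new ArrayList<>();

        DirectoryEntry(String name, DirectoryEntry parent) { super(name, parent); }

        DirectoryEntry createDirectory(String name) {
            DirectoryEntry dir = new DirectoryEntry(name, this);
            children.add(dir);
            return dir;
        }

        DocumentEntry createDocument(String name) {
            DocumentEntry doc = new DocumentEntry(name, this);
            children.add(doc);
            return doc;
        }

        Iterator<Entry> getEntries() { return children.iterator(); }
    }

    // Depth-first walk downward: records the path of every entry found.
    static void walk(DirectoryEntry directory, List<String> paths) {
        Iterator<Entry> entries = directory.getEntries();
        while (entries.hasNext()) {
            Entry entry = entries.next();
            paths.add(pathOf(entry));
            if (entry.isDirectoryEntry()) {
                walk((DirectoryEntry) entry, paths);
            }
        }
    }

    // Walk upward: follows getParent() until it returns null at the root.
    static String pathOf(Entry entry) {
        StringBuilder path = new StringBuilder(entry.getName());
        for (DirectoryEntry p = entry.getParent(); p != null; p = p.getParent()) {
            path.insert(0, p.getName() + "/");
        }
        return path.toString();
    }

    public static void main(String[] args) {
        DirectoryEntry root = new DirectoryEntry("root", null);
        root.createDocument("Workbook");
        root.createDirectory("dir").createDocument("child");
        List<String> paths = new ArrayList<>();
        walk(root, paths);
        for (String path : paths) {
            System.out.println(path);
        }
    }
}
```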
<P>With a <CODE>DocumentEntry</CODE>, you can create an instance of
<CODE>net.sourceforge.poi.poifs.filesystem.DocumentInputStream</CODE>
by passing the <CODE>DocumentEntry</CODE> as the only argument to
the constructor of <CODE>DocumentInputStream</CODE>. The
<CODE>DocumentInputStream</CODE> class is a simple extension of
<CODE>java.io.InputStream</CODE> that fully supports the <CODE>InputStream</CODE>
API, including the <CODE>mark</CODE>, <CODE>reset</CODE>, and <CODE>skip</CODE>
methods, providing a form of random access I/O.</P>
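<P>The following sketch shows how <CODE>mark</CODE>, <CODE>skip</CODE>
and <CODE>reset</CODE> combine into a simple form of random access. It
is written against plain <CODE>java.io.InputStream</CODE> and uses
<CODE>java.io.ByteArrayInputStream</CODE> as a stand-in, on the
assumption that a <CODE>DocumentInputStream</CODE> could be passed in
the same way since it supports the same methods. The class name and the
<CODE>peekAt</CODE> helper are hypothetical.</P>

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

final class MarkResetSketch {

    // Returns the byte located `offset` bytes ahead of the current position
    // (or -1 if the stream ends first), then rewinds to where the stream was.
    static int peekAt(InputStream in, long offset) {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        try {
            in.mark(Integer.MAX_VALUE); // remember the current position
            try {
                long remaining = offset;
                while (remaining > 0) {
                    long skipped = in.skip(remaining); // may skip fewer than asked
                    if (skipped <= 0) {
                        return -1; // ran out of data before reaching the offset
                    }
                    remaining -= skipped;
                }
                return in.read();
            } finally {
                in.reset(); // go back to the marked position
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for a document's contents.
        InputStream in = new ByteArrayInputStream(new byte[] {10, 20, 30, 40, 50});
        System.out.println(peekAt(in, 3)); // look ahead without consuming
        System.out.println(peekAt(in, 0)); // position is unchanged afterwards
    }
}
```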
<P>To modify the file you would simply walk through the entries and
follow the same instructions for writing a POIFS file system from
scratch. There are also methods to delete an <CODE>Entry</CODE>
(note: you cannot delete the root directory, nor can you delete a
<CODE>DirectoryEntry</CODE> unless it's empty) and to rename an <CODE>Entry</CODE>
(but see the <A HREF="#Names">notes</A> above).
</P>
<H3>POIFS Logging facility</H3>
<P>POIFS does not yet use log4j style logging, so there is no log
configuration example to show here.</P>
<H3>POIFS Developer's Tools</H3>
<P>POIFS does not yet have developer's tools.
</P>
<H3>What's Next?</H3>
<OL>
<LI><P STYLE="margin-bottom: 0in">Refactoring of the API to more
cleanly separate write from read.
</P>
<LI><P STYLE="margin-bottom: 0in">Add logging/tracing code
</P>
<LI><P STYLE="margin-bottom: 0in">Add tree viewer (probably Andy)
</P>
<LI><P>Read/write support for creation and modification time stamps
</P>
</OL>
<P><BR><BR>
</P>
</BODY>
</HTML>