This release candidate is intended for general use and is considered production-ready. It has been unit tested quite a bit, though it has not yet been extensively tested under heavy load in a multi-threaded server environment. We consider this release "golden" because it has been used by HSSF and other users without problems for some time and has not changed recently.
Files written with the POIFS library are referred to as POIFS file systems (or sometimes archives). The OLE 2 Compound Document format is designed to mimic many of the characteristics of a pre-modern file system (most similar to FAT). We distinguish between POIFS-written files and "native" OLE 2 Compound Document Format files because, while we believe POIFS to be a full, correct, and complete implementation, most of it was accomplished by researching other open source implementations and by flat-out guesses.
This overview is in no way intended to be complete (for a more in-depth discussion, please see POIFSFormat.html in this same directory), but it should give you a good idea of the principles of a POIFS file system. Please note that specific file formats such as XLS (HSSF) or DOC use POIFS file systems to contain their data; POIFS itself does not know how to interpret the archived data.
Every POIFS file system contains a hierarchy of directories starting with the root (there is always one, and only one, root). Each directory, including the root, may contain any number of directories and documents. Every directory and document has a name. The root directory has a name too, but unlike other directories, its name is fixed and cannot be changed.
The POIFS API was not designed to be, and is not, thread-safe. Only one thread of control should ever manipulate a specific POIFS file system over that file system's lifetime. You can, of course, have multiple threads, each manipulating a distinct POIFS file system instance.
To create a new (from scratch) POIFS file system for writing to, you simply create an instance of net.sourceforge.poi.poifs.filesystem.Filesystem using the default constructor (no arguments). Initially this POIFS file system will be empty except for containing the essential root directory.
From there you can create a directory entry by calling Filesystem.createDirectory(name), passing in the name of the directory. This will return an instance of net.sourceforge.poi.poifs.filesystem.DirectoryEntry. You can also create a document within the root directory by calling Filesystem.createDocument(inputstream, name), passing an instance of java.io.InputStream from which the document's data can be obtained, followed by the name of the document. Note that the most commonly used Microsoft file formats, such as DOC and XLS, are all POIFS-compatible file systems with their documents stored in the root directory.
If the document is to be stored in a directory other than the root, take the DirectoryEntry instance that you created and call createDocument(name, inputstream) on it instead. You can also create a child directory by calling createDirectory(name).
Alternatively, you can call Filesystem.getRoot() and use it just like any other directory entry.
When you've finished creating entries in the file system, simply call Filesystem.writeFilesystem(stream), passing in an instance of java.io.OutputStream. Be sure to close the stream when you're done.
The POIFS file system imposes two limitations on document and directory names:
The names of documents and directories must be unique within their containing directory. Pretty obvious.
Names are restricted to 31 characters. If you create a directory or document with a name longer than that, it will be silently truncated. When truncated, it may conflict with the name of another directory or document, and the create operation will fail.
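The second limitation is easy to trip over with long, similar names. The following is a minimal sketch, not POIFS code: it assumes that truncation keeps the first 31 characters (the text above does not say which characters survive), and the class, method, and sample names are hypothetical. It shows how two distinct names can collapse to the same stored name.

```java
final class NameLimitSketch {
    // Maximum name length stated in the limitations above.
    static final int MAX_NAME_LENGTH = 31;

    // Assumption: truncation keeps the leading characters of the name.
    static String truncate(String name) {
        return name.length() <= MAX_NAME_LENGTH
                ? name
                : name.substring(0, MAX_NAME_LENGTH);
    }

    // Two names collide if they are identical after truncation.
    static boolean collides(String a, String b) {
        return truncate(a).equals(truncate(b));
    }

    public static void main(String[] args) {
        // Both names share the same first 31 characters and differ only after that.
        String first = "QuarterlyFinancialSummaryReport-2001";
        String second = "QuarterlyFinancialSummaryReport-2002";
        System.out.println(truncate(first));
        System.out.println(truncate(second));
        System.out.println("collision: " + collides(first, second));
    }
}
```

A check along these lines before creating an entry lets the caller choose a shorter, distinguishing name instead of running into a failed create operation.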
The POIFS file system uses Streams because HSSF, and virtually all other applications that would use POIFS, deal with binary files, which Streams handle correctly. Readers and Writers, by contrast, are designed for text and know how to handle 16-bit characters. If there is demand for Reader and Writer support, let us know.
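To illustrate why byte streams suit binary data, the sketch below uses only java.io (no POIFS classes; the class and method names are made up for illustration). It copies the same non-text bytes once through an InputStream/OutputStream pair and once through a Reader/Writer pair configured for UTF-8. The stream copy preserves every byte, while the character copy does not, because bytes that do not form valid UTF-8 are replaced during decoding.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class StreamVsReaderSketch {
    // Byte-for-byte copy: no interpretation of the data takes place.
    static byte[] copyWithStreams(byte[] data) {
        try (InputStream in = new ByteArrayInputStream(data);
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            int b;
            while ((b = in.read()) != -1) {
                out.write(b);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Character copy: bytes are decoded to chars and re-encoded to bytes.
    static byte[] copyWithReaderWriter(byte[] data) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Reader in = new InputStreamReader(
                     new ByteArrayInputStream(data), StandardCharsets.UTF_8);
             Writer out = new OutputStreamWriter(sink, StandardCharsets.UTF_8)) {
            int c;
            while ((c = in.read()) != -1) {
                out.write(c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    // A few non-text bytes, chosen only for illustration.
    static byte[] sampleBinary() {
        return new byte[] {(byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
                           (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1};
    }

    public static void main(String[] args) {
        byte[] data = sampleBinary();
        System.out.println("streams preserve data: "
                + Arrays.equals(data, copyWithStreams(data)));
        System.out.println("reader/writer preserve data: "
                + Arrays.equals(data, copyWithReaderWriter(data)));
    }
}
```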
Here is some example code (excerpted and adapted from the net.sourceforge.poi.hssf.usermodel.Workbook class):

    byte[] bytes = getBytes(); // get the bytes for the document (elsewhere in the class)
    FileOutputStream stream = new FileOutputStream("/home/reportsys/test/text.xls"); // create a new FileOutputStream
    Filesystem fs = new Filesystem(); // create a new POIFS Filesystem object
    // create a new document in the root directory of the POIFS file system;
    // close() on a ByteArrayInputStream is a no-op, so we don't bother (no real file handle is used)
    fs.createDocument(new ByteArrayInputStream(bytes), "Workbook");
    fs.writeFilesystem(stream); // write the file system to the output stream
    stream.close(); // close our stream (don't leak file handles; it's bad news)
Reading in an existing POIFS file system is equally simple. Create a new instance of net.sourceforge.poi.poifs.filesystem.Filesystem by calling the Filesystem(java.io.InputStream) constructor and passing in your file system's data (this would probably be a FileInputStream, but it doesn't matter). From there you can get documents from the root directory by calling Filesystem.createDocumentInputStream(name) and passing a string representing that document's name.
If you wish to walk the file system, the easiest thing to do is to call DirectoryEntry.getEntries(). This will give you a java.util.Iterator of the Entry instances (DirectoryEntry and DocumentEntry are extensions of Entry) contained by the DirectoryEntry. For instance, you could call Filesystem.getRoot() to retrieve a DirectoryEntry instance. From there you could call DirectoryEntry.getEntries() and retrieve an Iterator of those entries. Iterating through these entries, you'd call getName() to check the name of each entry and isDocumentEntry() or isDirectoryEntry() to determine its type. Going the other way, given an Entry, you can walk back up the directory chain by calling getParent(), which returns the Entry's containing DirectoryEntry (calling getParent() on the root directory returns a null reference).
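The traversal described above can be sketched as a recursive walk. The real POIFS classes are not used here: the Entry, DirectoryEntry, and DocumentEntry interfaces below are tiny in-memory stand-ins that expose only the methods named in the text (getName, getParent, isDirectoryEntry, isDocumentEntry, getEntries). Everything else (Node, Dir, Doc, pathOf, collectDocuments, and the sample names) is hypothetical scaffolding so that the example runs on its own.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

final class WalkSketch {
    // Stand-in interfaces; only the methods mentioned in the text are modeled.
    interface Entry {
        String getName();
        DirectoryEntry getParent();
        boolean isDirectoryEntry();
        boolean isDocumentEntry();
    }
    interface DirectoryEntry extends Entry { Iterator<Entry> getEntries(); }
    interface DocumentEntry extends Entry { }

    // Hypothetical in-memory implementations.
    static class Node implements Entry {
        final String name;
        final Dir parent;
        Node(String name, Dir parent) {
            this.name = name;
            this.parent = parent;
            if (parent != null) parent.children.add(this);
        }
        public String getName() { return name; }
        public DirectoryEntry getParent() { return parent; }
        public boolean isDirectoryEntry() { return false; }
        public boolean isDocumentEntry() { return false; }
    }
    static class Dir extends Node implements DirectoryEntry {
        final List<Entry> children = new ArrayList<>();
        Dir(String name, Dir parent) { super(name, parent); }
        @Override public boolean isDirectoryEntry() { return true; }
        public Iterator<Entry> getEntries() { return children.iterator(); }
    }
    static class Doc extends Node implements DocumentEntry {
        Doc(String name, Dir parent) { super(name, parent); }
        @Override public boolean isDocumentEntry() { return true; }
    }

    // Walks up the chain with getParent() until the root (null parent) is passed.
    static String pathOf(Entry entry) {
        StringBuilder path = new StringBuilder(entry.getName());
        for (DirectoryEntry p = entry.getParent(); p != null; p = p.getParent()) {
            path.insert(0, p.getName() + "/");
        }
        return path.toString();
    }

    // Walks down: recurse into directories, record the path of each document.
    static void collectDocuments(DirectoryEntry dir, List<String> out) {
        for (Iterator<Entry> it = dir.getEntries(); it.hasNext(); ) {
            Entry e = it.next();
            if (e.isDirectoryEntry()) {
                collectDocuments((DirectoryEntry) e, out);
            } else if (e.isDocumentEntry()) {
                out.add(pathOf(e));
            }
        }
    }

    // Made-up sample hierarchy: one document in the root, one in a child directory.
    static Dir sampleTree() {
        Dir root = new Dir("Root", null);
        new Doc("Workbook", root);
        Dir reports = new Dir("Reports", root);
        new Doc("Summary", reports);
        return root;
    }

    public static void main(String[] args) {
        List<String> paths = new ArrayList<>();
        collectDocuments(sampleTree(), paths);
        for (String p : paths) System.out.println(p);
    }
}
```

With the actual library, the same two loops would operate on the object returned by Filesystem.getRoot(); only the stand-in types would be replaced.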
With a DocumentEntry, you can create an instance of net.sourceforge.poi.poifs.filesystem.DocumentInputStream by passing the DocumentEntry as the only argument to the DocumentInputStream constructor.
The DocumentInputStream class is a simple extension of java.io.InputStream that fully supports the InputStream API, including the mark, reset, and skip methods, providing a form of random access I/O.
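The mark, reset, and skip methods combine into a simple "peek ahead, then rewind" pattern. As a sketch of that access pattern only, the example below uses java.io.ByteArrayInputStream as a stand-in for DocumentInputStream (an assumption made so the code runs without POIFS); the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

final class MarkResetSketch {
    // Returns the byte located `offset` bytes ahead of the current position
    // (or -1 if the stream is too short), then rewinds so the caller's
    // position in the stream is unchanged.
    static int peekAt(InputStream in, int offset) {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream does not support mark/reset");
        }
        try {
            in.mark(offset + 1);              // remember the current position
            long skipped = in.skip(offset);   // jump forward without reading
            int value = (skipped == offset) ? in.read() : -1;
            in.reset();                       // return to the marked position
            return value;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ByteArrayInputStream in =
                new ByteArrayInputStream(new byte[] {10, 20, 30, 40, 50});
        System.out.println("byte at offset 3: " + peekAt(in, 3));
        System.out.println("next sequential byte: " + in.read());
    }
}
```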
To modify the file, you would simply walk through the entries and follow the same instructions as for writing a POIFS file system from scratch. There are also methods to delete an Entry (note: you cannot delete the root directory, nor can you delete a DirectoryEntry unless it's empty) and to rename an Entry (but see the name restrictions above).
POIFS does not yet use log4j style logging.
POIFS does not yet have developer's tools.
Refactoring of the API to more cleanly separate write from read.
Add logging/tracing code
Add tree viewer (probably Andy)
Read/write support for creation and modification time stamps