poi/src/documentation/xdocs/poifs/html/how-to.html

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
	<META HTTP-EQUIV="CONTENT-TYPE" CONTENT="text/html; charset=iso-8859-1">
	<TITLE>POIFS HOW TO</TITLE>
	<META NAME="GENERATOR" CONTENT="StarOffice/5.2 (Linux)">
	<META NAME="CREATED" CONTENT="20011031;14571450">
	<META NAME="CHANGEDBY" CONTENT=" ">
	<META NAME="CHANGED" CONTENT="20011230;13132100">
</HEAD>
<BODY>
<H1>POIFS HOW TO</H1>
<H2>How to use POIFS directly</H2>
<H3>Andrew C. Oliver - December 14, 2001</H3>
<DL>
	<DD STYLE="margin-bottom: 0.2in">10.31.2001- initial revision for
	build POI 0.12.3
	</DD><DD STYLE="margin-bottom: 0.2in">
	12.15.2001 - minor revisions - thread safety, entry modification,
	name restrictions, and so on.</DD><DD STYLE="margin-bottom: 0.2in">
	12.30.2001 - revised for POI 1.0-final - minor revisions
	</DD></DL>
<H2>
Capabilities</H2>
<DL>
	<DD STYLE="margin-bottom: 0.2in">This release of POIFS contains the
	full functionality to read, write and modify (by recreation) files
	in the format most commonly referred to as OLE 2 Compound Document
	Format (probably a trademark of Microsoft).
	</DD></DL>
<H2>
Target Audience</H2>
<P>This release is intended for general use and is considered to be
production-ready. It has been unit tested quite a bit, but it has not
yet been extensively tested under load (especially in a high-load,
multi-threaded server situation). It is considered to be
&quot;golden&quot; because HSSF and other users have used it without
problems for some time, and it has not changed recently.
</P>
<H2>General Use</H2>
<H3>User API</H3>
<H4>High level description and overview</H4>
<P>Files written with the POIFS library are referred to as POIFS file
systems (or sometimes archives). The OLE 2 Compound Document format
is designed to mimic many of the characteristics of a pre-modern file
system (most similar to FAT). We distinguish POIFS-written files from
&quot;native&quot; OLE 2 Compound Document Format files because,
while we believe POIFS to be a full, correct and complete
implementation, most of it was accomplished by researching other open
source implementations and by flat-out guessing.</P>
<P>This overview is in no way intended to be complete (for a more
in-depth discussion, please see POIFSFormat.html in this same
directory), but it should give you a good idea of the principles of a
POIFS file system. Please note that specific file formats such as XLS
(HSSF) or DOC use POIFS file systems to contain their data; POIFS
itself does not know how to interpret the archived data.</P>
<P>Every POIFS file system contains a hierarchy of directories
starting with the root (there is always one, and only one, root).
Each directory, including the root, may contain one or more
directories and/or documents. Every directory and document has a
name. The root directory has a name, but unlike other directories,
its name is fixed and cannot be changed.</P>
<P><STRONG>The POIFS API was not designed to be, and is not,
thread-safe.</STRONG> Only one thread of control should ever
manipulate a specific POIFS file system over that file system's
lifetime. You can, of course, have multiple threads, each
manipulating a distinct POIFS file system instance.</P>
<H4>Writing a new one</H4>
<P>To create a new (from scratch) POIFS file system for writing to,
you simply create an instance of
<CODE>net.sourceforge.poi.poifs.filesystem.Filesystem</CODE> using
the default constructor (no arguments). Initially this POIFS file
system will be empty except for containing the essential root
directory.</P>
<P>From there you can create a directory entry by
calling&nbsp;<CODE>Filesystem.createDirectory(name)</CODE> and passing
in the name of the directory. This will return an instance
of&nbsp;<CODE>net.sourceforge.poi.poifs.filesystem.DirectoryEntry</CODE>.
You can also create a document within the root directory by
calling&nbsp;<CODE>Filesystem.createDocument(inputstream, name)</CODE>,
passing an instance of&nbsp;<CODE>java.io.InputStream</CODE> from
which the document's data can be obtained, followed by the name of
the document (note the argument order, which matches the example code
later in this section). Note that the most commonly used Microsoft
file formats, such as DOC and XLS, are all POIFS-compatible file
systems with documents stored in the root directory.</P>
<P>Supposing the document is to be stored in a directory other than
the root, you take the instance of&nbsp;<CODE>DirectoryEntry</CODE>
that you created and call&nbsp;<CODE>createDocument(name,
inputstream)</CODE> on it instead. You can also create a child
directory by calling&nbsp;<CODE> createDirectory(name)</CODE>.
Alternatively you can call&nbsp;<CODE>Filesystem.getRoot()</CODE> and
use it just like any other directory entry.</P>
<P>When you've finished creating entries in the filesystem, simply
call&nbsp;<CODE> Filesystem.writeFilesystem(stream)</CODE> passing in
an instance of&nbsp;<CODE> java.io.OutputStream</CODE>. Be sure you
close the stream when you're done.</P>
<H5><A NAME="Names"></A>Names</H5>
<P>The POIFS file system imposes two limitations on document and
directory names:</P>
<OL>
	<LI><P STYLE="margin-bottom: 0in">The names of documents and
	directories must be unique within their containing directory. Pretty
	obvious.
	</P>
	<LI><P>Names are restricted to 31 characters. If you create a
	directory or document with a name longer than that, it will be
	silently truncated. When truncated, it may conflict with the name of
	another directory or document, and the create operation will fail.
	</P>
</OL>
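<P>The two naming rules above can be sketched in plain Java. This is
only an illustration, not POIFS code: the class
<CODE>NameRulesSketch</CODE> and its methods are hypothetical, and it
assumes that names are compared as exact strings after truncation.</P>

```java
class NameRulesSketch {
    // Limit stated in the text above.
    static final int MAX_NAME_LENGTH = 31;

    // Mimics the silent truncation described above (assumption: simple cut-off).
    static String truncate(String name) {
        return name.length() > MAX_NAME_LENGTH
                ? name.substring(0, MAX_NAME_LENGTH)
                : name;
    }

    // Two names collide in one directory if they are equal after truncation
    // (assumption: exact, case-sensitive comparison).
    static boolean collide(String first, String second) {
        return truncate(first).equals(truncate(second));
    }

    public static void main(String[] args) {
        String a = "QuarterlyFinancialSummaryReport-2001";
        String b = "QuarterlyFinancialSummaryReport-2002";
        System.out.println(truncate(a));   // QuarterlyFinancialSummaryReport
        System.out.println(collide(a, b)); // true
    }
}
```

<P>The two sample names differ only after the 31st character, so a
second create operation in the same directory would be expected to
fail. Keeping names at 31 characters or fewer avoids the problem.</P>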
<H5>Why not Readers and Writers?</H5>
<P>The POIFS file system uses Streams because HSSF, and virtually all
other applications that would use POIFS, deal with binary files,
which Streams handle correctly. Readers and Writers are meant for
text: they decode bytes into 16-bit characters, which can corrupt
binary data. If there is a demand for providing support for Readers
and Writers, let us know.</P>
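<P>The difference can be demonstrated with the standard library alone.
The sketch below is not POIFS code, and the class
<CODE>StreamsVersusReadersSketch</CODE> is hypothetical. It copies the
first four bytes of the OLE 2 header signature once through a stream
and once through a UTF-8 <CODE>Reader</CODE>; the choice of UTF-8 is an
assumption made for the example.</P>

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class StreamsVersusReadersSketch {
    // Copies raw bytes through a stream; nothing is interpreted.
    static byte[] copyWithStream(byte[] data) {
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1) {
            out.write(b);
        }
        return out.toByteArray();
    }

    // Decodes the bytes as UTF-8 text through a Reader, then re-encodes the text.
    static byte[] copyWithReader(byte[] data) {
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8)) {
            StringBuilder text = new StringBuilder();
            int c;
            while ((c = reader.read()) != -1) {
                text.append((char) c);
            }
            return text.toString().getBytes(StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // First four bytes of the OLE 2 header signature; not valid UTF-8 text.
        byte[] binary = { (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0 };
        System.out.println(Arrays.equals(binary, copyWithStream(binary))); // true
        System.out.println(Arrays.equals(binary, copyWithReader(binary))); // false
    }
}
```

<P>The stream copy is byte-for-byte identical, while the Reader
replaces the byte sequences it cannot decode, so the data that comes
back is no longer the data that went in.</P>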
<P>Here is some example code (excerpted and adapted from
net.sourceforge.poi.hssf.usermodel.Workbook class):</P>
<PRE>        import java.io.ByteArrayInputStream;
        import java.io.FileOutputStream;
        import net.sourceforge.poi.poifs.filesystem.Filesystem;

        byte[]     bytes        = getBytes();                                             // get the bytes for the document (elsewhere in the class)
        FileOutputStream stream = new FileOutputStream(&quot;/home/reportsys/test/text.xls&quot;);  // create a new FileOutputStream
        Filesystem fs           = new Filesystem();                                       // create a new POIFS Filesystem object
        fs.createDocument(new ByteArrayInputStream(bytes), &quot;Workbook&quot;);                   // create a new document in the root directory of the POIFS filesystem
                                                                                          // close on ByteArrayInputStream is a no-op so we don't bother; no real file handle is used
        fs.writeFilesystem(stream);                                                       // write the filesystem to the output stream
        stream.close();                                                                   // close our stream (don't leak file handles; it's bad news)</PRE><H4>
Reading or modifying an existing file</H4>
<P>Reading in an existing POIFS file system is equally simple. Create
a new instance of <CODE>net.sourceforge.poi.poifs.filesystem.Filesystem</CODE>
by calling the <CODE>Filesystem(java.io.InputStream)</CODE>
constructor and passing in your file system's data (this would
probably be a <CODE>FileInputStream</CODE>, but any
<CODE>InputStream</CODE> will do). From there you can get documents
from the root directory by calling
<CODE>Filesystem.createDocumentInputStream(name)</CODE> and passing a
string representing that document's name.</P>
<P>If you wish to walk the filesystem, the easiest thing to do is call
<CODE>DirectoryEntry.getEntries()</CODE>. This will give you a
<CODE>java.util.Iterator</CODE> of the <CODE>Entry</CODE> instances
(<CODE>DirectoryEntry</CODE> and <CODE>DocumentEntry</CODE> are
extensions of <CODE>Entry</CODE>) contained by the <CODE>DirectoryEntry</CODE>.
For instance, you could call <CODE>Filesystem.getRoot()</CODE> to
retrieve the root <CODE>DirectoryEntry</CODE> instance and then call
<CODE>getEntries()</CODE> on it. Iterating through these
entries, you'd call <CODE>getName()</CODE> to check the name of each
entry and <CODE>isDocumentEntry()</CODE> or <CODE>isDirectoryEntry()</CODE>
to determine its type. Going the other way, given an <CODE>Entry</CODE>,
you can walk back up the directory chain by calling <CODE>getParent()</CODE>,
which returns the <CODE>Entry</CODE>'s containing <CODE>DirectoryEntry</CODE>
(calling <CODE>getParent()</CODE> on the root directory returns a
<SAMP>null</SAMP> reference).</P>
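<P>The walk described above might look roughly like the following
sketch. To keep it self-contained, it uses a hypothetical stand-in
class, <CODE>SketchEntry</CODE>, that only mimics the method names
mentioned in the text; it is not the POIFS <CODE>Entry</CODE> API, and
the entry names are made up for the example.</P>

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class WalkSketch {
    // Hypothetical stand-in for a POIFS entry; it only mimics the methods named in the text.
    static class SketchEntry {
        private final String name;
        private final SketchEntry parent;   // null for the root
        private final boolean directory;
        private final List<SketchEntry> children = new ArrayList<>();

        SketchEntry(String name, SketchEntry parent, boolean directory) {
            this.name = name;
            this.parent = parent;
            this.directory = directory;
            if (parent != null) {
                parent.children.add(this);
            }
        }

        String getName() { return name; }
        SketchEntry getParent() { return parent; }
        boolean isDirectoryEntry() { return directory; }
        boolean isDocumentEntry() { return !directory; }
        Iterator<SketchEntry> getEntries() { return children.iterator(); }
    }

    // Walks down: appends one indented line per entry, recursing into directories.
    static void walk(SketchEntry dir, String indent, StringBuilder out) {
        Iterator<SketchEntry> entries = dir.getEntries();
        while (entries.hasNext()) {
            SketchEntry entry = entries.next();
            out.append(indent).append(entry.getName())
               .append(entry.isDirectoryEntry() ? "/" : "").append('\n');
            if (entry.isDirectoryEntry()) {
                walk(entry, indent + "  ", out);
            }
        }
    }

    // Walks up: follows getParent() until it returns null at the root.
    static String pathOf(SketchEntry entry) {
        StringBuilder path = new StringBuilder(entry.getName());
        for (SketchEntry p = entry.getParent(); p != null; p = p.getParent()) {
            path.insert(0, p.getName() + "/");
        }
        return path.toString();
    }

    public static void main(String[] args) {
        SketchEntry root = new SketchEntry("Root", null, true);
        new SketchEntry("Workbook", root, false);
        SketchEntry macros = new SketchEntry("Macros", root, true);
        SketchEntry module = new SketchEntry("Module1", macros, false);

        StringBuilder listing = new StringBuilder();
        walk(root, "", listing);
        System.out.print(listing);          // Workbook, Macros/, then Module1 indented
        System.out.println(pathOf(module)); // Root/Macros/Module1
    }
}
```

<P>With the real classes, the same two loops would presumably start
from <CODE>Filesystem.getRoot()</CODE> and use the iterator returned
by <CODE>DirectoryEntry.getEntries()</CODE>.</P>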
<P>With a <CODE>DocumentEntry</CODE>, you can create an instance of
<CODE>net.sourceforge.poi.poifs.filesystem.DocumentInputStream</CODE>
by passing the <CODE>DocumentEntry</CODE> as the only argument to
the constructor of <CODE>DocumentInputStream</CODE>. The
<CODE>DocumentInputStream</CODE> class is a simple extension of
<CODE>java.io.InputStream</CODE> that fully supports the <CODE>InputStream</CODE>
API, including the <CODE>mark</CODE>, <CODE>reset</CODE>, and <CODE>skip</CODE>
methods, providing a form of random access I/O.</P>
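<P>As a rough illustration of random access through
<CODE>mark</CODE>, <CODE>reset</CODE>, and <CODE>skip</CODE>, the
sketch below reads a single byte at a given offset and then rewinds.
It uses <CODE>java.io.ByteArrayInputStream</CODE> as a stand-in, on
the assumption that a <CODE>DocumentInputStream</CODE> behaves the
same way through the <CODE>InputStream</CODE> API; the class
<CODE>MarkResetSkipSketch</CODE> and its <CODE>byteAt</CODE> helper
are hypothetical.</P>

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class MarkResetSkipSketch {
    // Returns the byte found 'offset' bytes ahead of the current position
    // (or -1 if the stream is too short), then rewinds to where it started.
    static int byteAt(InputStream in, long offset) {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream does not support mark/reset");
        }
        try {
            in.mark(Integer.MAX_VALUE);       // remember the current position
            long remaining = offset;
            while (remaining > 0) {
                long skipped = in.skip(remaining);
                if (skipped <= 0) {
                    break;                    // nothing more to skip: end of stream
                }
                remaining -= skipped;
            }
            int value = (remaining == 0) ? in.read() : -1;
            in.reset();                       // return to the marked position
            return value;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[] { 10, 20, 30, 40, 50 });
        System.out.println(byteAt(in, 3)); // 40
        System.out.println(byteAt(in, 1)); // 20
        System.out.println(in.read());     // 10, because each call rewound the stream
    }
}
```

<P>Because every call resets the stream to the marked position,
offsets can be visited in any order, which is the "form of random
access" mentioned above.</P>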
<P>To modify the file you would simply walk through the entries and
follow the same instructions for writing a POIFS file system from
scratch. There are also methods to delete an <CODE>Entry</CODE>
(note: you cannot delete the root directory, nor can you delete a
<CODE>DirectoryEntry</CODE> unless it's empty) and to rename an <CODE>Entry</CODE>
(but see the <A HREF="#Names">notes</A> above).
</P>
<H3>POIFS Logging facility</H3>
<P>POIFS does not yet use log4j style logging.</P>
<H3>
POIFS Developer's Tools</H3>
<P>POIFS does not yet have developer's tools.
</P>
<H3>What's Next?</H3>
<OL>
	<LI><P STYLE="margin-bottom: 0in">Refactoring of the API to more
	cleanly separate write from read.
	</P>
	<LI><P STYLE="margin-bottom: 0in">Add logging/tracing code
	</P>
	<LI><P STYLE="margin-bottom: 0in">Add tree viewer (probably Andy)
	</P>
	<LI><P>Read/write support for creation and modification time stamps
	</P>
</OL>
<P><BR><BR>
</P>
</BODY>
</HTML>