updated poifs docs

git-svn-id: https://svn.apache.org/repos/asf/jakarta/poi/trunk@352106 13f79535-47bb-0310-9956-ffa450edef68
This commit is contained in:
mjohnson 2002-02-19 04:03:10 +00:00
parent 1ecca52edb
commit 50836090d6
16 changed files with 667 additions and 837 deletions

@@ -8,6 +8,7 @@
<menu label="Navigation">
<menu-item label="Main" href="../index.html"/>
<menu-item label="How To" href="how-to.html"/>
<menu-item label="File System Documentation" href="fileformat.html"/>
<menu-item label="Use Cases" href="usecases.html"/>
</menu>

@@ -0,0 +1,666 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V1.0//EN" "../dtd/document-v10.dtd">
<document>
<header>
<authors>
<person email="mjohnson@apache.org" name="Marc Johnson" id="MJ"/>
</authors>
</header>
<body>
<s1 title="POIFS File System Internals">
<s2 title="Introduction">
<p>POIFS file systems are essentially normal files stored on a
Java-compatible platform's native file system. They are
typically identified by names ending in a four character
extension noting what type of data they contain. For
example, a file ending in &quot;.xls&quot; would likely
contain spreadsheet data, and a file ending in
&quot;.doc&quot; would probably contain a word processing
document. POIFS file systems are called &quot;file
systems&quot; because they contain multiple embedded files
in a manner similar to traditional file systems. Along
functional lines, it would be more accurate to call these
POIFS archives. For the remainder of this document it is
referred to as a file system in order to avoid confusion
with the &quot;files&quot; it contains.</p>
<p>POIFS file systems are compatible with those document
formats used by a well-known software company's popular
office productivity suite and programs outputting
compatible data. Because the POIFS file system does not
provide compression, encryption or any other worthwhile
feature, it's not a good choice unless you require
interoperability with these programs.</p>
<p>The POIFS file system does not encode the documents
themselves. For example, if you had a word processor file
with the extension &quot;.doc&quot;, you would actually
have a POIFS file system with a document file archived
inside of that file system.</p>
</s2>
<s2 title="Document Conventions">
<p>This document utilizes the numeric types as described by
the Java Language Specification, which can be found at
<link href="http://java.sun.com">http://java.sun.com</link>. In
short:</p>
<ul>
<li>A <b>byte</b> is an 8 bit signed integer ranging from
-128 to 127.</li>
<li>A <b>short</b> is a 16 bit signed integer ranging from
-32768 to 32767.</li>
<li>An <b>int</b> is a 32 bit signed integer ranging from
-2147483648 to 2147483647.</li>
<li>A <b>long</b> is a 64 bit signed integer ranging from
-9.22E18 to 9.22E18.</li>
</ul>
<p>The Java Language Specification spells out a number of
other types that are not referred to by this document.</p>
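<p>Where this document makes references to &quot;endian
conversion&quot; it is referring to the byte order of
stored numbers. Numbers in &quot;little-endian order&quot;
are stored with the <b>least</b> significant byte first. In
order to properly read a short, for example, you'd read two
bytes, shift the second byte 8 bits to the left, and then
combine it with the first byte using an <code>or</code>
operation. The first byte must be masked with
<code>0xFF</code> before the <code>or</code>; otherwise
Java's sign extension would turn a negative low byte into
a value with all of its upper bits set. The following code
illustrates this method:</p>
<source>
public int getShort (byte[] rec)
{
    // rec[1] is sign-extended, so the result is a signed 16 bit value;
    // rec[0] is masked so that its sign bits cannot clobber the high byte
    return ((rec[1] &lt;&lt; 8) | (rec[0] &amp; 0xFF));
}</source>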
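<p>As an alternative to shifting by hand, the standard <code>java.nio.ByteBuffer</code> class can decode little-endian values directly once its byte order is set to <code>ByteOrder.LITTLE_ENDIAN</code>. The sketch below is illustrative only; the class name <code>LeReader</code> is hypothetical and is not part of POI.</p>
<source><![CDATA[
```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper (not a POI class): reads little-endian numbers at an
// absolute offset by letting ByteBuffer handle the byte order.
final class LeReader {
    private LeReader() {}

    static short readShort(byte[] data, int offset) {
        return ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN).getShort(offset);
    }

    static int readInt(byte[] data, int offset) {
        return ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN).getInt(offset);
    }

    static long readLong(byte[] data, int offset) {
        return ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN).getLong(offset);
    }
}
```
]]></source>
<p>For example, <code>readLong(header, 0)</code> applied to the first 8 bytes of a POIFS header should yield the magic number described in the Header Block section.</p>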
</s2>
<s2 title="File System Walkthrough">
<p>This is a walkthrough of a POIFS file system and how it is
put together. It is not intended to give a concise
description but to give a &quot;big picture&quot; of the
general structure and how it's interpreted.</p>
<p>A POIFS file system begins with a header. This header
identifies locations in the file by function and provides a
sanity check identifying a file as a POIFS file system.</p>
<p>The first 64 bits of the header compose a <b>magic number
identifier.</b> This identifier tells the client software
that this is indeed a POIFS file system and that it should
be treated as such. This is a &quot;sanity check&quot; to
make sure this is a POIFS file system and not some other
format. The header also contains an <b>array of block
numbers</b>. These block numbers refer to blocks in the
file. When these blocks are read together they form the
<b>Block Allocation Table</b>. The header also contains a
pointer to the first element in the <b>property table</b>,
also known as the <b>root element</b>, and a pointer to the
<b>small Block Allocation Table (SBAT)</b>.</p>
<p>The <b>block allocation table</b> or <b>BAT</b>, along with
the <b>property table</b>, specify which blocks in the file
system belong to which files. After the header block, the
file system is divided into identically sized blocks of
data, numbered from 0 to however many blocks there are in
the file system. For each file in the file system, its
entry in the property table includes the index of the first
block in the array of blocks. Each block's index into the
array of blocks is also its index into the BAT, and the
integer value stored at that index in the BAT gives the
index of the next block in the array (and thus the index of
the next BAT value). A special value is stored in the BAT
to indicate &quot;end of file&quot;.</p>
<p>The <b>property table</b> is essentially the directory
storage for the file system. It consists of the name of the
file or directory, its <b>start block</b> in both the file
system and <b>BAT</b>, and its actual size. The first
property in the property table is the <b>root
element</b>. It has two purposes: to be a directory entry
(the root of the directory tree, to be specific), and to
hold the start block for the <b>small block data</b>.</p>
<p>Small block data is a special file that contains the data
for small files (less than 4K bytes). It subdivides its
blocks into smaller blocks and there is a special small
block allocation table that, like the main BAT for larger
files, is used to map a small file to its small blocks.</p>
</s2>
<s3 title="Header Block">
<p>The POIFS file system begins with a <b>header
block</b>. The first 64 bits of the header form a long
<b>file type id</b> or <b>magic number identifier</b> of
<code>0xE11AB1A1E011CFD0L</code>. This is basically a
sanity check. If this isn't the first thing in the header
(and consequently the file system) then this is not a
POIFS file system and should be read with some other
library.</p>
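<p>A minimal sketch of this sanity check, assuming the first 512 bytes of the file have already been read into a byte array. The class and method names are hypothetical; the check simply assembles the first 8 bytes least significant byte first and compares the result with the magic number.</p>
<source><![CDATA[
```java
// Hypothetical sanity check: compares the first 8 bytes of a header block,
// read as a little-endian long, with the POIFS magic number.
final class MagicCheck {
    static final long POIFS_MAGIC = 0xE11AB1A1E011CFD0L;

    private MagicCheck() {}

    static boolean isPoifs(byte[] header) {
        if (header == null || header.length < 8) {
            return false;
        }
        long value = 0;
        // Start from the most significant byte (index 7) so that byte 0
        // ends up in the lowest 8 bits, i.e. little-endian order
        for (int i = 7; i >= 0; i--) {
            value = (value << 8) | (header[i] & 0xFFL);
        }
        return value == POIFS_MAGIC;
    }
}
```
]]></source>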
<p>The most important parts of the header are discussed in
the rest of this section.</p>
<s4 title="BATs">
<p>At offset <b>0x2C</b> is an int specifying the number
of elements in the <b>BAT array</b>. The array at
<b>0x4C</b> is an array of ints. This array contains the
indices of the blocks that make up the Block Allocation
Table.</p>
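<p>Reading these two header fields might look like the following sketch, which assumes a 512-byte header block already in memory. The class name is hypothetical, and BAT block indices beyond the 109 header slots (which live in XBAT blocks) are deliberately not handled.</p>
<source><![CDATA[
```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch: lists the BAT block indices stored in a 512-byte
// header block. Indices beyond the 109 header slots live in XBAT blocks and
// are not handled here.
final class HeaderBat {
    static final int BAT_COUNT_OFFSET = 0x2C;
    static final int BAT_ARRAY_OFFSET = 0x4C;
    static final int HEADER_BAT_SLOTS = 109; // (0x200 - 0x4C) / 4

    private HeaderBat() {}

    static int[] batBlockIndices(byte[] header) {
        ByteBuffer buf = ByteBuffer.wrap(header).order(ByteOrder.LITTLE_ENDIAN);
        int count = buf.getInt(BAT_COUNT_OFFSET);
        if (count < 0) {
            throw new IllegalArgumentException("negative BAT count: " + count);
        }
        int inHeader = Math.min(count, HEADER_BAT_SLOTS);
        int[] indices = new int[inHeader];
        for (int i = 0; i < inHeader; i++) {
            // Each slot is a 4-byte little-endian int
            indices[i] = buf.getInt(BAT_ARRAY_OFFSET + 4 * i);
        }
        return indices;
    }
}
```
]]></source>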
</s4>
<s4 title="XBATs">
<p>Very large POIFS archives may have more blocks than can
be addressed by the BAT blocks enumerated in the header
block. How large? Well, the BAT array in the header can
contain up to 109 BAT block indices; each BAT block
references up to 128 blocks, and each block is 512
bytes, so we're talking about 109 * 128 * 512 =
6.8MB. That's a pretty respectable document! But, you
could have much more data than that, and in today's
world of cheap gigabyte drives, why not? So, the BAT
may be extended in that event. The integer value at
offset <b>0x44</b> of the header is the index of the
first <b>extended BAT (XBAT) block</b>. At offset
<b>0x48</b> of the header, there is an int value that
specifies how many XBAT blocks there are. The XBAT
blocks form a chain: the first one is at the specified
index into the array of blocks making up the POIFS file
system, and the last int in each XBAT block holds the
index of the next XBAT block (-2 in the last one).</p>
<p>Each XBAT block therefore contains the indices of up
to 127 BAT blocks, so the document size can be expanded
by almost another 8MB for each XBAT block. The BAT blocks
indexed by an XBAT block are appended to the end of the
list of BAT blocks enumerated in the header block. Thus
the BAT blocks enumerated in the header block are BAT
blocks 0 through 108, the BAT blocks enumerated in the
first XBAT block are BAT blocks 109 through 235, the BAT
blocks enumerated in the second XBAT block are BAT
blocks 236 through 362, and so on.</p>
<p>Through the use of XBAT blocks, the limit on the
overall document size is that imposed by the 4-byte
block indices; if the indices are unsigned ints, the
maximum file size is 2 terabytes, 1 terabyte if the
indices are treated as signed ints. Either way, I have
yet to see a disk drive large enough to accommodate
such a file on the shelves at the local office supply
stores.</p>
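<p>The arithmetic behind these limits can be checked with a few lines of Java. This sketch assumes 512-byte blocks, 4-byte BAT entries (hence 128 entries per BAT block), and the 109 BAT slots in the header; the class name is hypothetical.</p>
<source><![CDATA[
```java
// Worked arithmetic for the size limits above (hypothetical helper class).
final class SizeLimits {
    static final long BLOCK_SIZE = 512;
    static final long ENTRIES_PER_BAT_BLOCK = BLOCK_SIZE / 4; // 128
    static final long HEADER_BAT_SLOTS = 109;

    private SizeLimits() {}

    // Bytes addressable through the header's BAT array alone (no XBAT blocks):
    // 109 * 128 * 512 = 7,143,424 bytes, roughly 6.8MB
    static long headerOnlyLimit() {
        return HEADER_BAT_SLOTS * ENTRIES_PER_BAT_BLOCK * BLOCK_SIZE;
    }

    // Bytes addressable if block indices are signed ints: 2^31 * 512 = 1TB
    static long signedIndexLimit() {
        return (1L << 31) * BLOCK_SIZE;
    }

    // Bytes addressable if block indices are unsigned ints: 2^32 * 512 = 2TB
    static long unsignedIndexLimit() {
        return (1L << 32) * BLOCK_SIZE;
    }
}
```
]]></source>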
</s4>
<s4 title="SBATs">
<p>If a file contained in a POIFS archive is smaller than
4096 bytes, it is stored in small blocks. Small blocks
are 64 bytes in length and are contained within big
blocks, up to 8 to a big block. As the main BAT is used
to navigate the array of big blocks, so the <b>small
block allocation table</b> is used to navigate the
array of small blocks. The SBAT's start block index is
found at offset <b>0x3C</b> of the header block, and
the remaining blocks constituting the SBAT are found by
walking the main BAT as if it were an ordinary file in
the POIFS file system (this process is described
below).</p>
</s4>
<s4 title="Property Table Start Index">
<p>An integer at address <b>0x30</b> specifies the start
index of the property table. This integer is specified
as a <b>&quot;block index&quot;</b>. The Property Table
is stored, as is almost everything in a POIFS file
system, in big blocks and walked via the BAT. The
Property Table is described below.</p>
</s4>
</s3>
<s3 title="Property Table">
<p>The property table is essentially nothing more than the
directory system. Properties are 128 byte records
contained within the 512 byte blocks. The first property
is always the Root Entry. The following applies to
individual properties within a property table:</p>
<ul>
<li>At offset <b>0x00</b> in the property is the
&quot;<b>name</b>&quot;. This is stored as an
uncompressed 16 bit Unicode string. In short, every
other byte corresponds to an &quot;ASCII&quot;
character. The size of this string, in bytes and
including the terminating null character, is stored at
offset <b>0x40</b> (<b>string size</b>) as a short.</li>
<li>At offset <b>0x42</b> is the <b>property type</b>
(byte). The type is 1 for directory, 2 for file or 5
for the Root Entry.</li>
<li>At offset <b>0x43</b> is the <b>node color</b>
(byte). The color is either 1 (black) or 0
(red). Properties are apparently meant to be arranged
in a red-black binary tree, subject to the following
rules:
<ol>
<li>The root of the tree is always black</li>
<li>Two consecutive nodes cannot both be red</li>
<li>A property is less than another property if its
name length is less than the other property's name
length</li>
<li>If two properties have the same name length, the
sort order is determined by the sort order of the
properties' names.</li>
</ol></li>
<li>At offset <b>0x44</b> is the index (int) of the
<b>previous property</b>.</li>
<li>At offset <b>0x48</b> is the index (int) of the
<b>next property</b>.</li>
<li>At offset <b>0x4C</b> is the index (int) of a
<b>child property</b>. This is used by directory
entries and the Root Entry.</li>
<li>At offset <b>0x74</b> is an integer giving the
<b>start block</b> for the file described by this
property. This index corresponds to an index in the
array of indices that is the Block Allocation Table
(or the Small Block Allocation Table) as well as the
index of the first block in the file. This is used by
files and the root entry.</li>
<li>At offset <b>0x78</b> is an integer giving the total
<b>actual size</b> of the file pointed at by this
property. If the file size is less than 4096, the file
is stored in small blocks and the SBAT is used to walk
the small blocks making up the file. If the file size
is 4096 or larger, the file is stored in big blocks
and the main BAT is used to walk the big blocks making
up the file. The exception to this rule is the <b>Root
Entry</b>, which, regardless of its size, is
<b>always</b> stored in big blocks and the main BAT is
used to walk the big blocks making up this special
file.</li>
</ul>
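<p>The field offsets listed above can be turned into a small decoder. This is a sketch only, assuming a 128-byte property record at a given offset within a property block; the <code>PropertyRecord</code> class is hypothetical, and it reads the name up to the first null character instead of relying on the size stored at 0x40.</p>
<source><![CDATA[
```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: decodes one 128-byte property record.
final class PropertyRecord {
    static final int RECORD_LENGTH = 128;
    static final int ROOT_ENTRY = 5;
    static final int SMALL_FILE_LIMIT = 4096;

    final String name;
    final int type;       // 1 = directory, 2 = file, 5 = root entry
    final int color;      // 0 = red, 1 = black
    final int previous;   // PREVIOUS_PROP, -1 if unused
    final int next;       // NEXT_PROP, -1 if unused
    final int child;      // CHILD_PROP, -1 if unused
    final int startBlock; // START_BLOCK
    final int size;       // SIZE in bytes

    private PropertyRecord(String name, int type, int color, int previous,
                           int next, int child, int startBlock, int size) {
        this.name = name;
        this.type = type;
        this.color = color;
        this.previous = previous;
        this.next = next;
        this.child = child;
        this.startBlock = startBlock;
        this.size = size;
    }

    static PropertyRecord parse(byte[] data, int offset) {
        // slice() makes index 0 refer to the start of this record;
        // the byte order must be set after slicing
        ByteBuffer buf = ByteBuffer.wrap(data, offset, RECORD_LENGTH)
                .slice()
                .order(ByteOrder.LITTLE_ENDIAN);
        int chars = 0;
        // The name holds at most 32 UTF-16 units and is null-terminated
        while (chars < 32 && buf.getShort(chars * 2) != 0) {
            chars++;
        }
        String name = new String(data, offset, chars * 2, StandardCharsets.UTF_16LE);
        return new PropertyRecord(
                name,
                buf.get(0x42) & 0xFF,
                buf.get(0x43) & 0xFF,
                buf.getInt(0x44),
                buf.getInt(0x48),
                buf.getInt(0x4C),
                buf.getInt(0x74),
                buf.getInt(0x78));
    }

    // Files under 4096 bytes use the SBAT, except the Root Entry
    boolean usesSmallBlocks() {
        return type != ROOT_ENTRY && size < SMALL_FILE_LIMIT;
    }
}
```
]]></source>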
</s3>
<s3 title="Root Entry">
<p>The <b>Root Entry</b> in the <b>Property Table</b>
contains the information necessary to read and write
small files, which are files less than 4096 bytes
long. The start block field of the Root Entry is the
start index of the <b>Small Block Array</b>, which is
read like any other file in the POIFS file system. Since
the SBAT cannot be used without the Small Block Array,
the Root Entry MUST be read or written using the <b>Block
Allocation Table</b>. The blocks making up the Small
Block Array are divided into 64-byte small blocks, up to
the size indicated in the Root Entry (which should always
be a multiple of 64).</p>
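<p>Once the chain of big blocks making up the Small Block Array has been collected by walking the main BAT from the Root Entry's start block, locating small block <code>n</code> is simple arithmetic: with 512-byte big blocks and 64-byte small blocks, it lives in big block <code>n / 8</code> of that chain, at offset <code>(n % 8) * 64</code>. The sketch below assumes big-block indices count from the first block after the 512-byte header; the names are hypothetical.</p>
<source><![CDATA[
```java
// Hypothetical sketch: finds the file offset of small block n, given the
// big-block chain that makes up the Small Block Array.
final class SmallBlocks {
    static final int BIG_BLOCK_SIZE = 512;
    static final int SMALL_BLOCK_SIZE = 64;
    static final int SMALL_PER_BIG = BIG_BLOCK_SIZE / SMALL_BLOCK_SIZE; // 8

    private SmallBlocks() {}

    static long fileOffsetOf(int smallBlock, int[] arrayChain) {
        // Which big block of the Small Block Array holds this small block
        int bigBlock = arrayChain[smallBlock / SMALL_PER_BIG];
        // Byte position of the small block inside that big block
        int within = (smallBlock % SMALL_PER_BIG) * SMALL_BLOCK_SIZE;
        // +1 skips the 512-byte header, which is not a numbered block
        return (bigBlock + 1L) * BIG_BLOCK_SIZE + within;
    }
}
```
]]></source>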
</s3>
<s3 title="Walking the Nodes of the Property Table">
<p>The individual properties form a directory tree, with the
<b>Root Entry</b> as the directory tree's root, as shown
in the accompanying drawing. Note the numbers in
parentheses in each node; they represent the node's index
in the array of properties. The <b>NEXT_PROP</b>,
<b>PREVIOUS_PROP</b>, and <b>CHILD_PROP</b> fields hold
these indices, and are used to navigate the tree.</p>
<img src="images/PropertySet.jpg" />
<p>Each directory entry (i.e., a property whose type is
<b>directory</b> or <b>root entry</b>) uses its
<b>CHILD_PROP</b> field to point to one of its
subordinate (child) properties. It doesn't seem to matter
which of its children it points to. Thus in the previous
drawing, the Root Entry's CHILD_PROP field may contain 1,
4, or the index of one of its other children. Similarly,
the directory node (index 1) may have, in its CHILD_PROP
field, 2, 3, or the index of one of its other
children.</p>
<p>The children of a given directory property point to each
other in a similar fashion by using their
<b>NEXT_PROP</b> and <b>PREVIOUS_PROP</b> fields.</p>
<p>Unused <b>NEXT_PROP</b>, <b>PREVIOUS_PROP</b>, and
<b>CHILD_PROP</b> fields contain the marker value of
-1. All file properties, for example, have a value of -1
in their CHILD_PROP fields.</p>
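<p>Because a directory's CHILD_PROP names just one of its children, the full list of children has to be collected by following PREVIOUS_PROP and NEXT_PROP links from there. One way to do this, sketched below under the assumption that these links form a binary tree (PREVIOUS_PROP on the left, NEXT_PROP on the right), is an in-order traversal with an explicit stack. The arrays and the class name are hypothetical stand-ins for a decoded property table.</p>
<source><![CDATA[
```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch: lists the children of one directory property.
// previous[i], next[i] and child[i] hold property i's PREVIOUS_PROP,
// NEXT_PROP and CHILD_PROP; a negative value (normally -1) means unused.
final class DirectoryWalk {
    private DirectoryWalk() {}

    static List<Integer> children(int directory, int[] previous, int[] next, int[] child) {
        List<Integer> result = new ArrayList<>();
        Deque<Integer> stack = new ArrayDeque<>();
        BitSet seen = new BitSet();
        int current = child[directory];
        while (current >= 0 || !stack.isEmpty()) {
            // Go as far as possible along PREVIOUS_PROP links, remembering
            // each node; 'seen' stops the walk if a corrupt table has a cycle
            while (current >= 0 && !seen.get(current)) {
                seen.set(current);
                stack.push(current);
                current = previous[current];
            }
            if (stack.isEmpty()) {
                break;
            }
            int node = stack.pop();
            result.add(node);
            // Then continue with the subtree reached through NEXT_PROP
            current = next[node];
        }
        return result;
    }
}
```
]]></source>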
</s3>
<s3 title="Block Allocation Table">
<p>The <b>BAT blocks</b> are pointed at by the BAT array
contained in the header and supplemented, if necessary,
by the <b>XBAT blocks</b>. These blocks form a large
table of integers. These integers are block numbers. The
<b>Block Allocation Table</b> holds chains of integers.
These chains are terminated with -2. The elements in
these chains refer to blocks in the files. The starting
block of a file is NOT specified in the BAT. It is
specified by the <b>property</b> for a given file. Each
element in the BAT is both a block number (counting from
the first block after the header) <b>and</b> the index of
the next BAT element in the chain. This can be thought of
as a linked list of blocks: the BAT array contains the
links from one block to the next, including the end of
chain marker.</p>
<p>Here's an example: Let's assume that the BAT begins as
follows:</p>
<source>
BAT[ 0 ] = 2
BAT[ 1 ] = 5
BAT[ 2 ] = 3
BAT[ 3 ] = 4
BAT[ 4 ] = 6
BAT[ 5 ] = -2
BAT[ 6 ] = 7
BAT[ 7 ] = -2
...</source>
<p>Now, if we have a file whose Property Table entry says it
begins with index 0, we walk the BAT array and see that
the file consists of blocks 0 (because the start block is
0), 2 (because BAT[ 0 ] is 2), 3 (BAT[ 2 ] is 3), 4 (BAT[
3 ] is 4), 6 (BAT[ 4 ] is 6), and 7 (BAT[ 6 ] is 7). It
ends at block 7 because BAT[ 7 ] is -2, which is the end
of chain marker.</p>
<p>Similarly, a file beginning at index 1 consists of
blocks 1 and 5.</p>
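<p>The walk just described can be sketched as a loop that starts at the property's start block and follows BAT entries until it reaches -2. This is an illustrative sketch with a hypothetical class name; it also guards against corrupt tables by rejecting out-of-range indices and chains longer than the BAT itself.</p>
<source><![CDATA[
```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: follows a chain through the BAT, starting from the
// START_BLOCK taken from a file's property, until the -2 end-of-chain marker.
final class BatChain {
    static final int END_OF_CHAIN = -2;

    private BatChain() {}

    static List<Integer> blocksOf(int startBlock, int[] bat) {
        List<Integer> blocks = new ArrayList<>();
        int current = startBlock;
        while (current != END_OF_CHAIN) {
            if (current < 0 || current >= bat.length) {
                throw new IllegalStateException("bad block index " + current);
            }
            // A valid chain can never visit more blocks than the BAT has
            if (blocks.size() >= bat.length) {
                throw new IllegalStateException("loop in BAT chain");
            }
            blocks.add(current);
            current = bat[current];
        }
        return blocks;
    }
}
```
]]></source>
<p>With the example BAT above, <code>blocksOf(0, bat)</code> returns blocks 0, 2, 3, 4, 6 and 7, and <code>blocksOf(1, bat)</code> returns blocks 1 and 5.</p>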
<p>Other special numbers in a BAT array are:</p>
<ul>
<li>-1, which indicates an unused block</li>
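<li>-3, which indicates a &quot;special&quot; block, such
as a block that is itself part of the main BAT</li>
</ul>
<p>Blocks making up the Small Block Array, the Property
Table, and the SBAT are not marked specially; like the
blocks of any other file, they are chained together in the
BAT and end with -2.</p>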
</s3>
<s2 title="File System Structures">
<p>The following outlines the basic file system structures.</p>
<s3 title="Header (first block in the file) -- 512 (0x200) bytes">
<table>
<tr>
<td><b>Field</b></td>
<td><b>Description</b></td>
<td><b>Offset</b></td>
<td><b>Length</b></td>
<td><b>Default value or const</b></td>
</tr>
<tr>
<td>FILETYPE</td>
<td>Magic number identifying this as a POIFS file
system.</td>
<td>0x0000</td>
<td>Long</td>
<td>0xE11AB1A1E011CFD0</td>
</tr>
<tr>
<td>UK1</td>
<td>Unknown constant</td>
<td>0x0008</td>
<td>Integer</td>
<td>0</td>
</tr>
<tr>
<td>UK2</td>
<td>Unknown Constant</td>
<td>0x000C</td>
<td>Integer</td>
<td>0</td>
</tr>
<tr>
<td>UK3</td>
<td>Unknown Constant</td>
<td>0x0014</td>
<td>Integer</td>
<td>0</td>
</tr>
<tr>
<td>UK4</td>
<td>Unknown Constant (revision?)</td>
<td>0x0018</td>
<td>Short</td>
<td>0x003B</td>
</tr>
<tr>
<td>UK5</td>
<td>Unknown Constant (version?)</td>
<td>0x001A</td>
<td>Short</td>
<td>0x0003</td>
</tr>
<tr>
<td>UK6</td>
<td>Unknown Constant</td>
<td>0x001C</td>
<td>Short</td>
<td>-2</td>
</tr>
<tr>
<td>LOG_2_BIG_BLOCK_SIZE</td>
<td>Log, base 2, of the big block size</td>
<td>0x001E</td>
<td>Short</td>
<td>9 (2 ^ 9 = 512 bytes)</td>
</tr>
<tr>
<td>LOG_2_SMALL_BLOCK_SIZE</td>
<td>Log, base 2, of the small block size</td>
<td>0x0020</td>
<td>Integer</td>
<td>6 (2 ^ 6 = 64 bytes)</td>
</tr>
<tr>
<td>UK7</td>
<td>Unknown Constant</td>
<td>0x0024</td>
<td>Integer</td>
<td>0</td>
</tr>
<tr>
<td>UK8</td>
<td>Unknown Constant</td>
<td>0x0028</td>
<td>Integer</td>
<td>0</td>
</tr>
<tr>
<td>BAT_COUNT</td>
<td>Number of elements in the BAT array</td>
<td>0x002C</td>
<td>Integer</td>
<td>required</td>
</tr>
<tr>
<td>PROPERTIES_START</td>
<td>Block index of the first block of the property
table</td>
<td>0x0030</td>
<td>Integer</td>
<td>required</td>
</tr>
<tr>
<td>UK9</td>
<td>Unknown Constant</td>
<td>0x0034</td>
<td>Integer</td>
<td>0</td>
</tr>
<tr>
<td>UK10</td>
<td>Unknown Constant</td>
<td>0x0038</td>
<td>Integer</td>
<td>0x00001000</td>
</tr>
<tr>
<td>SBAT_START</td>
<td>Block index of first big block containing the small
block allocation table (SBAT)</td>
<td>0x003C</td>
<td>Integer</td>
<td>-2</td>
</tr>
<tr>
<td>UK11</td>
<td>Unknown Constant</td>
<td>0x0040</td>
<td>Integer</td>
<td>1</td>
</tr>
<tr>
<td>XBAT_START</td>
<td>Block index of the first block in the Extended Block
Allocation Table (XBAT)</td>
<td>0x0044</td>
<td>Integer</td>
<td>-2</td>
</tr>
<tr>
<td>XBAT_COUNT</td>
<td>Number of blocks in the Extended Block Allocation
Table (XBAT)</td>
<td>0x0048</td>
<td>Integer</td>
<td>0</td>
</tr>
<tr>
<td>BAT_ARRAY</td>
<td>Array of block indices constituting the Block
Allocation Table (BAT)</td>
<td>0x004C, 0x0050, 0x0054 ... 0x01FC</td>
<td>Integer[]</td>
<td>-1 for unused elements; at least the first element
must be filled.</td>
</tr>
<tr>
<td>N/A</td>
<td>Header block data not otherwise described in this
table</td>
<td>N/A</td>
<td>N/A</td>
<td>-1</td>
</tr>
</table>
</s3>
<s3 title="Block Allocation Table Block -- 512 (0x200) bytes">
<table>
<tr>
<td><b>Field</b></td>
<td><b>Description</b></td>
<td><b>Offset</b></td>
<td><b>Length</b></td>
<td><b>Default value or const</b></td>
</tr>
<tr>
<td>BAT_ELEMENT</td>
<td>Any given element in the BAT block</td>
<td>0x0000, 0x0004, 0x0008, ... 0x01FC</td>
<td>Integer</td>
<td>
<ul>
<li>-1 = unused</li>
<li>-2 = end of chain</li>
<li>-3 = special (e.g., BAT block)</li>
</ul>
<p>All other values point to the next element in the
chain and the next index of a block composing the
file.</p>
</td>
</tr>
</table>
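<p>Since each BAT block is simply 128 little-endian ints, the complete BAT can be assembled by decoding the BAT blocks in the order in which the header's BAT array (and any XBAT blocks) list them and concatenating the results. A sketch, assuming the raw 512-byte contents of those blocks have already been read; the class name is hypothetical.</p>
<source><![CDATA[
```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch: concatenates the 128 little-endian ints of each BAT
// block, in BAT-array order, into one Block Allocation Table.
final class BatBuilder {
    static final int BLOCK_SIZE = 512;
    static final int ENTRIES_PER_BLOCK = BLOCK_SIZE / 4; // 128

    private BatBuilder() {}

    static int[] build(byte[][] batBlocks) {
        int[] bat = new int[batBlocks.length * ENTRIES_PER_BLOCK];
        for (int b = 0; b < batBlocks.length; b++) {
            ByteBuffer buf = ByteBuffer.wrap(batBlocks[b]).order(ByteOrder.LITTLE_ENDIAN);
            for (int i = 0; i < ENTRIES_PER_BLOCK; i++) {
                // Entry i of BAT block b is element (b * 128 + i) of the BAT
                bat[b * ENTRIES_PER_BLOCK + i] = buf.getInt(i * 4);
            }
        }
        return bat;
    }
}
```
]]></source>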
</s3>
<s3 title="Property Block -- 512 (0x200) byte block">
<table>
<tr>
<td><b>Field</b></td>
<td><b>Description</b></td>
<td><b>Offset</b></td>
<td><b>Length</b></td>
<td><b>Default value or const</b></td>
</tr>
<tr>
<td>Properties[]</td>
<td>This block contains the properties.</td>
<td>0x0000, 0x0080, 0x0100, 0x0180</td>
<td>128 bytes</td>
<td>All unused space is set to -1.</td>
</tr>
</table>
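<p>Since each property block holds four 128-byte properties, property number <code>i</code> is found in block <code>i / 4</code> of the property table's chain, at offset <code>(i % 4) * 128</code> within that block. A sketch, assuming the chain has been obtained by walking the BAT from PROPERTIES_START and that block indices count from the first block after the header; the class name is hypothetical.</p>
<source><![CDATA[
```java
// Hypothetical sketch: file offset of property number 'index', given the
// chain of big blocks that makes up the property table.
final class PropertyLocator {
    static final int BLOCK_SIZE = 512;
    static final int PROPERTY_SIZE = 128;
    static final int PER_BLOCK = BLOCK_SIZE / PROPERTY_SIZE; // 4

    private PropertyLocator() {}

    static long offsetOf(int index, int[] tableChain) {
        // Which block of the property table holds this property
        int block = tableChain[index / PER_BLOCK];
        // +1 skips the header block, which is not a numbered block
        return (block + 1L) * BLOCK_SIZE + (index % PER_BLOCK) * PROPERTY_SIZE;
    }
}
```
]]></source>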
</s3>
<s3 title="Property -- 128 (0x80) byte block">
<table>
<tr>
<td><b>Field</b></td>
<td><b>Description</b></td>
<td><b>Offset</b></td>
<td><b>Length</b></td>
<td><b>Default value or const</b></td>
</tr>
<tr>
<td>NAME</td>
<td>A null-terminated, uncompressed 16 bit Unicode
string containing the name of the property (for ASCII
names, the high byte of each character is zero).</td>
<td>0x00, 0x02, 0x04, ... 0x3E</td>
<td>Short[]</td>
<td>0x0000 for unused elements, field required, 32
elements (0x40 bytes) max, including the null
terminator</td>
</tr>
<tr>
<td>NAME_SIZE</td>
<td>Size of the NAME field in bytes, including the terminating null character</td>
<td>0x40</td>
<td>Short</td>
<td>Required</td>
</tr>
<tr>
<td>PROPERTY_TYPE</td>
<td>Property type (directory, file, or root)</td>
<td>0x42</td>
<td>Byte</td>
<td>1 (directory), 2 (file), or 5 (root entry)</td>
</tr>
<tr>
<td>NODE_COLOR</td>
<td>Node color</td>
<td>0x43</td>
<td>Byte</td>
<td>0 (red) or 1 (black)</td>
</tr>
<tr>
<td>PREVIOUS_PROP</td>
<td>Previous property index</td>
<td>0x44</td>
<td>Integer</td>
<td>-1</td>
</tr>
<tr>
<td>NEXT_PROP</td>
<td>Next property index</td>
<td>0x48</td>
<td>Integer</td>
<td>-1</td>
</tr>
<tr>
<td>CHILD_PROP</td>
<td>First child property index</td>
<td>0x4C</td>
<td>Integer</td>
<td>-1</td>
</tr>
<tr>
<td>SECONDS_1</td>
<td>Seconds component of the created timestamp?</td>
<td>0x64</td>
<td>Integer</td>
<td>0</td>
</tr>
<tr>
<td>DAYS_1</td>
<td>Days component of the created timestamp?</td>
<td>0x68</td>
<td>Integer</td>
<td>0</td>
</tr>
<tr>
<td>SECONDS_2</td>
<td>Seconds component of the modified timestamp?</td>
<td>0x6C</td>
<td>Integer</td>
<td>0</td>
</tr>
<tr>
<td>DAYS_2</td>
<td>Days component of the modified timestamp?</td>
<td>0x70</td>
<td>Integer</td>
<td>0</td>
</tr>
<tr>
<td>START_BLOCK</td>
<td>Starting block of the file, used both as the first
block of the file and as the index of the first BAT (or
SBAT) entry in its chain</td>
<td>0x74</td>
<td>Integer</td>
<td>Required</td>
</tr>
<tr>
<td>SIZE</td>
<td>Actual size of the file this property points to
(used to truncate the last block to the real
size).</td>
<td>0x78</td>
<td>Integer</td>
<td>0</td>
</tr>
</table>
</s3>
</s2>
</s1>
</body>
</document>

@@ -1,837 +0,0 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<META HTTP-EQUIV="CONTENT-TYPE" CONTENT="text/html; charset=iso-8859-1">
<TITLE></TITLE>
<META NAME="GENERATOR" CONTENT="StarOffice/5.2 (Linux)">
<META NAME="AUTHOR" CONTENT=" ">
<META NAME="CREATED" CONTENT="20010728;10223600">
<META NAME="CHANGEDBY" CONTENT="Marc Johnson">
<META NAME="CHANGED" CONTENT="20010810;13415800">
<STYLE>
<!--
@page { margin-left: 1.25in; margin-right: 1.25in; margin-top: 1in; margin-bottom: 1in }
H1 { margin-bottom: 0.08in; font-size: 16pt }
TD P { margin-bottom: 0.08in }
H2 { margin-bottom: 0.08in; font-size: 14pt; font-style: italic }
H3 { margin-bottom: 0.08in }
H4 { margin-bottom: 0.08in; font-size: 11pt; font-style: italic }
P { margin-bottom: 0.08in }
-->
</STYLE>
</HEAD>
<BODY>
<H1>POI Filesystem format</H1>
<H2>Introduction</H2>
<P STYLE="margin-bottom: 0in; font-weight: medium">
The POI file format is essentially an archive wrapper
around files. It is intended to mimic a filesystem. For
the remainder of this document it is referred to as a
filesystem in order to avoid confusion with the
&quot;files&quot; it contains.
</P>
<P STYLE="margin-bottom: 0in; font-weight: medium; text-decoration: none">
POI filesystems are compatible with those document formats
used by a well-known software company's popular office
productivity suite and programs outputting compatible
data. Because the POI filesystem does not provide
compression, encryption or any other worthwhile feature,
it's not a good choice unless you require interoperability
with these programs.
</P>
<P STYLE="margin-bottom: 0in; font-weight: medium">
The POI filesystem does not encode the documents
themselves. For example, if you had a word processor file
with the extension &quot;.doc&quot;, you would actually
have a POI filesystem with a document file archived inside
of the filesystem.
</P>
<H2>Document Conventions</H2>
<P STYLE="margin-bottom: 0in">
This document utilizes the numeric types as described by
the Java Language Specification, which can be found at
java.sun.com. In short:
</P>
<UL>
<LI>
<P STYLE="margin-bottom: 0in">
a byte is an 8 bit signed integer ranging from
(-128) to 127.
</P>
</LI>
<LI>
<P STYLE="margin-bottom: 0in">
a short is a 16 bit signed integer ranging from
(-32768) to 32767
</P>
</LI>
<LI>
<P STYLE="margin-bottom: 0in">
an int is a 32 bit signed integer ranging from
(-2.14e+9) to 2.14e+9
</P>
</LI>
<LI>
<P STYLE="margin-bottom: 0in">
a long is a 64 bit signed integer ranging from
(-9.22e+18) to 9.22e+18
</P>
</LI>
</UL>
<P STYLE="margin-bottom: 0in">
The Java Language Specification spells out a number of
other types that are not referred to by this document.
</P>
<P STYLE="margin-bottom: 0in">
Where this document makes references to &quot;endian
conversion&quot; it is referring to the byte order of
stored numbers. Numbers in &quot;little-endian order&quot;
are stored with the LEAST significant byte first. In order
to properly read a short, for example, you'd read two
bytes and then shift the second byte 8 bits to the left
before performing an <CODE>or</CODE> operation to it
against the first byte while stripping the
&quot;sign&quot; from the first byte. The following code
illustrates this method:
</P>
<P STYLE="text-decoration: none">
<FONT FACE="Courier, monospace"><FONT
SIZE=2><B>public int getShort (byte[ ] rec)
{</B></FONT></FONT>
</P>
<P>
<FONT FACE="Courier, monospace"><FONT SIZE=2><B>return (
(rec[1] &lt;&lt; 8) | (rec[0] &amp; 0xff)
);</B></FONT></FONT>
</P>
<P>
<FONT FACE="Courier, monospace"><FONT
SIZE=2><B>}</B></FONT></FONT>
</P>
<H2>Filesystem Introduction</H2>
<P STYLE="margin-bottom: 0in">
POI filesystems are essentially normal files stored on a
Java-compatible platform's native filesystem. They are
identified by names ending in a four character identifier
noting what type of data they contain. For example, a file
ending in &quot;.xls&quot; would likely contain
spreadsheet data, and a file ending in &quot;.doc&quot;
would probably contain a word processing document. POI
filesystems are called &quot;filesystem&quot;, because
they contain multiple embedded files in a manner similar
to traditional filesystems. Along functional lines, it
would be more accurate to call these POI archives.
</P>
<P STYLE="margin-bottom: 0in">
POI filesystems do not provide encryption, compression, or
any other feature of a modern archive and are therefore a
poor choice for implementing new file formats. It is
suggested that POI filesystems are most useful for
interoperability with legacy applications that use a
compatible file format.
</P>
<H2>Filesystem Walkthrough</H2>
<P STYLE="margin-bottom: 0in">
This is a walkthrough of a POI filesystem and how it is
put together. It is not intended to give a concise
description but to give a &quot;big picture&quot; of the
general structure and how it's interpreted.
</P>
<P STYLE="margin-bottom: 0in">
A POI filesystem begins with a <A
HREF="HeaderBlock"><B><I>header</I></B></A>. This header
identifies locations in the file by function and provides
a sanity check identifying a native filesystem file as
indeed a POI filesystem.
</P>
<P STYLE="margin-bottom: 0in">
The first 64 bits of the header compose a <B><I>magic
number identifier.</I></B> This identifier tells the
client software that this is indeed a POI filesystem and
that it should be treated as such. This is a &quot;sanity
check&quot; to make sure this is a POI filesystem and not
some other format. The header also contains an <B><I>array
of block numbers</I></B>. These block numbers refer to
blocks in the file. When these blocks are read together
they form the <A HREF="#BAT"><B><I>Block Allocation
Table</I></B></A>. The header also contains a pointer to
the first element in the <A
HREF="#PropertyTable"><B><I>property table</I></B></A>
also known as the <A HREF="RootEntry"><B><I>root
element</I></B></A>, and a pointer to the <B>small Block
Allocation Table (SBAT)</B>.
</P>
<P STYLE="margin-bottom: 0in">
The <A HREF="#BAT"><B><I>block allocation
table</I></B></A> or <B><I>BAT</I></B>, along with the <A
HREF="#PropertyTable"><B><I>property table</I></B></A>
specify which blocks in the filesystem belong to which
files. It is somewhat hard to conceptualize the Block
Allocation Table at first. The block allocation table is
essentially an array of integers that point at each
other. These elements form chains.
</P>
<P STYLE="margin-bottom: 0in">
To read the <A HREF="#BAT"><B><I>block allocation
table</I></B></A> you must first read the <B><I>start
block </I></B>of the file from the <A
HREF="#PropertyTable"><B><I>property
table</I></B></A>. This is both your index for the next
element in the <B><I>BAT </I></B>array as well as the
index of the first block in your file. For instance: if
the <B><I>start block</I></B> from your file's property is
0 then you read block 0 (the first block after the header)
from your filesystem as the first block of your file. You
also read element 0 from the <B><I>BAT array</I></B>.
Supposing this element has a value equal to 2, you'd read
block 2 from your filesystem as the next block of your
file and element 2 from your <B><I>BAT array</I></B>.
This will be covered further later in this document.
</P>
<P STYLE="margin-bottom: 0in">
The <A HREF="#PropertyTable"><B><I>Property
Table</I></B></A> is essentially the directory structure
for the filesystem. It consists of the name of the file or
directory, its <B><I>start block</I></B> in both the
filesystem and <B><I>BAT</I></B>, and its actual size.
The first property in the <A
HREF="#PropertyTable">property table</A> is the <A
HREF="RootEntry"><B><I>root element</I></B></A>. Its real
purpose is to hold the start block for the <B><I>small
blocks.</I></B>
</P>
<H3>Filesystem Structure</H3>
<P STYLE="margin-bottom: 0in; font-weight: medium">
All values in the POI filesystem are stored in
&quot;little-endian&quot; order, meaning you must reverse
the order of the bytes before assigning them to
variables. Assume the values you see below are originally
stored backwards.
</P>
<P STYLE="margin-bottom: 0in; font-weight: medium">
The POI filesystem is divided into 512 byte blocks. Each
block has an implicit block-type. The order and
description of these is described below.
</P>
<A NAME="HeaderBlock"><H3>Header Block</H3></A>
<P STYLE="margin-bottom: 0in; font-weight: medium">
The POI filesystem begins with a <B><I>header
block</I></B>. The first 64 bits of the header form a long
<B><I>file type id</I></B> or <B><I>magic number
identifier</I></B> of
<CODE>0xE11AB1A1E011CFD0L</CODE>. This is basically a
sanity check. If this isn't the first thing in the header
(and consequently the filesystem) then this is not a POI
filesystem and should be read with some other library.
</P>
<P STYLE="margin-bottom: 0in; font-weight: medium">
The most important fields of the header are discussed in
the rest of this section.
</P>
<H4>BATs</H4>
<P STYLE="margin-bottom: 0in">
At offset <B>0x2C</B> is an int specifying the number of
elements in the <B><I>BAT array</I></B>. The <B><I>BAT
array</I></B> itself begins at offset <B>0x4C</B> and is an
array of ints containing the indices of every block in the
<A HREF="#BAT">Block Allocation Table</A>.
</P>
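<P STYLE="margin-bottom: 0in">
The following sketch reads the BAT block indices that are
stored directly in the header, using the offsets given
above. It only handles the header's own entries; the class
name <CODE>HeaderBat</CODE> is hypothetical.
</P>

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper: reads the BAT block indices stored in the header itself.
class HeaderBat {
    static final int BAT_COUNT_OFFSET = 0x2C;
    static final int BAT_ARRAY_OFFSET = 0x4C;
    // (0x200 - 0x4C) / 4 = 109 ints fit between 0x4C and the end of the header.
    static final int MAX_HEADER_BAT_ENTRIES = (0x200 - BAT_ARRAY_OFFSET) / 4;

    static int[] readBatBlockIndices(byte[] header) {
        ByteBuffer buf = ByteBuffer.wrap(header).order(ByteOrder.LITTLE_ENDIAN);
        int batCount = buf.getInt(BAT_COUNT_OFFSET);
        if (batCount < 0) {
            throw new IllegalArgumentException("negative BAT count: " + batCount);
        }
        // Any BAT blocks beyond the first 109 are listed in XBAT blocks instead.
        int inHeader = Math.min(batCount, MAX_HEADER_BAT_ENTRIES);
        int[] indices = new int[inHeader];
        for (int i = 0; i < inHeader; i++) {
            indices[i] = buf.getInt(BAT_ARRAY_OFFSET + 4 * i);
        }
        return indices;
    }
}
```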
<H4>XBATs</H4>
<P STYLE="margin-bottom: 0in">
Very large POI archives may have more blocks than can be
addressed by the BAT blocks enumerated in the header
block. How large? Well, the BAT array in the header can
contain up to 109 BAT block indices; each BAT block
references up to 128 blocks, and each block is 512 bytes,
so we're talking about 109 * 128 * 512 = 6.8MB. That's a
pretty respectable document! But, you could have much more
data than that, and in today's world of cheap gigabyte
drives, why not? So, the BAT may be extended in that
event. The integer value at offset <B>0x44</B> of the
header is the index of the first <B><I>extended BAT (XBAT)
block</I></B>. At offset <B>0x48</B> of the header, there
is an int value that specifies how many XBAT blocks there
are. The XBAT blocks form a chain: the first one is found
at the specified index into the array of blocks making up
the POI filesystem, and the last int of each XBAT block
holds the index of the next XBAT block, until the
specified count of XBAT blocks has been read.
</p>
<p>
Each XBAT block contains the indices of up to 127 BAT
blocks (its 128th int is the link to the next XBAT block),
so the document size can be expanded by roughly another
8MB (127 * 128 * 512 = 8,323,072 bytes) for each XBAT
block. The BAT blocks indexed by an XBAT block are
appended to the end of the list of BAT blocks enumerated
in the header block. Thus the BAT blocks enumerated in the
header block are BAT blocks 0 through 108, the BAT blocks
enumerated in the first XBAT block are BAT blocks 109
through 235, the BAT blocks enumerated in the second XBAT
block are BAT blocks 236 through 362, and so on.
</P>
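<P STYLE="margin-bottom: 0in">
A sketch of collecting every BAT block index, header
entries first and XBAT entries after them. It follows the
layout given in Microsoft's compound file (MS-CFB)
specification, which is an assumption here: each XBAT block
holds 127 BAT block indices and its final int is the index
of the next XBAT block. <CODE>XbatWalker</CODE> and the
<CODE>readBlock</CODE> callback (returning the 512 bytes of
a given block) are hypothetical.
</P>

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

// Hypothetical helper: builds the complete list of BAT block indices from
// the header's BAT array plus the XBAT chain.
// ASSUMPTION (from MS-CFB): each XBAT block holds 127 BAT indices and its
// last int is the index of the next XBAT block.
class XbatWalker {
    static final int INTS_PER_BLOCK = 512 / 4;                  // 128
    static final int BAT_ENTRIES_PER_XBAT = INTS_PER_BLOCK - 1; // 127

    // headerBat: only the used header entries (at most 109).
    // batCount, xbatStart, xbatCount: header fields at 0x2C, 0x44 and 0x48.
    static List<Integer> allBatBlocks(int[] headerBat, int batCount,
                                      int xbatStart, int xbatCount,
                                      IntFunction<byte[]> readBlock) {
        List<Integer> result = new ArrayList<>();
        for (int index : headerBat) {
            result.add(index);
        }
        int xbatBlock = xbatStart;
        for (int i = 0; i < xbatCount && result.size() < batCount; i++) {
            ByteBuffer buf = ByteBuffer.wrap(readBlock.apply(xbatBlock))
                                       .order(ByteOrder.LITTLE_ENDIAN);
            // Append this XBAT block's BAT indices after those already collected.
            for (int k = 0; k < BAT_ENTRIES_PER_XBAT && result.size() < batCount; k++) {
                result.add(buf.getInt(4 * k));
            }
            // Follow the link stored in the final int of the block.
            xbatBlock = buf.getInt(4 * BAT_ENTRIES_PER_XBAT);
        }
        return result;
    }
}
```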
<p>
Through the use of XBAT blocks, the limit on the overall
document size is that imposed by the 4-byte block indices;
if the indices are unsigned ints, the maximum file size is
2 terabytes, 1 terabyte if the indices are treated as
signed ints. Either way, I have yet to see a disk drive
large enough to accommodate such a file on the shelves at
the local office supply stores.
</p>
<H4>SBATs</H4>
<P STYLE="margin-bottom: 0in">
If a file contained in a POI archive is smaller than 4096
bytes, it is stored in small blocks. Small blocks are 64
bytes in length and are contained within big blocks, up to
8 to a big block. As the main BAT is used to navigate the
array of big blocks, so the <B><I>small block allocation
table</I></B> is used to navigate the array of small
blocks. The SBAT's start block index is found at offset
<B>0x3C</B> of the header block, and remaining blocks
constituting the SBAT are found by walking the main BAT as
if it were an ordinary file in the POI filesystem (this
process is described below).
</P>
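<P STYLE="margin-bottom: 0in">
Once the indices of the big blocks holding the SBAT are
known, the table itself can be assembled by decoding each
block as 128 little-endian ints. The sketch below is a
minimal illustration under that assumption;
<CODE>BlockTableLoader</CODE> and the <CODE>readBlock</CODE>
callback are hypothetical, and the same routine would work
for the main BAT given the list of BAT block indices.
</P>

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.List;
import java.util.function.IntFunction;

// Hypothetical helper: turns the big blocks that make up an allocation table
// (the SBAT here) into one flat array of table entries.
class BlockTableLoader {
    static final int INTS_PER_BLOCK = 512 / 4; // 128 entries per big block

    // chain: big-block indices of the table, in order, e.g. the result of
    //        walking the main BAT starting at SBAT_START (0x3C in the header).
    // readBlock: returns the 512 bytes of the given big block.
    static int[] loadTable(List<Integer> chain, IntFunction<byte[]> readBlock) {
        int[] table = new int[chain.size() * INTS_PER_BLOCK];
        int pos = 0;
        for (int blockIndex : chain) {
            ByteBuffer buf = ByteBuffer.wrap(readBlock.apply(blockIndex))
                                       .order(ByteOrder.LITTLE_ENDIAN);
            for (int k = 0; k < INTS_PER_BLOCK; k++) {
                table[pos++] = buf.getInt(4 * k);
            }
        }
        return table;
    }
}
```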
<H4>Property Table Start Index</H4>
<P STYLE="margin-bottom: 0in">
An integer at address <B>0x30</B> specifies the start
index of the <A HREF="#PropertyTable">property
table</A>. This integer is specified as a
<B><I>&quot;block index&quot;</I></B>. The <A
HREF="#PropertyTable">Property Table</A> is stored, as is
almost everything in a POI file system, in big blocks and
walked via the BAT. The <A HREF="#PropertyTable">Property
Table</A> is described below.
</P>
<A NAME="PropertyTable"><H3>Property Table</H3></A>
<P STYLE="margin-bottom: 0in">
The property table is essentially nothing more than the
directory system. Properties are 128-byte records
contained within the 512-byte blocks, four to a block. The
first property is always the <A HREF="#RootEntry">Root
Entry</A>. The following applies to individual properties
within a property table:
</P>
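<P STYLE="margin-bottom: 0in">
Since four 128-byte properties fit in each 512-byte block,
property number <I>i</I> can be located with integer
division. A sketch, using the hypothetical helper name
<CODE>PropertyLocator</CODE>:
</P>

```java
// Hypothetical helper: locates a property within the property table.
class PropertyLocator {
    static final int BIG_BLOCK_SIZE = 512;
    static final int PROPERTY_SIZE = 128;
    static final int PROPERTIES_PER_BLOCK = BIG_BLOCK_SIZE / PROPERTY_SIZE; // 4

    // Returns {position of the block within the property table's chain,
    //          byte offset of the property within that block}.
    static int[] locate(int propertyIndex) {
        return new int[] {
            propertyIndex / PROPERTIES_PER_BLOCK,
            (propertyIndex % PROPERTIES_PER_BLOCK) * PROPERTY_SIZE
        };
    }
}
```

<P STYLE="margin-bottom: 0in">
The first value is a position within the property table's
chain of blocks, not an absolute block index; the chain
still has to be walked via the BAT.
</P>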
<P STYLE="margin-bottom: 0in">
At offset <B>0x00</B> in the property is the
&quot;<B><I>name</I></B>&quot;. This is stored as an
uncompressed, null-terminated 16-bit Unicode string; for
names made up of ASCII characters, every other byte
corresponds to an &quot;ASCII&quot; character. The size
of this string in bytes, including the two-byte null
terminator, is stored at offset <B>0x40</B> (<B><I>string
size</I></B>) as a short.
</P>
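<P STYLE="margin-bottom: 0in">
A sketch of decoding the name in Java. It assumes the size
at offset 0x40 counts bytes, including the two-byte null
terminator (this matches Microsoft's compound file
specification, but treat it as an assumption here);
<CODE>PropertyName</CODE> is a hypothetical helper.
</P>

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: decodes the NAME field of one 128-byte property.
// ASSUMPTION: the size at 0x40 counts bytes, including the 2-byte null terminator.
class PropertyName {
    static final int NAME_OFFSET = 0x00;
    static final int NAME_FIELD_BYTES = 0x40; // 32 16-bit units
    static final int NAME_SIZE_OFFSET = 0x40;

    static String readName(byte[] property) {
        int sizeInBytes = ByteBuffer.wrap(property)
                                    .order(ByteOrder.LITTLE_ENDIAN)
                                    .getShort(NAME_SIZE_OFFSET);
        // Clamp to the field, then drop the null terminator.
        int length = Math.max(0, Math.min(sizeInBytes, NAME_FIELD_BYTES) - 2);
        // Each character is a little-endian 16-bit unit; for ASCII names the high byte is zero.
        return new String(property, NAME_OFFSET, length, StandardCharsets.UTF_16LE);
    }
}
```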
<P STYLE="margin-bottom: 0in">
At offset <B>0x42</B> is the <B><I>property type</I></B>
(byte). The type is 1 for directory, 2 for file or 5 for
the Root Entry.
</P>
<P STYLE="margin-bottom: 0in">
At offset <B>0x43</B> is the <B><I>node color</I></B>
(byte). The color is either 1 (black) or 0
(red). Properties are apparently meant to be arranged in a
red-black binary tree, subject to the following rules:
<A name="node_rules"></A>
<OL>
<LI>The root of the tree is always black
<LI>A red node cannot have a red child; that is, two
consecutive nodes cannot both be red
<LI>A property is less than another property if its
name length is less than the other property's name
length
<LI>If two properties have the same name length, the
sort order is determined by the sort order of the
properties' names.
</OL>
</P>
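<P STYLE="margin-bottom: 0in">
Rules 3 and 4 amount to a comparator on property names. In
this sketch the tie-break for equal-length names uses
<CODE>String.compareTo</CODE>, which is an assumption;
Microsoft's compound file specification describes a
case-insensitive (uppercased) comparison instead.
<CODE>PropertyOrder</CODE> is a hypothetical name.
</P>

```java
import java.util.Comparator;

// Hypothetical comparator implementing the ordering rules for property names.
class PropertyOrder {
    static final Comparator<String> BY_NAME = (a, b) -> {
        // Rule 3: a shorter name sorts before a longer one.
        if (a.length() != b.length()) {
            return Integer.compare(a.length(), b.length());
        }
        // Rule 4: equal lengths fall back to comparing the names themselves.
        // ASSUMPTION: plain String ordering; the exact comparison is not specified here.
        return a.compareTo(b);
    };
}
```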
<P STYLE="margin-bottom: 0in">
At offset <B>0x44</B> is the index (int) of the
<B><I>previous property</I></B> (PREVIOUS_PROP).
</P>
<P STYLE="margin-bottom: 0in">
At offset <B>0x48</B> is the index (int) of the <B><I>next
property</I></B> (NEXT_PROP).
</P>
<P STYLE="margin-bottom: 0in">
At offset <B>0x4C</B> is the index (int) of a <B><I>child
property</I></B> (CHILD_PROP); only directory entries and
the Root Entry use this field.
</P>
<P STYLE="margin-bottom: 0in">
At offset <B>0x74</B> is an integer giving the <B><I>start
block</I></B> for the file described by this
property. This value is both the index of the file's first
block and the index of the first element of the file's
chain in the Block Allocation Table (or, for small files,
the Small Block Allocation Table).
</P>
<P STYLE="margin-bottom: 0in">
At offset <B>0x78</B> is an integer giving the total
<B><I>actual size</I></B> of the file pointed at by this
property. If the file size is less than 4096, the file is
stored in small blocks and the SBAT is used to walk the
small blocks making up the file. If the file size is 4096
or larger, the file is stored in big blocks and the main
BAT is used to walk the big blocks making up the file. The
exception to this rule is the <B><I>Root Entry</I></B>,
which, regardless of its size, is ALWAYS stored in big
blocks and the main BAT is used to walk the big blocks
making up this special file.
</P>
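<P STYLE="margin-bottom: 0in">
The fields described above can be pulled out of a 128-byte
property with a few absolute reads. This is a minimal
sketch, not POI's own property class;
<CODE>PropertyRecord</CODE> and <CODE>usesSmallBlocks</CODE>
are hypothetical names, and <CODE>usesSmallBlocks</CODE>
only answers the question for file properties.
</P>

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical parsed view of one 128-byte property starting at `offset`.
class PropertyRecord {
    static final int SMALL_FILE_THRESHOLD = 4096;
    static final int TYPE_DIRECTORY = 1, TYPE_FILE = 2, TYPE_ROOT = 5;

    final int type;       // 0x42: 1 = directory, 2 = file, 5 = Root Entry
    final int color;      // 0x43: 0 = red, 1 = black
    final int previous;   // 0x44: PREVIOUS_PROP, -1 if unused
    final int next;       // 0x48: NEXT_PROP, -1 if unused
    final int child;      // 0x4C: CHILD_PROP, -1 if unused
    final int startBlock; // 0x74: first block of the file
    final int size;       // 0x78: actual size of the file in bytes

    PropertyRecord(byte[] data, int offset) {
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        type = buf.get(offset + 0x42) & 0xFF;
        color = buf.get(offset + 0x43) & 0xFF;
        previous = buf.getInt(offset + 0x44);
        next = buf.getInt(offset + 0x48);
        child = buf.getInt(offset + 0x4C);
        startBlock = buf.getInt(offset + 0x74);
        size = buf.getInt(offset + 0x78);
    }

    // Files under 4096 bytes are walked via the SBAT; the Root Entry and
    // directories never are, so they return false.
    boolean usesSmallBlocks() {
        return type == TYPE_FILE && size < SMALL_FILE_THRESHOLD;
    }
}
```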
<A NAME="RootEntry"><H3>Root Entry</H3></A>
<P STYLE="margin-bottom: 0in">
The <B><I>Root Entry</I></B> in the <A
HREF="#PropertyTable"><B><I>Property Table</I></B></A>
contains the information necessary to read and write small
files, which are files less than 4096 bytes long. The
start block field of the Root Entry is the start index of
the <B><I>Small Block Array</I></B>, which is read like
any other file in the POI filesystem. Since the SBAT
cannot be used without the Small Block Array, the Root
Entry MUST be read or written using the <A
HREF="#BAT"><B><I>Block Allocation Table</I></B></A>. The
blocks making up the Small Block Array are divided into
64-byte small blocks, up to the size indicated in the Root
Entry (which should always be a multiple of 64).
</P>
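<P STYLE="margin-bottom: 0in">
Because each 512-byte block of the Small Block Array holds
eight 64-byte small blocks, small block <I>n</I> lives in
block <I>n</I> / 8 of the Root Entry's chain. A sketch,
assuming that chain has already been obtained by walking
the main BAT; <CODE>SmallBlockLocator</CODE> is a
hypothetical name.
</P>

```java
// Hypothetical helper: finds a 64-byte small block inside the Small Block Array.
// rootChain holds the big-block indices of the Root Entry's chain, in order.
class SmallBlockLocator {
    static final int BIG_BLOCK_SIZE = 512;
    static final int SMALL_BLOCK_SIZE = 64;
    static final int SMALL_PER_BIG = BIG_BLOCK_SIZE / SMALL_BLOCK_SIZE; // 8

    // Returns {big-block index in the filesystem, byte offset within that big block}.
    static int[] locate(int[] rootChain, int smallBlockIndex) {
        int bigBlock = rootChain[smallBlockIndex / SMALL_PER_BIG];
        int offset = (smallBlockIndex % SMALL_PER_BIG) * SMALL_BLOCK_SIZE;
        return new int[] { bigBlock, offset };
    }
}
```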
<H3>Walking the Nodes of the <A HREF="#PropertyTable">Property
Table</A></H3>
<P STYLE="margin-bottom: 0in">
The individual properties form a directory tree, with the
<B><I>Root Entry</I></B> as the directory tree's root, as
shown in the accompanying drawing. Note the numbers in
parentheses in each node; they represent the node's index
in the array of properties. The <B>NEXT_PROP</B>,
<B>PREVIOUS_PROP</B>, and <B>CHILD_PROP</B> fields hold
these indices, and are used to navigate the tree.
</P>
<P>
<IMG SRC="PropertySet.jpg">
</P>
<P STYLE="margin-bottom: 0in">
Each <A NAME="directoryEntry">directory entry</A> (i.e., a
property whose type is <B><I>directory</I></B> or
<B><I>root entry</I></B>) uses its <B>CHILD_PROP</B> field
to point to one of its subordinate (child) properties. It
doesn't seem to matter which of its children it points
to. Thus in the previous drawing, the Root Entry's
CHILD_PROP field may contain 1, 4, or the index of one of
its other children. Similarly, the directory node (index
1) may have, in its CHILD_PROP field, 2, 3, or the index
of one of its other children.
</P>
<P STYLE="margin-bottom: 0in">
The children of a given <A
HREF="#directoryEntry">directory property</A> point to
each other in a similar fashion by using their
<B>NEXT_PROP</B> and <B>PREVIOUS_PROP</B> fields. The
ordering of the children is governed by the <a
href="#node_rules">rules listed above</a>.
</P>
<P STYLE="margin-bottom: 0in">
Unused <B>NEXT_PROP</B>, <B>PREVIOUS_PROP</B>, and
<B>CHILD_PROP</B> fields contain the marker value
-1. For example, all file properties have a value of -1 in
their CHILD_PROP fields.
</P>
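<P STYLE="margin-bottom: 0in">
A sketch of listing a directory's children: start at its
CHILD_PROP and follow the PREVIOUS_PROP and NEXT_PROP links
recursively until -1 is reached. If PREVIOUS_PROP points to
lesser names (an assumption), visiting the previous subtree,
then the node, then the next subtree also yields the
children in sort order. The <CODE>Node</CODE> class is a
hypothetical stand-in for a parsed property.
</P>

```java
import java.util.ArrayList;
import java.util.List;

class PropertyTree {
    static final int NONE = -1;

    // Hypothetical stand-in for a parsed property; fields mirror the on-disk links.
    static class Node {
        final String name;
        final int previous, next, child;

        Node(String name, int previous, int next, int child) {
            this.name = name;
            this.previous = previous;
            this.next = next;
            this.child = child;
        }
    }

    // Returns the property indices of all children of the directory at dirIndex.
    static List<Integer> children(Node[] props, int dirIndex) {
        List<Integer> out = new ArrayList<>();
        collect(props, props[dirIndex].child, out);
        return out;
    }

    // In-order walk of the sibling tree: PREVIOUS_PROP subtree, this node, NEXT_PROP subtree.
    private static void collect(Node[] props, int index, List<Integer> out) {
        if (index == NONE) {
            return;
        }
        collect(props, props[index].previous, out);
        out.add(index);
        collect(props, props[index].next, out);
    }
}
```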
<A NAME="BAT"><H3>Block Allocation Table</H3></A>
<P STYLE="margin-bottom: 0in">
The <B><I>BAT blocks</I></B> are pointed at by the BAT
array contained in the <A HREF="#HeaderBlock">header</A>
and supplemented, if necessary, by the <B><I>XBAT
blocks</I></B>. Together these blocks form a large table
of integers, each of which is a block number. The
<B><I>Block Allocation Table</I></B> holds chains of these
integers, each chain terminated with -2. The elements in
these chains refer to blocks in the files. The starting
block of a file is NOT specified in the BAT; it is
specified by the <B><I>property</I></B> for a given
file. Each element in the BAT is both the number of the
next block in the file (counting blocks from the first
block after the header) AND the number of the next BAT
element in the chain. This can be thought of as a linked
list of blocks: the BAT array contains the links from one
block to the next, including the end-of-chain marker.
</P>
<P>
Here's an example: Let's assume that the BAT begins as
follows:
</P>
<P STYLE="margin-bottom: 0in">
<FONT FACE="Courier, monospace"><B>BAT[ 0 ] = 2</B></FONT>
</P>
<P STYLE="margin-bottom: 0in">
<FONT FACE="Courier, monospace"><B>BAT[ 1 ] = 5</B></FONT>
</P>
<P STYLE="margin-bottom: 0in">
<FONT FACE="Courier, monospace"><B>BAT[ 2 ] = 3</B></FONT>
</P>
<P STYLE="margin-bottom: 0in">
<FONT FACE="Courier, monospace"><B>BAT[ 3 ] = 4</B></FONT>
</P>
<P STYLE="margin-bottom: 0in">
<FONT FACE="Courier, monospace"><B>BAT[ 4 ] = 6</B></FONT>
</P>
<P STYLE="margin-bottom: 0in">
<FONT FACE="Courier, monospace"><B>BAT[ 5 ] =
-2</B></FONT>
</P>
<P STYLE="margin-bottom: 0in">
<FONT FACE="Courier, monospace"><B>BAT[ 6 ] = 7</B></FONT>
</P>
<P STYLE="margin-bottom: 0in">
<FONT FACE="Courier, monospace"><B>BAT[ 7 ] =
-2</B></FONT>
</P>
<P STYLE="margin-bottom: 0in">
<B>...</B>
</P>
<P STYLE="margin-bottom: 0in">
Now, if we have a file whose <A
HREF="#PropertyTable">Property Table</A> entry says it
begins with index 0, we walk the BAT array and see that
the file consists of blocks 0 (because the start block is
0), 2 (because BAT[ 0 ] is 2), 3 (BAT[ 2 ] is 3), 4 (BAT[
3 ] is 4), 6 (BAT[ 4 ] is 6), and 7 (BAT[ 6 ] is 7). It
ends at block 7 because BAT[ 7 ] is -2, which is the end
of chain marker.
</P>
<P STYLE="margin-bottom: 0in">
Similarly, a file beginning at index 1 consists of
blocks 1 and 5.
</P>
<P STYLE="margin-bottom: 0in">
Other special numbers in a BAT array are:
</P>
<UL>
<LI>
<P STYLE="margin-bottom: 0in">
-1, which indicates an unused block
</P>
</LI>
<LI>
<P STYLE="margin-bottom: 0in">
-3, which indicates a &quot;special&quot; block,
such as a block used to make up the Small Block
Array, the <A HREF="#PropertyTable">Property
Table</A>, the main BAT, or the SBAT
</P>
</LI>
</UL>
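<P STYLE="margin-bottom: 0in">
The chain walk in the example above, along with the special
values just listed, can be sketched in Java as follows. The
loop guard is a defensive addition for malformed files, not
something this document describes; <CODE>BatChain</CODE> is
a hypothetical name.
</P>

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: follows a file's chain through the (main or small) BAT.
class BatChain {
    static final int UNUSED = -1;
    static final int END_OF_CHAIN = -2;
    static final int SPECIAL = -3;

    static List<Integer> chain(int[] bat, int startBlock) {
        List<Integer> blocks = new ArrayList<>();
        int current = startBlock;
        while (current != END_OF_CHAIN) {
            // UNUSED, SPECIAL or an out-of-range index cannot appear inside a file's chain.
            if (current < 0 || current >= bat.length) {
                throw new IllegalStateException("bad block index " + current);
            }
            // A chain longer than the table must revisit a block, i.e. it loops.
            if (blocks.size() >= bat.length) {
                throw new IllegalStateException("loop in BAT chain");
            }
            blocks.add(current);
            current = bat[current]; // the element doubles as the next block number
        }
        return blocks;
    }
}
```

<P STYLE="margin-bottom: 0in">
With the example BAT above, <CODE>BatChain.chain(bat, 0)</CODE>
returns [0, 2, 3, 4, 6, 7] and
<CODE>BatChain.chain(bat, 1)</CODE> returns [1, 5].
</P>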
<H2>Filesystem Structures</H2>
<P>
The following outlines the basic filesystem structures.
</P>
<H3>Header (first block of the file) -- 512 (0x200) bytes</H3>
<TABLE BORDER=0 CELLPADDING=4 CELLSPACING=0>
<TR VALIGN=TOP>
<TD><B>Field</B></TD>
<TD><B>Description</B></TD>
<TD><B>Offset</B></TD>
<TD><B>Length</B></TD>
<TD><B>Default value or const</B></TD>
</TR>
<TR VALIGN=TOP>
<TD>FILETYPE</TD>
<TD>Magic number identifying this as a POI
filesystem.</TD>
<TD>0x0000</TD>
<TD>Long</TD>
<TD>0xE11AB1A1E011CFD0</TD>
</TR>
<TR VALIGN=TOP>
<TD>UK1</TD>
<TD>Unknown constant</TD>
<TD>0x0008</TD>
<TD>Integer</TD>
<TD>0</TD>
</TR>
<TR VALIGN=TOP>
<TD>UK2</TD>
<TD>Unknown Constant</TD>
<TD>0x000C</TD>
<TD>Integer</TD>
<TD>0</TD>
</TR>
<TR VALIGN=TOP>
<TD>UK3</TD>
<TD>Unknown Constant</TD>
<TD>0x0014</TD>
<TD>Integer</TD>
<TD>0</TD>
</TR>
<TR VALIGN=TOP>
<TD>UK4</TD>
<TD>Unknown Constant (revision?)</TD>
<TD>0x0018</TD>
<TD>Short</TD>
<TD>0x003B</TD>
</TR>
<TR VALIGN=TOP>
<TD>UK5</TD>
<TD>Unknown Constant (version?)</TD>
<TD>0x001A</TD>
<TD>Short</TD>
<TD>0x0003</TD>
</TR>
<TR VALIGN=TOP>
<TD>UK6</TD>
<TD>Unknown Constant</TD>
<TD>0x001C</TD>
<TD>Short</TD>
<TD>-2</TD>
</TR>
<TR VALIGN=TOP>
<TD>LOG_2_BIG_BLOCK_SIZE</TD>
<TD>Log, base 2, of the big block size</TD>
<TD>0x001E</TD>
<TD>Short</TD>
<TD>9 (2 ^ 9 = 512 bytes)</TD>
</TR>
<TR VALIGN=TOP>
<TD>LOG_2_SMALL_BLOCK_SIZE</TD>
<TD>Log, base 2, of the small block size</TD>
<TD>0x0020</TD>
<TD>Integer</TD>
<TD>6 (2 ^ 6 = 64 bytes)</TD>
</TR>
<TR VALIGN=TOP>
<TD>UK7</TD>
<TD>Unknown Constant</TD>
<TD>0x0024</TD>
<TD>Integer</TD>
<TD>0</TD>
</TR>
<TR VALIGN=TOP>
<TD>UK8</TD>
<TD>Unknown Constant</TD>
<TD>0x0028</TD>
<TD>Integer</TD>
<TD>0</TD>
</TR>
<TR VALIGN=TOP>
<TD>BAT_COUNT</TD>
<TD>Number of elements in the BAT array</TD>
<TD>0x002C</TD>
<TD>Integer</TD>
<TD>required</TD>
</TR>
<TR VALIGN=TOP>
<TD>PROPERTIES_START</TD>
<TD>Block index of the first block of the <A
HREF="#PropertyTable">property table</A></TD>
<TD>0x0030</TD>
<TD>Integer</TD>
<TD>required</TD>
</TR>
<TR VALIGN=TOP>
<TD>UK9</TD>
<TD>Unknown Constant</TD>
<TD>0x0034</TD>
<TD>Integer</TD>
<TD>0</TD>
</TR>
<TR VALIGN=TOP>
<TD>UK10</TD>
<TD>Unknown Constant</TD>
<TD>0x0038</TD>
<TD>Integer</TD>
<TD>0x00001000</TD>
</TR>
<TR VALIGN=TOP>
<TD>SBAT_START</TD>
<TD>Block index of first big block containing the
small block allocation table (SBAT)</TD>
<TD>0x003C</TD>
<TD>Integer</TD>
<TD>-2</TD>
</TR>
<TR VALIGN=TOP>
<TD>UK11</TD>
<TD>Unknown Constant</TD>
<TD>0x0040</TD>
<TD>Integer</TD>
<TD>1</TD>
</TR>
<TR VALIGN=TOP>
<TD>XBAT_START</TD>
<TD>Block index of the first block in the Extended
Block Allocation Table (XBAT)</TD>
<TD>0x0044</TD>
<TD>Integer</TD>
<TD>-2</TD>
</TR>
<TR VALIGN=TOP>
<TD>XBAT_COUNT</TD>
<TD>Number of elements in the Extended Block
Allocation Table (to be added to the BAT)</TD>
<TD>0x0048</TD>
<TD>Integer</TD>
<TD>0</TD>
</TR>
<TR VALIGN=TOP>
<TD>BAT_ARRAY</TD>
<TD>Array of block indices constituting the <A
HREF="#BAT">Block Allocation Table (BAT)</A></TD>
<TD>0x004C, 0x0050, 0x0054 ... 0x01FC</TD>
<TD>Integer[ ]</TD>
<TD>-1 for unused elements; at least the first element
must be filled.</TD>
</TR>
<TR VALIGN=TOP>
<TD>N/A</TD>
<TD>Header block data not otherwise described in this
table</TD>
<TD>N/A</TD>
<TD>N/A</TD>
<TD>-1</TD>
</TR>
</TABLE>
<H3><A HREF="#BAT">Block Allocation Table</A> Block -- 512
(0x200) bytes</H3>
<TABLE BORDER=0 CELLPADDING=4 CELLSPACING=0>
<TR VALIGN=TOP>
<TD><B>Field</B></TD>
<TD><B>Description</B></TD>
<TD><B>Offset</B></TD>
<TD><B>Length</B></TD>
<TD><B>Default value or const</B></TD>
</TR>
<TR VALIGN=TOP>
<TD>BAT_ELEMENT</TD>
<TD>Any given element in the BAT block</TD>
<TD>0x0000, 0x0004, 0x0008, ... 0x01FC</TD>
<TD>Integer</TD>
<TD>-1 = unused<BR>
-2 = end of chain<BR>
-3 = special (e.g., BAT block)<BR>
All other values give the index of the next block
in the file, which is also the index of the next
element in the chain.</TD>
</TR>
</TABLE>
<H3>Property Block -- 512 (0x200) byte block</H3>
<TABLE BORDER=0 CELLPADDING=4 CELLSPACING=0>
<TR VALIGN=TOP>
<TD><B>Field</B></TD>
<TD><B>Description</B></TD>
<TD><B>Offset</B></TD>
<TD><B>Length</B></TD>
<TD><B>Default value or const</B></TD>
</TR>
<TR VALIGN=TOP>
<TD>Properties[ ]</TD>
<TD>The properties stored in this block, four per block.</TD>
<TD>0x0000, 0x0080, 0x0100, 0x0180</TD>
<TD>128 bytes</TD>
<TD>All unused space is set to -1.</TD>
</TR>
</TABLE>
<H3>Property -- 128 (0x80) bytes</H3>
<TABLE BORDER=0 CELLPADDING=4 CELLSPACING=0>
<TR VALIGN=TOP>
<TD><B>Field</B></TD>
<TD><B>Description</B></TD>
<TD><B>Offset</B></TD>
<TD><B>Length</B></TD>
<TD><B>Default value or const</B></TD>
</TR>
<TR VALIGN=TOP>
<TD>NAME</TD>
<TD>A null-terminated, uncompressed 16-bit Unicode
string containing the name of the property (for
ASCII names, the high bytes are zero and can be
dropped).</TD>
<TD>0x00, 0x02, 0x04, ... 0x3E</TD>
<TD>Short[ ]</TD>
<TD>0x0000 for unused elements; field required; 32
elements (0x40 bytes) max, including the null
terminator</TD>
</TR>
<TR VALIGN=TOP>
<TD>NAME_SIZE</TD>
<TD>Length of the NAME field in bytes, including the terminating null character</TD>
<TD>0x40</TD>
<TD>Short</TD>
<TD>Required</TD>
</TR>
<TR VALIGN=TOP>
<TD>PROPERTY_TYPE</TD>
<TD>Property type (directory, file, or root)</TD>
<TD>0x42</TD>
<TD>Byte</TD>
<TD>1 (directory), 2 (file), or 5 (root entry)</TD>
</TR>
<TR VALIGN=TOP>
<TD>NODE_COLOR</TD>
<TD>Node color</TD>
<TD>0x43</TD>
<TD>Byte</TD>
<TD>0 (red) or 1 (black)</TD>
</TR>
<TR VALIGN=TOP>
<TD>PREVIOUS_PROP</TD>
<TD>Previous property index</TD>
<TD>0x44</TD>
<TD>Integer</TD>
<TD>-1</TD>
</TR>
<TR VALIGN=TOP>
<TD>NEXT_PROP</TD>
<TD>Next property index</TD>
<TD>0x48</TD>
<TD>Integer</TD>
<TD>-1</TD>
</TR>
<TR VALIGN=TOP>
<TD>CHILD_PROP</TD>
<TD>First child property index</TD>
<TD>0x4C</TD>
<TD>Integer</TD>
<TD>-1</TD>
</TR>
<TR VALIGN=TOP>
<TD>SECONDS_1</TD>
<TD>Seconds component of the created timestamp?</TD>
<TD>0x64</TD>
<TD>Integer</TD>
<TD>0</TD>
</TR>
<TR VALIGN=TOP>
<TD>DAYS_1</TD>
<TD>Days since epoch component of the created
timestamp?</TD>
<TD>0x68</TD>
<TD>Integer</TD>
<TD>0</TD>
</TR>
<TR VALIGN=TOP>
<TD>SECONDS_2</TD>
<TD>Seconds component of the modified timestamp?</TD>
<TD>0x6C</TD>
<TD>Integer</TD>
<TD>0</TD>
</TR>
<TR VALIGN=TOP>
<TD>DAYS_2</TD>
<TD>Days since epoch component of the modified
timestamp?</TD>
<TD>0x70</TD>
<TD>Integer</TD>
<TD>0</TD>
</TR>
<TR VALIGN=TOP>
<TD>START_BLOCK</TD>
<TD>Index of the first block of the file; also the
index of the first element of the file's chain in
the BAT (or SBAT)</TD>
<TD>0x74</TD>
<TD>Integer</TD>
<TD>Required</TD>
</TR>
<TR VALIGN=TOP>
<TD>SIZE</TD>
<TD>Actual size of the file this property points
to (used to truncate the last block to the real
size)</TD>
<TD>0x78</TD>
<TD>Integer</TD>
<TD>0</TD>
</TR>
</TABLE>
</BODY>
</HTML>