f1182ce38a
git-svn-id: https://svn.apache.org/repos/asf/jakarta/poi/trunk@353345 13f79535-47bb-0310-9956-ffa450edef68
1012 lines
26 KiB
XML
1012 lines
26 KiB
XML
<?xml version="1.0" encoding="UTF-8"?>
|
|
<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V1.1//EN" "../dtd/document-v11.dtd">
|
|
<!-- $Id$ -->
|
|
|
|
<document>
|
|
<header>
|
|
<title>Jakarta POI - HPSF Internals</title>
|
|
<authors>
|
|
<person name="Rainer Klute" email="klute@rainer-klute.de"/>
|
|
</authors>
|
|
</header>
|
|
<body>
|
|
<section><title>HPSF Internals</title>
|
|
|
|
<section><title>Introduction</title>
|
|
|
|
<p>A Microsoft Office document is internally organized like a filesystem
|
|
with directory and files. Microsoft calls these files
|
|
<strong>streams</strong>. A document can have properties attached to it,
|
|
like author, title, number of words etc. These metadata are not stored in
|
|
the main stream of, say, a Word document, but instead in a dedicated
|
|
stream with a special format. Usually this stream's name is
|
|
<code>\005SummaryInformation</code>, where <code>\005</code> represents
|
|
the character with a decimal value of 5.</p>
|
|
|
|
<p>A single piece of information in the stream is called a
|
|
<strong>property</strong>, for example the document title. Each property
|
|
has an integral <strong>ID</strong> (e.g. 2 for title), a
|
|
<strong>type</strong> (telling that the title is a string of bytes) and a
|
|
<strong>value</strong> (what this is should be obvious). A stream
|
|
containing properties is called a
|
|
<strong>property set stream</strong>.</p>
|
|
|
|
<p>This document describes the internal structure of a property set stream,
|
|
i.e. the <strong>HPSF</strong>. It does
|
|
not describe how a Microsoft Office document is organized internally and
|
|
how to retrieve a stream from it. See the <link
|
|
href="../poifs/index.html">POIFS documentation</link> for that kind of
|
|
stuff.</p>
|
|
|
|
<p>The HPSF is not only used in the Summary
|
|
Information stream in the top-level document of a Microsoft Office
|
|
document. Often there is also a property set stream named
|
|
<code>\005DocumentSummaryInformation</code> with additional properties.
|
|
Embedded documents may have their own property set streams. You cannot
|
|
tell by a stream's name whether it is a property set stream or not.
|
|
Instead you have to open the stream and look at its bytes.</p>
|
|
</section>
|
|
|
|
|
|
|
|
<section><title>Data Types</title>
|
|
|
|
<p>Before delving into the details of the property set stream format we
|
|
have to have a short look at data types. Integral values are stored in the
|
|
so-called <strong>little endian</strong> format. In this format the bytes
|
|
that make out an integral value are stored in the "wrong" order. For
|
|
example, the decimal value 4660 is 0x1234 in the hexadecimal notation. If
|
|
you think this should be represented by a byte 0x12 followed by another
|
|
byte 0x34, you are right. This is called the <strong>big endian</strong>
|
|
format. In the little endian format, however, this order is reversed and
|
|
the low-value byte comes first: 0x3412.
|
|
</p>
|
|
|
|
<p>The following table gives an overview about some important data
|
|
types:</p>
|
|
|
|
<table>
|
|
|
|
<tr>
|
|
<th>Name</th>
|
|
<th>Length</th>
|
|
<th>Example (Big Endian)</th>
|
|
<th>Example (Little Endian)</th>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td><strong>Bytes</strong></td>
|
|
<td>1 byte</td>
|
|
<td><code>0x12</code></td>
|
|
<td><code>0x12</code></td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td><strong>Word</strong></td>
|
|
<td>2 bytes</td>
|
|
<td><code>0x1234</code></td>
|
|
<td><code>0x3412</code></td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td><strong>DWord</strong></td>
|
|
<td>4 bytes</td>
|
|
<td><code>0x12345678</code></td>
|
|
<td><code>0x78563412</code></td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td><strong>ClassID</strong><br/>
|
|
A sequence of one DWord, two Words and eight Bytes</td>
|
|
|
|
<td>16 bytes</td>
|
|
|
|
<td><code>0xE0859FF2F94F6810AB9108002B27B3D9</code> resp.
|
|
<code>E0859FF2-F94F-6810-AB-91-08-00-2B-27-B3-D9</code></td>
|
|
|
|
<td><code>0xF29F85E04FF91068AB9108002B27B3D9</code> resp.
|
|
<code>F29F85E0-4FF9-1068-AB-91-08-00-2B-27-B3-D9</code></td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td></td>
|
|
<td></td>
|
|
<td>The ClassID examples are given here in two different notations. The
|
|
second notation without the "0x" at the beginning and with dashes
|
|
inside shows the internal grouping into one DWord, two Words and eight
|
|
Bytes.</td>
|
|
<td><em>Watch out:</em> Microsoft documentation and tools show class IDs
|
|
a little bit differently like
|
|
<code>F29F85E0-4FF9-1068-AB91-08002B27B3D9</code>.
|
|
However, that representation is (intentionally?) misleading with
|
|
respect to endianess.</td>
|
|
</tr>
|
|
</table>
|
|
</section>
|
|
|
|
|
|
|
|
<section><title>HPSF Overview</title>
|
|
|
|
<p>A property set stream consists of three main parts:</p>
|
|
|
|
<ol>
|
|
<li>The <strong>header</strong> and</li>
|
|
<li>the <strong>section(s)</strong> containing the properties.</li>
|
|
</ol>
|
|
</section>
|
|
|
|
|
|
|
|
<section><title>The Header</title>
|
|
|
|
<p>The first bytes in a property set stream is the <strong>header</strong>.
|
|
It has a fixed length and looks like this:</p>
|
|
|
|
<table>
|
|
<tr>
|
|
<th>Offset</th>
|
|
<th>Type</th>
|
|
<th>Contents</th>
|
|
<th>Remarks</th>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td>0</td>
|
|
<td>Word</td>
|
|
<td><code>0xFFFE</code></td>
|
|
<td>If the first four bytes of a stream do not contain these values, the
|
|
stream is not a property set stream.</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td>2</td>
|
|
<td>Word</td>
|
|
<td><code>0x0000</code></td>
|
|
<td></td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td>4</td>
|
|
<td>DWord</td>
|
|
<td>Denotes the operating system and the OS version under which this
|
|
stream was created. The operating system ID is in the DWord's higher
|
|
word (after little endian decoding): <code>0x0000</code> for Win16,
|
|
<code>0x0001</code> for Macintosh and <code>0x0002</code> for Win32 -
|
|
that's all. The reader is most likely aware of the fact that there are
|
|
some more operating systems. However, Microsoft does not seem to
|
|
know.</td>
|
|
<td></td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td>8</td>
|
|
<td>ClassID</td>
|
|
<td><code>0x00000000000000000000000000000000</code></td>
|
|
<td>Most property set streams have this value but this is not
|
|
required.</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td>24</td>
|
|
<td>DWord</td>
|
|
<td><code>0x01000000</code> or greater</td>
|
|
<td>Section count. This field's value should be equal to 1 or greater.
|
|
Microsoft claims that this is a "reserved" field, but it seems to tell
|
|
how many sections (see below) are following in the stream. This would
|
|
really make sense because otherwise you could not know where and how
|
|
far you should read section data.</td>
|
|
</tr>
|
|
</table>
|
|
</section>
|
|
|
|
|
|
|
|
<section><title>Section List</title>
|
|
|
|
<p>Following the header is the section list. This is an array of pairs each
|
|
consisting of a section format ID and an offset. This array has as many
|
|
pairs of ClassID and and DWord fields as the section count field in the
|
|
header says. The Summary Information stream contains a single section, the
|
|
Document Summary Information stream contains two.</p>
|
|
|
|
<table>
|
|
<tr>
|
|
<th>Type</th>
|
|
<th>Contents</th>
|
|
<th>Remarks</th>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td>ClassID</td>
|
|
<td>Section format ID</td>
|
|
<td><code>0xF29F85E04FF91068AB9108002B27B3D9</code> for the single section
|
|
in the Summary Information stream.<br/><br/>
|
|
|
|
<code>0xD5CDD5022E9C101B939708002B2CF9AE</code> for the first
|
|
section in the Document Summary Information stream.</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td>DWord</td>
|
|
<td>Offset</td>
|
|
<td>The number of bytes between the beginning of the stream and the
|
|
beginning of the section within the stream.</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td>ClassID</td>
|
|
<td>Section format ID</td>
|
|
<td>...</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td>DWord</td>
|
|
<td>Offset</td>
|
|
<td>...</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td>...</td>
|
|
<td>...</td>
|
|
<td>...</td>
|
|
</tr>
|
|
</table>
|
|
</section>
|
|
|
|
|
|
|
|
<section><title>Section</title>
|
|
|
|
<p>A section is divided into three parts: the section header (with the
|
|
section length and the number of properties in the section), the
|
|
properties list (with type and offset of each property), and the
|
|
properties themselves. Here are the details:</p>
|
|
|
|
<table>
|
|
<tr>
|
|
<th> </th>
|
|
<th>Type</th>
|
|
<th>Contents</th>
|
|
<th>Remarks</th>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td>Section header</td>
|
|
|
|
<td>DWord</td>
|
|
<td>Length</td>
|
|
<td>The length of the section in bytes.</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td></td>
|
|
<td>DWord</td>
|
|
<td>Property count</td>
|
|
<td>The number of properties in the section.</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
|
|
<td>Properties list</td>
|
|
|
|
<td>DWord</td>
|
|
<td>Property ID</td>
|
|
<td>The property ID tells what the property means. For example, an ID of
|
|
<code>0x0002</code> in the Summary Information stands for the document's
|
|
title. See the <link href="#property_ids">Property IDs</link>
|
|
chapter below for more details.</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td></td>
|
|
<td>DWord</td>
|
|
<td>Offset</td>
|
|
<td>The number of bytes between the beginning of the section and the
|
|
property.</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td></td>
|
|
<td>...</td>
|
|
<td>...</td>
|
|
<td>...</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td>Properties</td>
|
|
|
|
<td>DWord</td>
|
|
<td>Property type ("variant")</td>
|
|
<td>This is the property's data type, e.g. an integer value, a byte
|
|
string or a Unicode string. See the
|
|
<link href="#property_types"><em>Property Types</em></link> chapter
|
|
for details!</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td></td>
|
|
<td><em>Field length depends on the property type
|
|
("variant")</em></td>
|
|
<td>Property value</td>
|
|
<td>This field's length depends on the property's type. These are the
|
|
bytes that make out the DWord, the byte string or some other data of
|
|
fixed or variable length.<br/><br/>
|
|
|
|
The property value's length is always stored in an area which is a
|
|
multiple of 4 in length. If the property is shorter, e.g. a byte
|
|
string of 13 bytes, the remaining bytes are padded with <code>0x00</code>
|
|
bytes.</td>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td></td>
|
|
<td>...</td>
|
|
<td>...</td>
|
|
<td>...</td>
|
|
</tr>
|
|
</table>
|
|
</section>
|
|
|
|
|
|
|
|
<section><title>Property IDs</title>
|
|
<anchor id="property_ids"/>
|
|
|
|
<p>As seen above, a section holds a property list: an array with property
|
|
IDs and offsets. The property ID gives each property a meaning. For
|
|
example, in the Summary Information stream the property ID 2 says that
|
|
this property is the document's title.</p>
|
|
|
|
<p>If you want to know a property ID's meaning, it is not sufficient to
|
|
know the ID itself. You must also know the
|
|
<strong>section format ID</strong>. For example, in the Document Summary
|
|
Information stream the property ID 2 means not the document's title but
|
|
its category. Due to Microsoft's infinite wisdom the section format ID is
|
|
not part of the section. Thus if you have only a section without the
|
|
stream it is in, you cannot make any sense of the properties because you
|
|
do not know what they mean.</p>
|
|
|
|
<p>So each section format ID has its own name space of property IDs.
|
|
Microsoft defined some "well-known" property IDs for the Summary
|
|
Information and the Document Summary Information streams. You can extend
|
|
them by your own additional IDs. This will be described below.</p>
|
|
|
|
<section><title>Property IDs in The Summary Information Stream</title>
|
|
|
|
<p>The Summary Information stream has a single section with a section
|
|
format ID of <code>0xF29F85E04FF91068AB9108002B27B3D9</code>. The following
|
|
table defines the meaning of its property IDs. Each row associates a
|
|
property ID with a <em>name</em> and an <em>ID string</em>. (The property
|
|
<em>type</em> is just for informational purposes given here. As we have
|
|
seen above, the type is always given along with the value.)</p>
|
|
|
|
<p>The property <em>name</em> is a readable string which could be
|
|
displayed to the user. However, this string is useful only for users who
|
|
understand English. The property name does not help with other
|
|
languages.</p>
|
|
|
|
<p>The property <em>ID string</em> is about the same but looks more
|
|
technically and is nothing a user should bother with. You could the ID
|
|
string and map it to an appropriate display string in a particular
|
|
language. Of course you could do that with the property ID as well and
|
|
with less overhead, but people (including software developers) tend to be
|
|
better in remembering symbolic constants than remembering numbers.</p>
|
|
|
|
<table>
|
|
<tr>
|
|
<th>Property ID</th>
|
|
<th>Property Name</th>
|
|
<th>Property ID String</th>
|
|
<th>Property Type</th>
|
|
</tr>
|
|
<tr>
|
|
<td>2</td>
|
|
<td>Title</td>
|
|
<td>PID_TITLE</td>
|
|
<td>VT_LPSTR</td>
|
|
</tr>
|
|
<tr>
|
|
<td>3</td>
|
|
<td>Subject</td>
|
|
<td>PID_SUBJECT</td>
|
|
<td>VT_LPSTR</td>
|
|
</tr>
|
|
<tr>
|
|
<td>4</td>
|
|
<td>Author</td>
|
|
<td>PID_AUTHOR</td>
|
|
<td>VT_LPSTR</td>
|
|
</tr>
|
|
<tr>
|
|
<td>5</td>
|
|
<td>Keywords</td>
|
|
<td>PID_KEYWORDS</td>
|
|
<td>VT_LPSTR</td>
|
|
</tr>
|
|
<tr>
|
|
<td>6</td>
|
|
<td>Comments</td>
|
|
<td>PID_COMMENTS</td>
|
|
<td>VT_LPSTR</td>
|
|
</tr>
|
|
<tr>
|
|
<td>7</td>
|
|
<td>Template</td>
|
|
<td>PID_TEMPLATE</td>
|
|
<td>VT_LPSTR</td>
|
|
</tr>
|
|
<tr>
|
|
<td>8</td>
|
|
<td>Last Saved By</td>
|
|
<td>PID_LASTAUTHOR</td>
|
|
<td>VT_LPSTR</td>
|
|
</tr>
|
|
<tr>
|
|
<td>9</td>
|
|
<td>Revision Number</td>
|
|
<td>PID_REVNUMBER</td>
|
|
<td>VT_LPSTR</td>
|
|
</tr>
|
|
<tr>
|
|
<td>10</td>
|
|
<td>Total Editing Time</td>
|
|
<td>PID_EDITTIME</td>
|
|
<td>VT_FILETIME</td>
|
|
</tr>
|
|
<tr>
|
|
<td>11</td>
|
|
<td>Last Printed</td>
|
|
<td>PID_LASTPRINTED</td>
|
|
<td>VT_FILETIME</td>
|
|
</tr>
|
|
<tr>
|
|
<td>12</td>
|
|
<td>Create Time/Date</td>
|
|
<td>PID_CREATE_DTM</td>
|
|
<td>VT_FILETIME</td>
|
|
</tr>
|
|
<tr>
|
|
<td>13</td>
|
|
<td>Last Saved Time/Date</td>
|
|
<td>PID_LASTSAVE_DTM</td>
|
|
<td>VT_FILETIME</td>
|
|
</tr>
|
|
<tr>
|
|
<td>14</td>
|
|
<td>Number of Pages</td>
|
|
<td>PID_PAGECOUNT</td>
|
|
<td>VT_I4</td>
|
|
</tr>
|
|
<tr>
|
|
<td>15</td>
|
|
<td>Number of Words</td>
|
|
<td>PID_WORDCOUNT</td>
|
|
<td>VT_I4</td>
|
|
</tr>
|
|
<tr>
|
|
<td>16</td>
|
|
<td>Number of Characters</td>
|
|
<td>PID_CHARCOUNT</td>
|
|
<td>VT_I4</td>
|
|
</tr>
|
|
<tr>
|
|
<td>17</td>
|
|
<td>Thumbnail</td>
|
|
<td>PID_THUMBNAIL</td>
|
|
<td>VT_CF</td>
|
|
</tr>
|
|
<tr>
|
|
<td>18</td>
|
|
<td>Name of Creating Application</td>
|
|
<td>PID_APPNAME</td>
|
|
<td>VT_LPSTR</td>
|
|
</tr>
|
|
<tr>
|
|
<td>19</td>
|
|
<td>Security</td>
|
|
<td>PID_SECURITY</td>
|
|
<td>VT_I4</td>
|
|
</tr>
|
|
</table>
|
|
</section>
|
|
|
|
|
|
|
|
<section><title>Property IDs in The Document Summary Information Stream</title>
|
|
|
|
<p>The Document Summary Information stream has two sections with a section
|
|
format ID of <code>0xD5CDD5022E9C101B939708002B2CF9AE</code> for the first
|
|
one. The following table defines the meaning of the property IDs in the
|
|
first section. See the preceeding section for interpreting the table.</p>
|
|
|
|
<table>
|
|
<tr>
|
|
<th>Property ID</th>
|
|
<th>Property name</th>
|
|
<th>Property ID string</th>
|
|
<th>VT type</th>
|
|
</tr>
|
|
|
|
<tr>
|
|
<td>0</td>
|
|
<td>Dictionary</td>
|
|
<td>PID_DICTIONARY</td>
|
|
<td>[Special format]</td>
|
|
</tr>
|
|
<tr>
|
|
<td>1</td>
|
|
<td>Code page</td>
|
|
<td>PID_CODEPAGE</td>
|
|
<td>VT_I2</td>
|
|
</tr>
|
|
<tr>
|
|
<td>2</td>
|
|
<td>Category</td>
|
|
<td>PID_CATEGORY</td>
|
|
<td>VT_LPSTR</td>
|
|
</tr>
|
|
<tr>
|
|
<td>3</td>
|
|
<td>PresentationTarget</td>
|
|
<td>PID_PRESFORMAT</td>
|
|
<td>VT_LPSTR</td>
|
|
</tr>
|
|
<tr>
|
|
<td>4</td>
|
|
<td>Bytes</td>
|
|
<td>PID_BYTECOUNT</td>
|
|
<td>VT_I4</td>
|
|
</tr>
|
|
<tr>
|
|
<td>5</td>
|
|
<td>Lines</td>
|
|
<td>PID_LINECOUNT</td>
|
|
<td>VT_I4</td>
|
|
</tr>
|
|
<tr>
|
|
<td>6</td>
|
|
<td>Paragraphs</td>
|
|
<td>PID_PARCOUNT</td>
|
|
<td>VT_I4</td>
|
|
</tr>
|
|
<tr>
|
|
<td>7</td>
|
|
<td>Slides</td>
|
|
<td>PID_SLIDECOUNT</td>
|
|
<td>VT_I4</td>
|
|
</tr>
|
|
<tr>
|
|
<td>8</td>
|
|
<td>Notes</td>
|
|
<td>PID_NOTECOUNT</td>
|
|
<td>VT_I4</td>
|
|
</tr>
|
|
<tr>
|
|
<td>9</td>
|
|
<td>HiddenSlides</td>
|
|
<td>PID_HIDDENCOUNT</td>
|
|
<td>VT_I4</td>
|
|
</tr>
|
|
<tr>
|
|
<td>10</td>
|
|
<td>MMClips</td>
|
|
<td>PID_MMCLIPCOUNT</td>
|
|
<td>VT_I4</td>
|
|
</tr>
|
|
<tr>
|
|
<td>11</td>
|
|
<td>ScaleCrop</td>
|
|
<td>PID_SCALE</td>
|
|
<td>VT_BOOL</td>
|
|
</tr>
|
|
<tr>
|
|
<td>12</td>
|
|
<td>HeadingPairs</td>
|
|
<td>PID_HEADINGPAIR</td>
|
|
<td>VT_VARIANT | VT_VECTOR</td>
|
|
</tr>
|
|
<tr>
|
|
<td>13</td>
|
|
<td>TitlesofParts</td>
|
|
<td>PID_DOCPARTS</td>
|
|
<td>VT_LPSTR | VT_VECTOR</td>
|
|
</tr>
|
|
<tr>
|
|
<td>14</td>
|
|
<td>Manager</td>
|
|
<td>PID_MANAGER</td>
|
|
<td>VT_LPSTR</td>
|
|
</tr>
|
|
<tr>
|
|
<td>15</td>
|
|
<td>Company</td>
|
|
<td>PID_COMPANY</td>
|
|
<td>VT_LPSTR</td>
|
|
</tr>
|
|
<tr>
|
|
<td>16</td>
|
|
<td>LinksUpTo Date</td>
|
|
<td>PID_LINKSDIRTY</td>
|
|
<td>VT_BOOL</td>
|
|
</tr>
|
|
</table>
|
|
</section>
|
|
</section>
|
|
|
|
|
|
|
|
<section><title>Property Types</title>
|
|
<anchor id="property_types"/>
|
|
|
|
<p>A property consists of a DWord <em>type field</em> followed by the
|
|
property value. The property type is an integer value and tells how the
|
|
data byte following it are to be interpreted. In the Microsoft world it is
|
|
also known as the <em>variant</em>.</p>
|
|
|
|
<p>The <em>Usage</em> column says where a variant type may occur. Not all
|
|
of them are allowed in a property set but just those marked with a [P].
|
|
<strong>[V]</strong> - may appear in a VARIANT, <strong>[T]</strong> - may
|
|
appear in a TYPEDESC, <strong>[P]</strong> - may appear in an OLE property
|
|
set, <strong>[S]</strong> - may appear in a Safe Array.</p>
|
|
|
|
<table>
|
|
<tr>
|
|
<th>Variant ID</th>
|
|
<th>Variant Type</th>
|
|
<th>Usage</th>
|
|
<th>Description</th>
|
|
</tr>
|
|
<tr>
|
|
<td>0</td>
|
|
<td>VT_EMPTY</td>
|
|
<td>[V] [P]</td>
|
|
<td>nothing</td>
|
|
</tr>
|
|
<tr>
|
|
<td>1</td>
|
|
<td>VT_NULL</td>
|
|
<td>[V] [P]</td>
|
|
<td>SQL style Null</td>
|
|
</tr>
|
|
<tr>
|
|
<td>2</td>
|
|
<td>VT_I2</td>
|
|
<td>[V] [T] [P] [S]</td>
|
|
<td>2 byte signed int</td>
|
|
</tr>
|
|
<tr>
|
|
<td>3</td>
|
|
<td>VT_I4</td>
|
|
<td>[V] [T] [P] [S]</td>
|
|
<td>4 byte signed int</td>
|
|
</tr>
|
|
<tr>
|
|
<td>4</td>
|
|
<td>VT_R4</td>
|
|
<td>[V] [T] [P] [S]</td>
|
|
<td>4 byte real</td>
|
|
</tr>
|
|
<tr>
|
|
<td>5</td>
|
|
<td>VT_R8</td>
|
|
<td>[V] [T] [P] [S]</td>
|
|
<td>8 byte real</td>
|
|
</tr>
|
|
<tr>
|
|
<td>6</td>
|
|
<td>VT_CY</td>
|
|
<td>[V] [T] [P] [S]</td>
|
|
<td>currency</td>
|
|
</tr>
|
|
<tr>
|
|
<td>7</td>
|
|
<td>VT_DATE</td>
|
|
<td>[V] [T] [P] [S]</td>
|
|
<td>date</td>
|
|
</tr>
|
|
<tr>
|
|
<td>8</td>
|
|
<td>VT_BSTR</td>
|
|
<td>[V] [T] [P] [S]</td>
|
|
<td>OLE Automation string</td>
|
|
</tr>
|
|
<tr>
|
|
<td>9</td>
|
|
<td>VT_DISPATCH</td>
|
|
<td>[V] [T] [P] [S]</td>
|
|
<td>IDispatch *</td>
|
|
</tr>
|
|
<tr>
|
|
<td>10</td>
|
|
<td>VT_ERROR</td>
|
|
<td>[V] [T] [S]</td>
|
|
<td>SCODE</td>
|
|
</tr>
|
|
<tr>
|
|
<td>11</td>
|
|
<td>VT_BOOL</td>
|
|
<td>[V] [T] [P] [S]</td>
|
|
<td>True=-1, False=0</td>
|
|
</tr>
|
|
<tr>
|
|
<td>12</td>
|
|
<td>VT_VARIANT</td>
|
|
<td>[V] [T] [P] [S]</td>
|
|
<td>VARIANT *</td>
|
|
</tr>
|
|
<tr>
|
|
<td>13</td>
|
|
<td>VT_UNKNOWN</td>
|
|
<td>[V] [T] [S]</td>
|
|
<td>IUnknown *</td>
|
|
</tr>
|
|
<tr>
|
|
<td>14</td>
|
|
<td>VT_DECIMAL</td>
|
|
<td>[V] [T] [S]</td>
|
|
<td>16 byte fixed point</td>
|
|
</tr>
|
|
<tr>
|
|
<td>16</td>
|
|
<td>VT_I1</td>
|
|
<td>[T]</td>
|
|
<td>signed char</td>
|
|
</tr>
|
|
<tr>
|
|
<td>17</td>
|
|
<td>VT_UI1</td>
|
|
<td>[V] [T] [P] [S]</td>
|
|
<td>unsigned char</td>
|
|
</tr>
|
|
<tr>
|
|
<td>18</td>
|
|
<td>VT_UI2</td>
|
|
<td>[T] [P]</td>
|
|
<td>unsigned short</td>
|
|
</tr>
|
|
<tr>
|
|
<td>19</td>
|
|
<td>VT_UI4</td>
|
|
<td>[T] [P]</td>
|
|
<td>unsigned short</td>
|
|
</tr>
|
|
<tr>
|
|
<td>20</td>
|
|
<td>VT_I8</td>
|
|
<td>[T] [P]</td>
|
|
<td>signed 64-bit int</td>
|
|
</tr>
|
|
<tr>
|
|
<td>21</td>
|
|
<td>VT_UI8</td>
|
|
<td>[T] [P]</td>
|
|
<td>unsigned 64-bit int</td>
|
|
</tr>
|
|
<tr>
|
|
<td>22</td>
|
|
<td>VT_INT</td>
|
|
<td>[T]</td>
|
|
<td>signed machine int</td>
|
|
</tr>
|
|
<tr>
|
|
<td>23</td>
|
|
<td>VT_UINT</td>
|
|
<td>[T]</td>
|
|
<td>unsigned machine int</td>
|
|
</tr>
|
|
<tr>
|
|
<td>24</td>
|
|
<td>VT_VOID</td>
|
|
<td>[T]</td>
|
|
<td>C style void</td>
|
|
</tr>
|
|
<tr>
|
|
<td>25</td>
|
|
<td>VT_HRESULT</td>
|
|
<td>[T]</td>
|
|
<td>Standard return type</td>
|
|
</tr>
|
|
<tr>
|
|
<td>26</td>
|
|
<td>VT_PTR</td>
|
|
<td>[T]</td>
|
|
<td>pointer type</td>
|
|
</tr>
|
|
<tr>
|
|
<td>27</td>
|
|
<td>VT_SAFEARRAY</td>
|
|
<td>[T]</td>
|
|
<td>(use VT_ARRAY in VARIANT)</td>
|
|
</tr>
|
|
<tr>
|
|
<td>28</td>
|
|
<td>VT_CARRAY</td>
|
|
<td>[T]</td>
|
|
<td>C style array</td>
|
|
</tr>
|
|
<tr>
|
|
<td>29</td>
|
|
<td>VT_USERDEFINED</td>
|
|
<td>[T]</td>
|
|
<td>user defined type</td>
|
|
</tr>
|
|
<tr>
|
|
<td>30</td>
|
|
<td>VT_LPSTR</td>
|
|
<td>[T] [P]</td>
|
|
<td>null terminated string</td>
|
|
</tr>
|
|
<tr>
|
|
<td>31</td>
|
|
<td>VT_LPWSTR</td>
|
|
<td>[T] [P]</td>
|
|
<td>wide null terminated string</td>
|
|
</tr>
|
|
<tr>
|
|
<td>64</td>
|
|
<td>VT_FILETIME</td>
|
|
<td>[P]</td>
|
|
<td>FILETIME</td>
|
|
</tr>
|
|
<tr>
|
|
<td>65</td>
|
|
<td>VT_BLOB</td>
|
|
<td>[P]</td>
|
|
<td>Length prefixed bytes</td>
|
|
</tr>
|
|
<tr>
|
|
<td>66</td>
|
|
<td>VT_STREAM</td>
|
|
<td>[P]</td>
|
|
<td>Name of the stream follows</td>
|
|
</tr>
|
|
<tr>
|
|
<td>67</td>
|
|
<td>VT_STORAGE</td>
|
|
<td>[P]</td>
|
|
<td>Name of the storage follows</td>
|
|
</tr>
|
|
<tr>
|
|
<td>68</td>
|
|
<td>VT_STREAMED_OBJECT</td>
|
|
<td>[P]</td>
|
|
<td>Stream contains an object</td>
|
|
</tr>
|
|
<tr>
|
|
<td>69</td>
|
|
<td>VT_STORED_OBJECT</td>
|
|
<td>[P]</td>
|
|
<td>Storage contains an object</td>
|
|
</tr>
|
|
<tr>
|
|
<td>70</td>
|
|
<td>VT_BLOB_OBJECT</td>
|
|
<td>[P]</td>
|
|
<td>Blob contains an object</td>
|
|
</tr>
|
|
<tr>
|
|
<td>71</td>
|
|
<td>VT_CF</td>
|
|
<td>[P]</td>
|
|
<td>Clipboard format</td>
|
|
</tr>
|
|
<tr>
|
|
<td>72</td>
|
|
<td>VT_CLSID</td>
|
|
<td>[P]</td>
|
|
<td>A Class ID</td>
|
|
</tr>
|
|
<tr>
|
|
<td>0x1000</td>
|
|
<td>VT_VECTOR</td>
|
|
<td>[P]</td>
|
|
<td>simple counted array</td>
|
|
</tr>
|
|
<tr>
|
|
<td>0x2000</td>
|
|
<td>VT_ARRAY</td>
|
|
<td>[V]</td>
|
|
<td>SAFEARRAY*</td>
|
|
</tr>
|
|
<tr>
|
|
<td>0x4000</td>
|
|
<td>VT_BYREF</td>
|
|
<td>[V]</td>
|
|
<td>void* for local use</td>
|
|
</tr>
|
|
<tr>
|
|
<td>0x8000</td>
|
|
<td>VT_RESERVED</td>
|
|
<td><br/></td>
|
|
<td><br/></td>
|
|
</tr>
|
|
<tr>
|
|
<td>0xFFFF</td>
|
|
<td>VT_ILLEGAL</td>
|
|
<td><br/></td>
|
|
<td><br/></td>
|
|
</tr>
|
|
<tr>
|
|
<td>0xFFF</td>
|
|
<td>VT_ILLEGALMASKED</td>
|
|
<td><br/></td>
|
|
<td><br/></td>
|
|
</tr>
|
|
<tr>
|
|
<td>0xFFF</td>
|
|
<td>VT_TYPEMASK</td>
|
|
<td><br/></td>
|
|
<td><br/></td>
|
|
</tr>
|
|
</table>
|
|
</section>
|
|
|
|
|
|
|
|
<section><title>References</title>
|
|
|
|
<p>In order to assemble the HPSF description I used information publically
|
|
available on the Internet only. The references given below have been very
|
|
helpful. If you have any amendments or corrections, please let us know!
|
|
Thank you!</p>
|
|
|
|
<ol>
|
|
|
|
<li>In
|
|
<link href="http://www.kyler.com/pubs/ddj9894.html"><em>Understanding OLE
|
|
documents</em></link>, Ken Kyler gives an introduction to OLE2
|
|
documents
|
|
and especially to property sets. He names the property names, types, and
|
|
IDs of the Summary Information and Document Summary Information
|
|
stream.</li>
|
|
|
|
<li>The
|
|
<link href="http://www.dwam.net/docs/oleref/"><em>ActiveX Programmer's
|
|
Reference</em></link> at
|
|
<link href="http://www.dwam.net/docs/oleref/">http://www.dwam.net/docs/oleref/</link>
|
|
seems a little outdated, but that's what I have found.</li>
|
|
|
|
<li>An overview of the <code>VT_</code> types is in
|
|
<link href="http://www.marin.clara.net/COM/variant_type_definitions.htm"><em>Variant
|
|
Type Definitions</em></link>.</li>
|
|
|
|
<li>What is a <code>FILETIME</code>? The answer can be found
|
|
under <link
|
|
href="http://msdn.microsoft.com/library/default.asp?url=/library/en-us/sysinfo/base/filetime_str.asp"></link>, <link href="http://www.vbapi.com/ref/f/filetime.html">http://www.vbapi.com/ref/f/filetime.html</link> or
|
|
<link href="http://www.cs.rpi.edu/courses/fall01/os/FILETIME.html">http://www.cs.rpi.edu/courses/fall01/os/FILETIME.html</link>.
|
|
In short: <em>The FILETIME structure holds a date and time associated
|
|
with a file. The structure identifies a 64-bit integer specifying the
|
|
number of 100-nanosecond intervals which have passed since January 1,
|
|
1601. This 64-bit value is split into the two dwords stored in the
|
|
structure.</em></li>
|
|
|
|
<li>Information about the code page property in the
|
|
DocumentSummaryInformation stream is available at <link
|
|
href="http://msdn.microsoft.com/library/default.asp?url=/library/en-us/stg/stg/property_id_1.asp">http://msdn.microsoft.com/library/default.asp?url=/library/en-us/stg/stg/property_id_1.asp</link>.</li>
|
|
|
|
<li>This documentation origins from the <link href="http://www.rainer-klute.de/~klute/Software/poibrowser/doc/HPSF-Description.html">HPSF description</link> available at <link href="http://www.rainer-klute.de/~klute/Software/poibrowser/doc/HPSF-Description.html">http://www.rainer-klute.de/~klute/Software/poibrowser/doc/HPSF-Description.html</link>.</li>
|
|
</ol>
|
|
</section>
|
|
</section>
|
|
</body>
|
|
</document>
|
|
|
|
<!-- Keep this comment at the end of the file
|
|
Local variables:
|
|
mode: xml
|
|
sgml-omittag:nil
|
|
sgml-shorttag:nil
|
|
sgml-namecase-general:nil
|
|
sgml-general-insert-case:lower
|
|
sgml-minimize-attributes:nil
|
|
sgml-always-quote-attributes:t
|
|
sgml-indent-step:1
|
|
sgml-indent-data:t
|
|
sgml-parent-document:nil
|
|
sgml-exposed-tags:nil
|
|
sgml-local-catalogs:nil
|
|
sgml-local-ecat-files:nil
|
|
End:
|
|
-->
|