- Started to document the reading of general property set streams.
- Minor documentation fixes.

git-svn-id: https://svn.apache.org/repos/asf/jakarta/poi/trunk@352993 13f79535-47bb-0310-9956-ffa450edef68

parent a40b753e09
commit b1d9d2e10f
@@ -7,13 +7,13 @@
 <header>
 <title>HPSF HOW-TO</title>
 <authors>
-<person name="Rainer Klute" email="klute@rainer-klute.de"/>
+<person name="Rainer Klute" email="klute@apache.org"/>
 </authors>
 </header>
 <body>
 <section title="How To Use the HPSF APIs">
 
-<p>This HOW-TO is organized in three section. You should read them
+<p>This HOW-TO is organized in three sections. You should read them
 sequentially because the later sections build upon the earlier ones.</p>
 
 <ol>
@@ -40,12 +40,9 @@
 </li>
 </ol>
 
 <p>Please note that there is a separate document on <link
 href="thumbnails.html">thumbnails</link>!</p>
-
-
-
-<anchor id="sec1" />
+<anchor id="sec1"/>
 <section title="Reading Standard Properties">
 
 <note>This section explains how to read
@@ -56,19 +53,20 @@
 
 <p>The first thing you should understand is that properties are stored in
 separate documents inside the POI filesystem. (If you don't know what a
-POI filesystem is, read its <link
-href="../poifs/index.html">documentation</link>.) A document in a POI
-filesystem is also called a <strong>stream</strong>.</p>
+POI filesystem is, read the <link href="../poifs/index.html">POIFS
+documentation</link>.) A document in a POI filesystem is also called a
+<strong>stream</strong>.</p>
 
 <p>The following example shows how to read a POI filesystem's
 "title" property. Reading other properties is similar. Consider the API
-documentation of <code>org.apache.poi.hpsf.SummaryInformation</code>.</p>
+documentation of <code>org.apache.poi.hpsf.SummaryInformation</code> to
+learn which methods are available!</p>
 
-<p>The standard properties this section focusses on can be
-found in a document called <em>\005SummaryInformation</em> in the root of
-the POI filesystem. The notation <em>\005</em> in the document's name
-means the character with the decimal value of 5. In order to read the
-title, an application has to perform the following steps:</p>
+<p>The standard properties this section focusses on can be found in a
+document called <em>\005SummaryInformation</em> located in the root of the
+POI filesystem. The notation <em>\005</em> in the document's name means
+the character with the decimal value of 5. In order to read the title, an
+application has to perform the following steps:</p>
 
 <ol>
 <li>
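The <em>\005</em> notation explained above maps directly onto a Java octal escape: the string literal "\005SummaryInformation" starts with the single character whose value is 5. The sketch below illustrates this and adds a small helper that turns such control characters back into the readable \005 form used in this HOW-TO. The class StreamNames and its members are hypothetical illustrations, not part of HPSF.

```java
/**
 * Hypothetical helper (not part of HPSF) showing how the "\005" notation
 * maps onto a Java string.
 */
final class StreamNames {
    // "\005" is a Java octal escape: the single character with the value 5.
    static final String SUMMARY_INFORMATION = "\005SummaryInformation";
    static final String DOCUMENT_SUMMARY_INFORMATION = "\005DocumentSummaryInformation";

    private StreamNames() {}

    /** Renders control characters (values below 32) as a backslash plus three octal digits. */
    static String printable(String name) {
        StringBuilder sb = new StringBuilder();
        for (char c : name.toCharArray()) {
            if (c < 32) {
                sb.append('\\').append(String.format("%03o", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

A stream name built this way can be passed wherever an API expects the exact document name; printing it directly would emit an invisible control character, which is why a helper like printable() is handy for log output.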
@@ -76,9 +74,8 @@
 of the POI filesystem.</p>
 </li>
 <li>
-<p>Create an instance of the class
-<code>SummaryInformation</code> from that
-document.</p>
+<p>Create an instance of the class <code>SummaryInformation</code> from
+that document.</p>
 </li>
 <li>
 <p>Call the <code>SummaryInformation</code> instance's
@@ -96,7 +93,10 @@
 (POIFS) proceeds as shown by the following code fragment. (The full
 source code of the sample application is available in the
 <em>examples</em> section of the POI source tree as
-<em>ReadTitle.java</em>.)</p>
+<em>ReadTitle.java</em>.</p>
+
+<fixme>I just found out that <em>ReadTitle.java</em> is no longer there! I
+shall look it up in the CVS and try to restore it.</fixme>
 
 <source>
 import java.io.*;
@@ -141,7 +141,7 @@ r.registerListener(new MyPOIFSReaderListener(),
 <code>processPOIFSReaderEvent</code> method. The eventing POI filesystem
 calls this method when it finds the <em>\005SummaryInformation</em>
 document. In the sample application <code>MyPOIFSReaderListener</code> is
-a static class in the <em>ReadTitle.java</em> source file.)</p>
+a static class in the <em>ReadTitle.java</em> source file.</p>
 
 <p>Now everything is prepared and reading the POI filesystem can
 start:</p>
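The paragraph above describes an event-driven style of reading: the application registers a listener for a document name, and the reader calls the listener back when it comes across that document. The toy model below shows the same idea in plain Java so that it runs without POI. It is not POI's POIFSReader: the class ToyEventReader, its methods, and the use of a name-to-bytes map in place of a real POI filesystem are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

/**
 * Toy model of an event-driven reader: listeners register for a document
 * name and are called back only when a document with that name is found.
 * This is NOT POI's POIFSReader; all names here are hypothetical.
 */
final class ToyEventReader {
    // Document name -> listeners interested in that document.
    private final Map<String, List<BiConsumer<String, byte[]>>> listeners = new HashMap<>();

    void registerListener(BiConsumer<String, byte[]> listener, String name) {
        listeners.computeIfAbsent(name, k -> new ArrayList<>()).add(listener);
    }

    /** Walks the given documents in iteration order and notifies matching listeners. */
    void read(Map<String, byte[]> documents) {
        for (Map.Entry<String, byte[]> doc : documents.entrySet()) {
            List<BiConsumer<String, byte[]>> matching = listeners.get(doc.getKey());
            if (matching == null) {
                continue; // nobody registered for this document
            }
            for (BiConsumer<String, byte[]> listener : matching) {
                listener.accept(doc.getKey(), doc.getValue());
            }
        }
    }
}
```

The point of the design is that the application never walks the filesystem itself; it only states which documents it cares about and reacts when they turn up.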
@@ -209,10 +209,10 @@ static class MyPOIFSReaderListener implements POIFSReaderListener
 case that the POI filesystem does not have a title.</p>
 
 <source>final String title = si.getTitle();
-if (title != null)
-System.out.println("Title: \"" + title + "\"");
-else
-System.out.println("Document has no title.");</source>
+if (title != null)
+    System.out.println("Title: \"" + title + "\"");
+else
+    System.out.println("Document has no title.");</source>
 
 <p>Please note that a Microsoft Office document does not necessarily
 contain the <em>\005SummaryInformation</em> stream. The documents created
@@ -249,7 +249,7 @@ static class MyPOIFSReaderListener implements POIFSReaderListener
 
 <p>And of course you cannot call <code>getTitle()</code> because
 <code>DocumentSummaryInformation</code> has different query methods. See
-the API documentation for the details!</p>
+the Javadoc API documentation for the details!</p>
 
 <p>In the previous section the application simply caught all
 <strong>exceptions</strong> and was in no way interested in any
@@ -259,17 +259,19 @@ static class MyPOIFSReaderListener implements POIFSReaderListener
 
 <dl>
 <dt><code>NoPropertySetStreamException</code>:</dt>
-<dd><p>This exception is thrown if the application tries to create a
-<code>PropertySet</code> or one of its subclasses
-<code>SummaryInformation</code> and
-<code>DocumentSummaryInformation</code> from a stream that is not a
-property set stream. A faulty property set stream counts as not being a
-property set stream at all. An application should be prepared to deal
-with this case even if opens streams named
+<dd>
+<p>This exception is thrown if the application tries to create a
+<code>PropertySet</code> instance from a stream that is not a
+property set stream. (<code>SummaryInformation</code> and
+<code>DocumentSummaryInformation</code> are subclasses of
+<code>PropertySet</code>.) A faulty property set stream counts as not
+being a property set stream at all. An application should be prepared to
+deal with this case even if it opens streams named
 <em>\005SummaryInformation</em> or
 <em>\005DocumentSummaryInformation</em> only. These are just names. A
 stream's name by itself does not ensure that the stream contains the
-expected contents and that this contents is correct.</p></dd>
+expected contents and that these contents are correct.</p>
+</dd>
 
 <dt><code>UnexpectedPropertySetTypeException</code></dt>
 <dd><p>This exception is thrown if a certain type of property set is
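Whether a stream "is a property set stream" ultimately comes down to a sanity check on its header. The sketch below shows what such a check could look like. It is not HPSF's implementation, and HPSF's exact acceptance rules may differ; the layout it assumes (a 2-byte byte order mark 0xFFFE, a 2-byte version of 0 or 1, a 4-byte system identifier, a 16-byte class ID and a 4-byte count of 1 or 2 property sets, all little-endian) follows Microsoft's [MS-OLEPS] specification. The class name is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical header check for property set streams (not HPSF's code). */
final class PropertySetHeaderCheck {
    // byte order (2) + version (2) + system identifier (4) + CLSID (16) + count (4)
    static final int HEADER_LENGTH = 28;

    private PropertySetHeaderCheck() {}

    static boolean looksLikePropertySetStream(byte[] stream) {
        if (stream == null || stream.length < HEADER_LENGTH) {
            return false; // too short to hold the fixed-size header
        }
        ByteBuffer buf = ByteBuffer.wrap(stream).order(ByteOrder.LITTLE_ENDIAN);
        int byteOrder = Short.toUnsignedInt(buf.getShort(0));
        int version = Short.toUnsignedInt(buf.getShort(2));
        // Offsets 4..23 hold the system identifier and the class ID; not checked here.
        long numPropertySets = Integer.toUnsignedLong(buf.getInt(24));
        return byteOrder == 0xFFFE
            && (version == 0 || version == 1)
            && (numPropertySets == 1 || numPropertySets == 2);
    }
}
```

As the text stresses, a stream named \005SummaryInformation can still fail such a check, so an application should treat the outcome as data-dependent rather than name-dependent.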
@@ -292,7 +294,7 @@ static class MyPOIFSReaderListener implements POIFSReaderListener
 document. Embedded objects may have property sets of their own. An
 application can open these property set streams as described above. The
 only difference is that they are not located in the POI filesystem's root
-but in a nested directory instead. Just register a
+but in a <strong>nested directory</strong> instead. Just register a
 <code>POIFSReaderListener</code> for the property set streams you are
 interested in. For example, the <em>POIBrowser</em> application in the
 contrib section tries to open each and every document in a POI filesystem
@@ -303,12 +305,49 @@ static class MyPOIFSReaderListener implements POIFSReaderListener
 <anchor id="sec3"/>
 <section title="Reading Non-Standard Properties">
 
-<note>This section tells how to read
-non-standard properties. Non-standard properties are application-specific
-name/value/type triples.</note>
+<note>This section tells how to read non-standard properties. Non-standard
+properties are application-specific name/type/value triples.</note>
 
-<fixme author="Rainer Klute">Write this section!</fixme>
+<p>Now comes the really hardcore stuff. As mentioned above,
+<code>SummaryInformation</code> and
+<code>DocumentSummaryInformation</code> are just special cases of the
+general concept of a property set. The general concept says that a
+property set consists of <strong>properties</strong>. Each property is an
+entity that has a <strong>name</strong>, a <strong>type</strong>, and a
+<strong>value</strong>.</p>
+
+<p>Okay, that was still rather easy. However, to make things more
+complicated, Microsoft in its infinite wisdom decided that a property set
+shall be broken into <strong>sections</strong>. Each section holds a bunch
+of properties. But since that's still not complicated enough: a section
+can optionally have a dictionary that maps property IDs to property
+names; we'll explain later what that means.</p>
+
+<note>[To be continued.]</note>
+
+<fixme>Let's consider a Java application that wants to read a stream
+containing a general property set. It is modelled by the class
+<code>PropertySet</code> in the <code>org.apache.poi.hpsf</code>
+package.</fixme>
 </section>
 </section>
 </body>
 </document>
+
+<!-- Keep this comment at the end of the file
+Local variables:
+mode: xml
+sgml-omittag:nil
+sgml-shorttag:nil
+sgml-namecase-general:nil
+sgml-general-insert-case:lower
+sgml-minimize-attributes:nil
+sgml-always-quote-attributes:t
+sgml-indent-step:1
+sgml-indent-data:t
+sgml-parent-document:nil
+sgml-exposed-tags:nil
+sgml-local-catalogs:nil
+sgml-local-ecat-files:nil
+End:
+-->
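The structure just described (a property set made of sections, a section made of properties that each carry an ID, a type and a value, plus an optional dictionary mapping property IDs to names) can be pictured with a few plain Java classes. This is only a sketch of the relationships: the class names Prop, Sect and PropSet are hypothetical and deliberately differ from HPSF's own classes, and the sketch ignores how any of this is laid out in a stream.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** One property: an ID, a variant type and a value (hypothetical model). */
final class Prop {
    final long id;
    final long type;   // variant type, e.g. 0x1E (VT_LPSTR) for an 8-bit string
    final Object value;

    Prop(long id, long type, Object value) {
        this.id = id;
        this.type = type;
        this.value = value;
    }
}

/** A section holds properties and, optionally, a dictionary of ID-to-name mappings. */
final class Sect {
    private final List<Prop> properties = new ArrayList<>();
    private final Map<Long, String> dictionary = new HashMap<>();

    void add(Prop p) { properties.add(p); }

    void name(long id, String name) { dictionary.put(id, name); }

    List<Prop> properties() { return Collections.unmodifiableList(properties); }

    /** Returns the property's name from the dictionary, or null if it has none. */
    String nameOf(long id) { return dictionary.get(id); }

    /** Finds a property by its ID; null if the section has no such property. */
    Prop get(long id) {
        for (Prop p : properties) {
            if (p.id == id) {
                return p;
            }
        }
        return null;
    }

    /** Looks a property up by its dictionary name; null if the name is unknown. */
    Prop getByName(String name) {
        for (Map.Entry<Long, String> e : dictionary.entrySet()) {
            if (e.getValue().equals(name)) {
                return get(e.getKey());
            }
        }
        return null;
    }
}

/** A property set is a sequence of sections. */
final class PropSet {
    final List<Sect> sections = new ArrayList<>();
}
```

The dictionary is what turns a bare numeric ID into the user-visible name of a non-standard property; without it, an application only sees IDs.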
@@ -7,27 +7,30 @@
 <title>HPSF (Horrible Property Set Format)</title>
 <subtitle>Overview</subtitle>
 <authors>
-<person name="Rainer Klute" email="klute@rainer-klute.de"/>
+<person name="Rainer Klute" email="klute@apache.org"/>
 </authors>
 </header>
 <body>
 <section title="Overview">
-<p>Microsoft applications like "Word" or "Excel" let the user describe his
-document by properties like "title", "category" and so on. The application
-itself adds further information: last author, creation date etc. These
-properties are stored in so-called <strong>property set streams</strong>. A
-property set stream is a separate document within a <link
-href="../poifs/index.html">POI filesystem</link>. HPSF is POI's pure-Java
-implementation to read (and in future to write) property set streams.</p>
+<p>Microsoft applications like "Word", "Excel" or "PowerPoint" let the user
+describe his document by properties like "title", "category" and so on. The
+application itself adds further information: last author, creation date
+etc. These document properties are stored in so-called <strong>property set
+streams</strong>. A property set stream is a separate document within a
+<link href="../poifs/index.html">POI filesystem</link>. We'll mostly just call
+property set streams "property sets". HPSF is POI's pure-Java
+implementation to read (and in future to write) property sets.</p>
 
 <p>The <link href="how-to.html">HPSF HOWTO</link> describes what a Java
 application should do to read a property set using HPSF and to retrieve the
 information it needs.</p>
 
-<p>HPSF supports OLE2 property set streams in general, not only the special
-case of document properties mentioned above. The <link
-href="internals.html">HPSF description</link> describes the internal
-structure of property set streams.</p>
+<p>HPSF supports OLE2 property set streams in general, and is not limited to
+the special case of document properties in the Microsoft Office files
+mentioned above. The <link href="internals.html">HPSF description</link>
+describes the internal structure of property set streams. A separate
+document explains the internals of <link href="thumbnails.html">thumbnail
+images</link>.</p>
 </section>
 </body>
 </document>
@@ -13,21 +13,17 @@
 <body>
 <section title="The VT_CF Format">
 
-<p>
-Thumbnail information is stored as a VT_CF, or Thumbnail Variant.
-The Thumbnail Variant is used to store various types of information
-in a clipboard. The VT_CF can store information in formats for the
-Macintosh or Windows clipboard.
-</p>
+<p>Thumbnail information is stored as a VT_CF, or Thumbnail Variant. The
+Thumbnail Variant is used to store various types of information in a
+clipboard. The VT_CF can store information in formats for the Macintosh or
+Windows clipboard.</p>
+
+<p>There are many types of data that can be copied to the clipboard, but the
+only types of information needed for thumbnail manipulation are the image
+formats.</p>
 
-<p>
-There are many types of data that can be copied to the clipboard,
-but the only types of information needed for thumbnail manipulation are
-the image formats.
-</p>
-
 <p>The <code>VT_CF</code> structure looks like this:</p>
-
+
 <table>
 <tr>
 <th>Element:</th>
@@ -43,11 +39,9 @@
 </tr>
 </table>
 
-<p>
-The Clipboard Size refers to the size (in bytes) of Clipboard Data
-(variable size) plus the Clipboard Format (four bytes).
-</p>
-
+<p>The Clipboard Size refers to the size (in bytes) of Clipboard Data
+(variable size) plus the Clipboard Format (four bytes).</p>
+
 <p>The Clipboard Format Tag has four possible values:</p>
 
 <table>
@@ -83,16 +77,14 @@
 
 <section title="Windows Clipboard Data">
 
-<p>
-Windows clipboard data has four image formats for thumbnails:
-</p>
-
+<p>Windows clipboard data has four image formats for thumbnails:</p>
+
 <table>
 <tr>
 <th>Value</th>
 <th>Identifier</th>
 <th>Description</th>
-</tr>
+</tr>
 <tr>
 <td>3</td>
 <td><code>CF_METAFILEPICT</code></td>
@@ -102,75 +94,89 @@
 <td>8</td>
 <td><code>CF_DIB</code></td>
 <td>Device Independent Bitmap</td>
-</tr>
+</tr>
 <tr>
 <td>14</td>
 <td><code>CF_ENHMETAFILE</code></td>
 <td>Enhanced Windows metafile format</td>
-</tr>
+</tr>
 <tr>
 <td>2</td>
 <td><code>CF_BITMAP</code></td>
 <td>Bitmap - Obsolete - Use <code>CF_DIB</code> instead</td>
-</tr>
+</tr>
 </table>
-
-<section title="Windows Metafile Format">
-
-<p>
-The most common format for thumbnails on the Windows platform
-is the Windows metafile format. The Clipboard places and extra
-header in front of a the standard Windows Metafile Format data.
-</p>
-
-<p>
-The Clipboard Data byte array looks like this when an image is
-stored in Windows' Clipboard WMF format.
-</p>
-
-<table>
-<tr>
-<th>Identifier</th>
-<td>CF_METAFILEPICT</td>
-<td>mm</td>
-<td>width</td>
-<td>height</td>
-<td>handle</td>
-<td>WMF data</td>
-</tr>
-<tr>
-<th>Size</th>
-<td>32 bit unsigned int</td>
-<td>16 bit unsigned(?) int</td>
-<td>16 bit unsigned(?) int</td>
-<td>16 bit unsigned(?) int</td>
-<td>16 bit unsigned(?) int</td>
-<td>byte array - variable length</td>
-</tr>
-<tr>
-<th>Description</th>
-<td>Clipboard WMF</td>
-<td>Mapping Mode</td>
-<td>Image Width</td>
-<td>Image Height</td>
-<td>handle to the WMF data array in memory, or 0</td>
-<td>standard WMF byte stream</td>
-</tr>
-</table>
-</section>
-
-
-
-<section title="Device Independent Bitmap">
-<p><strong>FIXME:</strong> Document Device Independent Bitmap format</p>
-</section>
 </section>
 
+
+<section title="Windows Metafile Format">
+
+<p>The most common format for thumbnails on the Windows platform is the
+Windows metafile format. The Clipboard places an extra header in front of
+the standard Windows Metafile Format data.</p>
+
+<p>The Clipboard Data byte array looks like this when an image is stored in
+the Windows Clipboard WMF format:</p>
+
+<table>
+<tr>
+<th>Identifier</th>
+<td>CF_METAFILEPICT</td>
+<td>mm</td>
+<td>width</td>
+<td>height</td>
+<td>handle</td>
+<td>WMF data</td>
+</tr>
+<tr>
+<th>Size</th>
+<td>32 bit unsigned int</td>
+<td>16 bit unsigned(?) int</td>
+<td>16 bit unsigned(?) int</td>
+<td>16 bit unsigned(?) int</td>
+<td>16 bit unsigned(?) int</td>
+<td>byte array - variable length</td>
+</tr>
+<tr>
+<th>Description</th>
+<td>Clipboard WMF</td>
+<td>Mapping Mode</td>
+<td>Image Width</td>
+<td>Image Height</td>
+<td>handle to the WMF data array in memory, or 0</td>
+<td>standard WMF byte stream</td>
+</tr>
+</table>
+</section>
+
+
+<section title="Device Independent Bitmap">
+<p><strong>FIXME:</strong> Describe the Device Independent Bitmap
+format!</p>
+</section>
 
-
+
 <section title="Macintosh Clipboard Data">
-<p><strong>FIXME:</strong> Document Macintosh clipboard formats.</p>
+<p><strong>FIXME:</strong> Describe the Macintosh clipboard formats!</p>
 </section>
 
 </body>
 </document>
+
+<!-- Keep this comment at the end of the file
+Local variables:
+mode: xml
+sgml-omittag:nil
+sgml-shorttag:nil
+sgml-namecase-general:nil
+sgml-general-insert-case:lower
+sgml-minimize-attributes:nil
+sgml-always-quote-attributes:t
+sgml-indent-step:1
+sgml-indent-data:t
+sgml-parent-document:nil
+sgml-exposed-tags:nil
+sgml-local-catalogs:nil
+sgml-local-ecat-files:nil
+End:
+-->
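The Clipboard WMF layout in the thumbnail table (a 32-bit CF_METAFILEPICT identifier, four 16-bit fields for mapping mode, width, height and handle, then the WMF byte stream) lends itself to a small parser. The sketch below is hypothetical and not part of HPSF. It assumes little-endian byte order, which the table itself does not state, and reads the four 16-bit fields as unsigned because the table marks their signedness with "(?)". The enum simply mirrors the four Windows clipboard format values listed in the Windows Clipboard Data table.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

/** The four Windows clipboard formats listed in the thumbnail documentation. */
enum WindowsClipboardFormat {
    CF_BITMAP(2), CF_METAFILEPICT(3), CF_DIB(8), CF_ENHMETAFILE(14);

    final int value;

    WindowsClipboardFormat(int value) { this.value = value; }

    /** Maps a numeric format value to the enum, or null if it is not one of the four. */
    static WindowsClipboardFormat of(long value) {
        for (WindowsClipboardFormat f : values()) {
            if (f.value == value) {
                return f;
            }
        }
        return null;
    }
}

/** Hypothetical parser for the Clipboard WMF layout (assumes little-endian data). */
final class ClipboardWmf {
    // 4-byte identifier followed by four 2-byte fields
    static final int HEADER_LENGTH = 12;

    final int mappingMode, width, height, handle;
    final byte[] wmfData;

    private ClipboardWmf(int mm, int width, int height, int handle, byte[] wmfData) {
        this.mappingMode = mm;
        this.width = width;
        this.height = height;
        this.handle = handle;
        this.wmfData = wmfData;
    }

    static ClipboardWmf parse(byte[] clipboardData) {
        if (clipboardData.length < HEADER_LENGTH) {
            throw new IllegalArgumentException("too short for a Clipboard WMF header");
        }
        ByteBuffer buf = ByteBuffer.wrap(clipboardData).order(ByteOrder.LITTLE_ENDIAN);
        long identifier = Integer.toUnsignedLong(buf.getInt());
        if (WindowsClipboardFormat.of(identifier) != WindowsClipboardFormat.CF_METAFILEPICT) {
            throw new IllegalArgumentException("not CF_METAFILEPICT: " + identifier);
        }
        // Signedness is uncertain in the documentation, so read the raw 16-bit values as unsigned.
        int mm = Short.toUnsignedInt(buf.getShort());
        int width = Short.toUnsignedInt(buf.getShort());
        int height = Short.toUnsignedInt(buf.getShort());
        int handle = Short.toUnsignedInt(buf.getShort());
        // Everything after the header is the standard WMF byte stream.
        byte[] wmf = Arrays.copyOfRange(clipboardData, HEADER_LENGTH, clipboardData.length);
        return new ClipboardWmf(mm, width, height, handle, wmf);
    }
}
```

Keeping the identifier check explicit means that clipboard data in one of the other formats (for example CF_DIB) is rejected instead of being misread as a metafile.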