- Started to document the reading of general property set streams.
- Minor documentation fixes.
git-svn-id: https://svn.apache.org/repos/asf/jakarta/poi/trunk@352993 13f79535-47bb-0310-9956-ffa450edef68
parent a40b753e09
commit b1d9d2e10f
@ -7,13 +7,13 @@
|
|||||||
<header>
|
<header>
|
||||||
<title>HPSF HOW-TO</title>
|
<title>HPSF HOW-TO</title>
|
||||||
<authors>
|
<authors>
|
||||||
<person name="Rainer Klute" email="klute@rainer-klute.de"/>
|
<person name="Rainer Klute" email="klute@apache.org"/>
|
||||||
</authors>
|
</authors>
|
||||||
</header>
|
</header>
|
||||||
<body>
|
<body>
|
||||||
<section title="How To Use the HPSF APIs">
|
<section title="How To Use the HPSF APIs">
|
||||||
|
|
||||||
<p>This HOW-TO is organized in three section. You should read them
|
<p>This HOW-TO is organized in three sections. You should read them
|
||||||
sequentially because the later sections build upon the earlier ones.</p>
|
sequentially because the later sections build upon the earlier ones.</p>
|
||||||
|
|
||||||
<ol>
|
<ol>
|
||||||
@ -40,12 +40,9 @@
|
|||||||
</li>
|
</li>
|
||||||
</ol>
|
</ol>
|
||||||
|
|
||||||
<p>Please note that there is separate document on <link
|
|
||||||
href="thumbnails.html">thumbnails</link>!</p>
|
|
||||||
|
|
||||||
|
|
||||||
|
<anchor id="sec1"/>
|
||||||
<anchor id="sec1" />
|
|
||||||
<section title="Reading Standard Properties">
|
<section title="Reading Standard Properties">
|
||||||
|
|
||||||
<note>This section explains how to read
|
<note>This section explains how to read
|
||||||
@ -56,19 +53,20 @@
|
|||||||
|
|
||||||
<p>The first thing you should understand is that properties are stored in
|
<p>The first thing you should understand is that properties are stored in
|
||||||
separate documents inside the POI filesystem. (If you don't know what a
|
separate documents inside the POI filesystem. (If you don't know what a
|
||||||
POI filesystem is, read its <link
|
POI filesystem is, read the <link href="../poifs/index.html">POIFS
|
||||||
href="../poifs/index.html">documentation</link>.) A document in a POI
|
documentation</link>.) A document in a POI filesystem is also called a
|
||||||
filesystem is also called a <strong>stream</strong>.</p>
|
<strong>stream</strong>.</p>
|
||||||
|
|
||||||
<p>The following example shows how to read a POI filesystem's
|
<p>The following example shows how to read a POI filesystem's
|
||||||
"title" property. Reading other properties is similar. Consider the API
|
"title" property. Reading other properties is similar. Consider the API
|
||||||
documentation of <code>org.apache.poi.hpsf.SummaryInformation</code>.</p>
|
documentation of <code>org.apache.poi.hpsf.SummaryInformation</code> to
|
||||||
|
learn which methods are available!</p>
|
||||||
|
|
||||||
<p>The standard properties this section focusses on can be
|
<p>The standard properties this section focusses on can be found in a
|
||||||
found in a document called <em>\005SummaryInformation</em> in the root of
|
document called <em>\005SummaryInformation</em> located in the root of the
|
||||||
the POI filesystem. The notation <em>\005</em> in the document's name
|
POI filesystem. The notation <em>\005</em> in the document's name means
|
||||||
means the character with the decimal value of 5. In order to read the
|
the character with the decimal value of 5. In order to read the title, an
|
||||||
title, an application has to perform the following steps:</p>
|
application has to perform the following steps:</p>
|
||||||
|
|
||||||
<ol>
|
<ol>
|
||||||
<li>
|
<li>
|
||||||
@ -76,9 +74,8 @@
|
|||||||
of the POI filesystem.</p>
|
of the POI filesystem.</p>
|
||||||
</li>
|
</li>
|
||||||
<li>
|
<li>
|
||||||
<p>Create an instance of the class
|
<p>Create an instance of the class <code>SummaryInformation</code> from
|
||||||
<code>SummaryInformation</code> from that
|
that document.</p>
|
||||||
document.</p>
|
|
||||||
</li>
|
</li>
|
||||||
<li>
|
<li>
|
||||||
<p>Call the <code>SummaryInformation</code> instance's
|
<p>Call the <code>SummaryInformation</code> instance's
|
||||||
@ -96,7 +93,10 @@
|
|||||||
(POIFS) proceeds as shown by the following code fragment. (The full
|
(POIFS) proceeds as shown by the following code fragment. (The full
|
||||||
source code of the sample application is available in the
|
source code of the sample application is available in the
|
||||||
<em>examples</em> section of the POI source tree as
|
<em>examples</em> section of the POI source tree as
|
||||||
<em>ReadTitle.java</em>.)</p>
|
<em>ReadTitle.java</em>.)</p>
|
||||||
|
|
||||||
|
<fixme>I just found out that <em>ReadTitle.java</em> is no longer there! I
|
||||||
|
shall look it up in the CVS and try to restore it.</fixme>
|
||||||
|
|
||||||
<source>
|
<source>
|
||||||
import java.io.*;
|
import java.io.*;
|
||||||
@ -141,7 +141,7 @@ r.registerListener(new MyPOIFSReaderListener(),
|
|||||||
<code>processPOIFSReaderEvent</code> method. The eventing POI filesystem
|
<code>processPOIFSReaderEvent</code> method. The eventing POI filesystem
|
||||||
calls this method when it finds the <em>\005SummaryInformation</em>
|
calls this method when it finds the <em>\005SummaryInformation</em>
|
||||||
document. In the sample application <code>MyPOIFSReaderListener</code> is
|
document. In the sample application <code>MyPOIFSReaderListener</code> is
|
||||||
a static class in the <em>ReadTitle.java</em> source file.)</p>
|
a static class in the <em>ReadTitle.java</em> source file.)</p>
|
||||||
|
|
||||||
<p>Now everything is prepared and reading the POI filesystem can
|
<p>Now everything is prepared and reading the POI filesystem can
|
||||||
start:</p>
|
start:</p>
|
||||||
@ -209,10 +209,10 @@ static class MyPOIFSReaderListener implements POIFSReaderListener
|
|||||||
case that the POI filesystem does not have a title.</p>
|
case that the POI filesystem does not have a title.</p>
|
||||||
|
|
||||||
<source>final String title = si.getTitle();
|
<source>final String title = si.getTitle();
|
||||||
if (title != null)
|
if (title != null)
|
||||||
System.out.println("Title: \"" + title + "\"");
|
System.out.println("Title: \"" + title + "\"");
|
||||||
else
|
else
|
||||||
System.out.println("Document has no title.");</source>
|
System.out.println("Document has no title.");</source>
|
||||||
|
|
||||||
<p>Please note that a Microsoft Office document does not necessarily
|
<p>Please note that a Microsoft Office document does not necessarily
|
||||||
contain the <em>\005SummaryInformation</em> stream. The documents created
|
contain the <em>\005SummaryInformation</em> stream. The documents created
|
||||||
@ -249,7 +249,7 @@ static class MyPOIFSReaderListener implements POIFSReaderListener
|
|||||||
|
|
||||||
<p>And of course you cannot call <code>getTitle()</code> because
|
<p>And of course you cannot call <code>getTitle()</code> because
|
||||||
<code>DocumentSummaryInformation</code> has different query methods. See
|
<code>DocumentSummaryInformation</code> has different query methods. See
|
||||||
the API documentation for the details!</p>
|
the Javadoc API documentation for the details!</p>
|
||||||
|
|
||||||
<p>In the previous section the application simply caught all
|
<p>In the previous section the application simply caught all
|
||||||
<strong>exceptions</strong> and was in no way interested in any
|
<strong>exceptions</strong> and was in no way interested in any
|
||||||
@ -259,17 +259,19 @@ static class MyPOIFSReaderListener implements POIFSReaderListener
|
|||||||
|
|
||||||
<dl>
|
<dl>
|
||||||
<dt><code>NoPropertySetStreamException</code>:</dt>
|
<dt><code>NoPropertySetStreamException</code>:</dt>
|
||||||
<dd><p>This exception is thrown if the application tries to create a
|
<dd>
|
||||||
<code>PropertySet</code> or one of its subclasses
|
<p>This exception is thrown if the application tries to create a
|
||||||
<code>SummaryInformation</code> and
|
<code>PropertySet</code> instance from a stream that is not a
|
||||||
<code>DocumentSummaryInformation</code> from a stream that is not a
|
property set stream. (<code>SummaryInformation</code> and
|
||||||
property set stream. A faulty property set stream counts as not being a
|
<code>DocumentSummaryInformation</code> are subclasses of
|
||||||
property set stream at all. An application should be prepared to deal
|
<code>PropertySet</code>.) A faulty property set stream counts as not
|
||||||
with this case even if opens streams named
|
being a property set stream at all. An application should be prepared to
|
||||||
|
deal with this case even if it opens streams named
|
||||||
<em>\005SummaryInformation</em> or
|
<em>\005SummaryInformation</em> or
|
||||||
<em>\005DocumentSummaryInformation</em> only. These are just names. A
|
<em>\005DocumentSummaryInformation</em> only. These are just names. A
|
||||||
stream's name by itself does not ensure that the stream contains the
|
stream's name by itself does not ensure that the stream contains the
|
||||||
expected contents and that these contents are correct.</p></dd>
|
expected contents and that these contents are correct.</p>
|
||||||
|
</dd>
|
||||||
|
|
||||||
<dt><code>UnexpectedPropertySetTypeException</code></dt>
|
<dt><code>UnexpectedPropertySetTypeException</code></dt>
|
||||||
<dd><p>This exception is thrown if a certain type of property set is
|
<dd><p>This exception is thrown if a certain type of property set is
|
||||||
@ -292,7 +294,7 @@ static class MyPOIFSReaderListener implements POIFSReaderListener
|
|||||||
document. Embedded objects may have property sets of their own. An
|
document. Embedded objects may have property sets of their own. An
|
||||||
application can open these property set streams as described above. The
|
application can open these property set streams as described above. The
|
||||||
only difference is that they are not located in the POI filesystem's root
|
only difference is that they are not located in the POI filesystem's root
|
||||||
but in a nested directory instead. Just register a
|
but in a <strong>nested directory</strong> instead. Just register a
|
||||||
<code>POIFSReaderListener</code> for the property set streams you are
|
<code>POIFSReaderListener</code> for the property set streams you are
|
||||||
interested in. For example, the <em>POIBrowser</em> application in the
|
interested in. For example, the <em>POIBrowser</em> application in the
|
||||||
contrib section tries to open each and every document in a POI filesystem
|
contrib section tries to open each and every document in a POI filesystem
|
||||||
@ -303,12 +305,49 @@ static class MyPOIFSReaderListener implements POIFSReaderListener
|
|||||||
<anchor id="sec3"/>
|
<anchor id="sec3"/>
|
||||||
<section title="Reading Non-Standard Properties">
|
<section title="Reading Non-Standard Properties">
|
||||||
|
|
||||||
<note>This section tells how to read
|
<note>This section tells how to read non-standard properties. Non-standard
|
||||||
non-standard properties. Non-standard properties are application-specific
|
properties are application-specific name/type/value triples.</note>
|
||||||
name/value/type triples.</note>
|
|
||||||
|
|
||||||
<fixme author="Rainer Klute">Write this section!</fixme>
|
<p>Now comes the really hardcore stuff. As mentioned above,
|
||||||
|
<code>SummaryInformation</code> and
|
||||||
|
<code>DocumentSummaryInformation</code> are just special cases of the
|
||||||
|
general concept of a property set. The general concept says that a
|
||||||
|
property set consists of <strong>properties</strong>. Each property is an
|
||||||
|
entity that has a <strong>name</strong>, a <strong>type</strong>, and a
|
||||||
|
<strong>value</strong>.</p>
|
||||||
|
|
||||||
|
<p>Okay, that was still rather easy. However, to make things more
|
||||||
|
complicated Microsoft in its infinite wisdom decided that a property set
|
||||||
|
shalt be broken into <strong>sections</strong>. Each section holds a bunch
|
||||||
|
of properties. But since that's still not complicated enough: a section
|
||||||
|
can optionally have a dictionary that maps property IDs to property
|
||||||
|
names - we'll explain later what that means.</p>
|
||||||
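The general concept described above (a property set made of sections, each section holding properties, plus an optional dictionary from property IDs to names) could be modelled roughly as follows. This is only an illustrative sketch: the names PropertySetModelSketch, Section, Property and nameOf are hypothetical and are not the HPSF API, and representing the type as a plain numeric code is an assumption.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of a general property set; not the HPSF classes. */
final class PropertySetModelSketch {

    /** A property: an ID, a type code and a value. */
    static final class Property {
        final long id;
        final long type;    // numeric type code (assumed representation)
        final Object value;

        Property(long id, long type, Object value) {
            this.id = id;
            this.type = type;
            this.value = value;
        }
    }

    /** A section holds a bunch of properties and an optional dictionary. */
    static final class Section {
        final List<Property> properties = new ArrayList<>();
        // Maps property IDs to property names; stays empty if the
        // section has no dictionary.
        final Map<Long, String> dictionary = new HashMap<>();

        /** Returns the property's name, or null if the dictionary has no entry. */
        String nameOf(Property p) {
            return dictionary.get(p.id);
        }
    }

    /** A property set is broken into sections. */
    final List<Section> sections = new ArrayList<>();
}
```

Keeping the dictionary separate from the properties mirrors the description: a property is addressed by its ID, and a name is available only where the section's dictionary provides one.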
|
|
||||||
|
<note>[To be continued.]</note>
|
||||||
|
|
||||||
|
<fixme>Let's consider a Java application that wants to read a stream
|
||||||
|
containing a general property set. It is modelled by the class
|
||||||
|
<code>PropertySet</code> in the <code>org.apache.poi.hpsf</code>
|
||||||
|
package.</fixme>
|
||||||
</section>
|
</section>
|
||||||
</section>
|
</section>
|
||||||
</body>
|
</body>
|
||||||
</document>
|
</document>
|
||||||
|
|
||||||
|
<!-- Keep this comment at the end of the file
|
||||||
|
Local variables:
|
||||||
|
mode: xml
|
||||||
|
sgml-omittag:nil
|
||||||
|
sgml-shorttag:nil
|
||||||
|
sgml-namecase-general:nil
|
||||||
|
sgml-general-insert-case:lower
|
||||||
|
sgml-minimize-attributes:nil
|
||||||
|
sgml-always-quote-attributes:t
|
||||||
|
sgml-indent-step:1
|
||||||
|
sgml-indent-data:t
|
||||||
|
sgml-parent-document:nil
|
||||||
|
sgml-exposed-tags:nil
|
||||||
|
sgml-local-catalogs:nil
|
||||||
|
sgml-local-ecat-files:nil
|
||||||
|
End:
|
||||||
|
-->
|
||||||
|
@ -7,27 +7,30 @@
|
|||||||
<title>HPSF (Horrible Property Set Format)</title>
|
<title>HPSF (Horrible Property Set Format)</title>
|
||||||
<subtitle>Overview</subtitle>
|
<subtitle>Overview</subtitle>
|
||||||
<authors>
|
<authors>
|
||||||
<person name="Rainer Klute" email="klute@rainer-klute.de"/>
|
<person name="Rainer Klute" email="klute@apache.org"/>
|
||||||
</authors>
|
</authors>
|
||||||
</header>
|
</header>
|
||||||
<body>
|
<body>
|
||||||
<section title="Overview">
|
<section title="Overview">
|
||||||
<p>Microsoft applications like "Word" or "Excel" let the user describe his
|
<p>Microsoft applications like "Word", "Excel" or "PowerPoint" let the user
|
||||||
document by properties like "title", "category" and so on. The application
|
describe his document by properties like "title", "category" and so on. The
|
||||||
itself adds further information: last author, creation date etc. These
|
application itself adds further information: last author, creation date
|
||||||
properties are stored in so-called <strong>property set streams</strong>. A
|
etc. These document properties are stored in so-called <strong>property set
|
||||||
property set stream is a separate document within a <link
|
streams</strong>. A property set stream is a separate document within a
|
||||||
href="../poifs/index.html">POI filesystem</link>. HPSF is POI's pure-Java
|
<link href="../poifs/index.html">POI filesystem</link>. We'll call property
|
||||||
implementation to read (and in future to write) property set streams.</p>
|
set streams mostly just "property sets". HPSF is POI's pure-Java
|
||||||
|
implementation to read (and in future to write) property sets.</p>
|
||||||
|
|
||||||
<p>The <link href="how-to.html">HPSF HOWTO</link> describes what a Java
|
<p>The <link href="how-to.html">HPSF HOWTO</link> describes what a Java
|
||||||
application should do to read a property set using HPSF and to retrieve the
|
application should do to read a property set using HPSF and to retrieve the
|
||||||
information it needs.</p>
|
information it needs.</p>
|
||||||
|
|
||||||
<p>HPSF supports OLE2 property set streams in general, not only the special
|
<p>HPSF supports OLE2 property set streams in general, and is not limited to
|
||||||
case of document properties mentioned above. The <link
|
the special case of document properties in the Microsoft Office files
|
||||||
href="internals.html">HPSF description</link> describes the internal
|
mentioned above. The <link href="internals.html">HPSF description</link>
|
||||||
structure of property set streams.</p>
|
describes the internal structure of property set streams. A separate
|
||||||
|
document explains the internals of <link href="thumbnails.html">thumbnail
|
||||||
|
images</link>.</p>
|
||||||
</section>
|
</section>
|
||||||
</body>
|
</body>
|
||||||
</document>
|
</document>
|
||||||
|
@ -13,18 +13,14 @@
|
|||||||
<body>
|
<body>
|
||||||
<section title="The VT_CF Format">
|
<section title="The VT_CF Format">
|
||||||
|
|
||||||
<p>
|
<p>Thumbnail information is stored as a VT_CF, or Thumbnail Variant. The
|
||||||
Thumbnail information is stored as a VT_CF, or Thumbnail Variant.
|
Thumbnail Variant is used to store various types of information in a
|
||||||
The Thumbnail Variant is used to store various types of information
|
clipboard. The VT_CF can store information in formats for the Macintosh or
|
||||||
in a clipboard. The VT_CF can store information in formats for the
|
Windows clipboard.</p>
|
||||||
Macintosh or Windows clipboard.
|
|
||||||
</p>
|
|
||||||
|
|
||||||
<p>
|
<p>There are many types of data that can be copied to the clipboard, but the
|
||||||
There are many types of data that can be copied to the clipboard,
|
only types of information needed for thumbnail manipulation are the image
|
||||||
but the only types of information needed for thumbnail manipulation are
|
formats.</p>
|
||||||
the image formats.
|
|
||||||
</p>
|
|
||||||
|
|
||||||
<p>The <code>VT_CF</code> structure looks like this:</p>
|
<p>The <code>VT_CF</code> structure looks like this:</p>
|
||||||
|
|
||||||
@ -43,10 +39,8 @@
|
|||||||
</tr>
|
</tr>
|
||||||
</table>
|
</table>
|
||||||
|
|
||||||
<p>
|
<p>The Clipboard Size refers to the size (in bytes) of Clipboard Data
|
||||||
The Clipboard Size refers to the size (in bytes) of Clipboard Data
|
(variable size) plus the Clipboard Format (four bytes).</p>
|
||||||
(variable size) plus the Clipboard Format (four bytes).
|
|
||||||
</p>
|
|
||||||
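The size rule above can be illustrated with a small parser. This is a minimal sketch, not HPSF code: the class VtCfSketch is hypothetical, and it assumes that the Clipboard Size field is a 32-bit little-endian integer, that it is immediately followed by the four-byte Clipboard Format Tag, and that the "Clipboard Format (four bytes)" counted in the size is that tag.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical helper (not part of HPSF): splits a raw VT_CF value into its parts. */
final class VtCfSketch {
    final int formatTag;         // Clipboard Format Tag (four bytes)
    final byte[] clipboardData;  // Clipboard Data (variable size)

    private VtCfSketch(int formatTag, byte[] clipboardData) {
        this.formatTag = formatTag;
        this.clipboardData = clipboardData;
    }

    static VtCfSketch parse(byte[] raw) {
        if (raw.length < 8) {
            throw new IllegalArgumentException("VT_CF value is too short");
        }
        // Assumption: fields are stored little-endian.
        ByteBuffer buf = ByteBuffer.wrap(raw).order(ByteOrder.LITTLE_ENDIAN);

        // Clipboard Size = size of Clipboard Data + four bytes for the format tag.
        long clipboardSize = buf.getInt() & 0xFFFFFFFFL;
        int formatTag = buf.getInt();

        long dataLength = clipboardSize - 4;
        if (dataLength < 0 || dataLength > buf.remaining()) {
            throw new IllegalArgumentException(
                "Clipboard Size does not match the available bytes");
        }
        byte[] data = new byte[(int) dataLength];
        buf.get(data);
        return new VtCfSketch(formatTag, data);
    }
}
```

Subtracting four from the Clipboard Size before reading the data is the point of the example: the size field counts the format tag as well as the data.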
|
|
||||||
<p>Clipboard Format Tag has four possible values:</p>
|
<p>Clipboard Format Tag has four possible values:</p>
|
||||||
|
|
||||||
@ -83,9 +77,7 @@
|
|||||||
|
|
||||||
<section title="Windows Clipboard Data">
|
<section title="Windows Clipboard Data">
|
||||||
|
|
||||||
<p>
|
<p>Windows clipboard data has four image formats for thumbnails:</p>
|
||||||
Windows clipboard data has four image formats for thumbnails:
|
|
||||||
</p>
|
|
||||||
|
|
||||||
<table>
|
<table>
|
||||||
<tr>
|
<tr>
|
||||||
@ -114,63 +106,77 @@
|
|||||||
<td>Bitmap - Obsolete - Use <code>CF_DIB</code> instead</td>
|
<td>Bitmap - Obsolete - Use <code>CF_DIB</code> instead</td>
|
||||||
</tr>
|
</tr>
|
||||||
</table>
|
</table>
|
||||||
|
</section>
|
||||||
|
|
||||||
<section title="Windows Metafile Format">
|
<section title="Windows Metafile Format">
|
||||||
|
|
||||||
<p>
|
<p>The most common format for thumbnails on the Windows platform is the
|
||||||
The most common format for thumbnails on the Windows platform
|
Windows metafile format. The Clipboard places an extra header in front of
|
||||||
is the Windows metafile format. The Clipboard places an extra
|
the standard Windows Metafile Format data.</p>
|
||||||
header in front of the standard Windows Metafile Format data.
|
|
||||||
</p>
|
|
||||||
|
|
||||||
<p>
|
<p>The Clipboard Data byte array looks like this when an image is stored in
|
||||||
The Clipboard Data byte array looks like this when an image is
|
Windows' Clipboard WMF format.</p>
|
||||||
stored in Windows' Clipboard WMF format.
|
|
||||||
</p>
|
|
||||||
|
|
||||||
<table>
|
<table>
|
||||||
<tr>
|
<tr>
|
||||||
<th>Identifier</th>
|
<th>Identifier</th>
|
||||||
<td>CF_METAFILEPICT</td>
|
<td>CF_METAFILEPICT</td>
|
||||||
<td>mm</td>
|
<td>mm</td>
|
||||||
<td>width</td>
|
<td>width</td>
|
||||||
<td>height</td>
|
<td>height</td>
|
||||||
<td>handle</td>
|
<td>handle</td>
|
||||||
<td>WMF data</td>
|
<td>WMF data</td>
|
||||||
</tr>
|
</tr>
|
||||||
<tr>
|
<tr>
|
||||||
<th>Size</th>
|
<th>Size</th>
|
||||||
<td>32 bit unsigned int</td>
|
<td>32 bit unsigned int</td>
|
||||||
<td>16 bit unsigned(?) int</td>
|
<td>16 bit unsigned(?) int</td>
|
||||||
<td>16 bit unsigned(?) int</td>
|
<td>16 bit unsigned(?) int</td>
|
||||||
<td>16 bit unsigned(?) int</td>
|
<td>16 bit unsigned(?) int</td>
|
||||||
<td>16 bit unsigned(?) int</td>
|
<td>16 bit unsigned(?) int</td>
|
||||||
<td>byte array - variable length</td>
|
<td>byte array - variable length</td>
|
||||||
</tr>
|
</tr>
|
||||||
<tr>
|
<tr>
|
||||||
<th>Description</th>
|
<th>Description</th>
|
||||||
<td>Clipboard WMF</td>
|
<td>Clipboard WMF</td>
|
||||||
<td>Mapping Mode</td>
|
<td>Mapping Mode</td>
|
||||||
<td>Image Width</td>
|
<td>Image Width</td>
|
||||||
<td>Image Height</td>
|
<td>Image Height</td>
|
||||||
<td>handle to the WMF data array in memory, or 0</td>
|
<td>handle to the WMF data array in memory, or 0</td>
|
||||||
<td>standard WMF byte stream</td>
|
<td>standard WMF byte stream</td>
|
||||||
</tr>
|
</tr>
|
||||||
</table>
|
</table>
|
||||||
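The layout in the table above could be read as sketched below. This is an illustration only: the class ClipboardWmfSketch is hypothetical and not part of HPSF, little-endian byte order is an assumption, and the 16-bit fields are treated as unsigned although the table itself marks their signedness as uncertain.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical reader for the Clipboard WMF layout shown in the table. */
final class ClipboardWmfSketch {
    static final int HEADER_LENGTH = 4 + 4 * 2; // identifier + four 16-bit fields

    final long identifier;  // 32 bit unsigned int (CF_METAFILEPICT)
    final int mappingMode;  // mm
    final int width;        // image width
    final int height;       // image height
    final int handle;       // handle to the WMF data array in memory, or 0
    final byte[] wmfData;   // standard WMF byte stream

    private ClipboardWmfSketch(long identifier, int mappingMode, int width,
                               int height, int handle, byte[] wmfData) {
        this.identifier = identifier;
        this.mappingMode = mappingMode;
        this.width = width;
        this.height = height;
        this.handle = handle;
        this.wmfData = wmfData;
    }

    static ClipboardWmfSketch parse(byte[] clipboardData) {
        if (clipboardData.length < HEADER_LENGTH) {
            throw new IllegalArgumentException("Clipboard WMF header is incomplete");
        }
        // Assumption: all fields are stored little-endian.
        ByteBuffer buf = ByteBuffer.wrap(clipboardData).order(ByteOrder.LITTLE_ENDIAN);
        long identifier = buf.getInt() & 0xFFFFFFFFL;
        int mappingMode = buf.getShort() & 0xFFFF; // treated as unsigned (uncertain)
        int width = buf.getShort() & 0xFFFF;
        int height = buf.getShort() & 0xFFFF;
        int handle = buf.getShort() & 0xFFFF;

        // Everything after the extra header is the standard WMF data.
        byte[] wmf = new byte[buf.remaining()];
        buf.get(wmf);
        return new ClipboardWmfSketch(identifier, mappingMode, width, height, handle, wmf);
    }
}
```

Once the twelve header bytes are skipped, the remaining bytes can be handed to any consumer that understands a standard WMF stream.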
</section>
|
</section>
|
||||||
|
|
||||||
|
|
||||||
|
<section title="Device Independent Bitmap">
|
||||||
<section title="Device Independent Bitmap">
|
<p><strong>FIXME:</strong> Describe the Device Independent Bitmap
|
||||||
<p><strong>FIXME:</strong> Document Device Independent Bitmap format</p>
|
format!</p>
|
||||||
</section>
|
|
||||||
</section>
|
</section>
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
<section title="Macintosh Clipboard Data">
|
<section title="Macintosh Clipboard Data">
|
||||||
<p><strong>FIXME:</strong> Document Macintosh clipboard formats.</p>
|
<p><strong>FIXME:</strong> Describe the Macintosh clipboard formats!</p>
|
||||||
</section>
|
</section>
|
||||||
|
|
||||||
</body>
|
</body>
|
||||||
</document>
|
</document>
|
||||||
|
|
||||||
|
<!-- Keep this comment at the end of the file
|
||||||
|
Local variables:
|
||||||
|
mode: xml
|
||||||
|
sgml-omittag:nil
|
||||||
|
sgml-shorttag:nil
|
||||||
|
sgml-namecase-general:nil
|
||||||
|
sgml-general-insert-case:lower
|
||||||
|
sgml-minimize-attributes:nil
|
||||||
|
sgml-always-quote-attributes:t
|
||||||
|
sgml-indent-step:1
|
||||||
|
sgml-indent-data:t
|
||||||
|
sgml-parent-document:nil
|
||||||
|
sgml-exposed-tags:nil
|
||||||
|
sgml-local-catalogs:nil
|
||||||
|
sgml-local-ecat-files:nil
|
||||||
|
End:
|
||||||
|
-->
|
||||||
|