From b1d9d2e10f198dc9dfaa4e0ae4c0556f801e533c Mon Sep 17 00:00:00 2001
From: Rainer Klute
Date: Thu, 30 Jan 2003 17:13:15 +0000
Subject: [PATCH] - Started to document the reading of general property set streams. - Minor documentation fixes.

git-svn-id: https://svn.apache.org/repos/asf/jakarta/poi/trunk@352993 13f79535-47bb-0310-9956-ffa450edef68
---
 src/documentation/xdocs/hpsf/how-to.xml     | 115 +++++++++-----
 src/documentation/xdocs/hpsf/index.xml      |  27 ++--
 src/documentation/xdocs/hpsf/thumbnails.xml | 162 ++++++++++----------
 3 files changed, 176 insertions(+), 128 deletions(-)

diff --git a/src/documentation/xdocs/hpsf/how-to.xml b/src/documentation/xdocs/hpsf/how-to.xml
index bdb49d044..826e60cb8 100644
--- a/src/documentation/xdocs/hpsf/how-to.xml
+++ b/src/documentation/xdocs/hpsf/how-to.xml
@@ -7,13 +7,13 @@
HPSF HOW-TO - +
-

This HOW-TO is organized in three section. You should read them +

This HOW-TO is organized in three sections. You should read them sequentially because the later sections build upon the earlier ones.

    @@ -40,12 +40,9 @@
-

Please note that there is a separate document on thumbnails!

- - +
This section explains how to read @@ -56,19 +53,20 @@

The first thing you should understand is that properties are stored in separate documents inside the POI filesystem. (If you don't know what a - POI filesystem is, read its documentation.) A document in a POI - filesystem is also called a stream.

+ POI filesystem is, read the POIFS + documentation.) A document in a POI filesystem is also called a + stream.

The following example shows how to read a POI filesystem's "title" property. Reading other properties is similar. Consider the API - documentation of org.apache.poi.hpsf.SummaryInformation.

+ documentation of org.apache.poi.hpsf.SummaryInformation to + learn which methods are available!

-

The standard properties this section focusses on can be - found in a document called \005SummaryInformation in the root of - the POI filesystem. The notation \005 in the document's name - means the character with the decimal value of 5. In order to read the - title, an application has to perform the following steps:

+

The standard properties this section focusses on can be found in a + document called \005SummaryInformation located in the root of the + POI filesystem. The notation \005 in the document's name means + the character with the decimal value of 5. In order to read the title, an + application has to perform the following steps:

  1. @@ -76,9 +74,8 @@ of the POI filesystem.

  2. -

    Create an instance of the class - SummaryInformation from that - document.

    +

    Create an instance of the class SummaryInformation from + that document.

  3. Call the SummaryInformation instance's @@ -96,7 +93,10 @@ (POIFS) proceeds as shown by the following code fragment. (The full source code of the sample application is available in the examples section of the POI source tree as - ReadTitle.java.)

    + ReadTitle.java.

    + + I just found out that ReadTitle.java is no longer there! I + shall look it up in the CVS and try to restore it. import java.io.*; @@ -141,7 +141,7 @@ r.registerListener(new MyPOIFSReaderListener(), processPOIFSReaderEvent method. The eventing POI filesystem calls this method when it finds the \005SummaryInformation document. In the sample application MyPOIFSReaderListener is - a static class in the ReadTitle.java source file.)

    + a static class in the ReadTitle.java source file.

    Now everything is prepared and reading the POI filesystem can start:

    @@ -209,10 +209,10 @@ static class MyPOIFSReaderListener implements POIFSReaderListener case that the POI filesystem does not have a title.

    final String title = si.getTitle(); - if (title != null) - System.out.println("Title: \"" + title + "\""); - else - System.out.println("Document has no title."); +if (title != null) + System.out.println("Title: \"" + title + "\""); +else + System.out.println("Document has no title.");

    Please note that a Microsoft Office document does not necessarily contain the \005SummaryInformation stream. The documents created @@ -249,7 +249,7 @@ static class MyPOIFSReaderListener implements POIFSReaderListener

    And of course you cannot call getTitle() because DocumentSummaryInformation has different query methods. See - the API documentation for the details!

    + the Javadoc API documentation for the details!

    In the previous section the application simply caught all exceptions and was in no way interested in any @@ -259,17 +259,19 @@ static class MyPOIFSReaderListener implements POIFSReaderListener

    NoPropertySetStreamException:
    -

    This exception is thrown if the application tries to create a - PropertySet or one of its subclasses - SummaryInformation and - DocumentSummaryInformation from a stream that is not a - property set stream. A faulty property set stream counts as not being a - property set stream at all. An application should be prepared to deal - with this case even if opens streams named +

    +

    This exception is thrown if the application tries to create a + PropertySet instance from a stream that is not a + property set stream. (SummaryInformation and + DocumentSummaryInformation are subclasses of + PropertySet.) A faulty property set stream counts as not + being a property set stream at all. An application should be prepared to + deal with this case even if it opens streams named \005SummaryInformation or \005DocumentSummaryInformation only. These are just names. A stream's name by itself does not ensure that the stream contains the - expected contents and that this contents is correct.

    + expected contents and that these contents are correct.

    +
    UnexpectedPropertySetTypeException

    This exception is thrown if a certain type of property set is @@ -292,7 +294,7 @@ static class MyPOIFSReaderListener implements POIFSReaderListener document. Embedded objects may have property sets of their own. An application can open these property set streams as described above. The only difference is that they are not located in the POI filesystem's root - but in a nested directory instead. Just register a + but in a nested directory instead. Just register a POIFSReaderListener for the property set streams you are interested in. For example, the POIBrowser application in the contrib section tries to open each and every document in a POI filesystem @@ -303,12 +305,49 @@ static class MyPOIFSReaderListener implements POIFSReaderListener

    - This section tells how to read - non-standard properties. Non-standard properties are application-specific - name/value/type triples. + This section tells how to read non-standard properties. Non-standard + properties are application-specific name/type/value triples. - Write this section! +

    Now comes the really hardcore stuff. As mentioned above, + SummaryInformation and + DocumentSummaryInformation are just special cases of the + general concept of a property set. The general concept says that a + property set consists of properties. Each property is an + entity that has a name, a type, and a + value.

    + +

    Okay, that was still rather easy. However, to make things more + complicated Microsoft in its infinite wisdom decided that a property set + shall be broken into sections. Each section holds a bunch + of properties. But since that's still not complicated enough: a section + can optionally have a dictionary that maps property IDs to property + names - we'll explain later what that means.

    + + [To be continued.] + + Let's consider a Java application that wants to read a stream + containing a general property set. It is modelled by the class + PropertySet in the org.apache.poi.hpsf + package.
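The general concept described above (a property set made of sections, each section holding properties and an optional dictionary from property IDs to names) can be pictured with a toy model. The sketch below is only an illustration: the names ToyPropertySet, ToySection, ToyProperty and ToyType are hypothetical and are not part of the org.apache.poi.hpsf API, and the two type tags are placeholders rather than real variant types.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical type tags, for illustration only.
enum ToyType { STRING, INTEGER }

// A property: identified by an ID, carrying a type and a value.
final class ToyProperty {
    final long id;
    final ToyType type;
    final Object value;

    ToyProperty(long id, ToyType type, Object value) {
        this.id = id;
        this.type = type;
        this.value = value;
    }
}

// A section: a bunch of properties plus an optional dictionary
// that maps property IDs to property names.
final class ToySection {
    final List<ToyProperty> properties = new ArrayList<>();
    final Map<Long, String> dictionary = new HashMap<>(); // stays empty if the section has none

    // Returns the property's name, or null if the dictionary has no entry for its ID.
    String nameOf(ToyProperty property) {
        return dictionary.get(property.id);
    }
}

// A property set: broken into sections.
final class ToyPropertySet {
    final List<ToySection> sections = new ArrayList<>();
}
```

In this model a property's name is not stored in the property itself; it is looked up through the section's dictionary, which is why a section without a dictionary yields no names.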
+
+
diff --git a/src/documentation/xdocs/hpsf/index.xml b/src/documentation/xdocs/hpsf/index.xml
index 1842b5a85..0324452ca 100644
--- a/src/documentation/xdocs/hpsf/index.xml
+++ b/src/documentation/xdocs/hpsf/index.xml
@@ -7,27 +7,30 @@
 HPSF (Horrible Property Set Format) Overview
-
+
-

Microsoft applications like "Word" or "Excel" let the user describe his - document by properties like "title", "category" and so on. The application - itself adds further information: last author, creation date etc. These - properties are stored in so-called property set streams. A - property set stream is a separate document within a POI filesystem. HPSF is POI's pure-Java - implementation to read (and in future to write) property set streams.

+

Microsoft applications like "Word", "Excel" or "Powerpoint" let the user + describe his document by properties like "title", "category" and so on. The + application itself adds further information: last author, creation date + etc. These document properties are stored in so-called property set + streams. A property set stream is a separate document within a + POI filesystem. We'll call property + set streams mostly just "property sets". HPSF is POI's pure-Java + implementation to read (and in future to write) property sets.

The HPSF HOWTO describes what a Java application should do to read a property set using HPSF and to retrieve the information it needs.

-

HPSF supports OLE2 property set streams in general, not only the special - case of document properties mentioned above. The HPSF description describes the internal - structure of property set streams.

+

HPSF supports OLE2 property set streams in general, and is not limited to + the special case of document properties in the Microsoft Office files + mentioned above. The HPSF description + describes the internal structure of property set streams. A separate + document explains the internals of thumbnail + images.

diff --git a/src/documentation/xdocs/hpsf/thumbnails.xml b/src/documentation/xdocs/hpsf/thumbnails.xml
index 5e80945d2..032736727 100644
--- a/src/documentation/xdocs/hpsf/thumbnails.xml
+++ b/src/documentation/xdocs/hpsf/thumbnails.xml
@@ -13,21 +13,17 @@
-

- Thumbnail information is stored as a VT_CF, or Thumbnail Variant. - The Thumbnail Variant is used to store various types of information - in a clipboard. The VT_CF can store information in formats for the - Macintosh or Windows clipboard. -

+

Thumbnail information is stored as a VT_CF, or Thumbnail Variant. The + Thumbnail Variant is used to store various types of information in a + clipboard. The VT_CF can store information in formats for the Macintosh or + Windows clipboard.

+ +

There are many types of data that can be copied to the clipboard, but the + only types of information needed for thumbnail manipulation are the image + formats.

-

- There are many types of data that can be copied to the clipboard, - but the only types of information needed for thumbnail manipulation are - the image formats. -

-

The VT_CF structure looks like this:

- + @@ -43,11 +39,9 @@
Element:
-

- The Clipboard Size refers to the size (in bytes) of Clipboard Data - (variable size) plus the Clipboard Format (four bytes). -

- +

The Clipboard Size refers to the size (in bytes) of Clipboard Data + (variable size) plus the Clipboard Format (four bytes).
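The size rule above is simple arithmetic: the length of the Clipboard Data is the Clipboard Size minus the four bytes of the Clipboard Format. A minimal sketch follows, assuming the Clipboard Size is a 32-bit unsigned value stored in little-endian byte order; the class name ClipboardSizeSketch and its methods are hypothetical and do not come from HPSF.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class ClipboardSizeSketch {
    // The Clipboard Format occupies four bytes, as stated in the text.
    static final int CLIPBOARD_FORMAT_LENGTH = 4;

    // Reads a 32-bit value at the given offset as an unsigned number.
    // Assumption: little-endian byte order.
    static long readClipboardSize(byte[] bytes, int offset) {
        int raw = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).getInt(offset);
        return raw & 0xFFFFFFFFL;
    }

    // Clipboard Size = Clipboard Data length + 4, so the data length is size - 4.
    static long clipboardDataLength(long clipboardSize) {
        if (clipboardSize < CLIPBOARD_FORMAT_LENGTH) {
            throw new IllegalArgumentException("Clipboard Size too small: " + clipboardSize);
        }
        return clipboardSize - CLIPBOARD_FORMAT_LENGTH;
    }
}
```

For instance, a Clipboard Size of 16 would leave 12 bytes of Clipboard Data.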

+

Clipboard Format Tag has four possible values:

@@ -83,16 +77,14 @@
-

- Windows clipboard data has four image formats for thumbnails: -

- +

Windows clipboard data has four image formats for thumbnails:

+
- + @@ -102,75 +94,89 @@ - + - + - +
Value Identifier Description
3 CF_METAFILEPICT
8 CF_DIB Device Independent Bitmap
14 CF_ENHMETAFILE Enhanced Windows metafile format
2 CF_BITMAP Bitmap - Obsolete - Use CF_DIB instead
- -
- -

- The most common format for thumbnails on the Windows platform - is the Windows metafile format. The Clipboard places and extra - header in front of a the standard Windows Metafile Format data. -

- -

- The Clipboard Data byte array looks like this when an image is - stored in Windows' Clipboard WMF format. -

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Identifier | CF_METAFILEPICT | mm | width | height | handle | WMF data
Size | 32 bit unsigned int | 16 bit unsigned(?) int | 16 bit unsigned(?) int | 16 bit unsigned(?) int | 16 bit unsigned(?) int | byte array - variable length
Description | Clipboard WMF | Mapping Mode | Image Width | Image Height | handle to the WMF data array in memory, or 0 | standard WMF byte stream
-
- - - -
-

FIXME: Document Device Independent Bitmap format

-
- + +
+ +

The most common format for thumbnails on the Windows platform is the + Windows metafile format. The Clipboard places an extra header in front of + the standard Windows Metafile Format data.

+ +

The Clipboard Data byte array looks like this when an image is stored in + Windows' Clipboard WMF format.

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Identifier | CF_METAFILEPICT | mm | width | height | handle | WMF data
Size | 32 bit unsigned int | 16 bit unsigned(?) int | 16 bit unsigned(?) int | 16 bit unsigned(?) int | 16 bit unsigned(?) int | byte array - variable length
Description | Clipboard WMF | Mapping Mode | Image Width | Image Height | handle to the WMF data array in memory, or 0 | standard WMF byte stream
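The layout in the table above could be split into its header fields and the trailing WMF byte stream roughly as follows. This is a sketch under stated assumptions, not HPSF code: the class ClipboardWmfSketch is hypothetical, little-endian byte order is assumed, and the 16-bit fields are read as unsigned even though the table itself marks their signedness as uncertain.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class ClipboardWmfSketch {
    static final long CF_METAFILEPICT = 3;      // identifier value from the format table
    static final int HEADER_LENGTH = 4 + 4 * 2; // one 32-bit field plus four 16-bit fields

    final int mappingMode;
    final int width;
    final int height;
    final int handle;
    final byte[] wmfData;

    private ClipboardWmfSketch(int mappingMode, int width, int height, int handle, byte[] wmfData) {
        this.mappingMode = mappingMode;
        this.width = width;
        this.height = height;
        this.handle = handle;
        this.wmfData = wmfData;
    }

    // Parses Clipboard Data that is expected to hold an image in Clipboard WMF format.
    static ClipboardWmfSketch parse(byte[] clipboardData) {
        if (clipboardData.length < HEADER_LENGTH) {
            throw new IllegalArgumentException("Clipboard Data shorter than the WMF header");
        }
        // Assumption: all multi-byte fields are little-endian.
        ByteBuffer buf = ByteBuffer.wrap(clipboardData).order(ByteOrder.LITTLE_ENDIAN);
        long identifier = buf.getInt() & 0xFFFFFFFFL;
        if (identifier != CF_METAFILEPICT) {
            throw new IllegalArgumentException("Not CF_METAFILEPICT: " + identifier);
        }
        // Assumption: the 16-bit fields are unsigned.
        int mappingMode = buf.getShort() & 0xFFFF;
        int width = buf.getShort() & 0xFFFF;
        int height = buf.getShort() & 0xFFFF;
        int handle = buf.getShort() & 0xFFFF;
        // Everything after the header is the standard WMF byte stream.
        byte[] wmfData = new byte[buf.remaining()];
        buf.get(wmfData);
        return new ClipboardWmfSketch(mappingMode, width, height, handle, wmfData);
    }
}
```

A caller would pass in the Clipboard Data byte array and hand the resulting wmfData to whatever WMF renderer it uses.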
+
+ + +
+

FIXME: Describe the Device Independent Bitmap + format!

+
+
-

FIXME: Document Macintosh clipboard formats.

+

FIXME: Describe the Macintosh clipboard formats!

+ +