HPSF: codepage support added

git-svn-id: https://svn.apache.org/repos/asf/jakarta/poi/trunk@353460 13f79535-47bb-0310-9956-ffa450edef68
2003-12-02 17:46:01 +00:00 · 2003-12-02 17:46:01 +00:00 · 131bb9d0bd
commit 131bb9d0bd
parent 6385296f3f
15 changed files with 276 additions and 103 deletions
--- a/src/documentation/content/xdocs/changes.xml
+++ b/src/documentation/content/xdocs/changes.xml
@@ -12,7 +12,11 @@
        <person id="MJ" name="Marc Johnson" email="mjohnson@apache.org"/>
        <person id="NKB" name="Nicola Ken Barozzi" email="barozzi@nicolaken.com"/>
        <person id="POI-DEVELOPERS" name="POI Developers" email="poi-dev@jakarta.apache.org"/>
+        <person id="RK" name="Rainer Klute" email="klute@apache.org"/>
    </devs>
+    <release version="2.0-pre3" date="unreleased">
+        <action dev="RK" type="add">HPSF: Much better codepage support</action>
+    </release>
    <release version="2.0-pre1" date="unreleased">
        <action dev="POI-DEVELOPERS" type="add">Patch applied for deep cloning of worksheets was provided</action>
        <action dev="POI-DEVELOPERS" type="add">Patch applied to allow sheet reordering</action>
--- a/src/documentation/content/xdocs/hpsf/how-to.xml
+++ b/src/documentation/content/xdocs/hpsf/how-to.xml
@@ -708,8 +708,9 @@ No property set stream: "/1Table"</source>
       <td>The property's value is the number of a <strong>codepage</strong>,
        i.e. a mapping from character codes to characters. All strings in the
        section containing this property must be interpreted using this
-        codepage. Typical property values are 1252 (8-bit "western" characters)
-        or 1200 (16-bit Unicode characters).</td>
+        codepage. Typical property values are 1252 (8-bit "western" characters,
+        similar to ISO-8859-1), 1200 (16-bit Unicode characters, UTF-16), or
+        65001 (8-bit Unicode characters, UTF-8).</td>
      </tr>
     </table>
    </section>
@@ -833,18 +834,34 @@ No property set stream: "/1Table"</source>
    </section>

    <section><title>Codepage support</title>
-     <fixme author="Rainer Klute">Improve codepage support!</fixme>

     <p>The property with ID 1 holds the number of the codepage which was used
-      to encode the strings in this section. The present HPSF codepage support
-      is still very limited: When reading property value strings, HPSF
-      distinguishes between 16-bit characters and 8-bit characters. 16-bit
-      characters should be Unicode characters and thus be okay. 8-bit
-      characters are interpreted according to the platform's default character
-      set. This is fine as long as the document being read has been written on
-      a platform with the same default character set. However, if you receive a
-      document from another region of the world and want to process it with
-      HPSF you are in trouble - unless the creator used Unicode, of course.</p>
+      to encode the strings in this section. If this property is not available
+      in a section, the platform's default character encoding will be
+      used. This works fine as long as the document being read has been written
+      on a platform with the same default character encoding. However, if you
+      receive a document from another region of the world and the codepage is
+      undefined, you are in trouble.</p>
+
+     <p>HPSF's codepage support is as good as the character encoding support of
+      the Java Virtual Machine (JVM) the application runs on. If HPSF
+      encounters a codepage number, it assumes that the JVM has a character
+      encoding with a corresponding name. For example, if the codepage is 1252,
+      HPSF uses the character encoding "cp1252" to read or write strings. If
+      the JVM does not have that character encoding installed or if the
+      codepage number is illegal, an <code>UnsupportedEncodingException</code>
+      is thrown.</p>
+
+     <p>There are two exceptions to the rule that a character encoding's name
+      is derived from the codepage number by prepending the string "cp" to
+      it:</p>
+
+     <dl>
+      <dt>Codepage 1200</dt>
+      <dd>is mapped to the character encoding "UTF-16".</dd>
+      <dt>Codepage 65001</dt>
+      <dd>is mapped to the character encoding "UTF-8".</dd>
+     </dl>
    </section>
   </section>
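The codepage-to-encoding rule described above can be sketched in plain Java. This is an illustrative stand-alone class, not HPSF's API: the class name `CodepageMapping` and the `decode` helper are hypothetical, and the use of -1 for "codepage undefined, fall back to the platform default" is borrowed from `Section.getCodepage()` in this commit. The sketch relies only on the standard `String(byte[], String)` constructor, which throws `UnsupportedEncodingException` when the JVM does not know the encoding name.

```java
import java.io.UnsupportedEncodingException;

public class CodepageMapping
{
    /* Maps a codepage number to a JVM character encoding name:
     * 1200 -> "UTF-16", 65001 -> "UTF-8", other positive numbers -> "cp" + n. */
    public static String codepageToEncoding(final int codepage)
        throws UnsupportedEncodingException
    {
        if (codepage <= 0)
            throw new UnsupportedEncodingException
                ("Codepage number may not be " + codepage);
        switch (codepage)
        {
            case 1200:
                return "UTF-16";
            case 65001:
                return "UTF-8";
            default:
                return "cp" + codepage;
        }
    }

    /* Hypothetical helper: decodes the bytes of an 8-bit string. A codepage
     * of -1 means "undefined", so the platform's default encoding is used.
     * An encoding name the JVM does not know makes the String constructor
     * throw UnsupportedEncodingException. */
    public static String decode(final byte[] bytes, final int codepage)
        throws UnsupportedEncodingException
    {
        return codepage == -1
            ? new String(bytes)
            : new String(bytes, codepageToEncoding(codepage));
    }

    public static void main(final String[] args)
        throws UnsupportedEncodingException
    {
        // 0xE4 is "a umlaut" in cp1252; print its code point to stay ASCII-safe.
        final String s = decode(new byte[] {(byte) 0xE4}, 1252);
        System.out.println(codepageToEncoding(1252) + " " + (int) s.charAt(0));
        // prints "cp1252 228"
    }
}
```

Because the lookup is delegated entirely to the JVM, a codepage such as 936 works only if the running JVM ships a "cp936" encoding, which is exactly the dependency the paragraph above describes.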

--- a/src/documentation/content/xdocs/hpsf/internals.xml
+++ b/src/documentation/content/xdocs/hpsf/internals.xml
@@ -944,6 +944,60 @@



+   <section>
+    <title>The Dictionary</title>
+
+    <p>What a dictionary is good for is explained in the <link
+      href="how-to.html">HPSF HOW-TO</link>. This chapter explains how it is
+     organized internally.</p>
+
+    <p>The dictionary has a simple header consisting of a single UInt value. It
+    tells how many entries the dictionary comprises:</p>
+
+    <table>
+     <tr>
+      <th>Name</th>
+      <th>Data type</th>
+      <th>Description</th>
+     </tr>
+     <tr>
+      <td>nrEntries</td>
+      <td>UInt</td>
+      <td>Number of dictionary entries</td>
+     </tr>
+    </table>
+
+    <p>The dictionary entries follow the header. Each one looks like this:</p>
+
+    <table>
+     <tr>
+      <th>Name</th>
+      <th>Data type</th>
+      <th>Description</th>
+     </tr>
+     <tr>
+      <td>key</td>
+      <td>UInt</td>
+      <td>The unique number of this property, i.e. the PID</td>
+     </tr>
+     <tr>
+      <td>length</td>
+      <td>UInt</td>
+      <td>The length of the property name associated with the key</td>
+     </tr>
+     <tr>
+      <td>value</td>
+      <td>String</td>
+      <td>The property's name, terminated with a 0x00 character</td>
+     </tr>
+    </table>
+
+    <p>The entries are not aligned, i.e. each one follows its predecessor
+     without any gap or fill characters.</p>
+   </section>
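A reader for the dictionary layout in the two tables above might look like the following sketch. Several details are assumptions not stated in the tables: integers are read as little-endian (as elsewhere in the property set format), `length` is taken to count bytes including the terminating 0x00, and names are treated as 8-bit strings in a caller-supplied encoding. The class `DictionaryReader` is hypothetical; it uses `java.nio.ByteBuffer` instead of POI's own `LittleEndian` helper.

```java
import java.io.UnsupportedEncodingException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.LinkedHashMap;
import java.util.Map;

public class DictionaryReader
{
    /* Reads nrEntries (UInt), then that many unaligned entries of
     * key (UInt), length (UInt) and value (0x00-terminated string). */
    public static Map<Long, String> read(final byte[] src, final int offset,
                                         final String encoding)
        throws UnsupportedEncodingException
    {
        final ByteBuffer buf =
            ByteBuffer.wrap(src).order(ByteOrder.LITTLE_ENDIAN);
        buf.position(offset);
        // UInt values are read as signed ints and widened without sign.
        final long nrEntries = buf.getInt() & 0xFFFFFFFFL;
        final Map<Long, String> dictionary = new LinkedHashMap<Long, String>();
        for (long i = 0; i < nrEntries; i++)
        {
            final long key = buf.getInt() & 0xFFFFFFFFL;
            final int length = buf.getInt();
            final byte[] raw = new byte[length];
            buf.get(raw); // entries are not aligned, so no gap is skipped
            // Drop the terminating 0x00 (and any further trailing zeros).
            int end = raw.length;
            while (end > 0 && raw[end - 1] == 0)
                end--;
            dictionary.put(key, new String(raw, 0, end, encoding));
        }
        return dictionary;
    }
}
```

A `LinkedHashMap` keeps the entries in stream order, which makes a dump of the dictionary easier to compare with a hex view of the section.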
+
+
+
   <section><title>References</title>

    <p>In order to assemble the HPSF description I used information publicly
--- a/src/documentation/content/xdocs/hpsf/todo.xml
+++ b/src/documentation/content/xdocs/hpsf/todo.xml
@@ -20,11 +20,6 @@
     easily writing summary information streams and document summary
     information streams.
    </li>
-    <li>
-     Add codepage support: Presently the bytes making out the string in a
-      property's value are interpreted using the platform's default character
-      set.
-    </li>
    <li>
     Add resource bundles to
     <code>org.apache.poi.hpsf.wellknown</code> to ease
@@ -38,8 +33,8 @@
     arrays.
    </li>
    <li>
-     Add WMF to <code>java.awt.Image</code> example code in <link
-     href="thumbnails.html">Thumbnail HOW TO</link>.
+     Add WMF to <code>java.awt.Image</code> example code in the <link
+      href="thumbnails.html">Thumbnail HOW-TO</link>.
    </li>
   </ol>
  </section>
--- a/src/examples/src/org/apache/poi/hpsf/examples/CopyCompare.java
+++ b/src/examples/src/org/apache/poi/hpsf/examples/CopyCompare.java
@@ -558,7 +558,10 @@ public class CopyCompare
                 * exists. However, since we have full control over directory
                 * creation we can ensure that this will never happen. */
                ex.printStackTrace(System.err);
-                throw new RuntimeException(ex);
+                throw new RuntimeException(ex.toString());
+                /* FIXME (2): Replace the previous line by the following once we
+                 * no longer need JDK 1.3 compatibility. */
+                // throw new RuntimeException(ex);
            }
        }
    }
--- a/src/examples/src/org/apache/poi/hpsf/examples/WriteAuthorAndTitle.java
+++ b/src/examples/src/org/apache/poi/hpsf/examples/WriteAuthorAndTitle.java
@@ -444,7 +444,10 @@ public class WriteAuthorAndTitle
                 * exists. However, since we have full control over directory
                 * creation we can ensure that this will never happen. */
                ex.printStackTrace(System.err);
-                throw new RuntimeException(ex);
+                throw new RuntimeException(ex.toString());
+                /* FIXME (2): Replace the previous line by the following once we
+                 * no longer need JDK 1.3 compatibility. */
+                // throw new RuntimeException(ex);
            }
        }
    }
--- a/src/java/org/apache/poi/hpsf/MutableProperty.java
+++ b/src/java/org/apache/poi/hpsf/MutableProperty.java
@@ -80,19 +80,20 @@ public class MutableProperty extends Property
     * <p>Writes the property to an output stream.</p>
     * 
     * @param out The output stream to write to.
+     * @param codepage The codepage to use for writing non-wide strings
     * @return the number of bytes written to the stream
     * 
     * @exception IOException if an I/O error occurs
     * @exception WritingNotSupportedException if a variant type is to be
     * written that is not yet supported
     */
-    public int write(final OutputStream out)
+    public int write(final OutputStream out, final int codepage)
        throws IOException, WritingNotSupportedException
    {
        int length = 0;
        long variantType = getType();
        length += TypeWriter.writeUIntToStream(out, variantType);
-        length += VariantSupport.write(out, variantType, getValue());
+        length += VariantSupport.write(out, variantType, getValue(), codepage);
        return length;
    }

--- a/src/java/org/apache/poi/hpsf/MutableSection.java
+++ b/src/java/org/apache/poi/hpsf/MutableSection.java
@@ -420,16 +420,16 @@ public class MutableSection extends Section

            /* If the property ID is not equal to 0, we write the property and all
             * is fine. However, if it equals 0 we have to write the section's
-             * dictionary which does not have a type but just a value. */
+             * dictionary which has an implicit type only and an explicit
+             * value. */
            if (id != 0)
                /* Write the property and update the position to the next
                 * property. */
-                position += p.write(propertyStream);
+                position += p.write(propertyStream, getCodepage());
            else
            {
-                final Integer codepage =
-                    (Integer) getProperty(PropertyIDMap.PID_CODEPAGE);
-                if (codepage == null)
+                final int codepage = getCodepage();
+                if (codepage == -1)
                    throw new IllegalPropertySetDataException
                        ("Codepage (property 1) is undefined.");
                position += writeDictionary(propertyStream, dictionary);
--- a/src/java/org/apache/poi/hpsf/Property.java
+++ b/src/java/org/apache/poi/hpsf/Property.java
@@ -62,9 +62,11 @@
 */
 package org.apache.poi.hpsf;

+import java.io.UnsupportedEncodingException;
 import java.util.HashMap;
 import java.util.Map;

+import org.apache.poi.util.HexDump;
 import org.apache.poi.util.LittleEndian;

 /**
@@ -161,9 +163,13 @@ public class Property
     * @param length The property's type/value pair's length in bytes.
     * @param codepage The section's and thus the property's
     * codepage. It is needed only when reading string values.
+     * 
+     * @exception UnsupportedEncodingException if the specified codepage is not
+     * supported
     */
    public Property(final long id, final byte[] src, final long offset,
                    final int length, final int codepage)
+    throws UnsupportedEncodingException
    {
        this.id = id;

@@ -183,7 +189,7 @@ public class Property

        try
        {
-            value = VariantSupport.read(src, o, length, (int) type);
+            value = VariantSupport.read(src, o, length, (int) type, codepage);
        }
        catch (UnsupportedVariantTypeException ex)
        {
@@ -382,8 +388,27 @@ public class Property
        b.append(getID());
        b.append(", type: ");
        b.append(getType());
+        final Object value = getValue();
        b.append(", value: ");
-        b.append(getValue());
+        b.append(value);
+        if (value instanceof String)
+        {
+            final String s = (String) value;
+            final int l = s.length();
+            final byte[] bytes = new byte[l * 2];
+            for (int i = 0; i < l; i++)
+            {
+                final char c = s.charAt(i);
+                final byte high = (byte) ((c & 0x00ff00) >> 8);
+                final byte low  = (byte) ((c & 0x0000ff) >> 0);
+                bytes[i * 2]     = high;
+                bytes[i * 2 + 1] = low;
+            }
+            final String hex = HexDump.dump(bytes, 0L, 0);
+            b.append(" [");
+            b.append(hex);
+            b.append("]");
+        }
        b.append(']');
        return b.toString();
    }
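The new `toString()` code splits every Java `char` of a string value into a high and a low byte before dumping them, so one can see which code units were actually decoded (useful when checking a codepage, e.g. with the Chinese test document). The following stand-alone sketch shows the same byte split; the class `CharHexDump` is hypothetical, and its `"%02X"` formatting is a stand-in, not the output format of POI's `HexDump`.

```java
public class CharHexDump
{
    /* Emits each UTF-16 code unit as two bytes, high byte first (as the
     * toString() change does), formatted as space-separated hex pairs. */
    public static String toHex(final String s)
    {
        final StringBuilder b = new StringBuilder();
        for (int i = 0; i < s.length(); i++)
        {
            final char c = s.charAt(i);
            final int high = (c >> 8) & 0xFF;
            final int low = c & 0xFF;
            if (b.length() > 0)
                b.append(' ');
            b.append(String.format("%02X %02X", high, low));
        }
        return b.toString();
    }

    public static void main(final String[] args)
    {
        // "\u00e4\u00f6" is "a umlaut, o umlaut"; both are below U+0100.
        System.out.println(toHex("\u00e4\u00f6")); // prints "00 E4 00 F6"
    }
}
```

A string decoded with the wrong codepage typically shows up here as unexpected code units, for example 8-bit values where CJK characters such as U+4E2D ("4E 2D") were expected.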
--- a/src/java/org/apache/poi/hpsf/PropertySet.java
+++ b/src/java/org/apache/poi/hpsf/PropertySet.java
@@ -56,6 +56,7 @@ package org.apache.poi.hpsf;

 import java.io.IOException;
 import java.io.InputStream;
+import java.io.UnsupportedEncodingException;
 import java.util.ArrayList;
 import java.util.List;

@@ -300,9 +301,11 @@ public class PropertySet
     * @param length The length of the stream data.
     * @throws NoPropertySetStreamException if the byte array is not a
     * property set stream.
+     * 
+     * @exception UnsupportedEncodingException if the codepage is not supported
     */
    public PropertySet(final byte[] stream, final int offset, final int length)
-        throws NoPropertySetStreamException
+        throws NoPropertySetStreamException, UnsupportedEncodingException
    {
        if (isPropertySetStream(stream, offset, length))
            init(stream, offset, length);
@@ -321,8 +324,11 @@ public class PropertySet
     * complete byte array contents is the stream data.
     * @throws NoPropertySetStreamException if the byte array is not a
     * property set stream.
+     * 
+     * @exception UnsupportedEncodingException if the codepage is not supported
     */
-    public PropertySet(final byte[] stream) throws NoPropertySetStreamException
+    public PropertySet(final byte[] stream)
+    throws NoPropertySetStreamException, UnsupportedEncodingException
    {
        this(stream, 0, stream.length);
    }
@@ -435,6 +441,7 @@ public class PropertySet
     * @param length Length of the property set stream.
     */
    private void init(final byte[] src, final int offset, final int length)
+    throws UnsupportedEncodingException
    {
        /* FIXME (3): Ensure that at most "length" bytes are read. */
        
@@ -651,7 +658,7 @@ public class PropertySet
        final PropertySet ps = (PropertySet) o;
        int byteOrder1 = ps.getByteOrder();
        int byteOrder2 = getByteOrder();
-        ClassID classId1 = ps.getClassID();
+        ClassID classID1 = ps.getClassID();
        ClassID classID2 = getClassID();
        int format1 = ps.getFormat();
        int format2 = getFormat();
@@ -660,7 +667,7 @@ public class PropertySet
        int sectionCount1 = ps.getSectionCount();
        int sectionCount2 = getSectionCount();
        if (byteOrder1 != byteOrder2      ||
-            !classId1.equals(classID2)    ||
+            !classID1.equals(classID2)    ||
            format1 != format2            ||
            osVersion1 != osVersion2      ||
            sectionCount1 != sectionCount2)
--- a/src/java/org/apache/poi/hpsf/Section.java
+++ b/src/java/org/apache/poi/hpsf/Section.java
@@ -54,6 +54,7 @@
 */
 package org.apache.poi.hpsf;

+import java.io.UnsupportedEncodingException;
 import java.util.ArrayList;
 import java.util.Collections;
 import java.util.Iterator;
@@ -193,8 +194,12 @@ public class Section
     * @param src Contains the complete property set stream.
     * @param offset The position in the stream that points to the
     * section's format ID.
+     * 
+     * @exception UnsupportedEncodingException if the section's codepage is not
+     * supported.
     */
    public Section(final byte[] src, final int offset)
+    throws UnsupportedEncodingException
    {
        int o1 = offset;

@@ -638,4 +643,18 @@ public class Section
        return dictionary;
    }

+
+
+    /**
+     * <p>Gets the section's codepage, if any.</p>
+     *
+     * @return The section's codepage if one is defined, else -1.
+     */
+    public int getCodepage()
+    {
+        final Integer codepage =
+            (Integer) getProperty(PropertyIDMap.PID_CODEPAGE);
+        return codepage != null ? codepage.intValue() : -1;
+    }
+
 }
--- a/src/java/org/apache/poi/hpsf/TypeWriter.java
+++ b/src/java/org/apache/poi/hpsf/TypeWriter.java
@@ -185,7 +185,8 @@ public class TypeWriter
     * @exception IOException if an I/O error occurs
     */
    public static void writeToStream(final OutputStream out,
-                                     final Property[] properties)
+                                     final Property[] properties,
+                                     final int codepage)
        throws IOException, UnsupportedVariantTypeException
    {
        /* If there are no properties don't write anything. */
@@ -207,7 +208,7 @@ public class TypeWriter
            final Property p = (Property) properties[i];
            long type = p.getType();
            writeUIntToStream(out, type);
-            VariantSupport.write(out, (int) type, p.getValue());
+            VariantSupport.write(out, (int) type, p.getValue(), codepage);
        }
    }

--- a/src/java/org/apache/poi/hpsf/VariantSupport.java
+++ b/src/java/org/apache/poi/hpsf/VariantSupport.java
@@ -64,6 +64,7 @@ package org.apache.poi.hpsf;

 import java.io.IOException;
 import java.io.OutputStream;
+import java.io.UnsupportedEncodingException;
 import java.util.Date;
 import java.util.LinkedList;
 import java.util.List;
@@ -163,17 +164,21 @@ public class VariantSupport extends Variant
     * @param length The length of the variant including the variant
     * type field
     * @param type The variant type to read
+     * @param codepage The codepage to use to read non-wide strings
     * @return A Java object that corresponds best to the variant
     * field. For example, a VT_I4 is returned as a {@link Long}, a
     * VT_LPSTR as a {@link String}.
     * @exception ReadingNotSupportedException if a property is to be read
     * whose variant type HPSF does not yet support
+     * @exception UnsupportedEncodingException if the specified codepage is not
+     * supported
     *
     * @see Variant
     */
    public static Object read(final byte[] src, final int offset,
-                              final int length, final long type)
-        throws ReadingNotSupportedException
+                              final int length, final long type,
+                              final int codepage)
+        throws ReadingNotSupportedException, UnsupportedEncodingException
    {
        Object value;
        int o1 = offset;
@@ -221,18 +226,18 @@ public class VariantSupport extends Variant
                 * Read a byte string. In Java it is represented as a
                 * String object. The 0x00 bytes at the end must be
                 * stripped.
-                 *
-                 * FIXME (2): Reading an 8-bit string should pay attention
-                 * to the codepage. Currently the byte making out the
-                 * property's value are interpreted according to the
-                 * platform's default character set.
                 */
                final int first = o1 + LittleEndian.INT_SIZE;
                long last = first + LittleEndian.getUInt(src, o1) - 1;
                o1 += LittleEndian.INT_SIZE;
+                final int rawLength = (int) (last - first + 1);
                while (first <= last && src[(int) last] == 0)
                    last--;
-                value = new String(src, (int) first, (int) (last - first + 1));
+                final int l = (int) (last - first + 1);
+                value = codepage != -1 ?
+                    new String(src, (int) first, l,
+                               codepageToEncoding(codepage)) :
+                    new String(src, (int) first, l);
                break;
            }
            case Variant.VT_LPWSTR:
@@ -298,6 +303,38 @@ public class VariantSupport extends Variant



+    /**
+     * <p>Turns a codepage number into the equivalent character encoding's
+     * name.</p>
+     *
+     * @param codepage The codepage number
+     *
+     * @return The character encoding's name. If the codepage number is 1200,
+     * the encoding name is "UTF-16"; if it is 65001, the encoding name is
+     * "UTF-8". All other positive numbers are mapped to "cp" followed by the
+     * number, e.g. if the codepage number is 1252 the returned character
+     * encoding name will be "cp1252".
+     *
+     * @exception UnsupportedEncodingException if the specified codepage is
+     * less than or equal to zero.
+    public static String codepageToEncoding(final int codepage)
+    throws UnsupportedEncodingException
+    {
+        if (codepage <= 0)
+            throw new UnsupportedEncodingException
+                ("Codepage number may not be " + codepage);
+        switch (codepage)
+        {
+            case 1200:
+                return "UTF-16";
+            case 65001:
+                return "UTF-8";
+            default:
+                return "cp" + codepage;
+        }
+    }
+
+
    /**
     * <p>Writes a variant value to an output stream. This method ensures that
     * a multiple of 4 bytes is always written.</p>
@@ -305,6 +342,7 @@ public class VariantSupport extends Variant
     * @param out The stream to write the value to.
     * @param type The variant's type.
     * @param value The variant's value.
+     * @param codepage The codepage to use to write non-wide strings
     * @return The number of entities that have been written. In many cases an
     * "entity" is a byte but this is not always the case.
     * @exception IOException if an I/O exceptions occurs
@@ -312,7 +350,7 @@ public class VariantSupport extends Variant
     * whose variant type HPSF does not yet support
     */
    public static int write(final OutputStream out, final long type,
-                            final Object value)
+                            final Object value, final int codepage)
        throws IOException, WritingNotSupportedException
    {
        int length = 0;
@@ -330,16 +368,13 @@ public class VariantSupport extends Variant
            }
            case Variant.VT_LPSTR:
            {
-                length = TypeWriter.writeUIntToStream
-                    (out, ((String) value).length() + 1);
-                char[] s = Util.pad4((String) value);
-                /* FIXME (2): The following line forces characters to bytes.
-                 * This is generally wrong and should only be done according to
-                 * a codepage. Alternatively Unicode could be written (see 
-                 * Variant.VT_LPWSTR). */
-                byte[] b = new byte[s.length + 1];
-                for (int i = 0; i < s.length; i++)
-                    b[i] = (byte) s[i];
+                final byte[] bytes =
+                    (codepage == -1 ?
+                    ((String) value).getBytes() :
+                    ((String) value).getBytes(codepageToEncoding(codepage)));
+                length = TypeWriter.writeUIntToStream(out, bytes.length + 1);
+                final byte[] b = new byte[bytes.length + 1];
+                System.arraycopy(bytes, 0, b, 0, bytes.length);
                b[b.length - 1] = 0x00;
                out.write(b);
                length += b.length;
@@ -419,12 +454,13 @@ public class VariantSupport extends Variant
            }
        }

-        /* Add 0x00 character to write a multiple of four bytes: */
-        while (length % 4 != 0)
-        {
-            out.write(0);
-            length++;
-        }
+        /* Add 0x00 characters to write a multiple of four bytes: */
+        // FIXME (1) Try this!
+//        while (length % 4 != 0)
+//        {
+//            out.write(0);
+//            length++;
+//        }
        return length;
    }
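The codepage-aware VT_LPSTR handling in `VariantSupport.read()` and `write()` above can be condensed into a round-trip sketch. The class `LpstrRoundTrip` is hypothetical and uses `java.nio.ByteBuffer` rather than POI's `TypeWriter`/`LittleEndian`; it mirrors the committed logic (UInt byte count including the terminator, encoded bytes, one 0x00, no padding to four bytes, -1 meaning "platform default") and checks the index bound before reading `src[last]` when stripping trailing zeros.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UnsupportedEncodingException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class LpstrRoundTrip
{
    /* Same mapping as VariantSupport.codepageToEncoding() in this commit. */
    static String codepageToEncoding(final int codepage)
        throws UnsupportedEncodingException
    {
        if (codepage <= 0)
            throw new UnsupportedEncodingException
                ("Codepage number may not be " + codepage);
        switch (codepage)
        {
            case 1200:  return "UTF-16";
            case 65001: return "UTF-8";
            default:    return "cp" + codepage;
        }
    }

    /* Encodes a VT_LPSTR value: little-endian UInt byte count (including
     * the terminator), the encoded bytes, then one 0x00. */
    public static byte[] write(final String value, final int codepage)
        throws IOException
    {
        final byte[] bytes = codepage == -1
            ? value.getBytes()
            : value.getBytes(codepageToEncoding(codepage));
        final ByteBuffer count =
            ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN);
        count.putInt(bytes.length + 1);
        final ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(count.array());
        out.write(bytes);
        out.write(0);
        return out.toByteArray();
    }

    /* Decodes a VT_LPSTR value starting at offset, stripping trailing 0x00
     * bytes; the bound is checked before the array is indexed. */
    public static String read(final byte[] src, final int offset,
                              final int codepage)
        throws UnsupportedEncodingException
    {
        final long count = ByteBuffer.wrap(src).order(ByteOrder.LITTLE_ENDIAN)
            .getInt(offset) & 0xFFFFFFFFL;
        final int first = offset + 4;
        int last = (int) (first + count - 1);
        while (first <= last && src[last] == 0)
            last--;
        final int length = last - first + 1;
        return codepage == -1
            ? new String(src, first, length)
            : new String(src, first, length, codepageToEncoding(codepage));
    }
}
```

This byte-oriented scheme suits 8-bit codepages such as 1252 or 65001. With codepage 1200 it is shaky: `String.getBytes("UTF-16")` prepends a byte-order mark, and stripping trailing zeros can remove the low byte of a final character such as U+0100.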

--- a/src/testcases/org/apache/poi/hpsf/basic/TestWrite.java
+++ b/src/testcases/org/apache/poi/hpsf/basic/TestWrite.java
@@ -357,7 +357,10 @@ public class TestWrite extends TestCase
                    catch (Exception ex)
                    {
                        ex.printStackTrace();
-                        throw new RuntimeException(ex);
+                        throw new RuntimeException(ex.toString());
+                        /* FIXME (2): Replace the previous line by the following
+                         * one once we no longer need JDK 1.3 compatibility. */
+                        // throw new RuntimeException(ex);
                    }
                }
            },
@@ -398,37 +401,40 @@ public class TestWrite extends TestCase
    public void testVariantTypes()
    {
        Throwable t = null;
+        final int codepage = -1;
+        /* FIXME (2): Add tests for various codepages! */
        try
        {
-            check(Variant.VT_EMPTY, null);
-            check(Variant.VT_BOOL, new Boolean(true));
-            check(Variant.VT_BOOL, new Boolean(false));
-            check(Variant.VT_CF, new byte[]{0});
-            check(Variant.VT_CF, new byte[]{0, 1});
-            check(Variant.VT_CF, new byte[]{0, 1, 2});
-            check(Variant.VT_CF, new byte[]{0, 1, 2, 3});
-            check(Variant.VT_CF, new byte[]{0, 1, 2, 3, 4});
-            check(Variant.VT_CF, new byte[]{0, 1, 2, 3, 4, 5});
-            check(Variant.VT_CF, new byte[]{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10});
-            check(Variant.VT_I2, new Integer(27));
-            check(Variant.VT_I4, new Long(28));
-            check(Variant.VT_FILETIME, new Date());
-            check(Variant.VT_LPSTR, "");
-            check(Variant.VT_LPSTR, "ä");
-            check(Variant.VT_LPSTR, "äö");
-            check(Variant.VT_LPSTR, "äöü");
-            check(Variant.VT_LPSTR, "äöüÄ");
-            check(Variant.VT_LPSTR, "äöüÄÖ");
-            check(Variant.VT_LPSTR, "äöüÄÖÜ");
-            check(Variant.VT_LPSTR, "äöüÄÖÜß");
-            check(Variant.VT_LPWSTR, "");
-            check(Variant.VT_LPWSTR, "ä");
-            check(Variant.VT_LPWSTR, "äö");
-            check(Variant.VT_LPWSTR, "äöü");
-            check(Variant.VT_LPWSTR, "äöüÄ");
-            check(Variant.VT_LPWSTR, "äöüÄÖ");
-            check(Variant.VT_LPWSTR, "äöüÄÖÜ");
-            check(Variant.VT_LPWSTR, "äöüÄÖÜß");
+            check(Variant.VT_EMPTY, null, codepage);
+            check(Variant.VT_BOOL, new Boolean(true), codepage);
+            check(Variant.VT_BOOL, new Boolean(false), codepage);
+            check(Variant.VT_CF, new byte[]{0}, codepage);
+            check(Variant.VT_CF, new byte[]{0, 1}, codepage);
+            check(Variant.VT_CF, new byte[]{0, 1, 2}, codepage);
+            check(Variant.VT_CF, new byte[]{0, 1, 2, 3}, codepage);
+            check(Variant.VT_CF, new byte[]{0, 1, 2, 3, 4}, codepage);
+            check(Variant.VT_CF, new byte[]{0, 1, 2, 3, 4, 5}, codepage);
+            check(Variant.VT_CF, new byte[]{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10}, 
+                  codepage);
+            check(Variant.VT_I2, new Integer(27), codepage);
+            check(Variant.VT_I4, new Long(28), codepage);
+            check(Variant.VT_FILETIME, new Date(), codepage);
+            check(Variant.VT_LPSTR, "", codepage);
+            check(Variant.VT_LPSTR, "ä", codepage);
+            check(Variant.VT_LPSTR, "äö", codepage);
+            check(Variant.VT_LPSTR, "äöü", codepage);
+            check(Variant.VT_LPSTR, "äöüÄ", codepage);
+            check(Variant.VT_LPSTR, "äöüÄÖ", codepage);
+            check(Variant.VT_LPSTR, "äöüÄÖÜ", codepage);
+            check(Variant.VT_LPSTR, "äöüÄÖÜß", codepage);
+            check(Variant.VT_LPWSTR, "", codepage);
+            check(Variant.VT_LPWSTR, "ä", codepage);
+            check(Variant.VT_LPWSTR, "äö", codepage);
+            check(Variant.VT_LPWSTR, "äöü", codepage);
+            check(Variant.VT_LPWSTR, "äöüÄ", codepage);
+            check(Variant.VT_LPWSTR, "äöüÄÖ", codepage);
+            check(Variant.VT_LPWSTR, "äöüÄÖÜ", codepage);
+            check(Variant.VT_LPWSTR, "äöüÄÖÜß", codepage);
        }
        catch (Exception ex)
        {
@@ -466,20 +472,22 @@ public class TestWrite extends TestCase
     * @throws UnsupportedVariantTypeException if the variant is not supported.
     * @throws IOException if an I/O exception occurs.
     */
-    private void check(final long variantType, final Object value)
+    private void check(final long variantType, final Object value, 
+                       final int codepage)
        throws UnsupportedVariantTypeException, IOException
    {
        final ByteArrayOutputStream out = new ByteArrayOutputStream();
-        VariantSupport.write(out, variantType, value);
+        VariantSupport.write(out, variantType, value, codepage);
        out.close();
        final byte[] b = out.toByteArray();
        final Object objRead =
            VariantSupport.read(b, 0, b.length + LittleEndian.INT_SIZE,
-                                variantType);
+                                variantType, codepage);
        if (objRead instanceof byte[])
        {
-            final int diff = diff(org.apache.poi.hpsf.Util.pad4
-                ((byte[]) value), (byte[]) objRead);
+//            final int diff = diff(org.apache.poi.hpsf.Util.pad4
+//                ((byte[]) value), (byte[]) objRead);
+            final int diff = diff((byte[]) value, (byte[]) objRead);
            if (diff >= 0)
                fail("Byte arrays are different. First different byte is at " +
                     "index " + diff + ".");
--- a/src/testcases/org/apache/poi/hpsf/data/TestChineseProperties.doc
+++ b/src/testcases/org/apache/poi/hpsf/data/TestChineseProperties.doc