HPSF: codepage support added

git-svn-id: https://svn.apache.org/repos/asf/jakarta/poi/trunk@353460 13f79535-47bb-0310-9956-ffa450edef68
2003-12-02 17:46:01 +00:00 · 131bb9d0bd
commit 131bb9d0bd
parent 6385296f3f
15 changed files with 276 additions and 103 deletions
--- a/src/documentation/content/xdocs/changes.xml
+++ b/src/documentation/content/xdocs/changes.xml
@ -12,7 +12,11 @@
        <person id="MJ" name="Marc Johnson" email="mjohnson@apache.org"/>
        <person id="NKB" name="Nicola Ken Barozzi" email="barozzi@nicolaken.com"/>
        <person id="POI-DEVELOPERS" name="POI Developers" email="poi-dev@jakarta.apache.org"/>
        <person id="RK" name="Rainer Klute" email="klute@apache.org"/>
    </devs>
    <release version="2.0-pre3" date="unreleased">
        <action dev="RK" type="add">HPSF: Much better codepage support</action>
    </release>
    <release version="2.0-pre1" date="unreleased">
        <action dev="POI-DEVELOPERS" type="add">Patch applied for deep cloning of worksheets was provided</action>
        <action dev="POI-DEVELOPERS" type="add">Patch applied to allow sheet reordering</action>
--- a/src/documentation/content/xdocs/hpsf/how-to.xml
+++ b/src/documentation/content/xdocs/hpsf/how-to.xml
@ -708,8 +708,9 @@ No property set stream: "/1Table"</source>
       <td>The property's value is the number of a <strong>codepage</strong>,
        i.e. a mapping from character codes to characters. All strings in the
        section containing this property must be interpreted using this
-        codepage. Typical property values are 1252 (8-bit "western" characters)
-        or 1200 (16-bit Unicode characters).</td>
+        codepage. Typical property values are 1252 (8-bit "western" characters,
+	ISO-8859-1), 1200 (16-bit Unicode characters, UTF-16), or 65001 (8-bit
+	Unicode characters, UTF-8).</td>
      </tr>
     </table>
    </section>
@ -833,18 +834,34 @@ No property set stream: "/1Table"</source>
    </section>
    <section><title>Codepage support</title>
     <fixme author="Rainer Klute">Improve codepage support!</fixme>
     <p>The property with ID 1 holds the number of the codepage which was used
-      to encode the strings in this section. The present HPSF codepage support
-      is still very limited: When reading property value strings, HPSF
-      distinguishes between 16-bit characters and 8-bit characters. 16-bit
-      characters should be Unicode characters and thus be okay. 8-bit
-      characters are interpreted according to the platform's default character
-      set. This is fine as long as the document being read has been written on
-      a platform with the same default character set. However, if you receive a
-      document from another region of the world and want to process it with
-      HPSF you are in trouble - unless the creator used Unicode, of course.</p>
+      to encode the strings in this section. If this property is not available
+      in a section, the platform's default character encoding will be
+      used. This works fine as long as the document being read has been written
+      on a platform with the same default character encoding. However, if you
+      receive a document from another region of the world and the codepage is
+      undefined, you are in trouble.</p>
+
+     <p>HPSF's codepage support is as good as the character encoding support of
+      the Java Virtual Machine (JVM) the application runs on. If HPSF
+      encounters a codepage number it assumes that the JVM has a character
+      encoding with a corresponding name. For example, if the codepage is 1252,
+      HPSF uses the character encoding "cp1252" to read or write strings. If
+      the JVM does not have that character encoding installed or if the
+      codepage number is illegal, an UnsupportedEncodingException will be
+      thrown.</p>
+     <p>There are two exceptions to the rule that a character encoding's name
+      is derived from the codepage number by prepending the string "cp" to
+      it:</p>
+     <dl>
+      <dt>Codepage 1200</dt>
+      <dd>is mapped to the character encoding "UTF-16".</dd>
+      <dt>Codepage 65001</dt>
+      <dd>is mapped to the character encoding "UTF-8".</dd>
+     </dl>
    </section>
   </section>
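Reviewer note: the mapping rule described in the how-to text above can be tried in isolation. The following is a minimal standalone sketch, not the HPSF class itself; the class name CodepageDemo and the helper decode() are hypothetical, and the fallback to the platform default for -1 follows the behaviour the text describes for a missing codepage property.

```java
import java.io.UnsupportedEncodingException;

public class CodepageDemo {

    /** Maps a codepage number to a Java encoding name, as the how-to describes. */
    static String codepageToEncoding(final int codepage)
        throws UnsupportedEncodingException {
        if (codepage <= 0) {
            throw new UnsupportedEncodingException(
                "Codepage number may not be " + codepage);
        }
        switch (codepage) {
            case 1200:
                return "UTF-16";      // first exception to the "cp" + number rule
            case 65001:
                return "UTF-8";       // second exception
            default:
                return "cp" + codepage;
        }
    }

    /** Hypothetical helper: decodes raw bytes; -1 means "no codepage defined". */
    static String decode(final byte[] raw, final int codepage)
        throws UnsupportedEncodingException {
        return codepage == -1
            ? new String(raw)                               // platform default
            : new String(raw, codepageToEncoding(codepage));
    }

    public static void main(final String[] args)
        throws UnsupportedEncodingException {
        // 0xE4 0xF6 0xFC are the cp1252 bytes for a-umlaut, o-umlaut, u-umlaut.
        final byte[] western = {(byte) 0xE4, (byte) 0xF6, (byte) 0xFC};
        System.out.println(decode(western, 1252));

        // The same a-umlaut encoded as UTF-8 takes two bytes.
        final byte[] utf8 = {(byte) 0xC3, (byte) 0xA4};
        System.out.println(decode(utf8, 65001));
    }
}
```

Decoding the three cp1252 bytes with codepage 65001 instead would yield replacement characters, which is the cross-region problem the paragraph warns about when the codepage is unknown.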
--- a/src/documentation/content/xdocs/hpsf/internals.xml
+++ b/src/documentation/content/xdocs/hpsf/internals.xml
@ -944,6 +944,60 @@
   <section>
    <title>The Dictionary</title>
    <p>What a dictionary is good for is explained in the <link
      href="how-to.html">HPSF HOW-TO</link>. This chapter explains how it is
     organized internally.</p>
    <p>The dictionary has a simple header consisting of a single UInt value. It
    tells how many entries the dictionary comprises:</p>
    <table>
     <tr>
      <th>Name</th>
      <th>Data type</th>
      <th>Description</th>
     </tr>
     <tr>
      <td>nrEntries</td>
      <td>UInt</td>
      <td>Number of dictionary entries</td>
     </tr>
    </table>
    <p>The dictionary entries follow the header. Each one looks like this:</p>
    <table>
     <tr>
      <th>Name</th>
      <th>Data type</th>
      <th>Description</th>
     </tr>
     <tr>
      <td>key</td>
      <td>UInt</td>
      <td>The unique number of this property, i.e. the PID</td>
     </tr>
     <tr>
      <td>length</td>
      <td>UInt</td>
      <td>The length of the property name associated with the key</td>
     </tr>
     <tr>
      <td>value</td>
      <td>String</td>
      <td>The property's name, terminated with a 0x00 character</td>
     </tr>
    </table>
    <p>The entries are not aligned, i.e. each one follows its predecessor
     without any gap or fill characters.</p>
   </section>
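Reviewer note: the dictionary layout documented above (a UInt entry count, then key, length and 0x00-terminated name per entry, with no padding) can be walked with a few lines of code. This is a sketch under stated assumptions, not HPSF's reader: UInt is taken to be a 4-byte little-endian value, the length field is assumed to count bytes including the terminating 0x00, and the names are assumed to be 8-bit strings in the given charset. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class DictionarySketch {

    /** Reads nrEntries, then (key, length, value) triples without alignment. */
    static Map<Long, String> readDictionary(final byte[] src, final int offset,
                                            final Charset charset) {
        final ByteBuffer buf = ByteBuffer.wrap(src, offset, src.length - offset)
                                         .order(ByteOrder.LITTLE_ENDIAN);
        final long nrEntries = buf.getInt() & 0xFFFFFFFFL;   // header
        final Map<Long, String> dictionary = new LinkedHashMap<>();
        for (long i = 0; i < nrEntries; i++) {
            final long key = buf.getInt() & 0xFFFFFFFFL;     // the PID
            final int length = buf.getInt();                 // assumed: bytes incl. 0x00
            final byte[] raw = new byte[length];
            buf.get(raw);
            int end = length;
            while (end > 0 && raw[end - 1] == 0) {           // strip the terminator
                end--;
            }
            dictionary.put(key, new String(raw, 0, end, charset));
        }
        return dictionary;
    }

    /** Builds a two-entry sample dictionary with made-up property names. */
    static byte[] sample() {
        final ByteBuffer buf = ByteBuffer.allocate(4 + (8 + 6) + (8 + 4))
                                         .order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt(2);
        buf.putInt(2).putInt(6).put("Title\0".getBytes(StandardCharsets.US_ASCII));
        buf.putInt(3).putInt(4).put("Foo\0".getBytes(StandardCharsets.US_ASCII));
        return buf.array();
    }

    public static void main(final String[] args) {
        System.out.println(readDictionary(sample(), 0, StandardCharsets.US_ASCII));
    }
}
```

Because the entries are not aligned, the second entry in the sample starts directly after the first name's terminating byte; the reader never skips fill bytes.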
   <section><title>References</title>
    <p>In order to assemble the HPSF description I used information publicly
--- a/src/documentation/content/xdocs/hpsf/todo.xml
+++ b/src/documentation/content/xdocs/hpsf/todo.xml
@ -20,11 +20,6 @@
     easily writing summary information streams and document summary
     information streams.
    </li>
-    <li>
-     Add codepage support: Presently the bytes making out the string in a
-      property's value are interpreted using the platform's default character
-      set.
-    </li>
    <li>
     Add resource bundles to
     <code>org.apache.poi.hpsf.wellknown</code> to ease
@ -38,8 +33,8 @@
     arrays.
    </li>
    <li>
-     Add WMF to <code>java.awt.Image</code> example code in <link
-     href="thumbnails.html">Thumbnail HOW TO</link>.
+     Add WMF to <code>java.awt.Image</code> example code in the <link
+      href="thumbnails.html">Thumbnail HOW-TO</link>.
    </li>
   </ol>
  </section>
--- a/src/examples/src/org/apache/poi/hpsf/examples/CopyCompare.java
+++ b/src/examples/src/org/apache/poi/hpsf/examples/CopyCompare.java
@ -558,7 +558,10 @@ public class CopyCompare
                 * exists. However, since we have full control about directory
                 * creation we can ensure that this will never happen. */
                ex.printStackTrace(System.err);
-                throw new RuntimeException(ex);
+                throw new RuntimeException(ex.toString());
                /* FIXME (2): Replace the previous line by the following once we
                 * no longer need JDK 1.3 compatibility. */
                // throw new RuntimeException(ex);
            }
        }
    }
--- a/src/examples/src/org/apache/poi/hpsf/examples/WriteAuthorAndTitle.java
+++ b/src/examples/src/org/apache/poi/hpsf/examples/WriteAuthorAndTitle.java
@ -444,7 +444,10 @@ public class WriteAuthorAndTitle
                 * exists. However, since we have full control about directory
                 * creation we can ensure that this will never happen. */
                ex.printStackTrace(System.err);
-                throw new RuntimeException(ex);
+                throw new RuntimeException(ex.toString());
                /* FIXME (2): Replace the previous line by the following once we
                 * no longer need JDK 1.3 compatibility. */
                // throw new RuntimeException(ex);
            }
        }
    }
--- a/src/java/org/apache/poi/hpsf/MutableProperty.java
+++ b/src/java/org/apache/poi/hpsf/MutableProperty.java
@ -80,19 +80,20 @@ public class MutableProperty extends Property
     * <p>Writes the property to an output stream.</p>
     * 
     * @param out The output stream to write to.
     * @param codepage The codepage to use for writing non-wide strings
     * @return the number of bytes written to the stream
     * 
     * @exception IOException if an I/O error occurs
     * @exception WritingNotSupportedException if a variant type is to be
     * written that is not yet supported
     */
-    public int write(final OutputStream out)
+    public int write(final OutputStream out, final int codepage)
        throws IOException, WritingNotSupportedException
    {
        int length = 0;
        long variantType = getType();
        length += TypeWriter.writeUIntToStream(out, variantType);
-        length += VariantSupport.write(out, variantType, getValue());
+        length += VariantSupport.write(out, variantType, getValue(), codepage);
        return length;
    }
--- a/src/java/org/apache/poi/hpsf/MutableSection.java
+++ b/src/java/org/apache/poi/hpsf/MutableSection.java
@ -420,16 +420,16 @@ public class MutableSection extends Section
            /* If the property ID is not equal 0 we write the property and all
             * is fine. However, if it equals 0 we have to write the section's
-             * dictionary which does not have a type but just a value. */
+             * dictionary which has an implicit type only and an explicit
+             * value. */
            if (id != 0)
                /* Write the property and update the position to the next
                 * property. */
-                position += p.write(propertyStream);
+                position += p.write(propertyStream, getCodepage());
            else
            {
-                final Integer codepage =
-                    (Integer) getProperty(PropertyIDMap.PID_CODEPAGE);
-                if (codepage == null)
+                final int codepage = getCodepage();
+                if (codepage == -1)
                    throw new IllegalPropertySetDataException
                        ("Codepage (property 1) is undefined.");
                position += writeDictionary(propertyStream, dictionary);
--- a/src/java/org/apache/poi/hpsf/Property.java
+++ b/src/java/org/apache/poi/hpsf/Property.java
@ -62,9 +62,11 @@
 */
 package org.apache.poi.hpsf;
 import java.io.UnsupportedEncodingException;
 import java.util.HashMap;
 import java.util.Map;
 import org.apache.poi.util.HexDump;
 import org.apache.poi.util.LittleEndian;
 /**
@ -161,9 +163,13 @@ public class Property
     * @param length The property's type/value pair's length in bytes.
     * @param codepage The section's and thus the property's
     * codepage. It is needed only when reading string values.
     * 
     * @exception UnsupportedEncodingException if the specified codepage is not
     * supported
     */
    public Property(final long id, final byte[] src, final long offset,
                    final int length, final int codepage)
    throws UnsupportedEncodingException
    {
        this.id = id;
@ -183,7 +189,7 @@ public class Property
        try
        {
-            value = VariantSupport.read(src, o, length, (int) type);
+            value = VariantSupport.read(src, o, length, (int) type, codepage);
        }
        catch (UnsupportedVariantTypeException ex)
        {
@ -382,8 +388,27 @@ public class Property
        b.append(getID());
        b.append(", type: ");
        b.append(getType());
        final Object value = getValue();
        b.append(", value: ");
-        b.append(getValue());
+        b.append(value.toString());
        if (value instanceof String)
        {
            final String s = (String) value;
            final int l = s.length();
            final byte[] bytes = new byte[l * 2];
            for (int i = 0; i < l; i++)
            {
                final char c = s.charAt(i);
                final byte high = (byte) ((c & 0x00ff00) >> 8);
                final byte low  = (byte) ((c & 0x0000ff) >> 0);
                bytes[i * 2]     = high;
                bytes[i * 2 + 1] = low;
            }
            final String hex = HexDump.dump(bytes, 0L, 0);
            b.append(" [");
            b.append(hex);
            b.append("]");
        }
        b.append(']');
        return b.toString();
    }
--- a/src/java/org/apache/poi/hpsf/PropertySet.java
+++ b/src/java/org/apache/poi/hpsf/PropertySet.java
@ -56,6 +56,7 @@ package org.apache.poi.hpsf;
 import java.io.IOException;
 import java.io.InputStream;
 import java.io.UnsupportedEncodingException;
 import java.util.ArrayList;
 import java.util.List;
@ -300,9 +301,11 @@ public class PropertySet
     * @param length The length of the stream data.
     * @throws NoPropertySetStreamException if the byte array is not a
     * property set stream.
     * 
     * @exception UnsupportedEncodingException if the codepage is not supported
     */
    public PropertySet(final byte[] stream, final int offset, final int length)
-        throws NoPropertySetStreamException
+        throws NoPropertySetStreamException, UnsupportedEncodingException
    {
        if (isPropertySetStream(stream, offset, length))
            init(stream, offset, length);
@ -321,8 +324,11 @@ public class PropertySet
     * complete byte array contents is the stream data.
     * @throws NoPropertySetStreamException if the byte array is not a
     * property set stream.
     * 
     * @exception UnsupportedEncodingException if the codepage is not supported
     */
-    public PropertySet(final byte[] stream) throws NoPropertySetStreamException
+    public PropertySet(final byte[] stream)
+    throws NoPropertySetStreamException, UnsupportedEncodingException
    {
        this(stream, 0, stream.length);
    }
@ -435,6 +441,7 @@ public class PropertySet
     * @param length Length of the property set stream.
     */
    private void init(final byte[] src, final int offset, final int length)
    throws UnsupportedEncodingException
    {
        /* FIXME (3): Ensure that at most "length" bytes are read. */
@ -651,7 +658,7 @@ public class PropertySet
        final PropertySet ps = (PropertySet) o;
        int byteOrder1 = ps.getByteOrder();
        int byteOrder2 = getByteOrder();
-        ClassID classId1 = ps.getClassID();
+        ClassID classID1 = ps.getClassID();
        ClassID classID2 = getClassID();
        int format1 = ps.getFormat();
        int format2 = getFormat();
@ -660,7 +667,7 @@ public class PropertySet
        int sectionCount1 = ps.getSectionCount();
        int sectionCount2 = getSectionCount();
        if (byteOrder1 != byteOrder2      ||
-            !classId1.equals(classID2)    ||
+            !classID1.equals(classID2)    ||
            format1 != format2            ||
            osVersion1 != osVersion2      ||
            sectionCount1 != sectionCount2)
--- a/src/java/org/apache/poi/hpsf/Section.java
+++ b/src/java/org/apache/poi/hpsf/Section.java
@ -54,6 +54,7 @@
 */
 package org.apache.poi.hpsf;
 import java.io.UnsupportedEncodingException;
 import java.util.ArrayList;
 import java.util.Collections;
 import java.util.Iterator;
@ -193,8 +194,12 @@ public class Section
     * @param src Contains the complete property set stream.
     * @param offset The position in the stream that points to the
     * section's format ID.
     * 
     * @exception UnsupportedEncodingException if the section's codepage is not
     * supported.
     */
    public Section(final byte[] src, final int offset)
    throws UnsupportedEncodingException
    {
        int o1 = offset;
@ -638,4 +643,18 @@ public class Section
        return dictionary;
    }
    /**
     * <p>Gets the section's codepage, if any.</p>
     *
     * @return The section's codepage if one is defined, else -1.
     */
    public int getCodepage()
    {
        final Integer codepage =
            (Integer) getProperty(PropertyIDMap.PID_CODEPAGE);
        return codepage != null ? codepage.intValue() : -1;
    }
 }
--- a/src/java/org/apache/poi/hpsf/TypeWriter.java
+++ b/src/java/org/apache/poi/hpsf/TypeWriter.java
@ -185,7 +185,8 @@ public class TypeWriter
     * @exception IOException if an I/O error occurs
     */
    public static void writeToStream(final OutputStream out,
-                                     final Property[] properties)
+                                     final Property[] properties,
+                                     final int codepage)
        throws IOException, UnsupportedVariantTypeException
    {
        /* If there are no properties don't write anything. */
@ -207,7 +208,7 @@ public class TypeWriter
            final Property p = (Property) properties[i];
            long type = p.getType();
            writeUIntToStream(out, type);
-            VariantSupport.write(out, (int) type, p.getValue());
+            VariantSupport.write(out, (int) type, p.getValue(), codepage);
        }
    }
--- a/src/java/org/apache/poi/hpsf/VariantSupport.java
+++ b/src/java/org/apache/poi/hpsf/VariantSupport.java
@ -64,6 +64,7 @@ package org.apache.poi.hpsf;
 import java.io.IOException;
 import java.io.OutputStream;
 import java.io.UnsupportedEncodingException;
 import java.util.Date;
 import java.util.LinkedList;
 import java.util.List;
@ -163,17 +164,21 @@ public class VariantSupport extends Variant
     * @param length The length of the variant including the variant
     * type field
     * @param type The variant type to read
     * @param codepage The codepage to use to read non-wide strings
     * @return A Java object that corresponds best to the variant
     * field. For example, a VT_I4 is returned as a {@link Long}, a
     * VT_LPSTR as a {@link String}.
     * @exception ReadingNotSupportedException if a property is to be read
     * whose variant type HPSF does not yet support
     * @exception UnsupportedEncodingException if the specified codepage is not
     * supported
     *
     * @see Variant
     */
    public static Object read(final byte[] src, final int offset,
-                              final int length, final long type)
-        throws ReadingNotSupportedException
+                              final int length, final long type,
+                              final int codepage)
+        throws ReadingNotSupportedException, UnsupportedEncodingException
    {
        Object value;
        int o1 = offset;
@ -221,18 +226,18 @@ public class VariantSupport extends Variant
                 * Read a byte string. In Java it is represented as a
                 * String object. The 0x00 bytes at the end must be
                 * stripped.
                 *
                 * FIXME (2): Reading an 8-bit string should pay attention
                 * to the codepage. Currently the byte making out the
                 * property's value are interpreted according to the
                 * platform's default character set.
                 */
                final int first = o1 + LittleEndian.INT_SIZE;
                long last = first + LittleEndian.getUInt(src, o1) - 1;
                o1 += LittleEndian.INT_SIZE;
                final int rawLength = (int) (last - first + 1);
                while (src[(int) last] == 0 && first <= last)
                    last--;
-                value = new String(src, (int) first, (int) (last - first + 1));
+                final int l = (int) (last - first + 1);
+                value = codepage != -1 ?
+                    new String(src, (int) first, l,
+                               codepageToEncoding(codepage)) :
+                    new String(src, (int) first, l);
                break;
            }
            case Variant.VT_LPWSTR:
@ -298,6 +303,38 @@ public class VariantSupport extends Variant
    /**
     * <p>Turns a codepage number into the equivalent character encoding's 
     * name.</p>
     *
     * @param codepage The codepage number
     * 
     * @return The character encoding's name. If the codepage number is 1200,
     * the encoding name is "UTF-16"; if it is 65001, the encoding name is
     * "UTF-8". All other positive numbers are mapped to "cp" followed by the
     * number, e.g. if the codepage number is 1252 the returned character
     * encoding name will be "cp1252".
     * 
     * @exception UnsupportedEncodingException if the specified codepage is
     * less than or equal to zero.
     */
    public static String codepageToEncoding(final int codepage)
    throws UnsupportedEncodingException
    {
        if (codepage <= 0)
            throw new UnsupportedEncodingException
                ("Codepage number may not be " + codepage);
        switch (codepage)
        {
            case 1200:
                return "UTF-16";
            case 65001:
                return "UTF-8";
            default:
                return "cp" + codepage;
        }
    }
    /**
     * <p>Writes a variant value to an output stream. This method ensures that
     * always a multiple of 4 bytes is written.</p>
@ -305,6 +342,7 @@ public class VariantSupport extends Variant
     * @param out The stream to write the value to.
     * @param type The variant's type.
     * @param value The variant's value.
     * @param codepage The codepage to use to write non-wide strings
     * @return The number of entities that have been written. In many cases an
     * "entity" is a byte but this is not always the case.
     * @exception IOException if an I/O exception occurs
@ -312,7 +350,7 @@ public class VariantSupport extends Variant
     * whose variant type HPSF does not yet support
     */
    public static int write(final OutputStream out, final long type,
-                            final Object value)
+                            final Object value, final int codepage)
        throws IOException, WritingNotSupportedException
    {
        int length = 0;
@ -330,16 +368,13 @@ public class VariantSupport extends Variant
            }
            case Variant.VT_LPSTR:
            {
-                length = TypeWriter.writeUIntToStream
-                    (out, ((String) value).length() + 1);
-                char[] s = Util.pad4((String) value);
-                /* FIXME (2): The following line forces characters to bytes.
-                 * This is generally wrong and should only be done according to
-                 * a codepage. Alternatively Unicode could be written (see 
-                 * Variant.VT_LPWSTR). */
-                byte[] b = new byte[s.length + 1];
-                for (int i = 0; i < s.length; i++)
-                    b[i] = (byte) s[i];
+                final byte[] bytes =
+                    (codepage == -1 ?
+                    ((String) value).getBytes() :
+                    ((String) value).getBytes(codepageToEncoding(codepage)));
+                length = TypeWriter.writeUIntToStream(out, bytes.length + 1);
+                final byte[] b = new byte[bytes.length + 1];
+                System.arraycopy(bytes, 0, b, 0, bytes.length);
                b[b.length - 1] = 0x00;
                out.write(b);
                length += b.length;
@ -419,12 +454,13 @@ public class VariantSupport extends Variant
            }
        }
-        /* Add 0x00 character to write a multiple of four bytes: */
-        while (length % 4 != 0)
-        {
-            out.write(0);
-            length++;
-        }
+        /* Add 0x00 characters to write a multiple of four bytes: */
+        // FIXME (1) Try this!
+//        while (length % 4 != 0)
+//        {
+//            out.write(0);
+//            length++;
+//        }
        return length;
    }
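Reviewer note: the new VT_LPSTR handling in this file encodes the string with the codepage's character encoding, writes the byte count plus one as the length, and appends a single 0x00; the read side strips trailing 0x00 bytes before decoding. The sketch below replays that round trip outside HPSF so the effect of the codepage on the on-disk size is visible. The class and method names are hypothetical, the 4-byte little-endian length field is an assumption, and no padding to a multiple of four is written.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;

public class LpstrSketch {

    /** Encodes value as: UInt length (bytes + 1), the bytes, one 0x00. */
    static byte[] encodeLpstr(final String value, final String encoding)
        throws IOException {
        final byte[] bytes = value.getBytes(encoding);
        final int length = bytes.length + 1;
        final ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(length & 0xFF);              // little-endian length field
        out.write((length >>> 8) & 0xFF);
        out.write((length >>> 16) & 0xFF);
        out.write((length >>> 24) & 0xFF);
        out.write(bytes);
        out.write(0x00);                       // terminator
        return out.toByteArray();
    }

    /** Reverses encodeLpstr: reads the length, strips trailing 0x00 bytes. */
    static String decodeLpstr(final byte[] src, final String encoding)
        throws IOException {
        final int length = (src[0] & 0xFF)
            | (src[1] & 0xFF) << 8
            | (src[2] & 0xFF) << 16
            | (src[3] & 0xFF) << 24;
        int end = 4 + length;
        while (end > 4 && src[end - 1] == 0) {
            end--;
        }
        return new String(src, 4, end - 4, encoding);
    }

    public static void main(final String[] args) throws IOException {
        final String umlauts = "\u00e4\u00f6\u00fc";
        // One byte per character in cp1252, two per character in UTF-8.
        System.out.println(encodeLpstr(umlauts, "cp1252").length);
        System.out.println(encodeLpstr(umlauts, "UTF-8").length);
        System.out.println(umlauts.equals(
            decodeLpstr(encodeLpstr(umlauts, "UTF-8"), "UTF-8")));
    }
}
```

The length field counts encoded bytes rather than characters, which is why the old `((String) value).length() + 1` was only correct for single-byte encodings.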
--- a/src/testcases/org/apache/poi/hpsf/basic/TestWrite.java
+++ b/src/testcases/org/apache/poi/hpsf/basic/TestWrite.java
@ -357,7 +357,10 @@ public class TestWrite extends TestCase
                    catch (Exception ex)
                    {
                        ex.printStackTrace();
-                        throw new RuntimeException(ex);
+                        throw new RuntimeException(ex.toString());
                        /* FIXME (2): Replace the previous line by the following
                         * one once we no longer need JDK 1.3 compatibility. */
                        // throw new RuntimeException(ex);
                    }
                }
            },
@ -398,37 +401,40 @@ public class TestWrite extends TestCase
    public void testVariantTypes()
    {
        Throwable t = null;
+        final int codepage = -1;
+        /* FIXME (2): Add tests for various codepages! */
        try
        {
-            check(Variant.VT_EMPTY, null);
-            check(Variant.VT_BOOL, new Boolean(true));
-            check(Variant.VT_BOOL, new Boolean(false));
-            check(Variant.VT_CF, new byte[]{0});
-            check(Variant.VT_CF, new byte[]{0, 1});
-            check(Variant.VT_CF, new byte[]{0, 1, 2});
-            check(Variant.VT_CF, new byte[]{0, 1, 2, 3});
-            check(Variant.VT_CF, new byte[]{0, 1, 2, 3, 4});
-            check(Variant.VT_CF, new byte[]{0, 1, 2, 3, 4, 5});
-            check(Variant.VT_CF, new byte[]{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10});
-            check(Variant.VT_I2, new Integer(27));
-            check(Variant.VT_I4, new Long(28));
-            check(Variant.VT_FILETIME, new Date());
-            check(Variant.VT_LPSTR, "");
-            check(Variant.VT_LPSTR, "ä");
-            check(Variant.VT_LPSTR, "äö");
-            check(Variant.VT_LPSTR, "äöü");
-            check(Variant.VT_LPSTR, "äöüÄ");
-            check(Variant.VT_LPSTR, "äöüÄÖ");
-            check(Variant.VT_LPSTR, "äöüÄÖÜ");
-            check(Variant.VT_LPSTR, "äöüÄÖÜß");
-            check(Variant.VT_LPWSTR, "");
-            check(Variant.VT_LPWSTR, "ä");
-            check(Variant.VT_LPWSTR, "äö");
-            check(Variant.VT_LPWSTR, "äöü");
-            check(Variant.VT_LPWSTR, "äöüÄ");
-            check(Variant.VT_LPWSTR, "äöüÄÖ");
-            check(Variant.VT_LPWSTR, "äöüÄÖÜ");
-            check(Variant.VT_LPWSTR, "äöüÄÖÜß");
+            check(Variant.VT_EMPTY, null, codepage);
+            check(Variant.VT_BOOL, new Boolean(true), codepage);
+            check(Variant.VT_BOOL, new Boolean(false), codepage);
+            check(Variant.VT_CF, new byte[]{0}, codepage);
+            check(Variant.VT_CF, new byte[]{0, 1}, codepage);
+            check(Variant.VT_CF, new byte[]{0, 1, 2}, codepage);
+            check(Variant.VT_CF, new byte[]{0, 1, 2, 3}, codepage);
+            check(Variant.VT_CF, new byte[]{0, 1, 2, 3, 4}, codepage);
+            check(Variant.VT_CF, new byte[]{0, 1, 2, 3, 4, 5}, codepage);
+            check(Variant.VT_CF, new byte[]{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10},
+                  codepage);
+            check(Variant.VT_I2, new Integer(27), codepage);
+            check(Variant.VT_I4, new Long(28), codepage);
+            check(Variant.VT_FILETIME, new Date(), codepage);
+            check(Variant.VT_LPSTR, "", codepage);
+            check(Variant.VT_LPSTR, "ä", codepage);
+            check(Variant.VT_LPSTR, "äö", codepage);
+            check(Variant.VT_LPSTR, "äöü", codepage);
+            check(Variant.VT_LPSTR, "äöüÄ", codepage);
+            check(Variant.VT_LPSTR, "äöüÄÖ", codepage);
+            check(Variant.VT_LPSTR, "äöüÄÖÜ", codepage);
+            check(Variant.VT_LPSTR, "äöüÄÖÜß", codepage);
+            check(Variant.VT_LPWSTR, "", codepage);
+            check(Variant.VT_LPWSTR, "ä", codepage);
+            check(Variant.VT_LPWSTR, "äö", codepage);
+            check(Variant.VT_LPWSTR, "äöü", codepage);
+            check(Variant.VT_LPWSTR, "äöüÄ", codepage);
+            check(Variant.VT_LPWSTR, "äöüÄÖ", codepage);
+            check(Variant.VT_LPWSTR, "äöüÄÖÜ", codepage);
+            check(Variant.VT_LPWSTR, "äöüÄÖÜß", codepage);
        }
        catch (Exception ex)
        {
@ -466,20 +472,22 @@ public class TestWrite extends TestCase
     * @throws UnsupportedVariantTypeException if the variant is not supported.
     * @throws IOException if an I/O exception occurs.
     */
-    private void check(final long variantType, final Object value)
+    private void check(final long variantType, final Object value,
+                       final int codepage)
        throws UnsupportedVariantTypeException, IOException
    {
        final ByteArrayOutputStream out = new ByteArrayOutputStream();
-        VariantSupport.write(out, variantType, value);
+        VariantSupport.write(out, variantType, value, codepage);
        out.close();
        final byte[] b = out.toByteArray();
        final Object objRead =
            VariantSupport.read(b, 0, b.length + LittleEndian.INT_SIZE,
-                                variantType);
+                                variantType, -1);
        if (objRead instanceof byte[])
        {
-            final int diff = diff(org.apache.poi.hpsf.Util.pad4
-                ((byte[]) value), (byte[]) objRead);
+//            final int diff = diff(org.apache.poi.hpsf.Util.pad4
+//                ((byte[]) value), (byte[]) objRead);
+            final int diff = diff((byte[]) value, (byte[]) objRead);
            if (diff >= 0)
                fail("Byte arrays are different. First different byte is at " +
                     "index " + diff + ".");
--- a/src/testcases/org/apache/poi/hpsf/data/TestChineseProperties.doc
+++ b/src/testcases/org/apache/poi/hpsf/data/TestChineseProperties.doc