From 131bb9d0bd199ac8c739d047516082d81c75f8f0 Mon Sep 17 00:00:00 2001 From: Rainer Klute Date: Tue, 2 Dec 2003 17:46:01 +0000 Subject: [PATCH] HPSF: codepage support added git-svn-id: https://svn.apache.org/repos/asf/jakarta/poi/trunk@353460 13f79535-47bb-0310-9956-ffa450edef68 --- src/documentation/content/xdocs/changes.xml | 4 + .../content/xdocs/hpsf/how-to.xml | 41 ++++++--- .../content/xdocs/hpsf/internals.xml | 54 +++++++++++ src/documentation/content/xdocs/hpsf/todo.xml | 23 ++--- .../apache/poi/hpsf/examples/CopyCompare.java | 5 +- .../hpsf/examples/WriteAuthorAndTitle.java | 5 +- .../org/apache/poi/hpsf/MutableProperty.java | 5 +- .../org/apache/poi/hpsf/MutableSection.java | 10 +- src/java/org/apache/poi/hpsf/Property.java | 29 +++++- src/java/org/apache/poi/hpsf/PropertySet.java | 15 ++- src/java/org/apache/poi/hpsf/Section.java | 19 ++++ src/java/org/apache/poi/hpsf/TypeWriter.java | 5 +- .../org/apache/poi/hpsf/VariantSupport.java | 86 +++++++++++++----- .../org/apache/poi/hpsf/basic/TestWrite.java | 78 +++++++++------- .../poi/hpsf/data/TestChineseProperties.doc | Bin 0 -> 31744 bytes 15 files changed, 276 insertions(+), 103 deletions(-) create mode 100644 src/testcases/org/apache/poi/hpsf/data/TestChineseProperties.doc diff --git a/src/documentation/content/xdocs/changes.xml b/src/documentation/content/xdocs/changes.xml index 77cb10f35..25dc4b6df 100644 --- a/src/documentation/content/xdocs/changes.xml +++ b/src/documentation/content/xdocs/changes.xml @@ -12,7 +12,11 @@ + + + HPSF: Much better codepage support + Patch applied for deep cloning of worksheets was provided Patch applied to allow sheet reordering diff --git a/src/documentation/content/xdocs/hpsf/how-to.xml b/src/documentation/content/xdocs/hpsf/how-to.xml index d632a4c97..d12c12a7b 100644 --- a/src/documentation/content/xdocs/hpsf/how-to.xml +++ b/src/documentation/content/xdocs/hpsf/how-to.xml @@ -708,8 +708,9 @@ No property set stream: "/1Table" The property's value is the number of a 
codepage, i.e. a mapping from character codes to characters. All strings in the section containing this property must be interpreted using this - codepage. Typical property values are 1252 (8-bit "western" characters) - or 1200 (16-bit Unicode characters). + codepage. Typical property values are 1252 (8-bit "western" characters, + ISO-8859-1), 1200 (16-bit Unicode characters, UTF-16), or 65001 (8-bit + Unicode characters, UTF-8). @@ -833,18 +834,34 @@ No property set stream: "/1Table"
Codepage support - Improve codepage support!

The property with ID 1 holds the number of the codepage which was used - to encode the strings in this section. The present HPSF codepage support - is still very limited: When reading property value strings, HPSF - distinguishes between 16-bit characters and 8-bit characters. 16-bit - characters should be Unicode characters and thus be okay. 8-bit - characters are interpreted according to the platform's default character - set. This is fine as long as the document being read has been written on - a platform with the same default character set. However, if you receive a - document from another region of the world and want to process it with - HPSF you are in trouble - unless the creator used Unicode, of course.

+ to encode the strings in this section. If this property is not available + in a section, the platform's default character encoding will be + used. This works fine as long as the document being read has been written + on a platform with the same default character encoding. However, if you + receive a document from another region of the world and the codepage is + undefined, you are in trouble.
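To make the fallback problem concrete, here is a minimal sketch, not part of HPSF, of what happens when 8-bit string bytes are decoded with an encoding other than the one they were written with. The class name `DefaultEncodingPitfall` and its `decode` helper are hypothetical; ISO-8859-1 and UTF-8 merely stand in for "the writer's platform default" and "the reader's platform default".

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; nothing here is HPSF API.
public class DefaultEncodingPitfall {

    /** Decodes 8-bit string bytes with an explicitly chosen encoding. */
    public static String decode(byte[] bytes, Charset encoding) {
        return new String(bytes, encoding);
    }

    public static void main(String[] args) {
        // U+00E4 (a with diaeresis) written where the default encoding is
        // ISO-8859-1: a single byte, 0xE4.
        byte[] written = "\u00e4".getBytes(StandardCharsets.ISO_8859_1);

        // The reader assumes the same encoding: the character survives.
        String same = decode(written, StandardCharsets.ISO_8859_1);

        // The reader assumes a different encoding (UTF-8 here): the lone
        // byte 0xE4 is not a valid sequence, so it is replaced.
        String other = decode(written, StandardCharsets.UTF_8);

        System.out.println("same encoding:  U+" + Integer.toHexString(same.charAt(0)));
        System.out.println("other encoding: U+" + Integer.toHexString(other.charAt(0)));
    }
}
```

A section that carries an explicit codepage property avoids this ambiguity, because the reader no longer has to guess.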

+ +

HPSF's codepage support is as good as the character encoding support of + the Java Virtual Machine (JVM) the application runs on. If HPSF + encounters a codepage number it assumes that the JVM has a character + encoding with a corresponding name. For example, if the codepage is 1252, + HPSF uses the character encoding "cp1252" to read or write strings. If + the JVM does not have that character encoding installed or if the + codepage number is illegal, an UnsupportedEncodingException will be + thrown.

+ +

There are two exceptions to the rule that a character encoding's name + is derived from the codepage number by prepending the string "cp" to + it:

+ +
+
Codepage 1200
+
is mapped to the character encoding "UTF-16".
+
Codepage 65001
+
is mapped to the character encoding "UTF-8".
+
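The naming rule and its two exceptions can be sketched as follows. This is an illustration only: the class `CodepageNames` and its methods are hypothetical, and unlike the patch's `VariantSupport.codepageToEncoding`, which throws `UnsupportedEncodingException`, this sketch uses unchecked exceptions to stay short. The value -1 is taken to mean "no codepage property", as in `Section.getCodepage()`.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; the real logic lives in VariantSupport.
public class CodepageNames {

    /** Maps a codepage number to a Java encoding name, following the rule above. */
    public static String encodingName(int codepage) {
        if (codepage <= 0) {
            // Assumption: unchecked exception here; the patch throws
            // UnsupportedEncodingException instead.
            throw new IllegalArgumentException("Codepage number may not be " + codepage);
        }
        switch (codepage) {
            case 1200:
                return "UTF-16";
            case 65001:
                return "UTF-8";
            default:
                return "cp" + codepage;
        }
    }

    /** Decodes bytes with the given codepage; -1 falls back to the platform default. */
    public static String decode(byte[] bytes, int codepage) {
        Charset charset = codepage == -1
                ? Charset.defaultCharset()
                // Charset.forName throws an unchecked exception if the JVM
                // does not know the encoding name.
                : Charset.forName(encodingName(codepage));
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        System.out.println(encodingName(1252));
        System.out.println(encodingName(1200));
        System.out.println(encodingName(65001));

        byte[] utf8 = "\u00e4\u00f6\u00fc".getBytes(StandardCharsets.UTF_8);
        System.out.println(decode(utf8, 65001).length()); // three characters from six bytes
    }
}
```

Whether a name such as "cp1252" resolves depends on the encodings the JVM ships with, which is exactly the limitation the paragraph above describes.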
diff --git a/src/documentation/content/xdocs/hpsf/internals.xml b/src/documentation/content/xdocs/hpsf/internals.xml index 67fb61cea..88f4133f4 100644 --- a/src/documentation/content/xdocs/hpsf/internals.xml +++ b/src/documentation/content/xdocs/hpsf/internals.xml @@ -944,6 +944,60 @@ +
+ The Dictionary + +

What a dictionary is good for is explained in the HPSF HOW-TO. This chapter explains how it is + organized internally.

+ +

The dictionary has a simple header consisting of a single UInt value. It + tells how many entries the dictionary comprises:

+ + + + + + + + + + + + +
NameData typeDescription
nrEntriesUIntNumber of dictionary entries
+ +

The dictionary entries follow the header. Each one looks like this:

+ + + + + + + + + + + + + + + + + + + + + + +
NameData typeDescription
keyUIntThe unique number of this property, i.e. the PID
lengthUIntThe length of the property name associated with the key
valueStringThe property's name, terminated with a 0x00 character
+ +

The entries are not aligned, i.e. each one follows its predecessor + without any gap or fill characters.
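A reader for this layout might look like the sketch below. It is not HPSF's implementation and rests on assumptions the text above does not settle: a UInt is taken to be a 4-byte little-endian unsigned integer, the length field is taken to count bytes including the terminating 0x00, and names are taken to be 8-bit strings in a caller-supplied encoding. `DictionarySketch` and its methods are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration class; not part of HPSF.
public class DictionarySketch {

    /** Reads a 4-byte little-endian unsigned integer (assumed UInt layout). */
    static long readUInt(ByteBuffer buf) {
        return buf.getInt() & 0xFFFFFFFFL;
    }

    /** Parses header and entries; entries follow each other without padding. */
    public static Map<Long, String> readDictionary(byte[] src, int offset, Charset charset) {
        ByteBuffer buf = ByteBuffer.wrap(src, offset, src.length - offset)
                                   .order(ByteOrder.LITTLE_ENDIAN);
        long nrEntries = readUInt(buf);               // dictionary header
        Map<Long, String> dictionary = new LinkedHashMap<>();
        for (long i = 0; i < nrEntries; i++) {
            long key = readUInt(buf);                 // the PID
            int length = (int) readUInt(buf);         // assumed to include the 0x00
            byte[] raw = new byte[length];
            buf.get(raw);
            int end = length;
            while (end > 0 && raw[end - 1] == 0) {    // strip the terminator
                end--;
            }
            dictionary.put(key, new String(raw, 0, end, charset));
        }
        return dictionary;
    }

    public static void main(String[] args) {
        byte[] data = {
            2, 0, 0, 0,             // nrEntries = 2
            2, 0, 0, 0,             // key = 2
            4, 0, 0, 0,             // length = 4
            'F', 'o', 'o', 0,       // value "Foo" plus terminator
            3, 0, 0, 0,             // key = 3, directly after the previous entry
            3, 0, 0, 0,             // length = 3
            'H', 'i', 0             // value "Hi" plus terminator
        };
        System.out.println(readDictionary(data, 0, StandardCharsets.ISO_8859_1));
    }
}
```

Because trailing zero bytes are stripped after reading `length` bytes, the sketch also tolerates a length that excludes the terminator only if the caller adjusts the read accordingly; that case is deliberately left out.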

+
+ + +
References

In order to assemble the HPSF description I used information publically diff --git a/src/documentation/content/xdocs/hpsf/todo.xml b/src/documentation/content/xdocs/hpsf/todo.xml index d99e7ff5d..3f50f8f4d 100644 --- a/src/documentation/content/xdocs/hpsf/todo.xml +++ b/src/documentation/content/xdocs/hpsf/todo.xml @@ -21,25 +21,20 @@ information streams.

  • - Add codepage support: Presently the bytes making out the string in a - property's value are interpreted using the platform's default character - set. -
  • -
  • - Add resource bundles to - org.apache.poi.hpsf.wellknown to ease - localizations. This would be useful for mapping standard property IDs to - localized strings. Example: The property ID 4 could be mapped to "Author" - in English or "Verfasser" in German. + Add resource bundles to + org.apache.poi.hpsf.wellknown to ease + localizations. This would be useful for mapping standard property IDs to + localized strings. Example: The property ID 4 could be mapped to "Author" + in English or "Verfasser" in German.
  • Implement reading functionality for those property types that are not - yet supported. HPSF should return proper Java types instead of just byte - arrays. + yet supported. HPSF should return proper Java types instead of just byte + arrays.
  • - Add WMF to java.awt.Image example code in Thumbnail HOW TO. + Add WMF to java.awt.Image example code in the Thumbnail HOW-TO.
  • diff --git a/src/examples/src/org/apache/poi/hpsf/examples/CopyCompare.java b/src/examples/src/org/apache/poi/hpsf/examples/CopyCompare.java index 94b3a426c..5b14e7ad4 100644 --- a/src/examples/src/org/apache/poi/hpsf/examples/CopyCompare.java +++ b/src/examples/src/org/apache/poi/hpsf/examples/CopyCompare.java @@ -558,7 +558,10 @@ public class CopyCompare * exists. However, since we have full control about directory * creation we can ensure that this will never happen. */ ex.printStackTrace(System.err); - throw new RuntimeException(ex); + throw new RuntimeException(ex.toString()); + /* FIXME (2): Replace the previous line by the following once we + * no longer need JDK 1.3 compatibility. */ + // throw new RuntimeException(ex); } } } diff --git a/src/examples/src/org/apache/poi/hpsf/examples/WriteAuthorAndTitle.java b/src/examples/src/org/apache/poi/hpsf/examples/WriteAuthorAndTitle.java index dd550dae0..7b8d58f41 100644 --- a/src/examples/src/org/apache/poi/hpsf/examples/WriteAuthorAndTitle.java +++ b/src/examples/src/org/apache/poi/hpsf/examples/WriteAuthorAndTitle.java @@ -444,7 +444,10 @@ public class WriteAuthorAndTitle * exists. However, since we have full control about directory * creation we can ensure that this will never happen. */ ex.printStackTrace(System.err); - throw new RuntimeException(ex); + throw new RuntimeException(ex.toString()); + /* FIXME (2): Replace the previous line by the following once we + * no longer need JDK 1.3 compatibility. */ + // throw new RuntimeException(ex); } } } diff --git a/src/java/org/apache/poi/hpsf/MutableProperty.java b/src/java/org/apache/poi/hpsf/MutableProperty.java index 3f6079876..da3fece52 100644 --- a/src/java/org/apache/poi/hpsf/MutableProperty.java +++ b/src/java/org/apache/poi/hpsf/MutableProperty.java @@ -80,19 +80,20 @@ public class MutableProperty extends Property *

    Writes the property to an output stream.

    * * @param out The output stream to write to. + * @param codepage The codepage to use for writing non-wide strings * @return the number of bytes written to the stream * * @exception IOException if an I/O error occurs * @exception WritingNotSupportedException if a variant type is to be * written that is not yet supported */ - public int write(final OutputStream out) + public int write(final OutputStream out, final int codepage) throws IOException, WritingNotSupportedException { int length = 0; long variantType = getType(); length += TypeWriter.writeUIntToStream(out, variantType); - length += VariantSupport.write(out, variantType, getValue()); + length += VariantSupport.write(out, variantType, getValue(), codepage); return length; } diff --git a/src/java/org/apache/poi/hpsf/MutableSection.java b/src/java/org/apache/poi/hpsf/MutableSection.java index 871c13360..c2fb33d6e 100644 --- a/src/java/org/apache/poi/hpsf/MutableSection.java +++ b/src/java/org/apache/poi/hpsf/MutableSection.java @@ -420,16 +420,16 @@ public class MutableSection extends Section /* If the property ID is not equal 0 we write the property and all * is fine. However, if it equals 0 we have to write the section's - * dictionary which does not have a type but just a value. */ + * dictionary which has an implicit type only and an explicit + * value. */ if (id != 0) /* Write the property and update the position to the next * property. 
*/ - position += p.write(propertyStream); + position += p.write(propertyStream, getCodepage()); else { - final Integer codepage = - (Integer) getProperty(PropertyIDMap.PID_CODEPAGE); - if (codepage == null) + final int codepage = getCodepage(); + if (codepage == -1) throw new IllegalPropertySetDataException ("Codepage (property 1) is undefined."); position += writeDictionary(propertyStream, dictionary); diff --git a/src/java/org/apache/poi/hpsf/Property.java b/src/java/org/apache/poi/hpsf/Property.java index e4d70bd1c..2b04da852 100644 --- a/src/java/org/apache/poi/hpsf/Property.java +++ b/src/java/org/apache/poi/hpsf/Property.java @@ -62,9 +62,11 @@ */ package org.apache.poi.hpsf; +import java.io.UnsupportedEncodingException; import java.util.HashMap; import java.util.Map; +import org.apache.poi.util.HexDump; import org.apache.poi.util.LittleEndian; /** @@ -161,9 +163,13 @@ public class Property * @param length The property's type/value pair's length in bytes. * @param codepage The section's and thus the property's * codepage. It is needed only when reading string values. 
+ * + * @exception UnsupportedEncodingException if the specified codepage is not + * supported */ public Property(final long id, final byte[] src, final long offset, final int length, final int codepage) + throws UnsupportedEncodingException { this.id = id; @@ -183,7 +189,7 @@ public class Property try { - value = VariantSupport.read(src, o, length, (int) type); + value = VariantSupport.read(src, o, length, (int) type, codepage); } catch (UnsupportedVariantTypeException ex) { @@ -382,8 +388,27 @@ public class Property b.append(getID()); b.append(", type: "); b.append(getType()); + final Object value = getValue(); b.append(", value: "); - b.append(getValue()); + b.append(value.toString()); + if (value instanceof String) + { + final String s = (String) value; + final int l = s.length(); + final byte[] bytes = new byte[l * 2]; + for (int i = 0; i < l; i++) + { + final char c = s.charAt(i); + final byte high = (byte) ((c & 0x00ff00) >> 8); + final byte low = (byte) ((c & 0x0000ff) >> 0); + bytes[i * 2] = high; + bytes[i * 2 + 1] = low; + } + final String hex = HexDump.dump(bytes, 0L, 0); + b.append(" ["); + b.append(hex); + b.append("]"); + } b.append(']'); return b.toString(); } diff --git a/src/java/org/apache/poi/hpsf/PropertySet.java b/src/java/org/apache/poi/hpsf/PropertySet.java index dae88e73c..a92c3b9fd 100644 --- a/src/java/org/apache/poi/hpsf/PropertySet.java +++ b/src/java/org/apache/poi/hpsf/PropertySet.java @@ -56,6 +56,7 @@ package org.apache.poi.hpsf; import java.io.IOException; import java.io.InputStream; +import java.io.UnsupportedEncodingException; import java.util.ArrayList; import java.util.List; @@ -300,9 +301,11 @@ public class PropertySet * @param length The length of the stream data. * @throws NoPropertySetStreamException if the byte array is not a * property set stream. 
+ * + * @exception UnsupportedEncodingException if the codepage is not supported */ public PropertySet(final byte[] stream, final int offset, final int length) - throws NoPropertySetStreamException + throws NoPropertySetStreamException, UnsupportedEncodingException { if (isPropertySetStream(stream, offset, length)) init(stream, offset, length); @@ -321,8 +324,11 @@ public class PropertySet * complete byte array contents is the stream data. * @throws NoPropertySetStreamException if the byte array is not a * property set stream. + * + * @exception UnsupportedEncodingException if the codepage is not supported */ - public PropertySet(final byte[] stream) throws NoPropertySetStreamException + public PropertySet(final byte[] stream) + throws NoPropertySetStreamException, UnsupportedEncodingException { this(stream, 0, stream.length); } @@ -435,6 +441,7 @@ public class PropertySet * @param length Length of the property set stream. */ private void init(final byte[] src, final int offset, final int length) + throws UnsupportedEncodingException { /* FIXME (3): Ensure that at most "length" bytes are read. 
*/ @@ -651,7 +658,7 @@ public class PropertySet final PropertySet ps = (PropertySet) o; int byteOrder1 = ps.getByteOrder(); int byteOrder2 = getByteOrder(); - ClassID classId1 = ps.getClassID(); + ClassID classID1 = ps.getClassID(); ClassID classID2 = getClassID(); int format1 = ps.getFormat(); int format2 = getFormat(); @@ -660,7 +667,7 @@ public class PropertySet int sectionCount1 = ps.getSectionCount(); int sectionCount2 = getSectionCount(); if (byteOrder1 != byteOrder2 || - !classId1.equals(classID2) || + !classID1.equals(classID2) || format1 != format2 || osVersion1 != osVersion2 || sectionCount1 != sectionCount2) diff --git a/src/java/org/apache/poi/hpsf/Section.java b/src/java/org/apache/poi/hpsf/Section.java index 2cc4cc148..97b30ea26 100644 --- a/src/java/org/apache/poi/hpsf/Section.java +++ b/src/java/org/apache/poi/hpsf/Section.java @@ -54,6 +54,7 @@ */ package org.apache.poi.hpsf; +import java.io.UnsupportedEncodingException; import java.util.ArrayList; import java.util.Collections; import java.util.Iterator; @@ -193,8 +194,12 @@ public class Section * @param src Contains the complete property set stream. * @param offset The position in the stream that points to the * section's format ID. + * + * @exception UnsupportedEncodingException if the section's codepage is not + * supported. */ public Section(final byte[] src, final int offset) + throws UnsupportedEncodingException { int o1 = offset; @@ -638,4 +643,18 @@ public class Section return dictionary; } + + + /** + *

    Gets the section's codepage, if any.

    + * + * @return The section's codepage if one is defined, else -1. + */ + public int getCodepage() + { + final Integer codepage = + (Integer) getProperty(PropertyIDMap.PID_CODEPAGE); + return codepage != null ? codepage.intValue() : -1; + } + } diff --git a/src/java/org/apache/poi/hpsf/TypeWriter.java b/src/java/org/apache/poi/hpsf/TypeWriter.java index 2433353f4..aed32cefa 100644 --- a/src/java/org/apache/poi/hpsf/TypeWriter.java +++ b/src/java/org/apache/poi/hpsf/TypeWriter.java @@ -185,7 +185,8 @@ public class TypeWriter * @exception IOException if an I/O error occurs */ public static void writeToStream(final OutputStream out, - final Property[] properties) + final Property[] properties, + final int codepage) throws IOException, UnsupportedVariantTypeException { /* If there are no properties don't write anything. */ @@ -207,7 +208,7 @@ public class TypeWriter final Property p = (Property) properties[i]; long type = p.getType(); writeUIntToStream(out, type); - VariantSupport.write(out, (int) type, p.getValue()); + VariantSupport.write(out, (int) type, p.getValue(), codepage); } } diff --git a/src/java/org/apache/poi/hpsf/VariantSupport.java b/src/java/org/apache/poi/hpsf/VariantSupport.java index 17892abd2..29360420d 100644 --- a/src/java/org/apache/poi/hpsf/VariantSupport.java +++ b/src/java/org/apache/poi/hpsf/VariantSupport.java @@ -64,6 +64,7 @@ package org.apache.poi.hpsf; import java.io.IOException; import java.io.OutputStream; +import java.io.UnsupportedEncodingException; import java.util.Date; import java.util.LinkedList; import java.util.List; @@ -163,17 +164,21 @@ public class VariantSupport extends Variant * @param length The length of the variant including the variant * type field * @param type The variant type to read + * @param codepage The codepage to use to read non-wide strings * @return A Java object that corresponds best to the variant * field. For example, a VT_I4 is returned as a {@link Long}, a * VT_LPSTR as a {@link String}. 
* @exception ReadingNotSupportedException if a property is to be written * who's variant type HPSF does not yet support + * @exception UnsupportedEncodingException if the specified codepage is not + * supported * * @see Variant */ public static Object read(final byte[] src, final int offset, - final int length, final long type) - throws ReadingNotSupportedException + final int length, final long type, + final int codepage) + throws ReadingNotSupportedException, UnsupportedEncodingException { Object value; int o1 = offset; @@ -221,18 +226,18 @@ public class VariantSupport extends Variant * Read a byte string. In Java it is represented as a * String object. The 0x00 bytes at the end must be * stripped. - * - * FIXME (2): Reading an 8-bit string should pay attention - * to the codepage. Currently the byte making out the - * property's value are interpreted according to the - * platform's default character set. */ final int first = o1 + LittleEndian.INT_SIZE; long last = first + LittleEndian.getUInt(src, o1) - 1; o1 += LittleEndian.INT_SIZE; + final int rawLength = (int) (last - first + 1); while (src[(int) last] == 0 && first <= last) last--; - value = new String(src, (int) first, (int) (last - first + 1)); + final int l = (int) (last - first + 1); + value = codepage != -1 ? + new String(src, (int) first, l, + codepageToEncoding(codepage)) : + new String(src, (int) first, l); break; } case Variant.VT_LPWSTR: @@ -298,6 +303,38 @@ public class VariantSupport extends Variant + /** + *

    Turns a codepage number into the equivalent character encoding's + * name.

    + * + * @param codepage The codepage number + * + * @return The character encoding's name. If the codepage number is 1200, + * the encoding name is "UTF-16"; if it is 65001, the encoding name is + * "UTF-8". All other positive numbers are mapped to "cp" followed by the + * number, e.g. if the codepage number is 1252 the returned character + * encoding name will be "cp1252". + * + * @exception UnsupportedEncodingException if the specified codepage is + * less than or equal to zero. + */ + public static String codepageToEncoding(final int codepage) + throws UnsupportedEncodingException + { + if (codepage <= 0) + throw new UnsupportedEncodingException + ("Codepage number may not be " + codepage); + switch (codepage) + { + case 1200: + return "UTF-16"; + case 65001: + return "UTF-8"; + default: + return "cp" + codepage; + } + } + + /** *

    Writes a variant value to an output stream. This method ensures that * always a multiple of 4 bytes is written.

    @@ -305,6 +342,7 @@ public class VariantSupport extends Variant * @param out The stream to write the value to. * @param type The variant's type. * @param value The variant's value. + * @param codepage The codepage to use to write non-wide strings * @return The number of entities that have been written. In many cases an * "entity" is a byte but this is not always the case. * @exception IOException if an I/O exceptions occurs @@ -312,7 +350,7 @@ public class VariantSupport extends Variant * who's variant type HPSF does not yet support */ public static int write(final OutputStream out, final long type, - final Object value) + final Object value, final int codepage) throws IOException, WritingNotSupportedException { int length = 0; @@ -330,16 +368,13 @@ public class VariantSupport extends Variant } case Variant.VT_LPSTR: { - length = TypeWriter.writeUIntToStream - (out, ((String) value).length() + 1); - char[] s = Util.pad4((String) value); - /* FIXME (2): The following line forces characters to bytes. - * This is generally wrong and should only be done according to - * a codepage. Alternatively Unicode could be written (see - * Variant.VT_LPWSTR). */ - byte[] b = new byte[s.length + 1]; - for (int i = 0; i < s.length; i++) - b[i] = (byte) s[i]; + final byte[] bytes = + (codepage == -1 ? + ((String) value).getBytes() : + ((String) value).getBytes(codepageToEncoding(codepage))); + length = TypeWriter.writeUIntToStream(out, bytes.length + 1); + final byte[] b = new byte[bytes.length + 1]; + System.arraycopy(bytes, 0, b, 0, bytes.length); b[b.length - 1] = 0x00; out.write(b); length += b.length; @@ -419,12 +454,13 @@ public class VariantSupport extends Variant } } - /* Add 0x00 character to write a multiple of four bytes: */ - while (length % 4 != 0) - { - out.write(0); - length++; - } + /* Add 0x00 characters to write a multiple of four bytes: */ + // FIXME (1) Try this! 
+// while (length % 4 != 0) +// { +// out.write(0); +// length++; +// } return length; } diff --git a/src/testcases/org/apache/poi/hpsf/basic/TestWrite.java b/src/testcases/org/apache/poi/hpsf/basic/TestWrite.java index d62378dff..95b78ea66 100644 --- a/src/testcases/org/apache/poi/hpsf/basic/TestWrite.java +++ b/src/testcases/org/apache/poi/hpsf/basic/TestWrite.java @@ -357,7 +357,10 @@ public class TestWrite extends TestCase catch (Exception ex) { ex.printStackTrace(); - throw new RuntimeException(ex); + throw new RuntimeException(ex.toString()); + /* FIXME (2): Replace the previous line by the following + * one once we no longer need JDK 1.3 compatibility. */ + // throw new RuntimeException(ex); } } }, @@ -398,37 +401,40 @@ public class TestWrite extends TestCase public void testVariantTypes() { Throwable t = null; + final int codepage = -1; + /* FIXME (2): Add tests for various codepages! */ try { - check(Variant.VT_EMPTY, null); - check(Variant.VT_BOOL, new Boolean(true)); - check(Variant.VT_BOOL, new Boolean(false)); - check(Variant.VT_CF, new byte[]{0}); - check(Variant.VT_CF, new byte[]{0, 1}); - check(Variant.VT_CF, new byte[]{0, 1, 2}); - check(Variant.VT_CF, new byte[]{0, 1, 2, 3}); - check(Variant.VT_CF, new byte[]{0, 1, 2, 3, 4}); - check(Variant.VT_CF, new byte[]{0, 1, 2, 3, 4, 5}); - check(Variant.VT_CF, new byte[]{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10}); - check(Variant.VT_I2, new Integer(27)); - check(Variant.VT_I4, new Long(28)); - check(Variant.VT_FILETIME, new Date()); - check(Variant.VT_LPSTR, ""); - check(Variant.VT_LPSTR, "ä"); - check(Variant.VT_LPSTR, "äö"); - check(Variant.VT_LPSTR, "äöü"); - check(Variant.VT_LPSTR, "äöüÄ"); - check(Variant.VT_LPSTR, "äöüÄÖ"); - check(Variant.VT_LPSTR, "äöüÄÖÜ"); - check(Variant.VT_LPSTR, "äöüÄÖÜß"); - check(Variant.VT_LPWSTR, ""); - check(Variant.VT_LPWSTR, "ä"); - check(Variant.VT_LPWSTR, "äö"); - check(Variant.VT_LPWSTR, "äöü"); - check(Variant.VT_LPWSTR, "äöüÄ"); - check(Variant.VT_LPWSTR, "äöüÄÖ"); - 
check(Variant.VT_LPWSTR, "äöüÄÖÜ"); - check(Variant.VT_LPWSTR, "äöüÄÖÜß"); + check(Variant.VT_EMPTY, null, codepage); + check(Variant.VT_BOOL, new Boolean(true), codepage); + check(Variant.VT_BOOL, new Boolean(false), codepage); + check(Variant.VT_CF, new byte[]{0}, codepage); + check(Variant.VT_CF, new byte[]{0, 1}, codepage); + check(Variant.VT_CF, new byte[]{0, 1, 2}, codepage); + check(Variant.VT_CF, new byte[]{0, 1, 2, 3}, codepage); + check(Variant.VT_CF, new byte[]{0, 1, 2, 3, 4}, codepage); + check(Variant.VT_CF, new byte[]{0, 1, 2, 3, 4, 5}, codepage); + check(Variant.VT_CF, new byte[]{0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10}, + codepage); + check(Variant.VT_I2, new Integer(27), codepage); + check(Variant.VT_I4, new Long(28), codepage); + check(Variant.VT_FILETIME, new Date(), codepage); + check(Variant.VT_LPSTR, "", codepage); + check(Variant.VT_LPSTR, "ä", codepage); + check(Variant.VT_LPSTR, "äö", codepage); + check(Variant.VT_LPSTR, "äöü", codepage); + check(Variant.VT_LPSTR, "äöüÄ", codepage); + check(Variant.VT_LPSTR, "äöüÄÖ", codepage); + check(Variant.VT_LPSTR, "äöüÄÖÜ", codepage); + check(Variant.VT_LPSTR, "äöüÄÖÜß", codepage); + check(Variant.VT_LPWSTR, "", codepage); + check(Variant.VT_LPWSTR, "ä", codepage); + check(Variant.VT_LPWSTR, "äö", codepage); + check(Variant.VT_LPWSTR, "äöü", codepage); + check(Variant.VT_LPWSTR, "äöüÄ", codepage); + check(Variant.VT_LPWSTR, "äöüÄÖ", codepage); + check(Variant.VT_LPWSTR, "äöüÄÖÜ", codepage); + check(Variant.VT_LPWSTR, "äöüÄÖÜß", codepage); } catch (Exception ex) { @@ -466,20 +472,22 @@ public class TestWrite extends TestCase * @throws UnsupportedVariantTypeException if the variant is not supported. * @throws IOException if an I/O exception occurs. 
*/ - private void check(final long variantType, final Object value) + private void check(final long variantType, final Object value, + final int codepage) throws UnsupportedVariantTypeException, IOException { final ByteArrayOutputStream out = new ByteArrayOutputStream(); - VariantSupport.write(out, variantType, value); + VariantSupport.write(out, variantType, value, codepage); out.close(); final byte[] b = out.toByteArray(); final Object objRead = VariantSupport.read(b, 0, b.length + LittleEndian.INT_SIZE, - variantType); + variantType, -1); if (objRead instanceof byte[]) { - final int diff = diff(org.apache.poi.hpsf.Util.pad4 - ((byte[]) value), (byte[]) objRead); +// final int diff = diff(org.apache.poi.hpsf.Util.pad4 +// ((byte[]) value), (byte[]) objRead); + final int diff = diff((byte[]) value, (byte[]) objRead); if (diff >= 0) fail("Byte arrays are different. First different byte is at " + "index " + diff + "."); diff --git a/src/testcases/org/apache/poi/hpsf/data/TestChineseProperties.doc b/src/testcases/org/apache/poi/hpsf/data/TestChineseProperties.doc new file mode 100644 index 0000000000000000000000000000000000000000..fd2b478e4c55c4bee567f2f1b34aedf816632482 GIT binary patch literal 31744 zcmeHQ30#!b-aa$K0E!DK;;ti_A}X7tXf7b2;DX4esL8%52m-EYpyryXsiCM9m0PJv zZr3uk#mvg>c8yzJyIHx=&6J8yz221XdER$k9bkr#?~|$9ocTTP+0Xf(|JncVdC#0V z{m6xv-)!}bsH6r6jksH0Pgonct6=V<*zJXQ9A>)REiW&pvo`>vMg4&|aCP&0Vw_Js zDa!l#!Dl)R(RC2auiN3LPVyjB-E={!Mlp3yu9%} zH&bRMz+%ae!sZ1<-O#MKRn{HopjWf?ajmqngE74gymauVu7tmWQrIJIAl$!zuCtNj zKd2szbkv9%qUTCY^_%sD2t~x8hS2r#N692qcq;r*=&Ilei{frdSLjjVDd{P?5>N40 z>@TA{j&Pi1EBjQ|8LpQ>eJSxOr%V1ADkU95x0;xh^p*6OP76e=oZc!uy$T(QUXwhG zg;CN|^6!WC(f|QV$K}1VYwHL4)+dD0?jn&ts#YlRlr+!5JqMP*fFJJA+5V`Ma2wDL zsA3R~T~o0u{*osiO8ZrGrF^DzB_7jZzfsav+?9A+;O_-TrJRbc@K>=bx)R@%u7sV?LKc2QRb{F9y3rwJL9lc!44cR0LpV9S%x@#W1)zA#M(SnWC?7!;ciPv@Eyr)ZL`q zNWYtyj{LH~xdfO8R!S)rDWrlnV-z|eX9rN`ncfVh9zfmG7mre!rmaV?-DvaCmLeuq zLt83Y3mLLf>HxY+^p9-I3X_Vj?l8$je?$?(RyC?r~N;0Y1vixnWImdxm%p|5; 
z%$^MUIn9JkjmOR`)R(c;+{6gcUwC4r`+(+i?Tz~lWf!PbXr|5;*B}5E@k1u?fhbPkUK!{#&3x%7z z6e~y!k^J4oMEqjWrurZhXBw9Zvy0MNQ!sBSv!2}cIFE2{^TQk$0jdI!ZxC$Zu!X`_ zb;eW1+EnDm(LsS?fbvypzwb3N0|F_SCY zHY|z4C7J6{;EB8j=>TfYH=|d#+9(VykLwa;NFC)oGq98T0^m;cJa7@X3|s}S0Z(B^ zng=`$ECiMT%YjzdezgH~fD_;hxB&Zs1He1LVc;lm3<$LoA{-b4j0IwVIN%yk3fus0 z0e1mzrI!KAfmOg7U_G!Ah{TR~3@{#;03-lOfKwwOIswjr3*ZiT0*8Tjfuq1N;5hIJ z@EOEWzuqdlb@7W&5AHfxc<@1=yAJLWZ5>mJ=Aa`FZ7lqNkwY~i1c-$>!&&7h`&RZ5 z?KKm*2(RiJVXI=)mQ{T!r`X&eujWFumt#C;JJ)aMeXsGhb7NrzCDx89^S%B7v7Tc} zSo(TN|1vrY3s%&>QtzyD@@5Cb3~vBMz$}QyW&^hkHw_gmKNNjA&ELv*X}fkTaEY-#kC2rV z1$vcKj1f178=7M-&3dU?6{EQd3dbe1u4rvj(b~46wP&%lP&I}y!%H^w5V1F4%PelE zBqtG~Bz`ZOr6><76-AWGhmNZ0q`GH~?W7Wr_9iE=`h_Imr=0hWl$ho5=iz{r6$E4| zj)7-^V&E0@_ie!IfZ-UhANB)4;3Gl|1tNhlzyu%$xC&eY%77cdufSbEGyz%(u@qPa ztOQm8>wxt@=hhgj0aw5s=mmHKr+_oS7r;5-0`M&`8RJkUFb$Xia9nx{xPI=FbJq{= zfBl8)hp#_#ozK~P&eH3$?cVe7eho(1>c?#-c{II0s;i|@ebi93m;i7ruEiMLNEy`~ zkOD-mCT9mZ+#nZq<%uIRh27lQaQ3JaN58;l<*y>weL6tp`yTov9 zQn$3Fiy76>wrx!Al1t&{qmHsDuDT~<#hq3Sow-YX%#O-hlxRLmDY-LU#{={Gq};?F zUtZ1Ih>%IwbXyfQ? 
zns4%$Y(DDvUdA+ag`uTs3-gr`8QW-!`V(mLL>Ws>EKT#Ht39mGq;>TSaJWvx?cz^1{~UAUjjG*od6f0J8%Ga z8#oNS3mgNE1L2THL;_=h@jx7q0F(kh1Gj+NfM|~s8NhO2C9np_2Q~sP0xmk_1God8 zfH%+=I0hUCP5`HXvp_Kr*8%we6M-}!3&;T=SQcu)7N`%fJ=+76ag<&8`=@_@=kE{t z{B-Y5r2D|CL9-N!SoHF?%6q@+_n#(rhK4&o!_8rB<%WF?`BS0aFL>Yr|Is7dmx@XK zzv$1<|J6F-u`)%~zUYjbuINJl4n?*$PC&<%h1yQQqeXx8O102mJqIWa_s9D?S>XkM zfx2LS!M>9!*1*NMv9Yi!eF|Q?t?HLD1r>Ud!26zfVOKg9Mk`+xp4#`k@U&K`@hpdf zt(f}(%||r0FhBU4L_~$MFJ)#>d2zy~bSW&>)+6Q(T)V!0qpf%R21jaN*zQ&?lHg7n z{H34}O-^oCIstA~LM{mj9@4NrUu=nnL4p1}e;W?8vE`w{|;`xC3bzI@Hzf8ObH zm!Da=KV{a|727@xnYJl&#a`i-D!Psnu1UhtQ}ms^=kVD}m#;M%m@Axu8~e>qo4!S~ z=oJz-tGM`L$kbPK0a>DF)}$F*R^-3EVENwHcN|Zix#jC`FK>P8uSZUQy}aO^XICBA zz4w%8J>aFSyIq2pzVyobUw!j~Xzn#Kew}bl-oEo=$H7lG8~n7U@9Yt=>qLiPnYr8N zt=MNbAXl{Pv-6FQRsC~y1CwH&diA@DKQ3LdZ`0NfMYovKXV2Mtj9s?&?HO}-#AGcw z@z-zOKKNP7Aq$^hchG&*v*3xj)bvK(UP9igxM4&5#bRySKv){Yr-y_L^-Irm@9p6U zyCzU%zk}sKe2~9yc;K*o&)clt;BLQQ`p9~HJNP(1a(?s8e?EB{kGtg~KOQ&N<7hjZ ziFrm-~D)Pwg0c}=e^sn?9PpAOTIn#R)$c|`uWoQ>8Fv6j2MNA|gNF3R)N%=h~x9F3Y}9nv`b+NftMUE&%SuhngRvd@fPJ!jdO(6!G0QfvLTv&!#SyY30mO~29RZaOmb`_Pmg_Tq=gUAED7{|YZ0bmi8GgKl5lxR5`kTZgqt z?kQ)7ZvG;DYU;eK{=+cfb>-!yx6{kQ&~`8hux?$d8U>L!oTZy$R0)$5;r z`T3d4(UaEC{?DSl+a3CCPA~TiDZALTaP;-I?R+CVPM%u+tosa?wiCLHPe0_XDtRG# z$Pk}{j|OIhdIpg}C-0WFSmyi9hYprl7 zx%gA-?T>Z|dv5l(z1bgv0)qYD7!)(_wF>VURrx}6m?ktSs>h18LZvycg|1R-X0?T` z)@U+qI!+U+^V*KnRO(r6J5E!>rEjF2@@`VuDm1@=9J&>&g$56`#xN}nhPbM44Quqm zaO*=QRYg)&-HSDN<3xi041}^FSd@hW77kcAVBvs;0~QWgIPg2Ya8Ygo3rAsf~2db=Mp{EDdv&sf?06C7~ z(_i@d{9OJr_TUY;6_=DFY>7Ax%ro9;5WDjWA^BUPWaJbJsXz+c1|#3e6v{(|id3Ip z%J)FOpGpJl(PO1^$NXZbB9(1KbDo-$&d}A@q{#*?T%^NvYGH>ooy8=dvhw$!OF{xE z9~&th9Zpd5YbfT;@=|$;h?rNzPtbX`kB2{bj4lvooHJ1tehI`c1SUzY16)-4lQ^&n zejkmTT?9@)@Dp*mr2~_JY#;%ik>K2DU<}Y76379-K)?sEMtmE<7O(@>A&&IPDD^x& z+-VUiqkc1Fzt?=A%i;9#P$80^PSx*8mAeXOy{K4X)jgIJR_j;bgTR&3L>YvT)6`Ux zW^Md_D?e>D`PIg+y8No+H91h5x*dReU|gz-^|02awtg?`+Z>vir1Ds|QXLg)nWoah zjpbJxKPw+d&OQj2q;AXx=UJ8fq_!kw4fl%q$?dE*eshtZMmkxfL9uL*+x`3~(W|KP 
zMVjv#HI6&m@VG3`LGoP~-xHy0NqxI1BfyXENco(3#9&!&Eed9`+NYZeQXO5<}DboS%-fkvn=*6nzJ_4+!%I=uz3K8K**SeH8h)?*XY z6YDSrVEu7ye;v39uzm)E_pF;W0PCd$V4Zk_->eUg{j7_x0M)>%K1wU=j^*A7;PaWW2+nJSF zFfTP~Q^AV{X1Z0j=fkbsM?IWzsbG~xHdMGY$f82o*fFXzW$MzH1fA;$+A$VMy1ib< z5osjJ(E$vIkQNm@pfw#Px=P6~2$UR(Iy04*7KUarZ;0bcB&AkcYqVAnI9e_0D|C#j zC@CXY!m)tir_I^$4u(D*dMNVc!jBAn4C$za7P3o~jkQ`U*C~30k`DAdn!~UpO2@K{ z2Ua1N>@4MOXKkgXGMT$C8Ua6@w%TKj0F?Vj3SH*Z*^cmv+qBzj#RnPg4 zu$~P2j^cG+50s3~rR9yWgcyhq;?RAtcFHvc4n5`NTozDSi4Vdwc^@q+NYsj=#0hye z^Hz;LyIw7N1TRWt@0713s66b+aV<{^2kMjqcQIjLB(IVuhuRAZUi$gpBU9|RKW8ht zb=mW!JKWa5gP-=$e{)a< zzd+z+6Wtl-a(+?C+!sq0FS_*ls_(b2k;0Rv?>8(cDOfHRGaT&~5A3|0zeu(-58409 z{@qu$7hGDr@cWlv6dH!*nm;%rD?K*NH6bHM6wr-j^Yhh3WG7}x>65nJLKi_l?iWU+ z#%E<@XC&q5MrULt=setA-GvX_#b-ej7szG@*Dgahf1vWAgVpGLyh)0@MN#ifNfNI? zEOm>raKOR=3kNJ5uyDY_0SgB#9I$Y}!T}2hEF7?Kpe7tJwg2bdRsH$N&o{cZwqLy# z`~U9$e#0Mb+DdMD*qHVMxKHOEm9LQ*HxHm2Wdz*27Xp+C90e#7H~~Q;K4Vn^Daw68DG2BS&69U`;*Rb&ti z;6Xx36bl}nnv<3&CClJqBbcD<<8`pp(HmeCAAok6S?1`FAphvV;X%Q}!X!7A-52iR zNCSt9rF2G$v64PaZ-*w?$@}&I@9%$Dx1E~$&KOxrc}oR`ndJ=gEMwbOz}AQ9m=#Y3lw8V{``U+b8>VUDvy*J3x%lIaG|7=<8($I!)c9VvNoxef#Wb z20BgMNn?!85qf^Qn}JSKH~bi*a|lKxw#yzS=o|p|hczGl_MJ&p$5CGqDbed=C7m5i(D^vr zA67f9?rvFQv`)cEQ_|Yn1g+Dn(E8tf>#;E&jS%A`9teZn%{=JmT_&x4b$J_wy$XEvMXN{HV zVYVq#_vV9?Dsw|Lc2}g2d&at^)6~BY7^8Cx{)>b3@hm`H(`o7-9E{P)8`Mc3x9@dL zr*bp9a<{m0H@h)9N9x-rX{>8HP5qOIF*-Ymeii6r7}BeZ4G(j!t?nOLD$`2)f4icr zgYqTvpK>qO0oys{X*|!uzR9yWl+AHGp?r&SJ&u8pY)LXaj`5V$@r(=SG5#MUWqO=@ zTLP55ajk-5^a58#N&b|x=m4&_oPdr1*NBe-jL&tgGtd>_TD%+39pFBK{%+8@_wfKc z0q)nlfX4vtG5H}r!`%V;>dmwh7ImKkJlLBdeR&mtgR(g|d^;1|s=0E+=`kfWwjt|! zY)wMED?Rb)mVwILzDoLMJjzt^^zQyxqvrs%RrPT`Tgjh#P12z}S!%q{JPf`at4Z9- zVHzQS_96C}n&wYYu#*31aG$Glf9Ub}3XTV_m2ce`?6*`a9q|)Lz zG;`(QZq|Jea_7I>$lv+#U%&Y6ooVG<3;uUzavVIMv{g#}VDxw+e~w!$JC&5^Z#-P# z--ZJglnUP;@MwD`_Nj6Bzm*47O)z@OIm^$Q#((gfOs8C%{FT($b}eI~AqW3lW)*5* zQ(x{smB((*-<8g+`e(#XO;4PhBdrNWgz+3JoTSc7v#a9o>MhFp?4F{uGK>Cu<-q>} DaZhe2 literal 0 HcmV?d00001