cbf86ed0bc
Some tweaks to the decompression, and more tests, but mostly work on the compression side. We can now compress small streams properly, and these round-trip fine. However, some longer streams don't compress correctly, and more work on that is still needed (see the disabled unit test) git-svn-id: https://svn.apache.org/repos/asf/poi/trunk@1049805 13f79535-47bb-0310-9956-ffa450edef68
102 lines
4.9 KiB
XML
102 lines
4.9 KiB
XML
<?xml version="1.0" encoding="UTF-8"?>
|
|
<!--
|
|
====================================================================
|
|
Licensed to the Apache Software Foundation (ASF) under one or more
|
|
contributor license agreements. See the NOTICE file distributed with
|
|
this work for additional information regarding copyright ownership.
|
|
The ASF licenses this file to You under the Apache License, Version 2.0
|
|
(the "License"); you may not use this file except in compliance with
|
|
the License. You may obtain a copy of the License at
|
|
|
|
http://www.apache.org/licenses/LICENSE-2.0
|
|
|
|
Unless required by applicable law or agreed to in writing, software
|
|
distributed under the License is distributed on an "AS IS" BASIS,
|
|
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
|
See the License for the specific language governing permissions and
|
|
limitations under the License.
|
|
====================================================================
|
|
-->
|
|
<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V1.1//EN" "../dtd/document-v11.dtd">
|
|
|
|
<document>
|
|
<header>
|
|
<title>POI-HDGF - Java API To Access Microsoft Visio Format Files</title>
|
|
<subtitle>Overview</subtitle>
|
|
<authors>
|
|
<person name="Nick Burch" email="nick at apache dot org"/>
|
|
</authors>
|
|
</header>
|
|
|
|
<body>
|
|
<section>
|
|
<title>Overview</title>
|
|
|
|
<p>HDGF is the POI Project's pure Java implementation of the Visio file format.</p>
|
|
<p>Currently, HDGF provides a low-level, read-only api for
|
|
accessing Visio documents. It also provides a
|
|
<link href="http://svn.apache.org/repos/asf/poi/trunk/src/scratchpad/src/org/apache/poi/hdgf/extractor/">way</link>
|
|
to extract the textual content from a file.
|
|
</p>
|
|
<p>At this time, there is no <em>usermodel</em> api or similar,
|
|
only low level access to the streams, chunks and chunk commands.
|
|
Users are advised to check the unit tests to see how everything
|
|
works. They are also well advised to read the documentation
|
|
supplied with
|
|
<link href="http://web.archive.org/web/20071212220759/http://www.gnome.ru/projects/vsdump_en.html">vsdump</link>
|
|
to get a feel for how Visio files are structured.</p>
|
|
<p>To get a feel for the contents of a file, and to track down
|
|
where data of interest is stored, HDGF comes with
|
|
<link href="http://svn.apache.org/repos/asf/poi/trunk/src/scratchpad/src/org/apache/poi/hdgf/dev/">VSDDumper</link>
|
|
to print out the contents of the file. Users should also make
|
|
use of
|
|
<link href="http://web.archive.org/web/20071212220759/http://www.gnome.ru/projects/vsdump_en.html">vsdump</link>
|
|
to probe the structure of files.</p>
|
|
<note>
|
|
This code currently lives the
|
|
<link href="http://svn.apache.org/viewcvs.cgi/poi/trunk/src/scratchpad/">scratchpad area</link>
|
|
of the POI SVN repository.
|
|
Ensure that you have the scratchpad jar or the scratchpad
|
|
build area in your
|
|
classpath before experimenting with this code.
|
|
</note>
|
|
|
|
<section>
|
|
<title>Steps required for write support</title>
|
|
<p>Currently, HDGF is only able to read visio files, it is
|
|
not able to write them back out again. We believe the
|
|
following are the steps that would need to be taken to
|
|
implement it.</p>
|
|
<ol>
|
|
<li>Re-write the decompression support in LZW4HDGF as
|
|
HDGFLZW, which will be much better documented, and also
|
|
under the ASL. <strong>Completed October 2007</strong></li>
|
|
<li>Add compression support to HDGFLZW.
|
|
<strong>In progress - works for small streams but encoding
|
|
goes wrong on larger ones</strong></li>
|
|
<li>Have HDGF just write back the raw bytes it read in, and
|
|
have a test to ensure the file is un-changed.</li>
|
|
<li>Have HDGF generate the bytes to write out from the
|
|
Stream stores, using the compressed data as appropriate,
|
|
without re-compressing. Plus test to ensure file is
|
|
un-changed.</li>
|
|
<li>Have HDGF generate the bytes to write out from the
|
|
Stream stores, re-compressing any streams that were
|
|
decompressed. Plus test to ensure file is un-changed.</li>
|
|
<li>Have HDGF re-generate the offsets in pointers for the
|
|
locations of the streams. Plus test to ensure file is
|
|
un-changed.</li>
|
|
<li>Have HDGF re-generate the bytes for all the chunks, from
|
|
the chunk commands. Tests to ensure the chunks are
|
|
serialized properly, and then that the file is un-changed</li>
|
|
<li>Alter the data of one command, but keep it the same
|
|
length, and check visio can open the file when written
|
|
out.</li>
|
|
<li>Alter the data of one command, to a new length, and
|
|
check that visio can open the file when written out.</li>
|
|
</ol>
|
|
</section>
|
|
</section>
|
|
</body>
|
|
</document>
|