<!doctype html public "-//W3C//DTD HTML 4.0//EN">

<html>
<head>
<title></title>
</head>

<body>
<div>
<p>The <strong>POI Browser</strong> is a very simple Swing GUI tool that
displays the internal structure of a Microsoft Office file. It concentrates
on streams in the <em>Horrible Property Set Format (HPSF)</em>. To access
these streams, the POI Browser uses the package
<tt>org.apache.poi.hpsf</tt>.</p>

<p>A file in Microsoft's Office format can be seen as a filesystem within a
file. For example, a Word document like <var>sample.doc</var> is just an
ordinary file from the operating system's point of view. Internally,
however, it is organized into directories and files. For example,
<var>sample.doc</var> might consist of the three internal files (or
"streams", as Microsoft calls them) <tt>\001CompObj</tt>,
<tt>\005SummaryInformation</tt>, and <tt>WordDocument</tt>. (In these names,
\001 and \005 denote the unprintable characters with the character codes 1
and 5, respectively.) A more complicated Word file typically contains a
directory named <tt>ObjectPool</tt> with more directories and files nested
within it.</p>
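
<p>The <tt>\001</tt> and <tt>\005</tt> notation used for stream names can be
produced with a few lines of plain Java. The following is only a minimal
sketch: the class <tt>StreamNames</tt> and its method <tt>printable</tt> are
hypothetical names chosen for illustration and are not part of POI.</p>

```java
/** Hypothetical helper, not part of POI: makes raw stream names readable. */
final class StreamNames {
    private StreamNames() {}

    /** Replaces each character with a code below 32 by a backslash and three octal digits. */
    static String printable(String rawName) {
        StringBuilder sb = new StringBuilder();
        for (char c : rawName.toCharArray()) {
            if (c < 32) {
                // Unprintable control character: emit e.g. \005 for code 5.
                sb.append(String.format("\\%03o", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The Java literal "\005" is the single character with code 5.
        System.out.println(printable("\005SummaryInformation"));
        System.out.println(printable("\001CompObj"));
        System.out.println(printable("WordDocument"));
    }
}
```

<p>Names without control characters, such as <tt>WordDocument</tt>, pass
through unchanged.</p>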

<p>The POI Browser makes these internal structures visible. It takes one or
more Microsoft Office files as input on the command line and shows their
directories and files in a tree-like structure. At the top level, the POI
Browser displays the (operating system) filenames. An internal file (i.e. a
"stream" or a "document") is shown with its name, its size, and a
hexadecimal dump of its first bytes.</p>
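
<p>A hexadecimal dump of a stream's first bytes can be sketched as follows.
This is an illustration only, not the POI Browser's actual code: the class
<tt>HexPreview</tt>, the method <tt>firstBytes</tt> and the <tt>limit</tt>
parameter are assumptions made for this example.</p>

```java
/** Hypothetical helper, not part of POI: formats the start of a stream as hex. */
final class HexPreview {
    private HexPreview() {}

    /** Returns at most {@code limit} leading bytes as two-digit hex values separated by spaces. */
    static String firstBytes(byte[] data, int limit) {
        int n = Math.min(data.length, limit);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            if (i > 0) {
                sb.append(' ');
            }
            // Mask with 0xFF so that negative byte values print as 80..FF.
            sb.append(String.format("%02X", data[i] & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] sample = {(byte) 0xFE, (byte) 0xFF, 0x00, 0x00, 0x05};
        System.out.println(firstBytes(sample, 4));
    }
}
```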

<p>The POI Browser pays special attention to property set streams. For
example, the <tt>\005SummaryInformation</tt> stream contains information
such as the title and author of the document. The POI Browser opens every
stream in a POI filesystem. If it encounters a property set stream, it does
not just display the first bytes but analyses the whole stream and displays
its contents in a more or less readable manner.</p>
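
<p>How a tool might decide whether a stream is a property set stream can be
sketched by inspecting the stream header. The check below is an assumption
based on the publicly documented OLE property set header layout (byte order
mark 0xFFFE at offset 0, version 0 or 1 at offset 2, and a count of 1 or 2
property sets at offset 24, all little-endian). It is not the check that
<tt>org.apache.poi.hpsf</tt> performs, and the class
<tt>PropertySetHeader</tt> is a hypothetical name.</p>

```java
/** Hypothetical sketch, not the HPSF implementation: header-based property set detection. */
final class PropertySetHeader {
    private PropertySetHeader() {}

    /** Returns true if the first 28 bytes look like an OLE property set header. */
    static boolean looksLikePropertySet(byte[] header) {
        if (header.length < 28) {
            return false; // too short to hold the fixed-size header
        }
        int byteOrder = u16(header, 0);   // expected to be 0xFFFE
        int version = u16(header, 2);     // expected to be 0 or 1
        long setCount = u32(header, 24);  // expected to be 1 or 2
        return byteOrder == 0xFFFE
                && (version == 0 || version == 1)
                && (setCount == 1 || setCount == 2);
    }

    /** Reads an unsigned 16-bit little-endian value. */
    static int u16(byte[] b, int off) {
        return (b[off] & 0xFF) | ((b[off + 1] & 0xFF) << 8);
    }

    /** Reads an unsigned 32-bit little-endian value. */
    static long u32(byte[] b, int off) {
        return (u16(b, off) & 0xFFFFL) | ((long) u16(b, off + 2) << 16);
    }
}
```

<p>A stream that fails such a check would simply be shown with its name,
size and leading bytes, as described above.</p>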
</div>
</body>
</html>

<!-- Keep this comment at the end of the file
Local variables:
sgml-default-dtd-file:"HTML_4.0_Strict.ced"
mode: html
sgml-omittag:t
sgml-shorttag:nil
sgml-namecase-general:t
sgml-general-insert-case:lower
sgml-minimize-attributes:nil
sgml-always-quote-attributes:t
sgml-indent-step:1
sgml-indent-data:t
sgml-parent-document:nil
sgml-exposed-tags:nil
sgml-local-catalogs:nil
sgml-local-ecat-files:nil
End:
-->