Add documentation for the HMEF (TNEF/winmail.dat) support so far.
Also add a little bit to the HPBF docs, and tweak build.xml to check the right files when deciding if the docs are up to date.

git-svn-id: https://svn.apache.org/repos/asf/poi/trunk@1077891 13f79535-47bb-0310-9956-ffa450edef68
parent 8f66d00de6
commit 63077a605f
@@ -748,7 +748,7 @@ under the License.
|
|||||||
|
|
||||||
<target name="-check-docs">
|
<target name="-check-docs">
|
||||||
<uptodate property="main.docs.notRequired" targetfile="${build.site}/index.html">
|
<uptodate property="main.docs.notRequired" targetfile="${build.site}/index.html">
|
||||||
<srcfiles dir="${build.site.src}"/>
|
<srcfiles dir="${main.documentation}" />
|
||||||
</uptodate>
|
</uptodate>
|
||||||
</target>
|
</target>
|
||||||
|
|
||||||
|
@@ -35,19 +35,15 @@
|
|||||||
<p>HMEF is the POI Project's pure Java implementation of the
|
<p>HMEF is the POI Project's pure Java implementation of the
|
||||||
TNEF (Transport Neutral Encoding Format), aka winmail.dat,
|
TNEF (Transport Neutral Encoding Format), aka winmail.dat,
|
||||||
which is used by Outlook and Exchange in some situations.</p>
|
which is used by Outlook and Exchange in some situations.</p>
|
||||||
<p>Currently, HMEF provides a low-level, read-only api for
|
<p>Currently, HMEF provides a read-only api for accessing common
|
||||||
accessing core TNEF attributes. It is able to provide access
|
message and attachment attributes, including the message body
|
||||||
to both TNEF and MAPI attributes, and low level access to
|
and attachment files. In addition, it's possible to have
|
||||||
attachments. Compressed RTF is not yet fully supported, and
|
read-only access to all of the underlying TNEF and MAPI
|
||||||
user-facing access to common attributes and attachment contents
|
attributes of the message and attachments.</p>
|
||||||
is not yet present.</p>
|
<p>HMEF also provides a command line tool for extracting out
|
||||||
<p>HMEF is currently very much a work-in-progress, and we hope
|
the message body and attachment files from a TNEF (winmail.dat)
|
||||||
to add a text extractor and attachment extractor in the not
|
file.</p>
|
||||||
too distant future.</p>
|
|
||||||
<p>To get a feel for the contents of a file, and to track down
|
|
||||||
where data of interest is stored, HMEF comes with
|
|
||||||
<link href="http://svn.apache.org/repos/asf/poi/trunk/src/scratchpad/src/org/apache/poi/hmef/dev/">HMEFDumper</link>
|
|
||||||
to print out the contents of the file.</p>
|
|
||||||
<note>
|
<note>
|
||||||
This code currently lives in the
|
This code currently lives in the
|
||||||
<link href="http://svn.apache.org/viewcvs.cgi/poi/trunk/src/scratchpad/">scratchpad area</link>
|
<link href="http://svn.apache.org/viewcvs.cgi/poi/trunk/src/scratchpad/">scratchpad area</link>
|
||||||
@@ -55,7 +51,167 @@
|
|||||||
Ensure that you have the scratchpad jar or the scratchpad
|
Ensure that you have the scratchpad jar or the scratchpad
|
||||||
build area in your classpath before experimenting with this code.
|
build area in your classpath before experimenting with this code.
|
||||||
</note>
|
</note>
|
||||||
|
<note>
|
||||||
|
This code is a new POI feature, and the first release that will
|
||||||
|
contain it will be POI 3.8 beta 2. Until then, you will need to
|
||||||
|
build your own jars from a <link href="../subversion.html">svn
|
||||||
|
checkout</link>.
|
||||||
|
</note>
|
||||||
|
</section>
|
||||||
|
|
||||||
|
<section>
|
||||||
|
<title>Using HMEF to access TNEF (winmail.dat) files</title>
|
||||||
|
|
||||||
|
<section>
|
||||||
|
<title>Easy extraction of message body and attachment files</title>
|
||||||
|
|
||||||
|
<p>The class <em>org.apache.poi.hmef.extractor.HMEFContentsExtractor</em>
|
||||||
|
provides both command line and Java extraction. It allows the
|
||||||
|
saving of the message body (an RTF file), and all of the
|
||||||
|
attachment files, to a single directory as specified.</p>
|
||||||
|
|
||||||
|
<p>From the command line, simply call the class specifying the
|
||||||
|
TNEF file to extract, and the directory to place the extracted
|
||||||
|
files into, eg:</p>
|
||||||
|
<source>
|
||||||
|
java -classpath poi-3.8-FINAL.jar:poi-scratchpad-3.8-FINAL.jar org.apache.poi.hmef.extractor.HMEFContentsExtractor winmail.dat /tmp/extracted/
|
||||||
|
</source>
|
||||||
|
|
||||||
|
<p>From Java, there are two method calls on the class, one to
|
||||||
|
extract the message body RTF to a file, and the other to extract
|
||||||
|
all the attachments to a directory. A typical use would be:</p>
|
||||||
|
<source>
|
||||||
|
public void extract(String winmailFilename, String directoryName) throws Exception {
|
||||||
|
HMEFContentsExtractor ext = new HMEFContentsExtractor(new File(winmailFilename));
|
||||||
|
|
||||||
|
File dir = new File(directoryName);
|
||||||
|
File rtf = new File(dir, "message.rtf");
|
||||||
|
if(! dir.exists()) {
|
||||||
|
throw new FileNotFoundException("Output directory " + dir.getName() + " not found");
|
||||||
|
}
|
||||||
|
|
||||||
|
System.out.println("Extracting...");
|
||||||
|
ext.extractMessageBody(rtf);
|
||||||
|
ext.extractAttachments(dir);
|
||||||
|
System.out.println("Extraction completed");
|
||||||
|
}
|
||||||
|
</source>
|
||||||
|
</section>
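The Java sample above throws if the output directory is missing. As a minimal sketch, assuming the caller would rather create the directory, the helper below prepares it using only `java.nio`. The names `OutputDirs` and `ensureDirectory` are hypothetical and are not part of POI.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NotDirectoryException;
import java.nio.file.Path;

/** Hypothetical helper, not part of POI: prepares the directory that extraction writes into. */
final class OutputDirs {
    private OutputDirs() {}

    /**
     * Returns the directory, creating it (and any missing parents) if needed.
     * Fails if the path exists but is not a directory.
     */
    static Path ensureDirectory(Path dir) throws IOException {
        if (Files.exists(dir) && !Files.isDirectory(dir)) {
            throw new NotDirectoryException(dir.toString());
        }
        // createDirectories is a no-op when the directory already exists
        return Files.createDirectories(dir);
    }
}
```

The returned path's `toFile()` could then be handed to `extractAttachments(dir)` as in the sample above.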
|
||||||
|
|
||||||
|
<section>
|
||||||
|
<title>Attachment attributes and contents</title>
|
||||||
|
|
||||||
|
<p>To get at your attachments, simply call the
|
||||||
|
<em>getAttachments()</em> method on a <em>HMEFMessage</em>
|
||||||
|
instance, and you'll receive a list of all the attachments.</p>
|
||||||
|
<p>When you have a <em>org.apache.poi.hmef.Attachment</em> object,
|
||||||
|
there are several helper methods available. These will all
|
||||||
|
return the value of the appropriate underlying attachment
|
||||||
|
attributes, or null if for some reason the attribute isn't
|
||||||
|
present in your file.</p>
|
||||||
|
<ul>
|
||||||
|
<li><em>getFilename()</em> - returns the name of the attachment
|
||||||
|
file, possibly in 8.3 format</li>
|
||||||
|
<li><em>getLongFilename()</em> - returns the full name of the
|
||||||
|
attachment file</li>
|
||||||
|
<li><em>getExtension()</em> - returns the extension of the
|
||||||
|
attachment file, including the "."</li>
|
||||||
|
<li><em>getModifiedDate()</em> - returns the date that the
|
||||||
|
attachment file was last edited on</li>
|
||||||
|
<li><em>getContents()</em> - returns a byte array of the contents
|
||||||
|
of the attached file</li>
|
||||||
|
<li><em>getRenderedMetaFile()</em> - returns a byte array of
|
||||||
|
a windows meta file representation of the attached file</li>
|
||||||
|
</ul>
|
||||||
|
</section>
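Because each of these helpers may return null, code that saves attachments needs a fallback when choosing a file name. The sketch below is one possible approach and not POI code: it takes the values of `getLongFilename()` and `getFilename()` as plain strings, and the names `AttachmentNames` and `choose` are hypothetical.

```java
/** Hypothetical helper, not part of POI: picks a safe on-disk name for an attachment. */
final class AttachmentNames {
    private AttachmentNames() {}

    /**
     * Prefers the long filename, then the short (possibly 8.3) one, then a
     * caller-supplied fallback. Any directory part is dropped so the name
     * cannot point outside the output directory.
     */
    static String choose(String longName, String shortName, String fallback) {
        String name = isBlank(longName) ? shortName : longName;
        if (isBlank(name)) {
            return fallback;
        }
        // Keep only the last path segment, for either separator style
        int cut = Math.max(name.lastIndexOf('/'), name.lastIndexOf('\\'));
        String base = name.substring(cut + 1).trim();
        return (base.isEmpty() || base.equals(".") || base.equals("..")) ? fallback : base;
    }

    private static boolean isBlank(String s) {
        return s == null || s.trim().isEmpty();
    }
}
```

A caller might pass something like `"attachment-" + index` as the fallback, so that attachments without any name attribute still get distinct files.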
|
||||||
|
|
||||||
|
<section>
|
||||||
|
<title>Message attributes and message body</title>
|
||||||
|
|
||||||
|
<p>A <em>org.apache.poi.hmef.HMEFMessage</em> instance is created
|
||||||
|
from an <em>InputStream</em> of the underlying TNEF (winmail.dat)
|
||||||
|
file.</p>
|
||||||
|
<p>From a <em>HMEFMessage</em>, there are three main methods of
|
||||||
|
interest to call:</p>
|
||||||
|
<ul>
|
||||||
|
<li><em>getBody()</em> - returns a String containing the RTF
|
||||||
|
contents of the message body.
|
||||||
|
<em>Note - see limitations</em></li>
|
||||||
|
<li><em>getSubject()</em> - returns the message subject</li>
|
||||||
|
<li><em>getAttachments()</em> - returns the list of
|
||||||
|
<em>Attachment</em> objects for the message</li>
|
||||||
|
</ul>
|
||||||
|
</section>
|
||||||
|
|
||||||
|
<section>
|
||||||
|
<title>Low level attribute access</title>
|
||||||
|
|
||||||
|
<p>Both Messages and Attachments contain two kinds of attributes.
|
||||||
|
These are <em>TNEFAttribute</em> and <em>MAPIAttribute</em>.</p>
|
||||||
|
<p>TNEFAttribute is specific to TNEF files in terms of the
|
||||||
|
available types and properties. In general, Attachments have a
|
||||||
|
few more useful ones of these than Messages.</p>
|
||||||
|
<p>MAPIAttributes hold standard MAPI properties and values, and
|
||||||
|
work in a similar way to how <link href="../hsmf/">HSMF
|
||||||
|
(Outlook)</link> does. There are typically many of these on both
|
||||||
|
Messages and Attachments. <em>Note - see limitations</em></p>
|
||||||
|
<p>Both <em>HMEFMessage</em> and <em>Attachment</em>
|
||||||
|
support two different ways of getting to attributes of interest.
|
||||||
|
Firstly, they support list getters, to return all attributes
|
||||||
|
(either TNEF or MAPI). Secondly, they support specific getters by
|
||||||
|
TNEF or MAPI property.</p>
|
||||||
|
<source>
|
||||||
|
HMEFMessage msg = new HMEFMessage(new FileInputStream(file));
|
||||||
|
for(TNEFAttribute attr : msg.getMessageAttributes()) {
|
||||||
|
System.out.println("TNEF : " + attr);
|
||||||
|
}
|
||||||
|
for(MAPIAttribute attr : msg.getMessageMAPIAttributes()) {
|
||||||
|
System.out.println("MAPI : " + attr);
|
||||||
|
}
|
||||||
|
System.out.println("Subject is " + msg.getMessageMAPIAttribute(MAPIProperty.CONVERSATION_TOPIC));
|
||||||
|
|
||||||
|
for(Attachment attach : msg.getAttachments()) {
|
||||||
|
for(TNEFAttribute attr : attach.getAttributes()) {
|
||||||
|
System.out.println("A.TNEF : " + attr);
|
||||||
|
}
|
||||||
|
for(MAPIAttribute attr : attach.getMAPIAttributes()) {
|
||||||
|
System.out.println("A.MAPI : " + attr);
|
||||||
|
}
|
||||||
|
System.out.println("Filename is " + attach.getAttribute(TNEFProperty.CID_ATTACHTITLE));
|
||||||
|
System.out.println("Extension is " + attach.getMAPIAttribute(MAPIProperty.ATTACH_EXTENSION));
|
||||||
|
}
|
||||||
|
</source>
|
||||||
|
</section>
|
||||||
|
</section>
|
||||||
|
|
||||||
|
<section>
|
||||||
|
<title>Investigating a TNEF file</title>
|
||||||
|
|
||||||
|
<p>To get a feel for the contents of a file, and to track down
|
||||||
|
where data of interest is stored, HMEF comes with
|
||||||
|
<link href="http://svn.apache.org/repos/asf/poi/trunk/src/scratchpad/src/org/apache/poi/hmef/dev/">HMEFDumper</link>
|
||||||
|
to print out the contents of the file.</p>
|
||||||
|
</section>
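Before investigating a file, it can help to confirm that it is TNEF at all. The sketch below is not part of POI and uses hypothetical names (`TnefSniffer`, `looksLikeTnef`). It relies on the TNEF signature value 0x223E9F78, stored little-endian in the first four bytes of the file.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper, not part of POI: checks for the TNEF signature. */
final class TnefSniffer {
    /** TNEF signature; stored little-endian as bytes 78 9F 3E 22. */
    static final int TNEF_SIGNATURE = 0x223E9F78;

    private TnefSniffer() {}

    /** Reads the first four bytes and compares them to the TNEF signature. */
    static boolean looksLikeTnef(InputStream in) throws IOException {
        byte[] header = new byte[4];
        try {
            new DataInputStream(in).readFully(header);
        } catch (EOFException e) {
            return false; // shorter than four bytes
        }
        // Assemble the four bytes as a little-endian int
        int value = (header[0] & 0xFF)
                | (header[1] & 0xFF) << 8
                | (header[2] & 0xFF) << 16
                | (header[3] & 0xFF) << 24;
        return value == TNEF_SIGNATURE;
    }
}
```

The check consumes the bytes it reads, so the file would need to be reopened before passing a stream to `HMEFMessage`.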
|
||||||
|
|
||||||
|
<section>
|
||||||
|
<title>Limitations</title>
|
||||||
|
|
||||||
|
<p>HMEF is currently a work-in-progress, and not everything
|
||||||
|
works yet. The current limitations are:</p>
|
||||||
|
<ul>
|
||||||
|
<li>Compressed RTF Message Bodies are not correctly
|
||||||
|
decompressed. This means that a call to
|
||||||
|
<em>HMEFMessage.getBody()</em> is unlikely to return the
|
||||||
|
correct RTF.</li>
|
||||||
|
<li>Non-standard MAPI properties from the range 0x8000 to 0x8fff
|
||||||
|
may not be converted into attributes quite correctly.
|
||||||
|
The values show up, but the name and type may not always
|
||||||
|
be correct.</li>
|
||||||
|
<li>All testing so far has been performed on a small number of
|
||||||
|
English documents. We think we're correctly turning bytes into
|
||||||
|
Java unicode strings, but we need a few non-English sample
|
||||||
|
files in the test suite to verify this!</li>
|
||||||
|
</ul>
|
||||||
</section>
|
</section>
|
||||||
</body>
|
</body>
|
||||||
</document>
|
</document>
|
||||||
|
35
src/documentation/content/xdocs/hpbf/book.xml
Normal file
@@ -0,0 +1,35 @@
|
|||||||
|
<?xml version="1.0"?>
|
||||||
|
<!--
|
||||||
|
====================================================================
|
||||||
|
Licensed to the Apache Software Foundation (ASF) under one or more
|
||||||
|
contributor license agreements. See the NOTICE file distributed with
|
||||||
|
this work for additional information regarding copyright ownership.
|
||||||
|
The ASF licenses this file to You under the Apache License, Version 2.0
|
||||||
|
(the "License"); you may not use this file except in compliance with
|
||||||
|
the License. You may obtain a copy of the License at
|
||||||
|
|
||||||
|
http://www.apache.org/licenses/LICENSE-2.0
|
||||||
|
|
||||||
|
Unless required by applicable law or agreed to in writing, software
|
||||||
|
distributed under the License is distributed on an "AS IS" BASIS,
|
||||||
|
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||||
|
See the License for the specific language governing permissions and
|
||||||
|
limitations under the License.
|
||||||
|
====================================================================
|
||||||
|
-->
|
||||||
|
<!DOCTYPE book PUBLIC "-//APACHE//DTD Cocoon Documentation Book V1.0//EN" "../dtd/book-cocoon-v10.dtd">
|
||||||
|
|
||||||
|
<book software="POI Project"
|
||||||
|
title="HPBF"
|
||||||
|
copyright="@year@ POI Project">
|
||||||
|
|
||||||
|
<menu label="Apache POI">
|
||||||
|
<menu-item label="Top" href="../index.html"/>
|
||||||
|
</menu>
|
||||||
|
|
||||||
|
<menu label="HPBF">
|
||||||
|
<menu-item label="Overview" href="index.html"/>
|
||||||
|
<menu-item label="File Format" href="file-format.xml"/>
|
||||||
|
</menu>
|
||||||
|
|
||||||
|
</book>
|
@@ -45,7 +45,10 @@
|
|||||||
the document (partly supported). Additional low level
|
the document (partly supported). Additional low level
|
||||||
code to process the file format may follow, if
|
code to process the file format may follow, if
|
||||||
demand and developer interest warrant it.</p>
|
demand and developer interest warrant it.</p>
|
||||||
<p>At this time, there is no <em>usermodel</em> api or similar.
|
<p>Text Extraction is available via the
|
||||||
|
<em>org.apache.poi.hpbf.extractor.PublisherTextExtractor</em>
|
||||||
|
class.</p>
|
||||||
|
<p>At this time, there is no <em>usermodel</em> api or similar.
|
||||||
There is only low level support for certain parts of
|
There is only low level support for certain parts of
|
||||||
the file, but by no means all of it.</p>
|
the file, but by no means all of it.</p>
|
||||||
<p>Our current understanding of the file format is documented
|
<p>Our current understanding of the file format is documented
|
||||||
|