<?xml version="1.0" encoding="UTF-8"?>
<!--
   ====================================================================
   Licensed to the Apache Software Foundation (ASF) under one or more
   contributor license agreements. See the NOTICE file distributed with
   this work for additional information regarding copyright ownership.
   The ASF licenses this file to You under the Apache License, Version 2.0
   (the "License"); you may not use this file except in compliance with
   the License. You may obtain a copy of the License at

       http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing, software
   distributed under the License is distributed on an "AS IS" BASIS,
   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
   See the License for the specific language governing permissions and
   limitations under the License.
   ====================================================================
-->
<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V1.1//EN" "../dtd/document-v11.dtd">
<document>
<header>
<title>Apache POI - POIFS - Documents embedded in other documents</title>
<subtitle>Overview</subtitle>
<authors>
<person name="Nick Burch" email="nick@apache.org"/>
<person name="Yegor Kozlov" email="yegor@apache.org"/>
</authors>
</header>
<body>
<section><title>Overview</title>
<p>It is possible for one OLE 2 based document to have other
OLE 2 documents embedded in it. For example, an Excel file
may have a Word document and a PowerPoint slideshow
embedded as part of it.</p>
<p>Normally, these other documents are stored in subdirectories
of the OLE 2 (POIFS) filesystem. The exact location of the
embedded documents will vary depending on the type of the
master document, and the exact directory names will differ
each time. To figure out exactly which directory to look
in, you will either need to process the appropriate OLE 2
linking entry in the master document, or simply iterate
over all the directories in the filesystem.</p>
<p>As a general rule, you will find the same OLE 2 entries
in the subdirectories as you would have found at the root
of the filesystem had the document not been embedded.</p>
<section><title>Files embedded in Excel</title>
<p>Excel normally stores embedded files in subdirectories
of the filesystem root. Typically these subdirectories
are named starting with MBD, with 8 hex characters following.</p>
</section>
<section><title>Files embedded in Word</title>
<p>Word normally stores embedded files in subdirectories
of the ObjectPool directory, itself a subdirectory of the
filesystem root. Typically these subdirectories are named
starting with an underscore, followed by 10 digits.</p>
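<p>The naming conventions described for Excel and Word can be turned
into a quick filter when iterating over directory names. The following
is only a sketch: the class name, method name and sample directory names
are illustrative, and because the conventions are only typical, a
directory that does not match may still hold an embedded document.</p>
<source><![CDATA[
import java.util.regex.Pattern;

public class EmbeddedDirNames {
    // Excel convention: "MBD" followed by 8 hex characters
    private static final Pattern EXCEL_STYLE = Pattern.compile("MBD[0-9A-Fa-f]{8}");
    // Word convention: an underscore followed by 10 digits
    private static final Pattern WORD_STYLE = Pattern.compile("_[0-9]{10}");

    /** Returns "excel", "word" or "unknown" for a POIFS directory name. */
    public static String classify(String dirName) {
        if (EXCEL_STYLE.matcher(dirName).matches()) {
            return "excel";
        }
        if (WORD_STYLE.matcher(dirName).matches()) {
            return "word";
        }
        return "unknown";
    }

    public static void main(String[] args) {
        // Made-up names, used only to exercise the patterns
        String[] names = { "MBD0002F1A3", "_1234567890", "ObjectPool" };
        for (String name : names) {
            System.out.println(name + " -> " + classify(name));
        }
    }
}
]]></source>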
</section>
<section><title>Files embedded in PowerPoint</title>
<p>PowerPoint does not normally store embedded files
in the OLE2 layer. Instead, they are held within records
of the main PowerPoint file.
<br/>See the <link href="./../hslf/how-to-shapes.html#OLE">HSLF Tutorial</link>
for how to retrieve embedded OLE objects from a presentation.</p>
</section>
</section>
<section><title>Listing POIFS contents</title>
<p>POIFS provides a simple tool for listing the contents of
OLE2 files. This lets you see what your POIFS file
contains, and hence whether it has any embedded documents in it,
and where.</p>
<p>The tool to use is <em>org.apache.poi.poifs.dev.POIFSLister</em>.
This tool may be run from the command line, and takes a filename
as its parameter. It will print out all the directories and
files contained within the POIFS file.</p>
</section>
<section><title>Opening embedded files</title>
<p>All of the POIDocument classes (HSSFWorkbook, HSLFSlideShow,
HWPFDocument and HDGFDiagram) can either be opened from
a POIFSFileSystem, or from a specific directory within a
POIFSFileSystem. So, to open embedded files, simply locate the
appropriate DirectoryNode that represents the subdirectory
of interest, and pass this, along with the overall POIFSFileSystem, to
the constructor.</p>
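<p>As a minimal sketch of this approach, the following looks at each
directory directly under the filesystem root and tries to open it as an
embedded Excel workbook. It assumes that the master document keeps its
embedded files directly under the root (as Excel normally does), that an
embedded workbook can be recognised by a <em>Workbook</em> entry in its
directory, and that the
<em>HSSFWorkbook(DirectoryNode, POIFSFileSystem, boolean)</em> constructor
is available in the POI version in use. The class name and the file
argument are purely illustrative.</p>
<source><![CDATA[
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.InputStream;
import java.util.Iterator;

import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.poifs.filesystem.DirectoryNode;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;

public class OpenEmbeddedWorkbooks {
    public static void main(String[] args) throws Exception {
        InputStream in = new FileInputStream(args[0]);
        try {
            POIFSFileSystem fs = new POIFSFileSystem(in);
            // Walk the entries directly under the filesystem root
            for (Iterator<?> it = fs.getRoot().getEntries(); it.hasNext();) {
                Object entry = it.next();
                if (!(entry instanceof DirectoryNode)) {
                    continue; // a plain stream, not a subdirectory
                }
                DirectoryNode dir = (DirectoryNode) entry;
                try {
                    // Assumption: a "Workbook" entry marks an embedded workbook
                    dir.getEntry("Workbook");
                } catch (FileNotFoundException e) {
                    continue; // some other kind of embedded document
                }
                // Pass the directory plus the overall filesystem to the constructor
                HSSFWorkbook embedded = new HSSFWorkbook(dir, fs, true);
                System.out.println(dir.getName() + ": "
                        + embedded.getNumberOfSheets() + " sheet(s)");
            }
        } finally {
            in.close();
        }
    }
}
]]></source>
<p>For a Word master document, the same loop would start from the
ObjectPool directory instead of the root.</p>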
<p>If you want to extract the textual contents of the embedded file,
then open the appropriate POIDocument, and pass this to
the extractor class, instead of simply passing the POIFSFileSystem
to the extractor.</p>
</section>
</body>
</document>