<?xml version="1.0" encoding="UTF-8"?>
<!--
====================================================================
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
====================================================================
-->
<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V1.1//EN" "../dtd/document-v11.dtd">
<document>
<header>
<title>Apache POI - POIFS - Documents embedded in other documents</title>
<subtitle>Overview</subtitle>
<authors>
<person name="Nick Burch" email="nick@apache.org"/>
<person name="Yegor Kozlov" email="yegor@apache.org"/>
</authors>
</header>
<body>
<section><title>Overview</title>
<p>It is possible for one OLE 2 based document to have other
OLE 2 documents embedded in it. For example, an Excel file
may have a Word document and a PowerPoint slideshow
embedded as part of it.</p>
<p>Normally, these other documents are stored in subdirectories
of the OLE 2 (POIFS) filesystem. The exact location of the
embedded documents will vary depending on the type of the
master document, and the exact directory names will differ
each time. To figure out exactly which directory to look
in, you will either need to process the appropriate OLE 2
linking entry in the master document, or simply iterate
over all the directories in the filesystem.</p>
<p>As a general rule, the subdirectory holding an embedded
document contains the same OLE 2 entries that you would find
at the root of the filesystem if the document were not embedded.</p>
<section><title>Files embedded in Excel</title>
<p>Excel normally stores embedded files in subdirectories
of the filesystem root. Typically, the names of these subdirectories
start with MBD, followed by 8 hex characters.</p>
</section>
<section><title>Files embedded in Word</title>
<p>Word normally stores embedded files in subdirectories
of the ObjectPool directory, itself a subdirectory of the
filesystem root. Typically, the names of these subdirectories
start with an underscore, followed by 10 digits.</p>
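<p>The naming patterns described for Excel and Word can be checked
with a pair of regular expressions. The following is only a minimal
sketch using the Java standard library: the class and method names
are hypothetical, and because the patterns are described as typical
rather than guaranteed, a match should be treated as a hint, not as
proof that a directory holds an embedded document.</p>
<source><![CDATA[
import java.util.regex.Pattern;

// Hypothetical helper, not part of POI: recognises the typical
// directory names described in this document.
class EmbeddedDirNames {
    // "MBD" followed by 8 hex characters (typical for Excel).
    private static final Pattern EXCEL = Pattern.compile("MBD[0-9A-Fa-f]{8}");
    // An underscore followed by 10 digits (typical inside Word's ObjectPool).
    private static final Pattern WORD = Pattern.compile("_[0-9]{10}");

    public static boolean looksLikeExcelEmbedding(String name) {
        return EXCEL.matcher(name).matches();
    }

    public static boolean looksLikeWordEmbedding(String name) {
        return WORD.matcher(name).matches();
    }

    public static void main(String[] args) {
        // Classify each directory name given on the command line.
        for (String name : args) {
            System.out.println(name
                + " excel=" + looksLikeExcelEmbedding(name)
                + " word=" + looksLikeWordEmbedding(name));
        }
    }
}
]]></source>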
</section>
<section><title>Files embedded in PowerPoint</title>
<p>PowerPoint does not normally store embedded files
in the OLE2 layer. Instead, they are held within records
of the main PowerPoint file.
<br/>See the <link href="./../slideshow/how-to-shapes.html#OLE">HSLF Tutorial</link>
for how to retrieve embedded OLE objects from a presentation.</p>
</section>
</section>
<section><title>Listing POIFS contents</title>
<p>POIFS provides a simple tool for listing the contents of
OLE2 files. This lets you see what your POIFS file
contains, and hence whether it has any embedded documents in it,
and where.</p>
<p>The tool to use is <em>org.apache.poi.poifs.dev.POIFSLister</em>.
This tool may be run from the command line, and takes a filename
as its parameter. It will print out all the directories and
files contained within the POIFS file.</p>
</section>
<section><title>Opening embedded files</title>
<p>All of the POIDocument classes (HSSFWorkbook, HSLFSlideShow,
HWPFDocument and HDGFDiagram) can either be opened from
a POIFSFileSystem, or from a specific directory within a
POIFSFileSystem. So, to open embedded files, simply locate the
appropriate DirectoryNode that represents the subdirectory
of interest, and pass this, together with the overall
POIFSFileSystem, to the constructor.</p>
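<p>As an illustration, the sketch below walks the root of an Excel
file and opens each embedded workbook it finds. It makes several
assumptions: the embedded directories follow the typical MBD naming
described above, an embedded workbook keeps its data in an entry
named "Workbook", and the three-argument HSSFWorkbook constructor is
available. Constructor signatures have varied between POI releases,
so check the Javadoc for the version you use. The class name is
hypothetical.</p>
<source><![CDATA[
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Iterator;

import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.poifs.filesystem.DirectoryNode;
import org.apache.poi.poifs.filesystem.Entry;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;

// Hypothetical example class; args[0] is the path to the master .xls file.
public class OpenEmbeddedWorkbooks {
    public static void main(String[] args) throws IOException {
        try (InputStream in = new FileInputStream(args[0])) {
            POIFSFileSystem fs = new POIFSFileSystem(in);

            // Iterate over every entry at the root of the filesystem.
            Iterator<Entry> entries = fs.getRoot().getEntries();
            while (entries.hasNext()) {
                Entry entry = entries.next();

                // Assumption: embedded documents sit in "MBD..." directories.
                if (!(entry instanceof DirectoryNode)
                        || !entry.getName().startsWith("MBD")) {
                    continue;
                }
                DirectoryNode dir = (DirectoryNode) entry;

                // Assumption: an embedded workbook has a "Workbook" entry.
                if (dir.hasEntry("Workbook")) {
                    // Pass the directory plus the overall filesystem.
                    HSSFWorkbook wb = new HSSFWorkbook(dir, fs, true);
                    System.out.println(dir.getName() + ": "
                        + wb.getNumberOfSheets() + " sheet(s)");
                }
            }
        }
    }
}
]]></source>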
<p>If you want to extract the textual contents of the embedded file,
open the appropriate POIDocument and pass it to
the extractor class, instead of simply passing the POIFSFileSystem
to the extractor.</p>
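<p>A rough sketch of that approach for an embedded Excel workbook
follows. It assumes that ExcelExtractor accepts an already opened
HSSFWorkbook, and that the name of the embedded directory is known
in advance and passed as the second command line argument. The class
name is hypothetical, and the constructor signatures should be
checked against the POI release in use.</p>
<source><![CDATA[
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.poi.hssf.extractor.ExcelExtractor;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.poifs.filesystem.DirectoryNode;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;

// Hypothetical example class.
// args[0]: path to the master file; args[1]: embedded directory name.
public class ExtractEmbeddedText {
    static String textOf(DirectoryNode dir, POIFSFileSystem fs)
            throws IOException {
        // Open the POIDocument from the subdirectory first...
        HSSFWorkbook wb = new HSSFWorkbook(dir, fs, true);
        // ...then hand the document, not the filesystem, to the extractor.
        ExcelExtractor extractor = new ExcelExtractor(wb);
        return extractor.getText();
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = new FileInputStream(args[0])) {
            POIFSFileSystem fs = new POIFSFileSystem(in);
            // getEntry throws FileNotFoundException if the name is absent.
            DirectoryNode dir = (DirectoryNode) fs.getRoot().getEntry(args[1]);
            System.out.println(textOf(dir, fs));
        }
    }
}
]]></source>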
</section>
</body>
</document>