From 59704a65192f5c0d6eb3511ac77da2fc5001789c Mon Sep 17 00:00:00 2001
From: "Andrew C. Oliver"
+ Maybe it's unwise to advertise your competitors, but we believe
+ competition is good and that we have the best support for reading and
+ writing Excel workbooks currently available.
+
+ This section is intended for diagrams (UML/etc) that help
+ explain HSSF.
+
+ Have more? Add a new "bug" to the bug database with [DOCUMENTATION]
+ prefacing the description and a link to the file on an HTTP server
+ somewhere. If you don't have your own webserver, then you can email it
+ to (acoliver at apache dot org) provided it's smaller than 5MB. Diagrams should be
+ in some format that can be read at least on Linux and Windows. Diagrams
+ that can be edited are preferable, but let's face it, there aren't too
+ many good affordable UML tools yet! And no, they don't HAVE to be UML...
+ just useful.
+
+ This document describes the current state of formula support in POI.
+ The information in this document applies to the 2.0-dev version of POI (i.e. CVS HEAD).
+ Since this area is a work in progress, this document will be updated with new features as and
+ when they are added.
+
+ In org.apache.poi.hssf.usermodel.HSSFCell,
+ setCellFormula("formulaString") is used to add a formula to a sheet and
+ getCellFormula() is used to retrieve the string representation of a formula.
+
+ We aim to support the complete Excel grammar for formulas. Thus, the string that you pass in
+ to the setCellFormula call should be what you would type into Excel. Also, note
+ that you should NOT add a "=" to the front of the string.
+
+ Formulas in Excel are stored as sequences of tokens in Reverse Polish Notation (RPN) order. The
+ OpenOffice.org XLS spec is the best
+ documentation you will find for the format.
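As an illustration of RPN ordering: in the formula 1+2*3 each operator comes after its operands, which gives the token sequence 1 2 3 * +. The sketch below evaluates such a sequence with a stack. It is a hypothetical toy and not how POI evaluates formulas. Tokens are plain strings standing in for real token objects, and the sketch handles only numbers and the four binary arithmetic operators.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Evaluates a formula that has already been tokenized into RPN order.
// Tokens are plain strings here (a hypothetical stand-in for real token objects):
// numeric literals or one of the binary operators + - * /.
class RpnEvaluator {

    static double evaluate(List<String> tokens) {
        Deque<Double> stack = new ArrayDeque<>();
        for (String token : tokens) {
            switch (token) {
                case "+": case "-": case "*": case "/": {
                    // Operands were pushed first, so the right operand is on top.
                    double right = stack.pop();
                    double left = stack.pop();
                    stack.push(apply(token.charAt(0), left, right));
                    break;
                }
                default:
                    // Anything else is treated as a numeric operand.
                    stack.push(Double.parseDouble(token));
            }
        }
        if (stack.size() != 1) {
            throw new IllegalArgumentException("Malformed RPN token list: " + tokens);
        }
        return stack.pop();
    }

    private static double apply(char op, double left, double right) {
        switch (op) {
            case '+': return left + right;
            case '-': return left - right;
            case '*': return left * right;
            default:  return left / right;
        }
    }
}
```

For example, `RpnEvaluator.evaluate(Arrays.asList("1", "2", "3", "*", "+"))` returns 7.0. No parentheses or precedence rules are needed at evaluation time, because the token order already encodes them.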
+
+ The tokens used by Excel are modelled as individual *Ptg classes in the
+ org.apache.poi.hssf.record.formula package.
+
+ The task of parsing a formula string into an array of RPN-ordered tokens is done by the
+ org.apache.poi.hssf.record.formula.FormulaParser class. This class implements a
+ hand-written recursive descent parser.
+ Check out the javadocs for details.
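To show why recursive descent fits naturally with RPN output, here is a toy parser in the same spirit. It is not POI's FormulaParser. It understands only numbers, simple cell references such as A1, the operators + - * /, and parentheses. The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Toy recursive descent parser that turns an infix formula string into RPN tokens.
// Illustrative sketch only (not POI's FormulaParser).
class ToyFormulaParser {
    private final String input;
    private int pos;
    private final List<String> output = new ArrayList<>();

    private ToyFormulaParser(String input) {
        this.input = input;
    }

    static List<String> toRpn(String formula) {
        ToyFormulaParser parser = new ToyFormulaParser(formula);
        parser.expression();
        parser.skipSpaces();
        if (parser.pos != parser.input.length()) {
            throw new IllegalArgumentException("Unexpected input at position " + parser.pos);
        }
        return parser.output;
    }

    // expression := term (('+' | '-') term)*
    private void expression() {
        term();
        while (peek() == '+' || peek() == '-') {
            char op = input.charAt(pos++);
            term();
            output.add(String.valueOf(op)); // operator is emitted after both operands
        }
    }

    // term := factor (('*' | '/') factor)*
    private void term() {
        factor();
        while (peek() == '*' || peek() == '/') {
            char op = input.charAt(pos++);
            factor();
            output.add(String.valueOf(op));
        }
    }

    // factor := number | cellRef | '(' expression ')'
    private void factor() {
        char c = peek();
        if (c == '(') {
            pos++;
            expression();
            if (peek() != ')') {
                throw new IllegalArgumentException("Expected ')' at position " + pos);
            }
            pos++;
        } else if (Character.isLetterOrDigit(c) || c == '.') {
            int start = pos;
            while (pos < input.length()
                    && (Character.isLetterOrDigit(input.charAt(pos)) || input.charAt(pos) == '.')) {
                pos++;
            }
            output.add(input.substring(start, pos)); // operand: number or cell reference
        } else {
            throw new IllegalArgumentException("Unexpected character at position " + pos);
        }
    }

    // Returns the next non-space character, or '\0' at end of input.
    private char peek() {
        skipSpaces();
        return pos < input.length() ? input.charAt(pos) : '\0';
    }

    private void skipSpaces() {
        while (pos < input.length() && input.charAt(pos) == ' ') {
            pos++;
        }
    }
}
```

`ToyFormulaParser.toRpn("(A1 + B2) * 2")` yields `[A1, B2, +, 2, *]`. Each operator is appended only after both of its operand sub-parses return. Because term() is called from inside expression(), * and / bind more tightly than + and -.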
+
+ You might find the
+ 'Excel 97 Developer's Kit' (out of print, Microsoft Press, no
+ restrictive covenants, available on Amazon.com) helpful for
+ understanding the file format.
+
+ Also useful is the OpenOffice.org XLS spec. We
+ are collaborating with the maintainer of the spec, so if you think you can add something to their
+ document, just send through your changes.
+
+ Low level records can be time-consuming to create. We created a record
+ generator to help automate some of the simpler ones.
+
+ We use XML
+ descriptors to generate the Java code (which sure beats the heck out of
+ the PERL scripts originally used ;-) for low level records. The
+ generator is kinda alpha-ish right now and could use some enhancement,
+ so you may find that to be about 1/2 of the work. Notice this is in
+ org.apache.poi.hssf.record.definitions.
+ One thing to note: if you are making a large code contribution, we need to ensure
+ that no participant in this process has ever
+ signed a "Non Disclosure Agreement" with Microsoft or
+ received any information covered by such an agreement. Anyone who has
+ will not be able to participate in the POI project. For large contributions we
+ may ask you to sign an agreement. Check our todo list or simply look for missing functionality. Start small
+ and work your way up. Make sure you read the contributing section,
+ as it contains more general information about contributing to POI.
+
+ This release of the how-to outlines functionality for the CVS HEAD.
+ Those looking for information on previous releases should
+ look in the documentation distributed with that release.
+
+ This release allows numeric and string cell values to be written to
+ or read from an XLS file, as well as reading and writing dates. Also
+ in this release are row and column sizing, cell styling (bold,
+ italics, borders, etc.), and support for built-in data formats. New
+ to this release is an event-based API for reading XLS files.
+ It differs greatly from the read/write API
+ and is intended for intermediate developers who need a smaller
+ memory footprint. It will also serve as the basis for the HSSF
+ Generator.
+
+ The high level API (package: org.apache.poi.hssf.usermodel)
+ is what most people should use. Usage is very simple.
+ Workbooks are created by creating an instance of
+ org.apache.poi.hssf.usermodel.HSSFWorkbook.
+ Sheets are created by calling createSheet() from an existing
+ instance of HSSFWorkbook; the created sheet is automatically added in
+ sequence to the workbook. Sheets do not in themselves have a sheet
+ name (the tab at the bottom); you set
+ the name associated with a sheet by calling
+ HSSFWorkbook.setSheetName(sheetindex,"SheetName",encoding).
+ The name may be in 8-bit format (HSSFWorkbook.ENCODING_COMPRESSED_UNICODE)
+ or Unicode (HSSFWorkbook.ENCODING_UTF_16). The default encoding is 8 bits per character.
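The two sheet-name encodings differ in how many bytes each character takes. This sketch assumes that "compressed unicode" works as it does in BIFF8: the high byte of each UTF-16 code unit is dropped. That only works when every character is at most U+00FF. Any other name has to be stored as UTF-16 (little-endian). The class and method names are hypothetical. HSSF performs this encoding internally, so you do not call anything like this yourself.

```java
import java.nio.charset.StandardCharsets;

// Illustrates the storage cost of the two sheet-name encodings.
// Assumption: "compressed unicode" keeps only the low byte of each UTF-16 code unit.
class SheetNameEncoding {

    // True if every character fits in one byte, so the compressed form is lossless.
    static boolean canCompress(String name) {
        for (int i = 0; i < name.length(); i++) {
            if (name.charAt(i) > 0xFF) {
                return false;
            }
        }
        return true;
    }

    // Compressed form: one byte per character (same bytes as ISO-8859-1 for these chars).
    static byte[] compressed(String name) {
        if (!canCompress(name)) {
            throw new IllegalArgumentException("Name needs UTF-16: " + name);
        }
        return name.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Uncompressed form: two bytes per character, little-endian, no byte-order mark.
    static byte[] utf16(String name) {
        return name.getBytes(StandardCharsets.UTF_16LE);
    }
}
```

"Sheet1" takes 6 bytes in the compressed form and 12 as UTF-16. A name containing CJK characters cannot be compressed at all.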
+ Rows are created by calling createRow(rowNumber) from an existing
+ instance of HSSFSheet. Only rows that have cell values should be
+ added to the sheet. To set the row's height, you just call
+ setRowHeight(height) on the row object. The height must be given in
+ twips, or 1/20th of a point. If you prefer, there is also a
+ setRowHeightInPoints method.
+ Cells are created by calling createCell(column, type) from an
+ existing HSSFRow. Only cells that have values should be added to the
+ row. Cells should have their cell type set to either
+ HSSFCell.CELL_TYPE_NUMERIC or HSSFCell.CELL_TYPE_STRING depending on
+ whether they contain a numeric or textual value. Cells must also have
+ a value set. Set the value by calling setCellValue with either a
+ String or double as a parameter. Individual cells do not have a
+ width; you must call setColumnWidth(colindex, width) (use units of
+ 1/256th of a character) on the HSSFSheet object. (You can't set it on
+ an individual basis in the GUI either.)
+
+ Cells are styled with HSSFCellStyle objects, which in turn contain
+ a reference to an HSSFFont object. These are created via the
+ HSSFWorkbook object by calling createCellStyle() and createFont().
+ Once you create the object you must set its parameters (colors,
+ borders, etc). To set a font for an HSSFCellStyle call
+ setFont(fontobj).
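The two size units used above are easy to mix up: twips (1/20th of a point) for row height, and 1/256th of a character for column width. The helper below converts points to twips and character counts to column-width units. Its names are hypothetical and not part of HSSF. Returning the row height as a short assumes the row-height setter takes a short.

```java
// Hypothetical helpers for the size units used by HSSF row heights and column widths.
class SheetUnits {
    static final int TWIPS_PER_POINT = 20;        // 1 twip = 1/20th of a point
    static final int WIDTH_UNITS_PER_CHAR = 256;  // column width unit = 1/256th of a character

    // Row height: points -> twips (assumes the setter takes a short).
    static short pointsToTwips(float points) {
        int twips = Math.round(points * TWIPS_PER_POINT);
        if (twips < 0 || twips > Short.MAX_VALUE) {
            throw new IllegalArgumentException("Row height out of range: " + points + "pt");
        }
        return (short) twips;
    }

    // Column width: number of characters -> 1/256th-character units.
    static int charsToWidthUnits(double characters) {
        return (int) Math.round(characters * WIDTH_UNITS_PER_CHAR);
    }
}
```

A 12.75-point row is 255 twips, and a column 10 characters wide is 2560 width units.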
+ Once you have generated your workbook, you can write it out by
+ calling write(outputStream) from your instance of HSSFWorkbook, passing
+ it an OutputStream (for instance, a FileOutputStream or
+ ServletOutputStream). You must close the OutputStream yourself. HSSF
+ does not close it for you.
+ Here is some example code (excerpted and adapted from the
+ org.apache.poi.hssf.dev.HSSF test class):
+
+ Reading in a file is equally simple. To read in a file, create a
+new instance of org.apache.poi.poifs.filesystem.POIFSFileSystem, passing in an open InputStream, such as a FileInputStream
+for your XLS, to the constructor. Construct a new instance of
+org.apache.poi.hssf.usermodel.HSSFWorkbook, passing the
+POIFSFileSystem instance to the constructor. From there you have access to
+all of the high level model objects through their accessor methods
+(workbook.getSheet(sheetNum), sheet.getRow(rownum), etc).
+ Modifying the file you have read in is simple. You retrieve the
+object via an accessor method, remove it via a parent object's remove
+method (sheet.removeRow(hssfrow)) and create objects just as you
+would if creating a new xls. When you are done modifying cells, just
+call workbook.write(outputstream) just as you did above. An example of this can be seen in
+org.apache.poi.hssf.dev.HSSF.
+
+ The event API is brand new. It is intended for intermediate
+ developers who are willing to learn a little bit of the low level API
+ structures. It's relatively simple to use, but requires a basic
+ understanding of the parts of an Excel file (or willingness to
+ learn). The advantage provided is that you can read an XLS with a
+ relatively small memory footprint.
+ To use this API you construct an instance of
+ org.apache.poi.hssf.eventmodel.HSSFRequest. Register a class you
+ create that supports the
+ org.apache.poi.hssf.eventmodel.HSSFListener interface using
+ HSSFRequest.addListener(yourlistener, recordsid). The recordsid
+ should be a static reference number (such as BOFRecord.sid) contained
+ in the classes in org.apache.poi.hssf.record. The trick is that you
+ have to know what these records are. Alternatively, you can call
+ HSSFRequest.addListenerForAllRecords(mylistener). In order to learn
+ about these records you can either read all of the javadoc in the
+ org.apache.poi.hssf.record package or you can just hack up a
+ copy of org.apache.poi.hssf.dev.EFHSSF and adapt it to your
+ needs. TODO: better documentation on records.
+
+ Once you've registered your listeners in the HSSFRequest object,
+ you can construct an instance of
+ org.apache.poi.poifs.filesystem.POIFSFileSystem (see POIFS howto) and
+ pass it your XLS file inputstream. You can either pass this, along
+ with the request you constructed, to an instance of HSSFEventFactory
+ via the HSSFEventFactory.processWorkbookEvents(request, filesystem)
+ method, or you can get an instance of DocumentInputStream from
+ POIFSFileSystem.createDocumentInputStream("Workbook") and pass
+ it to HSSFEventFactory.processEvents(request, inputStream). Once you
+ make this call, the listeners that you constructed receive calls to
+ their processRecord(Record) methods with each Record they are
+ registered to listen for until the file has been completely read.
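The registration model described above can be sketched without POI: listeners are keyed by record sid, an "all records" option exists, and each listener receives processRecord calls as records are read. Every type here (Rec, RecordListener, RecordDispatcher) is a hypothetical stand-in for Record, HSSFListener and HSSFRequest, and the sid values used in the test are arbitrary.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// A stand-in for the HSSFRequest registration model: listeners are keyed by
// record sid, and "all records" listeners receive every record.
class RecordDispatcher {

    // Hypothetical minimal record: an id (sid) plus its raw payload.
    static final class Rec {
        final short sid;
        final byte[] data;
        Rec(short sid, byte[] data) { this.sid = sid; this.data = data; }
    }

    // Hypothetical counterpart of HSSFListener.
    interface RecordListener {
        void processRecord(Rec record);
    }

    private final Map<Short, List<RecordListener>> bySid = new HashMap<>();
    private final List<RecordListener> forAll = new ArrayList<>();

    void addListener(RecordListener listener, short sid) {
        bySid.computeIfAbsent(sid, k -> new ArrayList<>()).add(listener);
    }

    void addListenerForAllRecords(RecordListener listener) {
        forAll.add(listener);
    }

    // Feed records in file order; each goes to its sid listeners, then the catch-all ones.
    void process(List<Rec> records) {
        for (Rec record : records) {
            for (RecordListener l : bySid.getOrDefault(record.sid, Collections.<RecordListener>emptyList())) {
                l.processRecord(record);
            }
            for (RecordListener l : forAll) {
                l.processRecord(record);
            }
        }
    }
}
```

Because records are handed to listeners one at a time and never accumulated, memory use does not grow with the file size. This is the property that gives the event API its small footprint.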
+ A code excerpt from org.apache.poi.hssf.dev.EFHSSF (which is
+ in CVS or the source distribution) is reprinted below with excessive
+ comments:
+
+ The low level API is not much to look at. It consists of lots of
+"Records" in the org.apache.poi.hssf.record.* package,
+and a set of helper classes in org.apache.poi.hssf.model.*. The
+record classes are consistent with the low level binary structures
+inside a BIFF8 file (which is embedded in a POIFS file system). You
+probably need the book "Microsoft Excel 97 Developer's Kit"
+from Microsoft Press in order to understand how these fit together
+(out of print but easily obtainable from Amazon's used books). In
+order to gain a good understanding of how to use the low level APIs,
+you should view the source in org.apache.poi.hssf.usermodel.* and
+the classes in org.apache.poi.hssf.model.*. You should read the
+documentation for the POIFS libraries as well.
+
+ The HSSF application is nothing more than a test for the high
+level API (and indirectly the low level support). The main body of
+its code is repeated above. To run it:
+ This should generate a test sheet in your home directory called "myxls.xls".
+
+ POI can dynamically select its logging implementation. POI tries to
+ create a logger using the System property named "org.apache.poi.util.POILogger".
+ Out of the box this can be set to one of three values:
+
+ If the property is not defined or points to an invalid class, then the NullLogger is used.
+
+ Refer to the commons logging package level javadoc for more information concerning how to
+ configure commons logging.
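Here is a minimal sketch of the selection rule described above. It reads a system property, tries to instantiate the named class, and falls back to a no-op logger if the property is missing or the class is unusable. The SimpleLogger interface and its two implementations are hypothetical stand-ins, not POI's POILogger hierarchy. Only the property name comes from the text above.

```java
// Sketch of property-driven logger selection with a no-op fallback.
class LoggerSelection {
    static final String PROPERTY = "org.apache.poi.util.POILogger";

    interface SimpleLogger {
        void log(String message);
    }

    // Fallback used when nothing valid is configured: discards every message.
    static final class NoOpLogger implements SimpleLogger {
        public void log(String message) { }
    }

    // Example implementation whose class name could be placed in the property.
    static final class StdoutLogger implements SimpleLogger {
        public void log(String message) { System.out.println(message); }
    }

    static SimpleLogger create() {
        String className = System.getProperty(PROPERTY);
        if (className == null) {
            return new NoOpLogger();
        }
        try {
            Object candidate = Class.forName(className).getDeclaredConstructor().newInstance();
            if (candidate instanceof SimpleLogger) {
                return (SimpleLogger) candidate;
            }
        } catch (ReflectiveOperationException | LinkageError e) {
            // Unknown class, no usable no-arg constructor, etc.: use the fallback below.
        }
        return new NoOpLogger();
    }
}
```

A class that loads but does not implement the logger interface (for example java.lang.String) also falls back to the no-op logger. This matches the "invalid class" rule.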
+ HSSF has a number of tools useful for developers to debug/develop
+stuff using HSSF (and more generally XLS files). We've already
+discussed the app for testing HSSF read/write/modify capabilities;
+now we'll talk a bit about BiffViewer. Early on in the development of
+HSSF, it was decided that knowing what was in a record, what was
+wrong with it, etc. was virtually impossible with the available
+tools. So we developed BiffViewer. You can find it at
+org.apache.poi.hssf.dev.BiffViewer. It performs two basic
+functions and a derivative.
+ The first is "biffview". To do this you run it (assumes
+you have everything set up in your classpath and that you know what
+you're doing enough to be thinking about this) with an xls file as a
+parameter. It will give you a listing of all understood records with
+their data and a list of not-yet-understood records with no data
+(because it doesn't know how to interpret them). This listing is
+useful for several things. First, you can look at the values and SEE
+what is wrong in quasi-English. Second, you can send the output to a
+file and compare it with the listing for another file.
+ The second function is "big freakin dump", just pass a
+file and a second argument matching "bfd" exactly. This
+will just make a big hexdump of the file.
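A generic hex dump routine shows what the "bfd" output is for: raw bytes with offsets and no record interpretation. The layout below is an assumption, not BiffViewer's exact format. It prints an 8-digit hex offset followed by up to 16 bytes per row. The class name is hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Generic hex dump: an 8-digit hex offset, then up to 16 bytes per row in hex.
class HexDump {

    static String dump(InputStream in) {
        StringBuilder out = new StringBuilder();
        byte[] row = new byte[16];
        long offset = 0;
        try {
            int n;
            while ((n = readFully(in, row)) > 0) {
                out.append(String.format("%08X:", offset));
                for (int i = 0; i < n; i++) {
                    out.append(String.format(" %02X", row[i] & 0xFF));
                }
                out.append('\n');
                offset += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    // Fills buf as far as possible; returns the number of bytes read (0 at end of stream).
    private static int readFully(InputStream in, byte[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            int r = in.read(buf, total, buf.length - total);
            if (r < 0) {
                break;
            }
            total += r;
        }
        return total;
    }
}
```

Dumping the four bytes 09 08 10 00 produces the single line `00000000: 09 08 10 00`.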
+ Lastly, there is "mixed" mode which does the same as
+regular biffview, only it includes hex dumps of certain records
+intertwined. To use that just pass a file with a second argument
+matching "on" exactly.
+
+ In the next release cycle we'll also have something called a
+FormulaViewer. The class is already there, but it's not very useful
+yet. When it does something, we'll document it.
+
+ This release contains code that supports "internationalization"
+or more accurately non-US/UK languages; however, it has not been
+tested with the new API changes (please help us with this). We've
+shifted focus a bit for this release in recognition of the
+international support we've gotten. We're going to focus on western
+European languages for our first beta. We're more than happy to
+accept help in supporting non-Western European languages if someone
+who knows what they're doing in this area is willing to pitch in!
+(There is next to no documentation on what is necessary to support
+such a move and it's really hard to support a language when you don't even
+know the alphabet).
+
+ This release of HSSF does not yet support Formulas. I've been
+focusing on the requests I've gotten in. That being said, if we get
+more user feedback on what is most useful, we'll aim for that first.
+As a general principle, HSSF's goal is to support HSSF-Serializer
+(meaning an emphasis on write). We would like to hear from you! How
+are you using HSSF/POIFS? How would you like to use it? What features
+are most important first?
+ HSSF is the POI Project's pure Java implementation of the Excel '97(-2002) file format. HSSF provides ways to create, modify, read and write XLS spreadsheets.
+ It provides:
+
+ Truth be told, there is probably a better way to handle your spreadsheet
+ generation (yet you'll still be using HSSF indirectly). At the time of
+ this writing we're in the process of moving the HSSF Serializer over to
+ the Apache Cocoon
+ Project. With Cocoon you can serialize any XML datasource (which
+ might be an ESQL page that outputs SQL query results, for instance) by simply
+ applying the stylesheet and designating the serializer.
+
+ If you're merely reading spreadsheet data, then use the eventmodel API
+ in the org.apache.poi.hssf.eventmodel package.
+
+ If you're modifying spreadsheet data, then use the usermodel API. You
+ can also generate spreadsheets this way, but using Cocoon (which will do
+ it this way indirectly) is the best way...we promise.
+
+ The intent of this document is to outline some of the known limitations of the
+ POI HSSF APIs. It is not intended to be a complete list of every bug or missing
+ feature of HSSF; rather, its purpose is to provide a broad feel for some of the
+ functionality that is missing or broken.
+
+ Want to use HSSF to read and write spreadsheets in a hurry? This guide is for you. If you're after
+ more in-depth coverage of the HSSF user-API please consult the HOWTO
+ guide as it contains actual descriptions of how to use this stuff.
+
+ The record generator was born from frustration with translating
+ the Excel records to Java classes. Doing this manually is a
+ time-consuming process. It's also very easy to make mistakes.
+
+ A utility was needed to take the definition of what a
+ record looked like and do all the boring stuff. Thus the
+ record generator was born.
+
+ The record generator takes XML as input and produces the following
+ output:
+
+
+
+
+ Formula One (www.tidestone.com)
+ An alternative to this project is to
+ buy the $10,000 Formula 1 library
+ and accept its crude API and limitations.
+
+ Visual Basic (www.microsoft.com)
+ Give up XML and write Visual Basic code in a Microsoft Windows-based
+ environment, or output in Microsoft's beta and primarily undocumented
+ XML for Office format.
+
+ JExcel (http://stareyes.homeip.net:8888)
+ Frequently unavailable. Little is currently known about its capabilities.
+
+ JWorkbook (http://www.object-refinery.com/jworkbook/index.html)
+ This effort supports Gnumeric and Excel; however, the Excel part is done using POI anyway.
+
+ xlReader (http://www.sourceforge.net/projects/xlrd)
+ Provides decent support for reading Excel.
+
+ Excel ODBC Driver (http://www.nwlink.com/~leewal/content/exceljavasample.htm)
+ ODBC offers a somewhat weird method for using Excel.
+
+ ExtenXLS (http://www.extentech.com/products/ExtenXLS/docs/intro3.jsp)
+ Commercial library for reading, modifying and writing Excel spreadsheets. Not cheap, but
+ certainly a lot more affordable than Formula 1. No idea as to its quality.
+
+ J-Integra Java-Excel Bridge (http://www.intrinsyc.com/products/bridging/jintegra.asp)
+ Uses DCOM to talk to an Excel instance on a Windows machine.
+
+ Perl & C
+ There are a number of Perl and C libraries; however, none of them are consistent.
+
+ VistaJDBC (http://www.vistaportal.com/products/vistajdbc.htm)
+ The VistaJDBC driver works with both StarOffice and Excel spreadsheets and
+ can access data using standard SQL statements without any API programming.
+ VistaJDBC can also select not just by rows and columns but by
+ specific cells, ranges of cells, etc.
+
+ Coldtags Excel Tag Library (http://www.servletsuite.com/servlets/exceltag.htm)
+ This library outputs a simple CSV file, in which cells can
+ contain numbers or text. You could output a CSV file without its
+ help, but it gives a little more readability/structure to the code, and
+ could be extended to handle more complexity. When
+ you invoke one of these JSP pages from your browser, you open up an Excel
+ spreadsheet. There's no formatting, worksheets, or anything fancy like that.
+ So it's not strictly a competitor, but it does the job.
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+export HSSFDIR={wherever you put HSSF's jar files}
+export LOG4JDIR={wherever you put LOG4J's jar files}
+export CLASSPATH=$CLASSPATH:$HSSFDIR/hssf.jar:$HSSFDIR/poi-poifs.jar:$HSSFDIR/poi-util.jar:$LOG4JDIR/log4j.jar
+ java org.apache.poi.hssf.dev.HSSF ~/myxls.xls write
+
+ java org.apache.poi.hssf.dev.HSSF ~/input.xls output.xls
+
+
+This is the read/write/modify test. It reads in the spreadsheet, modifies a cell, and writes it back out.
+Failing this test is not necessarily a bad thing. If HSSF tries to modify a non-existent sheet, then this will
+most likely fail. No big deal.
+
+
+
+
+
+
+ You cannot currently create charts. This is planned for the 2.0 release. You can,
+ however, create a chart in Excel, modify the chart data values using HSSF and write
+ a new spreadsheet out. This is possible because POI attempts to keep existing records
+ intact as far as possible.
+
+ HSSF does not support rich text cells. Rich text cells are
+ cells that have multiple fonts and styles within a single cell. Any attempt to read
+ a spreadsheet that has rich text cells will throw an exception. This feature may
+ be supported in the future but it is not currently planned. Patches are welcome.
+
+ It is not yet possible to create outlines. Reading a spreadsheet with outlines
+ may work correctly but has not been tested. Write support for outlines may
+ be added in the future but it is not currently planned. Patches are welcome.
+
+ Macros cannot be created. There are currently no plans to support macros. Reading
+ workbooks containing macros is supported, but attempting to write those workbooks
+ will fail. This is because macros are stored as extra file systems within the
+ compound document, and these are not currently kept when the file is rewritten.
+
+ Generating pivot tables is not supported. Reading spreadsheets containing pivot tables
+ has not been tested.
+
+
+
+
+
+ The record generator is invoked as an Ant target (generate-records). It goes
+ through looking for all files in src/records/definitions ending with _record.xml.
+ It then creates two files: the Java record definition and the Java test case template.
+ The records themselves have the following general layout:
+ Currently the type can be int, float or string. The 'int'
+ type covers bytes, shorts and integers, which are selected using a
+ size of 1, 2 or 4. An additional type called varword is used to
+ represent an array of word values where the first short is the length
+ of the array. The string type generation is only partially
+ implemented. If choosing string, you must select a size of 'var'.
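To make the field types concrete, here is a sketch of decoding an 'int' field of size 1, 2 or 4 and a 'varword' field from a record's data. It assumes little-endian byte order, as used by BIFF records. It also assumes that the leading short of a varword counts words, not bytes. The class and method names are hypothetical and are not what the generator emits.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical decoder for the generator's 'int' (size 1, 2 or 4) and 'varword' field types.
class FieldDecoder {

    // Wraps raw record bytes with the byte order assumed above.
    static ByteBuffer wrap(byte[] recordData) {
        return ByteBuffer.wrap(recordData).order(ByteOrder.LITTLE_ENDIAN);
    }

    // Reads a signed integer field of the given size (1 = byte, 2 = short, 4 = int).
    static int readInt(ByteBuffer data, int size) {
        switch (size) {
            case 1: return data.get();
            case 2: return data.getShort();
            case 4: return data.getInt();
            default: throw new IllegalArgumentException("Unsupported int size: " + size);
        }
    }

    // Reads a varword: a short word count followed by that many short (word) values.
    static short[] readVarWord(ByteBuffer data) {
        int length = data.getShort() & 0xFFFF; // treat the count as unsigned
        short[] words = new short[length];
        for (int i = 0; i < length; i++) {
            words[i] = data.getShort();
        }
        return words;
    }
}
```

Reading the bytes 05 34 12 02 00 01 00 02 00 in sequence gives the size-1 value 5, the size-2 value 0x1234, and then a varword containing the two words 1 and 2.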
+ The Java records are regenerated each time the record generator is
+ run; however, the test stubs are only created if the test stub does
+ not already exist. What this means is that you may change test
+ stubs but not the generated records.
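The overwrite policy described above comes down to two write modes, sketched here with java.nio.file. Generated records are always overwritten. Test stubs are written only when no file exists yet. The class name, file names and contents are hypothetical. The real generator's paths and templates are not shown in this document.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Sketch of the generator's overwrite policy: records are always rewritten,
// test stubs are written only if they do not exist yet.
class GeneratedFileWriter {

    // Generated record source: replaced on every run, so hand edits are lost.
    static void writeRecord(Path file, String source) {
        write(file, source, StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING);
    }

    // Test stub: created once; returns false and leaves the file alone if it already exists.
    static boolean writeTestStubIfAbsent(Path file, String source) {
        if (Files.exists(file)) {
            return false;
        }
        write(file, source, StandardOpenOption.CREATE_NEW);
        return true;
    }

    private static void write(Path file, String content, StandardOpenOption... options) {
        try {
            Files.write(file, content.getBytes(StandardCharsets.UTF_8), options);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because CREATE_NEW fails if the file appears between the existence check and the write, a hand-edited stub is never silently replaced.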
+ The record generation works by taking an XML file and styling it
+ using XSLT. Given that XSLT is a little limited in some ways, it was
+ necessary to add a little Java code to the mix.
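The XML-to-Java step can be sketched with the JDK's built-in javax.xml.transform API. The descriptor format below is a hypothetical stand-in, because the real layout is not reproduced in this document. It uses a record element with field children that carry name, type and size attributes. The stylesheet is a toy, not record.xsl, and it handles only the 'int' type. It maps sizes 1, 2 and 4 to byte, short and int in XSLT itself rather than through helper Java code.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Sketch: turn a (hypothetical) record descriptor into Java source text with XSLT.
class RecordXslt {

    // Toy stylesheet: one class per record, one field per 'int' field element.
    static final String STYLESHEET =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:template match='/record'>"
        + "public class <xsl:value-of select='@name'/>Record {&#10;"
        + "<xsl:for-each select='field[@type=\"int\"]'>"
        + "    private "
        + "<xsl:choose>"
        + "<xsl:when test='@size = 1'>byte</xsl:when>"
        + "<xsl:when test='@size = 2'>short</xsl:when>"
        + "<xsl:otherwise>int</xsl:otherwise>"
        + "</xsl:choose>"
        + "<xsl:text> </xsl:text><xsl:value-of select='@name'/>;&#10;"
        + "</xsl:for-each>"
        + "}&#10;"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    static String generate(String recordXml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(recordXml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Record generation failed", e);
        }
    }
}
```

Feeding it `<record name='Row'><field name='rowNumber' type='int' size='2'/></record>` produces a `RowRecord` class with a `private short rowNumber;` field. Note the explicit `<xsl:text> </xsl:text>`: a bare space between two XSLT elements would be stripped as whitespace-only text.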
+ See record.xsl, record_test.xsl, FieldIterator.java,
+ RecordUtil.java, RecordGenerator.java.
+ The record generator does not handle all possible record types and
+ is not meant to. Sometimes it's going to make more sense to generate
+ the records manually. The main point of this thing is to make the
+ easy stuff simple.
+ Currently the record generator is optimized to create Excel records.
+ It could be adapted to create Word records with a little poking
+ around.
+ Currently the XSL file that generates the record calls out to
+ Java objects. This would have been better done as JavaScript inside
+ the XSL file itself. The Java code for the record generation is
+ currently quite messy with minimal comments.
+Primary Actor: HSSF client
+Scope: HSSF
+Level: Summary
+Stakeholders and Interests:
+Precondition: None
+Minimal Guarantee: None
+Main Success Guarantee:
+Extensions:
+2a. Exceptions thrown by POIFS will be passed on to the HSSF client.
+
+Primary Actor: HSSF client
+Scope: HSSF
+Level: Summary
+Stakeholders and Interests:
+Precondition:
+Minimal Guarantee: None
+Main Success Guarantee:
+Extensions:
+3a. Exceptions from POIFS are passed to the HSSF client.
+
+Primary Actor: HSSF client
+Scope: HSSF
+Level: Summary
+Stakeholders and Interests:
+Precondition:
+Minimal Guarantee: None
+Main Success Guarantee:
+Extensions: None
+
+Primary Actor: HSSF
+Scope: HSSF
+Level: Summary
+Stakeholders and Interests:
+Precondition:
+Minimal Guarantee: None
+Main Success Guarantee:
+Extensions:
+3a. Exceptions thrown by POIFS will be passed on.
+
+Primary Actor: HSSF
+Scope: HSSF
+Level: Summary
+Stakeholders and Interests:
+Precondition:
+Minimal Guarantee: None
+Main Success Guarantee:
+Extensions: None
+