The New Halloween Document
|
How to use the HSSF prototype API |
Recent revision history |
- 12.30.2001 - revised for poi 1.0-final - minor revisions
- 01.03.2001 - revised for poi 1.1-devel
|
Capabilities |
This release of the how-to outlines functionality included in a
development build of HSSF. Those looking for information on the
release edition should look in the poi-src for the release or at a
previous edition in CVS tagged for that release.
This release allows numeric and string cell values to be written to
or read from an XLS file. Also in this release are row and column
sizing, cell styling (bold, italics, borders, etc.), and support for
built-in data formats. New to this release is an event-based API
for reading XLS files. It differs greatly from the read/write API
and is intended for intermediate developers who need a smaller
memory footprint. It will also serve as the basis for the HSSF
Generator.
|
Target Audience |
This release is intended for developers, Java fanatics and the
generally impatient. HSSF has not yet been extensively tested in
high-load, multi-threaded situations. This release is not considered
"golden": it has new features that have not been extensively
tested, and it is an early 2.0 build that could be restructured
significantly in the future. There are not necessarily plans to do
so, but you're better off basing your code on 1.0 and sticking with
it unless you need the 2.0 features badly enough to deal with us
pulling the rug out from under you regularly.
|
General Use |
User API |
Writing a new one |
The high level API (package: org.apache.poi.hssf.usermodel)
is what most people should use. Usage is very simple.
Workbooks are created by creating an instance of
org.apache.poi.hssf.usermodel.HSSFWorkbook.
Sheets are created by calling createSheet() on an existing
instance of HSSFWorkbook; the created sheet is automatically added in
sequence to the workbook. In this release there will always be at
least three sheets generated, regardless of how many you create.
More than three sheets is probably not supported. Sheets do
not in themselves have a sheet name (the tab at the bottom); you set
the name associated with a sheet by calling
HSSFWorkbook.setSheetName(sheetindex,"SheetName").
Rows are created by calling createRow(rowNumber) on an existing
instance of HSSFSheet. Only rows that have cell values should be
added to the sheet. To set the row's height, call
setHeight(height) on the row object. The height must be given in
twips (1/20th of a point). If you prefer, there is also a
setHeightInPoints method.
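The twips arithmetic is easy to get wrong by a factor of 20, so here is a minimal sketch of the conversion in both directions. The TwipUnits class and its method names are hypothetical helpers written for this how-to; they are not part of HSSF.

```java
// Hypothetical helper, not part of HSSF: converts between points and twips.
class TwipUnits {
    // One point is 20 twips.
    static final int TWIPS_PER_POINT = 20;

    // Points to twips, rounded to the nearest twip.
    static short pointsToTwips(float points) {
        return (short) Math.round(points * TWIPS_PER_POINT);
    }

    // Twips to points.
    static float twipsToPoints(short twips) {
        return twips / (float) TWIPS_PER_POINT;
    }
}
```

With this helper, a 12 point row is 240 twips, and the height 0x249 (585 twips) used in the example code further down works out to 29.25 points.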
Cells are created by calling createCell(column, type) from an
existing HSSFRow. Only cells that have values should be added to the
row. Cells should have their cell type set to either
HSSFCell.CELL_TYPE_NUMERIC or HSSFCell.CELL_TYPE_STRING depending on
whether they contain a numeric or textual value. Cells must also have
a value set. Set the value by calling setCellValue with either a
String or double as a parameter. Individual cells do not have a
width; you must call setColumnWidth(colindex, width) (in units of
1/256th of a character) on the HSSFSheet object. (You can't set the
width of an individual cell in the GUI either.)
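Column widths use a different unit from row heights, so a small sketch of that conversion may help too. ColumnWidthUnits is a hypothetical helper, not an HSSF class; it only illustrates the 1/256th of a character arithmetic described above.

```java
// Hypothetical helper, not part of HSSF: column widths in 1/256ths of a character.
class ColumnWidthUnits {
    static final int UNITS_PER_CHAR = 256;

    // Width in characters to the units expected for a column width.
    static int charsToUnits(int chars) {
        return chars * UNITS_PER_CHAR;
    }

    // Column width units back to (possibly fractional) characters.
    static double unitsToChars(int units) {
        return units / (double) UNITS_PER_CHAR;
    }
}
```

A column 20 characters wide is 5120 units. The example code further down asks for a width of about 8000 units, which is roughly 31 characters.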
Cells are styled with HSSFCellStyle objects which in turn contain
a reference to an HSSFFont object. These are created via the
HSSFWorkbook object by calling createCellStyle() and createFont().
Once you create the object you must set its parameters (colors,
borders, etc). To set a font for an HSSFCellStyle call
setFont(fontobj).
Once you have generated your workbook, you can write it out by
calling write(outputStream) on your instance of HSSFWorkbook, passing
it an OutputStream (for instance, a FileOutputStream or
ServletOutputStream). You must close the OutputStream yourself; HSSF
does not close it for you.
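Since closing the stream is the caller's job, it is worth making sure the close happens even when the write fails. The following is only a sketch of that pattern: WorkbookLike is a hypothetical stand-in for HSSFWorkbook (so the sketch compiles without POI on the classpath), and WriteAndClose is an illustrative name, not an HSSF class.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical stand-in for HSSFWorkbook: anything that can write itself to a stream.
interface WorkbookLike {
    void write(OutputStream out) throws IOException;
}

class WriteAndClose {
    // Writes the workbook, then closes the stream even if write() throws.
    static void save(WorkbookLike wb, OutputStream out) throws IOException {
        try {
            wb.write(out);
        } finally {
            out.close(); // the caller's responsibility: write() leaves the stream open
        }
    }

    // Usage example: save into memory and return the bytes that were written.
    static byte[] saveToBytes(WorkbookLike wb) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try {
            save(wb, buffer);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }
}
```

With a real workbook you would pass a FileOutputStream or ServletOutputStream to the same kind of try/finally block.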
Here is some example code (excerpted and adapted from the
org.apache.poi.hssf.dev.HSSF test class):
// create a new file
FileOutputStream out = new FileOutputStream("/home/me/myfile.xls");
// create a new workbook
HSSFWorkbook wb = new HSSFWorkbook();
// create a new sheet
HSSFSheet s = wb.createSheet();
// declare a row object reference
HSSFRow r = null;
// declare a cell object reference
HSSFCell c = null;
// declare the row counter (it is used again after the loop below)
short rownum;
// create 3 cell styles
HSSFCellStyle cs = wb.createCellStyle();
HSSFCellStyle cs2 = wb.createCellStyle();
HSSFCellStyle cs3 = wb.createCellStyle();
// create 2 font objects
HSSFFont f = wb.createFont();
HSSFFont f2 = wb.createFont();

// set font 1 to 12 point type
f.setFontHeightInPoints((short) 12);
// make it red
f.setColor((short) 0xA);
// make it bold
// arial is the default font
f.setBoldweight(HSSFFont.BOLDWEIGHT_BOLD);

// set font 2 to 10 point type
f2.setFontHeightInPoints((short) 10);
// make it the color at palette index 0xf (white)
f2.setColor((short) 0xf);
// make it bold
f2.setBoldweight(HSSFFont.BOLDWEIGHT_BOLD);

// set cell style
cs.setFont(f);
// set the cell format, see HSSFDataFormat for a full list
cs.setDataFormat(HSSFDataFormat.getFormat("($#,##0_);[Red]($#,##0)"));

// set a thin border
cs2.setBorderBottom(HSSFCellStyle.BORDER_THIN);
// fill with the foreground fill color
cs2.setFillPattern((short) 1);
// set foreground fill to red
cs2.setFillForegroundColor((short) 0xA);
// set the font
cs2.setFont(f2);

// set the sheet name to HSSF Test
wb.setSheetName(0, "HSSF Test");
// create a sheet with 300 rows (0-299)
for (rownum = (short) 0; rownum < 300; rownum++) {
    // create a row
    r = s.createRow(rownum);
    // on every other row
    if ((rownum % 2) == 0) {
        // make the row height bigger (in twips - 1/20 of a point)
        r.setHeight((short) 0x249);
    }

    // create 50 cells (0-49) (the += 2 becomes apparent later)
    for (short cellnum = (short) 0; cellnum < 50; cellnum += 2) {
        // create a numeric cell
        c = r.createCell(cellnum, HSSFCell.CELL_TYPE_NUMERIC);
        // do some goofy math to demonstrate decimals
        c.setCellValue(rownum * 10000 + cellnum
                + (((double) rownum / 1000)
                + ((double) cellnum / 10000)));
        // on every other row
        if ((rownum % 2) == 0) {
            // set this cell to the first cell style we defined
            c.setCellStyle(cs);
        }

        // create a string cell in the next column (this is why the loop uses += 2)
        c = r.createCell((short) (cellnum + 1), HSSFCell.CELL_TYPE_STRING);
        // set the cell's string value to "TEST"
        c.setCellValue("TEST");
        // make this column a bit wider
        s.setColumnWidth((short) (cellnum + 1), (short) ((50 * 8) / ((double) 1 / 20)));
        // on every other row
        if ((rownum % 2) == 0) {
            // set this to the white on red cell style
            // we defined above
            c.setCellStyle(cs2);
        }
    }
}

// draw a thick black border on the row at the bottom using BLANKS
// advance 2 rows
rownum++;
rownum++;
r = s.createRow(rownum);
// define the third style to be the default
// except with a thick black border at the bottom
cs3.setBorderBottom(HSSFCellStyle.BORDER_THICK);
// create 50 cells
for (short cellnum = (short) 0; cellnum < 50; cellnum++) {
    // create a blank type cell (no value)
    c = r.createCell(cellnum, HSSFCell.CELL_TYPE_BLANK);
    // set it to the thick black border style
    c.setCellStyle(cs3);
}
// end draw thick black border

// demonstrate adding/naming and deleting a sheet
// create a sheet, set its title then delete it
s = wb.createSheet();
wb.setSheetName(1, "DeletedSheet");
wb.removeSheetAt(1);
// end deleted sheet

// write the workbook to the output stream
wb.write(out);
// close our file (don't blow out our file handles)
out.close();
|
|
Reading or modifying an existing file |
Reading in a file is equally simple. To read in a file, create a
new instance of org.apache.poi.poifs.filesystem.Filesystem, passing
an open InputStream, such as a FileInputStream for your XLS, to the
constructor. Construct a new instance of
org.apache.poi.hssf.usermodel.HSSFWorkbook, passing the
Filesystem instance to the constructor. From there you have access to
all of the high level model objects through their accessor methods
(workbook.getSheetAt(sheetNum), sheet.getRow(rownum), etc).
Modifying the file you have read in is simple. You retrieve the
object via an accessor method, remove it via a parent object's remove
method (sheet.removeRow(hssfrow)) and create objects just as you
would if creating a new XLS. When you are done modifying cells, just
call workbook.write(outputstream) as you did above.
An example of this can be seen in
org.apache.poi.hssf.dev.HSSF.
|
|
Event API |
The event API is brand new. It is intended for intermediate
developers who are willing to learn a little bit of the low level API
structures. It's relatively simple to use, but requires a basic
understanding of the parts of an Excel file (or willingness to
learn). The advantage provided is that you can read an XLS with a
relatively small memory footprint.
To use this API you construct an instance of
org.apache.poi.hssf.eventmodel.HSSFRequest. Register a class you
create that implements the
org.apache.poi.hssf.eventmodel.HSSFListener interface using the
HSSFRequest.addListener(yourlistener, recordsid) method. The recordsid
should be a static reference number (such as BOFRecord.sid) contained
in the classes in org.apache.poi.hssf.record. The trick is you
have to know what these records are. Alternatively you can call
HSSFRequest.addListenerForAllRecords(mylistener). In order to learn
about these records you can either read all of the javadoc in the
org.apache.poi.hssf.record package or you can just hack up a
copy of org.apache.poi.hssf.dev.EFHSSF and adapt it to your
needs. TODO: better documentation on records.
Once you've registered your listeners in the HSSFRequest object
you can construct an instance of
org.apache.poi.poifs.filesystem.Filesystem (see POIFS howto) and
pass it your XLS file InputStream. You can either pass this, along
with the request you constructed, to an instance of HSSFEventFactory
via the HSSFEventFactory.processWorkbookEvents(request, Filesystem)
method, or you can get an instance of DocumentInputStream from
Filesystem.createDocumentInputStream("Workbook") and pass
it to HSSFEventFactory.processEvents(request, inputStream). Once you
make this call, the listeners that you constructed receive calls to
their processRecord(Record) methods with each Record they are
registered to listen for until the file has been completely read.
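To make the register-then-dispatch idea concrete, here is a minimal sketch of the pattern using only the JDK. It is not HSSF's implementation: MiniRecord, MiniListener, MiniRequest and MiniEventDemo are hypothetical names, and the sid value 0x809 for a BOF record is stated here as an assumption rather than read from BOFRecord.sid.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// All names below are hypothetical; this only illustrates the pattern.
class MiniRecord {
    final short sid;
    MiniRecord(short sid) { this.sid = sid; }
}

interface MiniListener {
    void processRecord(MiniRecord record);
}

class MiniRequest {
    private final Map<Short, List<MiniListener>> bySid = new HashMap<>();
    private final List<MiniListener> forAll = new ArrayList<>();

    // Register a listener for one record type, identified by its sid.
    void addListener(MiniListener listener, short sid) {
        bySid.computeIfAbsent(sid, k -> new ArrayList<>()).add(listener);
    }

    // Register a listener that receives every record.
    void addListenerForAllRecords(MiniListener listener) {
        forAll.add(listener);
    }

    // Hand one record to the listeners registered for its sid, then to the catch-all ones.
    void dispatch(MiniRecord record) {
        List<MiniListener> targeted = bySid.get(record.sid);
        if (targeted != null) {
            for (MiniListener l : targeted) {
                l.processRecord(record);
            }
        }
        for (MiniListener l : forAll) {
            l.processRecord(record);
        }
    }
}

class MiniEventDemo {
    // Feeds the given sids through a request; returns {BOF hits, total hits}.
    static int[] run(short[] sids) {
        final short BOF_SID = 0x809; // assumed sid of a BOF record
        int[] counts = new int[2];
        MiniRequest req = new MiniRequest();
        req.addListener(r -> counts[0]++, BOF_SID);
        req.addListenerForAllRecords(r -> counts[1]++);
        for (short sid : sids) {
            // each record is dispatched and then dropped, never stored
            req.dispatch(new MiniRecord(sid));
        }
        return counts;
    }
}
```

Because each record is handed to the listeners and then discarded, nothing accumulates in memory unless a listener chooses to keep it. That is the property that gives an event-style reader its small footprint.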
A code excerpt from org.apache.poi.hssf.dev.EFHSSF (which is
in CVS or the source distribution) is reprinted below with excessive
comments:
// this non-public class implements the required interface
// we construct it with a reference to its container class...this is cheap but effective
class EFHSSFListener implements HSSFListener {
    EFHSSF efhssf;

    public EFHSSFListener(EFHSSF efhssf) {
        this.efhssf = efhssf;
    }

    // we just use this as an adapter so we pass the record to the method in the container class
    public void processRecord(Record record) {
        efhssf.recordHandler(record);
    }
}

// here is an excerpt of the main line execution code from EFHSSF
public void run() throws IOException {
    // create a new file input stream with the input file specified
    // at the command line
    FileInputStream fin = new FileInputStream(infile);
    // create a new org.apache.poi.poifs.filesystem.Filesystem
    Filesystem poifs = new Filesystem(fin);
    // get the Workbook (excel part) stream in an InputStream
    InputStream din = poifs.createDocumentInputStream("Workbook");
    // construct our HSSFRequest object
    HSSFRequest req = new HSSFRequest();
    // lazy listen for ALL records with the listener shown above
    req.addListenerForAllRecords(new EFHSSFListener(this));
    // create our event factory
    HSSFEventFactory factory = new HSSFEventFactory();
    // process our events based on the document input stream
    factory.processEvents(req, din);
    // once all the events are processed close our file input stream
    fin.close();
    // and our document input stream (don't want to leak these!)
    din.close();
    // create a new output stream from filename specified at the command line
    FileOutputStream fout = new FileOutputStream(outfile);
    // write the HSSFWorkbook (class member) we created out to the file.
    workbook.write(fout);
    // close our file output stream
    fout.close();
    // print done. Go enjoy your copy of the file.
    System.out.println("done.");
}

// here is an excerpt of the recordHandler called from our listener.
// the record handler in the container class is intent on just rewriting the file
public void recordHandler(Record record) {
    HSSFRow row = null;
    HSSFCell cell = null;
    // NOTE: as a local variable, sheetnum starts at -1 on every call, so the
    // BOF case below always selects sheet 0; tracking several sheets would
    // need a class level member instead
    int sheetnum = -1;

    switch (record.getSid()) {
        // the BOFRecord can represent either the beginning of a sheet or the workbook
        case BOFRecord.sid:
            BOFRecord bof = (BOFRecord) record;
            if (bof.getType() == BOFRecord.TYPE_WORKBOOK) {
                // if it's the workbook then create a new HSSFWorkbook
                // assigned to the class level member
                workbook = new HSSFWorkbook();
            } else if (bof.getType() == BOFRecord.TYPE_WORKSHEET) {
                // otherwise if it's a sheet increment the sheetnum index
                sheetnum++;
                // get the sheet at that index and assign it to cursheet
                // (the sheet was created when the BoundSheetRecord record occurred)
                cursheet = workbook.getSheetAt(sheetnum);
            }
            break;

        // when we find a boundsheet record create a new sheet in the workbook and
        // assign it the name specified in this record.
        case BoundSheetRecord.sid:
            BoundSheetRecord bsr = (BoundSheetRecord) record;
            workbook.createSheet(bsr.getSheetname());
            break;

        // if this is a row record add the row to the current sheet
        case RowRecord.sid:
            RowRecord rowrec = (RowRecord) record;
            // assign our row the rownumber specified in the Row Record
            cursheet.createRow(rowrec.getRowNumber());
            break;

        // if this is a NumberRecord (RKRecord, MulRKRecord get converted to Number
        // records) then get the row specified in the number record from the current
        // sheet. With this instance of HSSFRow create a new HSSFCell with the column
        // number specified in the record and assign it type NUMERIC
        case NumberRecord.sid:
            NumberRecord numrec = (NumberRecord) record;
            row = cursheet.getRow(numrec.getRow());
            cell = row.createCell(numrec.getColumn(), HSSFCell.CELL_TYPE_NUMERIC);
            // set the HSSFCell's value to the value stored in the NumberRecord
            cell.setCellValue(numrec.getValue());
            break;

        // if this is the SSTRecord (occurs once in the workbook) then add all of its
        // strings to our workbook. We'll look them up later when we add LABELSST records.
        case SSTRecord.sid:
            SSTRecord sstrec = (SSTRecord) record;
            for (int k = 0; k < sstrec.getNumUniqueStrings(); k++) {
                workbook.addSSTString(sstrec.getString(k));
            }
            break;

        // if this is a LabelSSTRecord then get the row specified in the LabelSSTRecord from
        // the current sheet. With this instance of HSSFRow create a new HSSFCell with the
        // column number specified in the record and set the type to type STRING.
        case LabelSSTRecord.sid:
            LabelSSTRecord lrec = (LabelSSTRecord) record;
            row = cursheet.getRow(lrec.getRow());
            cell = row.createCell(lrec.getColumn(), HSSFCell.CELL_TYPE_STRING);
            // set the cell's value to the string in our workbook object (added in the case
            // above) at the index specified by the LabelSSTRecord.
            cell.setCellValue(workbook.getSSTString(lrec.getSSTIndex()));
            break;
    }
}
|
|
Low Level APIs |
The low level API is not much to look at. It consists of lots of
"Records" in the org.apache.poi.hssf.record.* package,
and a set of helper classes in org.apache.poi.hssf.model.*. The
record classes are consistent with the low level binary structures
inside a BIFF8 file (which is embedded in a POIFS file system). You
probably need the book "Microsoft Excel 97 Developer's Kit"
from Microsoft Press in order to understand how these fit together
(out of print but easily obtainable from Amazon's used books). In
order to gain a good understanding of how to use the low level APIs,
you should view the source in org.apache.poi.hssf.usermodel.* and
the classes in org.apache.poi.hssf.model.*. You should read the
documentation for the POIFS libraries as well.
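To give a feel for what those binary structures look like, here is a rough sketch that walks the record headers in a raw BIFF stream. It assumes the usual BIFF layout, in which every record starts with a 2-byte record id (the sid) followed by a 2-byte body length, both little-endian. BiffHeaderWalker is a hypothetical class for illustration only; it is not part of HSSF and does no validation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not HSSF code: lists the record ids found in a raw BIFF stream.
class BiffHeaderWalker {
    // Reads an unsigned 16-bit little-endian value at the given offset.
    static int readUShortLE(byte[] data, int offset) {
        return (data[offset] & 0xFF) | ((data[offset + 1] & 0xFF) << 8);
    }

    // Returns the sid of each record in order, skipping over the record bodies.
    static List<Integer> listSids(byte[] stream) {
        List<Integer> sids = new ArrayList<>();
        int pos = 0;
        while (pos + 4 <= stream.length) {
            int sid = readUShortLE(stream, pos);        // record id
            int length = readUShortLE(stream, pos + 2); // body length in bytes
            sids.add(sid);
            pos += 4 + length;                          // jump to the next header
        }
        return sids;
    }
}
```

Each class in org.apache.poi.hssf.record corresponds to one such record id and knows how to interpret the body bytes that this sketch skips.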
|
HSSF Class/Test Application |
The HSSF application is nothing more than a test for the high
level API (and indirectly the low level support). The main body of
its code is repeated above. To run it:
- Download the poi-alpha build and untar it (tar xvzf
tarball.tar.gz).
- Set up your classpath as follows:
export HSSFDIR={wherever you put HSSF's jar files}
export LOG4JDIR={wherever you put LOG4J's jar files}
export CLASSPATH=$CLASSPATH:$HSSFDIR/hssf.jar:$HSSFDIR/poi-poifs.jar:$HSSFDIR/poi-util.jar:$LOG4JDIR/log4j.jar
- Type:
java org.apache.poi.hssf.dev.HSSF ~/myxls.xls write
This should generate a test sheet in your home directory called "myxls.xls".
- Type:
java org.apache.poi.hssf.dev.HSSF ~/input.xls output.xls
This is the read/write/modify test. It reads in the spreadsheet, modifies a cell, and writes it back out.
Failing this test is not necessarily a bad thing. If HSSF tries to modify a nonexistent sheet then this will
most likely fail. No big deal.
|
HSSF Logging facility |
HSSF now has a logging facility (using log4j - thanks jakarta!)
that will record massive amounts of debugging information. It's mostly
useful to us hssf-developing geeks, but might be useful in tracking
down problems. By default we turn this off because it results in
unnecessary performance degradation when fully turned on! Using it is
simple. You need an hssflog.properties file (example listed below;
those familiar with log4j can customize this as they wish). You can
either put this in your home directory (or wherever the default
directory is on Windows, which I suspect is c:\windows) or you can put
it wherever you want and set HSSF.log to the path ending in "/"
(or "\\" on Windows). If for any reason HSSF can't find it,
you get no logging. If the log configuration dictates the logging be
turned off, you get no logging.
Here is an example hssflog.properties (actually it's not an example,
it's mine):
# Set root category priority to DEBUG and its only appender to A1.
log4j.rootCategory=DEBUG, A1
# A1 is set to be a ConsoleAppender.
log4j.appender.A1=org.apache.log4j.ConsoleAppender
# A1 uses PatternLayout.
log4j.appender.A1.layout=org.apache.log4j.PatternLayout
log4j.appender.A1.layout.ConversionPattern=%-4r [%t] %-5p %c %x - %m%n
# Uncomment the line below to change the level to WARN and disable debugging information. This effectively turns off logging.
# The default level is DEBUG (so changing it to DEBUG is basically the same thing as leaving it commented out).
#log4j.category.org.apache.poi=WARN
|
|
HSSF Developer's tools |
HSSF has a number of tools useful for developers to debug/develop
stuff using HSSF (and more generally XLS files). We've already
discussed the app for testing HSSF read/write/modify capabilities;
now we'll talk a bit about BiffViewer. Early on in the development of
HSSF, it was decided that knowing what was in a record, what was
wrong with it, etc. was virtually impossible with the available
tools. So we developed BiffViewer. You can find it at
org.apache.poi.hssf.dev.BiffViewer. It performs two basic
functions and a derivative.
The first is "biffview". To do this you run it (assuming
you have everything set up in your classpath and that you know what
you're doing enough to be thinking about this) with an xls file as a
parameter. It will give you a listing of all understood records with
their data and a list of not-yet-understood records with no data
(because it doesn't know how to interpret them). This listing is
useful for several things. First, you can look at the values and SEE
what is wrong in quasi-English. Second, you can send the output to a
file and compare it.
The second function is "big freakin dump". Just pass a
file and a second argument matching "bfd" exactly. This
will just make a big hexdump of the file.
Lastly, there is "mixed" mode which does the same as
regular biffview, only it includes hex dumps of certain records
intertwined. To use that just pass a file with a second argument
matching "on" exactly.
In the next release cycle we'll also have something called a
FormulaViewer. The class is already there, but it's not very useful
yet. When it does something, I'll document it.
|
What's Next? |
This release contains code that supports "internationalization"
or more accurately non-US/UK languages; however, it has not been
tested with the new API changes (please help us with this). We've
shifted focus a bit for this release in recognition of the
international support we've gotten. We're going to focus on western
European languages for our first beta. We're more than happy to
accept help in supporting non-Western European languages if someone
who knows what they're doing in this area is willing to pitch in!
(There is next to no documentation on what is necessary to support
such a move, and it's really hard to support a language when you don't even
know the alphabet).
This release of HSSF does not yet support Formulas. I've been
focusing on the requests I've gotten in. That being said, if we get
more user feedback on what is most useful first we'll aim for that.
As a general principle, HSSF's goal is to support HSSF-Serializer
(meaning an emphasis on write). We would like to hear from you! How
are you using HSSF/POIFS? How would you like to use it? What features
are most important first?
This release is near feature freeze for the 1.0-beta. All
priorities refer to things we'll be adding in the next release
(probably 2.0). The 1.0-beta is scheduled for release in the mid to
late December timeframe. While it's way too early to say when the
2.0-beta will be released, my "gut" feeling is to aim for
around March and have at least the first three items.
Current list of priorities:
- Helper class for fonts, etc.
- Add Formulas.
- Implement more record types (for other things ... not sure
what this will mean yet).
- Add more dummy checks (for when API users do things they
"can't" do)
- Add support for embedded graphics and stuff like that.
- Create new adapter object for handling MulBlank, MulRk, Rk
records.
|
Changes |
1.1.0 |
- Created new event model
- Optimizations made to HSSF including aggregate records for
values, rows, etc.
- Predictive sizing, offset based writing (instead of lots of
array copies)
- Minor re-factoring and bug fixes.
|
1.0.0 |
- Minor documentation updates.
|
0.14.0 |
- Added DataFormat helper class and exposed set and get format
on HSSFCellStyle
- Fixed column width apis (unit wise) and various javadoc on
the subject
- Fix for Dimensions record (again)... (one of these days I'll
write a unit test for this ;-p).
- Some optimization on sheet creation.
|
0.12.0 |
- Added MulBlank, Blank, ColInfo
- Added log4j facility and removed all sys.out type logging
- Added support for adding fonts, styles and corresponding
high level api for styling cells
- Added support for changing row height, cell width and default
row height/cell width.
- Added fixes for internationalization (UTF-16 should work now
from HSSFCell.setStringValue, etc when the encoding is set)
- Added support for adding/removing and naming sheets.
|
0.11.0 |
- Bugfix release. We were throwing an exception when reading
RKRecord objects.
|
0.10.0 |
- Got continuation records to work (read/write)
- Added various pre-support for formulas
- Massive API reorganization, repackaging.
- BiffViewer class added for validating HSSF & POI and/or
HSSF Output.
- Better API support for modification.
|
0.7 (and interim releases) |
- Added encoding flag to high and low level api to use utf-16
when needed (HSSFCell.setEncoding())
- Added read only support for Label records (which are
reinterpreted as LabelSST when written)
- Broken continuation record implementation (oops)
- BiffViewer class added for validating HSSF & POI and/or
HSSF Output.
|
0.6 (release) |
- Support for read/write and modify.
- Read only support for MulRK records (converted to Number when
writing)
|