This document is for developers who wish to contribute to the FormulaEvaluator API functionality.
Currently, contributions are wanted for implementing the standard MS Excel functions. Placeholder classes for these have been created; contributors only need to fill in the individual "evaluate()" methods that do the actual evaluation.
Briefly, a formula string (along with the sheet and workbook that form the context in which the formula is evaluated) is first parsed into RPN tokens using the FormulaParser class in POI-HSSF main. (If you don't know what RPN tokens are, now is a good time to read up on Reverse Polish Notation.)
RPN tokens are mapped to Eval classes. (The class hierarchy for the Evals is best understood if you view the class diagram in a class diagram viewer.) Depending on the type of RPN token (called Ptgs from here on, since that is what the FormulaParser calls the classes), a specific type of Eval wrapper is constructed to wrap the RPN token and is pushed on the stack, UNLESS the Ptg is an OperationPtg. If it is an OperationPtg, an OperationEval instance is created for the specific type of OperationPtg. Depending on how many operands it takes, that many Evals are popped off the stack and passed in an array to the OperationEval instance's evaluate method, which returns an Eval of subtype ValueEval. Thus an operation in the formula is evaluated.
OperationEval.evaluate(Eval[]) returns an Eval which is supposed to be of type ValueEval (since ValueEval is an interface, the return value is actually an instance of one of the implementations of ValueEval). The ValueEval resulting from evaluate() is pushed on the stack and the next RPN token is evaluated. This continues until there are no more RPN tokens, at which point, if the formula string was correctly parsed, there should be just one Eval on the stack, which contains the result of evaluating the formula.
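The push/pop loop described above can be sketched in a few lines. This is a minimal illustration only: the Eval, NumberEval, OperationEval, AddEval and MultiplyEval types below are simplified, hypothetical stand-ins written for this example, not the actual POI classes, and the token list stands in for the parser's Ptg output.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical, simplified stand-ins for this sketch; not the POI classes.
interface Eval {}

final class NumberEval implements Eval {
    final double value;
    NumberEval(double value) { this.value = value; }
}

interface OperationEval {
    int getNumberOfOperands();
    Eval evaluate(Eval[] operands);
}

final class AddEval implements OperationEval {
    public int getNumberOfOperands() { return 2; }
    public Eval evaluate(Eval[] operands) {
        return new NumberEval(((NumberEval) operands[0]).value + ((NumberEval) operands[1]).value);
    }
}

final class MultiplyEval implements OperationEval {
    public int getNumberOfOperands() { return 2; }
    public Eval evaluate(Eval[] operands) {
        return new NumberEval(((NumberEval) operands[0]).value * ((NumberEval) operands[1]).value);
    }
}

final class RpnSketch {
    // Each token is either a value (Eval) or an operation (OperationEval).
    static Eval evaluate(List<Object> tokens) {
        Deque<Eval> stack = new ArrayDeque<>();
        for (Object token : tokens) {
            if (token instanceof OperationEval) {
                OperationEval op = (OperationEval) token;
                Eval[] operands = new Eval[op.getNumberOfOperands()];
                // Pop in reverse so operands keep their left-to-right order.
                for (int i = operands.length - 1; i >= 0; i--) {
                    operands[i] = stack.pop();
                }
                stack.push(op.evaluate(operands));
            } else {
                stack.push((Eval) token);
            }
        }
        if (stack.size() != 1) {
            throw new IllegalStateException("Malformed RPN: " + stack.size() + " values left on the stack");
        }
        return stack.pop();
    }

    public static void main(String[] args) {
        // (1 + 2) * 4 in RPN order: 1 2 + 4 *
        List<Object> tokens = Arrays.<Object>asList(
                new NumberEval(1), new NumberEval(2), new AddEval(),
                new NumberEval(4), new MultiplyEval());
        System.out.println(((NumberEval) evaluate(tokens)).value); // prints 12.0
    }
}
```

The check at the end mirrors the point made above: a correctly parsed formula leaves exactly one Eval on the stack.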
Of course, I glossed over the details of how AreaPtg and ReferencePtg are handled a little differently, but the code should be self-explanatory for that. Very briefly, the cells included in AreaPtg and RefPtg are examined and their values are populated in individual ValueEval objects, which are set into the AreaEval and RefEval (OK, since AreaEval and RefEval are interfaces, into the implementations of AreaEval and RefEval, but you'll figure all that out from the code).
OperationEvals for the standard operators have been implemented and basic testing has been done.
FunctionEval is an abstract superclass of FuncVarEval. The reason for this is that among the FormulaParser Ptg classes there are two function Ptgs, FuncPtg and FuncVarPtg. In my tests, I did not see FuncPtg being used, so there is no corresponding FuncEval right now. But in case the need arises for a FuncEval class, FuncEval and FuncVarEval need to be isolated behind a common interface/abstract class, hence FunctionEval.
FunctionEval also contains the mapping of which function class maps to which function index. This mapping has been done for all the functions, so all you really have to do is implement the evaluate method in a function class that has not already been implemented. The function indexes are defined in the AbstractFunctionPtg class in POI main.
So here is the fun part: let's walk through the implementation of the Excel function AVERAGE().
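As a rough picture of what an index-to-function mapping looks like, here is a minimal sketch. Everything in it is an assumption made for illustration: SketchFunction, Sum, Average and FunctionTableSketch are hypothetical names, functions are reduced to plain doubles, and the index values 4 and 5 are illustrative only. The authoritative indexes are the ones in AbstractFunctionPtg, and the real mapping lives in FunctionEval.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical interface for this sketch; real functions take Eval[] operands.
interface SketchFunction {
    double evaluate(double[] args);
}

final class Sum implements SketchFunction {
    public double evaluate(double[] args) {
        double total = 0;
        for (double a : args) total += a;
        return total;
    }
}

final class Average implements SketchFunction {
    public double evaluate(double[] args) {
        return new Sum().evaluate(args) / args.length;
    }
}

final class FunctionTableSketch {
    private static final Map<Integer, SketchFunction> FUNCTIONS = new HashMap<>();
    static {
        // Illustrative index values; check AbstractFunctionPtg for the real ones.
        FUNCTIONS.put(4, new Sum());
        FUNCTIONS.put(5, new Average());
    }

    // Returns the function registered for an index, or fails loudly if none is.
    static SketchFunction lookup(int index) {
        SketchFunction f = FUNCTIONS.get(index);
        if (f == null) {
            throw new UnsupportedOperationException("Function index " + index + " is not implemented");
        }
        return f;
    }

    public static void main(String[] args) {
        System.out.println(lookup(5).evaluate(new double[] {1, 2, 3})); // prints 2.0
    }
}
```

Failing loudly on an unmapped index is a design choice for the sketch: it makes a missing implementation obvious instead of silently returning a wrong value.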
public Eval evaluate(Eval[] operands) {}
for (int i=0, iSize=operands.length; i<iSize; i++) {...}
if (operands[i] == null) continue;
Strings are ignored. Booleans are ignored too (Oo.o almost misled me here). Here is the detail on Booleans: the formula "=TRUE+1" evaluates to 2, and likewise "=SUM(1,TRUE)" gives 2. So TRUE means 1 in numeric calculations, right? Wrong! When TRUE sits in a referenced cell passed to an arithmetic function, it is treated as blank, that is, it is skipped as if it were a string or an empty cell. For example, "=SUM(1,A1)" evaluates to 1 when A1 is TRUE. So you have to run this kind of check for every possible data type of a function argument before you can say you understand how a function behaves.
Operands can be entered in Excel either comma separated or as a region such as A2:D4. The parser treats a region as a single token, hence AreaEval, which stores the ValueEval of each cell in the region in a 1D array. So in our function, if an operand is an AreaEval, we need to get its array of ValueEvals and iterate over them as if each were an individual operand to the AVERAGE function.
Since Excel sometimes treats Booleans as the numbers 0 and 1 (for FALSE and TRUE respectively), BoolEval and NumberEval both implement a common interface, NumericValueEval. (Since numbers and Booleans are also valid string values, they also implement the StringValueEval interface, which StringEval implements as well.)
The ValueEval inside an AreaEval can be one of: NumberEval, BoolEval, StringEval, ErrorEval, BlankEval. So you must handle each of these cases. Similarly, RefEvals have a property, innerValueEval, that returns the ValueEval at the referenced cell. The ValueEval inside a RefEval can likewise be one of: NumberEval, BoolEval, StringEval, ErrorEval, BlankEval. So you must handle each of these cases too; check how Excel treats each one of them.
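Putting the walkthrough together, an AVERAGE implementation might look roughly like the sketch below. It is not the POI implementation: all the types are simplified, hypothetical stand-ins (for instance, AreaEval and RefEval are plain classes here rather than interfaces, and StringValueEval is left out). The handling rules are assumptions taken from the behaviour described above: errors propagate, directly entered numbers and Booleans count, and strings, blanks and Booleans found in referenced cells are skipped. Returning #DIV/0! when nothing is counted is also an assumption of this sketch. Verify each case against Excel before relying on it.

```java
// Hypothetical, simplified stand-ins for the Eval hierarchy; not the POI classes.
interface Eval {}
interface ValueEval extends Eval {}
interface NumericValueEval extends ValueEval { double getNumberValue(); }

final class NumberEval implements NumericValueEval {
    private final double value;
    NumberEval(double value) { this.value = value; }
    public double getNumberValue() { return value; }
}

final class BoolEval implements NumericValueEval {
    private final boolean value;
    BoolEval(boolean value) { this.value = value; }
    public double getNumberValue() { return value ? 1 : 0; }
}

final class StringEval implements ValueEval {
    final String value;
    StringEval(String value) { this.value = value; }
}

final class BlankEval implements ValueEval {}

final class ErrorEval implements ValueEval {
    static final ErrorEval DIV_ZERO = new ErrorEval("#DIV/0!");
    final String text;
    private ErrorEval(String text) { this.text = text; }
}

final class RefEval implements Eval {
    private final ValueEval inner;
    RefEval(ValueEval inner) { this.inner = inner; }
    ValueEval getInnerValueEval() { return inner; }
}

final class AreaEval implements Eval {
    private final ValueEval[] values;
    AreaEval(ValueEval... values) { this.values = values; }
    ValueEval[] getValues() { return values; }
}

final class AverageSketch {
    public Eval evaluate(Eval[] operands) {
        double[] acc = new double[2]; // acc[0] = running sum, acc[1] = number of values counted
        for (int i = 0, iSize = operands.length; i < iSize; i++) {
            Eval op = operands[i];
            if (op == null) continue;
            if (op instanceof AreaEval) {
                for (ValueEval ve : ((AreaEval) op).getValues()) {
                    ErrorEval err = addReferenced(ve, acc);
                    if (err != null) return err;
                }
            } else if (op instanceof RefEval) {
                ErrorEval err = addReferenced(((RefEval) op).getInnerValueEval(), acc);
                if (err != null) return err;
            } else if (op instanceof ErrorEval) {
                return op; // errors propagate
            } else if (op instanceof NumericValueEval) {
                // Directly entered numbers and Booleans both count.
                acc[0] += ((NumericValueEval) op).getNumberValue();
                acc[1]++;
            }
            // Directly entered strings and blanks are skipped in this sketch.
        }
        if (acc[1] == 0) return ErrorEval.DIV_ZERO;
        return new NumberEval(acc[0] / acc[1]);
    }

    // Values from referenced cells: only numbers count; Booleans, strings and blanks are skipped.
    private static ErrorEval addReferenced(ValueEval ve, double[] acc) {
        if (ve instanceof ErrorEval) return (ErrorEval) ve;
        if (ve instanceof NumberEval) {
            acc[0] += ((NumberEval) ve).getNumberValue();
            acc[1]++;
        }
        return null;
    }

    public static void main(String[] args) {
        AverageSketch avg = new AverageSketch();
        Eval direct = avg.evaluate(new Eval[] { new NumberEval(3), new BoolEval(true) });
        Eval viaRef = avg.evaluate(new Eval[] { new NumberEval(3), new RefEval(new BoolEval(true)) });
        System.out.println(((NumberEval) direct).getNumberValue()); // prints 2.0
        System.out.println(((NumberEval) viaRef).getNumberValue()); // prints 3.0
    }
}
```

The two calls in main show the Boolean asymmetry: TRUE typed directly into the argument list is averaged in as 1, while TRUE reached through a cell reference is ignored.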
The POI formula evaluation code enables you to calculate the result of formulas in Excel sheets read in or created in POI. This document explains how to use the API to evaluate your formulas.
The code currently provides implementations for all the arithmetic operators. It also provides implementations for about 30 built-in functions in Excel. The framework, however, makes it easy to add implementations of new functions. See the Formula evaluation development guide for details.
Note that user-defined functions are not supported, and are not likely to be any time soon... at least, not until there is a VB implementation in Java!
The following code demonstrates how to use the HSSFFormulaEvaluator in the context of other POI Excel reading code.
There are two ways in which you can use the HSSFFormulaEvaluator API.
Thus, using the retrieved value (of type HSSFFormulaEvaluator.CellValue, a nested class) returned by HSSFFormulaEvaluator is similar to using an HSSFCell object containing the value of the formula evaluation. CellValue is a simple value object and does not maintain a reference to the original cell.