Formula Systems

From PCGen Wiki

JEP Formula System

What it is

The formula system is embedded within the "core" of PCGen to do mathematical calculations. It is used internally as well as exposed in a limited fashion to the data developers through tokens. Specifically, both data defined variables and BONUS values depend on the formula calculations performed by this subsystem.

Licensing

The licensing for JEP changed and we have targeted it for replacement. When we originally integrated JEP, it was licensed under a dual commercial/GPL license. We received (and still operate under) a special exception to use JEP with our code. This has the side effect of limiting the use of our code outside of our distribution. Subsequent to our initial use of JEP, the GPL option was dropped, and JEP is now a commercial product. This means we no longer have updates available to us, and we are using a stagnant library.

Performance

We want to improve the performance of formula calculation. Today, each time a formula is processed, we re-parse the formula (which redoes all of the validation and other checks). This is CPU intensive for complex formulas, and should be something we can cache and re-use. We therefore want a system where we can parse the formula early in the process and store the parsed formula, using that "binary" version for evaluation.
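
The parse-once idea can be sketched as follows. This is a minimal illustration only, not PCGen code: the `ParsedFormulaCache` class, the `ParsedFormula` interface, and the toy "integers joined by +" grammar are all hypothetical stand-ins for a real parser. The point is that validation runs the first time a formula string is seen, and later evaluations reuse the stored parsed form.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: parse each formula string once and reuse the parsed form. */
class ParsedFormulaCache {
    /** Stand-in for a parsed formula; in this toy it only sums integer operands. */
    interface ParsedFormula {
        int evaluate();
    }

    private final Map<String, ParsedFormula> cache = new ConcurrentHashMap<>();
    private final AtomicInteger parseCount = new AtomicInteger();

    /** Returns the cached parse result, parsing only on the first request. */
    ParsedFormula get(String formula) {
        return cache.computeIfAbsent(formula, this::parse);
    }

    /** Number of times the (expensive) parse step actually ran. */
    int getParseCount() {
        return parseCount.get();
    }

    /** Toy parser (an assumption for illustration): integers separated by '+'. */
    private ParsedFormula parse(String formula) {
        parseCount.incrementAndGet();
        String[] parts = formula.split("\\+");
        int[] operands = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            // Validation happens here, once, instead of on every evaluation.
            operands[i] = Integer.parseInt(parts[i].trim());
        }
        return () -> {
            int sum = 0;
            for (int operand : operands) {
                sum += operand;
            }
            return sum;
        };
    }

    public static void main(String[] args) {
        ParsedFormulaCache cache = new ParsedFormulaCache();
        System.out.println(cache.get("2+3").evaluate());
        System.out.println(cache.get("2+3").evaluate());
        System.out.println("parses: " + cache.getParseCount());
    }
}
```

In this sketch the second request for "2+3" never reaches the parser, which is the saving the paragraph above is after.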

We want to improve performance around BONUSes. The major performance bottleneck we now have is around the loop of "Variables can be modified by BONUSes, which can be conditional upon Prerequisites, which can use variables". This loop is currently "lookback" in that things are calculated as necessary and a large resolution loop is required to ensure the system reaches stability. We want to shortcut this when possible to reduce looping and overall calculation of values that do not change.

Current Status

We are working to remove JEP from PCGen, for three reasons:

  1. JEP is now a closed source library, so we cannot continue to benefit from future releases/bug fixes
  2. JEP is a license exception in our code, which restricts our license flexibility
  3. JEP has more features than we need (some of which get in our way)

A new set of code (imported as the PCGen-Formula library, stored in a separate GitHub project called PCGen-Formula) is the replacement. It addresses many of the requirements we are working toward, including local variables and validation at LST load (rather than at runtime). By also maintaining an awareness of direct dependencies, it can perform dramatically fewer calculations when any value is changed.
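
To illustrate why tracking direct dependencies reduces work, here is a minimal sketch. It is not the PCGen-Formula implementation: the `DependencySketch` class and the variable names (STR, STRMOD, DAMAGE, DEX, AC) are hypothetical. It assumes an acyclic dependency graph, and it may recalculate a variable more than once when dependencies form a diamond.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.ToIntFunction;

/** Hypothetical sketch: recalculate only the variables that depend on a changed value. */
class DependencySketch {
    private final Map<String, Integer> values = new HashMap<>();
    private final Map<String, ToIntFunction<Map<String, Integer>>> formulas = new HashMap<>();
    private final Map<String, Set<String>> dependents = new HashMap<>();
    private int recalculations = 0;

    /** Sets a raw input value and refreshes whatever reads it. */
    void setInput(String name, int value) {
        values.put(name, value);
        recalculateDependents(name);
    }

    /** Registers a calculated variable together with the variables its formula reads. */
    void define(String name, ToIntFunction<Map<String, Integer>> formula, String... reads) {
        formulas.put(name, formula);
        for (String read : reads) {
            dependents.computeIfAbsent(read, key -> new LinkedHashSet<>()).add(name);
        }
        recalculate(name);
    }

    private void recalculate(String name) {
        recalculations++;
        int newValue = formulas.get(name).applyAsInt(values);
        Integer oldValue = values.put(name, newValue);
        // Stop propagating when the value did not actually change.
        if (oldValue == null || oldValue != newValue) {
            recalculateDependents(name);
        }
    }

    private void recalculateDependents(String name) {
        for (String dependent : dependents.getOrDefault(name, Collections.emptySet())) {
            recalculate(dependent);
        }
    }

    int get(String name) {
        return values.get(name);
    }

    int getRecalculations() {
        return recalculations;
    }

    public static void main(String[] args) {
        DependencySketch sketch = new DependencySketch();
        sketch.setInput("STR", 14);
        sketch.setInput("DEX", 12);
        sketch.define("STRMOD", v -> Math.floorDiv(v.get("STR") - 10, 2), "STR");
        sketch.define("DAMAGE", v -> v.get("STRMOD") + 1, "STRMOD");
        sketch.define("AC", v -> 10 + Math.floorDiv(v.get("DEX") - 10, 2), "DEX");
        int before = sketch.getRecalculations();
        sketch.setInput("DEX", 16); // only AC reads DEX
        System.out.println("AC=" + sketch.get("AC") + ", recalculated "
                + (sketch.getRecalculations() - before) + " variable(s)");
    }
}
```

Changing DEX touches only AC here. Changing STR from 14 to 15 would recalculate STRMOD but leave DAMAGE alone, because STRMOD's value does not change.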

Format

The Format of a variable is the Java "Class" it contains. In JEP, all variables are numbers.

Parsing

Major characteristics of this system:

  • Formulas have functions (delimited by parentheses)
  • Formulas have both built-in terms as well as user defined variables (DEFINE: token)
  • All user defined variables are global in scope
  • All user defined variables are provided a starting value on definition
  • User defined variables are assumed to be zero if no DEFINE is ever encountered
  • User defined variables may be defined in more than one location
  • Some terms may be local in scope (e.g. spells have unique terms), but these are fragile and not thread safe.
  • Diagnostic tools in the UI allow presentation of the current value of a variable (or really any formula)
  • Formulas are NOT parsed for significant levels of validity at data load. They are given basic checks (to ensure parentheses match, for example), but a full validity check is not possible.

Design

Currently, the use of JEP requires a multi-pass resolution. The formula is parsed, then it is processed/queried in order to determine any variables or terms it contains, and then those values are loaded into the formula and the formula is resolved. (See VariableProcessor.processJepFormula().)

This (JEP) method of resolution comes with some challenges and risks.

The parsed version of a JEP formula (a PJEP) ends up with a lot of complexity. It is a controller of sorts, rather than just a framework for a formula.

Also, the parsed version of a JEP formula is stateful. The variable/term values are loaded into it, which places a contract on the programmer to "clean that up" so those values do not pollute other, later calculations. (This is generally not an issue given how it is otherwise used, but may not be initially clear to an uninformed reader.)

How we use JEP

Term

For purposes of this document, a term is a "built in" value that can be used in a formula. These are a subset of the items currently documented in the "Pre-defined Variables" or "System-Defined Variables" section of the docs (depending on how you get to that section).

Examples include (but are certainly not limited to) BAB, BASECR, CASTERLEVEL, CL, SR, and TL.

Note that some terms can possess context. For example, CASTERLEVEL is valid only in SPELL objects, as it requires the "context" of the castable spell (which implicitly includes the class level, DC, # of times usable, etc... it is more than the Spell object defined in the LST file)

Some built-in variables look to the user as if they are functions, and are often treated by users as functions even though they are terms. These include square brackets in the pre-defined variable name (e.g. COUNT[SKILLS]).

Variable

For purposes of this document, a variable is a data-defined value. Using today's data, this means it was defined by the DEFINE: token being encountered in the data. Because formulas would otherwise be ambiguous, variable names are limited in that they must not conflict with a pre-defined term.

Global Variable

Global variables are variables that exist across the entire set of data. A global variable can be defined in one object and used in another. This is currently the case for all variables in PCGen, as they are all created with the DEFINE: token.

Paren Function

A paren function is a function that uses parentheses () to contain the arguments to the function. An example of this is var("CL=Fighter").

"var" is the function, "CL=Fighter" is the (one) argument to the function.

These are called "paren functions" to avoid confusion with the "terms" that contain brackets.

Ambiguity over a value

We want to avoid some situations of ambiguity. The current system for variables takes a "largest wins" approach when multiple definitions are encountered. This leads to potential confusion (and potentially debate over whether such an implicit decision should be allowed). It has also led to the adoption of a "data standard" that all values should be defined at zero and modifications provided as a BONUS:VAR|... Our redesign looks to eliminate the confusion over multiple conflicting definitions by providing a global definition characteristic and preventing otherwise identical variable definitions from having different starting values.

The new Formula parser

Design

The new formula parser is a JavaCC (well, technically, JJTree) defined syntax. It is held separately in the PCGen-Formula library, which is one of our two PCGen Libraries.

Removed Characteristics

Boolean-aware calculation

Boolean and Numeric values will be calculated in their own domain. It will NOT be assumed that TRUE is 1 and FALSE is 0 as some other formula systems assume. A Boolean value here will be a Boolean and only usable where a Boolean is legal.

Therefore, the common operations that are Boolean operations such as AND (&&) and OR (||) will only operate on Boolean values. If a Function requires a Boolean value (such as the first argument to an IF function), then it must validate the semantics of the subformula.

This was based on team feedback as the formula system was originally designed.
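
A minimal sketch of this kind of format checking follows. The class and method names (`FormatCheckSketch`, `bool`, `num`, `and`, `ifNode`) are hypothetical and do not come from the PCGen-Formula library. Each node reports the format it produces, and Boolean operations refuse anything that is not Boolean rather than treating numbers as truth values.

```java
/** Hypothetical sketch: Boolean and Number formats are checked, never converted implicitly. */
class FormatCheckSketch {
    /** A formula node that can report the format (Java class) it produces. */
    interface Node {
        Class<?> format();
    }

    static Node bool(boolean value) {
        return () -> Boolean.class;
    }

    static Node num(Number value) {
        return () -> Number.class;
    }

    /** Logical AND (&&): legal only when both operands are Boolean. */
    static Node and(Node left, Node right) {
        return () -> {
            requireBoolean(left, "left operand of &&");
            requireBoolean(right, "right operand of &&");
            return Boolean.class;
        };
    }

    /** if(condition, thenValue, elseValue): the first argument must be Boolean. */
    static Node ifNode(Node condition, Node thenValue, Node elseValue) {
        return () -> {
            requireBoolean(condition, "first argument of if()");
            // Simplification for this sketch: the result takes the format of the first branch.
            return thenValue.format();
        };
    }

    private static void requireBoolean(Node node, String where) {
        if (!Boolean.class.equals(node.format())) {
            throw new IllegalArgumentException(where + " must be Boolean");
        }
    }

    public static void main(String[] args) {
        Node valid = ifNode(and(bool(true), bool(false)), num(1), num(2));
        System.out.println(valid.format().getSimpleName());
        try {
            and(num(1), bool(true)).format();
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Note that `and(num(1), bool(true))` is rejected: 1 is never read as TRUE.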

Shared Characteristics

Major characteristics of this system that are similar to the existing implementation:

  1. Formula Functions are stored as plugins, so that new functions would not necessarily force a full rebuild of PCGen.

Changed Characteristics

Major characteristics of this system that are major modifications of the existing implementation:

  1. Formats other than numbers are supported.
  2. The starting value for a variable is based on the variable format (e.g. zero for numbers), so multiple defines do not "compete". [New: a variable no longer gets a starting value from each DEFINE: token; instead, a DEFAULTVARIABLEVALUE is set once per game mode for each variable format (numbers being one possible variable format). This is consistent with, and thus formalizes, the data best practice of defining variables to zero.]
  3. Terms are eliminated. They should be replaced by functions or user variables.
  4. BONUS:VAR is eliminated and replaced with Modifiers (see below) [and in general ALL BONUS tokens will eventually disappear].
  5. The new system will not have any terms. All items will be a function or a user defined variable. This will dramatically simplify dependency analysis.
  6. The system does "integer aware" arithmetic, meaning 2+3=5 (not 5.0 or 4.999999999 or 5.000000001 as we might have today).
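
The "integer aware" arithmetic in the last item can be sketched like this. It is an illustration under assumptions, not the library's code: the `IntegerAwareMath` class is hypothetical, and a real implementation would have to cover more operators and number types.

```java
/** Hypothetical sketch of "integer aware" addition. */
class IntegerAwareMath {
    /** Adds two numbers, staying in integer arithmetic when both operands are integers. */
    static Number add(Number left, Number right) {
        if (left instanceof Integer && right instanceof Integer) {
            // Math.addExact throws ArithmeticException on overflow instead of wrapping.
            return Integer.valueOf(Math.addExact(left.intValue(), right.intValue()));
        }
        // Anything else falls back to floating point.
        return Double.valueOf(left.doubleValue() + right.doubleValue());
    }

    public static void main(String[] args) {
        System.out.println(add(2, 3));   // an Integer
        System.out.println(add(2, 0.5)); // a Double
    }
}
```

With two integer operands the result is the Integer 5, not a Double that merely prints close to 5.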

New Capabilities

Major characteristics of this system that are new capabilities:

  1. Diagnostic tools in the UI should provide the ability to expose the entire calculation, from default value through all modifications (and source of those modifications).
    1. It is important to recognize that the data practice of indirection (e.g. one object awards a Template called FACE_20 that sets the FACE:) is STRONGLY discouraged in the new system. That indirection makes it A LOT harder to figure out what modified a variable - something that the new system can make VERY clear, but that the data can hide and make difficult.
  2. Formulas are validated at data load to ensure good syntax, and that any functions that are used exist (and are valid) and variables are nominally defined somewhere in the data.
    1. There are a few rare exceptions to full validation, because they really occur at runtime... so just like we can't automatically detect that a divide by zero will occur, we can't detect that a Table provided to a lookup() function will actually have the necessary values and columns for the incoming variables.
  3. Dependencies are tracked and any circular logic will be identified as such during calculation (it is expected that this is not possible to catch at LST load due to formulas on impossible data combinations leading to false positives). This is done in the specific Solver.
  4. Local Variables (see below)
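
As a rough illustration of how circular logic can be identified during calculation (item 3 above), the sketch below walks dependencies depth-first and fails when it reaches a variable that is still being resolved. The `CycleDetectionSketch` class and the variable names are hypothetical; this is not a description of how the actual Solver does it.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: detect circular dependencies while resolving a variable. */
class CycleDetectionSketch {
    /** Maps each variable to the variables its formula reads. */
    private final Map<String, List<String>> reads = new HashMap<>();

    void define(String variable, String... dependencies) {
        reads.put(variable, Arrays.asList(dependencies));
    }

    /** Resolves a variable; throws IllegalStateException if its dependencies loop. */
    void resolve(String variable) {
        resolve(variable, new ArrayDeque<>(), new HashSet<>());
    }

    private void resolve(String variable, Deque<String> inProgress, Set<String> done) {
        if (done.contains(variable)) {
            return; // already fully resolved on another path
        }
        if (inProgress.contains(variable)) {
            throw new IllegalStateException(
                "Circular dependency reached " + variable + " while resolving " + inProgress);
        }
        inProgress.push(variable);
        for (String dependency : reads.getOrDefault(variable, Collections.emptyList())) {
            resolve(dependency, inProgress, done);
        }
        inProgress.pop();
        done.add(variable);
    }

    public static void main(String[] args) {
        CycleDetectionSketch sketch = new CycleDetectionSketch();
        sketch.define("A", "B");
        sketch.define("B", "C");
        sketch.resolve("A"); // fine: A -> B -> C
        sketch.define("C", "A");
        try {
            sketch.resolve("A");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```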

Local Variables

Local Variables are variables that exist in only a portion of the data. They possess context. They can only be used within that context. For example, a local variable defined on a piece of Equipment could only be used within that piece of equipment. Since Equipment can "own" Equipment Modifiers, EqMods could also modify or evaluate the local variable. However, an attempt to interpret the value of the local variable in a Spell (for example) should produce an error, since there is no known context of Equipment within a spell.

Given that they have (per-item) context to a specific instance (such as "Longsword +1" and "Shortbow +2"), local variables have independent values on each instance (so MyPlusValue on the Longsword could be 1, and MyPlusValue on the Shortbow could be 2).

Like global variables, local variables are on a per-character basis (so local variables are per-character, per-item). A Longsword on one character has a value N and a Longsword on a second character has a value M. N may or may not be equal to M (there is no linkage - it is based on whether the two Longswords have identical EqMods).
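
The scoping rules above can be sketched as a lookup that walks from an owning instance up through its parents. Everything here is hypothetical (the `LocalVariableSketch` and `ScopeInstance` names, and the use of one store object per character); it only illustrates per-instance values and the error when a variable is not visible from the requesting context.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: one store per character, with values keyed by owning instance. */
class LocalVariableSketch {
    /** A specific object instance (e.g. one Longsword), linked to the scope that owns it. */
    static final class ScopeInstance {
        final String description;
        final ScopeInstance parent; // null for the global scope

        ScopeInstance(String description, ScopeInstance parent) {
            this.description = description;
            this.parent = parent;
        }
    }

    private final Map<ScopeInstance, Map<String, Integer>> values = new HashMap<>();

    void set(ScopeInstance owner, String name, int value) {
        values.computeIfAbsent(owner, key -> new HashMap<>()).put(name, value);
    }

    /** Looks in the requesting instance first, then in each enclosing instance. */
    int get(ScopeInstance from, String name) {
        for (ScopeInstance scope = from; scope != null; scope = scope.parent) {
            Map<String, Integer> local = values.get(scope);
            if (local != null && local.containsKey(name)) {
                return local.get(name);
            }
        }
        throw new IllegalArgumentException(name + " is not visible from " + from.description);
    }

    public static void main(String[] args) {
        LocalVariableSketch character = new LocalVariableSketch();
        ScopeInstance global = new ScopeInstance("Global", null);
        ScopeInstance longsword = new ScopeInstance("Longsword +1", global);
        ScopeInstance shortbow = new ScopeInstance("Shortbow +2", global);
        ScopeInstance eqMod = new ScopeInstance("EqMod on Longsword +1", longsword);
        ScopeInstance spell = new ScopeInstance("Some Spell", global);

        character.set(longsword, "MyPlusValue", 1);
        character.set(shortbow, "MyPlusValue", 2);
        System.out.println(character.get(longsword, "MyPlusValue"));
        System.out.println(character.get(shortbow, "MyPlusValue"));
        System.out.println(character.get(eqMod, "MyPlusValue")); // sees its owning Longsword
        try {
            character.get(spell, "MyPlusValue");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

A second character would get its own store, so its Longsword's MyPlusValue would be independent of this one.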

Modifiers

The MODIFY and MODIFYOTHER tokens replace BONUS:VAR. These provide a significantly enhanced experience from using BONUS, specifically:

  1. Specific values calculated by the formula system
  2. No implicit stacking rules. All stacking is explicit by the data owner
  3. No implicit replacement rules. All replacement is explicit by the data owner
  4. Dependency will be explicit and calculation will be carefully managed. (A graph of dependencies between variables is maintained by a SolverManager)

Single Pass Evaluation

Unlike JEP, where a pass is made to determine the variables and then load those into the object, the new formula system is capable of doing an evaluation in a single pass.

Having a single-pass resolution by a visitor to the tree, with the context passed in as a parameter, has a number of functional advantages:

  • This effectively makes a formula "immutable" in the sense that the context is passed in during resolution and thus can be evaluated in multiple threads without causing issues. Note this is a necessary consideration if we want to reuse formulas that are in the data multiple times, as the UI demonstrably triggers evaluation of items from multiple threads. The visitors are also "immutable" (they are in fact, fully stateless), and thus reusable without being concerned about thread safety.
  • This keeps the formula as light-weight as possible (The tree structure is still a bit expensive, but better than a PJEP).
  • The context can be set based on what is passed in, meaning evaluation locality is driven by the caller (knowing the locality) rather than forcing the formula to evaluate some string to figure out where it is.
  • The entire concept of a PJEP pool goes away. In exchange we effectively have a context, but contexts are reusable, so we never have the lock/free necessity of PJEP. (For specificity, those solving contexts are called "DependencyManager", "EvaluationManager", and "FormulaSemantics", and can currently be found in the PCGen-Formula library.)
  • We actually prohibit any funky hoop-jumping. By only providing the context in the sense of a variable library, we can accurately handle things like spell-localized items (e.g. today's CASTERLEVEL term) without having a temporarily set global item.
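
The single-pass approach described above can be sketched as a stateless visitor over an immutable tree, with the context supplied by the caller on every call. The names used here (`SinglePassSketch`, `Node`, `Visitor`, `EVALUATE`) are hypothetical, and a plain Map stands in for the context; the real library uses its own tree and manager classes.

```java
import java.util.Collections;
import java.util.Map;

/** Hypothetical sketch: single-pass evaluation with the context passed in by the caller. */
class SinglePassSketch {
    /** Visitor over the parsed tree; the context is a parameter, never node state. */
    interface Visitor {
        int visitNumber(int value, Map<String, Integer> context);
        int visitVariable(String name, Map<String, Integer> context);
        int visitAdd(Node left, Node right, Map<String, Integer> context);
    }

    /** An immutable node of a parsed formula. */
    interface Node {
        int accept(Visitor visitor, Map<String, Integer> context);
    }

    static Node number(int value) {
        return (visitor, context) -> visitor.visitNumber(value, context);
    }

    static Node variable(String name) {
        return (visitor, context) -> visitor.visitVariable(name, context);
    }

    static Node add(Node left, Node right) {
        return (visitor, context) -> visitor.visitAdd(left, right, context);
    }

    /** Stateless evaluator: safe to share because it holds no fields. */
    static final Visitor EVALUATE = new Visitor() {
        public int visitNumber(int value, Map<String, Integer> context) {
            return value;
        }

        public int visitVariable(String name, Map<String, Integer> context) {
            Integer value = context.get(name);
            if (value == null) {
                throw new IllegalArgumentException("Unknown variable: " + name);
            }
            return value;
        }

        public int visitAdd(Node left, Node right, Map<String, Integer> context) {
            return left.accept(this, context) + right.accept(this, context);
        }
    };

    public static void main(String[] args) {
        // One parsed formula, reused with two different contexts.
        Node formula = add(variable("MyPlusValue"), number(10));
        System.out.println(formula.accept(EVALUATE, Collections.singletonMap("MyPlusValue", 1)));
        System.out.println(formula.accept(EVALUATE, Collections.singletonMap("MyPlusValue", 2)));
    }
}
```

Because neither the tree nor the visitor stores the context, the same formula object can be evaluated for different items, or from different threads, without any cleanup step.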

Further Reading

For more detailed information on setting up the new formula system, see Setting up the new Formula System.