Difference between revisions of "Formula Parser Equip Vars Proposal"
Tom Parker (talk | contribs) (→Requirements) |
Tom Parker (talk | contribs) m (→Required Functions) |
Revision as of 16:44, 10 May 2014
Contents
- 1 Scope of the project
- 2 Definitions
- 3 Existing Implementation
- 4 Reason for development
- 5 Requirements
- 6 Calculating PC variables
- 7 Existing Challenges/Limitations
- 8 Conversion
- 9 Existing Sandbox
- 10 Use Cases
- 11 Major Components
- 12 Key Items for the Data Team to consider
Scope of the project
This discusses the design of a replacement formula/value calculation system within PCGen and its limited use for Equipment Variables.
The formula system is embedded within the "core" of PCGen to do mathematical calculations. It is used internally as well as exposed in a limited fashion to the data developers through tokens. Specifically, both data defined variables and BONUS values depend on the formula calculations performed by this subsystem.
This project discusses the architecture and design of that system with the long-term intent of using it as the formula system for all of PCGen. The immediate project is a proposal to implement Equipment Variables using the system in order to enable testing on a lower-risk basis and learn about overall integration challenges with PCGen. This document therefore balances the full set of requirements while only worrying about implementation of those necessary to do equipment variables.
Note that the changes proposed only cover the formula system. While many mentions are made of the BONUS system, it will remain unchanged at this time. It is discussed in some detail because it heavily relies on the formula system so we can use the requirements of the BONUS system to guide the design of the formula system.
A Note on examples and document scope
Examples included here are based on a hypothetical syntax meant to demonstrate the concepts. This is a code / architecture project scope and is not intended as a data proposal to finalize LST syntax. No guarantee is made that the provided syntax is compatible with current LST files (and thus whether it is even usable in a formal LST syntax proposal).
Definitions
Term
For purposes of this document, a term is a "built in" value that can be used in a formula. These are a subset of the items currently documented in the "Pre-defined Variables" or "System-Defined Variables" section of the docs (depending on how you get to that section)
Examples include (but are certainly not limited to) BAB, BASECR, CASTERLEVEL, CL, SR, and TL.
Note that some terms can possess context. For example, CASTERLEVEL is valid only in SPELL objects, as it requires the "context" of the castable spell (which implicitly includes the class level, DC, # of times usable, etc... it is more than the Spell object defined in the LST file)
Some built-in values look to the user like functions, and are often treated by users as functions, even though the current implementation handles them as terms. These include square brackets in the name (e.g. COUNT[SKILLS]). Due to the presence of the brackets they are NOT terms for purposes of this document. See Bracket Functions below.
Variable
For purposes of this document, a variable is a data-defined value. Using today's data, this means it was defined by the DEFINE: token being encountered in the data. Due to the nature of formulas and the need to avoid ambiguity, there is a limitation on variable names: they must not conflict with a pre-defined term.
Global Variable
Global variables are variables that exist across the entire set of data. A global variable can be defined in one object and used in another. This is currently the case for all variables in PCGen, as they are all created with the DEFINE: token.
Local Variable
Local Variables are variables that exist in only a portion of the data. They possess context. They can only be used within that context. For example, a local variable defined on a piece of Equipment could only be used within that piece of equipment. Since Equipment can "own" Equipment Modifiers, EqMods could also modify or evaluate the local variable. However, an attempt to interpret the value of the local variable in a Spell (for example) should produce an error, since there is no known context of Equipment within a spell.
Also, given that they have context to a specific instance (such as "Longsword +1" and "Shortbow +2"), Local variables have independent values on each instance (so MyPlusValue on the Longsword could be 1, and MyPlusValue on the Shortbow could be 2)
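The per-instance behaviour described above can be sketched in a few lines of Python. This is only an illustration of the concept: the Equipment class and the resolve_local helper are hypothetical names invented here, not PCGen classes, and the real system would resolve variables through the formula system rather than a plain dictionary.

```python
class Equipment:
    """Hypothetical stand-in for one equipment instance (not PCGen's class)."""

    def __init__(self, name):
        self.name = name
        self.local_vars = {}  # each instance owns its own values


def resolve_local(var_name, owner):
    """Look up a local variable; fail loudly when there is no equipment context."""
    if not isinstance(owner, Equipment):
        raise LookupError(f"{var_name} is local to Equipment; no such context here")
    return owner.local_vars[var_name]


longsword = Equipment("Longsword +1")
shortbow = Equipment("Shortbow +2")
longsword.local_vars["MyPlusValue"] = 1
shortbow.local_vars["MyPlusValue"] = 2

print(resolve_local("MyPlusValue", longsword))
print(resolve_local("MyPlusValue", shortbow))
```

Asking for MyPlusValue from anything that is not an Equipment (a Spell, for example) raises an error instead of silently returning zero.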
A Note on Possible usage
While this proposal is specific to Equipment, for completeness of thought and to assist in understanding possible future uses, the only places where local variables make sense are:
- Equipment: since EqMods can be attached
- Classes: Since there are ClassLevels that can be "attached"
- Spells: Since they could be enhanced by things like MetaMagic Feats by the time they become castable spells (eventually forming what the code team calls a CharacterSpell)
Paren Function
A paren function is a function that uses parentheses () to contain the arguments to the function. An example of this is var("CL=Fighter")
"var" is the function, "CL=Fighter" is the (one) argument to the function
Bracket Function
A bracket function is a "built in" value that can be used in a formula. These are a subset of the items currently documented in the "Pre-defined Variables" or "System-Defined Variables" section of the docs (depending on how you get to that section).
Examples include (but are certainly not limited to) COUNT[SKILLS] and COUNT[STATS].
While both use the same infrastructure in PCGen currently, these can be distinguished from terms in that they contain square brackets.
Existing Implementation
The existing implementation is comprised of the following:
Formula Parsing
Formula parsing is performed by JEP (the Java Expression Parser), a 3rd party library.
Major characteristics of this system:
- Formulas have functions (delimited by parentheses)
- We emulate formulas that are delimited by brackets (they are treated as terms)
- Formulas have both built-in terms as well as user defined variables (DEFINE: token)
- All user defined variables are global in scope
- All user defined variables are provided a starting value on definition
- User defined variables are assumed to be zero if no DEFINE is ever encountered
- User defined variables may be defined in more than one location
- Some terms may be local in scope (e.g. spells have unique terms)
- Diagnostic tools in the UI allow presentation of the current value of a variable (or really any formula)
- Formulas are NOT parsed for significant levels of validity at data load. They are given basic checks (to ensure parentheses match, for example), but a full validity check is not possible.
Bonus Processing
BONUS processing is performed by our BONUS management system (specifically BonusManager)
Major characteristics of this system:
- BONUSes have specific values calculated by the formula system
- BONUSes have certain stacking rules based on their type and other flags (.STACK)
- BONUSes allow override of values (.REPLACE)
- BONUSes are used to modify variables (BONUS:VAR|...)
- BONUSes can be conditional (and the condition can be a variable or other item), making BONUS updates highly self-dependent [this is currently done in a loop to ensure BONUS values stabilize]
- The system does not manage loops/conflicts well, in that lack of stabilization has to be terminated based on a number of tries.
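The "loop until stable, give up after a number of tries" behaviour described in the last two items can be illustrated with a small fixed-point loop. This is a sketch of the general technique only, not the BonusManager code; the stabilize function, the rule lambdas and the max_tries default are all assumptions made for the example.

```python
def stabilize(variables, rules, max_tries=10):
    """Recompute every rule until no value changes, or give up after max_tries passes."""
    for attempt in range(1, max_tries + 1):
        # Each rule sees the values from the previous pass only.
        updated = {name: rule(variables) for name, rule in rules.items()}
        merged = {**variables, **updated}
        if merged == variables:
            return variables, attempt
        variables = merged
    raise RuntimeError(f"values did not stabilize after {max_tries} tries")


# B depends on A, and C depends on B, so several passes are needed.
rules = {"B": lambda v: v["A"] + 1, "C": lambda v: v["B"] * 2}
values, passes = stabilize({"A": 1, "B": 0, "C": 0}, rules)
print(values, passes)

# A self-contradicting rule never settles and can only be stopped by the cap.
flip_flop = {"X": lambda v: 1 - v["X"]}
```

The extra passes are spent recalculating values that did not change, and a genuine conflict is only detected by exhausting the retry budget, which is the weakness the list above points out.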
Reason for development
Licensing
The licensing for JEP changed and we have targeted it for replacement. When we originally integrated JEP, it was licensed under a dual commercial/GPL license. We received (and still operate under) a special exception to use JEP with our code. This has the side effect of limiting the use of our code outside of our distribution. Subsequent to our initial use of JEP, the GPL option was dropped, and JEP is now a commercial product. This means we no longer have updates available to us, and we are using a stagnant library.
Performance
We want to improve the performance of formula calculation. Today, each time a formula is processed, we reparse the formula (which redoes all of the validation and other checks). This is CPU intensive for complex formulas, and should be something we can cache and re-use. We therefore want a system where we can parse the formula early in the process and store the parsed formula, using that "binary" version for evaluation. (See functional requirements for more on the binary format)
We want to improve performance around BONUSes. The major performance bottleneck we now have is around the loop of "Variables can be modified by BONUSes, which can be conditional upon Prerequisites, which can use variables". This loop is currently "lookback" in that things are calculated as necessary and a large resolution loop is required to ensure the system reaches stability. We want to shortcut this when possible to reduce looping and overall calculation of values that do not change.
Avoiding Ambiguity
We want to avoid some situations of ambiguity. The current system for variables takes a "largest wins" approach when multiple definitions are encountered. This leads to potential confusion (and potentially debate over whether such an implicit decision should be allowed). It has also led to the adoption of a "data standard" that all values should be defined at zero and modifications provided as a BONUS:VAR|... This redesign looks to eliminate the confusion over multiple conflicting definitions by providing a global definition characteristic and preventing otherwise identical variable definitions from having different starting values.
Reducing Complexity
We want to reduce the amount of confusion and complexity around BONUSes. Currently the conditions of stacking, replacement, and overall calculation of final values are constrained and limit flexibility of data designers. We require multiple BONUSes to calculate a single value (and even then it is not feature complete). Simple rule changes can add huge complexity to data and that shouldn't be necessary.
Reduce d20 Linkage
We want to reduce the tie to d20. Many of our systems make heavy d20-related assumptions (reference our internal terms here, for the most part) and we want to reduce those over time (put more power in the hands of the data and reduce the assumptions in the code).
Requirements
Given that the intent is to eventually replace JEP, the primary design criterion is to minimize the amount of radical change required to existing, well-formed formulas. This guides and constrains much of the design to result in a syntax that is very similar to any general equation parser (which also happens to be similar to JEP), so overall the design here is not terribly unique to PCGen. In fact, any tutorial for building an equation parser covers much of the basic design (leaving out perhaps how functions are defined and parsed).
Well-formed
Well-formed in the case of the larger project of replacing JEP matches the definition often used in LST token development, meaning if things are leveraging a bug to get a correct answer, or have some outlying issue that allows them to work when they shouldn't (such as unbalanced parentheses), then the system makes no effort to allow the not-well-formed formula to work. This includes using some obscure features of JEP that we do not intend to duplicate (I don't believe we actually do this, but we need to recognize that JEP has some rather advanced capability that we do not intend to duplicate). The intent is for 95%+ (likely 99%+ given our experience with LST tokens) of formulas to initially work without modification. To be clear: The requirement when this is swapped in as the primary formula parser is NOT that LST will work unmodified: We know from experience that there are errors in the data and we must be able to allow those to fail at LST load, even though they do not today.
Work without modification
Work without modification in the case of the larger project of replacing JEP does not mean "work without deprecated content". It also does not mean "be converted without data team intervention".
It means the formula will parse and produce the correct answer in the version in which the formula swap is made. It may report as deprecated either because it uses deprecated features (bracket functions being one example) or it may report as deprecated because the new equation parser cannot parse it and it has to fall back to JEP or the older pre-JEP equation parser.
This second situation probably deserves an example. Because we do not have the ability to strictly monitor formulas when they load to determine if they are JEP-legal, it is still possible to have a formula of the form: FooMAXBar
This is the equivalent of max(Foo,Bar)
This formula would never be parsed by the New Equation Parser, and would immediately be reported as deprecated. It would also report as unconvertible by any converter and require data team intervention. Requiring such cases to be automatically converted basically renders any equation parser impossible to implement, due to the quirks and complexity of both JEP and the pre-JEP formula system.
The general rules for conversion will be:
- If the item is correctly parsed by the New Equation Parser, it should be automatically convertible. There may be exceptions that we have not yet identified (as that project is not being fully scoped at this time)
- If the item has to fall back to JEP or the pre-JEP formula system, it will not be automatically convertible.
Required Functions
Major characteristics of this system that are part of the existing implementation:
- Formulas allow user defined variables
- User defined variables may be defined in more than one location
- User defined variables are assumed to be zero if no DEFINE is ever encountered (an alternative is to produce an error - up to the data team to decide)
Major characteristics of this system that are small modifications of the existing implementation:
- Formulas have functions (delimited by parentheses or brackets) [new: bracket functions are now "native" (more below on why)]
- Support global and local variables [new: support for local variables]
Major characteristics of this system that are major modifications of the existing implementation:
- Starting value for a variable is based on the variable type (e.g. zero for numbers), so multiple defines do not "compete" [new: Define does not have a definition per DEFINE: token, but rather once per game mode for a variable type (numbers being a variable type). This is consistent with and thus formalizes the data best practice of defining variables to zero]
- Terms are eliminated. They are replaced by new functions (in most cases - some may be handled as user variables)
- BONUS:VAR is eliminated and replaced with Modifiers (see below)
Major characteristics of this system that are new capabilities:
- Diagnostic tools in the UI should provide the ability to expose the entire calculation, from default value through all modifications (and source of those modifications)
- Formulas are validated at data load to ensure good syntax, and that any functions that are used exist (and are valid) and variables are nominally defined somewhere in the data.
- Dependencies are tracked and any circular logic will be identified as such during calculation (it is expected that this is not possible to catch at LST load due to formulas on impossible data combinations leading to false positives)
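As an illustration of the last item, circular logic in a tracked dependency graph can be found with a depth-first search that remembers the path currently being explored. This is a generic sketch, assuming dependencies are available as a mapping from a variable name to the names it reads; find_cycle and that mapping are hypothetical and do not describe how the Solver actually stores dependencies.

```python
def find_cycle(dependencies):
    """Return one dependency cycle as a list of names, or None if there is none."""
    visiting, done = set(), set()
    path = []

    def visit(node):
        visiting.add(node)
        path.append(node)
        for dep in dependencies.get(node, ()):
            if dep in visiting:
                # dep is already on the current path: that is the loop.
                return path[path.index(dep):] + [dep]
            if dep not in done:
                found = visit(dep)
                if found:
                    return found
        path.pop()
        visiting.discard(node)
        done.add(node)
        return None

    for node in dependencies:
        if node not in done:
            found = visit(node)
            if found:
                return found
    return None


print(find_cycle({"A": ["B"], "B": ["C"], "C": []}))
print(find_cycle({"A": ["B"], "B": ["C"], "C": ["A"]}))
```

Running the check only over dependencies that are actually active during calculation is what avoids the false positives mentioned above for impossible data combinations.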
Modifiers
These are effectively an early framework of the replacement for BONUSes, and will have the following characteristics:
- Specific values calculated by the formula system
- No implicit stacking rules. All stacking is explicit by the data owner
- No implicit replacement rules. All replacement is explicit by the data owner
- Dependency will be explicit and calculation will be carefully managed. A calculation loop should only be necessary in as much as it modifies values used in Prerequisites.
Future requirements
For reference:
- Modifiers can be conditional (and the condition can be a variable or other item)
Note on Intent with respect to bracket functions
Today, bracket functions are effectively built in variables (of rather high complexity).
These should be retired for a few reasons:
- The current system of two things that look like functions (one with parens, one with square brackets) is enormously confusing and often misinterpreted, especially since both use "count"
- The argument style (doesn't use quotes) is inconsistent with parenthesis-based functions, adding yet more confusion due to inconsistent syntax
- The square bracket items often require pattern matching for the current term system to recognize them as a term, and we want the parsing of strings (variables and function names) to be deterministic rather than a pattern match
- We are retiring terms anyway, so these need a form of replacement
At the same time, the presence of the square brackets can allow us to temporarily treat them as "first class functions" rather than a complex term. This:
- Helps break up the code into smaller, more isolated pieces that are specific to one argument rather than dealing with the entire function
- Helps sunset the pattern matching behavior (at least for the parser framework itself - the function may have to do some complex process to work out the arguments)
The intent of bracket functions is to allow them to be easily implemented for backwards compatibility but not to continue to develop new functions as bracket functions. The intent would be to use paren functions for all new formulas (bracket functions "start their life" as deprecated).
Other Functional Requirements
In a previous code team meeting, we discussed the "binary" nature of the formulas that we would use. An existing "formula compiler" (from a separate project) was provided that compiled formulas into bytecode (using ASM). This was deemed "more than necessary" (and potentially confusing), so it was decided that the "binary" implementation would simply be the "tree" of objects returned by the parse. We could then visit the tree to perform the calculation. (I am - unfortunately - unable to find the code team meeting in which this discussion occurred)
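To make the "tree of objects that is visited" idea concrete, here is a minimal Python sketch. The parse step itself is omitted (the tree for 1+INT is built by hand), and the node classes Num, Var and BinOp as well as the two visiting functions are hypothetical names, not the classes in the sandbox.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Num:
    value: int


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class BinOp:
    op: str
    left: object
    right: object


OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}


def evaluate(node, variables):
    """Walk the tree and compute its value from the supplied variable values."""
    if isinstance(node, Num):
        return node.value
    if isinstance(node, Var):
        return variables[node.name]
    if isinstance(node, BinOp):
        return OPS[node.op](evaluate(node.left, variables), evaluate(node.right, variables))
    raise TypeError(f"unknown node: {node!r}")


def dependencies(node):
    """Walk the same tree to collect the variable names it reads."""
    if isinstance(node, Var):
        return {node.name}
    if isinstance(node, BinOp):
        return dependencies(node.left) | dependencies(node.right)
    return set()


tree = BinOp("+", Num(1), Var("INT"))  # stands in for the parsed form of 1+INT
print(evaluate(tree, {"INT": 3}))
print(dependencies(tree))
```

The tree is parsed once and then visited as often as needed, for evaluation, dependency capture or validation, without reparsing the formula text.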
Note that no (current) judgement is made over whether the formulas are parsed at LST load and permanently stored in their parsed state, or parsed at LST load, discarded, and then parsed and cached on first use in a PC. The former is clearly more memory intensive and may be unreasonable. The threshold is set here at a maximum of 5MB for formulas when the RSRD for Players is loaded. If that threshold is exceeded, then load, discard and cache-on-first-use will be the required implementation.
Have a formula factory that can detect situations where a formula like 1+INT is used multiple places in the LST. These should be reduced via a cache to a single Formula object (since a Formula is immutable) to save memory and hopefully reduce LST load time.
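A factory of this kind is essentially an interning cache keyed on the formula text. The sketch below assumes that two formulas are "the same" only when their text is identical; Formula and FormulaFactory are illustrative names, and the parse is replaced by a counter so the saving is visible.

```python
class Formula:
    """Immutable placeholder; a real Formula would hold the parsed tree."""

    def __init__(self, text):
        self._text = text

    @property
    def text(self):
        return self._text


class FormulaFactory:
    def __init__(self):
        self._cache = {}
        self.parse_count = 0  # counts how often the (expensive) parse ran

    def get(self, text):
        formula = self._cache.get(text)
        if formula is None:
            self.parse_count += 1
            formula = Formula(text)
            self._cache[text] = formula
        return formula


factory = FormulaFactory()
first = factory.get("1+INT")
second = factory.get("1+INT")
print(first is second, factory.parse_count)
```

Sharing is only safe because a Formula is immutable; any per-PC state has to live outside the Formula object.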
Make formula functions into plugins, so the Formula system can be extended without modifying the core.
There are no built-in terms. Everything will be a variable. There may need to be a system for defining game-mode-wide Variables (effectively the replacement of terms - more below).
Bonus points if we can have the system do "integer aware" arithmetic, meaning 2+3=5 (not 5.0 or 4.999999999 or 5.000000001 as we might have today)
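One way to read "integer aware" is: stay in integers for as long as every operand is an integer, and only fall back to floating point when the result cannot be represented exactly. The Python sketch below shows that rule for division; the divide helper is hypothetical, and whether inexact division should produce a float, round, or raise an error is left open by this proposal.

```python
def divide(a, b):
    """Return an int when both operands are ints and the division is exact."""
    if isinstance(a, int) and isinstance(b, int) and a % b == 0:
        return a // b
    return a / b


print(2 + 3)             # integer operands give an integer result
print(divide(6, 3))      # exact division stays an integer
print(divide(7, 2))      # inexact division falls back to floating point
print(0.1 + 0.2 == 0.3)  # the kind of floating point surprise being avoided
```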
A prerequisite for "equipment variables" needs to be provided, so something like: PREEQVARxx:a,b ... where xx is "LT", "LTEQ", etc. ... and a, b are the two values.
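A sketch of how such a token could be interpreted follows. Only "LT" and "LTEQ" are named in this document; the other suffixes are assumed here by analogy with the existing PREVARxx tokens. The example also assumes both operands are plain local variable names, whereas a real implementation would presumably evaluate them as formulas.

```python
import operator

# Suffix -> comparison. Only LT and LTEQ appear in the proposal; the rest are assumed.
COMPARATORS = {
    "EQ": operator.eq,
    "NEQ": operator.ne,
    "LT": operator.lt,
    "LTEQ": operator.le,
    "GT": operator.gt,
    "GTEQ": operator.ge,
}
PREFIX = "PREEQVAR"


def parse_preeqvar(token):
    """Split 'PREEQVARxx:a,b' into (comparison function, a, b)."""
    key, _, args = token.partition(":")
    suffix = key[len(PREFIX):]
    if not key.startswith(PREFIX) or suffix not in COMPARATORS:
        raise ValueError(f"not a PREEQVAR token: {token}")
    a, b = args.split(",")  # exactly two operands are expected
    return COMPARATORS[suffix], a, b


def passes(token, local_vars):
    """Evaluate the prerequisite against one equipment item's local variables."""
    compare, a, b = parse_preeqvar(token)
    return compare(local_vars[a], local_vars[b])


item = {"PossessedCharms": 0, "AllowedCharms": 1}
print(passes("PREEQVARLT:PossessedCharms,AllowedCharms", item))
```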
Interaction with existing (JEP) Formula System
There is NO expectation that formulas can be shared across the two systems while we have two parsers. You cannot use a global variable in a formula (JEPFormula) in an equipment variable formula (NEPFormula), or vice versa. The namespaces and calculations are completely separate.
To break this assumption invites a whole ton of complexity that basically breaks any ability to do equipment variables in a simple way that does not impose itself on the Variable->Bonus->Prerequisite->Variable loop.
Note: There may need to be a limited ability to import NEPFormula variables into JEPFormulas during a future BONUS transition. This would be doable with a function (e.g. nepvar("blah")). Due to the ugly nature of what is legal in JEP formulas (and the fact that they cannot be validated during LST load), JEP variables will not be usable in NEPFormulas.
Requirements For discussion
I would propose that we break the Formula Parser off into a separate sub-project. None of it needs to be PCGen-specific (as demonstrated in the current sandbox), and the jjtree/javacc calls required to build .java files make a more complex build cycle that it would be nice to hide from the main trunk (the challenge being that you have to do a build, then actually select the project and cause it to refresh in order for it to correctly compile the java files to the current version - so the compilation of the formula system into a separate JAR file is something I would appreciate). This may have the effect of also breaking a reasonable portion of pcgen.base.* into a separate (and different) subproject, since these are shared dependencies of the core and the formula parser. Since these items are reasonably stable (most have had no more than cosmetic changes in years), the separation and then addition of 2 JARs to our distribution should not be an unreasonable burden on the main build of PCGen.
Discussion
Why no built in terms?
Basically they add complexity. With built-in terms, when a string is encountered in a formula, we have to establish whether it is a built-in term or whether it is a variable. The built-in term would be a plug-in in Java that would then call back into the core in order to produce the answer. Use "BASECR" as an example.
As a term:
a) Formula Visitors all need to distinguish "is it a term" or "is it a variable" - meaning a code check (if statement) and a "term library" has to be added to the FormulaManager.
b) If a term is encountered, it has to jump into that term (subroutine call to external plugin - and the external plugin is something we had to process at boot)
c) The term then calls back into the core to get the answer (so the term had to be passed the PC)
d) During LST load, the term system must also be checked to ensure that a variable name and a term name do not overlap.
Now imagine we have a variable:
a) Formula visitors assume all text is a variable
b) Variables must be able to be defined game-mode-wide (since we still may want the user to be able to type "RACECR" rather than "racecr()" - though that is certainly open for debate)
c) We need to implement a function that can get the base challenge rating (this effectively does "b" and "c" from term but just call it "function" instead)
So the net effect of banning terms is that we simplified the check of variables at LST load and simplified the formula visitors in exchange for - possibly - some more variables and having to define those game-mode-wide. This is actually a good trade, as it decouples us from d20, and makes our calculations (and variables) explicit to the game mode. We also can remove certain variables from game modes that do not need to worry about those items, cleaning up the data and better allowing errors in the data to be caught... so it's effectively a win-win trade to have no built-in terms.
There is one situation where terms could be seen as an advantage: They cannot be modified. But even in this case, I am challenged to find a use case where that advantage is clear. If such a use case is encountered, then adding the ability to "lock" a variable in the game mode (e.g. LOCKVAR:BaseCR) would be possible - this really is a rather trivial change to the VariableIDFactory. Today when a variable scope is asserted, it returns true (that's ok) or false (you've asked for a definition that conflicts with what I've already been told). If we want to have locked variables (things that are "final" and not modifiable in user data) then we end up with wanting to have 3 responses: Legal, Illegal, and Locked. That is a minor change to the VariableIDFactory and is easily supported if such a use case is identified.
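The three-way response can be sketched as follows. This is one possible reading of the paragraph above, not the VariableIDFactory code: the VariableScopes class, its method names, and the choice to answer LOCKED for any assertion against a locked name are assumptions made for illustration.

```python
from enum import Enum


class ScopeCheck(Enum):
    LEGAL = "legal"
    ILLEGAL = "illegal"
    LOCKED = "locked"


class VariableScopes:
    """Hypothetical registry of which scope each variable name belongs to."""

    def __init__(self):
        self._scopes = {}
        self._locked = set()

    def lock(self, name, scope):
        """Game-mode level declaration, e.g. what LOCKVAR:BaseCR might do."""
        self._scopes[name] = scope
        self._locked.add(name)

    def assert_scope(self, name, scope):
        """Called for user data; the first assertion fixes the scope of a name."""
        if name in self._locked:
            return ScopeCheck.LOCKED
        known = self._scopes.setdefault(name, scope)
        return ScopeCheck.LEGAL if known == scope else ScopeCheck.ILLEGAL


scopes = VariableScopes()
scopes.lock("BaseCR", "Global")
print(scopes.assert_scope("Foo", "Global"))
print(scopes.assert_scope("Foo", "Equipment"))
print(scopes.assert_scope("BaseCR", "Global"))
```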
Calculating PC variables
The formula system described above only parses the Formula into a tree. We need another system to take that tree, understand dependencies, and properly calculate values based on those dependencies. We call this subsystem the "Solver" subsystem.
This can solve for all characteristics of a PC. This could be a variable used by the data team or an internal item like the calculation of "Hands" (what could be referred to as a "global characteristic" of a PC).
For any variable or given PC Characteristic, it can be solved through knowledge of:
- An initial value
- A set of modifiers that allow modification of that value
Modification may include add, subtract, multiply, set, or some more complex operation
The initial design centers on equipment variables, but should allow for non-numeric values to be solved.
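The "initial value plus a set of modifiers" model can be sketched as a simple fold. ADD and SET appear as MODIFY operations elsewhere in this document; SUBTRACT and MULTIPLY are assumed names, and applying modifiers in list order is an assumption of this sketch (ordering and priority are a separate concern).

```python
OPERATIONS = {
    "ADD": lambda value, operand: value + operand,
    "SUBTRACT": lambda value, operand: value - operand,
    "MULTIPLY": lambda value, operand: value * operand,
    "SET": lambda value, operand: operand,
}


def solve(initial, modifiers):
    """Apply (operation, operand) pairs to the initial value, in the order given."""
    value = initial
    for operation, operand in modifiers:
        value = OPERATIONS[operation](value, operand)
    return value


# Numeric variable: starts at the type default (zero), then is modified.
print(solve(0, [("SET", 2), ("ADD", 5), ("MULTIPLY", 2)]))

# Nothing in the loop is numeric-specific, so SET also works for other types.
print(solve("", [("SET", "Medium")]))
```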
Existing Challenges/Limitations
We are heavily dependent upon the formula system for appropriate calculation of values, and significant changes to this system are challenging (arguably "high risk")
The dependency calculation on BONUSes is currently based on the entire String of the BONUS, and thus performs some very complicated dependency analysis due to that use of the full String. This makes a transition to a new formula system (that clearly understands dependencies) a prerequisite for conversion of the BONUS system to a more manageable system. This limits our first-pass scope to just the variable system.
Conversion
This proposal adds a new capability (Equipment Variables) and thus has no compatibility issues. Future projects to replace JEP or redo the BONUS system will encounter a number of compatibility issues, including, but probably not limited to:
- Handling .STACK, .REPLACE, TYPE= on BONUSes
- Handling conversion of terms to variables (either local or game-mode-wide)
- Handling the use of Output Tokens (to be removed - they are basically impossible to validate anyway)
- Conditional Modifiers (since BONUS can take PRExxx)
Existing Sandbox
There is a proposed implementation of this subsystem, located in 2 pieces:
The Formula subsystem is located in sandbox/FormulaParser/NoTerm
The Solver subsystem is located in sandbox/FormulaParser/ModifierNoTerm
Use Cases
Global Variables
Note: These use cases are not supported for the current project, but are mentioned in order to complete the full context for the formula system design.
Inspire Duration
Controls the heroics duration for a Bard ("Bardic Inspire Heroics")
Today:
DEFINE:InspireDurationBase|0 (in an Ability)
DEFINE:InspireHeroicsDuration|InspireDurationBase (in an Ability)
BONUS:VAR|InspireDurationBase|5 (on Bard class level 1)
Future:
DEF:INTEGER|InspireDurationBase (in an Ability)
DEF:INTEGER|InspireHeroicsDuration (in an Ability)
MODIFY:InspireHeroicsDuration|SOLVE|InspireDurationBase (in an Ability)
MODIFY:InspireDurationBase|ADD|5 (on Bard class level 1)
Local Variables
Catching Bad Use of a Variable
DEFINE:Foo|0 (in an Ability)
DEFLOCAL:Bar|0 (in Equipment)
MODIFY:MyVar|SOLVE|value()+Foo+Bar (in a Skill)
Should fail at load because Bar is a local variable on Equipment. (The alternative is to claim it always resolves to zero, but what's the point of NOT producing an error in this case?)
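A load-time check of this kind could look like the sketch below. It assumes the loader already knows the scope of every defined name, and it pulls variable names out of the formula text with a regular expression purely for brevity (a real implementation would walk the parsed tree, and this regex would be confused by quoted string arguments). The scope_errors function and the scope labels are hypothetical.

```python
import re

# Identifiers that are NOT followed by "(" are treated as variables; the rest are functions.
VARIABLE = re.compile(r"\b[A-Za-z_]\w*\b(?!\s*\()")


def scope_errors(formula, owner_scope, definitions):
    """Return a list of load errors for variables not usable in owner_scope."""
    errors = []
    for name in VARIABLE.findall(formula):
        scope = definitions.get(name)
        if scope is None:
            errors.append(f"{name} is not defined")
        elif scope not in ("Global", owner_scope):
            errors.append(f"{name} is local to {scope} and cannot be used in {owner_scope}")
    return errors


definitions = {"Foo": "Global", "Bar": "Equipment"}
print(scope_errors("value()+Foo+Bar", "Skill", definitions))
print(scope_errors("value()+Foo+Bar", "Equipment", definitions))
```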
Fantasy Craft Essence/Charm
Rule: "Whether found, seized, crafted, or purchased, every magic item possesses 1 Essence and/or 1 Charm (but no more)."
Today: Not possible (needs Equipment Vars)
Future:
DEFLOCAL:INTEGER|AllowedCharms (on Equipment)
DEFLOCAL:INTEGER|PossessedCharms (on Equipment)
MODIFY:AllowedCharms|ADD|1 (on EqMod that makes an item magical)
MODIFY:PossessedCharms|ADD|1 (on any Charm EqMod)
PREEQVARLT:PossessedCharms,AllowedCharms (on any Charm EqMod)
Note this provides the flexibility to allow the charm limit to be 5 for artifacts.
Barbarian Illiteracy
Today:
DEFINE:IlliteracyLVL|0
BONUS:VAR|IlliteracyLVL|CL
ABILITY:Special Ability|VIRTUAL|Illiteracy|PREVAREQ:TL,IlliteracyLVL
New System:
DEFINE:INTEGER|IlliteracyLVL
MODIFY:IlliteracyLVL|SOLVE|classlevel(this())
ABILITY:Special Ability|VIRTUAL|Illiteracy|PRENEPVAREQ:totallevel(),IlliteracyLVL
Note the special case of this() being allowed in the function classlevel. This has two effects: "this" is a "reserved" function name, just like "value", and it safely allows cloning, et al. to occur since it is resolved at runtime to determine the owning object.
Note the use of this() may have to be limited to certain situations - there are objects that do not properly "trace themselves" through today's formula system and we will need to address how that works in the new system to ensure that the tracing always exists (although the requirement for a scope may resolve that issue entirely)
Proper Order of Operations
Order of operations and variable scope definition
If we have a formula on Longsword that uses:
DEF:INTEGER|MyVar
MODIFY:MyVar|SOLVE|value()+Foo+Bar
Let's assume we are building a dependency tree for MyVar. As a reader, we can see that this formula is dependent on Foo and Bar.
Let's assume we have previously struck a DEFINE of Foo as a Global variable. Therefore we know the "Variable ID" of Foo is "Global:Foo". However, if we are in a piece of equipment, we do not know if Bar is "Global:Bar" or "Equipment('Longsword'):Bar" (or invalid, but let's skip that case since we handled that in an earlier use case)
This is significant if we are building a dependency tree. There are two ways to do it:
- Load "Unknown[possibly Equipment('Longsword')]:Bar" into the dependency tree and once we strike a define of Bar do a replacement in the graph of the "unknown" node. This could be more than one node, since if it is global, then each separate local environment in which we encountered the variable may have inserted an item into the graph. This is potentially confusing if the graph is analyzed to debug, it causes object and memory churn, and in general the additional flexibility provided has little real-world use (at best we can reuse a variable between two scopes, but arguably that makes the data more confusing and should be prohibited anyway!)
- "Know" in advance which of the scopes is potentially valid, and load the correct dependency even before the DEFINE is struck (but it will have a value of zero until the define is struck)
The latter causes less (no) churn in the dependency graph. It is possible IF we know the potential scope of a variable. By tracking at load how variables are defined and allowing a given name to only appear one time, it is unambiguous which "Variable ID" should be loaded in the dependency tree. This then leads to the requirement (discussed further below) that a variable can only be used in one local scope (for ease of validation at load).
The creation of the appropriate VariableID (in this case "Longsword:Bar") is done by the VariableIDFactory - the "sole place" to get VariableIDs created.
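The "know in advance" approach can be sketched roughly as follows. This is a minimal Python illustration, not the actual PCGen (Java) code: the method names `declare` and `get_variable_id`, the scope-kind strings, and the "Scope:Name" string form of an ID are all assumptions made for the example.

```python
class VariableIDFactory:
    """Hypothetical sketch: the single place where variable IDs are created."""

    def __init__(self):
        # variable name -> scope kind it was declared for ("Global", "Equipment", ...)
        self._declared = {}

    def declare(self, scope_kind, name):
        """Record, at load time, which scope a variable name belongs to."""
        existing = self._declared.get(name)
        if existing is not None and existing != scope_kind:
            # A name may appear in only one scope, so later lookups are unambiguous.
            raise ValueError(f"{name} already declared in scope {existing}")
        self._declared[name] = scope_kind

    def get_variable_id(self, name, local_kind=None, local_object=None):
        """Resolve a name used inside an (optional) local object to a variable ID."""
        scope_kind = self._declared.get(name)
        if scope_kind is None:
            raise KeyError(f"{name} is not a declared variable")
        if scope_kind == "Global":
            return f"Global:{name}"
        if scope_kind != local_kind:
            raise ValueError(f"{name} is local to {scope_kind}, not usable here")
        return f"{scope_kind}('{local_object}'):{name}"


factory = VariableIDFactory()
factory.declare("Global", "Foo")
factory.declare("Equipment", "Bar")

# Both names are used by a formula on the Longsword.
foo_id = factory.get_variable_id("Foo", "Equipment", "Longsword")
bar_id = factory.get_variable_id("Bar", "Equipment", "Longsword")
```

Because each name maps to exactly one scope, the dependency graph never needs an "unknown" placeholder node that is replaced later.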
PC Characteristic
Hands
An example of this is "Hands" on a PlayerCharacter (simple Integer)
Today the base # of hands is set by the race, and potentially modified by Templates.
The existing process used to calculate Hands has a significant issue: there is a race condition. In the case of two templates that modify "Hands" on a PC, it is the template that is "applied last" which wins. However, "last" is relative, since the contents of a PC can be constantly re-interpreted internally by PCGen. So there are some (admittedly rather obscure) corner cases where the calculation would not be correct.
We need to have a way to eliminate the race condition.
Today:
HANDS:2 (on Race)
HANDS:4 (on Template1)
HANDS:6 (on Template2)
The PC will have 4 or 6 hands, depending on the order PCGen sees the templates (this is not [well] guaranteed)
New System:
DEFINE:INTEGER|Hands (in Game Mode)
MODIFY:Hands|SET|2 (in Race)
MODIFY:Hands|SET|4|PRIORITY=x (in Template1)
MODIFY:Hands|SET|6|PRIORITY=y (in Template2)
The PC will have 4 or 6 hands, depending on the values of x and y. If x == y then the output is undefined, as in the existing case. Otherwise, the "higher" priority "wins".
Note also that we have freed up templates to do addition of hands rather than a set, so the new system is MUCH more flexible for the data team.
Movement
For items that require multiple modifications, we add more code and infrastructure because of more and more complex mathematical requirements. This lack of flexibility puts pressure on the code team rather than allowing the data team to simply specify the calculation it desires. Movement is a good example here where we have multiple bonuses that add to movement, multiply, add after the multiply, etc. This "tit-for-tat" escalation that requires new token creation should be eliminated.
Today:
MOVE:Walk,20
BONUS:MOVEADD|Walk|10
BONUS:MOVEMULT|Walk|2
BONUS:POSTMOVEADD|Walk|5
Theoretically, this will produce (20+10)*2+5 = 65 as the movement.
New System:
DEFINE:MOVE|Walk
MODIFY:Walk|ADD|20
MODIFY:Walk|ADD|10|PRIORITY=100
MODIFY:Walk|MULTIPLY|2|PRIORITY=200
MODIFY:Walk|ADD|5|PRIORITY=300
This also produces 65 as the movement, and is enormously flexible about being able to do additional calculations on movement without having to get more and more BONUS objects defined in order to get the answer correct. All that needs to be correctly specified is the Priority of the modifications to ensure they occur in the correct order (and other data can insert new items between existing calculations should anything like that be needed - no code required)
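To make the ordering concrete, here is a small Python sketch of applying modifiers in priority order. It is illustrative only: the `solve` function and the tuple layout are hypothetical, a modifier without an explicit PRIORITY is assumed to run at priority 0, and the Hands priorities (10 and 20) are example values standing in for x and y.

```python
def solve(initial, modifiers):
    """Apply (operation, operand, priority) modifiers in ascending priority order."""
    value = initial
    # sorted() is stable, so equal priorities keep their insertion order here;
    # the proposal itself leaves the equal-priority case undefined.
    for operation, operand, _priority in sorted(modifiers, key=lambda m: m[2]):
        if operation == "SET":
            value = operand
        elif operation == "ADD":
            value = value + operand
        elif operation == "MULTIPLY":
            value = value * operand
        else:
            raise ValueError(f"unknown operation {operation}")
    return value


# The Walk modifiers, deliberately supplied out of order:
# ((0 + 20) + 10) * 2 + 5
walk = solve(0, [
    ("MULTIPLY", 2, 200),
    ("ADD", 5, 300),
    ("ADD", 20, 0),
    ("ADD", 10, 100),
])

# Hands: the SET with the higher priority is applied last and therefore wins,
# regardless of the order in which the templates were added.
hands = solve(0, [("SET", 2, 0), ("SET", 6, 20), ("SET", 4, 10)])
```

The result depends only on the priorities, not on the order in which the modifiers were encountered, which is what removes the race condition described for Hands.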
Major Components
Overall Formula
To be stored in pcgen.base.formula
Parser syntax is written in JJTree (.jjt) syntax, compiled to a JavaCC (.jj) file and then to Java. The parser syntax, parser files modified from the default, and dynamic files are stored in pcgen.base.formula.parse
Principles of Design:
- Typical mathematical calculations (+-*/%^, logical operations, etc)
- Allow variables and functions
- Functions can be parenthesis functions, e.g. count(...) or bracket functions, e.g. COUNT[...]. The two are NOT interchangeable.
- Parse the formula into a tree, allow the tree to be walked in order to perform calculations
- Allow the formula tree to be "reconstructed" into the string representation so the data converter can do formula modification (e.g. a function rename)
Maintain the formula as a tree - minimize post-processing after parse.
There is little value in post-processing (the "expensive" part is building the tree).
The Visitor pattern is already well recognized, so we can just have visitors that perform different functions, and that should be reasonable for a developer to follow.
FormulaValidity
Principles of design:
- Let parse errors do their thing (We'll have to consider what we do as far as bad parser diagnostics)
- A formula should be able to return a FormulaValidity object to identify (with specificity) any issues with the formula
- If there is more than one issue, only one issue needs to be returned (fast fail is acceptable)
- Want to be able to be clear to a user why something is not a valid formula (e.g. a variable that is never defined)
- Things to detect: Bad structure (internal errors), invalid # of formula args, function not found, variable not found
Variables
To be stored in pcgen.base.formula.variable
Principles of Design/use:
- There are NO built in terms - we will have "global" variables that can be defined at the game mode level (not attached to an object)
- Variable names must start with a letter, and may have numbers, periods, underscores
- Variable names may have a single equal sign ("=") for backwards compatibility with things like CL=x. This should NOT be part of a data standard and may be deprecated after we make a full conversion to the new equation parser
- A Variable has both a name and a scope (there can be local variables)
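The naming rules above could be checked with something like the following Python sketch. The function name is hypothetical, and restricting "letter" to ASCII letters is an assumption; the proposal does not say whether other alphabets are allowed.

```python
import re

# Starts with a letter; afterwards letters, digits, periods, underscores,
# and (for backwards compatibility) an equals sign are accepted.
_NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9._=]*")


def is_valid_variable_name(name):
    """Return True if the name follows the rules listed above."""
    if _NAME_PATTERN.fullmatch(name) is None:
        return False
    # At most a single "=" is tolerated (e.g. CL=x).
    return name.count("=") <= 1
```

For example, `is_valid_variable_name("CL=Fighter")` is accepted, while `"2Hands"` and `"A=B=C"` are rejected.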
Variable Scope
To be stored in pcgen.base.formula.scope
Principles of Design:
- Support local variables (e.g. a variable solely calculated within a piece of equipment)
- Detect when a variable is properly used in scope (see use cases)
- A Variable name can only be defined in one scope. If "Foo" is used as an Equipment variable, it CANNOT be used as a Spell variable (for example) [this assumes we support multiple scopes of local variables at some point] (see use cases). Enforcement of this is done by "VariableIDFactory".
- Each variable scope needs to understand / contain its parent scope so that the scope tree can be "walked" during variable resolution
- Scope should not be null (there should be a "global scope" object) [This is mainly for "laziness" of not wanting to have null checks all over the place - just enforce up front the != null characteristic]
Discussion
There are a few design characteristics / decisions that should be noted:
Definitions
When a variable is initially defined, we end up with a "variable scope definition". This is effectively a combination of a class and the variable name (e.g. "Equipment.class:Foo"). We can tell this (in the case of PCGen) from the location in which the variable is defined, e.g. if we encounter DEFLOCAL:INTEGER|MyEqVar in an Equipment LST file, we know that the "local" object is the Equipment class.
When a variable is actually initialized, we end up with a "variable scope". This is a combination of an actual instance and the variable name (e.g. "Longsword:Foo"). This is done when an item is instantiated. In the case of Equipment, this means (a) when it is added to the PC or (b) when it is opened in the customizer.
VariableScopeDefinition design reasoning
One possibility would have been to force the VariableScopeDefinition to be a hard class, such as Equipment.class and have that implicitly compared within the VariableIDFactory. This was seen as too restrictive, given that there may be situations where - for example - subclasses may be legal. If local variables are allowed on classes, then we need to allow those to work on a SubClass line as well. So a strict compare does not work, and if the rule is not that obvious, then it's better to externalize the check.
Therefore, we have a VariableScopeDefinition object that has an isCompatible(x) method. This allows us to subclass items yet still have them in a specific scope, since a given implementation of VariableScopeDefinition can understand what is legal (SubClass, SubstitutionClass, etc.). In general, we avoid having the VariableIDFactory make an assumption about what is or is not legal for a given scope.
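As a rough illustration of externalizing the check, consider the Python sketch below. Only the idea of an isCompatible-style method comes from the text; the two definition classes, the "kind" strings, the `make_variable_id` helper, and the example object names are all hypothetical.

```python
class ClassScopeDefinition:
    """Hypothetical scope definition for variables local to classes."""

    # Kinds of object this scope accepts; a strict type comparison would
    # have rejected the SubClass and SubstitutionClass cases.
    LEGAL_KINDS = {"Class", "SubClass", "SubstitutionClass"}

    def is_compatible(self, object_kind):
        return object_kind in self.LEGAL_KINDS


class EquipmentScopeDefinition:
    """Hypothetical scope definition for variables local to equipment."""

    def is_compatible(self, object_kind):
        return object_kind == "Equipment"


def make_variable_id(scope_definition, object_kind, object_name, var_name):
    """Factory-style helper: the legality decision is delegated, not assumed."""
    if not scope_definition.is_compatible(object_kind):
        raise ValueError(f"{var_name} is not legal on a {object_kind}")
    return f"{object_name}:{var_name}"


subclass_id = make_variable_id(ClassScopeDefinition(), "SubClass", "Evoker", "Foo")
```

The helper never inspects the object kind itself, so a new rule about what is legal only requires a new scope definition.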
VariableIDFactory design reasoning
Why is VariableIDFactory necessary? - Why a factory?
Specifically, we want to control where VariableIDs are manufactured to ensure we are always manufacturing VariableIDs that are compatible with the given scope for a variable.
Since that needs to be enforced against where a variable is legal, the VariableID construction needs to be closely associated with the VariableScopeDefinition object. We therefore contain construction inside the factory, and the factory also holds the VariableScopeDefinition objects, so a VariableID cannot be constructed if it does not meet the VariableScopeDefinition.
Function
Located in pcgen.base.formula.function (as well as - in the future - plugins)
Principles of design:
- Want to be able to have functions pluggable (common interface, ability to pull name from the instance)
- Bracket functions support one and only one argument.
- Paren functions support zero or more arguments (based on the function). A function may support a variable number of arguments... it is up to that function to declare if it is valid [or not] for the given arguments.
Notes:
- Bracket functions are designed for backwards compatibility only and are discouraged. Use should fall outside of a data standard
- Bracket functions may be deprecated in the future, subject (of course) to replacement by other means
Formula requirements that "fall onto" functions:
- Want to be able to tell if a formula is static (can cache the answer)
- Want to be able to tell if a formula is valid at LST load (find invalid variables/functions, find incorrect or invalid function arguments)
- Want to be able to get a list of variables from a formula so we can establish dependencies
- Ensure proper validation of local/global variables and catching scope problems at LST load (see use cases)
- Help avoid recalculations of unnecessary things (if desired)
Formula Manager
Located in pcgen.base.formula.manager
The design pattern is Composite.
It exists to simplify those things that require context to be resolved (legal functions, variables (which pulls in scope)).
It is also done to "cache" the visitors (since each visitor needs to know some of the contents in the FormulaManager, they can be lazily instantiated but then effectively cached as long as that FormulaManager is reused - especially valuable for things like the global context, which in the future we can create once for the PC and never have to recreate...)
Visitors
Located in pcgen.base.formula.visitor
We have a series of visitors that can perform various functions on the parse tree. This includes, but is not limited to:
- Reconstructing the string (so formulas can be deeply processed by a converter)
- Evaluating the Formula
- Determining if a Formula is static
- Determining if a Formula is valid
- Dumping the formula to standard out
- Capturing all dependencies of a Formula.
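A compact Python sketch of several visitors sharing one parse tree is shown below. It is not the JJTree-generated code: the node classes (`Num`, `Var`, `BinOp`), the visitor method names, and the `is_static` helper are assumptions for illustration, and only integer arithmetic without functions is covered.

```python
import operator
from dataclasses import dataclass

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


@dataclass
class Num:
    value: int

    def accept(self, visitor):
        return visitor.visit_num(self)


@dataclass
class Var:
    name: str

    def accept(self, visitor):
        return visitor.visit_var(self)


@dataclass
class BinOp:
    op: str
    left: object
    right: object

    def accept(self, visitor):
        return visitor.visit_binop(self)


class ReconstructVisitor:
    """Rebuilds a (fully parenthesized) string from the tree."""

    def visit_num(self, node):
        return str(node.value)

    def visit_var(self, node):
        return node.name

    def visit_binop(self, node):
        return f"({node.left.accept(self)}{node.op}{node.right.accept(self)})"


class EvaluateVisitor:
    """Computes the value of the tree given current variable values."""

    def __init__(self, variables):
        self.variables = variables

    def visit_num(self, node):
        return node.value

    def visit_var(self, node):
        return self.variables[node.name]

    def visit_binop(self, node):
        return OPS[node.op](node.left.accept(self), node.right.accept(self))


class DependencyVisitor:
    """Collects the names of all variables the formula reads."""

    def visit_num(self, node):
        return set()

    def visit_var(self, node):
        return {node.name}

    def visit_binop(self, node):
        return node.left.accept(self) | node.right.accept(self)


def is_static(tree):
    # In this sketch a formula is static when it reads no variables.
    return not tree.accept(DependencyVisitor())


# Tree for Hands*5+Foo
tree = BinOp("+", BinOp("*", Var("Hands"), Num(5)), Var("Foo"))
text = tree.accept(ReconstructVisitor())
value = tree.accept(EvaluateVisitor({"Hands": 2, "Foo": 1}))
dependencies = tree.accept(DependencyVisitor())
```

The tree is built once and never modified; each new capability is a new visitor class rather than a change to the node classes.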
Modifiers
Located in pcgen.base.modifier and pcgen.base.modifier.number
- Can change the value of the characteristic/variable
- Needs two sets of priority:
- Inherent Priority: Order of operations akin to mathematical order of operations. Multiplication before addition, for example.
- User Priority: Allows the user to set the priority (effectively looked at before the inherent priority)
- Allow for arbitrary ordering of calculations (and thus resolving race conditions) without additional code intervention
- Can use the existing value, so the following are equivalent in function:
MODIFY:Hands|ADD|5
MODIFY:Hands|SOLVE|value()+5
Discussion
Why have ADD
If ADD and SOLVE with value()+x are equivalent in function, why have ADD at all?
These are NOT AT ALL equivalent in memory use or speed/performance.
The ADD occupies approximately 40 bytes of memory and is an extremely rapid calculation: If it took 200 nanoseconds I'd be surprised.
The SOLVE may occupy 500 (or more) bytes of memory (>10x) and forces a load of the formula system, so it could take 100 times as long to process as the ADD (it also took longer during LST load).
Thus: When possible the data standards should be written to use ADD, MULTIPLY or other "static" modifiers rather than SOLVE.
Why value is a function not a variable
Why does SOLVE use value() [a function] rather than something like %VALUE (a variable)?
This has to do with how the FormulaManager works.
Adding a Variable means teaching the VariableIDFactory that a new local variable is allowed. Allowing a "super local" variable (a.k.a. "temporary" variable) would mean allowing it in an arbitrary scope. So we immediately take on the burden of allowing a variable in more than one scope. (Alternately we would enable it in one scope, use it, and then delete that availability - that level of churn seems unnecessary). Even with it enabled in more than one scope, we have to teach the "variable cache" the value, and then immediately clear the value so it is not accidentally used elsewhere. We also have to build a new FormulaManager (since the FormulaManager passes itself to any functions and we have to have it pass the "correct" one - so a decorator on the FormulaManager will not work). The result of using a variable is a lot of load, use, destroy behavior (and in ways that is _definitely_ not thread safe because the "variable cache" is shared and polluted with the "local" value).
Using a function allows us to create a derivative FormulaManager that does not alter the variable storage. So the VariableIDFactory, VariableScope, and VariableStore all remain unchanged. We simply decorate the existing function library with a new function library that recognizes "value()" as a function, and pass that new FormulaManager into the Formula to be resolved. It still results in a few "temporary" objects (a FormulaManager, the decorator on the FunctionLibrary, and an EvaluateVisitor), but since it does not alter shared resources (e.g. the VariableCache), it is thread safe in addition to being significantly less effort to implement.
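The decoration idea can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the class names, the `get` method, and `solve_modifier` are hypothetical, a "formula" is represented by a plain callable instead of a parsed tree, and the derivative FormulaManager that would wrap the library is omitted.

```python
class FunctionLibrary:
    """Hypothetical shared library of named functions."""

    def __init__(self, functions):
        self._functions = dict(functions)

    def get(self, name):
        return self._functions.get(name)


class ValueFunctionLibrary:
    """Decorator that answers value() and delegates everything else."""

    def __init__(self, delegate, current_value):
        self._delegate = delegate
        self._current_value = current_value

    def get(self, name):
        if name == "value":
            # The existing value is captured per call; nothing shared is changed.
            return lambda: self._current_value
        return self._delegate.get(name)


def solve_modifier(formula, shared_library, current_value):
    """Evaluate a SOLVE modifier against a temporary, decorated library."""
    library = ValueFunctionLibrary(shared_library, current_value)
    return formula(library)


def value_plus_five(library):
    # Stands in for the parsed formula value()+5
    return library.get("value")() + 5


shared = FunctionLibrary({"max": max})
result = solve_modifier(value_plus_five, shared, 2)
```

After the call, `shared` still has no "value" entry, which is the property that keeps this approach safe when several calculations use the same library concurrently.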
Why Modifier takes a FormulaManager as a parameter
Why does Modifier take FormulaManager as an argument when only one Modifier will likely ever use it? This is a balance of static vs. dynamic behavior. The two choices go something like this:
- Make the FormulaManager a parameter, knowing it will be ignored by most Modifiers (code does this today)
- Make the FormulaManager something the SolveNumberModifier must get in some other way.
"Some other way" could be a setter, so we'd have "setFormulaManager" we would need to call (which means we have an "instanceof" check to see if it is a Modifier that is a NeedsFormulaManagerModifier, and then the set - and presumably later an unset to ensure we don't have leakage). This casting and setting seems fragile and prone to error: it is not obvious that we know all code pathways that would lead to calling the resolve method, and that lack of obviousness (a contract on the developer) is a risk (an unnecessary one in my opinion). Given the small "cost" of a spare parameter, I felt a setter was not the best solution.
"Some other way" could also be during construction of the Modifier. This raises the complexity of getting a Modifier. Rather than a Modifier being constructable during LST load, we would have the LST token construct a ModifierFactory. This Factory would then be provided with the FormulaManager, and each separate PC that got the Modifier would get a different Modifier object constructed by the ModifierFactory. This, however, still might mean all of the more static modifiers would have had the FormulaManager passed into the construction request (so we gained nothing as far as passing useless parameters). It has also significantly added to the complexity of the loading process and results in objects which are different between what is loaded from LST files and what is placed on the PC (and one of our major projects is trying to separate static and dynamic behavior!). Since this seems to have little to no advantage in terms of complexity (and minuscule advantage in terms of saving a useless parameter pass at runtime), it is not a reasonable tradeoff in my mind.
So basically, while one could claim it is not "ideal", the alternatives are worse.
Initial value of variables
- All variable types require an initial value that is not externally dependent (e.g. must be a "set" modifier)
- Initial Value is set by the type of the variable. For example, all "Integer" variables could be defaulted to zero. These defaults are loaded into the SolverFactory.
Solvers
Located in pcgen.base.solver
- Built by the SolverFactory based on the type of variable (Factory provides the initial value)
- Perform the calculation from an initial value through all the modifiers provided for that variable
- Relies on a "variable store" that stores the results of other calculations, so things like hands*5 can be used to calculate fingers.
- Can be diagnosed, so (eventually) the UI could display:
- The initial value
- Each modification that took place and the source of the Modifier
- The final value
Solver Manager
Located in pcgen.base.solver
- Is aware of the full set of Solvers for a particular system
- Maps variables to solvers
- Tracks dependencies between variables/solvers, can calculate solvers as required
- Can be "push" or "pull" on solving variables. The current implementation is "aggressive" (meaning it will recalculate as soon as a dependency is updated) but not "topologically sorted" (meaning it can do more calculations than strictly necessary)
Example
Steps taken for an "aggressive solver" that is not "topologically sorted"
Assume the following was added to the PC (in this order)
DEFINE:INTEGER|Fingers
DEFINE:INTEGER|Hands
DEFINE:INTEGER|Feet
DEFINE:INTEGER|Appendages
a) MODIFY:Fingers|SET|5
b) MODIFY:Hands|SOLVE|Fingers/5
c) MODIFY:Fingers|ADD|5
d) MODIFY:Feet|SOLVE|Toes/5
e) MODIFY:Appendages|SOLVE|Fingers+Toes+Hands+Feet
f) DEFINE:INTEGER|Toes
g) MODIFY:Toes|ADD|10
h) MODIFY:Toes|SET|10|PRIORITY=1000
The following would take place:
a) Fingers is set to 5 (overriding the default value of zero).
b) Hands is identified as being dependent on Fingers. Hands is solved to get 1.
c) Fingers is set to 5 (set) + 5 (add) = 10. Hands (since it is dependent) is recalculated to produce 2.
d) Feet is identified as being dependent on Toes. Feet is set to zero (Toes is zero since it was never set - optional output of informational message of unset variable).
e) Appendages is identified as being dependent on Fingers, Toes, Hands, Feet. Appendages is calculated to be 12 (Feet is zero, and Toes is zero since it was never set - optional output of informational message of unset variable).
f) Toes is defined. Feet (since it is dependent) is recalculated to be 0. Since the value of Feet was not changed, no update request is sent to Appendages.
g) Toes is calculated to be 10 (start at 0, add 10). Feet and Appendages both need to be recalculated. Note there are two options here:
(g1) Perform a topological map on the dependencies to realize you should recalculate Feet then Appendages
(g2) Recalculate in a random order, risking that the calculation order will be:
* Appendages (because Toes changed) set to 22
* Feet set to 2
* Appendages (because Feet changed) set to 24
Note: The current implementation does not do a topological sort, asserting that the sort is more expensive than miscalculation. This can easily be tested once this is integrated into PCGen.
h) Toes is set to 10. The set at Priority 1000 "resets" the value vs. the 0+10 that was originally calculated for Toes (effectively done at priority = 0).
No other updates because the value for Toes did not change.
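The walkthrough above can be reproduced with a small Python sketch of an aggressive, unsorted solver manager. This is an illustration under stated assumptions, not the pcgen.base.solver code: the class layout and method names are hypothetical, SOLVE formulas are plain callables with their dependencies listed by hand, DEFINE steps are omitted (every variable simply starts at zero), dependents are visited in alphabetical order so the run is repeatable, and there is no cycle detection.

```python
class SolverManager:
    """Hypothetical aggressive solver: re-solves dependents as soon as a value changes."""

    def __init__(self):
        self.modifiers = {}   # variable -> list of (priority, order, kind, operand)
        self.values = {}      # variable -> last solved value (missing means 0)
        self.dependents = {}  # variable -> set of variables whose formulas read it
        self.log = []         # every (variable, value) calculation performed

    def _value(self, name):
        return self.values.get(name, 0)  # unset variables read as zero

    def add_modifier(self, target, kind, operand, priority=0, deps=()):
        mods = self.modifiers.setdefault(target, [])
        mods.append((priority, len(mods), kind, operand))
        for dep in deps:
            self.dependents.setdefault(dep, set()).add(target)
        self._solve(target)

    def _solve(self, target):
        value = 0
        # Sort on (priority, insertion order) only; operands may be callables.
        for _prio, _order, kind, operand in sorted(self.modifiers[target], key=lambda m: m[:2]):
            if kind == "SET":
                value = operand
            elif kind == "ADD":
                value += operand
            elif kind == "SOLVE":
                value = operand(self._value)
        self.log.append((target, value))
        if value != self._value(target):
            self.values[target] = value
            # No topological sort: dependents are simply re-solved in name order.
            for dependent in sorted(self.dependents.get(target, ())):
                self._solve(dependent)


manager = SolverManager()
manager.add_modifier("Fingers", "SET", 5)                                      # a
manager.add_modifier("Hands", "SOLVE", lambda v: v("Fingers") // 5,
                     deps=("Fingers",))                                         # b
manager.add_modifier("Fingers", "ADD", 5)                                      # c
manager.add_modifier("Feet", "SOLVE", lambda v: v("Toes") // 5,
                     deps=("Toes",))                                            # d
manager.add_modifier("Appendages", "SOLVE",
                     lambda v: v("Fingers") + v("Toes") + v("Hands") + v("Feet"),
                     deps=("Fingers", "Toes", "Hands", "Feet"))                 # e
manager.add_modifier("Toes", "ADD", 10)                                        # g
manager.add_modifier("Toes", "SET", 10, priority=1000)                         # h

appendages_history = [value for name, value in manager.log if name == "Appendages"]
```

With alphabetical visiting, step (g) happens to follow the (g2) ordering: Appendages is calculated as 12, then 22, then 24, so one of the three calculations is the "extra" work that a topological sort would avoid.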
Key Items for the Data Team to consider
There are a number of things for the data team to consider that are summarized here. Some of these are related to syntax to ensure the data team is considering items in preparation for the PROPOSAL discussion, not to attempt to answer them at this time.
Variables
Do variables need to be defined before they are used?
Here is the situation:
Ability A <> DEF:INTEGER|Hands
Ability B <> MODIFY:Hands|ADD|2 <> ...(assume something uses Hands as a var)
Ability C <> MODIFY:Feet|ADD|2 <> ...(assume something uses Feet as a var)
Using Ability B
Add Ability B to the PC without adding Ability A. What should happen?
Today we default any undefined item to zero, so the result of "Hands" would be zero. (This is consistent with current behavior where a BONUS:VAR| is irrelevant unless the corresponding DEFINE has been encountered.)
Should this produce an error? In other words, as we go forward, do we want to change the behavior such that the use of a variable without a corresponding DEFINE produces a runtime error?
Using Ability C
Add Ability C to the PC (assuming no other data). What should happen?
Today we default any undefined item to zero, so the result of "Feet" would be zero.
I believe this *should* produce an error because there is no DEF for "Feet" *anywhere in the data*... so it is impossible for it to ever be anything other than zero. (It is therefore useless and should be removed)
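A load-time check for the Ability C case could look something like the Python sketch below. It is only an illustration: the function name and the dictionary-of-token-strings input format are hypothetical, both DEF and DEFINE spellings are accepted because the syntax is still under discussion, and only MODIFY targets are checked (variables used inside formulas would need the dependency information from the formula system).

```python
def find_undefined_variables(data):
    """Return variable names that are modified somewhere but defined nowhere."""
    defined = set()
    modified = set()
    for tokens in data.values():
        for token in tokens:
            key, _, rest = token.partition(":")
            fields = rest.split("|")
            if key in ("DEF", "DEFINE"):
                defined.add(fields[1])   # DEF:INTEGER|Hands -> "Hands"
            elif key == "MODIFY":
                modified.add(fields[0])  # MODIFY:Hands|ADD|2 -> "Hands"
    return sorted(modified - defined)


data = {
    "Ability A": ["DEF:INTEGER|Hands"],
    "Ability B": ["MODIFY:Hands|ADD|2"],
    "Ability C": ["MODIFY:Feet|ADD|2"],
}
undefined = find_undefined_variables(data)
```

Here "Hands" passes because Ability A defines it somewhere in the data (whether or not it is on the PC), while "Feet" is reported because no DEF exists anywhere.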
Locking Variables
Is there any use case where we need to "lock" variables, where we *really* *really* wouldn't want to let data (data outside the game mode) modify something?
Definition Style
Global and local definition - independent or related.
In other words, do you end up with something like:
DEF:GLOBAL|INTEGER|Hands
DEF:LOCAL|INTEGER|AllowedCharms
or do you have
DEF:INTEGER|Hands
DEFLOCAL:INTEGER|AllowedCharms