Difference between revisions of "Rules Persistence System"

From PCGen Wiki
 
(7 intermediate revisions by the same user not shown)
Line 3: Line 3:
 
   |}
 
   |}
  
==Background==
+
=Background=
  
This document is primarily intended to communicate the design of PCGen Rules Persistence System. See the Overall System Figure [[Image:Overall_system_figure.png|frame|Block diagram of proposed CDOM structure]] to get an understanding of the place of the Rules Persistence System within the entire PCGen code base and architecture. 
+
This document is primarily intended to communicate the design of the PCGen Rules Persistence System.
  
 
This document provides a detailed overview of the architecture of a specific portion of PCGen.  The overall architecture and further details of other subsystems and processes are provided in separate documents available on the [[Architecture]] page.
 
  
==Overview==
+
==Key Terms==
  
The Rules Persistence System is one of the major components of PCGen. It is responsible for loading game system and component data from the persistence data file format and saving it back into that data file format.  It is aware of the internal storage of information within PCGen only to the point it is required to store that information for use by the core of PCGen.  The Rules Persistence System is not capable of interpreting much in the way of meaning of the values it is storing.
+
; Loader
 +
: A Loader is a file used to load a specific file type within the persistent form of either the PCGen Game Mode or a specific book ("Campaign").
 +
; Token
 +
: A Token is a piece of code that parses small parts of a file so the appropriate information can be loaded into the Rules Data Store.
 +
; Reference
 +
: A Reference is a holding object produced when a reference to an object is encountered in the data, and subsequently loaded with the underlying object when data load is complete.
 +
 
 +
 
 +
=Overview=
  
 
This document describes the Rules Persistence System, and provides guidance on how to interact with the interface/API of the Rules Persistence System.
 
  
==Additional Requirements==
+
==Architectural Design==
  
In addition to the foundation [[Design Concepts for PCGen]], the following additional structural requirements are applied to the Rules Persistence System
+
It is worth pointing out here that our LST language is, from a Computer Science perspective, a Domain Specific Language.
 +
 
 +
PCGen is (for the most part) strongly typed and highly structured, so the guidance we take should be from languages like Java and C++, not from languages like Perl or JavaScript.  Also, we have a very high incentive to get our data loading "correct", so we should be able to catch errors up front at LST load.  So we really *want* the benefits of a "compile" step, not "parse on the fly".
  
===Catch Errors Early===
+
That overall observation gives us the ability to look at a few different aspects of how compilers work.  Specifically:
 +
* Compilers must be able to parse the source files into a format that can be processed internally. 
 +
* Compilers must consider "reference before construction" (we call it building symbol tables during compilation - i.e. "How do you know that variable reference was a declared variable?").
  
Errors in the data files should be caught during data persistence file load, and should not trigger runtime errors.
+
PCGen will have to address both of those items to successfully process files.
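
The "reference before construction" problem can be illustrated with a small symbol-table sketch. This is only an illustration under stated assumptions: the class and method names (<i>ReferenceSketch</i>, <i>Ref</i>, <i>resolveAll</i>) are hypothetical and are not PCGen's actual reference API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical names throughout; this is not PCGen's actual reference API.
class ReferenceSketch {
    /** Holds a key seen in the data until the real object is known. */
    static final class Ref {
        final String key;
        Object target; // stays null until resolveAll() finds the object

        Ref(String key) {
            this.key = key;
        }
    }

    private final Map<String, Object> constructed = new HashMap<>();
    private final Map<String, Ref> refs = new HashMap<>();

    /** Called when a line in the data constructs an object. */
    void construct(String key, Object obj) {
        constructed.put(key, obj);
    }

    /** Called when the data mentions an object that may not exist yet. */
    Ref reference(String key) {
        return refs.computeIfAbsent(key, Ref::new);
    }

    /** After load: fill in every reference and return the keys never constructed. */
    List<String> resolveAll() {
        List<String> unresolved = new ArrayList<>();
        for (Ref ref : refs.values()) {
            ref.target = constructed.get(ref.key);
            if (ref.target == null) {
                unresolved.add(ref.key);
            }
        }
        return unresolved;
    }

    public static void main(String[] args) {
        ReferenceSketch table = new ReferenceSketch();
        Ref dodge = table.reference("Dodge");      // referenced before it exists
        table.construct("Dodge", "Dodge ability"); // constructed later in the load
        table.reference("Mobility");               // never constructed
        System.out.println("Unresolved: " + table.resolveAll());
        System.out.println("Dodge resolved to: " + dodge.target);
    }
}
```

Any key left in the unresolved list corresponds to a data error that can be reported at load time rather than at runtime.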
 
<b>Basis:</b> Given the Rules Persistence System being independent of the internal data structure, all items will be resolved at Data Load.  This will ensure that all objects in a given namespace possess a unique KEY, regardless of the source file.  In addition, all object references can be validated to ensure an object that was actually constructed and loaded.
 
  
==Key Design Decisions==
 
  
===Token/File Loader System===
+
=Parsing source files=
  
File Loaders are key components of the Rules Persistence System.  File Loader instances are specific to a given file type.  When processing a file, the File Loader splits the file into separate lines, splits the lines (if necessary) into separate tags, and then submits the tags to the Tokens.
+
==Architectural Discussion==
  
<b>Underlying Requirement(s):</b> [[Design_Concepts_for_PCGen#Information Hiding|Information Hiding]], [[Design_Concepts_for_PCGen#Data Encapsulation|Data Encapsulation]], [[Design_Concepts_for_PCGen#Increased Flexibility|Increased Flexibility]]
+
Most compilers do multiple passes at the structure of information in their source files.  There may be pre-processors, etc.  This is (often) facilitated by a parsing system that produces an object tree (via lex/yacc or JavaCC or equivalent), and it is processed with multiple different visitors, each of which can then depend on information gleaned by the previous one.  (This is actually how the new formula system parses formulas - It's a specific JavaCC syntax)
  
<b>Basis:</b> This abstracts specific individual components of the data persistence format from the internal data structure (and each other).
+
It is very difficult for PCGen to do a similar form of analysis.  Our files are not conducive to being parsed by a tree-building system, due to the inconsistent nature of many of the LST tokens.  An early version of such a parser from 2008 or so - never placed into a public repository - struggled with the exceptions and lack of consistent "reserved characters" and "separator characters" that are usually major highlights of a structured programming language.
  
<b>Implementation:</b> The interactions of Tokens, File Loaders and other elements of the Rules Persistence System are shown in Figure \ref{Fig: Flow of Data Load}.
+
However, we really *want* the benefits of a "compile" step, even though we can't build a tree.  Therefore, we have currently designed the system to do a more linear parse of the files, while (for the most part) doing a strong validation of input.
  
File Loaders are created by the Rules Persistence System for each file type.  While these may share a single Class (for code reuse), each instance is specialized to a specific data persistence format file type.  The list of available tags is also specific to a given data persistence file type.  This allows features to be limited to certain objects to avoid non-sensical situations (e.g. you can't assign material components to a Race).  A collection of Global tags that can be used in nearly all data persistence files is also available.
+
==Determining file format==
  
Each Token Class is stored in a separate file, independent of the core of PCGen, to allow each token to be independently updated, removed, or otherwise manipulated without altering or impacting other Tokens.  Individual Token files are in the <i>pcgen.plugin.lsttokens</i> package.  These persistence Tokens are non-abstract Classes that implement the <i>LstToken</i> interface.  When PCGen is launched, all plugins of PCGen are evaluated, and Tokens are specifically placed into the <i>TokenLibrary</i>.  The Tokens each have a method (<i>getTokenName()</i>) that identifies the tag the Token processes.
+
We start with the concept of a File Loader.  Knowing the file format is critical to understand which of the few dozen loaders should be used.  We therefore have a set of rules so we "know" what file loader to apply to a given file.
  
Keeping each Token in an individual class keeps the Token Classes very simple, which makes them easy to test, modify, and understand (as they are effectively atomic to the processing of a specific token).  One goal of the PCGen Rules Persistence System is to ensure that all of the parsing of LST files is done within the Tokens and not in the core of PCGen.  This makes adding new tags to the LST files reasonably painless (though changes to the core or export system may also be required to add required functionality).   It at least facilitates the long-term goal of altering behavior of PCGen without forcing a recompile of core PCGen code.
+
For the first pass of a load, which is loading the game mode files from disk, we know the precise format of the file, because the file names are highly rigid.  "miscinfo.lst" is a strictly required file name in a game mode.  (There is one and only one of that file, and it must have that name and not be in a sub-directory of the game mode for the game mode to be valid.)  Therefore, the code can hard-code this into a sequence of lookup processes in the game mode directly that ties a specific loader to a specific file name.
  
On Transition from PCGen 5.14: PCGen 5.14 used a slightly different storage system for tokens.  It stored tokens in a <i>TokenStore</i>, and that was effectively done under a Map of Maps.  The first Map was an interface identifying the type of Token, and the second Map was used to identify the token from the tag name.  As the conversion to the new Token system is being done gradually, some tokens in PCGen may remain in the PCGen 5.14 style.  New tokens will always extend (perhaps indirectly through another Interface) <i>CDOMToken</i>.  Many GameMode tokens remain in the <i>TokenStore</i>, rather than in the new <i>TokenLibrary</i>.
+
In the second pass of a load, we are looking for PCC files.  While this is no longer strict on the exact file name, we ARE strict on the file suffix (must be PCC).  This again allows us to infer the nature of the file we are processing, allowing us to build a strict association between the file name and the file format.
  
===Data Modification during Data Load===
+
In the third pass of a load, we are now data driven.  We are loading contents as defined by a PCC file.  Here, there is no longer a file name format.  Rather, the contents of the PCC file had a specific key:value syntax that defined the format of the file.  The PCC file might have contained "TEMPLATE:rsrd_templates.lst" for example, which indicates that the file "rsrd_templates.lst" is to be processed as a "TEMPLATE" file.  There is no strict requirement that these items end in ".lst" although that is certainly a convention and well enough understood that exceptions would probably be a bit mind-bending to everyone.
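
The three ways of determining a file's format can be sketched roughly as follows. The helper names (<i>formatFromName</i>, <i>parsePccEntry</i>), the "MISCINFO" label, and the case-insensitive comparison are assumptions made for this illustration; PCGen's real loader selection is more involved.

```java
import java.util.Locale;

// Hypothetical helper; not PCGen's actual loader-selection code.
class FileFormatSketch {
    /** Returns the format implied by a file name (passes one and two), or null. */
    static String formatFromName(String fileName) {
        // Assumption: names are compared case-insensitively.
        String lower = fileName.toLowerCase(Locale.ROOT);
        if (lower.equals("miscinfo.lst")) {
            return "MISCINFO"; // pass 1: rigid file name (label is illustrative)
        }
        if (lower.endsWith(".pcc")) {
            return "PCC"; // pass 2: rigid suffix
        }
        return null; // pass 3: the format comes from the PCC entry instead
    }

    /** Splits a PCC entry such as "TEMPLATE:rsrd_templates.lst" into {format, file}. */
    static String[] parsePccEntry(String line) {
        int colon = line.indexOf(':');
        if (colon <= 0 || colon == line.length() - 1) {
            throw new IllegalArgumentException("Expected KEY:value but found: " + line);
        }
        return new String[] {line.substring(0, colon), line.substring(colon + 1)};
    }

    public static void main(String[] args) {
        String[] entry = parsePccEntry("TEMPLATE:rsrd_templates.lst");
        System.out.println(entry[1] + " is processed as a " + entry[0] + " file");
        System.out.println("rsrd.pcc is processed as a " + formatFromName("rsrd.pcc") + " file");
    }
}
```

Note that in the third pass the file name plays no part in the decision: the key before the colon alone selects the loader.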
  
The Rules Persistence System supports modifying, copying or forgetting objects defined in the data persistence files.
+
==Parsing an LST/PCC file==
  
<b>Underlying Requirement(s):</b> [[Design_Concepts_for_PCGen#Information Hiding|Information Hiding]], [[Design_Concepts_for_PCGen#Data Encapsulation|Data Encapsulation]], [[Design_Concepts_for_PCGen#Increased Flexibility|Increased Flexibility]]
+
Each LST file type has an associated *Loader class within the pcgen.persistence.lst or pcgen.rules.persistence package. Spells, for example, are loaded using the SpellLoader class.  In general, the pcgen.persistence.lst items are older and pcgen.rules.persistence.* is the newer system for doing load from LST files. 
 +
Within a file loader, we parse the file line-by-line.  In most files, lines are independent (each line represents a separate object).
  
<b>Basis:</b> This allows users to modify base data to easily produce new Races, Abilities, or other items without risk of copy/paste error.
+
There are three major file formats we are dealing with:
  
<b>Implementation:</b> The data persistence file format supports three special functions that can be performed on data persistence entries.
+
; Command-based
 +
: The first set are individual commands that will occur on a single object.  This occurs, for example, in the "miscinfo.lst" file.  Each line is processed and loaded into the GameMode object.  Most of the Game Mode files are of this form, as are the PCC files.  The GLOBALMODIFIER file in the data directory also operates this way.  Since this can be seen as a slightly degenerate form of an object-based load (see below), this is not discussed in any detail in this document.
  
:<b>.COPY:</b> allows a data file to copy an existing object. This .COPY entry need not worry about file load order (see [[Rules_Persistence_System#Data Persistence File Load Order Independence|Data Persistence File Load Order Independence]]).  The value preceding the .COPY string identifies the object to be copied.  This identifier is the KEY (or KEY and CATEGORY) of the object to be copied.  The identifier for the copied object is placed after an equals sign that follows the .COPY String, e.g.: Dodge.COPY=MyDodge
+
; Object-Based
:<b>.MOD</b> allows a data file to modify an existing object.  This .MOD entry need not worry about file load order (see [[Rules_Persistence_System#Data Persistence File Load Order Independence|Data Persistence File Load Order Independence]]).  All .MOD entries will be processed after all .COPY entries, regardless of the source file.  The value preceding the .MOD string identifies the object to be modified.  This identifier is the KEY (or KEY and CATEGORY) of the object to be modified.  If more than one .COPY token produces an object with the same identifier, then a duplicate object error will be generated.
+
: This set of files creates one object for each line (or the line represents a modification of an existing object).  The majority of our LST files in the data directory are processed this way, as are the stats and checks files in a Game Mode.  This is discussed in more detail below.
:<b>.FORGET:</b> allows a data file to remove an existing object from the Rules Data Store.  This .FORGET entry need not worry about file load order (see [[Rules_Persistence_System#Data Persistence File Load Order Independence|Data Persistence File Load Order Independence]]).  All .FORGET entries will be processed after all .COPY and .MOD entries, regardless of the source file.  The value preceding the .FORGET string identifies the object to be removed from the Rules Data Store.
 
  
===Subtokens===
+
; Batch-based
 +
: The CLASS and KIT files are a major exception to the object-based description above, since they are blocks of information with a new "CLASS:x" or "STARTKIT" line representing the split to a new item.  Investigation of the loading of those files is currently left as an exercise for the reader. ***This should actually be included as it is relevant to future direction
  
Some tags have complex behavior that significantly differs based on the first argument in the value of the tag.  In order to simplify tag parsing and Token code, these Tokens implement a Sub-token structure, which delegates parsing of the tag value to a Token specialized to the first argument in the value of the tag.
 
  
<b>Underlying Requirement(s):</b> [[Design_Concepts_for_PCGen#Data Encapsulation|Data Encapsulation]], [[Design_Concepts_for_PCGen#Information Hiding|Information Hiding]], [[Design_Concepts_for_PCGen#Increased Flexibility|Increased Flexibility]]
+
=Object-based file loading=
  
<b>Basis:</b> This design is primarily intended to separate out code for different subtokens.  This provides increased ability to add new subtokens without altering existing code.  This provides increased flexibility for developers, and ensures that unexpected side effects from code changes don't impact other features of PCGen.
+
For the majority of our files, the first entry on a line represents the ownership and behavior for that line.  This can take a few formats, but in general takes one of these two forms:
 +
<pre>
 +
PREFIX:DisplayName
 +
PREFIX:Key.MODIFICATION
 +
</pre>
  
<b>Implementation:</b> The flow of events during Data Load when Subtokens are present is shown as an optional series of events in Figure \ref{Fig: Flow of Data Load}.
+
The PREFIX may be empty/missing depending on the file type.  The PREFIX may be something like ALIGNMENT: to indicate an alignment.  This is done in files that can define more than one format.  (e.g. Stats and checks used to be shared when they were stored in the game mode)
  
The LoadContext is capable of processing subtokens for a given Token.  Any token which delegates to subtokens can call <i>processSubToken(T, String, String, String)</i> from LoadContext in order to delegate to subtokens.  This delegation will return a boolean value to indicate success (<i>true</i>) or failure (<i>false</i>) of the delegation.  The exact cause of the failure is reported to the <i>Logging</i> utility.
+
The DisplayName is the starting name of the object.  
  
Note that it is legal for a subtoken to only be valid in a single object type (such as a Race), even if the "primary" token is accepted universally.  This greatly simplifies the restriction of subtokens to individual file types without producing burden on the primary token to establish legal values.  Resolution of those restrictions is handled entirely within the LoadContext and its supporting classes.
+
For a modification (or any reference to the object), the KEY MUST be used.  If no KEY: token is provided, then the DisplayName serves as the KEY.
  
===Rules Persistence System I/O===
+
The MODIFICATION is COPY=x, MOD or FORGET.
 +
;.COPY:
 +
: Allows a data file to copy an existing object. This .COPY entry need not worry about file load order (see below). The value preceding the .COPY string identifies the object to be copied. This identifier is the KEY (or KEY and CATEGORY) of the object to be copied. The identifier for the copied object is placed after an equals sign that follows the .COPY String, e.g.: Dodge.COPY=MyDodge
 +
;.MOD
 +
: Allows a data file to modify an existing object. This .MOD entry need not worry about file load order (see below). All .MOD entries will be processed after all .COPY entries, regardless of the source file. The value preceding the .MOD string identifies the object to be modified. This identifier is the KEY (or KEY and CATEGORY) of the object to be modified. If more than one .COPY token produces an object with the same identifier, then a duplicate object error will be generated.
 +
;.FORGET
 +
: Allows a data file to remove an existing object from the Rules Data Store. This .FORGET entry need not worry about file load order (see below).  All .FORGET entries will be processed after all .COPY and .MOD entries, regardless of the source file. The value preceding the .FORGET string identifies the object to be removed from the Rules Data Store.
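
As a rough sketch of how the first entry on a line could be classified into a base definition or one of the three modifications, consider the following. It assumes the PREFIX has already been stripped, and every name in it (<i>FirstEntrySketch</i>, <i>Kind</i>, <i>parse</i>) is hypothetical rather than taken from the PCGen code base.

```java
// Hypothetical classifier for the first entry on a line; assumes any PREFIX
// (e.g. "ALIGNMENT:") has already been removed by the caller.
class FirstEntrySketch {
    enum Kind { BASE, COPY, MOD, FORGET }

    final Kind kind;
    final String key;      // DisplayName for BASE, otherwise the KEY of the existing object
    final String copyName; // identifier after ".COPY=", or null for other kinds

    private FirstEntrySketch(Kind kind, String key, String copyName) {
        this.kind = kind;
        this.key = key;
        this.copyName = copyName;
    }

    static FirstEntrySketch parse(String entry) {
        String copyMarker = ".COPY=";
        int copy = entry.lastIndexOf(copyMarker);
        if (copy > 0) {
            // e.g. "Dodge.COPY=MyDodge": copy Dodge, name the copy MyDodge
            return new FirstEntrySketch(Kind.COPY, entry.substring(0, copy),
                    entry.substring(copy + copyMarker.length()));
        }
        if (entry.endsWith(".MOD")) {
            return new FirstEntrySketch(Kind.MOD,
                    entry.substring(0, entry.length() - ".MOD".length()), null);
        }
        if (entry.endsWith(".FORGET")) {
            return new FirstEntrySketch(Kind.FORGET,
                    entry.substring(0, entry.length() - ".FORGET".length()), null);
        }
        return new FirstEntrySketch(Kind.BASE, entry, null);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"Dodge", "Dodge.COPY=MyDodge", "Dodge.MOD", "Dodge.FORGET"}) {
            FirstEntrySketch e = parse(s);
            System.out.println(s + " -> " + e.kind + " key=" + e.key + " copyName=" + e.copyName);
        }
    }
}
```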
  
The input and output of data persistence information should be an integral part of the Rules Persistence System.  In versions up to and including PCGen 5.14, Tokens and the Rules Persistence System were only responsible for input from the data persistence file format.  Starting with PCGen 5.16, the Rules Persistence System is responsible for both input and output of Tokens.
+
==Data Persistence File Load Order Independence==
  
<b>Underlying Requirement(s):</b> [[Design_Concepts_for_PCGen#Data Encapsulation|Data Encapsulation]], [[Design_Concepts_for_PCGen#Information Hiding|Information Hiding]]
+
This provides specific clarity on the Order of Operations during file loading.
  
<b>Basis:</b> Adding output to the persistence system provides the ability to reuse the Rules Persistence System in a data file editor, as well as the runtime system.  This sharing of code helps to guarantee the integrity of the data file editor.  Such a structure also facilitates unit testing, as the Rules Persistence System can be tested independently of the core code.
+
When files are loaded, they are processed in order as the lines appear in the file, unless the line is a MODIFICATION.  If it is a modification, it is processed after normal loading is complete.  Note this means ALL FILES of a given format (e.g. TEMPLATE) are loaded with their DisplayName lines processed before ANY .COPY is processed.  All .COPY items are processed before any .MOD items are processed.  All .MOD items are processed before any .FORGET items are processed.  (Note that strictly this is Base/Copy/Mod/Forget by object type; it doesn't strictly inhibit parallelism between file types during file load.)
 +
This order of operations is necessary so that a second file can perform a .COPY or .MOD on the contents of another file.  It is also important to recognize that .COPY occurs before .MOD, which gives strict consideration to what items may want to appear on the original line vs in a .MOD line as they are not always equivalent.
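
The Base/Copy/Mod/Forget ordering can be modelled with a stable sort over the first entries of all lines of one file format. This is a simplified sketch: the names (<i>LoadOrderSketch</i>, <i>phase</i>, <i>processingOrder</i>) are hypothetical, and the real system defers modification lines during load rather than sorting a list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical model of the order of operations; not PCGen's actual implementation.
class LoadOrderSketch {
    /** 0 = base line, 1 = .COPY, 2 = .MOD, 3 = .FORGET. */
    static int phase(String firstEntry) {
        if (firstEntry.contains(".COPY=")) {
            return 1;
        }
        if (firstEntry.endsWith(".MOD")) {
            return 2;
        }
        if (firstEntry.endsWith(".FORGET")) {
            return 3;
        }
        return 0;
    }

    /** Orders entries by phase; List.sort is stable, so file order is kept within a phase. */
    static List<String> processingOrder(List<String> entriesInFileOrder) {
        List<String> ordered = new ArrayList<>(entriesInFileOrder);
        ordered.sort(Comparator.comparingInt(LoadOrderSketch::phase));
        return ordered;
    }

    public static void main(String[] args) {
        List<String> seen = Arrays.asList(
                "Dodge.MOD", "Dodge.COPY=MyDodge", "Mobility.FORGET", "Dodge", "Mobility");
        System.out.println(processingOrder(seen));
    }
}
```

In this model a .MOD of Dodge never affects MyDodge, because the copy is taken before any .MOD is applied; this is why an item on the original line and the same item in a .MOD line are not always equivalent.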
  
<b>Implementation:</b> Each token has the ability to both "parse" and "unparse" information for the Rules Persistence System.  Parsing is the act of reading a token value from a data persistence file and placing it into the internal rules data structure.  Unparsing is the act of reading the internal data structure and writing out the appropriate syntax into a data persistence file.
+
==Source Information==
  
In addition to other benefits, this parse/unparse structure allows Tokens to be tested without major dependence on other components of PCGen.  These tests are found in the <i>plugin.lsttokens</i> package of the <i>code/src/utest</i> source directory.
+
There is one additional exception to the file processing as described above.  If a line starts with a SOURCE*: token, then that line is processed as "persistent information" for that file.  All items on that line will be applied to ALL items in the file. This should be limited to just source information that needs to be universally applied to included objects.
  
As explained in Section \ref{Token/File Loader System}, Token/File Loader System, the File Loaders separate out the tags in an input file and call the parse method on the appropriate Tokens.  In order to unparse a loaded object back to the data persistence syntax, all Tokens that could be used in the given object type must be called (this makes unparse a bit more CPU intensive than parse).
 
  
Unparsing a particular object is managed by the <i>unparse(T)</i> method of <i>LoadContext</i>.  This process includes delegation of the unparse to all subtokens (See section \ref{Subtokens}), as depicted in Figure \ref{Fig: Flow of Data Unload}.
+
=Tokens=
  
Because all tokens are called when unparsing an object, it is important that tokens properly represent when they are not used.  This is done by returning <i>null</i> from the <i>unparse</i> method of the Token.
+
Subsequent entries on a line represent tags/tokens on that object to give it information and behavior within PCGen.
  
Some tokens can be used more than once in a given object (e.g. BONUS), and thus must be capable of indicating each of the values for the multiple tag instances.  Since Tokens do not maintain state, the unparse method must only be called a single time to get all of the values; thus, the unparse method returns an array of String objects to indicate the list of values for each instance of the tag being unparsed.
+
In general, the format of a token is:
 +
<pre>
 +
NAME:VALUE
 +
</pre>
  
The context is responsible for including the name of the tag in the unparsed result. Just as the token is not responsible for removing/ignoring the name of the tag in the value passed into the <i>parse</i> method, it does not prepend the name of the tag to the value(s) returned from the <i>unparse</i> method. (This also happens to simplify the conversion and compatibility systems.)
+
The list of available tokens is specific to a given data persistence file type. This allows features to be limited to certain objects to avoid non-sensical situations (e.g. you can't assign material components to a Race). A collection of Global tags that can be used in nearly all data persistence files is also available.
  
===Independent Data Persistence Interface===
+
The exact processing occurs within the plugins that are loaded to process each token.  Each Token Class is stored in a separate file/class, independent of the core of PCGen, to allow each token to be independently updated, removed, or otherwise manipulated without altering or impacting other Tokens.
  
The Data Persistence format must be independent of internal data structure. (The subsystems of PCGen other than the Rules Persistence System should not have detailed knowledge of the data persistence file format).
+
This also forces the Token Classes to be fairly simple, which makes them easy to test, modify, and understand (as they are effectively atomic to the processing of a specific token). One goal of the PCGen Rules Persistence System is to ensure that all of the parsing of LST files is done within the Tokens and not in the core of PCGen. This makes adding new tags to the LST files to be reasonably painless (though changes to the core or export system may also be required to add required functionality).
  
<b>Underlying Requirement(s):</b> [[Design_Concepts_for_PCGen#Information Hiding|Information Hiding]], [[Rules_Persistence_System#Catch Errors Early|Catch Errors Early]]
+
Individual Token files are in the pcgen.plugin.lsttokens package.  Many may rely on abstract classes provided in pcgen.rules.persistence.token.
 +
When PCGen is launched, JARs that are within the Plugin directory are parsed for their contents. This actually happens in the gmgen.pluginmgr.JARClassLoader Class. As one of many operations that takes place during the import, each Class is analyzed to determine if it is a persistence Token (a persistence Token is defined as a non-abstract Class that implements the LstToken interface). When a persistence Token is found, it is imported into the TokenLibrary or TokenStore.
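
The test for "is this Class a persistence Token" can be sketched with standard Java reflection. The <i>LstToken</i> interface below is a local stand-in for the real one, and <i>TokenDiscoverySketch</i>, <i>AbstractToken</i> and <i>register</i> are hypothetical names; the sketch also uses <i>getDeclaredConstructor().newInstance()</i>, which is an assumption about how instantiation might be written in current Java, not a description of what JARClassLoader does.

```java
import java.lang.reflect.Modifier;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of token discovery; not the actual JARClassLoader logic.
class TokenDiscoverySketch {
    /** Local stand-in for PCGen's LstToken interface. */
    interface LstToken {
        String getTokenName();
    }

    /** Abstract helper: implements LstToken but must NOT be registered. */
    abstract static class AbstractToken implements LstToken {
    }

    /** Concrete token: should be registered under the tag name it reports. */
    static class TemplateToken extends AbstractToken {
        @Override
        public String getTokenName() {
            return "TEMPLATE";
        }
    }

    /** A persistence Token is a non-abstract Class that implements LstToken. */
    static boolean isPersistenceToken(Class<?> candidate) {
        return LstToken.class.isAssignableFrom(candidate)
                && !Modifier.isAbstract(candidate.getModifiers());
    }

    /** Instantiates every persistence Token among the candidates, keyed by tag name. */
    static Map<String, LstToken> register(Class<?>... candidates) {
        Map<String, LstToken> library = new HashMap<>();
        for (Class<?> c : candidates) {
            if (!isPersistenceToken(c)) {
                continue;
            }
            try {
                LstToken token = (LstToken) c.getDeclaredConstructor().newInstance();
                library.put(token.getTokenName(), token);
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Cannot construct token " + c.getName(), e);
            }
        }
        return library;
    }

    public static void main(String[] args) {
        Map<String, LstToken> library =
                register(AbstractToken.class, TemplateToken.class, String.class);
        System.out.println("Registered tags: " + library.keySet());
    }
}
```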
  
<b>Basis:</b> This abstracts the data persistence format from the internal data structure.  It forces the entire persistence contents to be parsed on data load.  This ensures any errors in data files are caught in the Rules Persistence System at data load, rather than at runtime.
+
==Discussion==
  
<b>Implementation:</b> During the load of data from the data persistence format, each Token is required to fully parse the information and validate the information as much as possible. This ensures that errors in the data files are caught as they are loaded, and not at runtime.  The Rules Persistence System is responsible for ensuring data integrity of the rules data to the rest of the PCGen system, and the Tokens are the "front lines" of fulfilling that responsibility.
+
As with any architecture, there are tradeoffs in having a plugin system. The first of these is in code association within the PCGen system. Due to the plugin nature (and the use of reflection) there are certain use-associations which cannot be made within an Integrated Development Environment (IDE) such as Eclipse. For example, it is impossible to find where a TemplateToken is constructed by automated search, as it is constructed by a Class.newInstance() call.
  
Beyond the tokens, a load subsystem translates between the data persistence file format parsed by the Tokens and the internal data structure. This system arguably fits the Data Mapper design pattern, although it's not strictly using relational databases. This system is currently known as a <i>LoadContext</i>.  The details of translation takes various forms, and those structures are explained in later sections.
+
One quirk with the plugin system is also that it occasionally requires full rebuilds of the code in order to ensure the core code and the plugins are "in sync" on their functionality. This is reasonably rare, but is a result of the lack of a hard dependency tree in the code (really, the same problem IDEs have in determining usage).
  
===Only valid Tokens may impact the Rules Data Store===
+
There are also some great advantages to a plugin system.
  
There is a risk that a partially-parsed Token from an invalid data persistence entry could lead to an unknown state within the Rules Data Store. Therefore, a Token should only impact the state of the Rules Data Store if the token parse completes successfully.  The Token should not be responsible for tracking successful completion; rather the load subsystem implements a 'unit of work' design pattern to ensure only valid tokens impact the Rules Data Store.
+
By using reflection to perform the import of the classes and using reflection to inspect those classes, some associations can be made automatically, and do not require translation tables. By having all of the information directly within the Token Classes, a 'contract' to update multiple locations in the code (or parameter files) is avoided. There is also a minimal amount of indirection (the indirection introduced by TokenStore's Token map is very easy to understand).
  
<b>Underlying Requirement(s):</b> [[Design_Concepts_for_PCGen#Information Hiding|Information Hiding]]
+
The addition of a Token Class to the Plugin JAR will allow the new Token to be parsed. This makes adding new tags to the LST files to be reasonably painless (actually having it perform functions in the PCGen core is another matter :) )
  
<b>Basis:</b> This greatly simplifies the implementation of Tokens, as they are not required to analyze or defer method calls to the <i>LoadContext</i> until after the data persistence syntax is established to be valid.
+
Also, keeping each Token in an individual class keeps the Token Classes very simple, which makes them easy to test, modify, and understand (as they are effectively atomic to the processing of a specific token).
  
<b>Implementation:</b> During the load of data from the data persistence format, each Token may fully parse the provided value and make any necessary calls to the LoadContext.  This can be done even if subsequent information indicates to the Token that there is an error in the Token value.  Specifically, individual Tokens should be free to take any action on the <i>LoadContext</i>, and are not responsible for the consequences of those method calls unless the Token indicates that the value from the data persistence format was indicated to be valid.  This indication of validity is by returning <i>true</i> from the parse method of the Token.
+
In the future, we may also be able to defer some loading of plugins until after the game mode has loaded, allowing us to only activate and load those tokens relevant for a specific game mode.  Specifically, it would be nice to not have to process any ALIGNMENT based tokens in MSRD, for example (and to have them all automatically be errors as well).  This need may be mitigated by the more data driven design we are working to develop.
  
If a Token returns <i>true</i>, indicating the token was valid, then the File Loader that called the Token is responsible for indicating to the <i>LoadContext</i> to <i>commit()</i> the changes defined by the Token.  This process is shown in the "Transaction Success Response" section in Figure \ref{Fig: Flow of Data Load}.
+
===Future Work===
  
If the Token returns <i>false</i>, then the File Loader is responsible for calling the <i>rollback()</i> method of <i>LoadContext</i> to indicate no changes should be made to the Rules Data Store and the tentative changes proposed by the Token should be discarded.  This process is shown in the "Transaction Failure Response" section in Figure \ref{Fig: Flow of Data Load}.
+
It would be nice if there were a method of forcing the isolation without having a slew of JAR files... sunsetting the need to update pluginbuild.xml when a new token is created would be nice as well.  So there is probably an architectural choice here that involves the tradeoff between separate tokens, token discovery, contract to have to update pluginbuild.xml, and modularity.
  
The [[Load Commit Subsystem]] provides additional detail.
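
The commit/rollback behavior described above can be sketched as a very small 'unit of work'. Only the method names <i>commit()</i> and <i>rollback()</i> come from the text; the class <i>UnitOfWorkSketch</i>, its <i>propose</i> and <i>process</i> methods, and the use of plain strings as "changes" are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the load subsystem; not the real LoadContext.
class UnitOfWorkSketch {
    private final List<String> store = new ArrayList<>();   // stand-in for the Rules Data Store
    private final List<String> pending = new ArrayList<>(); // tentative changes from one token

    /** Minimal token contract: return true only if the value was valid. */
    interface Token {
        boolean parse(UnitOfWorkSketch context, String value);
    }

    /** Tokens call this freely, even before they know the value is valid. */
    void propose(String change) {
        pending.add(change);
    }

    void commit() {
        store.addAll(pending);
        pending.clear();
    }

    void rollback() {
        pending.clear();
    }

    List<String> committed() {
        return new ArrayList<>(store);
    }

    /** The File Loader's role: commit on success, roll back on failure. */
    boolean process(Token token, String value) {
        boolean valid = token.parse(this, value);
        if (valid) {
            commit();
        } else {
            rollback();
        }
        return valid;
    }

    public static void main(String[] args) {
        UnitOfWorkSketch context = new UnitOfWorkSketch();
        // This token records a change first and only then reports validity.
        Token size = (ctx, value) -> {
            ctx.propose("SIZE=" + value);
            return value.length() == 1;
        };
        context.process(size, "M");      // valid: change is committed
        context.process(size, "Medium"); // invalid: tentative change is discarded
        System.out.println(context.committed());
    }
}
```

The design choice this illustrates is that a Token never needs to undo its own work: whatever it proposed before discovering an error simply never reaches the store.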
+
==Identifying the Token==
  
===Data Persistence File Load Order Independence===
+
In determining which token is used, two items are relevant: first, the name of the token; second, the Class of object processed by the token.  If two tokens are found during plugin load that share the same name and class processed, an error is thrown during PCGen startup.
  
Items in the rules structure may refer to each other, by granting certain features, possessing certain prerequisites, or by other means. For example, an Ability A may grant Ability B, but we cannot reasonably require that Ability A appears before Ability B.  More specifically, due to known interactions, it is impossible to choose a load order for files and entries that guarantees objects will be constructed before references to those objects are encountered.  Order independence of persistent data is therefore an architectural requirement.
+
How are token conflicts resolved? If two tokens have the same key (String before the : in the LST file), AND implement the same persistence Token Interface (e.g. PCClassLSTToken), then an error will be reported by the TokenStore class when the plugin JAR files are loaded.
  
<b>Underlying Requirement(s):</b> [[Design_Concepts_for_PCGen#Information Hiding|Information Hiding]], [[Design_Concepts_for_PCGen#Data Encapsulation|Data Encapsulation]], [[Rules_Persistence_System#Catch Errors Early|Catch Errors Early]]
+
The TokenStore is the older method of storing the tokens.  In these cases, the tokens must be an exact match to both the name (case insensitive - but by convention they are capitalized in the LST files) and the class of object being processed.  The TokenStore effectively has a Map<Class, Map<String, Token>>.
  
<b>Basis:</b> Using references before objects are constructed to ensure full parsing of the data persistence file syntax during load improves error catching capability at load time and should improve runtime performance.   
+
For the TokenLibrary more flexibility is allowed.  If an object like a Language is being processed, then first, the system will look for tokens that match Language.class exactly.  If that fails, then the system will use reflection on Language.class to determine the parent class and see if a token of the appropriate NAME exists at that level.  This is repeated until a relevant token plugin is found or the token is determined to be invalid.
  
<b>Implementation:</b> A two pass load system is required in order to ensure separation of the data persistence format and the internal data structure.  In PCGen 5.16, any Token may request a reference to an object, regardless of whether that object has been constructed in the <i>LoadContext</i>.  This is done through a <i>ReferenceContext</i>.  For more information on the design of this system as well as the types of CDOM References that exist and that the <i>ReferenceContext</i> is capable of returning, see [[CDOM References Concept Document]].
+
This lookup starts within the TokenLibrary.  Within that TokenLibrary, multiple TokenFamily objects exist.  Each version of PCGen can have its own TokenFamily.  This allows tokens that support backwards compatibility to be contained separately from the primary tokens.
  
The references requested by the tokens can then be placed into objects (Abilities, Skills, etc.) and the underlying object(s) to which the reference refers can be established at runtime.
+
In some cases there are both Global tags and "local" tags that have the same key (e.g. "TEMPLATE"). As described above, the "local" key (one that is specific to a certain type of LST file) would take priority over the Global Token. This is the case with TEMPLATE, as the Global tag processing takes place in a call to PObjectLoader.parseTagLevel(), far below the PCClass-specific processing that takes place early in PCClassLoader.parseClassLine().
  
There are two issues introduced with a system that is capable of referencing objects before they are constructed. 
+
===Future need: Interface Tokens===
  
The first issue is that references might be made to objects that don't exist.  This problem cannot be detected until the entire load operation is complete.  The Rules Persistence System makes a call to the <i>validate()</i> method of <i>ReferenceContext</i> to test whether any references were made where the appropriate referred-to object was not constructed during data persistence file load. In order to provide for minimal functionality without truly understanding the reference, PCGen 5.16 constructs a dummy (empty) object with the given identifier.
+
The current system does suffer from a number of issues.  Some of our current "global" tokens really aren't global.  They may be global inasmuch as an item is "granted" to a PC, but would fail on other object types.  For other situations, we have begun to move away from the heavyweight and complicated CDOMObject/PObject into a more lightweight object, but we want to share behavior (and load tokens) there as well.
  
Second, the References constructed during data persistence file load must be resolved before they are used during "runtime".  Therefore, the Rules Persistence System is responsible for resolving any references after a collection of Campaigns are loaded. This resolution is driven through the <i>resolveReferences()</i> method of <i>LoadContext</i>. Due to the construction of dummy objects during the <i>validate()</i> step, <i>resolveReferences()</i> must be called after <i>validate()</i>.
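The two steps described above (validate, then resolve) can be sketched as follows. This is a simplified Python illustration rather than PCGen's implementation: the <code>Reference</code> class, the dictionaries, and the <code>construct</code> and <code>get_reference</code> methods are assumptions for the example. Only the validate/resolveReferences ordering and the dummy-object behavior are taken from the text.

```python
class Reference:
    """Holding object: created on first mention, filled in after load."""

    def __init__(self, key):
        self.key = key
        self.target = None


class ReferenceContext:
    """Toy stand-in for the ReferenceContext described in the text."""

    def __init__(self):
        self.constructed = {}   # key -> object built from the data files
        self.references = {}    # key -> Reference handed out to Tokens

    def construct(self, key):
        self.constructed[key] = {"key": key, "dummy": False}

    def get_reference(self, key):
        # Legal even if the object has not been constructed yet.
        return self.references.setdefault(key, Reference(key))

    def validate(self):
        """Report unconstructed keys and build a dummy object for each."""
        missing = sorted(k for k in self.references if k not in self.constructed)
        for key in missing:
            self.constructed[key] = {"key": key, "dummy": True}
        return missing

    def resolve_references(self):
        # Must run after validate(), so every key has an object behind it.
        for ref in self.references.values():
            ref.target = self.constructed[ref.key]
```

In this sketch, calling <code>resolve_references()</code> before <code>validate()</code> would raise a <code>KeyError</code> for any reference whose object was never constructed, which mirrors why the text requires the calls in that order.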
+
The existing TokenLibrary system has a few weaknesses with that new desire.  As we rely more on interfaces than direct inheritance, TokenLibrary will begin to fail.
  
===Shared Persistence System with Editor===
+
We therefore need some infrastructure to load tokens based on the available interfaces on an object as well.  Note that this will produce an ambiguity we will need to resolve.  For example, if there is a REACH token that is appropriate for both CDOMObject.class and SomeInterface.class, then we need a bright-line rule as to which token will apply (or if sharing a name between hard-class based tokens and interface tokens produces an error).
  
The data persistence system should be usable for both a data file editor and the runtime character generation program.
+
==Token processing order==
  
<b>Underlying Requirement(s):</b> Code Reuse (general design characteristic), [[Rules_Persistence_System#Catch Errors Early|Catch Errors Early]]
+
In general, all tokens are processed in the order they are encountered. 
  
<b>Basis:</b> A significant investment made in ensuring that persistent data is read without errors should be reused across both a data file editor and the runtime system.  Consolidation reduces the risk of error and ensures that the editor will always be up to date (a problem in PCGen 5.14).  In addition, editing capabilities (e.g. edit data in place) that are not available in PCGen 5.14 can be added once a full-capability editor is available.
+
One exception is CATEGORY: in Ability, which must be on the original line (illegal on COPY/MOD lines), and which is processed by the Loader.
  
<b>Implementation:</b> The Rules Persistence System is responsible for tracking detailed changes made by the Tokens during Data Load (see [[Rules_Persistence_System#Only valid Tokens may impact the Rules Data Store|Only valid Tokens may impact the Rules Data Store]]).  As a result, this information allows the load system to serve as a runtime load system and a file editor load system.
+
Being processed in the order they are encountered does not mean that they are applied to the PC in the order in which they appear on the given line.  That order of operations is defined within the core.
  
As noted in [[Rules_Persistence_System#Rules Persistence System I/O|Rules Persistence System I/O]], tags may overwrite previous values or add to the set of values for that tag.  In the case of an editor, it is critically important not to lose information that would later be overwritten in a runtime environment.  A simple example would be the use of a .MOD to alter the number of HANDS on a Race.  This alteration should be maintained in the file that contained the .MOD and the value (or unspecified default) in the original Race should not be lost.  This is done by tracking the exact changes that occur during data load.  This is fully explained in the [[Load Commit Subsystem]].
+
==Subtokens==
  
===Token Compatibility===
+
Some tags have complex behavior that significantly differs based on the first argument in the value of the tag. In order to simplify tag parsing and Token code, these Tokens implement a Sub-token structure, which delegates parsing of the tag value to a Token specialized to the first argument in the value of the tag.
  
***PLACEHOLDER: Describe Compatibility system and impact on TokenLibrary***
+
This design is primarily intended to separate out code for different subtokens.  This provides increased ability to add new subtokens without altering existing code.  This provides increased flexibility for developers, and ensures that unexpected side effects from code changes don't impact other features of PCGen.
  
===Identify Appropriate Token===
+
Note that it is legal for a subtoken to only be valid in a single object type (such as a Race), even if the "primary" token is accepted universally.  This greatly simplifies the restriction of subtokens to individual file types without producing burden on the primary token to establish legal values.  Resolution of those restrictions is handled entirely within the LoadContext and its supporting classes.
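The delegation described above can be sketched as a lookup keyed on the first argument of the value. This is a Python illustration under stated assumptions: the <code>EXAMPLE</code> token, its <code>FIRST</code> and <code>SECOND</code> subtokens, the <code>"Any"</code> marker for universally accepted subtokens, and the use of the vertical pipe to split off the first argument are all hypothetical. In PCGen the resolution happens inside the LoadContext and its supporting classes.

```python
# (object type, token name, first argument) -> subtoken parser; all entries hypothetical
SUBTOKENS = {
    ("Race", "EXAMPLE", "FIRST"): lambda rest: ("first", rest),
    ("Any", "EXAMPLE", "SECOND"): lambda rest: ("second", rest),
}


def parse_with_subtoken(object_type, token_name, value):
    """Delegate to the subtoken named by the first argument of the value."""
    first, _, rest = value.partition("|")
    for owner in (object_type, "Any"):
        subtoken = SUBTOKENS.get((owner, token_name, first))
        if subtoken is not None:
            return subtoken(rest)
    return None  # no subtoken is legal here, so the value is invalid
```

Here <code>FIRST</code> is only legal on a Race, so the same primary token rejects it on any other object type without carrying that rule itself.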
  
***PLACEHOLDER: Need a separate document(?) to describe how the appropriate token is selected, and how that works with compatibility tokens
+
==Re-entrant tokens==
  
==Characteristics/Weaknesses of the existing system==
+
There are a few tokens that allow you to drill into a separate object and then apply another token.  In Equipment for example:
 +
<pre>
 +
PART:1|...
 +
</pre>
  
===Prerequisite Tags===
+
In this case the ... above is another token.  This means that the token will have a second ':' used as a separator.  In general (but not universally the case), an embedded ':' as a separator indicates a re-entrant token.
  
Currently the Prerequisite tags are an exception to the parsing system.  The Prerequisite tags have a prefix of "PRE" and are followed by the Prerequisite name, e.g. PREFEAT.  This means that the Prerequisite tags do not follow the traditional method of having a unique name before the colon.  Also, Prerequisite tags can have a leading ! to negate the Prerequisite.
+
==Prerequisite Tags==
  
In order to address this situation of a different token definition system, the <i>PreCompatibilityToken</i> provides a wrapper into the new PCGen 5.16 token syntax.  
+
Currently the Prerequisite tags are an exception to the parsing system. The Prerequisite tags have a prefix of "PRE" and are followed by the Prerequisite name, e.g. PREFEAT. This means that the Prerequisite tags do not follow the traditional method of having a unique name before the colon. Also, Prerequisite tags can have a leading ! to negate the Prerequisite.
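The naming exception described above can be illustrated with a small recognizer. This is a Python sketch only: the function name and the returned tuple are assumptions, and the sample value <code>1,Dodge</code> in the usage note is illustrative rather than a statement of the PREFEAT syntax.

```python
def parse_prerequisite_tag(tag):
    """Split a tag such as '!PREFEAT:...' into (kind, negated, value).

    Returns None when the tag does not use the PRE prefix.
    """
    name, colon, value = tag.partition(":")
    negated = name.startswith("!")      # a leading ! negates the Prerequisite
    if negated:
        name = name[1:]
    if not colon or not name.startswith("PRE") or len(name) == 3:
        return None
    return name[3:], negated, value
```

For example, <code>parse_prerequisite_tag("!PREFEAT:1,Dodge")</code> yields the kind <code>FEAT</code> with the negation flag set, which shows why these tags cannot be looked up by a single unique name before the colon.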
  
[https://sourceforge.net/tracker/index.php?func=detail&aid=1782186&group_id=25576&atid=384722 FREQ 1782186]  exists to convert the Prerequisite tags into two separate buckets, PRE: and REQ: (Prerequisites and Requirements) based on their current behavior.
+
In order to address this situation of a different token definition system, the PreCompatibilityToken provides a wrapper into the new PCGen 5.16+ token syntax.
  
===Class Wrapped Token===
+
==Class Wrapped Token==
  
A <i>ClassWrappedToken</i> provides compatibility for previously allowed bad behavior in data files.
+
A ClassWrappedToken provides compatibility for previously allowed bad behavior in data files.
  
Many Class tokens in PCGen versions up to 5.14 ignored the class level, so they are technically Class tags and not CLASSLEVEL tags. Yet, PCGen 5.14 allows those tags to appear on class level lines. This is a bit deceptive to users in that the effect will always be on the class, and not appear on the specified level.
+
Many Class tokens in PCGen versions up to 5.14 ignored the class level, so they are technically Class tags and not CLASSLEVEL tags. Yet, PCGen 5.14 allows those tags to appear on class level lines. This is a bit deceptive to users in that the effect will always be on the class, and not appear on the specified level.
  
Unfortunately, one cannot simply remove support for using CLASS tokens on CLASSLEVEL lines, because if they are used at level 1, then they are equivalent to appearing on a CLASS line.   Certainly, the data monkeys use it that way. For example, Blackguard in RSRD advanced uses EXCHANGELEVEL on the first level line.
+
Unfortunately, one cannot simply remove support for using CLASS tokens on CLASSLEVEL lines, because if they are used at level 1, then they are equivalent to appearing on a CLASS line. Certainly, the data monkeys use it that way. For example, Blackguard in RSRD advanced uses EXCHANGELEVEL on the first level line.
  
 
Therefore the entire ClassWrappedToken system is a workaround for data monkeys using CLASS tokens on CLASSLEVEL lines, and therefore it should only work on level one, otherwise expectations for when the token will take effect are not set.
 
Therefore the entire ClassWrappedToken system is a workaround for data monkeys using CLASS tokens on CLASSLEVEL lines, and therefore it should only work on level one, otherwise expectations for when the token will take effect are not set.
 +
 +
===Future Work===
 +
 +
This should eventually be removed, so that it is clear from reading the data where a token is legal and where it is not.
 +
 +
==Format of the value of a token==
 +
 +
In most cases, we use a vertical pipe to separate different components of a VALUE.  Each Token can process the exact contents and load the appropriate information into the Rules Data Store. 
 +
 +
The format of each token is within the documentation of PCGen.
 +
 +
==Unparsing==
 +
 +
Adding output to the persistence system provides the ability to reuse the Rules Persistence System in a data file editor, as well as the runtime system. This sharing of code helps to guarantee the integrity of the data file editor. Such a structure also facilitates unit testing, as the Rules Persistence System can be tested independently of the core code.
 +
 +
All tokens loaded into the TokenLibrary (but not those in the TokenStore) have the ability to both "parse" and "unparse" information for the Rules Persistence System. Parsing is the act of reading a token value from a data persistence file and placing it into the internal rules data structure. Unparsing is the act of reading the internal data structure and writing out the appropriate syntax into a data persistence file.
 +
 +
In addition to other benefits, this parse/unparse structure allows Tokens to be tested without major dependence on other components of PCGen. These tests are found in the plugin.lsttokens package of the code/src/utest source directory.
 +
 +
===Shared Persistence System with (future) Editor===
 +
 +
The data persistence system should be usable for both a data file editor and the runtime character generation program.
 +
 +
A significant investment made in ensuring that persistent data is read without errors should be reused across both a data file editor and the runtime system. Consolidation reduces the risk of error and ensures that the editor will always be up to date (a problem that caused its removal). In addition, editing capabilities (e.g. edit data in place) that are not available today can be added once a full-capability editor is available.
 +
 +
Tokens may overwrite previous values or add to the set of values for that tag. In the case of an editor, it is critically important not to lose information that would later be overwritten in a runtime environment. A simple example would be the use of a .MOD to alter the number of HANDS on a Race. This alteration should be maintained in the file that contained the .MOD and the value (or unspecified default) in the original Race should not be lost. This is done by tracking the exact changes that occur during data load. This ability to handle changes is fully explained in the Load Commit Subsystem.
 +
 +
===Unparsing in practice===
 +
 +
The File Loaders separate out the tags in an input file and call the parse method on the appropriate Tokens. In order to unparse a loaded object back to the data persistence syntax, all Tokens that could be used in the given object type must be called (this makes unparse a bit more CPU intensive than parse).
 +
 +
Unparsing a particular object requires delegation of the unparse to all tokens and subtokens to see if they were used.  Because all tokens are called when unparsing an object, it is important that tokens properly represent when they are not used. This is done by returning null from the unparse method of the Token.
 +
 +
Some tokens can be used more than once in a given object (e.g. BONUS), and thus must be capable of indicating each of the values for the multiple tag instances. Since Tokens do not maintain state, the unparse method must only be called a single time to get all of the values; thus, the unparse method returns an array of String objects to indicate the list of values for each instance of the tag being unparsed.
 +
 +
Tokens should not include the name of the tag in the unparsed result. Just as the token is not responsible for removing/ignoring the name of the tag in the value passed into the parse method, it does not prepend the name of the tag to the value(s) returned from the unparse method. (This also happens to simplify the conversion and compatibility systems.)
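The unparse rules above (every Token is called, an unused Token returns null, a multi-use Token returns one value per instance, and the tag name is added by the caller) can be sketched as follows. This is a Python illustration, not PCGen's API: <code>None</code> stands in for null, a list stands in for the String array, and the two token functions and the dict-based object are hypothetical.

```python
def unparse_object(obj, tokens):
    """Call every Token that could apply and rebuild NAME:VALUE entries.

    `tokens` maps a tag name to an unparse function that returns None when
    the tag was not used, or a list of values (one per tag instance).
    """
    entries = []
    for name, unparse in tokens.items():
        values = unparse(obj)
        if values is None:
            continue  # token was not used on this object
        # The caller, not the Token, prepends the tag name.
        entries.extend(f"{name}:{value}" for value in values)
    return entries


# Hypothetical tokens over a plain dict standing in for a loaded object.
TOKENS = {
    "HANDS": lambda obj: [str(obj["hands"])] if "hands" in obj else None,
    "BONUS": lambda obj: list(obj["bonuses"]) if obj.get("bonuses") else None,
}
```

Each unparse function is called exactly once per object, so a Token such as BONUS has to return all of its values in that single call.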
 +
 +
=Further Reading=
 +
 +
To understand more about how PCGen handles the reference of an object before it is constructed, see [[CDOM References Concept Document]]
 +
 +
To understand more about how PCGen handles communicating information from the tokens to the Rules Data Store, see [[Load Commit Subsystem]]

Latest revision as of 21:23, 25 February 2018

Background

This document is primarily intended to communicate the design of PCGen Rules Persistence System.

This document provides a detailed overview of the architecture of a specific portion of PCGen. The overall architecture and further details of other subsystems and processes are provided in separate documents available on the Architecture page.

Key Terms

Loader
A Loader is a file used to load a specific file type within the persistent form of either the PCGen Game Mode or a specific book ("Campaign").
Token
A Token is a piece of code that parses small parts of a file so the appropriate information can be loaded into the Rules Data Store.
Reference
A Reference is a holding object produced when a reference to an object is encountered in the data, and subsequently loaded with the underlying object when data load is complete.


Overview

This document describes the Rules Persistence System, and provides guidance on how to interact with the interface/API of the Rules Persistence System.

Architectural Design

It is worth pointing out here that our LST language is, from a Computer Science perspective, a Domain Specific Language.

PCGen is (for the most part) strongly typed and highly structured, so the guidance we take should be from languages like Java and C++, not from languages like Perl or JavaScript. Also, we have a very high incentive to get our data loading "correct" - so we should be able to catch errors up front at LST load. So we really *want* the benefits of a "compile" step, not "parse on the fly".

That overall observation gives us the ability to look at a few different aspects of how compilers work. Specifically:

  • Compilers must be able to parse the source files into a format that can be processed internally.
  • Compilers must consider "reference before construction" (we call it building symbol tables during compilation - i.e. "How do you know that variable reference was a declared variable?").

PCGen will have to address both of those items to successfully process files.


Parsing source files

Architectural Discussion

Most compilers do multiple passes at the structure of information in their source files. There may be pre-processors, etc. This is (often) facilitated by a parsing system that produces an object tree (via lex/yacc or JavaCC or equivalent), and it is processed with multiple different visitors, each of which can then depend on information gleaned by the previous one. (This is actually how the new formula system parses formulas - It's a specific JavaCC syntax)

It is very difficult for PCGen to do a similar form of analysis. Our files are not conducive to being parsed by a tree-building system, due to the inconsistent nature of many of the LST tokens. An early version of such a parser from 2008 or so - never placed into a public repository - struggled with the exceptions and lack of consistent "reserved characters" and "separator characters" that are usually major highlights of a structured programming language.

However, we really *want* the benefits of a "compile" step, even though we can't build a tree. Therefore, we have currently designed the system to do a more linear parse of the files, while (for the most part) doing a strong validation of input.

Determining file format

We start with the concept of a File Loader. Knowing the file format is critical to understand which of the few dozen loaders should be used. We therefore have a set of rules so we "know" what file loader to apply to a given file.

For the first pass of a load, which is loading the game mode files from disk, we know the precise format of the file, because the file names are highly rigid. "miscinfo.lst" is a strictly required file name in a game mode. (There is one and only one of that file and it must have that name and not be in a sub-directory of the game mode for the game mode to be valid). Therefore, the code can hard-code this into a sequence of lookup processes in the game mode directly that ties a specific loader to a specific file name.

In the second pass of a load, we are looking for PCC files. While this is no longer strict on the exact file name, we ARE strict on the file suffix (must be PCC). This again allows us to infer the nature of the file we are processing, allowing us to build a strict association between the file name and the file format.

In the third pass of a load, we are now data driven. We are loading contents as defined by a PCC file. Here, there is no longer a file name format. Rather, the contents of the PCC file had a specific key:value syntax that defined the format of the file. The PCC file might have contained "TEMPLATE:rsrd_templates.lst" for example, which indicates that the file "rsrd_templates.lst" is to be processed as a "TEMPLATE" file. There is no strict requirement that these items end in ".lst" although that is certainly a convention and well enough understood that exceptions would probably be a bit mind-bending to everyone.
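The three rules above for deciding a file's format can be condensed into a sketch like the following. It is a Python illustration under stated assumptions: the function name, the <code>"MISCINFO"</code> and <code>"PCC"</code> labels, the case-insensitive suffix check, and the <code>pcc_entries</code> mapping are all hypothetical. The sketch also ignores the rule that game mode files may not sit in a sub-directory.

```python
def file_format(path, pcc_entries=None):
    """Return the format used to pick a loader for `path`.

    `pcc_entries` maps a file name to the key under which a PCC file listed it.
    """
    name = path.rsplit("/", 1)[-1]
    if name == "miscinfo.lst":          # pass 1: rigid game mode file names
        return "MISCINFO"
    if name.lower().endswith(".pcc"):   # pass 2: strict suffix (case handling assumed)
        return "PCC"
    if pcc_entries and name in pcc_entries:
        return pcc_entries[name]        # pass 3: data driven, declared in a PCC file
    return None
```

With the example from the text, a PCC entry of TEMPLATE:rsrd_templates.lst would be recorded as <code>{"rsrd_templates.lst": "TEMPLATE"}</code>, and the file would then be handed to the TEMPLATE loader regardless of its name.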

Parsing an LST/PCC file

Each LST file type has an associated *Loader class within the pcgen.persistence.lst or pcgen.rules.persistence package. Spells, for example, are loaded using the SpellLoader class. In general, the pcgen.persistence.lst items are older and pcgen.rules.persistence.* is the newer system for doing load from LST files. Within a file loader, we parse the file line-by-line. In most files, lines are independent (each line represents a separate object).

There are three major file formats we are dealing with:

Command-based
The first set are individual commands that will occur on a single object. This occurs, for example, in the "miscinfo.lst" file. Each line is processed and loaded into the GameMode object. Most of the Game Mode files are of this form, as are the PCC files. The GLOBALMODIFIER file in the data directory also operates this way. Since this can be seen as a slightly degenerate form of an object-based load (see below), this is not discussed in any detail in this document.
Object-Based
This set of files creates one object for each line (or the line represents a modification of an existing object). The majority of our LST files in the data directory are processed this way, as are the stats and checks files in a Game Mode. This is discussed in more detail below.
Batch-based
The CLASS and KIT files are a major exception to the object-based description above, since they are blocks of information with a new "CLASS:x" or "STARTKIT" line representing the split to a new item. Investigation of the loading of those files is currently left as an exercise for the reader. ***This should actually be included as it is relevant to future direction


Object-based file loading

For the majority of our files, the first entry on a line represents the ownership and behavior for that line. This can take a few formats, but in general takes one of these two forms:

PREFIX:DisplayName
PREFIX:Key.MODIFICATION

The PREFIX may be empty/missing depending on the file type. The PREFIX may be something like ALIGNMENT: to indicate an alignment. This is done in files that can define more than one format. (e.g. Stats and checks used to be shared when they were stored in the game mode)

The DisplayName is the starting name of the object.

For a modification (or any reference to the object), the KEY MUST be used. If no KEY: token is provided, then the DisplayName serves as the KEY.

The MODIFICATION is COPY=x, MOD or FORGET.

.COPY
Allows a data file to copy an existing object. This .COPY entry need not worry about file load order (see below). The value preceding the .COPY string identifies the object to be copied. This identifier is the KEY (or KEY and CATEGORY) of the object to be copied. The identifier for the copied object is placed after an equals sign that follows the .COPY String, e.g.: Dodge.COPY=MyDodge
.MOD
Allows a data file to modify an existing object. This .MOD entry need not worry about file load order (see below). All .MOD entries will be processed after all .COPY entries, regardless of the source file. The value preceding the .MOD string identifies the object to be modified. This identifier is the KEY (or KEY and CATEGORY) of the object to be modified. If more than one .COPY token produces an object with the same identifier, then a duplicate object error will be generated.
.FORGET
Allows a data file to remove an existing object from the Rules Data Store. This .FORGET entry need not worry about file load order (see below). All .FORGET entries will be processed after all .COPY and .MOD entries, regardless of the source file. The value preceding the .FORGET string identifies the object to be removed from the Rules Data Store.
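The forms of the first entry on a line can be recognized with a small parser such as the one below. This is a Python sketch and not the PCGen loader: the function name, the tuple layout, and the <code>"BASE"</code> label for a plain DisplayName line are assumptions. It also assumes keys and display names contain no colon.

```python
def parse_first_entry(entry):
    """Split the first entry of a line into (prefix, key, action, new_key)."""
    prefix, colon, rest = entry.partition(":")
    if not colon:                       # the PREFIX may be missing
        prefix, rest = "", entry
    for action in ("MOD", "FORGET"):
        suffix = "." + action
        if rest.endswith(suffix):
            return prefix, rest[: -len(suffix)], action, None
    key, marker, new_key = rest.rpartition(".COPY=")
    if marker:
        return prefix, key, "COPY", new_key
    return prefix, rest, "BASE", None   # plain DisplayName line
```

Applied to the example from the text, <code>Dodge.COPY=MyDodge</code> yields the key <code>Dodge</code>, the action <code>COPY</code>, and the new identifier <code>MyDodge</code>.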

Data Persistence File Load Order Independence

This provides specific clarity on the Order of Operations during file loading.

When files are loaded, they are processed in order as the lines appear in the file, unless the line is a MODIFICATION. If it is a modification, it is processed after normal loading is complete. Note this means ALL FILES of a given format (e.g. TEMPLATE) are loaded with their DisplayName lines processed before ANY .COPY is processed. All .COPY items are processed before any .MOD items are processed. All .MOD items are processed before any .FORGET items are processed. (Note that strictly this is Base/Copy/Mod/Forget by object type, it doesn't strictly inhibit parallelism between file types during file load). This order of operations is necessary so that a second file can perform a .COPY or .MOD on the contents of another file. It is also important to recognize that .COPY occurs before .MOD, which gives strict consideration to what items may want to appear on the original line vs in a .MOD line as they are not always equivalent.
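The Base/Copy/Mod/Forget order of operations can be expressed as a stable sort over the lines of one object type. This is a Python sketch: the phase labels and the (action, description) tuples are assumptions used for illustration, and the real loaders do not necessarily sort a single list.

```python
PHASES = {"BASE": 0, "COPY": 1, "MOD": 2, "FORGET": 3}


def processing_order(lines):
    """Order (action, description) pairs as Base, Copy, Mod, Forget.

    sorted() is stable, so lines within one phase keep their file order.
    """
    return sorted(lines, key=lambda line: PHASES[line[0]])
```

Because the sort is applied across all files of a format, a .MOD in one file can safely target an object defined or copied in another file that happens to be read later.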

Source Information

There is one additional exception to the file processing as described above. If a line starts with a SOURCE*: token, then that line is processed as "persistent information" for that file. All items on that line will be applied to ALL items in the file. This should be limited to just source information that needs to be universally applied to included objects.


Tokens

Subsequent entries on a line represent tags/tokens on that object to give it information and behavior within PCGen.

In general, the format of a token is:

NAME:VALUE

The list of available tokens is specific to a given data persistence file type. This allows features to be limited to certain objects to avoid nonsensical situations (e.g. you can't assign material components to a Race). A collection of Global tags that can be used in nearly all data persistence files is also available.
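Splitting a line into its first entry and its NAME:VALUE tokens might look like the sketch below. This is a Python illustration that assumes entries on a line are separated by tab characters, which this document does not state. The function name is hypothetical.

```python
def split_line(line):
    """Return (first_entry, [(name, value), ...]) for one tab-separated line."""
    first_entry, *entries = line.rstrip("\n").split("\t")
    tokens = []
    for entry in entries:
        if not entry:
            continue                    # tolerate repeated tabs
        # Split at the first colon only; the value may contain more colons.
        name, _, value = entry.partition(":")
        tokens.append((name, value))
    return first_entry, tokens
```

Splitting only at the first colon leaves the rest of the value intact for the Token itself to interpret.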

The exact processing occurs within the plugins that are loaded to process each token. Each Token Class is stored in a separate file/class, independent of the core of PCGen, to allow each token to be independently updated, removed, or otherwise manipulated without altering or impacting other Tokens.

This also forces the Token Classes to be fairly simple, which makes them easy to test, modify, and understand (as they are effectively atomic to the processing of a specific token). One goal of the PCGen Rules Persistence System is to ensure that all of the parsing of LST files is done within the Tokens and not in the core of PCGen. This makes adding new tags to the LST files reasonably painless (though changes to the core or export system may also be required to add required functionality).

Individual Token files are in the pcgen.plugin.lsttokens package. Many may rely on abstract classes provided in pcgen.rules.persistence.token. When PCGen is launched, JARs that are within the Plugin directory are parsed for their contents. This actually happens in the gmgen.pluginmgr.JARClassLoader Class. As one of many operations that takes place during the import, each Class is analyzed to determine if it is a persistence Token (a persistence Token is defined as a non-abstract Class that implements the LstToken interface). When a persistence Token is found, it is imported into the TokenLibrary or TokenStore.
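The discovery rule above (a persistence Token is a non-abstract class that implements the LstToken interface) can be mimicked in Python with abstract base classes. This sketch is an analogy only: PCGen scans JAR files and uses Java reflection, while the <code>get_token_name</code> method, the <code>AbstractHelper</code> class, and the dictionary result here are assumptions for illustration.

```python
import inspect
from abc import ABC, abstractmethod


class LstToken(ABC):
    """Stand-in for the LstToken interface named in the text."""

    @abstractmethod
    def get_token_name(self):
        ...


class AbstractHelper(LstToken):
    """Still abstract (does not implement get_token_name), so it is skipped."""


class TemplateToken(LstToken):
    def get_token_name(self):
        return "TEMPLATE"


def find_persistence_tokens(classes):
    """Instantiate every non-abstract class that implements LstToken."""
    found = {}
    for cls in classes:
        if inspect.isclass(cls) and issubclass(cls, LstToken) and not inspect.isabstract(cls):
            token = cls()               # analogous to Class.newInstance()
            found[token.get_token_name()] = token
    return found
```

Nothing in the sketch names <code>TemplateToken</code> when constructing it, which is the same reason an IDE search cannot find where a token is constructed.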

Discussion

As with any architecture, there are tradeoffs in having a plugin system. The first of these is in code association within the PCGen system. Due to the plugin nature (and the use of reflection) there are certain use-associations which cannot be made within an Integrated Development Environment (IDE) such as Eclipse. For example, it is impossible to find where a TemplateToken is constructed by automated search, as it is constructed by a Class.newInstance() call.

One quirk with the plugin system is also that it occasionally requires full rebuilds of the code in order to ensure the core code and the plugins are "in sync" on their functionality. This is reasonably rare, but is a result of the lack of a hard dependency tree in the code (really, the same problem IDEs have in determining usage)

There are also some great advantages to a plugin system.

By using reflection to perform the import of the classes and using reflection to inspect those classes, some associations can be made automatically, and do not require translation tables. By having all of the information directly within the Token Classes, a 'contract' to update multiple locations in the code (or parameter files) is avoided. There is also a minimal amount of indirection (the indirection introduced by TokenStore's Token map is very easy to understand).

The addition of a Token Class to the Plugin JAR will allow the new Token to be parsed. This makes adding new tags to the LST files reasonably painless (actually having it perform functions in the PCGen core is another matter :) )

Also, keeping each Token in an individual class keeps the Token Classes very simple, which makes them easy to test, modify, and understand (as they are effectively atomic to the processing of a specific token).

In the future, we may also be able to defer some loading of plugins until after the game mode has loaded, allowing us to only activate and load those tokens relevant for a specific game mode. Specifically, it would be nice to not have to process any ALIGNMENT based tokens in MSRD, for example (and to have them all automatically be errors as well). This need may be mitigated by the more data driven design we are working to develop.

Future Work

It would be nice if there were a method of forcing the isolation without having a slew of JAR files. Sunsetting the need to update pluginbuild.xml when a new token is created would be nice as well. So there is probably an architectural choice here that involves the tradeoff between separate tokens, token discovery, the contract to update pluginbuild.xml, and modularity.

Identifying the Token

In determining which token is used, two items are relevant: first, the name of the token; second, the Class of object processed by the token. If two tokens are found during plugin load that share the same name and class processed, an error is thrown during PCGen startup.

How are token conflicts resolved? If two tokens have the same key (String before the : in the LST file), AND implement the same persistence Token Interface (e.g. PCClassLSTToken), then an error will be reported by the TokenStore class when the plugin JAR files are loaded.

The TokenStore is the older method of storing the tokens. In these cases, the tokens must be an exact match to both the name (case insensitive - but by convention they are capitalized in the LST files) and the class of object being processed. The TokenStore effectively has a Map<Class, Map<String, Token>>.
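The two-level map and the duplicate check can be illustrated with a minimal sketch. The names `SketchToken` and `TokenStoreSketch` are hypothetical and are not the PCGen classes; the sketch only assumes what is stated above (exact class match, case-insensitive name match, error on a duplicate).

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for a persistence token; not the PCGen interface.
interface SketchToken {
    String getTokenName();
}

// Illustrative registry keyed by processed class, then by upper-cased token name.
class TokenStoreSketch {
    private final Map<Class<?>, Map<String, SketchToken>> tokens = new HashMap<>();

    // Registers a token; a second token with the same name and class is an error.
    void register(Class<?> processed, SketchToken token) {
        String key = token.getTokenName().toUpperCase(Locale.ROOT);
        Map<String, SketchToken> byName =
            tokens.computeIfAbsent(processed, c -> new HashMap<>());
        if (byName.containsKey(key)) {
            throw new IllegalStateException(
                "Duplicate token " + key + " for " + processed.getSimpleName());
        }
        byName.put(key, token);
    }

    // Exact match on class, case-insensitive match on name; null if absent.
    SketchToken lookup(Class<?> processed, String name) {
        Map<String, SketchToken> byName = tokens.get(processed);
        return byName == null ? null : byName.get(name.toUpperCase(Locale.ROOT));
    }
}
```

Normalizing the name once at registration and once at lookup keeps the case-insensitive rule in a single place.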

For the TokenLibrary more flexibility is allowed. If an object like a Language is being processed, then first, the system will look for tokens that match Language.class exactly. If that fails, then the system will use reflection on Language.class to determine the parent class and see if a token of the appropriate NAME exists at that level. This is repeated until a relevant token plugin is found or the token is determined to be invalid.
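As a rough illustration of that superclass walk, the following sketch uses hypothetical names (`HierarchyLookupSketch`, `BaseObject`, `LanguageLike`) and plain strings in place of token objects. It is a simplification under those assumptions, not the actual TokenLibrary code.

```java
import java.util.HashMap;
import java.util.Map;

class HierarchyLookupSketch {
    // Hypothetical model classes standing in for the real object hierarchy.
    static class BaseObject { }
    static class LanguageLike extends BaseObject { }

    // Token identifiers keyed by processed class, then by token name.
    private final Map<Class<?>, Map<String, String>> tokens = new HashMap<>();

    void register(Class<?> cl, String name, String tokenId) {
        tokens.computeIfAbsent(cl, c -> new HashMap<>()).put(name, tokenId);
    }

    // Tries the exact class first, then each superclass in turn.
    // A null result means no token of that name applies (invalid token).
    String find(Class<?> cl, String name) {
        for (Class<?> c = cl; c != null; c = c.getSuperclass()) {
            Map<String, String> byName = tokens.get(c);
            if (byName != null && byName.containsKey(name)) {
                return byName.get(name);
            }
        }
        return null;
    }
}
```

Because the most specific class is checked first, a token registered for a subclass shadows one of the same name registered for a parent class.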

This lookup starts within the TokenLibrary. Within the TokenLibrary there are multiple TokenFamily objects. Each version of PCGen can have its own TokenFamily. This allows tokens that support backwards compatibility to be contained separately from the primary tokens.

In some cases there are both Global tags and "local" tags that have the same key (e.g. "TEMPLATE"). As described above, the "local" key (one that is specific to a certain type of LST file) takes priority over the Global Token. This is the case with TEMPLATE, as the Global tag processing takes place in a call to PObjectLoader.parseTagLevel(), far below the PCClass-specific processing that takes place early in PCClassLoader.parseClassLine().

Future need: Interface Tokens

The current system does suffer from a number of issues. Some of our current "global" tokens really aren't global. They may be global in as much as an item is "granted" to a PC, but would fail on other object types. For other situations, we have begun to move away from the heavyweight and complicated CDOMObject/PObject into a more lightweight object, but we want to share behavior (and load tokens) there as well.

The existing TokenLibrary system has a few weaknesses with that new desire. As we rely more on interfaces than direct inheritance, TokenLibrary will begin to fail.

We therefore need some infrastructure to load tokens based on the available interfaces on an object as well. Note that this will produce an ambiguity we will need to resolve. For example, if there is a REACH token that is appropriate for both CDOMObject.class and SomeInterface.class, then we need a bright-line rule as to which token will apply (or if sharing a name between hard-class based tokens and interface tokens produces an error).

Token processing order

In general, all tokens are processed in the order they are encountered.

One exception is CATEGORY: in Ability, which must be on the original line (illegal on COPY/MOD lines), and which is processed by the Loader.

Being processed in the order they are encountered does not mean that they are applied to the PC in the order in which they appear on the given line. That order of operations is defined within the core.

Subtokens

Some tags have complex behavior that significantly differs based on the first argument in the value of the tag. In order to simplify tag parsing and Token code, these Tokens implement a Sub-token structure, which delegates parsing of the tag value to a Token specialized to the first argument in the value of the tag.

This design is primarily intended to separate out code for different subtokens. This provides increased ability to add new subtokens without altering existing code. This provides increased flexibility for developers, and ensures that unexpected side effects from code changes don't impact other features of PCGen.

Note that it is legal for a subtoken to only be valid in a single object type (such as a Race), even if the "primary" token is accepted universally. This greatly simplifies the restriction of subtokens to individual file types without producing burden on the primary token to establish legal values. Resolution of those restrictions is handled entirely within the LoadContext and its supporting classes.
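The delegation on the first argument can be sketched as follows. `SubtokenDispatchSketch` is a hypothetical name, handlers are modeled as plain functions, and the pipe separator between the first argument and the rest of the value is an assumption made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

class SubtokenDispatchSketch {
    // Subtoken handlers keyed by the first argument of the tag value.
    private final Map<String, Function<String, String>> subtokens = new HashMap<>();

    // New subtokens are added without touching the dispatch code below.
    void addSubtoken(String key, Function<String, String> handler) {
        subtokens.put(key, handler);
    }

    // Splits "FIRST|rest" at the first pipe and hands "rest" to the handler for FIRST.
    String parse(String value) {
        int pipe = value.indexOf('|');
        if (pipe < 0) {
            throw new IllegalArgumentException("No subtoken argument in: " + value);
        }
        String key = value.substring(0, pipe);
        Function<String, String> handler = subtokens.get(key);
        if (handler == null) {
            throw new IllegalArgumentException("Unknown subtoken: " + key);
        }
        return handler.apply(value.substring(pipe + 1));
    }
}
```

The primary token never inspects the remainder of the value, so each subtoken can change its own syntax independently.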

Re-entrant tokens

There are a few tokens that allow you to drill into a separate object and then apply another token. In Equipment for example:

PART:1|...

In this case the ... above is another token. This means that the token will have a second ':' used as a separator. In general (but not universally the case), an embedded ':' as a separator indicates a re-entrant token.
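A minimal sketch of splitting such a value is shown below. It assumes the value has the shape `index|KEY:value`, and `ReentrantTokenSketch` and the inner token name `INNER` used in the usage note are hypothetical. The real token would pass the inner key and value back into the normal token lookup instead of just returning them.

```java
class ReentrantTokenSketch {
    // Result of splitting a re-entrant value: target part, inner key, inner value.
    static final class Inner {
        final int part;
        final String key;
        final String value;

        Inner(int part, String key, String value) {
            this.part = part;
            this.key = key;
            this.value = value;
        }
    }

    // Splits "1|KEY:value" into the part number and the embedded token.
    static Inner split(String value) {
        int pipe = value.indexOf('|');
        if (pipe < 0) {
            throw new IllegalArgumentException("Missing '|' in: " + value);
        }
        int part = Integer.parseInt(value.substring(0, pipe));
        String rest = value.substring(pipe + 1);
        int colon = rest.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("Missing embedded token in: " + value);
        }
        // Only the first ':' separates key from value; later ones belong to the value.
        return new Inner(part, rest.substring(0, colon), rest.substring(colon + 1));
    }
}
```

For example, `ReentrantTokenSketch.split("1|INNER:a|b")` yields part `1`, key `INNER`, and value `a|b`.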

Prerequisite Tags

Currently the Prerequisite tags are an exception to the parsing system. The Prerequisite tags have a prefix of "PRE" and are followed by the Prerequisite name, e.g. PREFEAT. This means that the Prerequisite tags do not follow the traditional method of having a unique name before the colon. Also, Prerequisite tags can have a leading ! to negate the Prerequisite.

In order to address this situation of a different token definition system, the PreCompatibilityToken provides a wrapper into the new PCGen 5.16+ token syntax.
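The naming rules for Prerequisite tags (optional leading `!`, then the `PRE` prefix, then the Prerequisite name) can be sketched as below. `PreTagSketch` is a hypothetical class that only separates the pieces of the tag; it leaves the value uninterpreted and is not how the PCGen wrapper is implemented.

```java
class PreTagSketch {
    final boolean negated; // true when the tag had a leading '!'
    final String name;     // prerequisite name, e.g. "FEAT" for PREFEAT
    final String value;    // text after the colon, left uninterpreted

    PreTagSketch(boolean negated, String name, String value) {
        this.negated = negated;
        this.name = name;
        this.value = value;
    }

    // Splits a tag such as "!PREFEAT:..." into negation flag, name, and value.
    static PreTagSketch parse(String tag) {
        int colon = tag.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("Missing ':' in: " + tag);
        }
        String key = tag.substring(0, colon);
        boolean negated = key.startsWith("!");
        if (negated) {
            key = key.substring(1);
        }
        if (!key.startsWith("PRE") || key.length() == 3) {
            throw new IllegalArgumentException("Not a Prerequisite tag: " + tag);
        }
        return new PreTagSketch(negated, key.substring(3), tag.substring(colon + 1));
    }
}
```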

Class Wrapped Token

A ClassWrappedToken provides compatibility for previously allowed bad behavior in data files.

Many Class tokens in PCGen versions up to 5.14 ignored the class level, so they are technically Class tags and not CLASSLEVEL tags. Yet, PCGen 5.14 allows those tags to appear on class level lines. This is a bit deceptive to users in that the effect will always be on the class, and not appear on the specified level.

Unfortunately, one cannot simply remove support for using CLASS tokens on CLASSLEVEL lines, because if they are used at level 1, then they are equivalent to appearing on a CLASS line. Certainly, the data monkeys use it that way. For example, Blackguard in RSRD advanced uses EXCHANGELEVEL on the first level line.

Therefore the entire ClassWrappedToken system is a workaround for data monkeys using CLASS tokens on CLASSLEVEL lines. It should only work on level one, because at any other level the data would suggest that the token takes effect at that level, when in fact the effect is always on the class.

Future Work

This should eventually be removed, so that it is unambiguously clear from reading the data where a token is legal and where it is not.

Format of the value of a token

In most cases, we use a vertical pipe to separate different components of a VALUE. Each Token can process the exact contents and load the appropriate information into the Rules Data Store.

The format of each token is described in the PCGen documentation.

Unparsing

Adding output to the persistence system provides the ability to reuse the Rules Persistence System in a data file editor, as well as the runtime system. This sharing of code helps to guarantee the integrity of the data file editor. Such a structure also facilitates unit testing, as the Rules Persistence System can be tested independently of the core code.

All tokens loaded into the TokenLibrary (but not those in the TokenStore) have the ability to both "parse" and "unparse" information for the Rules Persistence System. Parsing is the act of reading a token value from a data persistence file and placing it into the internal rules data structure. Unparsing is the act of reading the internal data structure and writing out the appropriate syntax into a data persistence file.

In addition to other benefits, this parse/unparse structure allows Tokens to be tested without major dependence on other components of PCGen. These tests are found in the plugin.lsttokens package of the code/src/utest source directory.

Shared Persistence System with (future) Editor

The data persistence system should be usable for both a data file editor and the runtime character generation program.

The significant investment made in ensuring that persistent data is read without errors should be reused across both a data file editor and the runtime system. Consolidation reduces the risk of error and ensures that the editor will always be up to date (a problem that caused its removal). In addition, editing capabilities that are not available today (e.g. edit data in place) can be added once a full-capability editor is available.

Tokens may overwrite previous values or add to the set of values for that tag. In the case of an editor, it is critically important not to lose information that would later be overwritten in a runtime environment. A simple example would be the use of a .MOD to alter the number of HANDS on a Race. This alteration should be maintained in the file that contained the .MOD, and the value (or unspecified default) in the original Race should not be lost. This is done by tracking the exact changes that occur during data load. This ability to handle changes is fully explained in the Load Commit Subsystem.

Unparsing in practice

The File Loaders separate out the tags in an input file and call the parse method on the appropriate Tokens. In order to unparse a loaded object back to the data persistence syntax, all Tokens that could be used in the given object type must be called (this makes unparse a bit more CPU intensive than parse).

Unparsing a particular object requires delegation of the unparse to all tokens and subtokens to see if they were used. Because all tokens are called when unparsing an object, it is important that tokens properly represent when they are not used. This is done by returning null from the unparse method of the Token.

Some tokens can be used more than once in a given object (e.g. BONUS), and thus must be capable of indicating each of the values for the multiple tag instances. Since Tokens do not maintain state, the unparse method must only be called a single time to get all of the values; thus, the unparse method returns an array of String objects to indicate the list of values for each instance of the tag being unparsed.

A Token should not include the name of the tag in the unparsed result. Just as the token is not responsible for removing/ignoring the name of the tag in the value passed into the parse method, it does not prepend the name of the tag to the value(s) returned from the unparse method. (This also happens to simplify the conversion and compatibility systems.)
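The unparse contract described in this section (null when unused, an array of values for repeatable tags, no tag name in the returned values) can be sketched as follows. `UnparseSketch`, `RoundTripToken`, and the map used as the "loaded object" are hypothetical simplifications, not the PCGen interfaces.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class UnparseSketch {
    // Hypothetical stateless token contract; this is not the PCGen interface.
    interface RoundTripToken {
        String getTokenName();
        void parse(Map<String, List<String>> obj, String value);
        String[] unparse(Map<String, List<String>> obj);
    }

    // Builds a token that accumulates every value it parses under its own name.
    static RoundTripToken listToken(final String name) {
        return new RoundTripToken() {
            public String getTokenName() {
                return name;
            }

            public void parse(Map<String, List<String>> obj, String value) {
                obj.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
            }

            public String[] unparse(Map<String, List<String>> obj) {
                List<String> values = obj.get(name);
                // null signals that this token was not used on the object
                return values == null ? null : values.toArray(new String[0]);
            }
        };
    }

    // Calls every candidate token once; the writer, not the token, prepends the tag name.
    static List<String> unparseAll(Map<String, List<String>> obj, List<RoundTripToken> all) {
        List<String> out = new ArrayList<>();
        for (RoundTripToken token : all) {
            String[] values = token.unparse(obj);
            if (values == null) {
                continue; // token was not used on this object
            }
            for (String v : values) {
                out.add(token.getTokenName() + ":" + v);
            }
        }
        return out;
    }
}
```

Keeping the tag name out of both `parse` and `unparse` means the same token logic can be reused if the tag is later renamed or wrapped by a compatibility layer.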

Further Reading

To understand more about how PCGen handles the reference of an object before it is constructed, see CDOM References Concept Document.

To understand more about how PCGen handles communicating information from the tokens to the Rules Data Store, see Load Commit Subsystem.