Becoming Data Driven

Input

Historical/Original

Originally, PCGen had a series of tokens that imported information into specific locations within PCGen's objects. These objects were highly specialized and in some cases very long (PCTemplate was at one point well over 10,000 lines of code).

In the long term, however, the disadvantages of this architecture came to outweigh its clarity, so the design is deprecated and has largely been removed from PCGen. (The major exception is Equipment, where you can still see this design in full use.)

Defaults

The hardcoding of specific methods made defaults fairly easy to understand: the default was hardcoded in the specific method and used if no other value was set by the data.

Limitations

This hardcoding has some limitations:

  1. The original designs were directly aligned with how things were structured in the SRD (3.0) and RSRD (3.5). The result was that certain features could only be reached from certain objects.
  2. The original designs presumed certain features; for example, it was assumed that "Hands" was a relevant characteristic. In a system like the MSRD, where all creatures are human, such methods are either unused or, worse, have to be turned off so they aren't processing and checking for data that may not be present.
    1. Alignment is a concrete example: specific code has to turn OFF certain features if a given game mode has no Alignment.
    2. This turn-on/turn-off approach is inflexible and not really scalable.
  3. In general, every new characteristic drives not only a new token but also a new set of methods on the underlying objects.
  4. From an output perspective, specific output tokens must call the exact methods, so significant code is required in the output system as well.

Advantages

This hardcoding is nice for certain things:

  1. It makes it clear to a reader what is being processed, since there is a direct method on an object, such as getHands() (see the sketch below).
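
As a hedged illustration of the deprecated style (the class shape and the default value of 2 are assumptions for illustration, not the actual PCTemplate code):

  // Sketch of the deprecated hardcoded style: one dedicated method per
  // characteristic, with the default baked into the method itself.
  public class HardcodedRaceSketch {
      private Integer hands; // set by a dedicated import token, if present

      public void setHands(int hands) {
          this.hands = hands;
      }

      // Clear to read, but every new characteristic needs another
      // method like this on every object that can carry it.
      public int getHands() {
          // Hardcoded default, used when the data never set a value.
          return (hands == null) ? 2 : hands;
      }
  }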

Indirect Storage

As we began a transition toward reducing duplicate code and making PCGen more flexible, many items came to be resolved indirectly. Instead of getHands(), for example, we now call getInteger(IntegerKey.HANDS). These generic storage methods are on CDOMObject.

This has a number of benefits, specifically that adding additional characteristics to an object does not require new methods (there is an ability to store integers, Objects, Lists, etc.). However, a new token is still required, since there is no generic way for data tokens to reach those put* and get* methods.
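
A minimal sketch of that pattern (simplified; the real CDOMObject supports many more value types and behaviors, and the enum stand-in here is an assumption):

  import java.util.EnumMap;
  import java.util.Map;

  // Simplified sketch of key-based generic storage, in the style of
  // CDOMObject: one put/get pair serves every integer characteristic,
  // so adding a characteristic means adding a key, not a new method.
  public class GenericStorageSketch {
      // Stand-in for pcgen.cdom.enumeration.IntegerKey (the real class
      // is not an enum; see the Defaults sketch below).
      public enum IntegerKey { HANDS, LEGS, REACH }

      private final Map<IntegerKey, Integer> integerMap =
              new EnumMap<>(IntegerKey.class);

      public void put(IntegerKey key, int value) {
          integerMap.put(key, value);
      }

      public Integer getInteger(IntegerKey key) {
          return integerMap.get(key);
      }
  }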

The Keys for these indirect storage methods are generally found in pcgen.cdom.enumeration.

Defaults

The lack of specific methods for any given behavior then leaves the question of where defaults should be stored. To provide that behavior, we allow each Key to hold a default value; a resolution can then use the default value if no other value exists.

You can see the defaults in places like IntegerKey, where the constructor takes two arguments (the name and the default value).
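
A hedged sketch of that shape (the names follow the text; the HANDS default of 2 and the other details are illustrative assumptions):

  // Sketch of a key that carries its own default, as described above.
  // The real IntegerKey lives in pcgen.cdom.enumeration.
  public final class IntegerKeySketch {
      public static final IntegerKeySketch HANDS =
              new IntegerKeySketch("HANDS", 2);

      private final String name;
      private final int defaultValue;

      // The two-argument constructor described above: name plus default.
      private IntegerKeySketch(String name, int defaultValue) {
          this.name = name;
          this.defaultValue = defaultValue;
      }

      public String getName() {
          return name;
      }

      // Resolution falls back to this when no explicit value was stored.
      public int getDefault() {
          return defaultValue;
      }
  }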

Design Choice

Note that there is a specific set of methods that handle Lists and more complex data structures on objects like Race, et al. This is a conscious choice, made to fully protect the contents of any list held internally by a CDOMObject from being modified outside that CDOMObject. As mentioned elsewhere, this is a defensive coding style based on past problems in PCGen, hedging against similar issues.
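
A hedged sketch of that defensive style (names and list contents are illustrative):

  import java.util.ArrayList;
  import java.util.Collections;
  import java.util.List;

  // Sketch of the defensive list handling described above: the internal
  // list is never exposed directly, so outside code cannot mutate it.
  public class DefensiveListSketch {
      private final List<String> types = new ArrayList<>();

      public void addType(String type) {
          types.add(type);
      }

      // Callers receive an unmodifiable copy; changes attempted by
      // callers cannot corrupt the object's internal state.
      public List<String> getTypeList() {
          return Collections.unmodifiableList(new ArrayList<>(types));
      }
  }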

Indirect with a Generic Token

The next progression in improving how PCGen can be data driven was enabling a token that allows the Data team not only to define their own content, but also to define their own "Key".

To do this, there are two new tokens, FACT and FACTSET. Much of the rest of this section refers to "FACT", but uses that as a generic term for both FACT and FACTSET. A few features support only FACT; in those situations, this will be clearly stated.

Design choice: Predefinition

Note that it would be possible to simply have a generic method that automatically generated a *Key and injected objects using the same methods as in "Indirect Storage" above. However, this could be highly error-prone: a small typo such as "Hadns" (for "Hands") would result in the data never actually controlling the right value. We therefore made a conscious decision to force pre-definition of the key values. This is done through the FACTDEF and FACTSETDEF tokens in the Data Control file.
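
As a hedged sketch (FACTDEF, FACTSETDEF, and the Data Control file come from the text above; the example fact names and the DATAFORMAT subtoken follow common Data Control conventions but should be treated as assumptions here), a pair of definitions might look like:

  FACTDEF:DEITY|Title        DATAFORMAT:STRING
  FACTSETDEF:DEITY|Pantheon  DATAFORMAT:STRING

With those definitions in place, a Deity in a data file can carry FACT:Title|... or FACTSET:Pantheon|..., while a typo such as FACT:Titel|... fails at load time instead of silently controlling nothing.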

Design Choice: Static

An almost immediate response upon the introduction of FACT and FACTSET was: "Why can't I change them?" The short answer: facts shouldn't change, so the behavior semantically matches the name.

More usefully (and addressing why FACT exists in the first place), there are practical reasons for facts not to be modifiable. The first is that we encounter many items that are not PC-specific and can therefore be cached. To guarantee we can cache that information, we need to know it will never be PC-dependent. This design makes that guarantee.

Second, this design was done in the context of the new variable system already being in the design pipeline, so we clearly didn't want to duplicate the efforts of a variable system that has local variables. If the data team needs to modify something, then it's clearly a local variable, not a FACT.

Data Driven Grouping

When a FACT is provided, and it is something known to be universal (or nearly so), it is possible to use it to group objects. Groups are usable in places like CHOOSE:. To use a FACT in a grouping situation, it must be enabled with the GROUPABLE: token on the FACTDEF.
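
A hedged sketch of enabling that (only the GROUPABLE: token on the FACTDEF comes from the text; the rest of the line layout is an assumption):

  FACTSETDEF:DEITY|Pantheon  DATAFORMAT:STRING  GROUPABLE:YES

The grouping could then be referenced in places like CHOOSE:, with the exact reference syntax documented elsewhere.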

Data Driven Output

A FACT can also be enabled so the information is visible to output. This is also controlled on the FACTDEF line.
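
Again as a hedged sketch (the text says only that this is controlled on the FACTDEF line; the VISIBLE subtoken name is an assumption):

  FACTDEF:DEITY|Title  DATAFORMAT:STRING  VISIBLE:YES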

Output

Traditional output was done using a custom PCGen output system. Some time ago we adopted Freemarker, and many of our output sheets now use Freemarker to produce output.

Traditional

Traditional output uses the Tokens generally found in plugin.export. A number of them are in the core, in pcgen.io.exporttoken. These items read directives from the output sheet and process them into the appropriate String.
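
For context, a traditional sheet embeds token names directly and the matching token class builds the String; a representative fragment (token names here are from memory and should be verified against the token documentation):

  |NAME| wields a |WEAPON.0.NAME| dealing |WEAPON.0.DAMAGE|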

Challenges

  1. Many of these tokens require a significant amount of processing, all of which lives within PCGen. The weapon and ability output tokens are both extremely large.
    1. The ability tokens require a cache to achieve reasonable performance. Because of how the traditional output system works, the tokens are re-entered and would otherwise have to continually rebuild the same ordered list. Since that list is not in the core, some of the tokens cache it, and it can be confusing to an unaware reader of the code why the cache exists.
    2. The weapon tokens have a ton of processing, much of it very specific to d20 systems.
  2. The tokens themselves are not at all modular; they are generally independent and a huge source of duplicated code. Some have to be held in the core so that derivative tokens can use them as a parent class.
  3. The traditional output tokens can be used as values in the traditional variable system.
    1. This makes it extremely difficult to follow what is possible in the old variable system, and to predict how much code will be called when a formula is processed.

Freemarker Compatible

As we transitioned to Freemarker, we were able to stop using a number of the control tokens we used to have (IF, etc.). Others continued to be used in a fashion very similar to their existing usage, which still carries many of the issues articulated above. A representative call is sketched below.
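
In a Freemarker-compatible sheet, those same tokens are typically evaluated through a bridge function rather than raw token substitution; a hedged example using the pcstring function exposed to PCGen's Freemarker sheets:

  ${pcstring('WEAPON.0.NAME')} dealing ${pcstring('WEAPON.0.DAMAGE')}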

Freemarker Native

Going forward, we are moving toward a Freemarker-native form of output; you can find this in pcgen.output. Instead of doing PCGen-specific String processing, these classes provide TemplateModel objects that are fully compatible with Freemarker's internal machinery. This provides a number of advantages:

  1. Since the values are effectively native to Freemarker, they require no caching or similar workarounds (Freemarker will capture and reuse a list while it is in scope).
  2. There is a huge opportunity for modular reuse when the same kind of object shows up in different places.
    1. For example, output of the PC's Alignment may desire certain features (showing the abbreviation, etc.). The same may be true of the Deity's alignment. Under the old system, this would have required duplicate code or some very clever hoop-jumping in the old tokens. In the Freemarker-native system, we tell Freemarker how to wrap an Alignment and get back a TemplateModel; then, regardless of the path used to reach an Alignment, we can provide all of its features all of the time. It naturally reuses code (see the sketch below).
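
A hedged sketch of such a wrapper (TemplateHashModel, TemplateModel, SimpleScalar, and TemplateModelException are Freemarker's own API; the Alignment stand-in and the key names are illustrative assumptions, not PCGen's actual classes):

  import freemarker.template.SimpleScalar;
  import freemarker.template.TemplateHashModel;
  import freemarker.template.TemplateModel;
  import freemarker.template.TemplateModelException;

  // Sketch: wrap an alignment once, and every template path that reaches
  // an alignment (the PC's, the Deity's, ...) gets the same features.
  public class AlignmentModelSketch implements TemplateHashModel {

      // Minimal stand-in for the real alignment object (assumption).
      public static final class Alignment {
          final String name;
          final String abbreviation;

          public Alignment(String name, String abbreviation) {
              this.name = name;
              this.abbreviation = abbreviation;
          }
      }

      private final Alignment alignment;

      public AlignmentModelSketch(Alignment alignment) {
          this.alignment = alignment;
      }

      // A template expression such as someAlignment.abbrev resolves here.
      @Override
      public TemplateModel get(String key) throws TemplateModelException {
          switch (key) {
              case "name":
                  return new SimpleScalar(alignment.name);
              case "abbrev":
                  return new SimpleScalar(alignment.abbreviation);
              default:
                  throw new TemplateModelException("Unknown key: " + key);
          }
      }

      @Override
      public boolean isEmpty() {
          return false;
      }
  }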

Challenges

Right now a limited set of items is available in Freemarker-native output. Some of this is conscious: there is little desire to bring across a lot of the custom code in things like the weapon output tokens, since we want to drive those into the data and into the new formula system.

Design

The various methods of input and output have some consequences going forward. One of them, in deciding how to be data driven, is the choice between Variables and Objects.

The example used here relates to Movement.

It is definitely possible to have new variables like:

Movement_Normal
Movement_Fly

We could perhaps start these at -1 to indicate no movement is available, or even have boolean values like:

Has_Movement_Fly

However, this quickly breaks down for a number of reasons:

  1. It gets very difficult to iterate across all Movement types.
    1. The output sheets need to do that, and if movement were strictly variables in the new system, the output sheet would need to check every variable.
    2. This potentially traps all game modes into using the same variable names (and potentially other dynamics), which is a bit ridiculous. If data driven means there are handcuffs on the data, it may be worse than having magical code.
  2. It hardcodes values like "Fly" into variable names, and those are things that should eventually be localized (meaning translated)
    1. This would thus demand unique infrastructure for translating variable names
  3. An item that attempted to modify all movement of any type would have to know all the variables and would thus be fragile (meaning you'd have to constantly be modifying that object).
    1. It points to an underlying architectural failure in the design if a simple change of this sort results in breakage elsewhere.

To get around these items, it is recommended that items of this nature (Movement and Vision being the most obvious) use a separate object type:

DYNAMICSCOPE:MOVEMENT

This then gives an object type of "Movement", and new items can be defined:

MOVEMENT:Normal
MOVEMENT:Fly

Now we end up with a few neat characteristics:

  1. Since these are objects, we can still focus on translating information about objects; we will thus capture these names and be able to define how to translate them.
  2. We can have a local variable on each item, and thus address that from various locations
    1. It's no longer a series of global variables, so we have reduced the number of global variables
    2. If we have something that ever modifies all movement of a given type, it is as simple as MODIFYOTHER:MOVEMENT|ALL|..., so it is not fragile to the introduction of a new type of movement.
  3. We can easily iterate across all objects of that type. The set can be exposed on "pc" as something like "movement", and thus flows directly into the same process as other items that are Freemarker-native output (see the sketch below).
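
To make that flow concrete, a hedged sketch of the data side (DYNAMICSCOPE:MOVEMENT, the MOVEMENT: objects, and the MODIFYOTHER:MOVEMENT|ALL|... shape come from the text above; the Speed variable name, its definition, the object names, and all values are illustrative assumptions):

  # Data Control: create the dynamic scope (a local variable such as
  # Speed is assumed to be defined on the MOVEMENT scope)
  DYNAMICSCOPE:MOVEMENT

  # The movement objects themselves
  MOVEMENT:Normal
  MOVEMENT:Fly

  # A race grants a base walk speed on one movement object (hypothetical)
  Sample Race   MODIFYOTHER:MOVEMENT|Normal|Speed|SET|30

  # An item boosts every movement type at once; movement types added
  # later are picked up automatically (hypothetical)
  Boots of Striding   MODIFYOTHER:MOVEMENT|ALL|Speed|ADD|10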