Inside People Soft Trees

http://blog.greysparling.com/2005/08/inside-peoplesoft-trees-or.html

Friday, August 12, 2005

Inside PeopleSoft trees - or - Photosynthesis at PeopleSoft

I've talked with many PeopleSoft customer who have needed to use PeopleSoft trees outside of nVision, Query, or Cube Manager. One of the services I had pushed for was to add this functionality into PeopleCode (especially app engine).

Doesn't Tree Manager already have an API?Sort of. Although there is an API available in PeopleCode for trees (called the tree classes), these APIs are focused on describing and maintaining trees (essentially tree manager functions). However, as any good BI person will tell you, it's not the act of maintaining a dimension that's important... it's what you can do with the dimension when you join it to other stuff.

So, what sort of things would you want to do?The things you would want to do is to filter data with a tree or subtotal data using a tree as the grouping mechanism. The former is available to both nVision and PS/Query as either Filtering criteria or "in tree" criteria. The latter is only available in nVision through nPlosion. However, there are lots of things outside of nVision or Query that you would want to do.

Aggregate data using the tree. Those familiar with Summary ledgers in GL understand the concept. You can aggregate data using a tree for other purpose, such as improving performance for downstream processes. For example, if you want to display a count of reported problems by product family for all product families, you're doing this type of aggregation at the product family level of the tree.

Filter data using the tree. This is generalizing the "in tree" criteria in PS/Query. A couple of good examples of this are the batch programs used in GL for selecting open receivables to restate, or for selecting what data to use in Allocations. Another example of where this would be valuable would be in search records, where you could find all cases for eastern region customers, that are reported against any product in the Finanicals product line.

Today, PeopleSoft applications and customers use the technique I'm describing below to accomplish this.

First, a little more on Trees

The first thing to understand is how trees work. A tree is not a standard structure (such as a strict self-referencing table, as described in Ralf Kimball's definitive book on data warehousing), or a standard denormalized structure. It has a unique structure that allows it to do the following things:

http://blog.greysparling.com/2005/08/inside-peoplesoft-trees-or.html

Things that are not supported well by denormalized tables.

1. It supports unbalanced (or ragged) hierarchies. 2. It supports dynamic leveling.

3. It supports joining at any level of the hierarchy.

Things that are not supported well by a self-referencing table.

1. It supports finding all children at any level in a single SQL statement regardless of the number of intermediate levels.

So, how does it do it?

The key to it is the data model, and the fields added to it (that are maintained by tree manager). Here are the tables in the model:

PSTREEDEFN - This identifies the tree, and the attributes of it (such as where to get application attributes of the nodes, leafs, and levels).

PSTREENODE - This is the heart of the tree. It is a self-referencing table that also has additional attributes to facilitate non-recursive access to children of a node.

PSTREELEAF - This maps a node to the sets of leafs or detail values that can be used with it. This can be thought of as the "leaf" level in the BI world, but that the leafs can be attached to any level of the tree, even to nodes that have children nodes.

User tables - These are tables that store additional attributes to the node, leaf, or level, such as description, size, manager, etc. This allows a tree to be defined against any data element in the application and also opens up a whole host of opportunities for using trees in a given customer enviornment.

It's also important to understand that the tree definition is the starting place for everything, and that all the tools owned tables (i.e. not the user tables) strictly follow the key structure of the tree definition. Because trees are effective dated and utilize setid indirection, this is an important thing, because effective dating and setid indirection is only evaluated for the PSTREEDEFN table and the user tables. This logic is not used when joining between PSTREEDEFN and PSTREENODE. This means that from the perspective of effective dating, any nodes that have not changed between to tree instances are duplicated in the PSTREENODE table.

Once you've identified the tree you want to use (setid, tree name, and effdt are the unique keys), you also know the high order keys to use in the PSTREENODE table to get the nodes for that tree. Now, let's look at the structure of the PSTREENODE table:

Setid Tree Name

Effdt

TREE_NODE

TREE_LEVEL_NUM

TREE_NODE_NUM

TREE_NODE_NUM_END

PARENT_NODE_NUM

As you can see, this table is self-referencing (the PARENT_NODE_NUM tells you the NODE_NUM of a tree node's parent). However, tree manager also maintains the TREE_NODE_NUM_END field, which is what prevents the need to do a recursive set of selects to find all the descendants of a node. Tree nodes are strictly numbered. A tree nodes' descendents will always have a node number that is between the value of its TREE_NODE_NUM and TREE_NODE_NUM_END.

Therefore, if I want to select all the descendents of the node "Assets", here is the SQL I would issue:

Select B.TREE_NODE, B.TREE_LEVEL_NUMfrom PSTREENODE A, PSTREENODE Bwhere A.SETID = B.SETID and A.TREE_NAME = B.TREE_NAME and A.EFFDT = B.EFFDTand A.TREE_NODE = "Assets"and B.TREE_NODE_NUM between A.TREE_NODE_NUM and A.TREE_NODE_NUM_END

You could also filter by level number to get the descendents at a given level.

Now, to add in the leafs. For winter trees, this step is not necessary (winter trees are trees where the nodes themselves represent data points of interest, such as positions, whereas with summer trees, the nodes are the hierarchy that categorizes something else, such as most account trees or customer trees).

Because leafs are essentially mapping tables, you either join them directly to another table (either the user table to get attributes, or a fact table to get rows). However, for performance purposes, most PeopleSoft processes will stage the data values before joining it to a fact table.

Getting back to the subject at hand, here's the table structure of the PSTREELEAF table:

Setid Tree Name

EFFDT

TREE_NODE_NUM

RANGE_FROM

RANGE_TO

DYNAMIC_DTL_FLG

As you can see, the high level keys are again the same. Also, you can see that the mapping to the tree node is by node ID, and the mapping to the target table is by either individual key values or by range.

So, let's say we want to find out all account numbers and account types for all accounts under the "Asset" node of the tree. Now, there are three things that need to be looked up in order to determine what to put in the SQL:

1. The table and field where the Accounts are stored. This is stored in a tree table I haven't discussed, which is the PSTREESTRUCT table (which has a reference in the tree defninition).

2. The setid to use for the accounts. As mentioned, setid indirection occurs when going after the user tables. Depending on where your getting your business unit or setid, you will have to do the lookup separately (which is a more complex topic, but is often hard-coded at a customer anyway).

3. The effective date to use for the accounts. Again, this occurs when going after the user table. This means that you have to perform the standard correlated subquery for effective date logic. I won't go into detail on this in this posting, but you can see the SQL by building a query against the table in PS/Query.

So, your SQL will look like the following when after the list of accounts under a tree node:

Select C.ACCOUNT, C.ACCOUNT_TYPEfrom PSTREENODE A, PSTREELEAF B, GL_ACCOUNT_TBLwhere A.SETID = B.SETID and A.TREE_NAME = B.TREE_NAME and A.EFFDT = B.EFFDTand A.TREE_NODE = "Assets"and B.TREE_NODE_NUM between A.TREE_NODE_NUM and A.TREE_NODE_NUM_ENDand C.ACCOUNT between B.RANGE_FROM and B.RANGE_TO

As you can see, this SQL statement didn't have to look at any children nodes at all! Because the leafs are attached using node numbers, and the range on the tree node record identifies all the node numbers of all its children, you could join the leaf directly to the node.

Now, it's important to note that this SQL will not perform in DB2, because indexes are not used when joining two fields of different size (which is why nVision has 30 different tables for staging the join between the tree leaf and the data table).

Staging the join

As mentioned previously, when joining to a fact or data table, one generally stages the join into an intermediate table. nVision and all other PeopleSoft delivered processes use selector tables (in nVision it's PSTREESELECT##, where the ## represents the field size of the field joined to. In nVision, there's a whole subsystem dedicated to managing what goes into the PSTREESELECT tables. For the purposes of this discussion, you can merely create your own selector tables that identify what you are putting in there, and the set of values that it represents. Let's assume, for example, you have a table called "PS_ACCOUNT_TREE_FLAT" to represent the flattened account tree. The SQL to load the data related to the "Assets" node might look as follows:

Insert into PS_ACCOUNT_TREE_FLAT asSelect "Assets", B.RANGE_FROM, B.RANGE_TOfrom PSTREENODE A, PSTREELEAF Bwhere A.SETID = B.SETID and A.TREE_NAME = B.TREE_NAME and A.EFFDT = B.EFFDTand A.TREE_NODE = "Assets"and B.TREE_NODE_NUM between A.TREE_NODE_NUM and A.TREE_NODE_NUM_END

You could then join to the ledger table as follows:

Select Sum(B.POSTED_TOTAL_AMT)from PS_ACCOUNT_TREE_FLAT A, PS_LEDGER Bwhere A.FILTERED_NODE = "Assets"and B.ACCOUNT between A.RANGE_FROM and A.RANGE_TO

Hopefully, this gives a starting point to determining how to approach this. I suggest playing around with a SQL editor and PS/Query to make sure you understand (I did the same thing with nVision when I was first learning). Keep in mind, however, that the SQL that gets displayed in either an nVision trace or the SQL tab in Query does not include the SQL for populating that tree select table. However, it does show the filtering conditions used for joining the select table to the data table. You can select directly against the tree select table to see what values were put in it, and then look at the PSTREE% tables to compare the two.

Follow-on Info

For those who use tree manager extensively, we recently added a product that makes it easier to maintain multiple trees together. Additional information can be found here. A demo of this product can be found here.

http://www.greysparling.com/GSDemos/Tree_Maintenance_Snapon_web_demo.html

http://blog.greysparling.com/2007/02/getting-control-of-all-your-trees.html

Inside People Soft Trees

Documents

Transcript of Inside People Soft Trees