Referential Integrity for WebObjects
version 0.3, 11 November 2001
Enterprise Objects, or EOF, a part of WebObjects, is a programming tool to adapt object oriented programming to use relational databases as a persistent object store. Relational databases have their own sets of rules for design and usage that EOF doesn't always conform closely to. In addition, EOF places a number of constraints on the use of table attributes as keys (primary and foreign).
In traditional relations database terms, these constraints are referred to as "referential integrity", and define how these keys must be structured for use by EOF. If you use a table with keys that break these rules, programs written using EOF will behave unpredictably. I have personally seen only every alternate row returned from a table, unexpected exceptions, and even application crashes (Dr Watson errors on Windows NT), when the underlying keys don't match the rules. These errors are known as "referential integrity" errors.
When accessing a legacy database, one that is used by other, pre-EOF, programs, it is possible that the database may be come corrupted by a breach of these rules, either as the result of a genuine problem or programming error, or from ignorance of the additional EOF referential integrity rules. Databases that are often accessed using hand generated SQL are particularly subject to these problems, which means that developer's test database are the most prone to destruction by these means.
I have written a simple EOF tool, in WebObjects, that uses raw SQL access to analyse tables from an eomodel for referential integrity problems. It is also an example program to demonstrate a few other, minor, WebObjects techniques.
This is available for download in WebObjects 4.5 and WebObjects 5 versions.
What is Referential Integrity?
A table attribute that has a unique value for every row in the table can be designated as a primary key. Sometimes, a combination of several different attributes is unique for the table, in which case that combination may be called a compound primary key. One of these primary key values can be entered as an attribute into a different table, in order to be able to refer to the original table. In this situation, the key is called a foreign key.
In classical relational databases, any key must always be unique; duplicates are not permitted. A key field must also never take null values. Foreign keys must always refer to a current, existing row, although in some cases a null value, indicating an absent relationship, may be permitted.
EOF adds some constraints to this model. In particular, integer primary keys may never take the value zero. It also requires that all tables must be allocated a primary key. Some attribute types have special support in EOF, such that EOF can automatically assign unique values to the row - these types commonly include integers, as well as some certain binary types.
Referential integrity is a very important concept: once it is broken, data runs the risk of being duplicated in the database, with nothing to identify what the correct version is. Programs written assuming RI will generate exceptions, and will retrieve incorrect data.
Structure of an EOModel
The nature of an eomodel adds some further complexity to this picture, as well as provide the resources for us to verify referential integrity.
Every EOF program has one or more model groups; this is a structure used to hold references to loaded eomodels. EOModels contain EOEntities, which correspond to tables in a relational database; each eomodel has a single connection dictionary, so can represent a single database signon at any given moment. One characteristic of EOF is that entity names must be unique in a model group. An EOModel can be loaded explicitly, or can be loaded from a search path, which includes previously loaded frameworks, by requesting a specific model name. Given that each entity in a model may be represented by a custom class, it makes sense to associate models with the class implementations.
At this level, a program will refer to entities by name, rather than models or model groups. If you check the parameters to the methods in EOUtilities, you will see that entity name is common parameter.
An EOEntity contains EOAttributes, and EORelationships. It also keeps a separate track of its primary key, which can be requested as either an array of attributes, or an array of attribute names. The properties of these are reasonably evident: attributes have names, and can be identified with specific database fields and object values; relationships connect together two different entities (possibly across multiple models).
Raw SQL Access
To verify referential integrity, we have to bypass EOF's normal process of retrieval, as this is one of the areas where flaws can trigger exceptions and data corruption. To do so, we use raw SQL access, using specially constructed SQL statements to detect problem areas.
In all cases, I just use EOUtilities.rawRowsForSQL() to detect errors.
Raw SQL is most often used as a fast access means of retrieving records from a database as dictionaries. However, SQL is far more poerful than that, and this approach lets us send arbitrarily complex SQL to the database, with the expectation of a small return array of dictionaries: in this case, just the primary keys of records in error.
RIChecker Program Structure
The program, RIChecker, allows the user to load an EOModel on the server, browse through loaded EOEntities, and produce a report on the primary key and relationships for any selected entity.
To do this, it is divided into three layers: a utility class for browsing the server file system to locate EOModels; two wo components used to coordinate the user interface and display of results; and three closely related classes that are used to hold primary keys and relationships, and check their integrity. It should be no surprise that these three layers correspond exactly to the MVC pattern (in the order V, C, M).
File Browser (subproj)
This component was found inside the WebObjects supplied Monitor.woa application, which uses it to browse through the file system to locate WebObjects applications. I copied this code, which came as a WebScript class, and converted it into Java, and to use the Java File class to read directories; some bugs may remain from the conversion. It uses NSPathUtilities to manipulate path strings.
I have extended it to display file dates, filter by file suffixes, and grouped the various resources for it into a sub-project.
It requires three bindings: one a string to set the initial path, and to return the selected path into; and two actions, one of which is called when a file is selected. A further binding, filter, can be used to request that only files matching the given suffix are returned; this is my addition to the original code.
This coordinates two FileBrowsers (one for EOModels, one for frameworks), a list of loaded models, a popup browser of loaded entities, and a button to check the selected entity from the popup.
This is initialised with an EOEntity, which it uses to populate a single RIPrimaryKey, and an array of RIRelationships. Everything else is carried out through the bindings to these sets of objects.
Each RI object has a checkIntegrity() method returning a boolean, and which populates a set of arrays containing information about rows in error (where that is possible to retrieve).
It also displays an error string (highlighted in red) from each of the main classes used. As the RIKey subclasses all return an array of records in error, having a secondary catch-all string variable for error messages is a convenient and simple way of interpreting exceptions caught in the mechanism. These classes are very likely to cause exceptions because the SQL used is fixed, and can't cater for all databases.
This is an abstract class that is the parent of both RIPrimaryKey and RIRelationship. It is basically used to share code between the two concrete classes, both of which follow the container pattern. If I was writing this in Object C rather than Java, each class would be much simpler, and would be categories extending basic EOF classes.
RIKey maintains a reference to an EOEntity, which can be retrieved via accessors, as well as an editing context used for its database accesses, and two arrays for the problems that are applicable to both primary keys and relationships. Accessors are provided as a convenience to return the table name and model name belonging to the contained entity.
The main work of this object is carried out by checkIntegrity(), which clears the error arrays when it is called (to minimise the number of accesses to the database), and then calling checkForZeros() and checkForNull(). CheckForZeros() uses "select key from table where key = null", and checkForNull() uses "select key from table where key = 0". As the array returned is made available to the controllers above, a simple count of errors can be used.
Results are generated by calling EOUtilities.rawRowsForSQL; as there is a possibility that the SQL given may be invalid for some databases, each of these calls (also in the two subclasses of RIKey) is wrapped with an exception trap, and the error message generated displayed on the page highlighted in red. Unfortunately, OpenBase doesn't support the 'GROUP BY' syntax, making it useless for checking for duplicate primary keys.
This subclasses RIKey, and adds a convenience constructor (which is passed an EOEntity), and a checkForDupes() method.
checkIntegrity() extends the RIKey checkIntegrity() method by calling checkForDupes(), which uses "select key, count(*) from table group by key having count(*) > 1". Not all possible supported databases will support the group by SQL syntax.
This follows a similar approach to RIPrimaryKey, but has rather more to do. It has to use an EORelationship as its contained object, which can't be derived from just the entity; setting the relationship automatically sets the entity from the relationship.
In exactly the same way as RIPrimaryKey, I have extended checkIntegrity() to call checkForMissing(), and implemented the latter. It uses the SQL: "select fKey from table where fKey not in (select key from fTable)".
One further feature is that I have created a static method that returns an array of RIRelationships when called with the EORelationships array from an entity - which is used in CheckEntity.wo. This approach is something I often make use of; any class should be responsible for handling standard means of initialising itself. For example, I will use a simple static method on an EO class for any commonly used fetches for that class.