File Reorganization

by Alex Kodat

Background

Model 204 is a high performance database package with an integrated teleprocessing monitor and programming language (User Language). Because of these latter features it is easy to forget that the primary function of Model 204 is to provide your end users with rapid access to the data stored in your database files. While Model 204 is relatively forgiving of a bad database file design, it is the rare Model 204 user that cannot reduce Model 204 resource consumption and/or improve end user response time with some changes to database file design.

Often, the only way to achieve the changes required to improve Model 204 performance is to reorganize your Model 204 database files. This generally requires that you unload the contents of your database files to intermediate sequential datasets and then reload the sequential datasets back into the same or new database files. Most Model 204 customers must weigh the possible benefits of a file reorganization against the cost in both machine resources and programmer time.

Fast Reorg has dramatically changed this equation. By reducing resource consumption of a file reorganization by 50 to 90 percent and by greatly simplifying the progamming required to perform a file reorganization, users who could not justify performing reorgs before should consider them now, and users who are performing reorgs now can perform them more frequently and/or at lower cost.

Fast Reorg is a product developed by Sirius Software that is intended to simplify and speed up the Model 204 database file reorganization process. It is made up of two component products - Fast Unload and Fast Reload that simplify and speed up the unload and reload parts of the reorg process.

This document is intended to help you determine whether you need to reconsider or revise your current reorg strategies given the added speed and ease of use of the Fast Reorg package.

Reasons for Performing Reorganizations

Model 204 database files can be broadly classified into three types based on their usage patterns.

Read only files
These are files that do not change over time and are used for retrieval purposes only. These types of files can often benefit from a reorganization but do not require regular reorganization.

Add only files
These are files that have data added to them but do not have data deleted. These files often need reorganization especially if they are sort or hash key files.

General update files
These are files that have data added and deleted. These are the most complex files to deal with and almost always need regular reorganization to maintain good performance.

There are several reasons why one might wish to reorganize Model 204 database files :

The above list is not meant to be all inclusive. There are many other reasons for reorganizing your Model 204 database files. However, the list does include most of the common reasons.

In summary, the benefits of performing file reorganizations tend to be reduced disk I/O, reduced disk buffer pool requirements and better use of DASD cache. These all come as the result of denser packing of data in the various Model 204 tables and as the result of clustering of related data. In some cases, these will even produce reduced CPU utilization. Ultimately, all of these produce better response to your end users.

The Ordered Index

One of the areas of database design that is a source of mystery to many Model 204 DBAs is the use of the ordered index. Specifically,

The following are the advantages of "KEY" fields over "ORDERED" fields.

The following are the advantages of "ORDERED" fields over "KEY" fields.

Comparing the advantages and disadvantages of "ORDERED" and "KEY" fields it is difficult to escape the conclusion that it is generally better to define a field as "ORDERED" rather than "KEY" unless there are compelling reasons otherwise. Similarly, it is generally not a good idea to define a field as both "KEY" and "ORDERED". The two major cases where one would define a field as "KEY" rather than "ORDERED" are :

The major problem with the ordered index that can occur with the ordered index that can would require a reorganization is low ordered index density. That is, you might have more ordered index pages than are required by the quantity of ordered index data. This can result in excess I/O or buffer pool usage for the ordered index. Fortunately, there are two simple formulas that measure ordered index density. These formulas are :

OINODEP = 100 * (OINBYTES / OINODES) / PAGESZ OILEAFP = 100 * (OINBYTES / OILEAVES) / PAGESZ where OINBYTES, OILEAVES and OINODES are viewable file parameters, PAGESZ is a viewable system parameter, OINODEP is the overall ordered index density and OILEAFP is the leaf page density of the ordered index. OINODEP and OILEAFP will usually be between 0 and 100 and will generally be approximately equal. After an ordered index reorganization, OILEAFP will be approximately 100 - LRESERVE (assuming LRESERVE is the same for all fields).

Unfortunately, the values of OILEAFP and OINODEP are affected by Model 204 ordered index prefix compression where any common part of all values on an ordered index page is stored only once rather than for each value. But Model 204 counts the uncompressed size for OINBYTES. For example, if all the values on a page start with '1999062211' Model 204 compresses out the common prefix saving only one copy of it on the page but counting the full 10 bytes for each entry. Let's say for the sake of argument that the rest of each of these keys is an additional 6 bytes plus 1 byte for a length and 3 bytes for a single record number (let's say each record has a unique time stamp). This would mean each entry on a page would be credited with 20 bytes for OINBYTES purposes while only using 10 bytes. In such a situation, a page that's only 70% full would register as being 140% full by the OILEAFP measure.

The bottom line is that OINODEP and OILEAFP can exceed 100 and are not a completely accurate representation of ordered index density in cases where ordered index prefix compression is being used to great advantage.

In add only files, OILEAFP will gradually decrease after a reorg, never getting below 50 (unless SPLITPCT is set to something other than 50). After a while OILEAFP might start increasing again and will fluctuate between 60 and 85 over time.

In a general update file, OILEAFP will gradually decrease after a reorg possibly getting to below 50 and perhaps fluctuating over time.

OILEAFP gives one a simple value that indicates the density of data in the ordered index. One can increase ordered index density to approximately 100 - LRESERVE by doing an ordered index reorganization. For example, if OILEAFP is 52.8 in a particular file one can set LRESERVE for all "ORDERED" fields to 90 and do an ordered index reorganization, getting a 70 percent increase in ordered index density.

What are the correct values for the ordered index field parameters - LRESERVE, NRESERVE, SPLITPCT and IMMED? Because of the relatively forgiving nature of the ordered index, the defaults are never going to produce significantly worse performance than anything else. In a read only file one would get some benefit to setting NRESERVE and LRESERVE to 0 before doing a reorg. If one is willing to do frequent reorgs to keep OILEAFP at a high value, one might benefit by setting NRESERVE and LRESERVE to a smaller value (such as 5), although the benefits of this will probably be too small to justify the need for extra reorgs. Finally, the optimal setting of the IMMED parameter is highly data dependent and is not likely to produce significant performance benefits except in a few unusual applications.

How to Reorganize your Files

The classic options for reorganizing Model 204 database files are frought with tradeoffs. The two basic ways one can perform reorgs are :

  1. By doing a 'PAI' dump of table B and then using a generic FLOD program to reload the data.
  2. By doing a formatted unload of your data using PRINT or WRITE IMAGE statements and then writing special purpose FLOD programs to reload this formatted data.
The advantages of the PAI format are : The disadvantage of the PAI format are : The advantages of a formatted reorg are : The disadvantages of a formatted reorg are :

As can be clearly seen from the above lists of advantages and disadvantages, neither format provides a highly satisfactory mechanism for performing reorgs. Fast Reorg combines the advantages of the two techniques and adds a few that neither has. A typical reorganization using Fast Reorg consists of a Fast Unload program that contains

OPEN filename UAI and a Fast Reload program that contains FILELOAD -1,-1,0,1000000,10000,10000,64 LAI END If you wish to sort the unloaded data for clustering by field FOO or because field FOO is the sort key simply specify OPEN filename UAI SORT FOO in the unload program. If you are reorganizing a hash key file and you are changing BSIZE to 20000 in the reorganized file simply specify OPEN filename UAI HASH FOO BSIZE 20000 on the unload. If you want invisible ordered index data to be preserved over a reorg simply specify OPEN filename UAI OINDEX on the unload. The OINDEX parameter can be combined with the HASH and SORT parameters described above. There are other options on UAI and LAI that effect performance but UAI and LAI even without any tuning will generally be considerably faster than any of the old style reorgs.

To summarize, then, the advantages of the UAI/LAI format are :

The disadvantage of UAI/LAI format : This latter point should be viewed mainly as a migration issue. If you redefine your invisible key fields as invisible ordered index fields and regenerate your invisible keys, you will never again have to worry about losing them over a reorg.

Conclusion

In order to provide good system response and throughput to your end users, it is important to develop and maintain a good database file design. Most Model 204 /sers recognize this and have designs in place and strategies for maintaining them. Fast Reorg, however, provides such a significant improvement in the speed and simplicity of reorganizing Model 204 database files that you should at least reconsider and perhaps rework your database designs and file maintenance strategies. Fast Reorg could provide that critical tool that lets you control your Model 204 database files rather than having them control you.