IDUG North American Conference 2004 â€" Orlando, Florida, May ...

33
IDUG North American Conference 2004 – Orlando, Florida, May 9-13 Trip Report – Charlie Perkins, FPCMS Database Technical Services What is this conference? IDUG, the International DB2 Users Group, manages several DB2 technology conferences each year. The North American conference is the largest and was established first (1988). There are also conferences in Europe and Asia as well as “mini” conferences in Toronto, and, for the first time this year, Europe and San Francisco/Los Angeles. The conferences are run by a volunteer- based organization (IDUG) and supported by a professional management company. The North American conference is the premier venue for DB2 knowledge acquisition. It’s only true rival is the IBM-managed, Data Management Technical Conference (September). Many of the same speakers grace both events, but the IDUG conference features more “user experience” presentations. Conference attendance peaked in the late 90’s in the 2500 attendee range. As with most travel-related events, tight budgets have pushed registrations down since then. This year’s attendance was 1300, trending up again. The conference was very upbeat, and IDUG is positive about a return to growth. Why this conference? It is difficult for technology managers and senior data management technicians to find meaningful training that can provide both breadth and depth in targeted technologies. The IDUG conference(s) have proven, in my experience, to deliver the best value as a training alternative to pure depth training (e.g. a 4- day class on Database Performance Tuning). I have attended IDUG 3 other times over the years and have found the quality and quantity of educational value to be consistent and cost beneficial.

Transcript of IDUG North American Conference 2004 â€" Orlando, Florida, May ...

Page 1: IDUG North American Conference 2004 â€" Orlando, Florida, May ...

IDUG North American Conference 2004 – Orlando, Florida, May 9-13
Trip Report – Charlie Perkins, FPCMS Database Technical Services

What is this conference?
IDUG, the International DB2 Users Group, manages several DB2 technology conferences each year. The North American conference is the largest and was established first (1988). There are also conferences in Europe and Asia, as well as “mini” conferences in Toronto and, for the first time this year, Europe and San Francisco/Los Angeles. The conferences are run by a volunteer-based organization (IDUG) and supported by a professional management company.

The North American conference is the premier venue for DB2 knowledge acquisition. Its only true rival is the IBM-managed Data Management Technical Conference (September). Many of the same speakers grace both events, but the IDUG conference features more “user experience” presentations. Conference attendance peaked in the late ’90s at around 2,500 attendees. As with most travel-related events, tight budgets have pushed registrations down since then. This year’s attendance was 1,300, trending up again. The conference was very upbeat, and IDUG is positive about a return to growth.

Why this conference?
It is difficult for technology managers and senior data management technicians to find meaningful training that can provide both breadth and depth in targeted technologies. The IDUG conference(s) have proven, in my experience, to deliver the best value as a training alternative to pure depth training (e.g. a 4-day class on Database Performance Tuning). I have attended IDUG 3 other times over the years and have found the quality and quantity of educational value to be consistent and cost beneficial.

In addition to the conference content, there are ongoing opportunities to discuss technology with DB2 developers, customers, and vendors across a broad continuum of database, tool, and related product issues. For example, in a single venue, you can see demos of competing vendor products and talk one-on-one with experienced vendor implementation experts. In Special Interest Group sessions you can swap implementation tips and techniques and listen to others describe the proper and improper ways to use DB2 in ways you might be considering.  As one new attendee at this conference described to a colleague, the value of the networking opportunities alone exceeds the price of admission.

This year, I had a vested interest in the conference as I started volunteering support for IDUG Regional User Group services last fall. I stumbled onto this opportunity as I planned relinquishing my 6-year role as chairperson of the New England DB2 Users Group in June 2004.

[NOTE: throughout this paper, I use “DB2” as the name for the product across all platforms; “UDB” is widely misused as a differentiator since DB2 on every platform is labeled “DB2 UDB for ‘OS’”. When necessary to differentiate, I will use “distributed DB2” or, more formally, DB2 UDB for LUW (Linux, Unix, Windows). IBM has also begun using DB2 Multiplatform (which seems to be the DB2 Tools group moniker for LUW). The various names of the product don’t do DB2 any great favors, and I have pointed this out on trips to both the Silicon Valley and Toronto Labs. As a Regional User Group Chairperson, I have tried very hard to follow the proper naming usage. Often, IBM salespersons are the worst offenders. I am reminded that Shakespeare may have felt the same way when he asked “What’s in a name?”.]

What did I learn?
To answer that question, I will provide a brief set of details for each of the sessions that I attended over the 4+ day conference. In addition, I will share the conference proceedings by several means, including presentations to technical groups within FPCMS.

As a prelude, the goals that I set before attending IDUG comprised:
- obtain up-to-date knowledge on the next distributed DB2 release (“Stinger”)
- update my view of DB2 in the DBMS landscape and that of IBM as a preferred vendor of data management products
- become certified as a DB2 UDB for Linux, Unix, Windows DBA by passing DB2 Certification exam 701 (I passed the prerequisite DB2 Family exam last fall).

My participation met all the preceding goals in addition to providing the opportunity to discuss database and data management practices with many diverse users of DB2 technology.

Detailed Session Information:

Monday, May 10, 8:30 AM—Keynote Speaker Presentation:
Dr. Pat Selinger is an IBM Fellow and Vice President, Data Management Architecture and Technology, IBM Software Group, at the IBM Silicon Valley Lab. She is best known for her seminal paper on database optimization technology that laid the foundation for the first-ever relational database optimizer. Today, Pat leads IBM’s efforts in information integration of data across all targeted sources.

Pat set the stage with a discussion of DB2’s recent growth (15% in 2003, 13% in Q1 2004) and defined the IBM Data Management goal: get control of unstructured data by building an information infrastructure with customers and partners to provide a total solution for integrating structured and unstructured data. IBM’s key data management investments are in: Database Services and Tools, BI and Advanced Analytics, Content Management and Information Integration.

Pat introduced several important players in the IBM organization to provide status on critical initiatives. Curt Cotner from the Silicon Valley Lab updated the attendees on DB2 Version 8 on zOS, the largest single database release ever for IBM. Jeannette Horan, the new VP of Development for Information Management, described the role of middleware as componentry for enabling IBM’s “e-Business On-Demand” initiatives from pervasive computing to delivery of regulatory compliance such as Sarbanes-Oxley.

The key message that Pat imparted, however, was contained in a list of six words: Integrated, Linux, XML, Content, Real-Time, and Autonomic. This list defines the vision for IBM data management work in progress. From a DB2 perspective, “Integrated” is defined by the DB2 Information Integrator product, which enables federated, heterogeneous data access across key data platforms. “Linux” embodies the support IBM has injected into the open source OS in terms of its viability for delivering DB2 solutions. “XML” is another area where IBM has placed a lot of resources (e.g. SQL creator Don Chamberlin in the XQuery development). “Content” focuses on the efforts to bring unstructured data under control via products such as DB2 Content Manager. “Real-Time” targets the need to shrink the time delay for decision-making data delivery (today’s data warehouses are more likely to require 15-minute “fresh” data as opposed to overnight refreshes). Finally, “Autonomic” computing (the ability of computer systems to respond to internal stimuli and automate as much of the performance and tuning load as possible) is being delivered in products across the IBM spectrum of software. [For more detail on Autonomic improvements in DB2 see sessions below.]

Monday, May 10, 10:30 AM—“DB2 UDB V8: Exploiting Its New Advancements in High Availability”; Matt Huras, IBM Toronto Lab:
Matt Huras, a Lead Architect for DB2 in the Toronto Lab, became a familiar face over the next several days as he delivered several presentations outlining the new features currently being developed for DB2 on Linux, Unix and Windows (LUW). This presentation was mainly an overview of Version 8 features for those who either had not moved to Version 8 or were cautiously exploiting the many new features. In addition, it laid the groundwork for several ensuing discussions regarding the expansion of V8 features with the “Stinger” release (fall 2004). Most of the technology upgrades reviewed have been covered in previous Version 8 presentations (see my New England DB2 Users Group presentation: “DB2 UDB for Linux, Unix and Windows Version 8 Overview and Early Observations”): Online “trickle” REORG; Online Index CREATE/REORG; Online LOAD; Online Container maintenance (Drop Container and Adding Stripe Sets—add container w/o rebalance); Dynamic Bufferpool commands and Configuration Parameter changes; Infinite Logging and other log enhancements; and early steps into utility throttling (Backup in V8 Fixpack 2). Many of these features are taken to a new level in Stinger and are covered in more detail in those presentations.
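To make the availability features more concrete, here is a minimal sketch of the kinds of online maintenance commands the session covered (the table and bufferpool names are hypothetical, and exact options vary by fixpack):

```sql
-- In-place ("trickle") table reorg that allows concurrent writes
REORG TABLE payroll.employee INPLACE ALLOW WRITE ACCESS;

-- Online index reorganization, also with write access
REORG INDEXES ALL FOR TABLE payroll.employee ALLOW WRITE ACCESS;

-- Online LOAD that keeps existing data readable during the load
LOAD FROM emp.del OF DEL INSERT INTO payroll.employee ALLOW READ ACCESS;

-- Dynamic bufferpool resize, effective immediately (no restart)
ALTER BUFFERPOOL bp_data IMMEDIATE SIZE 50000;
```

The common thread is that operations which previously required an outage (or a database restart) can now run alongside the application workload.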

Monday, May 10, 12:40PM—“DB2 Information Integrator—A Sneak Preview of Q Replication”; Beth Hamel, IBM Silicon Valley Lab:
Beth Hamel, though working from the Silicon Valley Lab, manages the Replication work across all the DB2 platforms. When I was in FISC Data Engineering, we spent a lot of time giving Beth requirements to enable DB2 replication to compete with Sybase Replication Server. She listened, as evidenced by the pace of improvement in DB2 replication. Since replication in DB2 LUW has been free (it is an added charge on the mainframe), we have been willing to accept less in the product than the chargeable Sybase technology. Note that I said “has been” free. I wanted to say this up front because it was the big disappointment for me in this presentation. After reviewing the new architecture and great performance features of Q-based replication, the bombshell was dropped at the end. Q replication would only be delivered in DB2 Information Integrator (Replication Edition), which is an added cost product. Though significantly cheaper than Standard DB2 II, the Replication Edition’s cost could be a major barrier to its implementation, both here and industry-wide. So, the free replication technology delivered with the DB2 engine will continue to be change-table-based, at least for now.


Now that I’ve stopped the replication mavens from salivating, I might as well cover a few of the key features that will be delivered as Q-based replication. First, the reasons for a new architecture are: high availability, high volume and low latency (pretty much the mantra of replication for critical OLTP applications). The solution is, at least initially, targeted at a small number of servers and adds better multi-directional functionality, event publishing, a table-diff utility, and manageability improvements (mostly due to fewer objects to manage—all those Change Data Tables go away). So, the architecture’s major change is that instead of the Capture task scraping the DB2 log and delivering row changes to interim DB2 Tables, it delivers the changes to Websphere MQ message queues. [NOTE: a limited MQ license is provided with this technology.] From there, the Apply task pulls the messages to send the updates to the target database(s). Each MQ message is a transaction. The architecture is advertised to deliver much better parallelism for the apply task; better conflict resolution; and superior performance due to fewer writes (to change tables), elimination of fetches in the apply phase, and a new apply browser task (delivers the parallelism). Some of the high points of the technology include:

- Queues are set up in MQ since there are likely to be experts at queue management already available in existing teams
- Queues are cleaned up (pruned) by background tasks
- Persistent queues are used for recoverability
- Multiple queues can be used with multiple apply browsers, BUT, for transaction consistency, all related data should be in the same queue
- There are no limits on subscription set size in this architecture!
- Q-based replication can be mixed with SQL-based replication (e.g. use multiple Captures: one SQL-based, one Q-based)
- Publishing can be accomplished by running Capture without an associated Apply (a sample usage might be integrating captured changes with an ETL tool to perform complex transformations prior to distribution); there is no Publish-Subscribe yet
- SQL replication is still better for one source to many targets
- Subscription types: unidirectional, bi-directional, peer-to-peer
- Subsetting can be accomplished at the row level (apply predicates or do lookups on other data) and/or at the column level
- Deletes can, optionally, be ignored
- Capable of 15K rows per second with less than 1 sec latency
- Several conflict detection options for peer-to-peer (value-based and version-based)
- Administration done via Replication Center or Command Line (scripts or interactive); new Q Create Subscription Wizard in GUI
- Table Diff Utility (aka Reconciliation Utility) compares source and target tables; brings Sybase Subcompare function to DB2 replication
- Q Replication is the IBM blueprint for event publishing (from DB2 and beyond) and lays groundwork for expansion into heterogeneous database replication environments.

Monday, May 10, 2:00PM—Vendor Solution Presentation; Venetica Software:
The vendor solution presentations enable the third-party vendors attending the conference to showcase the various DB2 solutions that they provide. Venetica is a specialist in the content management arena (Enterprise Content Integration) and delivers a product called VeniceBridge that integrates with DB2 Information Integrator to build access to various unstructured data sources. Out of the box, VeniceBridge comprises pre-built, real-time, bi-directional adapters to sources such as Documentum, FileNet, Interwoven, etc. In concert with DB2 II, it can add access to workflow and content data sources. In addition, it integrates with DB2 Content Manager as an add-on to DB2 CM’s available content sources.

Monday, May 10, 3:30PM—“Tuning DB2 in the New Galaxy”; Phil Gunning, Gunning Technologies:
Phil is a long-time Gold Consultant to IBM who has worked with both DB2 mainframe and distributed platforms. He recently published a very good book on DB2 Version 8: DB2 Universal Database V8 Handbook for Windows, Unix and Linux.

In this presentation, Phil defines the “New Galaxy” of DB2 by contrasting DB2 Version 7 capabilities with the features of DB2 V8 and the upcoming Stinger release. He adds that SAN technology is becoming de rigueur for data storage in this new galaxy. The technical details revolve around features such as direct IO/concurrent IO, which bring DB2 filesystem data access performance to nearly the equivalent of raw device access. Other key areas include Linux, Historical Performance tracking (“write-to-table” monitoring in V8), 64-bit addressability, Storage Provisioning (similar to SMS on the mainframe), Connection Concentration, etc.

To highlight several of the details: “direct IO” is important to the performance/usability trade-off as it bypasses filesystem buffering (via a DB2 registry variable in 8.1 FP4); the resulting performance is close to raw device IO and does not come with the added management requirements. Phil also discussed new Page Cleaning algorithms in 8.1.4, enabled via another registry variable, and the new ability to create a performance warehouse by using the V8 write-to-table monitoring options. He also dropped an important hint about upcoming features: Storage Provisioning, similar to functions available for years on the mainframe, should arrive within the next year (though perhaps from competing vendors rather than IBM). This was a huge productivity boost for DB2 on OS/390 (zOS) and would be welcomed on distributed DB2.
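As a sketch of how the filesystem-caching bypass surfaces to the DBA, the tablespace clause below is my recollection of the Stinger-era syntax (the tablespace name and container path are invented for the example):

```sql
-- DMS tablespace that bypasses filesystem buffering (direct/concurrent IO),
-- aiming for near-raw-device performance without raw-device management overhead
CREATE TABLESPACE ts_data
  MANAGED BY DATABASE
  USING (FILE '/db2/data/ts_data.dbf' 25600)
  NO FILE SYSTEM CACHING;
```

Prior to this clause, the same effect had to be requested instance-wide through a registry variable, so per-tablespace control is a usability step forward.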

Many other features were covered in more detail in Stinger-specific presentations. I will leave the details to those reviews.

Monday, May 10, 5:00-8:00PM—Regional User Group Leaders Business Meeting and Networking Session:
This was one of my volunteer role responsibilities. Prior to the business meeting, the volunteer team met to validate our goals for 2004-2005. This was followed by a presentation to all the Regional User Group Leaders and IBM User Group Liaisons attending the conference. In addition, several members of the IDUG Board of Directors joined us since significant funding is being directed to reach out to the local user groups. I co-presented the business plan with Paul Turpin, a RUG Leader from North Carolina. The presentation went well. It was followed by an opportunity to network with fellow user group leaders and several IBM dignitaries. Keynote Speaker Pat Selinger, who has also spoken at Fidelity and the New England DB2 Users Group, attended, and I had the chance to chat with her over a glass of Shiraz. In addition to being one of the leading lights of DB2 technology over the past 3 decades, Pat is very down-to-earth and an excellent source of practical “futures” information. I could feel my SQL knowledge enhanced just by proximity to the Optimizer Guru.

Tuesday, May 11, 7:15 AM—Regional User Group (RUG) Leadership Special Interest Group Breakfast:
This was another business meeting of sorts for the user group leaders at IDUG. I moderated a discussion of key topics of concern to the North American RUG heads. In general, our conversation covered management, costs, membership expansion and various strategies for structuring the groups. However, the most interesting dialog was led by one of my fellow volunteers from Canada, Yvonne Kulker. She is managing the startup of several mini-conferences on the West Coast this fall. The agendas sounded fascinating and the costs very favorable. I’m hoping we’ll get the chance to do these on the East Coast next year. Though Boston is not a likely target, New York/New Jersey/Pennsylvania is prime DB2 territory.

Tuesday, May 11, 8:30AM—“DB2 UDB in 2004: Technology Update”; George Baklarz, IBM Toronto Lab:
George is an interesting presenter with very broad database knowledge. He often sprinkles his discussions with comparisons of how DB2 implements a particular feature versus the competition (Oracle, SQL Server). This presentation was a preview of the various Stinger sessions that followed. There was an interesting breakdown of the investment in the Stinger release: 56% of the enhancement costs were targeted at Customer/Partner requirements, 22% at technology enhancements (e.g. autonomic), 11% for Information Integration, 5% Business Intelligence, 4% Linux and 2% IBM infrastructure.

In terms of technical highlights, SQL enhancements, Security improvements, Development support (updated Design Advisor), Autonomic Computing, Linux changes and HADR (High Availability Disaster Recovery) were key.

Starting with SQL, there has been a plethora of minor tweaks:

- larger SQL statements (now 2MB, up from 64K)
- SET LOCK WAIT on individual statements
- CALL statements in triggers
- a new BIND parameter (REOPT) that enables optimization at run time for queries with widely varying host variable input
- new ALTER capability for IDENTITY columns
- nested SAVEPOINTs


However, the big feature from my perspective is called “Native PSM (Persistent Stored Modules—aka Stored Procedures)”; essentially, the new SQL Procedure Language generates byte-code and an execution plan (stored in the Catalog) instead of relying on the C compiler. Performance should be comparable and the headache of managing C compilers is gone.
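For illustration, a native SQL procedure looks the same at the source level as before; the difference is under the covers, where Stinger compiles it to byte-code rather than generating C. The table, columns, and procedure name below are invented for the example, and the lock-timeout syntax is my recollection of the new special register:

```sql
-- Simple SQL PL procedure; in Stinger this compiles natively (no C compiler needed)
CREATE PROCEDURE raise_salary (IN p_empno CHAR(6), IN p_pct DECIMAL(5,2))
LANGUAGE SQL
BEGIN
  UPDATE employee
     SET salary = salary * (1 + p_pct / 100)
   WHERE empno = p_empno;
END

-- Statement-level lock wait control (seconds), one of the minor SQL tweaks
SET CURRENT LOCK TIMEOUT 10;
```

Removing the C compiler dependency matters most on Windows shops, where licensing and configuring a compiler on every database server has been a recurring headache.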


Security improvements included data encryption (on the wire), new authentication types, 56-bit encryption, etc. These will be covered in more detail in the next session. The long-term security strategy is Secure Sockets Layer.

The Design Advisor brings broad new capabilities to the development arena. Now, in addition to Index advising, this tool will be a one-stop shop for design advice on Materialized Query Tables (MQTs), Multi-Dimensional Clusters (MDC) and Partitioning (if using the optional Data Partitioning Feature, DPF). 

Autonomic Computing comes to life in Stinger. Some of the neatest functions being transitioned to autonomic are Automated Backup with Policies (essentially, the rules to run: when, how often, etc.), self-tuning backup and restore, Automated Runstats, Automated REORG with Policy (offline only), integrated/automated log file management, and simpler memory configuration (specify minimum guaranteed memory and let DB2 manage the usage).
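A hedged sketch of how the automated maintenance knobs surface as configuration parameters (parameter names as I noted them during the Stinger preview, so they may shift before general availability; the database name is a placeholder):

```sql
-- Enable the automatic maintenance umbrella, then individual features
UPDATE DB CFG FOR sample USING AUTO_MAINT ON;
UPDATE DB CFG FOR sample USING AUTO_TBL_MAINT ON AUTO_RUNSTATS ON;

-- Let DB2 manage archive logging to a disk location automatically
UPDATE DB CFG FOR sample USING LOGARCHMETH1 DISK:/db2/archlogs/;
```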

Linux improvements feature CPU pinning on Linux (tie specific DB2 processes to specific CPUs), use of NUMA (Non-Uniform Memory Architecture) technology, Asynchronous IO, big memory page support with Linux 2.6 (64Gb on a 32-bit system w/4Gb memory limit), etc.

Finally, HADR, the topic of its very own session (reviewed later in the week), is introduced as an alternative to log shipping and DB2 replication. In a nutshell, it provides log transaction shipping (as opposed to the full log file shipping available today) to a secondary site. It will be useful for high availability situations where simplified maintenance is critical, finer recovery granularity than a full log file is required, and the secondary site is not used for read processing. See the HADR presentation review for full details. This is a very important new feature.
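To give a feel for the simplicity claim, HADR setup reduces to a handful of configuration parameters and commands, roughly like this (host names, ports, and the database name are placeholders, per the Stinger preview material):

```sql
-- On each server, point the database at its partner
UPDATE DB CFG FOR sales USING
  HADR_LOCAL_HOST  hostA  HADR_LOCAL_SVC  55001
  HADR_REMOTE_HOST hostB  HADR_REMOTE_SVC 55002
  HADR_SYNCMODE NEARSYNC;

-- Start the standby first, then the primary
START HADR ON DATABASE sales AS STANDBY;
START HADR ON DATABASE sales AS PRIMARY;

-- On failure (or for a planned role switch), promote the standby
TAKEOVER HADR ON DATABASE sales;
```

Contrast this with scripting log shipping by hand, where the DBA owns log transfer, apply, and failure detection.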

Tuesday, May 11, 10:00AM—“The New DB2 Authentication Model”; Il-Sung Lee, IBM Toronto Lab:
I have to admit, this was probably the most difficult session for me. Il-Sung covered the material very well, but this area is so far from my strong suit that it could be gym shorts. Of course, that’s one of the reasons I attended.

So that I don’t go too far beyond my depth, I will attempt to provide a list of key changes in the model with Stinger. The biggest change is that all authentication, group lookup and AUTHID mapping are accomplished with external plugins (as opposed to the OS system authentication and Kerberos approach used in previous releases). In addition, new features such as userID/password remapping and the ability to refuse connections based on communication information have been added. Authentication plugins can be for userID/password or GSS-API (Generic Security Services Application Programming Interface). Plugins are shared libraries dynamically loaded by DB2. The existing security methods provided by the OS and Kerberos are still available but re-implemented as plugins. Three types of plugins are required: client, server and group.

Why plugins, you might ask? IBM’s answer is straightforward: improved flexibility and extensibility; DB2 security can be tailored to interoperate with other system security components; new or obscure authentication techniques can be easily added; and multiple authentication types can be used concurrently.

Most of the presentation dwelled on the details of implementing plugins and the nuances of each type. I won’t go there, and you’ll be glad I didn’t. 

There are several restrictions related to AUTHID case and length, but those are no more restrictive than today’s implementations. In the future, plugins will run in a fenced process, delegation support will be added and multiple flow authentication (e.g. an iterative challenge/response scenario) will be increased from the flow limit of 2 currently imposed by DRDA.

Tuesday, May 11, 12:40PM—“64-bit DB2 on Linux”; Berni Schiefer, IBM Toronto Lab:
Berni is a performance guy (and a darned good one), so it was timely to entertain his advice on Linux implementations. In his overview, he made a key point: Linux is maturing faster than any OS in history. He supported that statement by contrasting the state of the OS in 1998, when Linux ran on inexpensive uniprocessors as a desktop alternative for computer geeks (excuse me, aficionados), with 2004, when Linux distributions (“distros” to the enlightened) are expected to support 16 to 32-CPU systems. He gave examples of the leading distros, SuSE SLE9 and Red Hat RHEL4, both Linux 2.6 kernels, designed for 32-way scaling with 16-way available by Q3 2004. Also, the availability of 64Gb memory support on 32-bit systems and Enterprise Volume Management (e.g. disk support) are key to this year’s progress in the revolution. DB2, the first commercial DBMS to support Linux (since May 1999), has marched in step with this progress.

So, why is 64-bit support such a big deal? To clear any arguments on the performance side, Berni stated that there is no inherent performance benefit to being on 64-bit, and one could even argue that the larger footprint on the box would run slower if related improvements didn’t mask this fact. However, when you leave 32-bit, you leave behind a lot of convoluted memory management tricks that affect usability and manageability. In addition, 64-bit is the foundation for the future across the industry. The sooner you get there, the faster you can get to new solutions that will be built on that foundation.

The latest Version 8 improvements for Linux include the use of scattered IO via a new registry variable, the ability to yield the CPU on a scheduled basis (which can result in better concurrent OLTP performance), and increased addressability on 32-bit systems that are not tied down by direct addressability constraints. In Stinger, the list of pending improvements comprises: use of Direct IO, CPU and Memory pinning, improved scalability through internal locking (latching) updates, better monitoring performance with reduced overhead in handling the monitoring switches (Berni’s favorite switches? Bufferpool, Table and Statement), asynchronous IO (via a registry variable) and large kernel pages (more new registry configuration).

Some time was spent covering the reasons for/against moving to Linux kernel 2.6 (Berni couldn’t come up with a compelling reason at this time) and whether a thread-based processing model would be implemented in DB2 on Linux (again, no compelling short-term reason). Berni’s advice was to go with the standard kernel from your shop’s selected distro, use 64-bit only if you have more than 4Gb of RAM on your box, don’t use the anticipatory IO scheduler in OLTP environments (use the deadline scheduler), and use the distro preference for filesystems (e.g. ext3 for Red Hat).

The most useful slide in the presentation, from my perspective, was a flow depicting how to select a 64-bit Linux platform based on a simple decision tree. After reviewing the various hardware options, Berni concluded with a couple of simple thoughts: Stinger brings 64-bit to all supported platforms; the 2.6 kernel will yield benefits, but more validation is required to determine the correct time to upgrade.

Tuesday, May 11, 2:05PM—Vendor Solution Presentation; Princeton Softech, “Database Archiving—When and How”:
This was another vendor presentation; in this case, a vendor whose products I knew somewhat. Princeton Softech has been around in the “test data build” and archive space for some time. Their products are generally perceived as best of breed in their niche. I attended this session to see if I could pick up any tidbits or rules-of-thumb regarding when to archive, etc. I also wanted to see if they talked about market penetration on the distributed side since I was more familiar with their mainframe products. They did talk of being established on all major DBMS platforms, but there were no nuggets to take away other than “our products have been successfully assisting Fortune 500 companies with their burgeoning data requirements for many years. The major analysts (e.g. Gartner, Meta) believe our solutions are the most complete in the marketplace.” Among the challenges they encounter, the list was topped by Referential Integrity (DBMS-enforced and application-embedded) and the rapidly growing data stores facing their customers. If anyone wishes further details, I took copious notes.

Tuesday, May 11, 3:30PM—“Cut Query Times in Half: Sort and IO Tuning”; Scott Hayes, BMC (formerly Database Guys, Inc):
Scott is bearish on SORTs, in particular. Anyone who has seen his presentations immediately recognizes his style. He reviews his “Science Experiments” in his talks and whittles away at performance until he reaches his goal. His work is very interesting because he takes a very logical, methodical approach to tuning. The presentation is peppered with formulas to measure things like “%Sort Overflow”, tips on what to do if an OLTP application exceeds 3% (sort overflows), and how to avoid Sorts in the first place. With his science experiments, he points out the impact of improper container placement (“Put TEMPSPACE where data isn’t”), the impact of Intra-partition Parallelism and when not to use it (most OLTP environments), when one large bufferpool can work better than many smaller, targeted BPs (many DSS environments) and the associated tuning for the number of IO Servers and Page Cleaners. His “take home tips” list included:

- for transaction-based applications: keep sorts small or eliminate them; reduce sort size (# rows, row width [V8 Tip: no longer need ORDER BY column in SELECT list]); as a last resort, increase SORTHEAP;
- for DSS queries: SORTS Happen! Set Database Manager configuration MAX_QUERY_DEGREE = #CPUs; optimize IO paths via #containers and placement on devices, prefetch size, number of IO_SERVERS, one Bufferpool for data and TEMPSPACE (Warehouse; multiple BPs for OLTP).
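For reference, the “%Sort Overflow” style of measurement comes straight from the snapshot monitor. A sketch using the V8 snapshot table function follows (function and column names are from memory, so verify against your fixpack documentation):

```sql
-- Percentage of sorts that overflowed to TEMPSPACE; per Scott's rule of thumb,
-- investigate OLTP workloads when this exceeds roughly 3%
SELECT total_sorts,
       sort_overflows,
       CASE WHEN total_sorts > 0
            THEN (sort_overflows * 100.0) / total_sorts
            ELSE 0
       END AS pct_sort_overflow
  FROM TABLE(SNAPSHOT_DATABASE('SAMPLE', -1)) AS snap;
```

The same counters appear in the output of GET SNAPSHOT FOR DATABASE if you prefer the command-line route.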

Tuesday, May 11, 5:00PM—RUG volunteers meet with current IDUG President:
In my final volunteer-related gig, I had a chance to chat with David Beulke, a well-known IBM Gold Consultant and current IDUG President. He asked the volunteer team what we liked and/or what we would change related to IDUG’s current initiatives. Our feedback tended to center on the new mini-regional events as a positive direction to get the IDUG and DB2 word spread further and deeper. As RUG leaders, we also asked for continued and broader user group support from both IBM and IDUG. Funding, speaker provision, and local leads (companies starting up with DB2 that could be invited to participate in local user groups) were all trends we wanted to expand.

Wednesday, May 12, 8:30AM—“DB2 UDB V8 Memory Management Updates”; Dwaine Snow, IBM Toronto Lab: Well, here we are in another of my specialty areas (NOT). I must admit I felt better after this session, particularly on learning the details of DB2’s improved (simplified) memory management. This presentation did a nice job level-setting current memory techniques and the improvements made in early Version 8 (prior to Stinger). First, the distinction was made between system (OS-level) resources (physical RAM, process virtual memory and system virtual memory—paging/swap space) and DB2 resources (memory heaps). At DB2 start, shared memory (e.g. bufferpools, shared sort space, application control heap) is allocated, though physical memory is not used until referenced. Paging space may or may not be consumed immediately depending on the OS (for example, Solaris does early swap allocation whereas AIX defers swap; this “lazy swap” on AIX can make paging space volatile, whereas Solaris swap use is easier to predict). Common out-of-memory situations in DB2 are attributed to (1) a request that exhausts a heap, a hard limit within DB2 (most private heaps, e.g. statement and application, are allocated on demand), or (2) a request from DB2’s memory manager to the OS that can’t be satisfied.

Currently, 32-bit DB2 runs into shared memory limits due to the 4Gb virtual memory addressability limit. Shared memory at the database level can include bufferpools, database heap, utility heap, package cache, sortheap_threshold_shr, catalog cache, lock list and approximately 10% overhead. By OS, the management of shared memory differs (e.g. Solaris pins all shared memory by default). Increasing shared memory on a 32-bit system can get tricky as there is limited room to expand things like Bufferpools. As of Version 8 Fixpack 3, online changes impacting shared memory will be changed to DEFERRED (next startup) if there is insufficient available memory. You can use the DATABASE_MEMORY configuration parameter to reserve space for potential increases although this could waste memory on some systems (Solaris) where shared memory is pinned. The INSTANCE_MEMORY parameter can be used similarly at the instance level (AUTOMATIC is the best setting at this point).
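The 32-bit squeeze is easy to see with back-of-the-envelope arithmetic. A hedged sketch of the budget check (the component list follows the talk, but the page counts and the flat 10% overhead figure are illustrative; the real usable shared region is OS-dependent and smaller than the full 4 GB address space):

```python
# Sizes in bytes; 4 KB pages are assumed for the page-count inputs.
PAGE = 4096
GB = 1 << 30

def db_shared_memory(bufferpool_pages: int, dbheap_pages: int,
                     util_heap_pages: int, pckcache_pages: int,
                     shared_sort_pages: int, catalogcache_pages: int,
                     locklist_pages: int, overhead: float = 0.10) -> int:
    """Rough database shared memory demand, plus ~10% overhead."""
    pages = (bufferpool_pages + dbheap_pages + util_heap_pages +
             pckcache_pages + shared_sort_pages + catalogcache_pages +
             locklist_pages)
    return int(pages * PAGE * (1 + overhead))

demand = db_shared_memory(bufferpool_pages=700_000, dbheap_pages=10_000,
                          util_heap_pages=5_000, pckcache_pages=20_000,
                          shared_sort_pages=50_000, catalogcache_pages=5_000,
                          locklist_pages=10_000)
print(demand / GB)        # about 3.36 GB of the 4 GB address space
print(demand < 4 * GB)    # fits under the 32-bit ceiling, barely
```

With numbers like these there is little room left to grow the bufferpools, which is exactly why online increases can be forced to DEFERRED.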

In 64-bit mode, memory allocation issues virtually (pun intended) disappear. On AIX, a 64Gb shared memory segment is allocated up front so it’s easy to find room for all the database memory resources.


If heaps are exhausted, the recommended approach is to increase the size of the heap (if not OS exhaustion); use Version 8 Snapshot or Memory Visualizer to check persistent heap sizes. Sortheap allocation failures are usually related to insufficient private memory for the sort; complex application plans can result in multiple sortheaps allocated concurrently. Going back to Scott Hayes’ tuning presentation, “the best sort is no sort” (and easier to tune).

Wednesday, May 12, 10:10AM—“DB2 Design Advisor: Not Just for Indexes Anymore!” Sam Lightstone, IBM Toronto Lab: This was a very interesting presentation, perhaps my favorite that wasn’t delivered by Matt Huras. Sam did something that I really like: he set the stage with some historical perspective and very interesting statistics. For example, around 2000, the cost of disk storage dropped below the cost of paper storage! Now, who wouldn’t want to know that?

One of the artifacts of today’s system complexity is brought about by the pace of technology evolution; the ubiquitous internet; and interconnections among businesses, customers and vendors. Things have progressed beyond the human capability to understand the myriad moving parts. There are potentially thousands of tuning parameters in today’s complex infrastructures across a growing list of integrated products that continue to add feature upon feature with each new release. How can a human or even groups of humans continue to manage this freight train? IBM’s answer: autonomic computing. Build systems that can manage themselves and make complex decisions based on heuristics and learned behaviors. The goals and assumptions of the past 15 years in systems have been to improve performance and cost. Now, however, complexity has crept into the mantra. Instead of just faster and cheaper, we now have faster, cheaper, easier to manage.

So, what is IBM doing about it in the DB2 space? Enter the much-improved Design Advisor. In early releases of Version 8, the Design Advisor helped to recommend indexing improvements and started to address Materialized Query Tables (formerly known as Automated Summary Tables). Stinger takes the Design Advisor to the next level, addressing Materialized Query Tables (MQTs), Multi-Dimensional Clustering (MDC) and partition design (if using the Database Partitioning Feature, DPF) in addition to indexes. The goal of the Design Advisor going forward is to produce a database design that optimizes average workload response time AND ease-of-use. The first implementation of these new goals is delivered in Stinger. It is key to the autonomic computing strategy, has no additional cost, and is a “push button” solution. The assumption in all this is that the workload provided to the advisor is representative of the system performance to be optimized. Inputs to the advisor are the workload (via several options), system-detected database and system characteristics, and disk constraints. Also new in the Stinger release of the advisor is a concept called workload compression, which provides consistency in the time to deliver the performance recommendations. In other words, no matter the workload size and complexity, the advisor will deliver its results in a reasonable amount of time.

Given the new scope of the Design Advisor, the complexity of the problem is enormous, but so is the value of the solution. Inherent in the assumptions above is that there are really good rules in the advisor to ensure the recommendation comes back in your lifetime. Here, IBM research is the key.


In terms of the goal of ease-of-use, the advisor is fairly straightforward. Most variables on the db2advis command are optional. The compression options are simply high, medium and low. There is a time limit that governs how long it will take to return its results.  With a small workload (e.g. 20 queries), the selection of compression level will make little difference, but with highly complex workload input, that selection is key to the time to deliver. To capture the workload, the options range from a single query on the command line to text file input, derivation from the dynamic package cache, and input from Query Patroller (extra cost product) or event monitors. The GUI interface can be used to add things like frequency to the workload to bias or change results.
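A typical invocation is short. The sketch below assembles a db2advis command line; the -d (database), -i (workload file), -t (time limit in minutes) and -m (features to recommend) flags reflect the V8.2-era syntax as I recall it, so verify against your release, and the workload file name is made up:

```python
# Hedged sketch: assemble a db2advis call. -m letters request Indexes,
# MQTs, MDC (Clustering) and Partitioning recommendations; check your
# release's documentation before relying on this exact syntax.
def advisor_cmd(db: str, workload_file: str, minutes: int = 10,
                features: str = "IMCP") -> list[str]:
    return ["db2advis", "-d", db, "-i", workload_file,
            "-t", str(minutes), "-m", features]

cmd = advisor_cmd("SAMPLE", "workload.sql", minutes=30)
print(" ".join(cmd))   # db2advis -d SAMPLE -i workload.sql -t 30 -m IMCP
```

Building the command programmatically like this is handy when the workload is regenerated nightly from the package cache.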

Under the hood, the Design Advisor is applying a whole bag of tricks to the analysis of the workload. “What if” analysis is used with the query optimizer to evaluate alternatives, multi-query optimization is used to find commonalities among queries, statistical sampling is used to model cardinality and correlations that the query optimizer might miss, and sophisticated search algorithms are borrowed from the Artificial Intelligence community.

Interestingly, tests were conducted using TPC-H workloads to experiment with the Design Advisor’s capabilities. The rules of the TPC-H benchmark prohibit using tools like the advisor, so it was unavailable to the experts tuning the actual benchmark. However, the advisor was able to improve performance 6.5 times. Other experiments included a test against a similar Microsoft design technology. Using the same database, server and workload, the results (which could not be directly divulged due to licensing issues) were astoundingly different in DB2’s favor. Finally, a test was done pitting the Design Advisor against 3 of the best performance gurus at the Toronto Lab. The advisor beat all but one expert. The moral of this is that the best human expert can still beat a tool, but most mortals can’t. How good are you?

Wednesday, May 12, 12:40PM—“DB2 Kernel—The Inside Tour”; Matt Huras, IBM Toronto Lab: Another good one! Matt really knows his stuff and presents it in an understandable fashion. He defined the kernel as the inner core of DB2, comprising data and index management, logging, buffer and IO management, and locking. Since most of his presentation relied on diagrams, I won’t spend a lot of time reviewing it. I do recommend this as a good overview for understanding the internals of DB2. Matt went through each component making up the kernel, describing what each does, how to influence them positively (e.g. related configuration parms) and how to address problems related to each.

He also covered features added in Stinger to the various component managers. For example, the log manager has been updated to write more data to the monitor output to identify where log disk allocations may be insufficient and the log buffers undersized. Additionally, Stinger revises the archive log process: a boon for systems administrators, the user exit for creating archive logs is now replaced with 2 straightforward configuration parameters. On the IO manager side, changes in Stinger include the ability to allow DB2 to automatically determine the best prefetch size. Check out this presentation if the inner workings of the DBMS interest you.

Wednesday, May 12, 2:05PM—Vendor Solution Presentation; IBM “Multi-Platform Tools”:


I haven’t paid a lot of attention to the status of the IBM Multi-Platform (Linux, Unix, Windows) Tools since coming to FPCMS, so here was a chance to catch up. The message seemed very positive, but I still have a hard time finding other customers using IBM tools outside what is included with the DB2 installation. The tools covered included: DB2 Performance Expert, Recovery Expert, High-Performance Unload, Test Database Generator, Table Editor and Web Query Tool. They have all come a long way since I last paid attention, and I think they have a future. I’m just not sure where their cost-benefit sweet spot exists. For companies with IT functions that are fairly decentralized, there may be no sweet spot.

Wednesday, May 12, 3:30PM—“Automate Statistics Collection in DB2 Stinger”; Volker Markl, IBM Research: This presentation addressed one of the important autonomic features in Stinger and provided lots of details, including the impact of the LEO (Learning Optimizer) Project from IBM Research making its way into the distributed DB2 engine. Of course, being from the research side of the house, it was a relatively complex presentation.

As background for the new features, Volker reviewed the job of the Optimizer. When SQL is supplied to DB2, it declares what data is desired but not how to get it (that is the beauty of the relational model). It is the Optimizer’s job to figure out how to obtain the data in the most efficient, cost-effective manner. To do this, the Optimizer mathematically models the cost of execution for the various available methods of accessing the data. These models depend heavily on the cardinality of data values. Since this information can be inaccurate, bad access plans can be created. The majority of bad plans are due to outdated statistics, data skew, correlations between join and local predicates, and assumptions that must be made when cardinality values are unavailable. Simplifying assumptions are nearly always made because the modeling takes place before execution.

To address these issues, one might ask “why can’t the database fix itself?” If the process of collecting statistics on the data in the database tables could be automated, this would be possible. However, it would be best if it were a continuous process for maximum reliability. And, of course, it should be transparent to the user.

The IBM solution is to build a low-impact, throttled process (with throttling parameters for the user to manage the impact) to collect statistics from update activity in the background. The process is controlled via a Policy that defines the tables targeted for this automated support and the window within which it is accomplished. The policy is managed via a RUNSTATS profile which filters feedback from various monitors and analyzers to determine if RUNSTATS should be executed. In addition, certain assumptions are made (e.g. tables defined as volatile are ignored).

Some of the questions addressed are:

When to use? Answer: in general, with the throttle set to low impact.
How to activate? Answer: via the command line processor using several new config parms or via the Control Center (right click on the database and select “configure automated statistics maintenance”).

There are additional features added that target an existing challenge when correlation values among correlated columns are not known. An example is in the automotive industry, where the manufacturer (e.g. Honda) and make (e.g. Accord) have a relationship that is unknown to the optimizer (i.e. Accords are only manufactured by Honda, making normal predicate filtering assumptions invalid). The automated collection of statistics on grouped (i.e. correlated) columns can help here and is another feature of the Stinger release (both for automated and normal RUNSTATS execution). Further, the automation of figuring out these correlations is added via features from the LEO project. It is called Automated Statistics Profiling and uses the results of feedback from previous queries to determine where Column Group Stats will provide benefit. To support the collection of this information, a Feedback Warehouse is created in the SYSTOOLS schema to save the learnings of the optimizer. Stats Profiling can be run in two modes: Recommendation Mode, which only stores recommendations in the Feedback Warehouse, and Modification Mode, where the recommendations are applied by changing the RUNSTATS Profile for each table that has query feedback indicating better stats are needed. There are a bunch of new config parms to drive turning all this neat stuff on. Whew!
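The Honda/Accord example is easy to quantify. A toy calculation of why the independence assumption misestimates correlated predicates (all row counts are invented):

```python
# Independence assumption vs. reality for correlated predicates
# (make = 'Accord' implies manufacturer = 'Honda').
total_rows = 1_000_000
honda_rows = 100_000   # rows with manufacturer = 'Honda' (selectivity 0.10)
accord_rows = 20_000   # rows with make = 'Accord' (selectivity 0.02)

# Without column-group stats, an optimizer multiplies the selectivities:
independent_estimate = honda_rows * accord_rows // total_rows   # 2,000 rows

# In reality every Accord is a Honda, so the combined predicate
# qualifies all 20,000 Accord rows -- a 10x underestimate.
actual = accord_rows

print(independent_estimate, actual)   # 2000 20000
```

A 10x cardinality error at the bottom of a plan can easily flip a join method or join order, which is exactly what Column Group Stats are meant to prevent.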

When would you use this feature? In general, complex OLAP and DSS applications are the target. It could be turned on and off in certain systems to assist in debugging problem queries or left on in systems that can tolerate the continuous overhead.

Currently, the automated stats features are only available for DB2 ESE. 

Wednesday, May 12, 5:30PM—Speaker and IBM Developer networking session: This was an opportunity to talk with the speakers and IBMers at the conference. I engaged several of the user speakers to see how the other half lives. I had a hard time finding user speakers on distributed DB2 topics. However, I did discover that my co-presenter at the RUG Business Meeting did a presentation on improving distributed DB2 performance that saved his company roughly $2M. So, we talked, and I discovered that much of the tuning work we were doing on GPWS paralleled the steps he had used.

Thursday, May 13, 7:58AM—DB2 Certification exam #701: This was the last day to fit in my certification exam, and the pressure was on! I checked with the registration folks late Wednesday as to what time I’d need to queue up to make the first wave of exams (since I wanted to miss as little of the conference proceedings as possible). They told me 8AM. I arrived at 7:58 and was second in line. By 9:30 I was certified, and the pressure was off.

Thursday, May 13, 10:00AM—“High Availability and Disaster Recovery”; Dale McInnis, IBM Toronto Lab: HADR is one of the coolest features delivered in Stinger. Not only is it a valuable function that provides a viable alternative to log shipping and DB2 replication, but it delivers on the promises IBM made when they acquired Informix. They said they would take the best features of Informix and fold them into DB2. Most DB2 bigots didn’t believe this would happen and felt there wasn’t much worth porting. They were wrong then and have been proven wrong with this feature. When the DB2 LUW Panel was asked what feature in Stinger was their favorite, the first one picked was HADR, and each ensuing panel member said “well, HADR is already picked, so I’ll go for ‘feature x’.” Telling. . .

As a lead-in to the DB2 implementation discussion, Dale McInnis related a 9/11 story. A key Informix financial services customer in New York City lost their data center services but simply recovered on the other side of the river using HADR with no loss of data. There are thousands of Informix customers using this feature for disaster recovery.

The problem with High Availability and Disaster Recovery is that the cost of complete redundancy and zero data loss is so high as to be practically unattainable. So, the goal is to minimize cost while shrinking data loss. That’s where HADR comes in. The key attraction of HADR is its fit into the sweet spot for disaster recovery operations: low cost, simple to implement. Essentially, it takes the log shipping concept one step further to the log transaction level, bringing the maximum data loss down from the level of a log file to a log buffer. Data loss is confined to what’s in the log buffer at failure time that didn’t get flushed.

The goals for the HADR design are:

- very fast failover
- ease of administration
- minimal performance overhead
- configurable granularity of data loss exposure
- ability to upgrade software and avoid downtime
- simple integration with (and eventual elimination of) HA software
- transparent failover/failback (with automatic client re-routing)

The initial design is for 2 active machines, a primary and standby. The standby is cloned from the primary and receives log transactions shipped from the primary. It is in perpetual rollforward mode, hence, at least in the initial version, the standby cannot be a read-only reporting environment. If the primary fails, the standby takes the workload. When the primary is available again, resynchronization makes it easy to return it to primary status.

HADR setup is very straightforward (a wizard is available in Stinger): (1) clone the primary database using restore [NOTE: database names must match], flash copy or split mirror, (2) start the standby, (3) start the primary. At this point in its development, the rules of engagement dictate that strict symmetry at the tablespace and container level is maintained across the primary/standby to ensure that any container operations are replicated. Names, paths, sizes must match. On starting the standby, local log catchup occurs followed by remote log catchup (primary must be available). To ensure that only one primary is designated at any time, the primary start waits for the standby contact (unless force-started); it will wait a configurable period of time, after which HADR will not start. After the primary is started (“peer state”), HADR is active and failover to the standby is available. A heartbeat process is established between the two servers to monitor status. Whenever log buffer pages are flushed to disk (commit, log full, backup, database deactivate), the same pages are pushed to HADR and shipped to the standby. A log shredder process breaks down the log buffer data into individual log records (only updates are shipped).

The granularity of data loss exposure between the 2 HADR servers can be controlled by the Synchronization Mode (a new database configuration parameter, HADR_SYNCMODE). There are three choices:

1. Synchronous (zero data loss)—log data is flushed to stable storage on the standby; the log flush and TCP send/receive are serialized; standby will not write to log until primary says the same log data has been written to disk; primary cannot move to next log flush until standby acknowledgement (log data written to log); commit successful when log data on disk at both sites.

2. Near-Synchronous—log data successfully sent to standby but it may not be on stable storage when primary commits; log write at primary and send to standby occur in parallel; commit successful when log data on disk at primary and received at standby.

3. Asynchronous—log data given to TCP/IP and socket send to standby returned successfully; does not indicate successful receive (!); since TCP/IP sockets guarantee delivery order, while socket is alive there will be no missing or out-of-order packets at standby; log write at primary and send to standby occur in parallel; commit successful when log data on disk at primary and sent to standby. 
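The three modes boil down to different commit-time guarantees. A paraphrase of the numbered list above as a small lookup (my wording, not DB2 output or official parameter documentation):

```python
# Commit-time guarantee per HADR_SYNCMODE, paraphrasing the session notes.
SYNCMODE = {
    "SYNC":     "log data on disk at BOTH primary and standby",
    "NEARSYNC": "log data on disk at primary and RECEIVED at standby",
    "ASYNC":    "log data on disk at primary and SENT to standby",
}

def max_loss_on_failover(mode: str) -> str:
    """What can be lost if the primary dies right after a commit."""
    return {
        "SYNC":     "nothing (zero data loss)",
        "NEARSYNC": "transactions whose log data the standby held only in memory",
        "ASYNC":    "transactions still in flight on the network",
    }[mode]

print(max_loss_on_failover("SYNC"))
```

Stepping through the table makes the latency/exposure trade-off explicit: each step toward ASYNC removes one wait from the commit path and adds one failure window.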

Failover from primary to standby is accomplished via the TAKEOVER command. In normal operations, this would change the standby to primary when the primary fails. This avoids the current process of db2start/restart database/rollforward required with HACMP and other clustering implementations. It can be embedded in simple heart-beat scripts and will be delivered via a heart-beat/coordinator in a future release. The TAKEOVER is issued at the standby in either normal (no force) or emergency (force) mode. For client applications connected to the primary, automatic re-routing to the new primary is available. They will see a communication error for the failed transaction, but the reconnect will be done for them. In addition to supporting normal failover situations, TAKEOVER can be used to facilitate software upgrades on-the-fly to avoid downtime. While in peer state, the standby can be suspended, HADR stopped, standby upgraded, standby started (catches up with primary), TAKEOVER issued, new standby suspended, etc.

As mentioned above, there is an HADR Wizard to implement HADR. It walks the user through identifying the server pair, preparing the primary for shipping, executing a backup, using the backup to clone the standby with a restore, moving any database objects not included in the backup, updating HADR config parms, and, optionally, starting HADR. Note that the config parms (7) impacting HADR do not enable/disable HADR; this separation allows disabling HADR without losing config info. To monitor HADR, the configuration settings are available via “db2 get db cfg for ” and “db2 get db snapshot. . .” will show “HADR Status” info. An interesting bit of data in the snapshot is the “Log gap running average” which shows how far behind (in bytes) the standby is.
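The “Log gap running average” is just a smoothed byte count. A toy monitor using an incremental arithmetic mean (the session did not document DB2’s exact smoothing formula, so the averaging choice here is an assumption):

```python
class LogGapMonitor:
    """Running average of the primary/standby log gap, in bytes."""
    def __init__(self) -> None:
        self.samples = 0
        self.avg = 0.0

    def observe(self, gap_bytes: int) -> float:
        self.samples += 1
        # incremental arithmetic mean: avg += (x - avg) / n
        self.avg += (gap_bytes - self.avg) / self.samples
        return self.avg

mon = LogGapMonitor()
for gap in (0, 8_192, 4_096, 12_288):
    mon.observe(gap)
print(mon.avg)   # 6144.0
```

A steadily growing average is the signal to watch for: it means the standby is falling further behind the primary’s log stream.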


Of course, everyone will want to know what the performance impact of running HADR is on the existing systems. In theory, the standby (assuming a similar machine) should be able to apply transactions faster since it doesn’t have to worry about locking, connection management, generating logs, etc. At the primary, choice of mode impacts the overhead. Synchronous and Near Synchronous mode will delay commit processing slightly. If the connection between the pair is lost, transactions could experience a commit delay equivalent to the HADR_TIMEOUT parameter setting. In the case of a failover, the performance will be similar to that at DB2 startup. Network requirements can be determined by monitoring log activity and measuring the logging rate. This rate will indicate the network transfer rate for planning. Future releases of HADR may introduce shipping non-logged operations which could generate significant traffic. Also, logging of index builds will increase logging requirements (if they are not logged, they will need to be rebuilt on the new primary after a takeover).
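The network planning arithmetic the paragraph implies can be sketched in a couple of lines; the 50% headroom factor is my own assumption, not from the session:

```python
def required_bandwidth_mbps(log_mb_per_sec: float, headroom: float = 1.5) -> float:
    """Network megabits/s needed to ship the measured log write rate,
    with headroom for catchup after an outage (headroom is an assumption)."""
    return log_mb_per_sec * 8 * headroom

# A system sustaining 5 MB/s of logging needs roughly a 60 Mbit/s link
# between the HADR pair with 50% headroom.
print(required_bandwidth_mbps(5.0))   # 60.0
```

Measure the logging rate at peak, not on average, since the standby must keep up with the worst case to stay in peer state.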

Backup and restore have a few restrictions when running HADR. Backup is supported on the primary but not on the standby (HADR does not replicate the Recovery History File). Database level restores are allowed on both sides BUT, after restore, the database role is changed to standard and will not automatically reconnect to the HADR partner. Tablespace level restores are disallowed on both sides as well as redirected restores.

Several additional ground rules for HADR: circular logging must not be used (logs are archived only at the primary; the standby deletes archives based on the primary’s rules); log mirroring works normally on both sides; configuration parms are not shipped (they may be in a future release); shared libraries/DLLs for UDFs and Stored Procs are not shipped (I need to check whether the new native PSM compilation process precludes the need to worry about moving the executables from primary to standby). The following are unsupported in the initial HADR release: infinite logging, non-logged operations, DataLinks, log file backups on the standby, LOAD with the COPY NO option, ESE DPF, and any version of DB2 other than ESE. Also, the primary and standby databases must be at the same bit-level (32 or 64) and reside on the same OS version and DB2 version (except for the brief time during a rolling upgrade). At this time, only two servers are allowed in the HADR configuration; a planned improvement is to move to a three-machine configuration that enables both failover and DR. HADR does work with cluster managers (e.g. HACMP); the cluster manager is responsible for detecting failure and would drive failover by issuing the TAKEOVER command.

Finally, HADR was compared to the new Q-replication facilities implemented via DB2 II Replication Version. Significant differences draw a clear picture as to where each would be useful. Q-replication is significantly more difficult to manage (though easier than SQL replication) but can replicate subsets of data whereas HADR is easy to manage but replicates at the database level. For normal DR operations, HADR would be a good choice, but if the DR environment is required in read mode (e.g. reporting), replication is the solution. HADR can be configured to guarantee “zero data loss” whereas Q-replication cannot.

In summary, HADR is easy to use and automates shipping of log changes at the buffer level. Its goal is to solve a clearly defined problem well. Stinger sets the stage for future HA and DR improvements.


Thursday, May 13, 12:30PM—Spotlight Session “Stinger: An Insider’s Look at the Next Release of DB2”; Matt Huras, IBM Toronto Lab:

This was an excellent overview of Stinger.  I recommend review of this presentation to anyone wishing a complete view of Stinger without having to review multiple presentations.

Since I have already covered much of the detail in other Stinger presentation reviews, I’ll try to give a quick rundown of the tips that I may have missed in the other sessions.  I may be a bit redundant where I think new features may be very valuable.

In summary, the big, glitzy Stinger features include: HADR; Integrated Design Advisor; Automated Statistics Collection (and LEO features); Autonomic utilities such as BACKUP, RUNSTATS and REORG. Though not a Stinger feature (included in Masala), Q-Replication is another major enhancement. However, there are lots of smaller improvements that may actually outweigh the jumbo features in terms of immediate value. These are:

- Native PSM (SQL Procedure Language), eliminating the C compiler requirement
- REOPT bind options enabling runtime optimization of host variables
- Ability to use Direct IO on available OSs
- Improved MDC Insert performance
- New locking/concurrency option (EVALUATE UNCOMMITTED) enabling evaluation of predicates without locking
- Sampling in RUNSTATS (finally), not just in automated RUNSTATS
- Log Manager improvements eliminating the user exit for archive logs (the user exit setup is replaced by a simple database config parm update)
- New RECOVER command combining RESTORE and ROLLFORWARD (for example, to recover a database to the end of logs using the best available backup: RECOVER DB ; this brings mainframe recovery features to LUW; point-in-time recovery is also simplified, as the recovery history file has been updated to capture info on multiple log chains)

Thursday, May 13, 2:00PM—IBM Panel Discussion: DB2 UDB for Linux, Unix, Windows; George Baklarz, Berni Schiefer, Matt Huras, Bill O’Connell, Leon Katsnelson, IBM Toronto Lab; Pat Selinger, Mike Swift (moderator), IBM Silicon Valley Lab: This has been a great session at every IDUG I’ve attended. The panel members take questions from the floor as well as prepared questions that have been submitted throughout the week. The panel this year was an excellent mix: the best of Toronto, Silicon Valley’s Pat Selinger, and Mike Swift as moderator. I consider this to be one of the highlights of the conference and always ask people why they leave early and miss what is a combination “wrap up” and futures discussion. Anything is fair game, and you often hear things that would never come up during the regular sessions.

This was the first year, however, that I attended the distributed DB2 panel. Previous years found me at the mainframe session when I was still a semi-practicing DBA. However, I was not disappointed. The LUW session was just as vibrant as I had found the event in other years.


Most of the questions (and answers) at this session are listed:

1. Will there be a MacOS port for DB2? Client is there today; no plans for server.

2. What about NUMA Architecture exploitation? DB2 is becoming more NUMA-aware (see slides in Matt Huras’ overview of Stinger). In the Data Warehouse area, already using NUMA; starting to use in the OLTP arena.

3. What’s the status of Linux on the Mainframe?  64-bit DB2 will be on zLinux this year. zLinux is more client/server to DB2 on zOS versus Federated. DB2 Connect is 64-bit on zLinux today.

4. What is the DB2 direction on packaging Industry Solutions? Compliance with standards is a major direction. Rational acquisition adds more to the total business solution picture in terms of packaging. The whole business process workflow is being enhanced. Sales force is now more solution-focused.

5. When should RUNSTATS sampling be used and do you have any rules-of-thumb for percentages? It depends on size of tables, frequency of updates, etc. For small tables, do the whole table; for large, compare full stats timing to various sample percents and measure query results along the way to find an optimal setting. Remember that RUNSTATS can be throttled in Stinger if sampling is not desirable.

6. What is the future of DB2 on the disconnected database side? DB2 Everyplace is the solution; it’s missing a lot due to size of footprint, but all SQL is upwardly compatible. Synchronize available via Websphere. Now shipping Everyplace with DB2 servers (unlimited devices)—cheap disconnected solution.

7. What is the IBM response to “free” (open source) databases? There is a Migration Tool Kit for MySQL to DB2.

8. Are there plans to include process design into the Design Advisor? Websphere and Rational are part of Information Integration directions toward unified solutions.

9. Is there an IBM shift from Java to .NET? No, there were more sessions on .NET at this conference due to less maturity and its current popularity (there have been many Java sessions at previous IDUGs).

10. Is there an easy way to get multiple databases to work together without Federation? Federation is the IBM solution. You could use MQTs to hide Federation in some cases.

11. What is the difference between DIO and CIO on AIX? They are two different flavors of direct IO to bypass filesystem cache. DIO (Direct IO) is older; CIO (Concurrent IO) is new and a superset of DIO on AIX 5.2, JFS2. These are available in Version 8 Fixpack 4. Stinger provides for SMS and DMS.
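In DB2, the Version 8 FixPak 4 support mentioned above surfaces as a tablespace attribute rather than a mount option. The sketch below uses hypothetical container paths and sizes; on AIX 5.2 with JFS2, the `NO FILE SYSTEM CACHING` clause lets DB2 open containers with CIO/DIO instead of going through the filesystem cache.

```sql
-- DMS tablespace bypassing the filesystem cache (paths hypothetical).
CREATE TABLESPACE ts_data
  MANAGED BY DATABASE
  USING (FILE '/db2/containers/ts_data_01' 25600)
  NO FILE SYSTEM CACHING;
```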

12. What is the easiest way to influence the Optimizer? We’d like you to trust the Optimizer. IBM has a very sophisticated Optimizer. “Hints” is a known requirement (mostly from Oracle users). Stinger should help with its Correlated Statistics Groups.  [At this point, long-time DB2 devotee and Gold Consultant Martin Hubel voiced his opinion that IBM should not spend time on manual optimizer overrides (Hints or otherwise); he asked for audience opinion and we universally agreed.]

13. Is there any place for junior DB2 people to get up-to-speed on best practices? More books are coming (Prentice Hall, Redbooks, white papers) but no solution targeted directly at junior folks.

14. Are customers running warehouse applications on the same server as OLTP? On big pSeries boxes, we see folks carving them up. The trend seems to be toward ODS and mixed DSS/OLTP.

15. Is there a trend in package vendor movement to DB2?  Yes, Microsoft and Oracle continue to drive customers to DB2 by competing with package vendors. This is also happening in the BI space.

16. Will there be a client GUI on non-Windows platforms? Feedback still indicates lots of command line folks are out there. Some discussion ensued as to whether this was a real requirement or an “anything but Windows” environment issue.

17. What percentage of customers are moving to 64-bit? Most on V8 are going 64-bit as they migrate.

18. Has Version 8 closed the gap in the DB2 Family platforms? Yes.

19. What are the Eclipse directions? Eclipse is moving from pure Java to an IDE approach.

20. Will all the features described for Stinger at the conference be available in the initial release of Stinger? Yes, there are no plans at this point to deliver them as later Fixpacks.

21. What is the most important feature in Stinger? George: HADR; Berni: Design Advisor; Matt: Automated Stats Collection; Pat: Q-Replication (though not really “Stinger”, the timeframe is the same); Bill: native PSM compiling; Leon: the Bee. In response to user opinions, a side discussion arose regarding features “in the works” though not in Stinger: Table Compression and Range Partitioning led that list.

22. When will we be able to read the Standby HADR database (as in Oracle DataGuard)? Well, Oracle doesn’t give you instant recovery and read-only—you choose one or the other. Also, they are shipping logs then creating SQL at the standby to apply changes (not high performance). [Additional differentiator: Oracle does not provide the automatic resynchronization available when the primary becomes available again.] IBM is working with third-party vendors to provide read capability in the near-term.

23. Can we expect new data modeling/diagramming with Stinger? Rational Rose is not part of Stinger but IBM is shipping some products to get the exposure.

24. Where can we find information on mapping data to devices (especially on Windows)?  Windows folks tend to take a whole pile of drives and make one big logical volume. There is a new Redbook addressing this subject (Phil Gunning is one of the authors). Several of the Prentice Hall books also address this topic.

25. What’s after Stinger? See Pat’s “6 little words” in her Keynote (Integrated, Linux, XML, Content, Real-time, Autonomic); these define the direction.

26. Where do you see DB2 in terms of small-to-medium business penetration? We are competing fiercely with Oracle and Microsoft in that space. Delivering application tools is key and Stinger helps. Manageability is also important as those shops often are low on experienced resources. Autonomic strategy helps there. ISVs are critical and the indications from the Express initiative look very good.

27. When is Stinger in beta and GA? Beta 2 is on the web; Q3 is GA target but it depends on the Beta feedback.  

Thursday, May 13, 3:30PM – Closing Session: Best Speaker Awards

Top User Speaker: Bryan Paulsen, John Deere; “DB2 V8 Point-in-Time Recovery for ERP”

Best Overall Speaker: Steve Rees, IBM Toronto Lab; “The Doctor is In! Advanced Performance Diagnostics in DB2”    

IDUG 2005 will be held in Denver, Colorado, May 22-26.
