P582

download P582

of 1

Transcript of P582

  • 8/8/2019 P582

    1/1

    Extracting Large Data Sets using DB2 Parallel EditionSriram Padmanabhan

    IBM T.J. Watson Research CentersrpOwatson.ibm.com

    Commercial parallel database systems such as DB2Parallel Edition (DB2 PE) [l, 21 are delivering the abil-ity to execute complex queries on very large databases.However, the serial application interface to these data-base systems can become a bottleneck for a growinglist of applications such as mailing list generation anddata propagation from a warehouse to smaller datamarts . In this abstract, we describe the CURRENT ODEand NODENUMBERunctions provided by DB2 PE andshow how these two functions can be used to retrievedata in parallel in a linearly scalable manner with re-spect to the number of nodes in the system.

    Before proceeding further, we should point out thatDB2 PE uses a hash partitioning strategy to distrib-ute rows of a table to nodes in a nodegroup which isa user-specified subset of system nodes. We apply asystem-specified hashing function on the user-specifiedpartitioning key values to generate a partition num-ber. This number is used as an index into a partitionmap (which can be modified by users) to find the nodenumber where the row will be stored.CURRENT ODE nd NODENUMBERunctions

    The CURRENT ODEvalue function has a valueof the node where the application is connected.CURRENT ODE oes not have any arguments and aquery could use the CURRENT ODEunction whereverother value functions can be used, e.g., in the selectlist or in search conditions. The NODENUMBERunctionreturns the node number of each row to which it isapplied. The argument, of this function is a columnname, i.e., NODENUMBER(col),ut in reality the col-umn name is an alias to obtain the table name andapply the function on the partitioning columns of thePermiss ion to copy without fee all 07 part of this material isgranted provided that the copies are not made or distributed fordirect commercial advantage, the VLDB copyright notice andthe title of the publication and its date appear, and notice isgiven that copying is by permission of the Very Large Data BaseEndowment. To copy otherwise, or to republish, requites a feeand/or special permission from the Endowment.Proceedings of the 22nd VLDB ConferenceMumbai(Bombay), India, 1996

    table.Extracting data in Parallel

    If the task is to perform a large extract on a singletable T tiith the following SQL statement:

    SELECTT.a, T.b,... FROMTWHEREarbitrary predicates)then we can write an application using the followingmodified statement.

    SELECTT.a, T.b,... FROMTWHEREarbitrary predicates) ANDNODENUMBER(T.a)=CURRENTODEThe application will return all rows residing on thenode where it is connected. In order to perform thecomplete extract, one must, issue this statement fromall nodes in the nodegroup containing table T andcombine the set of partial results. The user may findthe list of nodes containing partitions of table T byquerying the system catalog tables. DB2 PEs op-timizer recognizes the predicate NODENUMBERCT.) =CURRENT ODE nd creates an optimized plan whichexecutes the query only on one node. For a large ta-ble, the parallel extract is only bounded by the timeto retrive any one partition of the table and therefore,the solution scales linearly with the number of nodesacross which a table is partitioned. This basic schemecan be extended to support complex queries but wecannot present the details due to the space limit.Acknowledgement: Thanks to L. Kollar, A. Jhin-gran, and T. Malkemus for their significant contribu-tions to the design and develeopment of these func-tions.References[l] BARU, C. K., ET AL. DB2 Parallel Ed ition. IBM

    Systems Journal 34, 2 (1995), 292-322.[2] IBM. DB2 Parallel Edition for AIX, Admin istra-tion Guide and Reference, Sept. 1995. Product

    Manual SCO9-1982-00.

    582