The one-million table partitions challenge in an ATLAS experiment DB application @ CERN
Gancho Dimitrov (CERN), on behalf of the ATLAS collaboration
Bulgarian Oracle User Group conference (BGOUG), 7th-9th June 2019, Borovets resort
About Me
•Member of the ATLAS experiment database group since 2006
•Acquired (some) DB related knowledge throughout the last 15+ years
•Main focus on: data management, database schema design, database performance tuning
•Certified in Oracle RDBMS (9i, 10g, 12c)
•Certified Toastmasters International Competent Communicator
CERN, the European Organization for Nuclear Research, was founded in 1954. It is situated on the French-Swiss border near Geneva.
2400 staff members work at CERN as personnel, plus 10 000 more researchers from institutes worldwide.
LHC
• 27km ring of superconducting magnets
• 100m beneath the France–Switzerland border
• 2 high-energy particle beams travel at close to the speed of light before they are made to collide
• 600M collisions per second
The LHC (Large Hadron Collider) is the world’s largest particle accelerator.
ATLAS experiment data taking video…
Events (particle collisions) in the ATLAS detector @LHC
Particle collisions are called Events
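Data hierarchy (figure labels): Datasets, Files, Events
[Figure: the ATLAS detector, with labels Muon Spectrometer, Calorimeter, Magnet System and Inner Detector; marked dimensions 25 m and 44 m, weight 7000 t]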
Event data are stored in files outside the DB (435 PB)
The Event WhiteBoard (EWB) project @ Oracle 18.3
•EWB concept: logically groups particle collision Events (aka “collections”)
•EWB collections: consumed by a system which processes them in Event ranges
•The smallest event range for processing: a single Event
•Collection’s flexible metadata: JSON block in VARCHAR2 (up to 32K)
•EWB collection removal: once processing of a given collection is finished
•Lifetime of an EWB collection: from week(s) to month(s)
•Daily rate of EWB collection creation: tens to hundreds or thousands
•Rows per EWB collection: thousands to millions
Challenge in the DB schema design
Common engineering solution (might not scale in several years)
versus
Over-engineered system (should scale, but likely to be unnecessarily complex)
An Event Whiteboard (visual representation)
Idea for the EWB data physical organization
•Data of each EWB collection: self-contained in a table partition
•Table indices: partitioned in the same fashion as the table
•EWB “sponge” process: removes EWB collection(s) data by dropping partition(s)
Quiz
What is the maximum allowed number of partitions (or sub-partitions) in a single table in Oracle RDBMS?
1) 100 thousand
2) 500 thousand
3) 1 million
4) More
Quiz answer
The maximum allowed number of partitions in a single table in Oracle is 1048575 = (1024 * 1024) - 1
The max number of EWB collections at any given time is limited to 1048575
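To see how far existing tables are from this ceiling, one option is to count partitions in the data dictionary. The query below is only a sketch and is not taken from the slides; it uses the standard USER_TAB_PARTITIONS view and counts top-level partitions only, so sub-partitions of composite-partitioned tables would need to be counted separately.

```sql
-- Partition count per table in the current schema, with the headroom left
-- before the limit of (1024 * 1024) - 1 partitions is reached.
SELECT table_name,
       COUNT(*)                    AS existing_partitions,
       POWER(2, 20) - 1 - COUNT(*) AS remaining_headroom
FROM   user_tab_partitions
GROUP  BY table_name
ORDER  BY existing_partitions DESC;
```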
The ATLAS DBAs experience (and how far can we go?)
•The ATLAS EventIndex is a catalogue containing information about the basic properties of the Events as well as references to their storage location.
DATASETS (parent table): DATASET_ID, PROJECT, RUNNUMBER, STREAMNAME, PRODSTEP, DATATYPE, AMITAG
EVENTS (child table, LIST PARTITIONED BY DATASET_ID): DATASET_ID, EVENTNUMBER, LUMIBLOCKN, BUNCHID, GUID0, GUID1, GUID2
Key labels in the diagram: Unique Key, Primary Key
CREATE TABLE child_table (
  DATASET_ID NUMBER(10,0),
  ...
  CONSTRAINT cons_name PRIMARY KEY (..) USING INDEX COMPRESS 1 LOCAL
)
PCTFREE 0 COMPRESS BASIC TABLESPACE &&m_tbs
PARTITION BY LIST (DATASET_ID)
( PARTITION DATASET_ZERO VALUES (0) );
The ATLAS DBAs experience (cont.)
•The EventIndex system has been in production since March 2016
•As of May 2019, the EventIndex hosts 177 billion rows (event records) unevenly distributed in about 73300 list-type partitions (rate of about 25K new partitions per year)
•Note: no concurrency in data load, compression on (basic deduplication)
TABLE PARTITIONS: 3.3 TB
INDEX PARTITIONS: 2.9 TB
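As a rough answer to “how far can we go?”, the figures quoted for the EventIndex can be extrapolated. The calculation below is a back-of-the-envelope sketch that assumes the rate of about 25K new partitions per year stays constant and that no partitions are dropped; neither assumption comes from the slides.

```sql
-- Extrapolation from the May 2019 figures: 73300 partitions, ~25000 new ones per year.
SELECT POWER(2, 20) - 1                             AS partition_limit,
       POWER(2, 20) - 1 - 73300                     AS remaining_partitions,
       ROUND((POWER(2, 20) - 1 - 73300) / 25000, 1) AS years_until_limit
FROM   dual;
```

Under these assumptions 975275 partitions remain, which corresponds to roughly 39 years at the current rate.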
EWB simplified tables layout (partial)
COLLECTIONS (parent table)
  COLLECTION ID
  COLLECTION NAME
  COLLECTION TYPE
  COLLECTION STATUS
  CREATION TIME
  MODIFICATION TIME
  … more columns …
  COLLECTION METADATA (JSON block in VARCHAR2(32K))
  Keys: PRIMARY KEY, UNIQUE KEY
COLLECTION_OBJECTS (child table)
  COLLECTION ID
  STORAGE SERVER ID
  EVENT RANGE MIN ID
  EVENT RANGE MAX ID
  EVENT RANGE STATUS
  … more columns …
  EVENT RANGE METADATA (JSON block)
  Keys: PRIMARY KEY
How to partition the EWB “collection_objects” child table?
Design approach “1” Range partition on COLL_ID with INTERVAL(value)
Idea: introduce automatic RANGE + INTERVAL partitioning based on the COLL_ID’s incremental values.
The INTERVAL value can be changed/adjusted over time.
Range partitioned table with INTERVAL(10)
CREATE TABLE COLLECTION_OBJECTS (
  COLL_ID NUMBER(10,0),
  ...
  CONSTRAINT COLLOBJ_PK PRIMARY KEY (...) USING INDEX LOCAL
)
PARTITION BY RANGE (COLL_ID) INTERVAL (10)
( PARTITION COLLOBJ_ZERO VALUES LESS THAN (0) );
If necessary, the INTERVAL value can be changed over time:
ALTER TABLE COLLECTION_OBJECTS SET INTERVAL (50);
Table altered.
Range Interval automatic partitions creation
INSERT INTO COLLECTION_OBJECTS (coll_id) VALUES (1E3);
INSERT INTO COLLECTION_OBJECTS (coll_id) VALUES (1E4);
INSERT INTO COLLECTION_OBJECTS (coll_id) VALUES (1E5);
INSERT INTO COLLECTION_OBJECTS (coll_id) VALUES (1E6);
INSERT INTO COLLECTION_OBJECTS (coll_id) VALUES (1E7);
The table now has six partitions (the initial one plus five created automatically):
table_name          part_position  partition_name  high_value
COLLECTION_OBJECTS  1              COLLOBJ_ZERO    0
COLLECTION_OBJECTS  2              SYS_P560581     1010
COLLECTION_OBJECTS  3              SYS_P560587     10010
COLLECTION_OBJECTS  4              SYS_P560591     100010
COLLECTION_OBJECTS  5              SYS_P560610     1000010
COLLECTION_OBJECTS  6              SYS_P560663     10000010
Range Interval automatic partitions creation (cont.)
INSERT INTO COLLECTION_OBJECTS (coll_id) VALUES (10485740);
ORA-14300: partitioning key maps to a partition outside maximum permitted number of partitions
Why? The table currently has only 6 partitions(!)
Oracle allows max 1048575 partitions per table.
With INTERVAL(10) => the MAX allowed COLL_ID should be 10485750?
INSERT INTO COLLECTION_OBJECTS (coll_id) VALUES (10485739);
1 row created.
table_name          part_position  partition_name  high_value
...
COLLECTION_OBJECTS  6              SYS_P560663     10000010
COLLECTION_OBJECTS  7              SYS_P561534     10485740
“Range Interval” partitioned table
•There is a cap of 1048575 partitions, even if these partitions are not yet in existence. Oracle “pre-defines” all potential partitions.
•Sooner or later (depending on the COLL_ID value and the INTERVAL value) we will get the ORA-14300 error, even though a fraction of the existing partitions will be removed over time.
•Partition removal using PARTITION FOR(…) behaves in a particular way:
ALTER TABLE COLLECTION_OBJECTS DROP PARTITION FOR (0);
removed the partition with high value 1010 (0 is its lower boundary)
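The boundary seen in the test (10485739 accepted, 10485740 rejected) is consistent with a simple model: the initial range partition takes position 1, and every INTERVAL(10) slot above the transition point 0 takes the next position, whether or not that partition exists yet. The query below only sketches this arithmetic under that assumed model; it is not taken from the slides and does not touch the COLLECTION_OBJECTS table.

```sql
-- Assumed model: implied position = 1 (initial range partition)
--                                 + FLOOR(coll_id / interval) + 1 (interval slot),
-- with transition point 0 and INTERVAL(10).
WITH probe (coll_id) AS (
  SELECT 10485739 FROM dual UNION ALL
  SELECT 10485740 FROM dual
)
SELECT coll_id,
       FLOOR(coll_id / 10) + 2 AS implied_position,
       CASE
         WHEN FLOOR(coll_id / 10) + 2 <= 1048575 THEN 'fits'
         ELSE 'beyond the limit (ORA-14300)'
       END AS outcome
FROM   probe;
```

Under this model 10485739 maps to position 1048575 and 10485740 to position 1048576, which matches the outcome of the two INSERT statements.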
Design approach “2” List partition per COLL_ID single value
Idea: each table partition contains data of a single EWB collection. Removal of any EWB collection data would be straightforward.
List-type partition for each data collection
AUTOMATIC option is avoided because a partitioned JSON SEARCH INDEX could not be created (Oracle 18c, 19c):
CREATE SEARCH INDEX idx_name ON COLLECTION_OBJECTS(OBJ_METADATA) FOR JSON LOCAL;
ORA-29886: feature not supported for domain indexes
CREATE TABLE COLLECTION_OBJECTS (
  COLL_ID NUMBER(10,0),
  ...
  CONSTRAINT COLLOBJ_PK PRIMARY KEY (...) USING INDEX LOCAL
)
PARTITION BY LIST (COLL_ID)
( PARTITION COLLOBJ_ZERO VALUES (0) );
Automation in List-type partitions creation
•A dedicated List partition per Collection is created by an AFTER INSERT trigger on the parent table, which calls an in-house created PLSQL proc.
•Partition pruning is straightforward:
SELECT * FROM COLLECTION_OBJECTS PARTITION FOR (5276);
•Partition removal is easy:
ALTER TABLE COLLECTION_OBJECTS DROP PARTITION FOR (5276);
•All worked well, but it does not seem scalable because of the limit of 1048575 partitions per table.
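The in-house procedure and trigger are not shown in the slides. The following is a minimal sketch of how such a pair could look, not the author's code. The names add_collobj_partition and collections_ai_trg are hypothetical, the parent table is assumed to be COLLECTIONS with a positive integer COLL_ID column, and the procedure is declared as an autonomous transaction because DDL commits implicitly and a trigger may not commit within the triggering transaction.

```sql
-- Hypothetical helper: adds one List partition for the given collection.
CREATE OR REPLACE PROCEDURE add_collobj_partition (p_coll_id IN NUMBER) AS
  PRAGMA AUTONOMOUS_TRANSACTION;            -- DDL commits, so isolate it from the caller
  e_value_exists EXCEPTION;
  PRAGMA EXCEPTION_INIT(e_value_exists, -14312);
BEGIN
  EXECUTE IMMEDIATE
    'ALTER TABLE collection_objects ADD PARTITION collobj_' || TO_CHAR(p_coll_id) ||
    ' VALUES (' || TO_CHAR(p_coll_id) || ')';
EXCEPTION
  WHEN e_value_exists THEN
    NULL;                                   -- a partition for this value already exists
END add_collobj_partition;
/

-- Hypothetical trigger on the parent table (assumed name: COLLECTIONS).
CREATE OR REPLACE TRIGGER collections_ai_trg
AFTER INSERT ON collections
FOR EACH ROW
BEGIN
  add_collobj_partition(:NEW.coll_id);
END;
/
```

Because the partition is created in a separate transaction, it remains in place even if the INSERT into the parent table is later rolled back; whether that is acceptable depends on the application.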
Design approach “3” List partition per sequence of COLL_ID values
Idea: each List-type table partition hosts a sequence of data collections (e.g. 10, 20, 50 or more collections per partition).
Pros: can accommodate many more collections over time
Cons: all collections of a given partition must be set “not needed” (status = ‘NN’) before a partition removal action
List-type partition for sequence of data collections
LIST AUTOMATIC option is not possible in this approach.
CREATE TABLE COLLECTION_OBJECTS (
  COLL_ID NUMBER(10,0),
  ...
  CONSTRAINT COLLOBJ_PK PRIMARY KEY (...) USING INDEX LOCAL
)
PARTITION BY LIST (COLL_ID)
( PARTITION COLLOBJ_ZERO VALUES (0) );
Automation in List-type partitions creation (COLL_ID set)
•A dedicated List partition per collection set is created by a BEFORE INSERT trigger on the parent table, which calls an in-house created PLSQL proc.
PLSQL code and created partitions
Interesting findings:
•Achieved flexibility, as the number of sequential collections per partition can be changed just by changing a single value in the “before insert” trigger:
Sequence of 10 list values: created 88485 partitions
Sequence of 5 list values: created 32745 partitions
•After creation of 121230 partitions:
"ORA-14309: Total count of list values exceeds maximum allowed"
•What is the maximum number of list values allowed by the DB (Oracle 18.3)?
Count on the existing list partition key values showed: 1048575 = (1024*1024) - 1
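The reported figures can be cross-checked with plain arithmetic. The query below is only a verification sketch using the numbers quoted above; it does not query the test table.

```sql
-- 88485 partitions with 10 list values each, plus 32745 partitions with 5 values each.
SELECT 88485 + 32745          AS partitions_created,
       88485 * 10 + 32745 * 5 AS list_values_total,
       1024 * 1024 - 1        AS reported_limit
FROM   dual;
```

The sums give 121230 partitions and 1048575 list values, so the error appeared exactly when the total number of list values reached the reported limit, long before the number of partitions did.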
ORA-14309 and ORA-14299
•In that case
"ORA-14309: Total count of list values exceeds maximum allowed"
and
"ORA-14299: total number of partitions exceeds the maximum limit"
are similar, as Oracle imposes the same limit of 1048575.
•Outcome: ORA-14309 is a showstopper for “Approach 3”
Design approach “4” List partition on virtual column based on COLL_ID
Idea: List partition on virtual column MOD(COLL_ID, nnn).
It guarantees maximum “nnn” partitions on the child table (note: “nnn” must be smaller than 1 million)
Avoids the number of partitions limit (ORA-14299) and the number of list-key values limit (ORA-14309).
List-type partition on virtual column
MOD function returns the remainder of COLL_ID divided by 500000.
The table will have max 500K partitions
The choice of the division value is crucial: it cannot be changed over time (ORA-14060).
CREATE TABLE COLLECTION_OBJECTS (
  COLL_ID NUMBER(10,0),
  COLL_ID_VIRT_GROUP NUMBER(10,0) GENERATED ALWAYS AS (MOD(COLL_ID,500000)) VIRTUAL,
  ...
  CONSTRAINT COLLOBJ_PK PRIMARY KEY (...) USING INDEX LOCAL
)
PARTITION BY LIST (COLL_ID_VIRT_GROUP)
( PARTITION COLLOBJ_ZERO VALUES (0) );
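A small standalone illustration of how the virtual column maps COLL_ID values to partition keys. The sample COLL_ID values are arbitrary and chosen only for this sketch.

```sql
-- MOD(COLL_ID, 500000) wraps around once COLL_ID reaches the divisor.
SELECT coll_id,
       MOD(coll_id, 500000) AS coll_id_virt_group
FROM (
  SELECT 5276 AS coll_id FROM dual UNION ALL
  SELECT 499077          FROM dual UNION ALL
  SELECT 505276          FROM dual UNION ALL
  SELECT 999077          FROM dual
);
```

COLL_ID values 5276 and 505276 both map to group 5276, and 499077 and 999077 both map to group 499077. Once COLL_ID exceeds the divisor, a partition can therefore hold data of more than one collection, which has to be taken into account before a partition is dropped.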
“Trigger & Proc” for List-type partitions creation
•A dedicated List partition per Collection is created by an AFTER INSERT trigger on the parent table, which calls an in-house created PLSQL proc.
Note:
1) Both the parent and the child tables have the same virtual column definition:
COLL_ID_VIRT_GROUP NUMBER(10,0) GENERATED ALWAYS AS (MOD(COLL_ID,500000)) VIRTUAL
2) The PLSQL proc handles “ORA-14312: Value x already exists in partition y”
Considerations with List partitions on virtual column
•Test: 266650 partitions were created
•For PK or UQ indexes to be equally partitioned as the table, the virtual column must be part of them.
•For partition pruning to kick in, an extra condition has to be added to the query’s WHERE clause:
SELECT * FROM ... WHERE COLL_ID_VIRT_GROUP = MOD(:coll_id_value, 500000) AND COLL_ID = :coll_id_value AND STATUS = :status_val;
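Whether pruning actually takes place can be checked in the execution plan. The statements below are a sketch that assumes the query targets the COLLECTION_OBJECTS table (the slide leaves the FROM clause elided) and uses a literal COLL_ID for readability; EXPLAIN PLAN and DBMS_XPLAN.DISPLAY are standard Oracle facilities.

```sql
EXPLAIN PLAN FOR
SELECT *
FROM   collection_objects
WHERE  coll_id_virt_group = MOD(5276, 500000)  -- extra predicate on the partitioning key
AND    coll_id = 5276;

-- The Pstart and Pstop columns of the plan show which partitions are accessed.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

When pruning works, the plan typically contains a PARTITION LIST SINGLE step; a PARTITION LIST ALL step indicates that every partition is scanned.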
Design approach “5” List automatic partition on virtual column based on COLL_ID
Idea: the same as “Approach 4” but relies on the Oracle List AUTOMATIC mechanism instead of the “Trigger & Procedure” machinery.
Note: the requirement for JSON search index was withdrawn.
List-type automatic table partitioning on virtual column
MOD function returns the remainder of COLL_ID divided by 500000.
The table will have max 500K partitions
CREATE TABLE COLLECTION_OBJECTS (
  COLL_ID NUMBER(10,0),
  COLL_ID_VIRT_GROUP NUMBER(10,0) GENERATED ALWAYS AS (MOD(COLL_ID,500000)) VIRTUAL,
  ...
  CONSTRAINT COLLOBJ_PK PRIMARY KEY (...) USING INDEX LOCAL
)
PARTITION BY LIST (COLL_ID_VIRT_GROUP) AUTOMATIC
( PARTITION COLLOBJ_ZERO VALUES (0) );
Data insertion into “List automatic” partitioned table
•When many rows are inserted into a “List automatic” partitioned table:
Oracle sorts the rows beforehand based on the partitioning key, and wait events “PGA memory operation” can be seen.
Such insert operations are slower than insertion into a standard List partitioned table.
“List automatic” partitions on virtual column
•Test: 500000 partitions were automatically created using an “INSERT INTO collection_objects …” statement.
•It took about a week. Over time, partition creation became slower and slower (certainly DDL actions are serialized).
Up to 30K partitions: rate of 50-60 partitions/second
After 70K partitions: rate of 3-4 partitions/second
After 80K partitions: rate of 3 partitions/second
After 160K partitions: rate of 1-2 partitions/second
After 180K partitions: rate of 1 partition/second
After 200K partitions: rate of 1 partition/second
Within 200K-400K partitions: rate of 1 partition per 1-2 seconds
Within 400K-500K partitions: rate of 1 partition per 2 seconds
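The slides do not say how these rates were measured. One possible way to observe the creation rate afterwards is to group the partition creation timestamps recorded in the data dictionary, as in the sketch below; it relies on the standard USER_OBJECTS view and assumes the table is named COLLECTION_OBJECTS.

```sql
-- Number of table partitions created per hour, with the implied average rate.
-- The average is understated for hours in which the test ran only part of the time.
SELECT TRUNC(created, 'HH24')    AS creation_hour,
       COUNT(*)                  AS partitions_created,
       ROUND(COUNT(*) / 3600, 2) AS avg_partitions_per_second
FROM   user_objects
WHERE  object_type = 'TABLE PARTITION'
AND    object_name = 'COLLECTION_OBJECTS'
GROUP  BY TRUNC(created, 'HH24')
ORDER  BY creation_hour;
```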
“List automatic” partitions on virtual column (cont.)
•Without concurrent read or write operations on the table:
Wait events “PGA memory operation” and “Local write wait” (0 ms) on partition creation.
•With concurrent read (even the Oracle auto stats gathering job) or write operations on the table:
Wait events “Library cache lock” and “Library cache load lock” (20ms-500ms depending on the number of existing table partitions) on partition creation.
Considerations with List automatic part. on virtual column
•For PK or UQ indices to be equally partitioned as the table, the virtual column must be part of them.
•For partition pruning to kick in, an extra condition has to be added to the query’s WHERE clause:
SELECT * FROM ... WHERE COLL_ID_VIRT_GROUP = MOD(:coll_id_value, 500000) AND COLL_ID = :coll_id_value AND STATUS = :status_val;
EWB “sponge” on “List automatic” virtual column part.
•How to safely remove partitions in an environment with concurrent DML?
ALTER TABLE … MODIFY PARTITION FOR(499077) READ ONLY;
Table altered.
ALTER TABLE … DROP PARTITION FOR(499077);
ORA-14466: Data in a read-only partition or subpartition cannot be modified.
EWB “sponge” process in action
1. Check that all collections of a given virtual group have a certain status:
SELECT COUNT(coll_id) - SUM(DECODE(status, 'NOT NEEDED', 1, 0)) FROM collections WHERE coll_id_virt_group = 499077;
2. If the query result = 0, then:
LOCK TABLE COLLECTION_OBJECTS PARTITION FOR(499077) IN SHARE MODE;
3. Repeat (1) to ensure that no concurrent transaction changed the result before the lock of step 2 was taken.
4. If the query result = 0, then:
ALTER TABLE COLLECTION_OBJECTS DROP PARTITION FOR(499077);
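The four steps can be combined into a single PL/SQL block. The block below is a sketch under stated assumptions rather than the production “sponge” code: the local function pending_collections is hypothetical, the group number 499077 is the example value from the slides, and NVL is added so that a group without any collections yields 0 instead of NULL.

```sql
DECLARE
  c_group CONSTANT NUMBER := 499077;   -- virtual group to purge (example value)

  -- Hypothetical helper: number of collections in the group that are still needed.
  FUNCTION pending_collections RETURN NUMBER IS
    l_cnt NUMBER;
  BEGIN
    SELECT COUNT(coll_id) - NVL(SUM(DECODE(status, 'NOT NEEDED', 1, 0)), 0)
      INTO l_cnt
      FROM collections
     WHERE coll_id_virt_group = c_group;
    RETURN l_cnt;
  END pending_collections;
BEGIN
  IF pending_collections = 0 THEN                        -- step 1
    EXECUTE IMMEDIATE                                    -- step 2
      'LOCK TABLE collection_objects PARTITION FOR (' || c_group || ') IN SHARE MODE';
    IF pending_collections = 0 THEN                      -- step 3
      -- Step 4. DDL commits implicitly, so the share lock is held only
      -- until the DROP statement starts.
      EXECUTE IMMEDIATE
        'ALTER TABLE collection_objects DROP PARTITION FOR (' || c_group || ')';
    END IF;
  END IF;
END;
/
```

Dynamic SQL is used for both statements so that the same block can be parameterised with a different group number.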
Which is the best approach out of the explored five?
Approach 1: “Range Interval”
versus
Approach 5: “List Automatic on virtual column”
More studies are necessary
Take away message
Working with Oracle table partitions is a mixture of fun and challenges.
Obviously: NO PAIN, NO GAIN!