DICOM Objects: Unstructured Data in Oracle 11g

Post on 13-Jul-2015

Naomi Rafael, BioGrid Australia

16 August 2010

Agenda

• Purpose and Description of BioGrid Australia
• The MRI Images Collection for Melbourne Health
• Oracle 11g Advantages
• Design of the Images Database
• Examples of Utilities and Techniques
• Current Challenges
• References

Purpose and Description of BioGrid Australia

BioGrid Australia Vision

• Facilitate multi-disciplinary medical research
• Leverage research collaboration
• Link heterogeneous data from multiple institutions
• Confer value, retain and re-use health data
• Enforce system security
• Respect patient privacy
• Select pragmatic technology

BioGrid Architecture

[Architecture diagram. Labels recoverable from the diagram: Users; Internet; DMZ (Terminal Server, Reverse Proxy); VPN; Federated Data Integrator (FDIPRD); USI Server (USIDB), Data Linkage; Local Research Repository (LRR) and LRRs, holding Demographics and De-identified Data; Research Repositories at Other Institutions; DBIMGS on an Oracle 11g Server (Size: 2.5 TB, Images: 15 million).]

Federated Data Integrator: The Federated Data Integrator (FDI) is the hub of the BioGrid architecture. Data from heterogeneous data sources are integrated into one virtual repository on this server.

The MRI Images Collection for Melbourne Health

IF YOU WANTED TO COMPARE the volume and shape of the brain of individual EPILEPSY PATIENTS, where would you start?

The Images Local Research Repository (1)

• First take 7 million proprietary Magnetic Resonance Images (MRIs) on over 1000 DAT format tapes
• Convert to Digital Imaging and Communications in Medicine (DICOM) format
• Store and index images online
• Extract DICOM header information
• Link into BioGrid Australia and issue record linking ID

The Images Local Research Repository (2)

• Retrieve identified and de-identified images on demand
• Be economical and sustainable
• Add 8 million more MRI images (stored on Optical Disk storage technology)

Oracle 11g Advantages (1)

• Oracle Database 11g stores the images online
• The images can be retrieved on demand
• Oracle Database 11g indexes and partitions for fast query
• Oracle Multimedia 11g has a dedicated DICOM data type with a rich feature set
• SQL*Loader can be tuned for fast image load

Oracle 11g Advantages (2)

• Security features are available
• Compression is available at the LOB level, on backup, and on Data Pump export
• Application Express is available for rapid application development
• And for Melbourne Health: the installation is licensed under the Victoria Department of Health statewide Oracle license

ORDDICOM object type

• DICOM: Digital Imaging and Communications in Medicine
• http://medical.nema.org/
• The Digital Imaging and Communications in Medicine (DICOM) feature was first introduced to Oracle interMedia in Oracle Database 10g Release 2 as a feature of the ORDImage object type
• Metadata tags associated with DICOM data were extracted into an XML document
• Oracle Database 11g Release 1 provides more complete DICOM support in a new ORDDicom object type
• This object type holds the DICOM binary data and extracted metadata, and contains the methods to manipulate the DICOM binary data

Oracle 11g: Using DICOM Images - Features

• Built-in function to extract the DICOM metadata (tags)
• Ability to select and view DICOM attributes
• Ability to convert images from DICOM to other image formats, e.g. JPEG, GIF, PNG and TIFF
• Built-in function to remove identifying tag information, i.e. de-identify images
• Ability to import and export images on other servers using mapped drives

Design of the Images Database

Database Architecture (1)

• Windows Server 2003 (64-bit)
• Oracle Database 11g Release 1 Version 11.1.0.6
  – Single Instance Database
• 2.8 TB of data stored on 20 spindles
• Separate physical drive contains flashback recovery area
• Partitioned by range

Examples of Utilities and Techniques

Example: Using DICOM Object Type in Create Table

    create table medical_image_table (
        id       varchar(50),
        TAPE_ID  number,
        dicom    orddicom,
        USI      varchar(50)
    )
    LOB (dicom.source.localdata) STORE AS SECUREFILE (COMPRESS HIGH)
    PARTITION BY range (TAPE_ID) (
        PARTITION PART1 VALUES less than (50) TABLESPACE TBLS_PART1_FROM_TAPE1
    );

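Example: Using setProperties to Extract Metadata into ORDDICOM Object

    -- Set Data Model Repository. This procedure must be called at the
    -- beginning of each database session.
    execute ordsys.ord_dicom.setDataModel();

    declare
      obj orddicom;
    begin
      -- Lock the row and fetch the object to be updated
      select dicom into obj
        from medical_image_table
       where id = 'E11200S001I001.dcm'
         for update;
      -- Extract the metadata into the object's attributes
      obj.setProperties;
      -- Write the modified object back to the row
      update medical_image_table set dicom = obj
       where id = 'E11200S001I001.dcm';
      commit;
    end;
    /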

Example: Select and View DICOM Attributes

    select t.dicom.getAttributebyTag('00200010') as STUDY_ID,
           t.dicom.getAttributebyTag('00100010') as PATIENT_NAME,
           t.dicom.getAttributebyTag('00100020') as PATIENT_ID,
           TO_DATE(t.dicom.getAttributebyTag('00100030'), 'YYYY-MM-DD') as PATIENT_DOB
      from medical_image_table t
     where t.id = 'E11200S001I001.dcm';
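The eight-character strings passed to getAttributebyTag are DICOM tags: two four-digit hexadecimal numbers, group first and element second. The following is a minimal Python sketch of that notation, limited to the tags that appear in these slides. The names `SLIDE_TAGS`, `split_tag` and `format_tag` are illustrative only and are not part of Oracle Multimedia or any DICOM library.

```python
# Illustrative helpers, not part of Oracle Multimedia: split an 8-hex-digit
# DICOM tag string into its (group, element) pair.

# Column aliases that the slides give to each tag.
SLIDE_TAGS = {
    "00200010": "STUDY_ID",
    "00100010": "PATIENT_NAME",
    "00100020": "PATIENT_ID",
    "00100030": "PATIENT_DOB",
    "00080030": "STUDY_TIME",
}


def split_tag(tag: str) -> tuple[int, int]:
    """Return (group, element) for a tag such as '00100010'."""
    if len(tag) != 8:
        raise ValueError(f"expected 8 hex digits, got {tag!r}")
    # int(..., 16) raises ValueError on non-hexadecimal characters.
    return int(tag[:4], 16), int(tag[4:], 16)


def format_tag(tag: str) -> str:
    """Render a tag in the conventional (gggg,eeee) form."""
    group, element = split_tag(tag)
    return f"({group:04X},{element:04X})"


if __name__ == "__main__":
    for tag, alias in SLIDE_TAGS.items():
        print(format_tag(tag), alias)
```

Written this way, it is easy to see that the patient attributes in the query all belong to group 0010.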

Example: Create View for Patient Details

    create or replace view patient_details as
    select t.id,
           t.tape_id,
           t.usi,
           ………,
           (t.dicom.getAttributebyTag('00080030')) as STUDY_TIME
      from medical_image_table t;

(Note: A prerequisite is to execute ordsys.ord_dicom.setDataModel() to load the data model repository; without it, attributes cannot be fetched by tag.)

Example: Convert Image from DICOM to JPEG and Make Anonymous

    declare
      dcm ordsys.orddicom;
    begin
      ord_dicom.setDatamodel;
      for rec in (select * from medical_image_table for update) loop
        rec.dicom.setProperties();
        -- create a JPEG thumbnail
        rec.dicom.processCopy('fileFormat=jpeg fixedScale=75,100', rec.imageThumb);
        -- make a new anonymous version of the ORDDicom object
        rec.dicom.makeAnonymous(genUID(rec.id), rec.anonDicom);
        -- write the objects back to the row
        ……..
      end loop;
      commit;
    end;
    /

Example: Import and Export Images

    CONNECT / AS SYSDBA

    -- Directory IMAGEDIR for export/import DICOM
    create or replace directory imagedir as 'O:\ORACLE_DICOM_IMAGES';

    grant read, write on directory IMAGEDIR to Administrator;

    -- import() method can be used to import (where ORDDICOM source
    -- attributes contain 'FILE', 'IMAGEDIR', and filename)
    dcm.import();

    -- export() method can be used to export
    dcmSrc.export('FILE', 'IMAGEDIR', filename);

Example: Use of Compression for DICOM Objects

On tables and LOBs using SECUREFILE (COMPRESS HIGH):

    create table medical_image_table (
        id       varchar(50),
        TAPE_ID  number,
        dicom    orddicom,
        USI      varchar(50)
    )
    LOB (dicom.source.localdata) STORE AS SECUREFILE (COMPRESS HIGH)
    PARTITION BY range (TAPE_ID) (
        PARTITION PART1 VALUES less than (50) TABLESPACE TBLS_PART1_FROM_TAPE1
    );

Impact of Compression

DICOM images are stored as SECUREFILE (COMPRESS HIGH).

Compression level example from load of first cohort of images:

• Space utilization on file system: 1515.52 GB
• Space utilization in Oracle 11g: 816 GB

Compression achieved: (1515 - 816) / 1515 = approx. 46%
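The arithmetic on this slide can be checked with a short Python sketch. The two sizes are the figures quoted above; the function name `space_saving` is illustrative only.

```python
# Figures quoted on the slide for the first cohort of images, in GB.
FILE_SYSTEM_GB = 1515.52
ORACLE_GB = 816.0


def space_saving(original: float, compressed: float) -> float:
    """Fraction of the original size saved by compression."""
    return (original - compressed) / original


if __name__ == "__main__":
    saving = space_saving(FILE_SYSTEM_GB, ORACLE_GB)
    ratio = FILE_SYSTEM_GB / ORACLE_GB
    print(f"space saved: {saving:.1%}")
    print(f"compression ratio: {ratio:.2f}:1")
```

Using the unrounded file-system figure changes the result only in the first decimal place, so the slide's rounding to about 46% holds.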

Example: Compression in Backup Using RMAN

    RMAN> configure device type disk backup type to compressed backupset;

    RMAN> configure channel device type disk maxpiecesize 50g;

    RMAN> show compression algorithm;
    RMAN configuration parameters for database with db_unique_name RMHIMG are:
    CONFIGURE COMPRESSION ALGORITHM 'BZIP2';

    -- The ZLIB compression algorithm offers speed but not a good compression ratio.
    -- The alternative algorithm, BZIP2, is slower but provides a better compression ratio.

    RMAN> backup database;
    ..

Maximize SQL*Loader Performance

• Use direct path loads (direct=true). The conventional path uses standard insert statements, whereas the direct path loader loads directly into the Oracle data files and creates blocks in Oracle database block format.
• Disable or drop indexes and constraints.
• Disable archiving during the load.
• Use unrecoverable. This disables the writing of the data to the redo logs.
• The parallel load option is not allowed when loading LOB columns.
• Remember to create or re-enable the indexes after the direct load; otherwise performance will suffer.

SQL*Loader Performance Results

• Using these options we were able to reduce time for loading 50 tapes from 13 hours to approximately 5 hours.
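As a rough check on what this improvement means per tape, the sketch below works through the two quoted figures in Python. The inputs are the slide's numbers, and the "after" time is only approximate on the slide. The function names are illustrative.

```python
# Figures quoted on the slide.
TAPES = 50
HOURS_BEFORE = 13.0
HOURS_AFTER = 5.0  # "approximately 5 hours" on the slide


def minutes_per_tape(hours: float, tapes: int) -> float:
    """Average load time per tape, in minutes."""
    return hours * 60 / tapes


def speedup(before: float, after: float) -> float:
    """How many times faster the tuned load is."""
    return before / after


if __name__ == "__main__":
    print(f"before: {minutes_per_tape(HOURS_BEFORE, TAPES):.1f} min/tape")
    print(f"after:  {minutes_per_tape(HOURS_AFTER, TAPES):.1f} min/tape")
    print(f"speedup: {speedup(HOURS_BEFORE, HOURS_AFTER):.1f}x")
```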

Compression and Parallelisation with Data Pump Export

    expdp Images_admin/WELCOME DIRECTORY=BACKUP_64BIT JOB_NAME=IMAGES_ADMIN_EXP_JOB dumpfile=IMAGES_ADMIN%U.dmp PARALLEL=3 COMPRESSION=all

• With PARALLEL=3, three dump files named IMAGES_ADMIN%U.dmp are created, making the export process much faster.
• After export, each partition is further compressed to 21-30 GB (originally 40-50 GB after SecureFiles COMPRESS HIGH).

Example: Create Table for Best ORDDicom with SecureFiles Performance (1) and (2)

    create table medical_image_table (
        id       varchar(50),
        TAPE_ID  number,
        dicom    orddicom,
        USI      varchar(50)
    )
    pctfree 60
    lob(dicom.source.localdata) store as SecureFile
        (nocache filesystem_like_logging),
    lob(dicom.extension) store as SecureFile
        (nocache disable storage in row)
    xmltype dicom.metadata store as SecureFile clob
        (nocache disable storage in row)

Example: Options for Indexing Metadata (1)

1. Build indices on the ORDDicom metadata column

• If few attributes, index each:

    create INDEX dcm_patientfamilyname_idx ON dicom (
        extractValue(src.metadata,
            '/DICOM_OBJECT/PERSON_NAME[@tag="00100010"]/NAME/FAMILY',
            'xmlns="http://xmlns.oracle.com/ord/dicom/metadata_1_0"'));

• If many attributes, create a full text index:

    create index dcm_md_idx on dicom (src.metadata)
        indextype is ctxsys.context
        parameters('STOPLIST dicom_stoplist storage dcm_text_idx_pref')
        parallel 4;

Example: Options for Indexing Metadata (2)

2. Build indices on separate extracted metadata

a) Create a mapping document
b) Call extractMetadata
c) Store and index results

Load Options (1)

• Disable logging on the LOB column rather than SQL*Loader recoverable
• Use SQL*Loader with conventional path loads and submit parallel jobs
  – Get all the advantages of SQL*Loader
  – Plus the advantages of parallelism
  – Parallelism makes up for the lack of direct path load

Load Options (2)

If adding a load function to a Java application:

• Use the JDBC thin driver for best performance
• Use getBytes() and setBytes() from the oracle.sql.BLOB class to read from and write to a SecureFile BLOB
• Read and write large buffers to the database, for example, 10 MB
• Balance application traffic over available network links for parallel load
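The large-buffer advice is independent of the client language. The sketch below shows the general pattern in Python rather than Java: read a stream in 10 MB blocks and hand each block, with its offset, to a writer. It is not JDBC code. The `write` callback is a hypothetical stand-in for whatever call stores a block in the LOB, and the function names are illustrative.

```python
import io
from typing import BinaryIO, Callable, Iterator

BUFFER_SIZE = 10 * 1024 * 1024  # 10 MB, the example size from the slide


def read_in_chunks(stream: BinaryIO, size: int = BUFFER_SIZE) -> Iterator[bytes]:
    """Yield successive blocks of at most `size` bytes until the stream ends."""
    while True:
        chunk = stream.read(size)
        if not chunk:
            return
        yield chunk


def copy_stream(
    stream: BinaryIO,
    write: Callable[[int, bytes], None],
    size: int = BUFFER_SIZE,
) -> int:
    """Pass each block to `write(offset, data)` and return the bytes copied.

    `write` is a hypothetical stand-in for the call that stores a block
    in the target LOB.
    """
    offset = 0
    for chunk in read_in_chunks(stream, size):
        write(offset, chunk)
        offset += len(chunk)
    return offset


if __name__ == "__main__":
    # Tiny buffer size here only so the demo is easy to follow.
    calls = []
    total = copy_stream(
        io.BytesIO(b"x" * 25),
        lambda off, data: calls.append((off, len(data))),
        size=10,
    )
    print(total, calls)
```

Fewer, larger writes mean fewer round trips to the database, which is the point of the slide's 10 MB suggestion.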

IF YOU WANTED TO COMPARE the volume and shape of the brain of individual EPILEPSY PATIENTS, where would you start?

Example: Query Linking Patient Clinical Information with Images

    CREATE TABLE SASUSER.QUERY_FOR_PARTY_0000 AS
    SELECT PARTY1.USI,
           PARTY.USI AS USI1,
           VISITDETAILS.SYNDROMEDIAGNOSIS
      FROM EPIL_RMH.PARTY AS PARTY,
           IMGRMH.PARTY AS PARTY1,
           EPIL_RMH.VISITDETAILS AS VISITDETAILS
     WHERE (PARTY.USI = PARTY1.USI AND PARTY.USI = VISITDETAILS.USI)
     ORDER BY VISITDETAILS.SYNDROMEDIAGNOSIS;

Example: Results

Example: Query Linking Syndrome Diagnosis with Images

    PROC SQL;
    CREATE TABLE SASUSER.Query_for_QUERY_FOR_PARTY_0000 AS
    SELECT DISTINCT QUERY_FOR_PARTY_0000.USI,
           IMAGE_DETAILS.ID,
           IMAGE_DETAILS.IMAGE_DATE,
           IMAGE_DETAILS.STUDY_ID,
           IMAGE_DETAILS.STUDY_DESC,
           IMAGE_DETAILS.INSTITUTION_NAME
      FROM SASUSER.QUERY_FOR_PARTY_0000 AS QUERY_FOR_PARTY_0000
           INNER JOIN IMGRMH.IMAGE_DETAILS AS IMAGE_DETAILS
           ON (QUERY_FOR_PARTY_0000.USI = IMAGE_DETAILS.USI)
     WHERE QUERY_FOR_PARTY_0000.SYNDROMEDIAGNOSIS = "Symptomatic Generalised";

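The linkage idea behind the two queries, joining clinical records to image records on the shared USI, can be tried out with a small in-memory sketch. It uses Python's built-in sqlite3 module in place of SAS and Oracle and combines both steps into one query. The simplified table names and every row are invented for illustration; none of it is BioGrid data.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create in-memory stand-ins for the tables named in the slide queries.

    All rows are invented for illustration only.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE epil_party (usi TEXT);
        CREATE TABLE img_party (usi TEXT);
        CREATE TABLE visitdetails (usi TEXT, syndromediagnosis TEXT);
        CREATE TABLE image_details (usi TEXT, id TEXT, study_id TEXT);
        INSERT INTO epil_party VALUES ('U1'), ('U2'), ('U3');
        INSERT INTO img_party VALUES ('U1'), ('U2');
        INSERT INTO visitdetails VALUES
            ('U1', 'Symptomatic Generalised'),
            ('U2', 'Other'),
            ('U3', 'Symptomatic Generalised');
        INSERT INTO image_details VALUES
            ('U1', 'img-a', 'S1'),
            ('U1', 'img-b', 'S1'),
            ('U2', 'img-c', 'S2');
        """
    )
    return conn


def images_for_diagnosis(conn: sqlite3.Connection, diagnosis: str) -> list:
    """Return (usi, image id, study id) for patients with the given diagnosis
    who appear in both the clinical and the imaging party tables."""
    sql = """
        SELECT DISTINCT d.usi, d.id, d.study_id
          FROM epil_party p
          JOIN img_party ip ON ip.usi = p.usi
          JOIN visitdetails v ON v.usi = p.usi
          JOIN image_details d ON d.usi = p.usi
         WHERE v.syndromediagnosis = ?
         ORDER BY d.id
    """
    return conn.execute(sql, (diagnosis,)).fetchall()


if __name__ == "__main__":
    for row in images_for_diagnosis(build_demo_db(), "Symptomatic Generalised"):
        print(row)
```

In the invented data, patient U3 has the diagnosis but no imaging record and so drops out of the result, which is the behaviour the inner joins on USI are meant to give.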
Current Challenges

• Find a sponsor after the capitalisation phase
• Improve the deployment of the Oracle 11g R1 database to bring it up to best practice
• Upgrade to Oracle 11g Release 2
• Promote the use of the MRI images for research

References (1)

• Jain, Pranabh and Melliyal Annamalai, "Images and Oracle Database 11g", presentation, Oracle OpenWorld 2008.
• http://www.oracle.com/technology/products/database/application_express/howtos/howtos.html
• http://www.oracle.com/technology/obe/11gr1_db/index.htm
• http://download.oracle.com/docs/cd/B28359_01/appdev.111/b28416/ch_dev_apps.htm#CIHEIGBC

References (2)

• http://www.remote-dba.net/teas_rem_util18.htm

• Oracle Documentation

Thank you!

BIOGRID AUSTRALIA

Naomi Rafael
naomi.rafael@biogrid.org.au

Tell us what you think…

• http://feedback.insync10.com.au