Evolution of Database Technology - IBM Research | Almaden ...

Evolution of Database Technology
C. Mohan, PhD
IBM Fellow & IBM India Chief Scientist
Member, IBM Software Group, Asset Architecture & Information Management Architecture Boards
http://www.almaden.ibm.com/u/mohan/
© 2006 IBM Corporation


2. Some of Our Database Research Legacy

  • Invention of Relational DBMS & SQL
  • Research prototypes
    • System R & SQL
    • R* Distributed DBMS
    • Starburst Extensible Object-Relational DBMS
    • Garlic Heterogeneous DBMS
  • Product Contributions
    • Data sharing on DB2 390 Sysplex
    • DB2 UDB Query Processor
    • Intelligent Miner
    • Lotus Notes R5 Recovery
    • Discovery Link & DB2 Information Integrator
  • 6 IBM Fellows from a team of fewer than 50

3. Why We Have Experience with Customers

  • Over 2 decades of partnership with SWG Toronto & SVL
    • Incorporation of Starburst prototype into DB2
    • Component owners of the DB2 for LUW query compiler
    • Versions 2-5 (1992-1997)
    • Dealt with customer APARs, visits, & presentations
  • Responsible for many DB2 innovations
    • Query Graph Model (internal query representation, key to extensibility)
    • Query ReWrite and Optimizer technology
    • ARIES transaction methods
    • Triggers and Constraints
    • Star Join and Hash Join
    • Object-relational features
    • Automatic Summary Tables (materialized views)
    • Visual Explain
    • Index Advisor
  • Respected for our vision
    • World-class publications in leading database conferences
    • Cognizant of industry trends
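Hash Join appears above among the DB2 innovations. As a minimal, hypothetical illustration of the general technique (not DB2's implementation), a classic in-memory hash join builds a hash table on the smaller input and then probes it with each row of the larger input. The table and column names below are invented for the example.

```python
from collections import defaultdict

def hash_join(build_rows, probe_rows, build_key, probe_key):
    """Equi-join two lists of dicts with a classic in-memory hash join.

    build_rows should be the smaller input; its rows are hashed on build_key.
    Each probe row is then matched against the table in O(1) expected time.
    """
    # Build phase: map each join-key value to all build rows carrying it.
    table = defaultdict(list)
    for row in build_rows:
        table[row[build_key]].append(row)

    # Probe phase: emit one merged row per matching (build, probe) pair.
    # dict.get does not insert missing keys, unlike table[key] on a defaultdict.
    result = []
    for row in probe_rows:
        for match in table.get(row[probe_key], []):
            result.append({**match, **row})
    return result


# Hypothetical data: a small dimension table and a larger fact table.
stores = [{"store_id": 1, "city": "San Jose"}, {"store_id": 2, "city": "Toronto"}]
sales = [
    {"store_id": 1, "amount": 100},
    {"store_id": 2, "amount": 250},
    {"store_id": 1, "amount": 75},
    {"store_id": 3, "amount": 10},  # no matching store: dropped by the inner join
]

for r in hash_join(stores, sales, "store_id", "store_id"):
    print(r)
```

In a star schema, the small dimension tables are the natural build side and the large fact table the probe side, which is one reason hash joins pair well with star-join processing.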

4. Leveraging Technology and People

Customer Requirements, IBM Products, IBM Research, IMS Development, DB2 Development, IDS / U2 Development

5. IBM Information Management Teams: over 6000 employees worldwide

  • SVL: DB2 UDB for z/OS & OS/390, IMS, Business Intelligence, Content Management, DB2 Everyplace, Red Brick, Icing, Traditional AD Languages
  • Boeblingen: DB2 Text Extenders, SAP/R3 Enablement, Intelligent Miner for Data, Intelligent Miner for Text
  • Somers
  • Hawthorne: Advanced Technology
  • Almaden: Advanced Technology
  • Austin: GBIS
  • Portland: XPS & DB2
  • Lenexa: IDS
  • Boulder & Denver: Content Management, U2, Datablades
  • Boca Raton & Miami: EMMS
  • LA: Informix Support
  • Rochester: DB2 UDB for AS/400
  • Toronto: DB2 UDB for UNIX, Windows, & OS/2
  • Beijing: Information Integration, DB2 for z/OS, Content Management, DB2 and IMS tools
  • Las Vegas: Entity Analytics
  • Menlo Park & Oakland: IDS, XPS, JDBC, Visionary, Cloudscape, Datablades, Object Connect & Translator, Content Management
  • India: DB2 UDB Service, Business Intelligence, IDS
  • Yamato: High Speed Inverted Index Search, Business Intelligence, Content Management
  • Hursley: Enterprise Master Data Solutions

  • India Software Lab
    • 1700 employees
    • Broad range of skills across all SWG brands
    • Linux Competency Center
  • DB2 Lab within ISL
    • 100+ developers
    • Lab-based services teams: DB2, CM, BI
  • Other Resources
    • India Research Lab
      • http://www.research.ibm.com/irl/projects//
    • Solution Porting Center
    • Education Center for IBM Software
    • IBM Academic Initiative

6. A Spectrum of Data Serving Requirements

  • Platform: Mobile, Desktop, Small Servers, Large Servers
  • Data Size: Micro, Compact, Large, Extremely Large
  • Workload: Batch, Online Transactions, Real-time Analysis, Data Mining
  • Structure: Hierarchical, Relational, Multi-Value, XML
  • OS: Symbian, PalmOS, Windows, Linux, Unix(s), i5/OS, z/OS
  • Scope: Embedded, Intra-application, Single application, Multi-application
  • Support: None, Web/E-mail, Business hours, 24x7

7. Products to Match the Spectrum of Data Serving Needs

  • DB2 Everyplace: OLTP; Relational; Mobile / Embedded; Linux, PalmOS, Symbian
  • Cloudscape: OLTP; Relational; Intra-App / Single-App; Java
  • IDS: OLTP; Relational; Intra-App / Single-App; AIX, etc., Linux, Windows
  • DB2: OLTP & Analysis; Relational & XML; Single / Multi-App; z/OS, i5/OS, AIX, etc., Linux, Windows
  • IMS: OLTP; Hierarchical; Single / Multi-App; z/OS
  • U2: OLTP; Multi-Value; Intra-App / Single-App; AIX, etc., Linux, Windows

Superior capabilities across the spectrum of requirements

8. DB2 for z/OS

  • The power and function of an open, industry-standard data server with zSeries industry-leading availability, performance, and security
  • What it takes to be the industry's most extreme data server
  • Continuous application availability measured in years
  • Ability to process over 1B SQL transactions per hour
  • Uninterrupted growth from 1 byte to over a petabyte
  • Serving 100s of applications for 100,000s of users
  • US Government's highest security classification (zSeries)
  • Support for industry standards: XML, Web services, Java, C, COBOL
  • Support for complex business applications: SAP, PeopleSoft, Siebel

Extreme qualities of service; XML and relational data server

9. Technology Evolution with Mainframe Specialty Engines

  • Internal Coupling Facility (ICF): 1997
  • Integrated Facility for Linux (IFL): 2001
  • System z9 Application Assist Processor (zAAP): 2004
  • IBM System z9 Integrated Information Processor (IBM zIIP): planned for 2006

  • Building on a strong track record of technology innovation with specialty engines, IBM intends to introduce the System z9 Integrated Information Processor
  • Support for new workloads and open standards
  • Designed to help improve resource optimization for eligible data workloads within the enterprise
  • Centralized data sharing across mainframes
  • Incorporation of Java into existing mainframe solutions

10. Data Challenges

  • Variety, Velocity, and Volume
  • New composite applications need data from multiple sources
    • Consumers expect holistic, personalized, and value-added content
    • Relational, XML, packaged applications, content repositories, and file systems all contain critical business information
  • Increasing emphasis on current data
    • Real-time analytics
    • Business activity monitoring
  • Petabytes will be the measure of available online data
    • All client interactions are important (e.g., instant messages, audio records, web traffic)
    • Internet and intranet content

The world produces 250MB of information every year for every man, woman and child on earth.

[Figure: Common Database Sizes, 1999 vs. 2004, for transaction, warehouse, mart, mobile, and pervasive databases, with growth ranging from 10X to 10,000X]

  • 37% CGR disk growth (1996-2007)
  • 70,000 TB of TV and radio content in 2002 alone; 30% growth per year

11. Addressing the Changing Characteristics of Data

  • Actionability, Heterogeneity, Scale
  • Data types: Transactions, Text and Web, Satellite & Surveillance Images and Video, Gene Sequences, Protein Folding
  • Increasing need to manage and analyze new data types

12. Key Customer Pain Points

  • Can't find information: Discovery
  • Can't combine information: Integration
  • Can't extract value from information: Insight
  • Can't consume information: Dissemination

13. Research in Information and Interaction

Drive our leadership technologies for search, structured and unstructured information processing and analytics, natural language processing, and conversational and multimodal interaction, across multiple tiers of business activities in SWG products and solutions. Foster the exploitation of components with these leading research technologies in IGS services offerings.

  • Research areas: CM, Information Retrieval, NLP, Analytics, Video Analysis, Conversational and Multimodal Interactions, Unstructured Information Management, Information Management, Database, Synthesis, Information Integration, Metadata, Speech Recognition

14. Worlds of Structured & Unstructured Data Come Together

  • Analytical complexity increases across the stages Collect, Store, Retrieve, Drill, Mine
  • Structured data: ETL, Warehouse, SQL, OLAP, Cluster/Classify
  • Unstructured data: Crawl, ECM, Search, Navigate, Cluster/Classify
  • Solutions and II span both worlds

15. Need for Business Intelligence

  • Loyalty
  • Profitability
  • Buyer Behavior
  • Targeted Offers

Homeland Security

  • Internet Buzz
  • Anti-Money Laundering
  • Border Control
  • Crime Information
  • Globalization
  • Business Controls
  • Mergers and Acquisitions
  • Supply Chain Efficiencies

Accountability and Compliance, Customer Knowledge, Business Performance

  • Risk Management
  • Fraud and Abuse
  • Public Protection

HIPAA, Basel II, Patriot Act, Sarbanes-Oxley

"Capitalism and Its Troubles: A Survey of International Finance", May 24, 2002; "Preparing for terror: How scared should you be?", Nov 28th 2002. From The Economist print edition.

16. Industry Solutions Deliver Insight On Demand

  • Law Enforcement
    • Crime Information Warehouse
    • Entity Resolution
    • Anti-Money Laundering
  • Banking
    • Basel II and Banking Data Warehouse
    • Entity Resolution
  • Health Care
    • Aligned Clinical Environment
  • Retail
    • RFID
    • Retail Data Model
  • Telco
    • Telco Data Warehouse
  • Insurance
    • Customer Insight
    • IIW
  • Automotive
    • Quality Insight Early Warning
  • Life Sciences
    • Drug Discovery

17. OmniFind Key Technologies

Content Crawling

  • Scalable Web crawler
  • Data Source crawlers
  • Content Push

Parsing/Tokenizing

  • HTML/XML
  • 200+ Doc Filters
  • Advanced Linguistics

Search Collections

Categorization

  • Taxonomy
  • Rule-based

Annotation

  • Text Analytics
  • Plug-in

Indexing

  • Global Analysis
  • Static Ranking
  • Store
  • Dynamic Ranking
  • Fielded Search
  • Dynamic Summary
  • Parametric Search
  • Spell Checking
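Fielded search, listed under Indexing above, rests on an inverted index that maps terms to the documents containing them. The toy sketch below keys postings by (field, term) and answers AND queries by set intersection. It assumes naive lower-case whitespace tokenization, and the document ids and fields are hypothetical; OmniFind's actual tokenizer, ranking, and index format are not described in the slide.

```python
from collections import defaultdict

def build_index(docs):
    """Build an inverted index mapping (field, term) -> set of document ids.

    docs maps a doc id to a dict of field name -> text. Tokenization here is
    a deliberately naive lower-case whitespace split.
    """
    index = defaultdict(set)
    for doc_id, fields in docs.items():
        for field_name, text in fields.items():
            for term in text.lower().split():
                index[(field_name, term)].add(doc_id)
    return index


def fielded_search(index, field_name, *terms):
    """Return ids of documents whose field contains every query term (AND)."""
    postings = [index.get((field_name, t.lower()), set()) for t in terms]
    return set.intersection(*postings) if postings else set()


docs = {
    "d1": {"title": "DB2 query optimizer", "body": "hash join and star join"},
    "d2": {"title": "ARIES recovery", "body": "write-ahead logging"},
    "d3": {"title": "Star schema design", "body": "query rewrite"},
}
idx = build_index(docs)
print(sorted(fielded_search(idx, "title", "query")))        # ['d1']
print(sorted(fielded_search(idx, "body", "star", "join")))  # ['d1']
print(sorted(fielded_search(idx, "title", "star")))         # ['d3']
```

Restricting a term to one field is what separates "query" in a title (d1) from "query" in a body (d3); ranking, spell checking, and dynamic summaries would sit on top of postings like these.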

Searching

Security

18. Content Management Portfolio Strategy

  • Capture, store, and manage all forms of content
  • Complete and scalable content management functionality
    • Document management
    • Image management
    • Digital asset management
    • Report management
    • Web content management
    • Records management
    • Digital rights management
    • Email/Messaging archiving and management
    • Collaboration tools
  • Enterprise-scale business process management
  • Cross-portfolio, out-of-the-box integration
  • Rich, common client platform

19. IBM Content Management Platform Roadmap (4Q2004, 1Q2005, 2005, 2006 and Beyond)

  • WebSphere Portal V5.1: embeds DB2 Content Manager Runtime Edition (JCR)
  • Records Manager V4.1.1: a dynamic RM infrastructure
  • Workplace Web Content Management V2.0: leveraging DB2 Content Manager and WebSphere Portal Framework
  • DB2 Content Manager V8.3: enhance doc routing, enable BPM, extend integration capabilities, seamless RM
  • DB2 Document Manager V8.3: compliance/RM, extending native language support
  • DB2 CommonStore V8.3: full-text search, seamless RM
  • First step: ECM unified client, new portlets, J2EE Web components
  • Extend to DPM; extend document management; email/messaging archiving and management enhancements
  • Physical records management; virtual records management
  • WCM leveraging Workplace and DB2 Content Manager Runtime (JCR)
  • Common content repository; Workplace unified end-user experience (client); event framework; integrated/interoperable DPM/BPM; extended ECM capabilities as add-on features
  • Enterprise JCR: IBM CM SDK, Enterprise Content Integration, JSR170, DB2 Content Manager Runtime in ISV applications, LDDM* fully supports JSR170
  • Autonomic capabilities, content preservation, content intelligence, pervasive enablement, and more

* Lotus Domino Document Manager

20. Query Optimization

  • Industry-Leading Optimization
  • Extensible SQL to XQuery!
  • Optimizes for Parallel
    • I/O accesses
    • Within a node (SMP)
    • Between nodes (MPP)
  • Powerful for complex OLAP & BI queries
  • Industry-Strength Engineering
  • Portable
    • Across HW & SW platforms
    • Databases of 1 GB to > 300 TB
  • Continuing "technology pump" of improvements from Research

21. Unstructured Information Management Architecture

  • Common Research infrastructure for advancing Text Analysis and NLP capability
    • Promotes re-use of best-of-breed components
    • Promotes combination hypothesis through ease of integration

[Figure: UIMA architecture]

  • Unstructured Information Application Libraries and Specialized Application Libraries: provide basic functions common to a broad class of application libraries & applications (e.g., Glossary Extraction, Taxonomy Generation, Classification, Translation)
  • Applications: Question Answering, e-Commerce, Semantic Search Engine
  • Domains: National & Intelligence, Business, Bioinformatics, Technical Support
  • Token and Concept Indexing: query (key words, concepts, spans, ranges) -> ranked hit list
  • Document & Meta Data Store: stores documents with metadata based on key-value pairs; enables view & collection management
  • (Text) Analysis Engines (TAEs): combination of analysis engines employing a variety of analytical techniques and strategies
  • Structured Knowledge Access: Knowledge Source Adapters (KSAs) deliver content from many structured knowledge sources according to central ontologies
  • Collection Processing Manager
  • KSA Directory Service: dynamic query & delivery of KSAs
  • TAE Directory Service: dynamic query & delivery of TAEs
  • UIMA Standard, Application Libraries, Relevant Application Knowledge, Structured Data, UIM Solutions

22. Analytics Bridge the Unstructured & Structured Worlds

  • Unstructured information (text, chat, email, audio, video): high-value, most current, fastest-growing content, BUT buried in huge volumes, with lots of noise, implicit semantics, and inefficient search
  • Structured information (indices, DBs, KBs): explicit structure, explicit semantics, efficient search, focused content
  • UIMA bridges the two

  • Identify Semantic Entities, Induce Structure
    • Chats, Phone Calls, Transfers
    • People, Places, Org, Events
    • Times, Topics, Opinions, Relationships
    • Threats, Plots,etc.
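The bullets above describe analysis engines that identify semantic entities and induce structure from unstructured text. The sketch below mimics that pipeline pattern in plain Python: each hypothetical annotator adds typed, offset-based spans to one shared document object, loosely echoing the idea of a common analysis structure. It is an illustration under those assumptions and is not the UIMA API; the annotator names, entity types, and sample sentence are invented.

```python
import re
from dataclasses import dataclass, field

@dataclass
class Annotation:
    kind: str    # entity type, e.g. "Person" or "Date"
    begin: int   # character offset where the span starts
    end: int     # character offset just past the span
    text: str

@dataclass
class Document:
    text: str
    annotations: list = field(default_factory=list)

def date_annotator(doc):
    """Hypothetical annotator: tag ISO-style dates (YYYY-MM-DD)."""
    for m in re.finditer(r"\b\d{4}-\d{2}-\d{2}\b", doc.text):
        doc.annotations.append(Annotation("Date", m.start(), m.end(), m.group()))

def make_dictionary_annotator(kind, names):
    """Hypothetical annotator factory: tag exact matches from a word list."""
    def annotate(doc):
        for name in names:
            for m in re.finditer(re.escape(name), doc.text):
                doc.annotations.append(Annotation(kind, m.start(), m.end(), name))
    return annotate

def run_pipeline(doc, annotators):
    """Apply each annotator in order; all of them write to the same document."""
    for annotate in annotators:
        annotate(doc)
    return sorted(doc.annotations, key=lambda a: a.begin)

pipeline = [
    make_dictionary_annotator("Person", ["Alice", "Bob"]),
    make_dictionary_annotator("Place", ["Almaden"]),
    date_annotator,
]
for a in run_pipeline(Document("Alice met Bob in Almaden on 2006-03-14."), pipeline):
    print(a.kind, a.text)
```

Because every annotator only appends to the shared document, new engines can be added to the list without changing existing ones, which is the reuse argument the UIMA slide makes.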

UIMA - The Big Picture

23. Evolution of Metadata

  • Hierarchical Data Model: rigid metadata; single application
  • Relational Data Model: rigid metadata; integration within enterprise
  • Extensible Data Model (XML): flexible metadata; integration within industry
  • Domain Specific Ontologies: flexible metadata; cross-industry integration
  • Over the 1970-2010 timeline, the business value of metadata increases as it moves from syntactic annotation of data (what this data represents) to semantic annotation of data (what this data means)

24. Information Management Trends

  • Information Intensive Applications
    • Shift from transaction-centric to information-intensive applications
  • Information Diversity
    • Delivering insight over increasingly diverse sources of information
  • New Business & Delivery Models
    • Information as a Service, Outsourcing, New Licensing Models
  • Democratization of Information
    • Changing User Expectations & the Parent Test
  • Massive Collaboration & Societal Intelligence
    • Collaboration over shared information to create business insight

25.

  • STEM is a tool to help scientists and public health officials create and test models for emerging infectious diseases.
    • Understand disease dynamics
    • Test outcomes of preventative actions
  • Diverse Data Sources
    • GIS data for every county: borders, populations, shared borders, highways, airports
    • Susceptible/Infectious/Recovered (SIR) models
    • Susceptible/Exposed/Infectious/Recovered (SEIR) models
    • Multi-serotype disease models
    • Public health policy events
    • User specified disease vectors
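The SIR models listed above divide a population into susceptible (S), infectious (I), and recovered (R) compartments governed by dS/dt = -βSI/N, dI/dt = βSI/N - γI, and dR/dt = γI. The sketch below integrates those equations for a single well-mixed population with forward-Euler steps. It is not STEM's implementation, which also models spatial spread between regions, and the parameter values are arbitrary illustrations.

```python
def simulate_sir(s0, i0, r0, beta, gamma, days, dt=0.1):
    """Integrate the classic SIR equations with forward-Euler steps.

    dS/dt = -beta*S*I/N, dI/dt = beta*S*I/N - gamma*I, dR/dt = gamma*I
    Returns a list of (day, S, I, R) tuples sampled once per whole day.
    """
    n = s0 + i0 + r0
    s, i, r = float(s0), float(i0), float(r0)
    steps_per_day = round(1 / dt)
    history = [(0, s, i, r)]
    for day in range(1, days + 1):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / n * dt
            new_recoveries = gamma * i * dt
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        history.append((day, s, i, r))
    return history

# Illustrative parameters only: basic reproduction number R0 = beta / gamma = 3.
trajectory = simulate_sir(s0=9990, i0=10, r0=0, beta=0.3, gamma=0.1, days=160)
peak_day, _, peak_i, _ = max(trajectory, key=lambda row: row[2])
print(f"peak infections ~{peak_i:.0f} on day {peak_day}")
```

An SEIR variant would insert an exposed compartment E between S and I, with an additional rate parameter moving people from E to I; testing a preventative action amounts to rerunning the model with a lowered β.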

Spatiotemporal Epidemiological Modeler: http://www.alphaworks.ibm.com/tech/stem

26. Metadata-driven Design for Integration

  • Build these: Web Service, New Business Process, New Integrated View, New DataFlow
  • Using these: Legacy and packaged apps, Relational databases, XML documents
  • WBI, II, ETL
  • 40% of IT budgets may be spent on integration
  • 30% of people's time is spent searching for relevant information
  • 30% of development time is spent on copy management

  • Remember It
    • Remember relationships and dependencies
  • Find It
    • Find and visualize related information
  • Connect It
    • Generate the integration glue

27. Metadata Will Be Used to Facilitate Information and Application Integration

  • Today: manual integration, custom hard-wired integration
  • Tomorrow: semi-automated integration using tools and connectors
  • Future: automated integration through metadata standards and tools
    • Dictionaries
    • Taxonomies
    • Ontologies
