Database Career Opportunities -...
Database Career Opportunities
Leading Database Industry Research
• Microsoft
• Oracle
• IBM
• Teradata
• EMC
• SAP
• Vertica Columnar
Database Career Opportunities
Leading Big Data Industry Research
• Google
• Yahoo
• Amazon
• eBay
• And So Many Others
What to Study: Advanced Database Topics
• Operational RDBMS Based:
• Modeling and Design
• Normalization, Functional Dependency Theory
• Database Performance Tuning
• Index, Index, Index !!
• Query Optimization
• Database Security
• Data Analytics:
• Parallel Data Warehouse and OLAP
• Star Schema OLAP Query Optimization
• Columnar Database
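The emphasis on indexing in the tuning topics above can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module rather than any of the commercial systems named in these notes, and the `orders` table, its columns, and the index name are hypothetical. The sketch asks SQLite for its query plan before and after an index is created; the exact plan wording depends on the SQLite version.

```python
import sqlite3


def query_plan(conn, sql):
    """Return SQLite's plan description for a statement as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each plan row holds the human-readable detail.
    return " | ".join(row[-1] for row in rows)


# Hypothetical table, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(i % 100, i * 1.5) for i in range(1000)],
)

QUERY = "SELECT * FROM orders WHERE customer_id = 42"

plan_before = query_plan(conn, QUERY)  # no index yet: a full table scan
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(conn, QUERY)   # the optimizer can now use the index

print(plan_before)
print(plan_after)
```

Comparing the two printed plans shows the optimizer switching from scanning the whole table to searching through the index, which is the basic effect that index strategy work tries to achieve.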
Advanced Database Research Topics
• How to Build a Database Server
== (How to Build an Expert System of Artificial Intelligence)
• How To Build a Query Processor
• Optimization Techniques of Performance of Data Processing
• Index Strategy Development
• Parallel Data Processing Techniques
• In Memory Database Processing and Optimization
• Columnar Database
• Concurrency Control Techniques
• Database Security
Data Warehousing and OLAP
• Introduction
• Decision Support Technology
• On Line Analytical Processing
• Star Schema
• Relational Aggregation Operators
• Data Cube
• Roll Up, Drill Down
• Papers to Cover
• An Overview of Data Warehousing and OLAP Technology by Surajit Chaudhuri (Microsoft) and Umeshwar Dayal (HP Labs), in ACM SIGMOD Record, 1997
• Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals by Jim Gray (Microsoft), et al., in the proceedings of IEEE ICDE 1996
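As a rough illustration of the data cube and roll-up ideas listed above, the following sketch computes an aggregate for every combination of dimensions, marking an aggregated-away dimension as "ALL". The sales rows, dimension names, and the `data_cube` helper are hypothetical, and the code is a plain in-memory sketch rather than how a warehouse engine evaluates the operator.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact rows: (region, product, sales amount)
SALES = [
    ("East", "Laptop", 100),
    ("East", "Phone", 50),
    ("West", "Laptop", 70),
    ("West", "Phone", 30),
]
DIMENSIONS = ("region", "product")


def data_cube(rows, n_dims):
    """Sum the measure for every subset of the dimensions.

    Each row holds n_dims dimension values followed by one numeric measure.
    A dimension that has been aggregated away is reported as "ALL".
    """
    cube = defaultdict(int)
    subsets = [
        kept
        for size in range(n_dims + 1)
        for kept in combinations(range(n_dims), size)
    ]
    for kept in subsets:
        for row in rows:
            key = tuple(row[i] if i in kept else "ALL" for i in range(n_dims))
            cube[key] += row[n_dims]
    return dict(cube)


cube = data_cube(SALES, len(DIMENSIONS))
print(cube[("East", "Laptop")])  # finest level (drill down)
print(cube[("East", "ALL")])     # roll up over product within East
print(cube[("ALL", "ALL")])      # grand total
```

Moving from the ("East", "Laptop") cell to ("East", "ALL") is a roll-up, and the reverse direction is a drill-down; a single GROUP BY would produce only one of these levels.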
S.Chung-CIS611_Lecture_Notes
What to Study: Advanced Database Topics
Massively Parallel Big Data Processing Systems
• Map Reduce and Hadoop
• NoSQL Database Processing Systems
- MongoDB
- HBase
- Hive
- Pig Latin
Semistructured Data vs Structured Data
• Structured Data
• Relational Database, Data Warehouse
• SQL
• Semistructured Data
• XML, HTML, JSON
• XQuery, XPath
• Unstructured Data
• Text, Web Data (Mix of Text, Image, Audio, Video)
• Problems and Solutions of Processing Semistructured/Unstructured Data
• Introduction to Markup Languages: XML, HTML
Semi Structured Data and Big Data
• Introduction to XML
• XML Schema, Semantics, Protocol
• XQuery, XPath
• XML Query Processing
• XML 1.0 (DOM and SAX2 APIs)
• XQuery 1.0 and XPath 2.0 Semantics
• XSLT
• JSON
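To make the XPath and JSON items above concrete, here is a small sketch using only Python's standard library. The sample documents are invented for illustration (the course titles are taken from the end of these notes), and `xml.etree.ElementTree` supports only a limited subset of XPath, not XPath 2.0 or XQuery.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical semistructured documents describing the same data.
XML_DOC = """
<courses>
  <course id="CIS611"><title>Enterprise Database Systems and Data Warehouse</title></course>
  <course id="CIS612"><title>Big Data and Parallel Database Processing Systems</title></course>
</courses>
""".strip()

JSON_DOC = """
{"courses": [
  {"id": "CIS611", "title": "Enterprise Database Systems and Data Warehouse"},
  {"id": "CIS612", "title": "Big Data and Parallel Database Processing Systems"}
]}
"""

# XML: navigate the tree with path expressions.
root = ET.fromstring(XML_DOC)
titles = [t.text for t in root.findall("./course/title")]
cis612 = root.find("./course[@id='CIS612']/title").text

# JSON: the parsed document is ordinary nested lists and dictionaries.
data = json.loads(JSON_DOC)
json_titles = [course["title"] for course in data["courses"]]

print(titles)
print(cis612)
print(json_titles == titles)
```

In both cases the structure travels with the data itself instead of being fixed in advance by a relational schema, which is what distinguishes semistructured from structured data.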
Information Retrieval: How Google Search Engine Works
• Web data Processing
• Building Index
• Relevancy Metric for Search Engine
• How Google Search Engine Works
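The index-building and relevancy topics above can be sketched with a tiny inverted index and a TF-IDF score. This is a generic textbook technique chosen for illustration, not a description of how Google ranks pages; the documents and function names are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical document collection: doc id -> text
DOCS = {
    1: "parallel database processing",
    2: "database index tuning",
    3: "web search index",
}


def build_index(docs):
    """Build an inverted index: term -> {doc id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index


def search(index, n_docs, query):
    """Rank documents by the sum of TF-IDF scores over the query terms."""
    scores = defaultdict(float)
    for term in query.lower().split():
        postings = index.get(term, {})
        if not postings:
            continue
        # Terms that appear in fewer documents get a higher weight.
        idf = math.log(n_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    # Highest score first; ties broken by doc id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


index = build_index(DOCS)
ranking = search(index, len(DOCS), "database index")
print(ranking)
```

Document 2 ranks first because it is the only one containing both query terms; the other two each match a single term and tie.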
Map Reduce and Apache Hadoop
• Introduction to Google MapReduce
• Introduction to Apache Hadoop
• HDFS Architecture
• Parallel programming and the MapReduce programming model
• Algorithms of Map Reduce/Apache Hadoop
• Papers to Cover
• MapReduce: Simplified Data Processing on Large Clusters by Jeffrey Dean (Google) and Sanjay Ghemawat (Google), in the proceedings of OSDI 2004
//labs.google.com/papers/mapreduce-osdi04.pdf
• Lämmel, Ralf. Google's MapReduce Programming Model Revisited. http://www.cs.vu.nl/~ralf/MapReduce/paper.pdf
• Open Source MapReduce http://lucene.apache.org/hadoop/
• Apache Hadoop in White Papers by Apache, Yahoo
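The MapReduce programming model covered above can be sketched in a few lines as a single-process word count. This is only an illustration of the map, shuffle, and reduce steps under the assumption that everything fits in memory; it does not use Hadoop or HDFS, and the function names are hypothetical.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def run_job(documents):
    """Run map, shuffle, and reduce sequentially in one process."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, vs) for k, vs in shuffle(pairs).items())


counts = run_job(["big data big clusters", "data warehouse"])
print(counts)
```

In a real cluster the map calls run in parallel on different input splits and the shuffle moves data across machines; the programmer still writes only the map and reduce functions.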
Data Warehouse on Parallel Processing
• Architecture of Parallel Processing
• Flow of Data Processing in Data Warehouse on Parallel Processing Architecture
- Retrieval
- Aggregation
• RDBMS For Integration with HDFS
• Example:
• Teradata Parallel Architecture
• Oracle Parallel Architecture
Data Warehouse and OLAP for Business Intelligence
• What is Business Intelligence?
Big Trend for Every Business for Current and Next Generation
• Data Analytics on Web Transaction Data of Business
• Microsoft Data Analytic Service System, Multi-Dimensional OLAP
• R, MapR, Python, and Many Other Systems and Tools
Data Analytics / Data Mining
• Data Cleaning, Preprocessing, Transformation
• Data Mining Algorithms
• Machine Learning Algorithms
• Optimization Techniques of Data Analytics Algorithm
Big Data: Web Data Processing
• Problems of Big Data Processing
• Solutions
• Processing Web Transaction Data (Unstructured Data) on Data Warehouse (Structured Data Processing Server)
Real World Examples
• Facebook
• Papers to Cover
• Data Warehousing and Analytics Infrastructure at Facebook by Ashish Thusoo (Facebook), et al., in the proceedings of SIGMOD 2010
• eBay
• Data Warehousing and Analytics at eBay
How to Get Access to Study These Advanced Database Topics for Development
• Major Database Vendors (Microsoft, IBM, Oracle, Teradata) Offer Database Administration Exams and Supporting Materials at 3-4 Levels of Proficiency. Companies Accept These as Credentials and Encourage (and Pay for) Their Database Employees to Pass the Exams
• Graduate Level Advanced Database Courses Cover These Exam Topics: Database Server, Query Processing, Data Processing Optimization, Database Tuning
Advanced Database Topics For Research and Advanced Industry Developments
• To Learn How to Build a Database Server:
• How To Build a Query Processor
• Optimization Techniques of Performance of Data Processing
• Index Strategy Development
• Parallel Data Processing Techniques
• In Memory Database Processing and Optimization
• Columnar Database
• Concurrency Control Techniques
• Database Security
• Graduate Level Advanced Database Courses and Data Analytics Courses Cover All These Advanced Topics:
• CIS 611 Enterprise Database Systems and Data Warehouse
• CIS 612 Big Data and Parallel Database Processing Systems
• CIS 660 Data Mining