Introduction to Database Systems · ir.cis.udel.edu/~carteret/CISC637/slides/01intro.pdf · 2011. 2....

2/8/11  

Introduction to Database Systems

CISC437/637, Lecture #1
Ben Carterette

Copyright © Ben Carterette

Database Systems

•  The overview in 5 Ws (and one H):
   – What is a database? What is a database management system (DBMS)?
   – Why use databases? Why study them?
   – Who works with databases?
   – How does a DBMS work?
   – Where and when did databases originate?

What is a Database?

•  A database is a collection of data
   – Usually large quantities of interrelated data
      •  E.g. student records, faculty records, courses, classrooms, payrolls, …

•  A database management system (DBMS) is a software system designed to store and manage data

Why Use a DBMS?

“So a bunch of text files on disk can be a database. I’ll just process them with Python. Why do I need to learn about design and DBMSs?”

•  Data too large to fit in memory; files too big for random access on disk
•  Arbitrarily complex queries that must be answered quickly
•  Many users accessing data concurrently
•  Some users need different access permissions

Why Use a DBMS?

•  Data independence
•  Efficient access
•  Integrity and security
•  Access administration
•  Concurrent access
•  Application development time

Why Not Use a DBMS?

•  DBMSs are large, complex programs designed for very general data needs and workloads
   – They are not always suitable for specialized tasks
•  An application may need to manipulate data in ways not supported by the DBMS
•  Security, concurrent access, and crash recovery may not be critical
•  Example: web search

Why Study Databases?

•  Multibillion-dollar industry, second only to operating systems
•  Databases form the backbone of many information-centric applications
   – Using computation to create and understand information
•  Implementing and understanding a DBMS incorporates knowledge from every area of CS
   – Systems, theory, artificial intelligence

Applications of Databases

•  Electronic commerce and banking
   – Amazon, eBay, PayPal
   – Integrating vast catalogs and accounts, high security
•  Social networking
   – Facebook, Twitter
   – Analyzing flow of information through large, tightly-connected networks

Applications of Databases

•  Sensor networks
   – GPS, RFID, …
   – Often supports mission-critical applications
   – Response to failures and trust are important
•  Bioinformatics, health informatics
   – Gene Ontology, PubMed, …
   – Requires data integration, pattern matching, approximate matching, ranking, automatic inference

Who Works With Databases?

•  DBMS programmers actually implement the DBMS software
•  Database administrators design storage requirements, handle security, ensure graceful recovery, tune database performance
•  Applications programmers write software that interacts with a database
•  End users use the software written by applications programmers

How Does a DBMS Work?

•  This is the focus of the course
•  Today: a brief overview of the topics that will be covered
   1.  Data models; database design
   2.  Database queries
   3.  Transaction management
   4.  DBMS structure; scalability and efficiency

Data Models

•  A data model is a collection of concepts for describing data
•  A schema is a description of a particular collection of data using a given model
•  The relational data model is most common
   – Relations (tables of records) are the main concept
   – Every relation has a schema that describes the record fields/table columns
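
To make the distinction between a relation and its schema concrete, here is a minimal sketch using Python's standard sqlite3 module. The table name Students and its columns are illustrative assumptions, not part of the course material.

```python
import sqlite3

# In-memory database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Students ("
    " sid INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " gpa REAL)"
)
conn.executemany(
    "INSERT INTO Students (sid, name, gpa) VALUES (?, ?, ?)",
    [(1, "Ada", 3.9), (2, "Alan", 3.7)],
)

# The schema: one (column name, declared type) pair per field.
schema = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(Students)")]

# The relation itself: the records currently stored in the table.
records = conn.execute("SELECT sid, name, gpa FROM Students ORDER BY sid").fetchall()

print(schema)
print(records)
```

The schema describes the columns once; the records are the data that conform to it.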

Levels of Abstraction

•  Views, or external schemas, describe how users see the data
•  The conceptual schema defines the logical structure of relations
•  The physical schema describes the specific files used to store a relation on disk

[Figure: the three levels of abstraction: View 1, View 2, View 3; Conceptual Schema; Physical Schema]
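
As a rough illustration of the external and conceptual levels, the sketch below defines a table (conceptual schema) and a view over it (external schema) with Python's sqlite3 module. The names Employees and Directory are assumptions made for the example; the physical schema is whatever storage SQLite chooses and never appears in the code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual schema: the logical structure of the relation.
conn.execute(
    "CREATE TABLE Employees ("
    " eid INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary REAL)"
)
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?, ?)",
    [(1, "Ada", "CS", 120000.0), (2, "Alan", "Math", 110000.0)],
)

# External schema: a view that exposes only what directory users should see.
conn.execute("CREATE VIEW Directory AS SELECT name, dept FROM Employees")

directory = conn.execute(
    "SELECT name, dept FROM Directory ORDER BY name"
).fetchall()
print(directory)
```

Users of Directory never see the salary column, even though it is part of the underlying relation.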

Database Design

•  Designing a database:
   – A user/client has data and requirements for how they need to access and modify it
•  Design steps:
   – requirements → views
   – views → conceptual schema
   – conceptual schema → physical schema
   – Loop until it’s right: integrity maintained, consistent, fast, easy to use

Data Independence

•  Using an external schema does not require knowledge of the conceptual schema
   – Logical data independence
•  Using a conceptual schema does not require knowledge of the physical schema
   – Physical data independence
•  Applications are insulated from how data is structured and stored
•  End users are insulated from how data is organized and constrained
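
Physical data independence can be sketched in a few lines: the same query text returns the same answer before and after a change to how the data is stored. Here the storage change is adding an index in SQLite; the Courses table, its rows, and the index name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Courses (cid INTEGER PRIMARY KEY, title TEXT, room TEXT)")
conn.executemany(
    "INSERT INTO Courses VALUES (?, ?, ?)",
    [
        (437, "Database Systems", "Room A"),
        (637, "Database Systems (grad)", "Room A"),
        (101, "Intro", "Room B"),
    ],
)

# The application only ever knows this query text.
QUERY = "SELECT cid FROM Courses WHERE room = ? ORDER BY cid"

before = conn.execute(QUERY, ("Room A",)).fetchall()

# Physical change: add an index on room. The query is not rewritten.
conn.execute("CREATE INDEX idx_courses_room ON Courses (room)")

after = conn.execute(QUERY, ("Room A",)).fetchall()
print(before == after)
```

The index may change how quickly the answer is found, but not the answer or the query that asks for it.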

Database Queries

•  Queries are questions asked of the data
•  A query language specifies how queries are posed in a specific data model
   – The language consists of keywords and operators for manipulating relations: the data manipulation language (DML)
•  Formulating a query does not require knowledge of the physical schema
•  Query languages allow fast application development
   – Embed DML in a high-level language like Java, C, Python
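
A small sketch of DML embedded in a high-level language, here SQL issued from Python through the standard sqlite3 module. The Enrolled table, its rows, and the helper average_grade are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Enrolled (sid INTEGER, course TEXT, grade REAL)")
conn.executemany(
    "INSERT INTO Enrolled VALUES (?, ?, ?)",
    [(1, "CISC437", 4.0), (2, "CISC437", 2.0), (1, "CISC637", 4.0)],
)


def average_grade(conn, course):
    """Pose a query: what is the average grade in the given course?"""
    row = conn.execute(
        "SELECT AVG(grade) FROM Enrolled WHERE course = ?", (course,)
    ).fetchone()
    return row[0]


# DML also covers modification, not just retrieval.
conn.execute(
    "UPDATE Enrolled SET grade = ? WHERE sid = ? AND course = ?",
    (3.0, 2, "CISC437"),
)

avg = average_grade(conn, "CISC437")
print(avg)  # 3.5
```

The `?` placeholders let the host language pass values into the query without building SQL strings by hand.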

Concurrency Control

•  Many databases are used by multiple users concurrently
   – Each user manipulating relations in different ways
   – Simultaneous uses can result in inconsistencies
      •  E.g. one is looking up vacancies while another is making a reservation
•  DBMSs ensure that problems don’t happen

Transaction Management

•  A transaction is an atomic sequence of database actions (reads and writes)
•  The complete execution of each transaction must leave the database in a consistent state if the database is consistent when it begins
   – Consistency means no logical conflicts
•  User/application formulates integrity constraints for the DBMS to enforce
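
The following sketch shows one way an integrity constraint and a transaction interact, using SQLite from Python. The Accounts table, the non-negative balance rule, and the transfer helper are assumptions chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Integrity constraint formulated by the application: no negative balances.
conn.execute(
    "CREATE TABLE Accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO Accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one transaction; True if it committed."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE Accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE Accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, "alice", "bob", 30)         # allowed by the constraint
rejected = transfer(conn, "alice", "bob", 500)  # would overdraw alice
balances = dict(conn.execute("SELECT name, balance FROM Accounts"))
print(ok, rejected, balances)
```

The second transfer credits bob before the debit fails, yet the rollback removes that credit too, so the database stays consistent.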

Scheduling Transactions

•  DBMS ensures that execution of {T1, …, Tn} is equivalent to some serial execution T1’, …, Tn’
   – Locks: before reading or writing, a transaction requests a lock on an object, and does nothing until the DBMS grants the lock. Locks are released after execution.
   – Use locks to force ordering of unordered transactions.
   – Deadlock: Ti has lock on object A and needs lock on object B. Tj has lock on object B and needs lock on object A.
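
The deadlock described above can be detected mechanically. The sketch below uses a wait-for graph, a standard technique that the slide does not name: an edge from T to U means T is waiting for a lock that U holds, and a cycle means deadlock. The function name and the dictionary layout are assumptions for illustration, and the code assumes each transaction waits on at most one lock at a time.

```python
def find_deadlock(holds, wants):
    """Return the transactions forming a wait-for cycle, or None.

    holds: dict mapping object -> transaction currently holding its lock
    wants: dict mapping transaction -> object whose lock it is waiting for
    """
    # Edge T -> U: T waits for a lock that U holds.
    waits_for = {
        t: holds[obj]
        for t, obj in wants.items()
        if obj in holds and holds[obj] != t
    }
    for start in waits_for:
        seen = []
        t = start
        # Follow the chain of waits until it ends or revisits a transaction.
        while t in waits_for and t not in seen:
            seen.append(t)
            t = waits_for[t]
        if t in seen:
            return seen[seen.index(t):]
    return None


# The situation from the slide: Ti holds A and needs B; Tj holds B and needs A.
cycle = find_deadlock({"A": "Ti", "B": "Tj"}, {"Ti": "B", "Tj": "A"})
print(cycle)  # ['Ti', 'Tj']

# No deadlock: Ti waits for B, but Tj is not waiting for anything.
print(find_deadlock({"A": "Ti", "B": "Tj"}, {"Ti": "B"}))  # None
```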

Atomicity

•  “All or nothing”: an atomic transaction is one that either completely finishes or does not happen at all
•  DBMS needs to maintain atomicity even when it crashes in the middle of transactions
•  Use a log to keep track of actions DBMS takes to execute transaction
   – Write-ahead log (WAL) enables this
•  Transaction isn’t done until all of its actions are done

Write-Ahead Log

•  The log consists of the following:
   – For write actions, the old data and the new data
   – A flag indicating whether the transaction was committed or aborted
•  Transactions can be undone when the commit flag is not present
•  Deadlocks can be resolved by aborting one transaction and allowing the other to continue
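
A toy version of the undo step can be sketched with an in-memory dictionary standing in for the database and a list standing in for the log. This is a simplification under stated assumptions: the record layout and the names write and recover are invented, nothing is written to disk, and a real recovery manager does considerably more (for example, redoing committed work).

```python
db = {"A": 100, "B": 50}
log = []


def write(txn, key, new):
    """Log the old and new values first (write-ahead), then apply the write."""
    log.append(("write", txn, key, db[key], new))
    db[key] = new


def recover(db, log):
    """Undo every write belonging to a transaction with no commit record."""
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    # Scan backwards so that the most recent writes are undone first.
    for rec in reversed(log):
        if rec[0] == "write" and rec[1] not in committed:
            _, _txn, key, old, _new = rec
            db[key] = old
    return db


# T1 runs to completion and commits.
write("T1", "A", 70)
write("T1", "B", 80)
log.append(("commit", "T1"))

# T2 writes, then the system "crashes" before T2 commits.
write("T2", "A", 0)

recover(db, log)
print(db)  # {'A': 70, 'B': 80}
```

Because the old value is logged before each write, the uncommitted write by T2 can be reversed while T1's committed changes survive.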

DBMS Structure

•  Layered architecture, each layer only aware of layer below it

[Figure: DBMS layers, listed top to bottom: Query optimization & execution; Relational operators; Files and access methods; Buffer management; Disk space management; DB. Also shown: Recovery manager, Transaction manager, Lock manager, Concurrency control.]

When and Where

•  Charles Bachman designed the Integrated Data Store at General Electric in the 1960s
•  The network data model, a graph-based representation designed for exploration rather than querying
•  Turing Award 1973, the first awarded for work on databases

When and Where

•  Edgar Codd proposed the relational data model in 1970 at IBM
•  Quickly became the basis of commercial systems; strong theoretical foundation developed
•  Turing Award 1981

When and Where

•  Jim Gray made fundamental contributions to transaction management in the 80s and 90s
•  Allowed DBMSs to scale to huge applications with thousands or millions of users
•  Turing Award 1998

Summary

•  DBMSs are used to maintain and query large amounts of data
•  They allow concurrent access, recovery from failure, fast application development, security
•  Levels of abstraction mean that one can work on one subproblem without knowing about others
•  Huge industry and huge research area in CS