CSC321/545: Summary of Database Techniques
Dr. Zhen Jiang
Computer Science Department
West Chester University
West Chester, PA 19383

Outline
Overview
◦ Non-relational DB system
◦ NonSQL DB system
Injection
Inference
◦ Role access control (UML)
◦ Perturbation
Design

Database System Overview
(Diagram labels: data, Database, DBMS, query request, Application Programming Interface (API), integration)
◦ Integration
◦ Administration
◦ Security & encryption
◦ Privacy & inference
◦ Transaction & injection
◦ Sketching & hashing

Traditional Database
The relation of key vs. non-key
The relation between key and foreign key
◦ Intra-table relation
◦ Inter-table relation
E-R diagram
◦ http://www.cs.wcupa.edu/~zjiang/ER.pdf
◦ Any regularity?
Arbitrary & Abrupt
◦ Ambiguity
Sample of such ambiguity in normalization process caused by the lack of background

Non-Relational Database
Data does not relate in the true sense
◦ e.g., Mongo, which handles document stores or other content and/or metadata stores

NonSQL Database
A clearer structure
◦ e.g., Kobo, Playtika (mobile service)
Distributed database system
◦ No need and not possible for a “join” operator
◦ Fast third-party data aggregation
◦ Fast caching for application objects
◦ Globally distributed data repository
◦ E-commerce and internet burstiness
◦ Game (data intensive applications)
◦ Ad targeting (social networks)

(Diagram labels: query request, "OK?", API, DBMS)

Injection
Direct DB injection
◦ http://www.youtube.com/watch?v=v6bphRHH4sM
Indirect DB injection
◦ http://www.irongeek.com/i.php?page=videos/webgoat-sql-injection
You need a tool for the trace of transactions
◦ interrupt each transaction as you debug and trace
◦ the record of each transaction
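
The direct injection case above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not the setup used in the linked videos: the `users` table, the account names, and the payload are all hypothetical. It contrasts a query built by string concatenation with a parameterized one, and uses `set_trace_callback` as a simple stand-in for the transaction-tracing tool mentioned above.

```python
import sqlite3

def make_db():
    # Hypothetical table and accounts, for illustration only.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, password TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "a-secret"), ("bob", "b-secret")],
    )
    conn.commit()
    return conn

def login_vulnerable(conn, name, password):
    # User input is pasted into the SQL text, so it can change the query itself.
    query = (
        "SELECT name FROM users "
        f"WHERE name = '{name}' AND password = '{password}'"
    )
    return conn.execute(query).fetchall()

def login_parameterized(conn, name, password):
    # Placeholders keep user input as data; it is never parsed as SQL.
    query = "SELECT name FROM users WHERE name = ? AND password = ?"
    return conn.execute(query, (name, password)).fetchall()

conn = make_db()

# Record every statement the connection executes from here on.
trace = []
conn.set_trace_callback(trace.append)

payload = "' OR '1'='1"
leaked = login_vulnerable(conn, "alice", payload)
safe = login_parameterized(conn, "alice", payload)

print(len(leaked))  # 2: the injected OR clause matches every row
print(len(safe))    # 0: no user has this literal string as a password
for statement in trace:
    print(statement)
```

Reading the trace shows the injected `OR '1'='1'` clause inside the first statement, which is the kind of record a tracing tool would let you inspect while debugging.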

Authorization
◦ Restrict access to data and restrict the actions that people may take (when they access data).
Encryption
◦ Scramble data so that the data cannot be read.
Authentication
◦ Password check
◦ Key protection, not to protect everything!
https://www.youtube.com/watch?v=3QnD2c4Xovk
Role based access control
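
A minimal sketch of how a password check (authentication) and a role lookup (authorization) might fit together. The roles, users, passwords, and resource names below are hypothetical and chosen only for illustration; a real system would rely on the DBMS's own account and privilege mechanisms.

```python
import hashlib
import hmac
import os

# Hypothetical roles mapped to the (action, resource) pairs they allow.
ROLE_PERMISSIONS = {
    "civilian": {("read", "unclassified")},
    "general": {("read", "unclassified"), ("read", "top_secret")},
}

def hash_password(password, salt):
    # Store a salted hash, never the password itself.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)

def make_user(password, roles):
    salt = os.urandom(16)
    return {"salt": salt, "hash": hash_password(password, salt), "roles": set(roles)}

# Hypothetical accounts.
USERS = {
    "smith": make_user("smith-pw", ["civilian"]),
    "jones": make_user("jones-pw", ["general"]),
}

def authenticate(name, password):
    # Authentication: is this really the named user?
    user = USERS.get(name)
    if user is None:
        return False
    candidate = hash_password(password, user["salt"])
    return hmac.compare_digest(user["hash"], candidate)

def is_authorized(name, action, resource):
    # Authorization: does any of the user's roles permit this action?
    roles = USERS.get(name, {}).get("roles", set())
    return any((action, resource) in ROLE_PERMISSIONS.get(r, set()) for r in roles)

def can_access(name, password, action, resource):
    return authenticate(name, password) and is_authorized(name, action, resource)

print(can_access("jones", "jones-pw", "read", "top_secret"))  # True
print(can_access("smith", "smith-pw", "read", "top_secret"))  # False
print(can_access("jones", "wrong-pw", "read", "top_secret"))  # False
```

Permissions attach to roles rather than to individual users, so changing what a role may do is a single edit to the role table.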

Inference (aggregation)
Basically, inference occurs when users are able to piece together (aggregate) information to determine a fact that should be protected.
Role cheating

Flight ID   Cargo Hold   Contents      Classification
1254        A            Boots         Unclassified
1254        B            Atomic bomb   Top secret
1254        C            Butter        Unclassified

General Jones (who has a top security clearance) requests information and would see all three.
Civilian Smith (who has no security clearance) requests the data and would see the following data:

Flight ID   Cargo Hold   Contents      Classification
1254        A            Boots         Unclassified
1254        C            Butter        Unclassified

When Smith sees that nothing is scheduled for hold B on flight 1254, he might attempt to insert the record, and his insertion will fail due to the unique constraint on cargo space availability.
He now has all the data he needs to infer that there is a secret shipment on the flight.
He could then cross-reference the flight information table to find out the source and destination of the secret shipment and various other information.
Poly-instantiation: allows different records for the same key (e.g., hold B) to exist in the same table, one per classification level.
Overbooking!
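
The cargo example can be reproduced with sqlite3 as a rough sketch. The table names `cargo` and `cargo_poly`, the "Blankets" cover record, and the way Smith's view is modeled as a filtered query are all assumptions made for illustration. In the first table the unique constraint on (flight, hold) makes Smith's insert fail, which leaks the existence of the hidden row. In the second, the classification is part of the key, so his insert succeeds.

```python
import sqlite3

ROWS = [
    (1254, "A", "Boots", "Unclassified"),
    (1254, "B", "Atomic bomb", "Top secret"),
    (1254, "C", "Butter", "Unclassified"),
]

conn = sqlite3.connect(":memory:")
# Ordinary table: one record per (flight, hold).
conn.execute(
    "CREATE TABLE cargo (flight_id INTEGER, hold TEXT, contents TEXT, "
    "classification TEXT, UNIQUE (flight_id, hold))"
)
# Polyinstantiated table: one record per (flight, hold, classification).
conn.execute(
    "CREATE TABLE cargo_poly (flight_id INTEGER, hold TEXT, contents TEXT, "
    "classification TEXT, UNIQUE (flight_id, hold, classification))"
)
conn.executemany("INSERT INTO cargo VALUES (?, ?, ?, ?)", ROWS)
conn.executemany("INSERT INTO cargo_poly VALUES (?, ?, ?, ?)", ROWS)

# What Smith (no clearance) is allowed to see.
smith_sees = conn.execute(
    "SELECT hold FROM cargo WHERE classification = 'Unclassified' ORDER BY hold"
).fetchall()
print(smith_sees)  # [('A',), ('C',)]

# Smith tries to book the apparently empty hold B.
smith_row = (1254, "B", "Blankets", "Unclassified")
try:
    conn.execute("INSERT INTO cargo VALUES (?, ?, ?, ?)", smith_row)
    leaked = False
except sqlite3.IntegrityError:
    # The failure itself tells Smith that a hidden record occupies hold B.
    leaked = True
print(leaked)  # True

# With polyinstantiation the same insert succeeds, so nothing is revealed.
conn.execute("INSERT INTO cargo_poly VALUES (?, ?, ?, ?)", smith_row)
hold_b = conn.execute(
    "SELECT contents, classification FROM cargo_poly "
    "WHERE hold = 'B' ORDER BY classification"
).fetchall()
print(hold_b)  # [('Atomic bomb', 'Top secret'), ('Blankets', 'Unclassified')]
```

The price is visible in the last query: hold B now carries two records, which is presumably the overbooking concern noted above.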

Other causes such as:
◦ Count of highly preferred customers
◦ Average salary
Problem is difficult
◦ Information?
  Content: what is critical?
◦ Path?
  Hold A-C, Hold B? Total space? Probing!
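
One way an aggregate such as average salary can leak an individual value is by probing with two overlapping queries. The sketch below is only an illustration of that idea: the names and salaries are made up, and the `average` function stands in for an aggregate-only query interface.

```python
# Hypothetical salary table; individual values are meant to be hidden.
salaries = {"ann": 50000, "bob": 60000, "cho": 70000, "dee": 100000}

def average(names):
    # The only query the user is allowed to run: an aggregate over a group.
    return sum(salaries[n] for n in names) / len(names)

everyone = list(salaries)
without_dee = [n for n in everyone if n != "dee"]

avg_all = average(everyone)      # 70000.0
avg_rest = average(without_dee)  # 60000.0

# Differencing the two totals recovers dee's protected salary.
inferred = avg_all * len(everyone) - avg_rest * len(without_dee)
print(inferred)  # 100000.0
```

Neither query returns an individual record, yet together they determine one, which is why restricting access to single rows is not enough on its own.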

Existing solutions
◦ Limit access
  Role access control
  Too many restrictions could seriously hinder the functionality
◦ Perturbation
  Alter the data so that individual details are inaccurate but overall generalizations remain accurate.
  Include dummy data in the results returned to unauthorized queries.
Protect sensitive data, but also achieve preservation of the properties of the dataset.
Sketching with a probability of p
◦ With probability p, use the original data
◦ With probability (1-p), use a replacement
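
A minimal sketch of the sketching rule above. The slides only say "a replacement"; drawing the replacement uniformly from the value domain [low, high] is an assumption made here, and the function name `perturb` and the sample ages are hypothetical.

```python
import random

def perturb(values, p, low, high, rng):
    """Keep each value with probability p, otherwise replace it.

    Assumption: the replacement is drawn uniformly from [low, high].
    """
    return [v if rng.random() < p else rng.uniform(low, high) for v in values]

rng = random.Random(42)  # fixed seed so the run is repeatable
ages = [23, 35, 41, 52, 29, 64, 38, 47]
perturbed = perturb(ages, p=0.8, low=18, high=70, rng=rng)

# Integer ages that survive unchanged were retained; floats are replacements.
for original, released in zip(ages, perturbed):
    print(original, released)
```

An observer of a single released row cannot tell whether it is original or a replacement, while roughly a p fraction of the rows still carry true values.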

Preservation
Given each query f in the original table T with n rows, build a re-constructible query f’ in the revised table T’ (also with n rows), so that the difference between the two results can be controlled within a limited range with a probability of p.
In other words, the expected number of rows that get perturbed is n(1-p). For a domain ∆C, n(1-p)k of those rows are expected to lie within the queried value range (k∆C), k ∈ [0, 1]. Among the total nr rows observed from T’ in the value range (k∆C), subtracting the n(1-p)k rows gives an estimate of the number of unperturbed rows in that range. Scaled up by 1/p, this gives the total number of original rows in the range (n0), as only a p fraction of rows were retained: n0 = (nr - n(1-p)k) / p.
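
The estimate described above can be written out as a short function. This is a sketch under the assumption that replacement values are spread uniformly over the domain, so that a fraction k of them lands in the queried range; the function name and the sample numbers are hypothetical.

```python
def estimate_original_count(n, n_r, p, k):
    """Estimate n0, the number of original rows in the queried range.

    n   : total rows in T (and in T')
    n_r : rows observed in the range in the perturbed table T'
    p   : probability that a row keeps its original value
    k   : fraction of the domain covered by the range, 0 <= k <= 1
    """
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    if not 0 <= k <= 1:
        raise ValueError("k must be in [0, 1]")
    # Replaced rows expected to fall into the range by chance.
    expected_noise = n * (1 - p) * k
    # The remainder are retained originals; scale by 1/p to undo the sampling.
    return (n_r - expected_noise) / p

# 1000 rows, 80% retained, range covers a quarter of the domain,
# and 290 rows of T' fall in the range.
n0 = estimate_original_count(n=1000, n_r=290, p=0.8, k=0.25)
print(round(n0))  # 300
```

Here 1000 × 0.2 × 0.25 = 50 rows are expected noise, leaving 240 retained originals, and 240 / 0.8 = 300.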

Security and Privacy
f’ = n0/n
[n-n0, n0] A = [n-nr, nr]
a = Pr(row in T) vs. b = Pr(row in perturbed table T’)
Privacy breach, security threshold > a / b
b b’ (sketch does not help to distinguish the cases)
Server storage (with a) vs. client retaining (with b)

OO Design for DB Systems
Injection, inference
◦ RBAC (role based access control)
◦ Use case: http://www.cs.wcupa.edu/~zjiang/intro_uc.ppt
◦ Class design is needed for better maintaining the data ownership: http://www.cs.wcupa.edu/~zjiang/csc545_oo_design.htm
Non-relational DB
◦ Activity pattern: prediction of future relation, e.g., credit card security
NonSQL DB
◦ Relations in structure for the use.