Introduction to the extensible NoSQL DBMS · PDF fileIntroduction to the extensible NoSQL...

27
Introduction to the extensible NoSQL DBMS Amos Tore Risch Information Technology Uppsala University 2015-11-18

Transcript of Introduction to the extensible NoSQL DBMS · PDF fileIntroduction to the extensible NoSQL...

Page 1: Introduction to the extensible NoSQL DBMS  · PDF fileIntroduction to the extensible NoSQL DBMS Amos. ... – Functions and types used for data representation and queries ...

Introduction to the extensible NoSQL DBMS Amos

Tore RischInformation Technology

Uppsala University2015-11-18

Page 2: Introduction to the extensible NoSQL DBMS  · PDF fileIntroduction to the extensible NoSQL DBMS Amos. ... – Functions and types used for data representation and queries ...

What is Amos?• A prototype DBMS developed at UU and LiU• Functional NoSQL query language AmosQL

– Functions and types used for data representation and queries– Functions natural for numerical data, e.g. for datamining– Based on domain calculus, i.e. variables bound to objects, not

rows in tables as in SQL, tuple calculus.• Amos is extensible

– User defined foreign functions can be implemented in e.g. C or Java

– User defined indexes can be implemented in C or Java– External DBMSs can be queried, e.g MongoDB and MySQL– Query optimizer can be extended– Several query languages are supported, e.g. SQL-92, SparQL– Key enabling technology: extensible engine

Page 3: Introduction to the extensible NoSQL DBMS  · PDF fileIntroduction to the extensible NoSQL DBMS Amos. ... – Functions and types used for data representation and queries ...

What is Amos II?• Amos based on main-memory representation of

database– Fast and easy to extend with main-memory indexes– Embeddable in other system– Storage managers (e.g. MongoDB, MySQL) as external back-

ends • Amos runs on PCs under Windows and Linux• Highly parallel data stream management system (DSMS)

on top of Amos: SCSQ, SVALI• Applications: Science, industry, etc.

– E.g. space physics, particle physics, industrial equipment monitoring

Page 4: Introduction to the extensible NoSQL DBMS  · PDF fileIntroduction to the extensible NoSQL DBMS Amos. ... – Functions and types used for data representation and queries ...

The functional data model allows defining Amos schema exactly following above EER-diagram

Page 5: Introduction to the extensible NoSQL DBMS  · PDF fileIntroduction to the extensible NoSQL DBMS Amos. ... – Functions and types used for data representation and queries ...

The functional data model

OBJECTS

FUNCTIONS TYPES

classify

belong torelate

participate in

constrain

defined over

Page 6: Introduction to the extensible NoSQL DBMS  · PDF fileIntroduction to the extensible NoSQL DBMS Amos. ... – Functions and types used for data representation and queries ...

Objects• Regard database as collection of objects• AmosQL Example:

create Person instances :tore;• creates new object, say OID[1145], and binds temporary

environment variable :tore to it. – NOTICE: Environment variables disappear when transaction finished!

• The DBMS manages the objects. • Surrogate type objects have unique object identifiers (OIDs), e.g.

OID[1145]– Explicit creation and deletion, e.g.:

delete :tore;• Objects belong to one or more types where one type is the most

specific type– The type Person is the most specific type of OID[1145]

Page 7: Introduction to the extensible NoSQL DBMS  · PDF fileIntroduction to the extensible NoSQL DBMS Amos. ... – Functions and types used for data representation and queries ...

Objects, non-surrogate types

• No OIDs – no create or delete statement– Created when mentioned– Automatically deleted by garbage collector when no longer used

• Built-in atomic types, literals: String, e.g. ‘This is a string’ Integer, e.g. 1234Real, e.g. 2.34Boolean, e.g. trueTimeval, e.g. |2008-10-13/18:11:43|

• Collection types:Bag, e.g. bag(“Tore”,”Kjell”,”Thanh”)Vector, e.g. {1,2,3}Record, e.g. {“Name”:”Tore”, “Dept”:”IT”}Tuple, e.g. (“Tore”,”IT”)

Page 8: Introduction to the extensible NoSQL DBMS  · PDF fileIntroduction to the extensible NoSQL DBMS Amos. ... – Functions and types used for data representation and queries ...

TypesExamples of type definitions:create type Person;create type Student under Person;create type Instructor under Person;create type TA under Student, Instructor;create type Dept;create type Course;

Object

Function Type Literal Collection UserObject

Number Bag Vector PersonDept Course

Student Instructor

TA

Bag of Person

Page 9: Introduction to the extensible NoSQL DBMS  · PDF fileIntroduction to the extensible NoSQL DBMS Amos. ... – Functions and types used for data representation and queries ...

Types• Classification of objects

– Objects are grouped by the types to which they belong (are instances of)

• Organized in type/subtype hierarchy– Defines that OIDs of one type is subset of OIDs of other types– Objects of user-defined types are subtypes of type UserObject

• Types very useful for classifying data, i.e. for meta-data• Each surrogate type has an extent which is the set of

objects having that type in its type set.– Literal and surrogate types have no extent

• Types and functions are objects tooof types named Type and Function

Page 10: Introduction to the extensible NoSQL DBMS  · PDF fileIntroduction to the extensible NoSQL DBMS Amos. ... – Functions and types used for data representation and queries ...

Functions

• Examples of function definitions:create function name(Person p) -> Charstring nm

as stored;create function name(Dept) -> Charstring as stored;create function name(Course) -> Charstring as stored;create function dept(Person) -> Dept as stored;create function teacher(Course) -> Instructor as stored;create function teaches(Instructor i)-> Bag of Course cas /* Inverse of teacher */

select cwhere teacher(c)= i;

create function instructorNamed(Charstring nm)-> Instructor p

as select pwhere name(p) = nm;

Page 11: Introduction to the extensible NoSQL DBMS  · PDF fileIntroduction to the extensible NoSQL DBMS Amos. ... – Functions and types used for data representation and queries ...

Function partsA function has two parts• Signature:

– Name and types or arguments and resultsname(Person p) -> Charstring nteacher(Course c) -> Instructorplus(Number x, Number y) -> Number r (infix ‘x+y’)children(Person m, Person f) -> Bag of Person cparents2(Person p) -> (Person m, Person f)

• Body:– Specifies how to compute outputs from valid inputs

• Usually inverse too– Usually non-procedural specifications

• i.e. no side effects, except for procedural functions.

Page 12: Introduction to the extensible NoSQL DBMS  · PDF fileIntroduction to the extensible NoSQL DBMS Amos. ... – Functions and types used for data representation and queries ...

Signatures represent ER-diagrams

create type Course;create type Student;create type Instructor;create function enrolled(Student)->Bag of Course as stored;create function teacher(Course)->Instructor as stored;

Student

Course

Instructor

enrolled teacher

n

n

n

1

Page 13: Introduction to the extensible NoSQL DBMS  · PDF fileIntroduction to the extensible NoSQL DBMS Amos. ... – Functions and types used for data representation and queries ...

Kinds of functions• Stored functions

– E..g. name(), dept(), teacher(), enrolled()– Store data values– C.f. object attributes, relational tables

• Derived functions– E.g. teaches(), instructorNamed()– Defined as query– C.f. object methods, relational views

• Foreign functions– Implemented in regular programming language (e.g. C or Java)– Computations, statistics, numeric, accessing files and external systems– Basic built-in functions, e.g. plus()– C.f. object methods, relational UDFs (User Defined Functions)

• Procedural functions (functional stored procedures)– May have side effects like updating the database– Should not be used in queries– C.f. object methods, relational stored procedures

Page 14: Introduction to the extensible NoSQL DBMS  · PDF fileIntroduction to the extensible NoSQL DBMS Amos. ... – Functions and types used for data representation and queries ...

Populating and querying the database

• Use the create statement to create objects and set function valuescreate Dept(name) instances :it (‘IT’);create Instructor(name, dept) instances

:tr (’Tore’, :it),:ko (’Kjell’,:it);

create Course(name,teacher) instances(‘Databases’,:tr), (‘Data Mining’,:ko), (‘ETrade’,:ko);• Equivalent formulation using update command set:create Instructor instances :tr, :ko;set name(:tr) = ’Tore’;set dept(:tr) = :it;set name(:ko) = ‘Kjell’;set dept(:ko) = :it;etc.• Examples of functional expressions, the simplest queries:sqrt(2)+3;instructorNamed('Tore');name(dept(instructorNamed('Tore')));

Page 15: Introduction to the extensible NoSQL DBMS  · PDF fileIntroduction to the extensible NoSQL DBMS Amos. ... – Functions and types used for data representation and queries ...

Queries• Select statements:select <expressions>from <variable declarations>where <restrictions on declared variables>;

• Example:select name(p), name(dept(p))from Person pwhere p = instructorNamed(’Tore’);

=>("Tore",”IT”)NOTICE that variables such as p are bound of objects from any domain, domainrelational calculus, which is different from SQL where variable can be bound only torows in tables, tuple relational calculus!select p, name(p) from Person p;=>(#[OID 1440],"Tore")(#[OID 1441],"Kjell")(#[OID 1442],"Thanh")

Page 16: Introduction to the extensible NoSQL DBMS  · PDF fileIntroduction to the extensible NoSQL DBMS Amos. ... – Functions and types used for data representation and queries ...

Function Composition• Function composition simplifies queries that traverse function graph, Daplex

semantics:name(teaches(instructorNamed('Kjell')));=>"ETrade""Data Mining"

• Daplex semantics: name applied on each element in bag returned from teaches, a form of path specification over functions.

• Equivalent more SQLish flattened query:select nfrom Charstring n, Instructor i, Course cwhere n = name(c)and c in teaches(i)and i = instructorNamed(’Kjell’);

• Notice that literal types like Charstring can be used in from clause.

Page 17: Introduction to the extensible NoSQL DBMS  · PDF fileIntroduction to the extensible NoSQL DBMS Amos. ... – Functions and types used for data representation and queries ...

Aggregate Functions• An aggregate function is a function that coerces some value to a single unit,

a bag, before it is called.• ‘Bagged’ arguments are not ‘flattened’ for aggregate functions (no Daplex

semantics for aggregate functions).

count(teaches(instructorNamed(‘Kjell’)));=>2• Signature of aggregate function count and sum:count(Bag of Object) -> Integer sum(Bag of Number) -> Number

• Nested queries, local bags:sum(select count(teaches(p)) from Instructor p);=>3

Page 18: Introduction to the extensible NoSQL DBMS  · PDF fileIntroduction to the extensible NoSQL DBMS Amos. ... – Functions and types used for data representation and queries ...

Quantifiers• Existential and universal quantification over subqueries supported through

two built-in functions:notany(Bag of Object) -> Booleansome(Bag of Object) -> Boolean

• some() tests if there exists some element in the bag• notany() tests if there does not exist any element in the bag (negation)

• Example:select name(p) from instructor p where notany(name(teaches(p))="Data Mining");

⇒ "Tore“• Quantifiers required for relational completeness

Page 19: Introduction to the extensible NoSQL DBMS  · PDF fileIntroduction to the extensible NoSQL DBMS Amos. ... – Functions and types used for data representation and queries ...

Grouping values in bags• Similar to SQL’s group by with variables bound from any domain:select <grouped result>from <variable declarations>where <restrictions on declared variables>group by <group keys>;

• For example:select name(i), count(c) <= Aggregate functions and group keysfrom Course c, Instructor iwhere teaches(i)=cgroup by name(i); <= Group key=> ("K. Orsborn",2)("T. Risch",1)

Page 20: Introduction to the extensible NoSQL DBMS  · PDF fileIntroduction to the extensible NoSQL DBMS Amos. ... – Functions and types used for data representation and queries ...

Ordering values• Similar to SQL’s order by with variables bound from any domain:select <grouped result>…order by <sort key>;

• For example:select name(i), count(c)from Course c, Instructor iwhere teaches(i)=cgroup by name(i)order by count(c); <= sort key

=> ("T. Risch",1)("K. Orsborn",2)

Page 21: Introduction to the extensible NoSQL DBMS  · PDF fileIntroduction to the extensible NoSQL DBMS Amos. ... – Functions and types used for data representation and queries ...

Top-k queries• Similar to SQL’s order by with variables bound from any domain:select <grouped result>…stop after <sort key>;

• For example:select name(i), count(c)from Course c, Instructor iwhere teaches(i)=cgroup by name(i)order by count(c)limit 1; => ("T. Risch",1)

Page 22: Introduction to the extensible NoSQL DBMS  · PDF fileIntroduction to the extensible NoSQL DBMS Amos. ... – Functions and types used for data representation and queries ...

Bags and scans• Can assign query result to bag variable:set :b = iota(1,1000000);count(:b);-> 1000000• Can iterate over very large bags using scans:open :s1 on iota(1,1000000000000);fetch :s1;-> 1fetch :s1;-> 2and so on for a while…

Page 23: Introduction to the extensible NoSQL DBMS  · PDF fileIntroduction to the extensible NoSQL DBMS Amos. ... – Functions and types used for data representation and queries ...

Updating bag-valued functions• Adding elements to bag-valued functions:add hobbies(personNamed(’T.J.M. Risch’)) = ‘Painting’;add hobbies(personNamed(’T.J.M. Risch’)) = ‘Fishing’;add hobbies(personNamed(’T.J.M. Risch’)) = ‘Sailing’;

hobbies(personNamed(’T.J.M. Risch’));‘Painting’‘Fishing’‘Sailing’• Removing elements from bag-valued functions:remove hobbies(personNamed(’T.J.M. Risch’)) = ‘Fishing’;hobbies(personNamed(’T.J.M. Risch’));‘Painting’‘Sailing’

Page 24: Introduction to the extensible NoSQL DBMS  · PDF fileIntroduction to the extensible NoSQL DBMS Amos. ... – Functions and types used for data representation and queries ...

Procedural functions• E.g. to encapsulate database updates (constructors):create function newInstructor(Charstring nm, Charstringdept) -> Instructor ias begin

create Instructor instances i;set name(i)=nm;set dept(i)=dept;return i;

end;• Iterative update:create function removeOverloadedInstructors(Integer th)

->Bag of Charstringas for each Instructor i

where count(select teaches(i)) >= thbegin return name(i);

delete i;end;

• loop and while statements also, if-then-else, scans

Page 25: Introduction to the extensible NoSQL DBMS  · PDF fileIntroduction to the extensible NoSQL DBMS Amos. ... – Functions and types used for data representation and queries ...

Vectors• The datatype Vector stores ordered sequences of objects of any type, e.g. numerical vectors, tuples.• Vector declarations can be parameterized by declaring

Vector of <type>For example:create type Segment;create function start(Segment)->Vector of Real as stored;create function stop(Segment)->Vector of Real as stored;create type Polygon;create function segments(Polygon)->Vector of Segment

as stored;• Vector values have system provided constructors by {..}:create Segment instances :s1, :s2;set start(:s1)={1.1,2.3};set stop(:s1)={2.3,4.6};set start(:s2)={2.3,4.6};set stop(:s2)={2.8,5.3};create Polygon instances :p1;set segments(:p1)={:s1,:s2};

Page 26: Introduction to the extensible NoSQL DBMS  · PDF fileIntroduction to the extensible NoSQL DBMS Amos. ... – Functions and types used for data representation and queries ...

Vector functions and queries• Functions on vectors can be definedcreate function square(Number r) -> Number as r * r;create function length(Segment s) -> Real

as euclid(start(s),stop(s));create function length(Polygon p) -> Realas sum(select length(s)

from Segment s

where s in segments(p));

• Queries, e.g.:length(:s1);count(select s from Segment s where length(s)>1.34);

vselect start(s) from Polygon p, Segment s, Integer i

where s = segments(p)[i]order by i;

Page 27: Introduction to the extensible NoSQL DBMS  · PDF fileIntroduction to the extensible NoSQL DBMS Amos. ... – Functions and types used for data representation and queries ...

Download and run Amos• sa.amos release by streamanalyze.com can be downloaded from: http://streamanalyze.com/sa-amos/

• Cheat sheet:http://www.it.uu.se/research/group/udbl/amos/doc/AmosCheats.html

• Object-Oriented database modeling tutorial:http://www.it.uu.se/research/group/udbl/amos/doc/tut.pdf

• Previous Uppsala University version, Amos II:http://www.it.uu.se/research/group/udbl/amos/