NoSQL continued CMSC 461 Michael Wilson. MongoDB MongoDB is another NoSQL solution Provides a bit...
-
Upload
nigel-carroll -
Category
Documents
-
view
212 -
download
0
Transcript of NoSQL continued CMSC 461 Michael Wilson. MongoDB MongoDB is another NoSQL solution Provides a bit...
NoSQL continuedCMSC 461Michael Wilson
MongoDB MongoDB is another NoSQL solution
Provides a bit more structure than a solution like Accumulo
Data is stored as BSON (Binary JSON) Binary encoded JSON, extends JSON
Allows storage of large amounts of data
SQL vs. MongoDB SQL has databases, tables, rows,
columns Monbo has databases, collections,
documents, fields Both have primary keys, indexes Collection structures are not enforced
heavily Inserts automatically create schemas
Interacting with MongoDB Multiple databases within MongoDB
Switch databases use newDb
New databases will be stored after an insert
Create collection db.createCollection(“collectionName”) Not necessary, collections are implicitly
created on insert
BSON MongoDB uses BSON very heavily
Binary JSON Like JSON with a binary serialization
method Has extensions so that it can represent
data types that JSON cannot Used to represent documents, provide
input to queries
Selects/queries In MongoDB, querying typically consists of
providing an appropriately crafted BSON SELECT * FROM collectionName
db.collectionName.find() SELECT * FROM collectionName WHERE field =
value db.collectionName.find( {field: value} )
SELECT * FROM collectionName WHERE field > 5 db.collectionName.find( {field: {$gt: 5} } )
Other functions that take a query argument have queries that are formatted this way
Interacting with MongoDB Insert
db.collectionName.insert( {queryBSON} ) Update
db.collectionName.update( {queryBSON}, {updateBSON}, {optionBSON} ) updateBSON
Set field to 5: {$set: {field: 5}} Increment field by 1 {$inc: {field: 1}}
optionBSON Options that determine whether or not to create new
documents, update more than one document, write concerns
Interacting with MongoDB Delete
db.collectionName.remove( {queryBSON} )
Apache Hive Also runs on Hadoop, uses HDFS as a
data store Queryable like SQL
Using an SQL-inspired language, HiveQL
Hive data organization Databases Tables Partitions
Tables are broken down into partitions Partition keys allow data to be stored into
separate data files on HDFS Can query on particular partitions
Buckets Can bucket by column to sample data
Purpose of Hive Provide analytics, query large volumes
of data NOT to be used for real time queries like
Postgres or Oracle Hive queries take forever
Partitions and buckets can help reduce this amount of time
Hive queries Hive queries actually generate
MapReduce jobs MapReduce jobs take a while to set up
and run MapReduce jobs can be run manually,
but for structured data and analytics, Hive can be used