Webinar Back to Basics - Session 5 - Reporting

"Application Development" Series: Back to Basics

Reporting and Analytics

Senior Solutions Architect, MongoDB Inc.


Massimo Brignoli

#MongoDBBasics

Agenda

• Recap of the previous session

• Reporting options

• Map Reduce

• Introduction to the Aggregation Framework

– Aggregation explain

• Reports in the mycms application

• Geospatial with the Aggregation Framework

• Text Search with the Aggregation Framework

• Virtual Genius Bar

– Use the chat to post questions

– The EMEA Solution Architecture / Support team is on hand

– Make use of them during the sessions!

Q & A

Recap of the previous session…

Indexing

• Indexes

• Multikey, compound, 'dot.notation'

• Covered, sorting

• Text, GeoSpatial

• Btrees

> db.articles.ensureIndex( { author : 1, tags : 1 } )

> db.user.find( { user : "danr" }, { _id : 0, password : 1 } )

> db.articles.ensureIndex( { location : "2dsphere" } )

> db.articles.ensureIndex( { "$**" : "text" }, { name : "TextIndex" } )

Options: db.col.ensureIndex({ key : type })

Index Performance / Efficiency

• Check the query plans

• Slow queries

• n / nscanned ratio

• Which indexes are used

Tools: .explain(), db profiler

> db.articles.find( { author : 'Dan Roberts' } ).sort( { date : -1 } ).explain()
> db.setProfilingLevel(1, 100)
{ "was" : 0, "slowms" : 100, "ok" : 1 }
> db.system.profile.find().pretty()
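The n / nscanned ratio can be computed directly from an explain document. The following is a minimal sketch, assuming the explain output has been loaded into a Python dict that carries the `n` and `nscanned` fields; the helper name `scan_ratio` and the sample numbers are hypothetical and do not come from a real server.

```python
def scan_ratio(explain):
    """Return n / nscanned: documents returned per entry scanned."""
    scanned = explain.get("nscanned", 0)
    if scanned == 0:
        # Nothing was scanned, so the ratio is undefined.
        return None
    return explain["n"] / scanned


# Hypothetical explain summaries, used only for illustration.
with_index = {"cursor": "BtreeCursor author_1", "n": 6, "nscanned": 6}
without_index = {"cursor": "BasicCursor", "n": 6, "nscanned": 1200}

print(scan_ratio(with_index))
print(scan_ratio(without_index))
```

A ratio close to 1 suggests the query reads little more than it returns; a ratio close to 0 suggests most of the scanned entries were discarded.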

Reporting Options

Data Access Options

• Query Language

– Use pre-aggregated documents

• Aggregation Framework

– Compute new values from the data you already have

– For example: average views, number of comments

• MapReduce

– Internal implementation based on JavaScript

– External with Hadoop, using the MongoDB connector

• A combination of the three options

Pre-Aggregated Reports

• Instant results

– Simple from a query standpoint

– Uses the interactions collection

{
  "_id" : ObjectId(..),
  "article_id" : ObjectId(..),
  "section" : "schema",
  "date" : ISODate(..),
  "daily" : { "view" : 45, "comments" : 150 },
  "hourly" : {
    "0" : { "view" : 10 },
    "1" : { "view" : 2 },
    …
    "23" : { "view" : 14, "comments" : 10 }
  }
}

> db.interactions.find(
    { "article_id" : ObjectId("…..") },
    { _id : 0, hourly : 1 }
  )
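A pre-aggregated document like this one is typically kept up to date with one increment per interaction. The sketch below only builds the filter and update documents in plain Python; it assumes the `daily` / `hourly` field layout used in the query above, and the function name `interaction_update` and the placeholder id `"article-1"` are hypothetical.

```python
from datetime import datetime


def interaction_update(article_id, when, kind="view"):
    """Build the (filter, update) pair that records one interaction."""
    # One document per article per day: truncate the timestamp to midnight.
    day = datetime(when.year, when.month, when.day)
    query = {"article_id": article_id, "date": day}
    update = {
        "$inc": {
            "daily.%s" % kind: 1,
            "hourly.%d.%s" % (when.hour, kind): 1,
        }
    }
    return query, update


# "article-1" stands in for a real ObjectId.
query, update = interaction_update("article-1", datetime(2014, 2, 20, 14, 5))
print(query)
print(update)
```

With PyMongo, the pair could then be applied as an upsert, for example `db.interactions.update_one(query, update, upsert=True)`, so that the daily document is created on the first interaction.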

Use the query result to display the report directly in the application

– Create a new REST API

– D3.js library or similar in the UI

Pre-Aggregated Reports

{
  "hourly" : {
    "0" : { "view" : 1 },
    "1" : { "view" : 1 },
    ……
    "22" : { "view" : 5 },
    "23" : { "view" : 3 }
  }
}
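Before handing the result to a charting library, the nested `hourly` sub-document usually has to be flattened into an ordered series. A minimal sketch, assuming the document shape shown above; the helper name `hourly_series` is hypothetical, and the sample contains only the hours visible on the slide (the elided hours are simply absent and are reported as 0).

```python
def hourly_series(doc, kind="view"):
    """Return 24 counts, one per hour (0..23); missing hours count as 0."""
    hourly = doc.get("hourly", {})
    return [hourly.get(str(hour), {}).get(kind, 0) for hour in range(24)]


# Only the hours shown on the slide; the elided ones are not filled in.
doc = {
    "hourly": {
        "0": {"view": 1},
        "1": {"view": 1},
        "22": {"view": 5},
        "23": {"view": 3},
    }
}

series = hourly_series(doc)
print(series)
```

The resulting list can be serialised to JSON by the REST API and bound to a bar or line chart in the UI.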

Map Reduce

Map Reduce

– MongoDB: JavaScript

– Incremental Map Reduce

Map Reduce

// Map Reduce example
> db.articles.mapReduce(
    function() { emit(this.author, this.comment_count); },
    function(key, values) { return Array.sum(values); },
    {
      query : {},
      out : { merge : "comment_count" }
    }
  )

Output

{ "_id" : "Dan Roberts", "value" : 6 }

{ "_id" : "Jim Duffy", "value" : 1 }

{ "_id" : "Kunal Taneja", "value" : 2 }

{ "_id" : "Paul Done", "value" : 2 }
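The map and reduce steps above can be followed in plain Python. This is a sketch of the idea only, not of MongoDB's implementation: the `map_reduce` helper is hypothetical, and the article data is invented so that the totals match the output shown above.

```python
from collections import defaultdict


def map_reduce(docs, map_fn, reduce_fn):
    """Group the emitted (key, value) pairs by key, then reduce each group."""
    groups = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            groups[key].append(value)
    # MongoDB skips the reduce call for keys with a single value;
    # calling it here gives the same result for a sum.
    return {key: reduce_fn(key, values) for key, values in groups.items()}


def map_article(doc):
    # Equivalent of emit(this.author, this.comment_count)
    yield doc["author"], doc["comment_count"]


def reduce_counts(key, values):
    # Equivalent of Array.sum(values)
    return sum(values)


# Hypothetical articles, chosen to reproduce the totals on the slide.
articles = [
    {"author": "Dan Roberts", "comment_count": 4},
    {"author": "Dan Roberts", "comment_count": 2},
    {"author": "Jim Duffy", "comment_count": 1},
    {"author": "Kunal Taneja", "comment_count": 2},
    {"author": "Paul Done", "comment_count": 2},
]

totals = map_reduce(articles, map_article, reduce_counts)
print(totals)
```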

MongoDB – Hadoop Connector

Hadoop Integration

[Diagram: four replica sets (each with one Primary and two Secondaries), HDFS and MapReduce nodes, MongoS routers, Applications, and Dashboards / Reporting. 1) Data flow: input / output via the application tier.]

Aggregation Framework

Multi-Stage Pipeline

– Like a Unix pipe

• "ps -ef | grep mongod"

– Aggregates the data

– Transforms the documents

– Implemented in the core server
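The Unix pipe analogy can be made concrete in a few lines of Python: each stage consumes a stream of documents and produces a new stream, and the pipeline simply chains them. This is a sketch of the concept only, not of how the server executes a pipeline; the stage helpers `match`, `project`, `limit`, `run_pipeline` and the sample titles are hypothetical.

```python
from functools import reduce
from itertools import islice


def match(predicate):
    """Stage that keeps only the documents for which predicate(doc) is true."""
    return lambda docs: (d for d in docs if predicate(d))


def project(*fields):
    """Stage that keeps only the named fields of each document."""
    return lambda docs: ({f: d[f] for f in fields if f in d} for d in docs)


def limit(n):
    """Stage that passes through at most n documents."""
    return lambda docs: islice(docs, n)


def run_pipeline(docs, stages):
    """Feed the output of each stage into the next one, like a shell pipe."""
    return list(reduce(lambda stream, stage: stage(stream), stages, iter(docs)))


# Hypothetical documents for illustration.
articles = [
    {"author": "Dan Roberts", "title": "Schema design"},
    {"author": "Jim Duffy", "title": "Indexing"},
    {"author": "Dan Roberts", "title": "Reporting"},
]

result = run_pipeline(
    articles,
    [match(lambda d: d["author"] == "Dan Roberts"), project("title"), limit(1)],
)
print(result)
```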

Aggregation Framework

// Find out which are the most popular tags…
db.articles.aggregate([
  { $unwind : "$tags" },
  { $group : { _id : "$tags", number : { $sum : 1 } } },
  { $sort : { number : -1 } }
])

Output

{ "_id" : "mongodb", "number" : 6 }

{ "_id" : "nosql", "number" : 3 }

{ "_id" : "database", "number" : 1 }

{ "_id" : "aggregation", "number" : 1 }

{ "_id" : "node", "number" : 1 }
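To see what each stage of this pipeline contributes, the same computation can be written in plain Python. A sketch under stated assumptions: the function name `count_tags` and the sample articles are hypothetical, and the order of tags with equal counts is not meaningful.

```python
from collections import Counter


def count_tags(articles):
    """Plain-Python equivalent of $unwind + $group/$sum + $sort."""
    # $unwind: one entry per (article, tag); articles without tags vanish.
    # $group with $sum: 1: count how often each tag occurs.
    counts = Counter(tag for a in articles for tag in a.get("tags", []))
    # $sort: highest count first.
    return sorted(
        ({"_id": tag, "number": n} for tag, n in counts.items()),
        key=lambda d: d["number"],
        reverse=True,
    )


# Hypothetical articles for illustration.
articles = [
    {"title": "a", "tags": ["mongodb", "nosql"]},
    {"title": "b", "tags": ["mongodb", "aggregation"]},
    {"title": "c", "tags": ["mongodb", "nosql"]},
    {"title": "d"},
]

tag_report = count_tags(articles)
print(tag_report)
```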

In Our mycms Application

# Our new Python example (assumes `app` and `db` are defined elsewhere in mycms)
import json
from bson import json_util
from flask import abort, jsonify

@app.route('/cms/api/v1.0/tag_counts', methods=['GET'])
def tag_counts():
    pipeline = [
        {"$unwind": "$tags"},
        {"$group": {"_id": "$tags", "number": {"$sum": 1}}},
        {"$sort": {"number": -1}},
    ]
    cur = db['articles'].aggregate(pipeline, cursor={})
    # Check everything is ok
    if not cur:
        abort(400)
    # Iterate the cursor and collect the documents in a list
    tags = [tag for tag in cur]
    return jsonify({'tags': json.dumps(tags, default=json_util.default)})
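Note that the endpoint serialises the list with `json.dumps` and then wraps that string in the `jsonify` response, so a client has to decode twice. The sketch below simulates the response body in plain Python rather than calling the real service; the helper name `decode_tag_counts` is hypothetical, and the two sample tags are taken from the output shown earlier.

```python
import json

# Simulate what the endpoint sends: the list is serialised to a string
# first, and that string is then embedded in the outer JSON object.
tags = [{"_id": "mongodb", "number": 6}, {"_id": "nosql", "number": 3}]
body = json.dumps({"tags": json.dumps(tags)})


def decode_tag_counts(body):
    """Decode the outer JSON object, then the JSON string stored in 'tags'."""
    payload = json.loads(body)
    return json.loads(payload["tags"])


decoded = decode_tag_counts(body)
print(decoded)
```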

Pipeline and Expression operators

Aggregation Operators

Pipeline: $match, $sort, $limit, $skip, $project, $unwind, $group, $geoNear, $text, $search

Expression: $addToSet, $first, $last, $max, $min, $avg, $push, $sum

Arithmetic: $add, $divide, $mod, $multiply, $subtract

Conditional: $cond, $ifNull

Variables: $let, $map

Tip: Other operators for date, time, boolean and string manipulation
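As an illustration of how these operators combine, here is a pipeline written as a Python list of dicts, the form PyMongo's `aggregate` accepts. It is only a sketch: the threshold of 5, the labels "hot" / "normal" and the output field names are hypothetical, and the block builds the pipeline without sending it to a server.

```python
pipeline = [
    {
        "$project": {
            "author": 1,
            # $ifNull: fall back to 0 when comment_count is missing.
            "comments": {"$ifNull": ["$comment_count", 0]},
            # $cond: label the article according to a hypothetical threshold.
            "popularity": {
                "$cond": [{"$gt": ["$comment_count", 5]}, "hot", "normal"]
            },
        }
    },
    {
        "$group": {
            "_id": "$author",
            "max_comments": {"$max": "$comments"},
            "labels": {"$addToSet": "$popularity"},
        }
    },
]

# List the stage names in order.
stage_names = [next(iter(stage)) for stage in pipeline]
print(stage_names)
```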

Reports in the Application

Which reports and analyses do we need in our application?

– Most popular tags

– Most popular articles

– Most popular locations (geospatial integration)

– Average views per hour and per day

Popular Tags

• "Unwind" each 'tags' array

• Group and count them, then sort them

• Write the result to a new collection

– Query the new collection, so you do not have to recompute the result every time

db.articles.aggregate([
  { $unwind : "$tags" },
  { $group : { _id : "$tags", number : { $sum : 1 } } },
  { $sort : { number : -1 } },
  { $out : "tags" }
])

Popular Articles

• The top 5 articles by average views

– Use the $avg operator

– Use $match to restrict the data read

• Combine it with the $gt and $lt operators

db.interactions.aggregate([
  { $match : { date : { $gt : ISODate("2014-02-20T00:00:00.000Z") } } },
  { $group : { _id : "$article_id", a : { $avg : "$daily.view" } } },
  { $sort : { a : -1 } },
  { $limit : 5 }
]);
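The four stages of this pipeline map onto four short steps in plain Python, which can help when checking the result by hand. A minimal sketch, assuming the `date`, `article_id` and `daily.view` fields used in the pipeline; the function name `top_articles` and the sample interactions are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def top_articles(interactions, since, n=5):
    """Plain-Python equivalent of $match, $group/$avg, $sort and $limit."""
    views = defaultdict(list)
    for doc in interactions:
        if doc["date"] > since:  # $match with $gt
            views[doc["article_id"]].append(doc["daily"]["view"])
    # $group with $avg
    averages = [{"_id": key, "a": mean(vals)} for key, vals in views.items()]
    averages.sort(key=lambda d: d["a"], reverse=True)  # $sort
    return averages[:n]  # $limit


# Hypothetical daily documents for two articles.
interactions = [
    {"article_id": "a1", "date": datetime(2014, 2, 21), "daily": {"view": 10}},
    {"article_id": "a1", "date": datetime(2014, 2, 22), "daily": {"view": 20}},
    {"article_id": "a2", "date": datetime(2014, 2, 21), "daily": {"view": 40}},
    {"article_id": "a2", "date": datetime(2014, 2, 19), "daily": {"view": 1}},
]

top = top_articles(interactions, since=datetime(2014, 2, 20))
print(top)
```

The document dated 19 February is filtered out before the average is taken, which is why restricting the input early also changes the result and not only the cost.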

Aggregation Framework Explain

• Use explain to make sure the pipeline makes efficient use of indexes

db.interactions.aggregate([
    { $group : { _id : "$article_id", a : { $avg : "$daily.view" } } },
    { $sort : { a : -1 } },
    { $limit : 5 }
  ],
  { explain : true }
);

Explain output…

{
  "stages" : [
    {
      "$cursor" : {
        "query" : { … },
        "fields" : { … },
        "plan" : {
          "cursor" : "BasicCursor",
          "isMultiKey" : false,
          "scanAndOrder" : false,
          "allPlans" : [
            {
              "cursor" : "BasicCursor",
              "isMultiKey" : false,
              "scanAndOrder" : false
            }
          ]
        }
      }
    },
    …
  ],
  "ok" : 1
}
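In this explain format, a "BasicCursor" plan means the stage reads the whole collection without an index. The check can be automated with a small helper; the sketch below assumes the explain output has been loaded into a Python dict shaped like the one above, and the function name `full_scan_stages` is hypothetical.

```python
def full_scan_stages(explain):
    """Return the positions of $cursor stages whose plan is a BasicCursor."""
    hits = []
    for position, stage in enumerate(explain.get("stages", [])):
        plan = stage.get("$cursor", {}).get("plan", {})
        if plan.get("cursor") == "BasicCursor":
            hits.append(position)
    return hits


# Sample mirroring the slide; the elided parts are left out.
explain = {
    "stages": [
        {
            "$cursor": {
                "plan": {
                    "cursor": "BasicCursor",
                    "isMultiKey": False,
                    "scanAndOrder": False,
                }
            }
        },
        {"$group": {}},
    ],
    "ok": 1,
}

scans = full_scan_stages(explain)
print(scans)
```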

Geospatial & Text Search Aggregation

Text Search

• The $text operator with the aggregation framework

– All articles containing the word "MongoDB"

– Grouped by author, sorted by number of comments

db.articles.aggregate([
  { $match : { $text : { $search : "mongodb" } } },
  { $group : { _id : "$author", comments : { $sum : "$comment_count" } } },
  { $sort : { comments : -1 } }
])

Using Geospatial

• The $geoNear stage with the aggregation framework

– Use $geoNear as the first stage of the pipeline

– Group by author and count the articles

db.articles.aggregate([
  { $geoNear : {
      near : { type : "Point", coordinates : [ -0.128, 51.507 ] },
      distanceField : "distance",   // output field name chosen here
      maxDistance : 5000,           // metres
      spherical : true } },
  { $group : { _id : "$author", articleCount : { $sum : 1 } } }
])
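To get a feel for what a 5000 metre radius around that point covers, the distance test can be approximated in plain Python with the haversine formula. This is a sketch only: it uses a mean Earth radius, so results may differ slightly from the server's own calculation, and the helper names and the two test locations are hypothetical. Coordinates follow the GeoJSON order, longitude first.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6371000  # mean Earth radius in metres (approximation)


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two (lon, lat) points."""
    lon1, lat1, lon2, lat2 = map(radians, (lon1, lat1, lon2, lat2))
    a = (
        sin((lat2 - lat1) / 2) ** 2
        + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def within(point, centre, max_m=5000):
    """True if point lies within max_m metres of centre."""
    return haversine_m(point[0], point[1], centre[0], centre[1]) <= max_m


centre = (-0.128, 51.507)   # the point used in the pipeline above
nearby = (-0.128, 51.516)   # hypothetical location, about 1 km north
paris = (2.3522, 48.8566)   # hypothetical location, far outside the radius

print(within(nearby, centre), within(paris, centre))
```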

Summary

• To aggregate data:

– Map Reduce

– Hadoop

– Pre-aggregated reports

– Aggregation Framework

• Tune with the explain plan

• Compute on the fly or compute and store

• Geospatial

• Text Search

Next Session: May 20

– Operating your application

– Scalability

– High availability

– Preparing for production

– Sizing
