Exploring the Aggregation Framework - Runkel 2015 - Slideshare

Transcript of Exploring the Aggregation Framework - Runkel 2015 - Slideshare

  • 7/25/2019 Exploring the Aggregation Framework - Runkel 2015 - Slideshare

    1/51

    Exploring the Aggregation Framework

    Jay Runkel, Solutions [email protected], @jayrunkel
  • 2/51

    Agenda

    1. Analytics in MongoDB?

    2. Aggregation Framework

    3. Aggregation Framework in Action: US Census Data

    4. Aggregation Framework Options

  • 3/51

    Analytics in MongoDB?

    Create, Read, Update, Delete

    Analytics?

    Group, Count, Derive Values, Filter, Average, Sort

  • 4/51

    For Example: US Census Data

    • Census data from 1990, 2000, 2010

    • Question: Which US Division has the fastest growing population density?
      - We only want to include data for states with more than 1M people
      - We only want to include divisions larger than 100K square miles
      - Division = a group of US states
      - Population density = # of people / area of division
      - Data is provided at the state level

  • 5/51

    US Regions and Divisions

  • 6/51

    How would we solve this in SQL?

    • SELECT, GROUP BY, HAVING

  • 7/51

    What About MongoDB?

  • 8/51

    Aggregation Framework

  • 9/51

    What is an Aggregation Pipeline?

    • A Series of Document Transformations
      - Executed in stages
      - Original input is a collection
      - Output as a cursor or a collection

    • Rich Library of Functions
      - Filter, compute, group, and summarize data
      - Output of one stage sent to input of next
      - Operations executed in sequential order
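The "output of one stage sent to input of next" idea can be sketched in plain JavaScript, outside MongoDB. This is only an in-memory illustration: the helper names (`runPipeline`, `matchWest`, `sortByAreaDesc`) and the sample documents are hypothetical and are not part of the aggregation framework.

```javascript
// Hypothetical sample documents; not the real census collection.
const states = [
  { state: "New York", area: 218, region: "North East" },
  { state: "Oregon", area: 245, region: "West" },
  { state: "California", area: 300, region: "West" },
];

// Each "stage" takes an array of documents and returns a new array.
const matchWest = (docs) => docs.filter((d) => d.region === "West");
const sortByAreaDesc = (docs) => [...docs].sort((a, b) => b.area - a.area);

// Stages run in sequential order; the output of one is the input of the next.
const runPipeline = (docs, stages) =>
  stages.reduce((current, stage) => stage(current), docs);

const result = runPipeline(states, [matchWest, sortByAreaDesc]);
console.log(result.map((d) => d.state));
```

In MongoDB itself the stages are declarative documents such as `$match` and `$sort` rather than functions, but the data flow is the same.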

  • 10/51

    Aggregation Pipeline

  • 11/51

    Pipeline Operators

    $match          Filter documents
    $project        Reshape documents
    $group          Summarize documents
    $unwind         Expand documents
    $sort           Order documents
    $limit/$skip    Paginate documents
    $redact         Restrict documents
    $geoNear        Proximity sort documents
    $let, $map      Define variables

  • 12/51

    Aggregation Framework in Action (let's play with the census data)

  • 13/51

    MongoDB State Collection

    • Document For Each State

    • Name

    • Region

    • Division

    • Census Data For 1990, 2000, 2010
      - Population
      - Housing Units
      - Occupied Housing Units

    • Census Data is an array with three subdocuments

  • 14/51

    Document Model

    { "_id" : ObjectId("54e23c7b28099359f5661525"),
      "name" : "California",
      "region" : "West",
      "data" : [
        {"totalPop" : 33871648,
         "totalHouse" : 12214549,
         "occHouse" : 11502870,
         "year" : 2000},
        {"totalPop" : 37253956,
         "totalHouse" : 13680081,
         "occHouse" : 12577498,
         "year" : 2010},
        {"totalPop" : 29760021,
         "totalHouse" : 11182882,
         "occHouse" : 29008161,
         "year" : 1990}
      ],
      ...
    }

  • 15/51

    Count, Distinct

  • 16/51

    Total US Area

    db.cData.aggregate([
      {"$group" : {"_id" : null,
                   "totalArea" : {$sum : "$area"},
                   "avgArea" : {$avg : "$area"}}}])

  • 17/51

    $group

    • Group documents by value
      - Field reference, object, constant
      - Other output fields are computed
        $max, $min, $avg, $sum
        $addToSet, $push
        $first, $last
      - Processes all data in memory by default

  • 18/51

    Area By Region

    db.cData.aggregate([
      {"$group" : {"_id" : "$region",
                   "totalArea" : {$sum : "$area"},
                   "avgArea" : {$avg : "$area"},
                   "numStates" : {$sum : 1},
                   "states" : {$push : "$name"}}}])

  • 19/51

    Calculating Average State Area By Region

    Input documents:
      { state: "New York", area: 218, region: "North East" }
      { state: "New Jersey", area: 90, region: "North East" }
      { state: "California", area: 300, region: "West" }

    Stage:
      { $group: { _id: "$region", avgArea: { $avg: "$area" } } }

    Output documents:
      { _id: "North East", avgArea: 154 }
      { _id: "West", avgArea: 300 }

  • 20/51

    Calculating Total Area and State Count

    Input documents:
      { state: "New York", area: 218, region: "North East" }
      { state: "New Jersey", area: 90, region: "North East" }
      { state: "California", area: 300, region: "West" }

    Stage:
      { $group: { _id: "$region",
                  totArea: { $sum: "$area" },
                  sCount: { $sum: 1 } } }

    Output documents:
      { _id: "North East", totArea: 308, sCount: 2 }
      { _id: "West", totArea: 300, sCount: 1 }
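To make the arithmetic behind the two `$group` slides concrete, here is a minimal in-memory sketch in plain JavaScript using the three sample documents from the slides. The `groupByRegion` helper is hypothetical; it only mimics what `$sum: "$area"`, `$sum: 1`, and `$avg: "$area"` compute for each `_id`, and is not how MongoDB implements `$group`.

```javascript
// Sample input documents taken from the slides.
const states = [
  { state: "New York", area: 218, region: "North East" },
  { state: "New Jersey", area: 90, region: "North East" },
  { state: "California", area: 300, region: "West" },
];

// Hypothetical helper: group by region, accumulating a sum and a count,
// then derive the average from the two.
function groupByRegion(docs) {
  const groups = new Map();
  for (const doc of docs) {
    const g = groups.get(doc.region) || { _id: doc.region, totArea: 0, sCount: 0 };
    g.totArea += doc.area; // like {$sum: "$area"}
    g.sCount += 1;         // like {$sum: 1}
    groups.set(doc.region, g);
  }
  // like {$avg: "$area"}
  return [...groups.values()].map((g) => ({ ...g, avgArea: g.totArea / g.sCount }));
}

const grouped = groupByRegion(states);
console.log(grouped);
```

The North East group ends up with a total area of 308 across 2 states (average 154), and the West group with 300 across 1 state, matching the output documents on the slides.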

  • 21/51

    Total US Population By Year

    db.cData.aggregate(
      [{$un...

  • 22/51

    $unwind

    • Operate on an array field
      - Create documents from array elements

    • Array replaced by element value

    • Missing/empty fields: no output

    • Non-array fields: error

    Pipe to $group to aggregate
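The pipelines that use `$unwind` are cut off in this transcript. As a hedged sketch only, and not necessarily the pipeline shown on the original slides, unwinding the census array and piping the result to `$group` might look like the following. The field names `data`, `data.year`, and `data.totalPop` are assumptions based on the document model slide, and the variable name `totalPopByYear` is hypothetical.

```javascript
// A sketch under stated assumptions, not the original slide's pipeline.
// Assumed fields: data[].year and data[].totalPop.
const totalPopByYear = [
  // One output document per state per census year.
  { $unwind: "$data" },
  // Group the unwound documents by census year and add up the populations.
  { $group: { _id: "$data.year", totalPop: { $sum: "$data.totalPop" } } },
  // Oldest census first.
  { $sort: { _id: 1 } },
];

// In the mongo shell this would be passed as: db.cData.aggregate(totalPopByYear)
console.log(JSON.stringify(totalPopByYear, null, 2));
```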

  • 23/51

    $unwind

    Input documents:
      { state: "New York", census: [1990, 2000, 2010] }
      { state: "New Jersey", census: [1990, 2000] }
      { state: "California", census: [1980, 1990, 2000, 2010] }
      { state: "Dela...

    Stage:
      { $un...

    Output documents:
      ... census: 1990 }
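Because the diagram above is truncated, here is a small in-memory sketch in plain JavaScript of what unwinding the `census` array does to documents like these. The `unwindCensus` helper and the last two sample documents are hypothetical; the sketch covers arrays and missing or empty fields as described on the `$unwind` slide, and does not model the non-array case.

```javascript
const states = [
  { state: "New York", census: [1990, 2000, 2010] },
  { state: "New Jersey", census: [1990, 2000] },
  { state: "No Census Field" },            // hypothetical: missing field, no output
  { state: "Empty Census", census: [] },   // hypothetical: empty array, no output
];

// Hypothetical helper: emit one copy of the document per array element,
// with the array replaced by that element's value.
const unwindCensus = (docs) =>
  docs.flatMap((doc) =>
    (doc.census || []).map((year) => ({ ...doc, census: year }))
  );

const unwound = unwindCensus(states);
console.log(unwound);
```

New York contributes three documents and New Jersey two, so five documents come out; the two documents without census values contribute none.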

  • 24/51

    Southern State Population By Year

    db.cData.aggregate(
      [{$match : {"region" : "South"}},
       {$un...

  • 25/51

    $match

    • Filter documents
      - Uses existing query syntax
      - No $where (server side Javascript)

  • 26/51

    $match

    Input documents:
      { state: "New York", area: 218, region: "North East" }
      { state: "Oregon", area: 245, region: "West" }
      { state: "California", area: 300, region: "West" }

    Stage:
      { $match: { "region" : "West" } }

    Output documents:
      { state: "Oregon", area: 245, region: "West" }
      { state: "California", area: 300, region: "West" }

  • 27/51

    Population Delta By State from 1990 to 2010

    db.cData.aggregate(
      [{$un...

  • 28/51

    Population Delta By State from 1990 to 2010

    db.cData.aggregate(
      [{$un...

  • 29/51

    $sort, $limit, $skip

    • Sort documents by one or more fields
      - Same order syntax as cursors
      - Waits for earlier pipeline operator to return
      - In-memory unless early and indexed

    • Limit and skip follow cursor behavior

  • 30/51

    Population Delta By State from 1990 to 2010

    db.cData.aggregate(
      [{$un...

  • 31/51

    $first, $last

    • Collection operations like $push and $addToSet

    • Must be used in $group

    • $first and $last determined by document order

    • Typically used with $sort to ensure ordering is known
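The population-delta pipelines are truncated in this transcript, so the following is only a sketch of how `$sort` followed by `$first` and `$last` in a `$group` could produce a 1990 to 2010 delta per state. It assumes the `name`, `data.year`, and `data.totalPop` fields from the document model and that every state has census entries for exactly 1990, 2000, and 2010; the names `popDeltaByState`, `pop1990`, and `pop2010` are hypothetical.

```javascript
// A sketch under stated assumptions, not the original slide's pipeline.
const popDeltaByState = [
  // One document per state per census year.
  { $unwind: "$data" },
  // Sort by year so that document order within each group is known.
  { $sort: { "data.year": 1 } },
  // After the sort, $first sees the earliest year and $last the latest.
  { $group: {
      _id: "$name",
      pop1990: { $first: "$data.totalPop" },
      pop2010: { $last: "$data.totalPop" },
  } },
  // Rename _id to name and compute the difference.
  { $project: {
      _id: 0,
      name: "$_id",
      delta: { $subtract: ["$pop2010", "$pop1990"] },
  } },
];

// In the mongo shell this would be passed as: db.cData.aggregate(popDeltaByState)
console.log(JSON.stringify(popDeltaByState, null, 2));
```

Without the `$sort` stage, `$first` and `$last` would simply reflect whatever order the documents happened to arrive in.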

  • 32/51

    Population Delta By State from 1990 to 2010

    db.cData.aggregate(
      [{$un...

  • 33/51

    $project

    • Reshape Documents
      - Include, exclude, or rename fields

  • 34/51

    Including and Excluding Fields

    Input documents:
      { "_id" : "Virginia", "1990" : 453588, "2010" : 3725789 }
      { "_id" : "South Dakota", "1990" : 453588, "2010" : 3725789 }

    Stage:
      { $project: { "_id" : 0, "1990" : 1, "2010" : 1 } }

    Output documents:
      { "1990" : 453588, "2010" : 3725789 }
      { "1990" : 453588, "2010" : 3725789 }

  • 35/51

    Renaming and Computing Fields

    Input documents:
      { "_id" : "Virginia", "1990" : 6187358, "2010" : 8001024 }
      { "_id" : "South Dakota", "1990" : 696004, "2010" : 814180 }

    Stage:
      { $project: { "_id" : 0,
                    "name" : "$_id",
                    "delta" : { "$subtract" : ["$2010", "$1990"] } } }

    Output documents:
      { "name" : "Virginia", "delta" : 1813666 }
      { "name" : "South Dakota", "delta" : 118176 }
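The rename-and-compute step can be checked by hand with a minimal in-memory sketch in plain JavaScript, using the two input documents from the slide. The `projectDelta` helper is hypothetical and only mimics the effect of the `$project` stage on these documents.

```javascript
// Input documents taken from the slide.
const docs = [
  { _id: "Virginia", "1990": 6187358, "2010": 8001024 },
  { _id: "South Dakota", "1990": 696004, "2010": 814180 },
];

// Hypothetical helper: drop _id, expose it as "name", and compute
// delta = 2010 population minus 1990 population.
const projectDelta = (input) =>
  input.map((doc) => ({ name: doc._id, delta: doc["2010"] - doc["1990"] }));

const projected = projectDelta(docs);
console.log(projected);
```

Only the fields named in the projection survive: the original `1990` and `2010` fields are gone because they were never listed, not because they were explicitly excluded.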

  • 36/51

    Compare number of people living within 500KM of Memphis, TN in 1990, 2000, 2010

  • 37/51

    Compare number of people living within 500KM of Memphis, TN in 1990, 2000, 2010

    db.cData.aggregate([
      {$geoNear : {
         "near" : {"type" : "Point", "coordinates" : [-90, 35]},
         "distanceField" : "dist.calculated",
         "maxDistance" : 500000,
         "includeLocs" : "dist.location",
         "spherical" : true }},
      {$un...

  • 38/51

    $geoNear

    • Order/Filter Documents by Location
      - Requires a geospatial index
      - Output includes physical distance
      - Must be first aggregation stage
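As a hedged sketch of the requirements listed above, the following shows an index specification and a `$geoNear` stage placed first in a pipeline. It assumes the state documents store a GeoJSON point in a `center` field, as the sample documents in this deck suggest; the variable names `centerIndex` and `nearMemphis` are hypothetical.

```javascript
// Assumed: each state document has a GeoJSON point in "center".
// In the mongo shell the index would be created with:
//   db.cData.createIndex(centerIndex)
const centerIndex = { center: "2dsphere" };

const nearMemphis = [
  // $geoNear has to be the first stage of the pipeline.
  { $geoNear: {
      // GeoJSON order is [longitude, latitude]; roughly Memphis, TN.
      near: { type: "Point", coordinates: [-90, 35] },
      // Field in each output document that receives the computed distance.
      distanceField: "dist.calculated",
      // With a GeoJSON point the distance is in meters, so this is 500 km.
      maxDistance: 500000,
      spherical: true,
  } },
];

// In the mongo shell this would be passed as: db.cData.aggregate(nearMemphis)
console.log(JSON.stringify(nearMemphis, null, 2));
```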

  • 39/51

    $geoNear

    Input documents:
      { "_id" : "Tennessee",
        "1990" : 4877185,
        "2010" : 6346105,
        "center" : { "type" : "Point", "coordinates" : [-86.6, 37.8] } }
      { "_id" : "Virginia",
        "1990" : 6187358,
        "2010" : 8001024,
        "center" : { "type" : "Point", "coordinates" : [-78.6, 37.5] } }

    Stage:
      {$geoNear : {
         "near" : {"type" : "Point", "coordinates" : [-90, 35]},
         "maxDistance" : 500000,
         "spherical" : true }}

    Output documents:
      { "_id" : "Tennessee",
        "1990" : 4877185,
        "2010" : 6346105,
        "center" : { "type" : "Point", "coordinates" : [-86.6, 37.8] } }

  • 40/51

    What if I want to save the results to a collection?

    db.cData.aggregate([
      {$geoNear : {
         "near" : {"type" : "Point", "coordinates" : [-90, 35]},
         "distanceField" : "dist.calculated",
         "maxDistance" : 500000,
         "includeLocs" : "dist.location",
         "spherical" : true }},
      {$un...

  • 41/51

    $out

    db.cData.aggregate([...,
      {$out : "resultsCollection"}])

    • Save aggregation results to a new collection

    • New aggregation uses
      - Transform documents, ETL

  • 42/51

    Back To The Original Question

    • Which US Division has the fastest growing population density?
      - We only want to include data for states with more than 1M people
      - We only want to include divisions larger than 100K square miles

  • 43/51

    Division with Fastest Growing Pop Density

    db.cData.aggregate(
      [{$match : {"data.totalPop" : {"$gt" : 1000000}}},
       {$un...

  • 44/51

    Aggregate Options

  • 45/51

    Aggregate options

    db.cData.aggregate([...],
      {"explain" : false,
       "allowDiskUse" : true,
       "cursor" : {"batchSize" : ...}})

    explain: similar to find().explain()

    allowDiskUse: enable use of disk to store intermediate results

    cursor: specify the size of the initial result
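A minimal sketch of passing these options as the second argument to `aggregate`. The pipeline here is just the area-by-region grouping from earlier in the deck, and the batch size of 10 is an arbitrary, hypothetical value; the variable names `pipeline` and `options` are also illustrative.

```javascript
// Any pipeline works here; this one groups area by region.
const pipeline = [
  { $group: { _id: "$region", totalArea: { $sum: "$area" } } },
];

const options = {
  // Set to true to get information about how the pipeline is processed
  // instead of the results.
  explain: false,
  // Allow stages to write intermediate results to disk.
  allowDiskUse: true,
  // Size of the initial batch of results (hypothetical value).
  cursor: { batchSize: 10 },
};

// In the mongo shell this would be run as: db.cData.aggregate(pipeline, options)
console.log(JSON.stringify({ pipeline, options }, null, 2));
```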

  • 46/51

    Aggregation and Sharding

  • 47/51

    Sharding

    • Workload split between shards
      - Shards execute pipeline up to a point
      - Primary shard merges cursors and continues processing*
      - Use explain to analyze pipeline split
      - Early $match may excuse shards
      - Potential CPU and memory implications for primary shard host

    * Prior to v2.6, second stage pipeline processing was done by mongos

  • 48/51

    Summary

  • 49/51

    Analytics in MongoDB?

    Create, Read, Update, Delete

    Analytics?

    Group, Count, Derive Values, Filter, Average, Sort

    YES!

  • 50/51

    Framework Use Cases

    • Basic aggregation queries

    • Ad-hoc reporting

    • Real-time analytics

    • Visualizing and reshaping data

  • 51/51

    Questions?

    jay.runkel@mongodb.com

    @jayrunkel