Big Blog Analysis
Big Blog Analysis: Sharding and Map/Reduce with MongoDB
MongoSP, July 13, 2012
Henrique Dias, Universidade Federal do Rio Grande do Sul
Outline
● Sharding (distribution)
● Map/Reduce (parallelism)
● Examples: TF-IDF, PageRank
Work
● IT analyst at UFRGS
● PHP systems
● Relational databases
Research
● Master's student at UFRGS
● Data mining
● Clusters, parallelism, distributed systems
Blog analysis
Master's project
● Schemaless post data
● Millions of posts
● Distributed data
● Parallel processing
MongoDB fits the bill!
"There is no silver bullet" (Fred Brooks, 1986)
Problem: popular blog authors by topic
● Collecting blogs into MongoDB
● Map/Reduce PageRank
● Tag recommendation with distributed TF-IDF
Sharding
Shard London Bridge
MongoDB processes
● mongod --shardsvr: shard server (MongoShard)
● mongod --configsvr: config server (MongoConfig)
● mongos --configdb: router
[Diagram: three MongoShard processes and one MongoConfig server behind a mongos router, annotated "fault tolerance".]
Sharding
mongod --shardsvr --dbpath /home/mongodb/base --port 27018
mongod --shardsvr --dbpath /home/mongodb/base2 --port 27010
mongod --configsvr --dbpath /home/mongodb/config --port 27019
Sharding
mongos --configdb localhost:27019
mongo
> use admin
> db.runCommand({ addshard: "localhost:27018" });
> db.runCommand({ addshard: "localhost:27010" });
Sharding
> db.runCommand({ enablesharding: "blogdb" });
> // the namespace is "database.collection"; shardKey stands for the chosen field
> db.runCommand({ shardcollection: "blogdb.posts", key: { shardKey: 1 } });
Choosing the shard key
Post fields: blogID, content, publishedDate, tags, postID, comments, authorID, title
Shard key?
Shard key!
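The following is a minimal sketch of what the router does with the shard key. For illustration it assumes that `posts` is sharded on `blogID`, since the slides do not preserve the actual choice. In range-based sharding, each chunk owns a contiguous range of shard-key values, and mongos finds the owning chunk in the config server's metadata. The chunk boundaries, shard names and the helpers `routeByShardKey`/`targetShards` are hypothetical. A real cluster splits and migrates chunks by itself.

```javascript
// Hypothetical chunk table for a collection sharded on blogID.
// Each chunk owns the half-open range [min, max); null means unbounded,
// playing the role of MongoDB's MinKey / MaxKey.
const chunks = [
  { min: null, max: "3", shard: "shard0000" },
  { min: "3", max: "7", shard: "shard0001" },
  { min: "7", max: null, shard: "shard0002" },
];

// Return the shard owning a shard-key value (string comparison on blogID).
function routeByShardKey(key) {
  for (const chunk of chunks) {
    const aboveMin = chunk.min === null || key >= chunk.min;
    const belowMax = chunk.max === null || key < chunk.max;
    if (aboveMin && belowMax) return chunk.shard;
  }
  throw new Error(`no chunk owns key ${key}`);
}

// Queries containing the shard key go to one shard;
// any other query has to be broadcast to every shard.
function targetShards(query) {
  if (query.blogID !== undefined) return [routeByShardKey(query.blogID)];
  return [...new Set(chunks.map((c) => c.shard))];
}

console.log(targetShards({ blogID: "1000004267813776424" })); // [ 'shard0000' ]
console.log(targetShards({ tags: "saúde" })); // every shard
```

The design point: a query that includes the shard key reaches a single shard. A query on another field, such as `tags`, is scattered to all of them.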
Virtual shards
[Diagram: three VMs (8 GB RAM, 4 virtual processors each) hosting six MongoShard processes and one MongoConfig server.]
Dataset
● 758,102 blogs
● 30,635,902 posts
● 21,467,340 comments
● Average object size: 1.8 MB
A post in JSON
{
  "_id" : ObjectId("4e92239ee4b020f5ff0041fb"),
  "authorID" : "10757528238954720127",
  "blogID" : "1000004267813776424",
  "postID" : "4057761886666222842",
  "published" : ISODate("2011-03-24T12:00:00Z"),
  "title" : "Quis autem vel eum",
  "content" : "Lorem ipsum dolor sit amet...",
  "tags" : [ "voluptatem", "accusantium" ],
  "comments" : [
    {
      "commentID" : "77618861000004262228",
      "authorID" : "00627699636039248506",
      "published" : ISODate("2011-03-24T12:12:11.645Z"),
      "content" : "Neque porro quisquam est,..."
    }
  ]
}
Map/Reduce
Map/Reduce
[Diagram: input records are processed by local MAP tasks that emit key,value pairs; the pairs are shuffled/combined and passed to Reduce tasks, which write the output.]
Map/Reduce example
[Diagram: documents about politics, health, cars, cinema, fashion, football, books, films, ...; MAP emits pairs such as (health, 1), (football, 1), (politics, 1); Reduce outputs totals such as health, 143; football, 230; politics, 85; ...]
Map/Reduce in MongoDB: JavaScript functions
> map = function() {
    // emit (word, 1) for every whitespace-separated word of the post
    this.content.split(' ').forEach(function(word) {
      emit(word, 1);
    });
  }
> reduce = function(key, values) {
    // add up all the counts emitted for one word
    var count = 0;
    values.forEach(function(value) { count += value; });
    return count;
  }
> // inline output is returned as a single document, so it must fit the BSON size limit
> db.posts.mapReduce(map, reduce, { out: { inline: 1 } });
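MongoDB runs these functions inside the server, so they cannot be called directly from Node.js. A minimal local emulation of the mapReduce contract is enough to try them on a few sample documents:

- `map` runs with `this` bound to each document and calls a global `emit(key, value)`.
- `reduce(key, values)` folds all values emitted for one key.

The helper `localMapReduce` is hypothetical and only mimics that contract. It is not part of MongoDB or any driver.

```javascript
// Hypothetical local stand-in for MongoDB's mapReduce (not a MongoDB API).
let emitted = null; // key -> array of emitted values, filled while map runs

// Global emit(), as seen by map functions inside MongoDB.
function emit(key, value) {
  if (!emitted.has(key)) emitted.set(key, []);
  emitted.get(key).push(value);
}

function localMapReduce(docs, map, reduce) {
  emitted = new Map();
  docs.forEach((doc) => map.call(doc)); // `this` is the current document
  const out = [];
  emitted.forEach((values, key) => {
    // Like MongoDB, reduce is only called when a key has several values.
    const value = values.length > 1 ? reduce(key, values) : values[0];
    out.push({ _id: key, value: value });
  });
  return out;
}

// The word-count functions from the slide, unchanged.
const map = function () {
  this.content.split(" ").forEach(function (word) {
    emit(word, 1);
  });
};
const reduce = function (key, values) {
  let count = 0;
  values.forEach(function (value) {
    count += value;
  });
  return count;
};

const posts = [
  { content: "mongodb stores blog posts" },
  { content: "blog posts in mongodb" },
];
console.log(localMapReduce(posts, map, reduce));
```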
Map/Reduce in sharded MongoDB
[Diagram: mongos dispatches the job to three MongoShard nodes; each shard runs map and reduce over its own data; the partial results then go through a final reduce at mongos.]
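Each shard reduces its own documents first, and the partial results are then reduced again. As a result, MongoDB may call `reduce` on values that are themselves outputs of `reduce`. The function must therefore:

- return a value of the same type as the emitted values, and
- give the same result however the values are grouped.

Below is a small check using the word-count `reduce` shown earlier. The per-shard emitted values are made-up numbers.

```javascript
// The word-count reduce from the slides: sums the counts for one key.
function reduce(key, values) {
  let count = 0;
  values.forEach(function (value) {
    count += value;
  });
  return count;
}

// Hypothetical values emitted for the key "mongodb" on three shards.
const emittedPerShard = [
  [1, 1, 1], // shard A
  [1, 1], // shard B
  [1, 1, 1, 1], // shard C
];

// Step 1: each shard reduces its own values locally.
const partials = emittedPerShard.map((values) => reduce("mongodb", values));

// Step 2: the partial results are reduced once more.
const finalCount = reduce("mongodb", partials);

// Reducing everything in a single pass must give the same answer.
const singlePass = reduce("mongodb", emittedPerShard.flat());

console.log(partials, finalCount, singlePass); // [ 3, 2, 4 ] 9 9
```

A sum passes this test. A reduce that, for example, returned an average of its inputs would not, because averaging partial averages weights the shards incorrectly.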
Distributed TF-IDF for tag recommendation
Topics
Relevance metric: word × tag × global
TF-IDF
TF-IDF
TF: term frequency
$$\mathrm{tf}(t,d) = \frac{|t|}{|T|}$$
IDF: inverse document frequency
$$\mathrm{idf}(t,D) = \log\frac{|D|}{|\{d \in D : t \in d\}|}$$
$$\mathrm{tfidf}(t,d,D) = \mathrm{tf}(t,d) \times \mathrm{idf}(t,D)$$
where |t| is the number of occurrences of term t in document d, |T| is the total number of terms in d, and D is the set of documents.
TF-IDF: example
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint magna occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
N = 70 (words in this document), D = 100 (documents); logarithms are natural
tf(magna, d) = 2/70, idf(magna, D) = log(100/4)
tf-idf(magna, d, D) ≈ 0.092
tf(in, d) = 3/70, idf(in, D) = log(100/65)
tf-idf(in, d, D) ≈ 0.018
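The arithmetic of the example can be checked with a few lines of JavaScript, the same language as the MongoDB map/reduce functions. The sketch assumes the natural logarithm, which matches the `Math.log` call in the final TF-IDF calculation. With a base-10 logarithm the two scores would be about 0.040 and 0.008 instead.

```javascript
// tf-idf as defined on the slides, with the natural logarithm (Math.log).
function tf(occurrences, wordsInDoc) {
  return occurrences / wordsInDoc;
}
function idf(totalDocs, docsWithTerm) {
  return Math.log(totalDocs / docsWithTerm);
}
function tfidf(occurrences, wordsInDoc, totalDocs, docsWithTerm) {
  return tf(occurrences, wordsInDoc) * idf(totalDocs, docsWithTerm);
}

// Worked example: N = 70 words in the document, D = 100 documents.
const magna = tfidf(2, 70, 100, 4); // "magna": 2 occurrences, in 4 documents
const inWord = tfidf(3, 70, 100, 65); // "in": 3 occurrences, in 65 documents
console.log(magna.toFixed(3), inWord.toFixed(3)); // 0.092 0.018
```

The rare word wins even though it occurs less often in the document. That is the point of the idf factor.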
TF-IDF with Map/Reduce
Values needed (each tag is treated as a document):
● n: occurrences of word p in a tag (Task 1)
● N: number of words in a tag (Task 2)
● d: number of tags in which p appears (Task 3)
● D: total number of tags (Task 2)
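With each tag playing the role of a document, the four values slot into the TF-IDF definitions as follows. Here p is a word and g a tag; these symbols are introduced only for illustration.

$$\mathrm{tfidf}(p, g) = \underbrace{\frac{n}{N}}_{\mathrm{tf}(p,\,g)} \cdot \underbrace{\log\frac{D}{d}}_{\mathrm{idf}(p)}$$

This is the expression `(item.n/item.N)*Math.log(D/item.d)` evaluated in the final calculation step, where `Math.log` is the natural logarithm.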
TF-IDF Task 1
Input: posts { tags, content }
MAP emits: { tag, word }, 1
Reduce outputs: { tag, word }, n (total count of the word within the tag)
TF-IDF Task 1: map
function() {
  var tags = this.tags;
  // emit one ({ tag, word }, 1) pair per word and per tag of the post
  this.content.split(' ').forEach(function(sWord) {
    tags.forEach(function(sTag) {
      emit({ tag: sTag, word: sWord }, 1);
    });
  });
};
TF-IDF Task 1: reduce
function(key, values) {
  var count = 0;
  values.forEach(function(value) {
    count += value;
  });
  return count;
};
TF-IDF Task 1: result
{ _id: { tag: "saúde", word: "doença" }, value: 98 }
{ _id: { tag: "política", word: "leis" }, value: 13 }
{ _id: { tag: "saúde", word: "saúde" }, value: 32 }
{ _id: { tag: "política", word: "crise" }, value: 45 }
{ _id: { tag: "saúde", word: "corpo" }, value: 98 }
{ _id: { tag: "saúde", word: "para" }, value: 34 }
{ _id: { tag: "2012", word: "de" }, value: 65 }
...
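Task 1 emits a compound key `{ tag, word }`, which MongoDB compares by value. A local emulation has to do the same, for instance by serializing the key. Otherwise JavaScript's `Map` would treat every key object as distinct. The sketch below works under that assumption, with two made-up posts. The `emit`/`localMapReduce` harness is hypothetical and not a MongoDB API.

```javascript
// Hypothetical local stand-in for MongoDB mapReduce with compound keys.
let groups = null; // serialized key -> { key, values }

function emit(key, value) {
  const id = JSON.stringify(key); // compare keys by value, as MongoDB does
  if (!groups.has(id)) groups.set(id, { key: key, values: [] });
  groups.get(id).values.push(value);
}

function localMapReduce(docs, map, reduce) {
  groups = new Map();
  docs.forEach((doc) => map.call(doc)); // `this` is the current post
  return [...groups.values()].map(({ key, values }) => ({
    _id: key,
    value: values.length > 1 ? reduce(key, values) : values[0],
  }));
}

// Task 1 map and reduce from the slides.
const mapTask1 = function () {
  const tags = this.tags;
  this.content.split(" ").forEach(function (sWord) {
    tags.forEach(function (sTag) {
      emit({ tag: sTag, word: sWord }, 1);
    });
  });
};
const reduceTask1 = function (key, values) {
  let count = 0;
  values.forEach(function (value) {
    count += value;
  });
  return count;
};

// Made-up sample posts.
const samplePosts = [
  { tags: ["saúde"], content: "corpo e doença" },
  { tags: ["saúde", "2012"], content: "doença" },
];
const task1 = localMapReduce(samplePosts, mapTask1, reduceTask1);
console.log(task1);
```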
TF-IDF Task 2
Input: { tag, word }, n
MAP emits: tag, n
Reduce outputs: tag, N (sum of the counts)
TF-IDF Task 2: result
{ _id: "saúde", value: 670 }
{ _id: "política", value: 830 }
{ _id: "futebol", value: 700 }
{ _id: "2012", value: 1500 }
...
This result is then combined with the previous one:
{ tag: "saúde", word: "doença" } n: 98, N: 670
{ tag: "política", word: "leis" } n: 13, N: 830
{ tag: "saúde", word: "saúde" } n: 32, N: 670
{ tag: "política", word: "crise" } n: 45, N: 830
{ tag: "saúde", word: "corpo" } n: 98, N: 670
{ tag: "saúde", word: "para" } n: 34, N: 670
{ tag: "2012", word: "de" } n: 65, N: 1500
...
TF-IDF Task 3
Input: { tag, word }, n
MAP emits: word, 1
Reduce outputs: word, d (number of documents, i.e. tags, in which the word appears)
TF-IDF Task 3: result
{ _id: "doença", value: 45 }
{ _id: "leis", value: 23 }
{ _id: "saúde", value: 80 }
{ _id: "crise", value: 41 }
{ _id: "corpo", value: 30 }
{ _id: "para", value: 350 }
{ _id: "de", value: 480 }
...
These counts are then combined with the previous results.
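The slides show Tasks 2 and 3 only as diagrams. Below is one possible pair of map functions matching those diagrams, written in the same mongo-shell style. They run on a small hand-made Task 1 output. The function names, sample values and the `emit`/`localMapReduce` harness are hypothetical. The harness only mimics MongoDB's mapReduce contract.

```javascript
// Hypothetical local stand-in for MongoDB mapReduce (string keys only).
let groups = null;
function emit(key, value) {
  if (!groups.has(key)) groups.set(key, []);
  groups.get(key).push(value);
}
function localMapReduce(docs, map, reduce) {
  groups = new Map();
  docs.forEach((doc) => map.call(doc));
  const out = {};
  groups.forEach((values, key) => {
    out[key] = values.length > 1 ? reduce(key, values) : values[0];
  });
  return out;
}

// Sum reducer shared by both tasks.
const sum = function (key, values) {
  let total = 0;
  values.forEach(function (v) {
    total += v;
  });
  return total;
};

// Task 2 map: emit (tag, n) per { tag, word } row; the sum is N per tag.
const mapTask2 = function () {
  emit(this._id.tag, this.value);
};

// Task 3 map: emit (word, 1) per { tag, word } row; the sum is d per word.
const mapTask3 = function () {
  emit(this._id.word, 1);
};

// Made-up Task 1 output: { _id: { tag, word }, value: n }.
const task1 = [
  { _id: { tag: "saúde", word: "doença" }, value: 98 },
  { _id: { tag: "saúde", word: "para" }, value: 34 },
  { _id: { tag: "política", word: "crise" }, value: 45 },
  { _id: { tag: "política", word: "para" }, value: 20 },
];

const N = localMapReduce(task1, mapTask2, sum); // { saúde: 132, política: 65 }
const d = localMapReduce(task1, mapTask3, sum); // { doença: 1, para: 2, crise: 1 }
console.log(N, d);
```

Counting rows rather than occurrences in Task 3 is what makes d "the number of tags containing the word", since Task 1 produces exactly one row per (tag, word) pair.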
TF-IDF calculation
{ tag: "saúde", word: "doença" } n: 98, N: 670, d: 45
{ tag: "política", word: "leis" } n: 13, N: 830, d: 23
{ tag: "saúde", word: "saúde" } n: 32, N: 670, d: 80
{ tag: "política", word: "crise" } n: 45, N: 830, d: 41
{ tag: "saúde", word: "corpo" } n: 98, N: 670, d: 30
...
> D = db.TFIDF_Tarefa2.count();  // total number of tags (one document per tag in Task 2's output)
> db.TagsTFIDF.find().forEach(function(item) {
    item.tfidf = (item.n / item.N) * Math.log(D / item.d);
    db.TagsTFIDF.save(item);  // write the score back; without this the change is lost
  });
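The "combine" steps join the three outputs on their keys: N by tag and d by word. Outside the database, the join can be sketched with two lookup tables.

- The n, N and d values below are copied from the slides.
- D = 1000 is a made-up number of tags. It is chosen larger than the biggest d on the slides (480) because the slides do not state D.

```javascript
// Sample rows from the slides: Task 1 (n), Task 2 (N per tag), Task 3 (d per word).
const task1 = [
  { _id: { tag: "saúde", word: "doença" }, value: 98 },
  { _id: { tag: "saúde", word: "saúde" }, value: 32 },
  { _id: { tag: "política", word: "crise" }, value: 45 },
];
const task2 = [
  { _id: "saúde", value: 670 },
  { _id: "política", value: 830 },
];
const task3 = [
  { _id: "doença", value: 45 },
  { _id: "saúde", value: 80 },
  { _id: "crise", value: 41 },
];
const D = 1000; // hypothetical total number of tags

// Lookup tables for the join.
const NByTag = new Map(task2.map((r) => [r._id, r.value]));
const dByWord = new Map(task3.map((r) => [r._id, r.value]));

// Join and score: tfidf = (n / N) * ln(D / d), as in the mongo shell step.
const scored = task1.map((r) => {
  const n = r.value;
  const N = NByTag.get(r._id.tag);
  const d = dByWord.get(r._id.word);
  return { tag: r._id.tag, word: r._id.word, n, N, d, tfidf: (n / N) * Math.log(D / d) };
});

// Highest-scoring (tag, word) pairs first.
scored.sort((a, b) => b.tfidf - a.tfidf);
console.log(scored.map((s) => `${s.tag}/${s.word}: ${s.tfidf.toFixed(4)}`));
```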
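TF-IDF results
Health (Saúde): saúde (health), água (water), doenças (diseases), pele (skin), corpo (body), sintomas (symptoms), animais (animals), alimentos (foods), crianças (children), células (cells), ...
Politics (Política): deputado (congressman), presidente (president), governo (government), contra (against), Dilma, ministro (minister), Ministério (ministry), Estado (state), política (politics), Câmara (Chamber), ...
Football (Futebol): gols (goals), time (team), futebol (football), equipe (team), gol (goal), jogador (player), Copa (Cup), contra (against), rodada (round), partida (match), ...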
Distributed PageRank
Popularity
PageRank
[Diagram: PageRank propagation over a small graph of authors; values 80 and 9 are split into shares of 40, 40 and 3, 3, 3 along the links, and one author accumulates 43.]
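The propagation in the diagram can be reproduced in a few lines. Each author splits its current PageRank evenly among the authors it links to, and each author's new value is the sum of what it receives. This sketch follows the simplified version on the slides, with no damping factor. The usual formulation adds one, commonly 0.85. The author names A to F and the links are hypothetical, chosen to reproduce the numbers in the diagram.

```javascript
// One step of simplified PageRank (no damping factor), as in the diagram.
function propagate(ranks, links) {
  const next = {};
  for (const node of Object.keys(ranks)) next[node] = 0;
  for (const [node, outLinks] of Object.entries(links)) {
    const share = ranks[node] / outLinks.length; // split evenly over out-links
    for (const target of outLinks) next[target] += share;
  }
  return next;
}

// Hypothetical graph: A (80) links to B and C; D (9) links to C, E and F.
const links = {
  A: ["B", "C"],
  D: ["C", "E", "F"],
};
const ranks = { A: 80, B: 0, C: 0, D: 9, E: 0, F: 0 };
console.log(propagate(ranks, links)); // { A: 0, B: 40, C: 43, D: 0, E: 3, F: 3 }
```

C collects 40 from A and 3 from D, giving the 43 of the diagram. Nodes without out-links keep nothing in the following step, which is one reason full implementations add damping and handle such dangling nodes.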
PageRank with Map/Reduce
Task 1: list of each user ID × the IDs of the authors that user commented on
Task 2: iterations propagating the PageRank values
PageRank Task 1
Input: posts (authorID, comments)
MAP emits: userID, authorID
Reduce outputs: userID, [authorIDs] (list of authors the user commented on)
PageRank Task 1: map (query { tags: "saúde" } restricts the input to posts with that tag)
function() {
  var idAuthor = this.authorID;
  this.comments.forEach(function(comment) {
    // in the post schema the commenter's ID is stored in comment.authorID
    if (comment.authorID != idAuthor) {  // skip authors commenting on their own posts
      emit(comment.authorID, [idAuthor]);
    }
  });
};
PageRank Task 1: reduce
function(key, values) {
  var outL = [];
  values.forEach(function(value) {
    outL = outL.concat(value);
  });
  return outL;
};
PageRank Task 1: result
{ _id: "00627699636039248506", value: [
    "10757528238954720121", "40577618866662228425", "10000042678137764244", ...
] }
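On a small sample, the output of Task 1 can be cross-checked with a plain single-machine loop over the posts. This sketch assumes the post layout shown earlier, where each comment stores its writer in `authorID`. The sample IDs and the helper name `commentGraph` are made up.

```javascript
// Build commenter -> [authors commented on], skipping self-comments:
// the same adjacency lists that PageRank Task 1 computes with Map/Reduce.
function commentGraph(posts) {
  const graph = new Map();
  for (const post of posts) {
    for (const comment of post.comments || []) {
      if (comment.authorID === post.authorID) continue; // ignore self-comments
      if (!graph.has(comment.authorID)) graph.set(comment.authorID, []);
      graph.get(comment.authorID).push(post.authorID);
    }
  }
  return graph;
}

const posts = [
  { authorID: "a1", comments: [{ authorID: "u1" }, { authorID: "a1" }] },
  { authorID: "a2", comments: [{ authorID: "u1" }] },
  { authorID: "a3" }, // a post without comments
];
console.log(commentGraph(posts));
```

Like the slide's reduce, which concatenates lists, this keeps duplicates when a user comments on the same author more than once. Those repeated links therefore carry more weight in Task 2.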
PageRank Task 2
Input: userID, [authorIDs]
MAP emits: authorID, PR/N (N = number of authors in the user's list)
Reduce outputs: authorID, PageRank (each author's PageRank value)
PageRank Task 2: map
function() {
  // share of this user's PageRank sent to each author it commented on
  var prK = this.value.pr / this.value.outL.length;
  this.value.outL.forEach(function(authorID) {
    emit(authorID, { pr: prK, outL: [], prOld: 0 });
  });
  // re-emit the node itself so its out-links and previous PageRank survive the round
  if (this.value.outL.length + prK > 0)
    emit(this._id, { pr: 0, outL: this.value.outL, prOld: this.value.pr });
};
PageRank Task 2: reduce
function(key, values) {
  var result = { pr: 0, outL: [], prOld: 0 };
  values.forEach(function(value) {
    result.pr += value.pr;
    result.outL = result.outL.concat(value.outL);
    result.prOld += value.prOld;
  });
  return result;
};
The job must be rerun until PageRank converges.
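The iteration can be emulated locally to see how the slide's map and reduce behave across rounds. This sketch makes three assumptions that the slides do not show:

- a hypothetical conversion of the Task 1 output into `{ pr, outL, prOld }` values, with an initial PageRank of 1;
- a convergence test on the largest change between `pr` and `prOld`, with a made-up tolerance;
- a made-up three-user graph.

The `emit`/`localMapReduce` harness only mimics MongoDB's mapReduce contract.

```javascript
// Hypothetical local stand-in for MongoDB mapReduce (string keys only).
let groups = null;
function emit(key, value) {
  if (!groups.has(key)) groups.set(key, []);
  groups.get(key).push(value);
}
function localMapReduce(docs, map, reduce) {
  groups = new Map();
  docs.forEach((doc) => map.call(doc));
  const out = [];
  groups.forEach((values, key) => {
    out.push({ _id: key, value: values.length > 1 ? reduce(key, values) : values[0] });
  });
  return out;
}

// PageRank Task 2 map and reduce from the slides.
const mapPR = function () {
  var prK = this.value.pr / this.value.outL.length;
  this.value.outL.forEach(function (authorID) {
    emit(authorID, { pr: prK, outL: [], prOld: 0 });
  });
  if (this.value.outL.length + prK > 0)
    emit(this._id, { pr: 0, outL: this.value.outL, prOld: this.value.pr });
};
const reducePR = function (key, values) {
  var result = { pr: 0, outL: [], prOld: 0 };
  values.forEach(function (value) {
    result.pr += value.pr;
    result.outL = result.outL.concat(value.outL);
    result.prOld += value.prOld;
  });
  return result;
};

// Made-up Task 1 output (user -> authors commented on), converted to the
// { pr, outL, prOld } shape with an assumed initial PageRank of 1.
const task1 = [
  { _id: "u1", value: ["u2", "u3"] },
  { _id: "u2", value: ["u3"] },
  { _id: "u3", value: ["u1"] },
];
let docs = task1.map((t) => ({ _id: t._id, value: { pr: 1, outL: t.value, prOld: 0 } }));

// Driver: rerun the job until no PageRank changes by more than TOLERANCE.
const TOLERANCE = 1e-6; // assumed convergence threshold
let iterations = 0;
let delta = Infinity;
while (delta > TOLERANCE && iterations < 100) {
  docs = localMapReduce(docs, mapPR, reducePR);
  delta = Math.max(...docs.map((doc) => Math.abs(doc.value.pr - doc.value.prOld)));
  iterations += 1;
}
docs.forEach((doc) => console.log(doc._id, doc.value.pr.toFixed(3)));
```

For this graph the values settle near 1.2 (u1), 0.6 (u2) and 1.2 (u3). Because the map emits values with the same `{ pr, outL, prOld }` shape that reduce returns, the functions remain valid when MongoDB re-reduces partial results.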
PageRank results
Health (Saúde)
refugiadstardollcoturnonwelovefrfcoeliesangeloriforbidde
...
Politics (Política)
bigbostmilitarpolibiofator
tribodohempadablogdop
...
Football (Futebol)
medobvalcabragesptec
hugogoesnovoblogbigbotheblogdoma
...
Performance?
Offline process
Execution time
PageRank: 65,650,470 ms (~18 h)
TF-IDF: 108,213,056 ms (~30 h)
Why?
● Shared memory
● Virtual SATA disk
● Single-threaded JavaScript
● JavaScript locks during Map/Reduce
● BSON object size
$olution!
● Cloud, Fibre Channel
● Split into more $hards
● "ticket SERVER-4258 will allow multi-threading"
● Reduce the BSON size
Next steps
● 150 million posts
● Online process with more $hard$
● NMF, K-Means in Map/Reduce
Thank you! Questions?
inf.ufrgs.br/~hdpsantos (source code and data)
Durability?
Repairing the data
● repairDatabase(): earlier versions
● Today: journaling
Backing up the data
● mongoexport: JSON, CSV
● mongodump: BSON
Google Compute Engine
AWS Free Usage Tier
Single-shard test
● EC2 m1.xlarge
● EBS 20GB
Same performance! (for a single shard)
Our infrastructure?
One Dell server
● 4 × Xeon quad-core
● 24 GB RAM
● 2 × 1 TB SATA
Data collection
[Diagram: six MongoShard processes and one MongoConfig server, fed through 7 mongos routers by 70 collectors.]