BI in Clouds

33

Transcript of BI in Clouds

Page 1: BI in Clouds

BI â "îáëàêàõ"

Êóäðÿâöåâ Þðèé, [email protected]

24 àïðåëÿ 2009

Êóäðÿâöåâ Þðèé, [email protected] (ÂÌèÊ ÌÃÓ)BI â "îáëàêàõ" 24 àïðåëÿ 2009 1 / 33

Page 2: BI in Clouds

Ãîä íàçàä "îáëàêà" áûëè åùå íå ñòîëü ðàñêðó÷åííûì òåðìèíîì

Áûëî ìíîãî "ñèëüíûõ" óòâåðæäåíèé î òîì, êàê èñïîëüçîâàíèåýòîé òåõíîëîãèè èçìåíèò BI

Èíòåðåñíû êàê èñõîäíûå òåíäåíöèè, òàê è òî, ÷òî ïðîèçîøëîçà ãîä

Êóäðÿâöåâ Þðèé, [email protected] (ÂÌèÊ ÌÃÓ)BI â "îáëàêàõ" 24 àïðåëÿ 2009 2 / 33

Page 3: BI in Clouds

Êëþ÷åâûå ìîìåíòû

1 BI êàê óñëóãà, ïðèìåð Panorama + Google

2 Èñïîëüçîâàíèå "îáëàêà" êàê ïëàòôîðìû äëÿ MPP Õðàíèëèù

3 Map/Reduce è (èëè) MPP ÕÄ?

4 Èñïîëüçîâàíèå MapReduce äëÿ ðàñ÷åòà OLAP-êóáîâ

Êóäðÿâöåâ Þðèé, [email protected] (ÂÌèÊ ÌÃÓ)BI â "îáëàêàõ" 24 àïðåëÿ 2009 3 / 33

Page 4: BI in Clouds

BI êàê óñëóãà, ïðèìåð Panorama +Google

Êóäðÿâöåâ Þðèé, [email protected] (ÂÌèÊ ÌÃÓ)BI â "îáëàêàõ" 24 àïðåëÿ 2009 4 / 33

Page 5: BI in Clouds

Î êîìïàíèè Panorama

Èçðàèëüñêàÿ êîìïàíèÿ � îäèí èç ñòàðåéøèõ BI âåíäîðîâ

Êîìàíäà ðàçðàáîò÷èêîâ Microsoft Analysis Services áûëàêóïëåíà èç Panorama

Ïîñëå îáúÿâëåíèÿ î ñîçäàíèè Microsoft PerformancePoint,Panorama îêàçàëàñü â ñëîæíîì ïîëîæåíèè

Êóäðÿâöåâ Þðèé, [email protected] (ÂÌèÊ ÌÃÓ)BI â "îáëàêàõ" 24 àïðåëÿ 2009 5 / 33

Page 6: BI in Clouds

Panorama Analytics for Google Applications

Äîáàâëåíèå àíàëèòè÷åñêèõ âîçìîæíîñòåé â Google Spreadsheets

Àíàëîã Pivot Tables â Microsoft Excel

Íóæíî ñòðîèòü êóáû, ñ÷èòàòü àãðåãàòû, ðèñîâàòü ãðàôèêè

Äëÿ áûñòðîãî ðàñ÷åòà êóáîâ èñïîëüçóåòñÿ Google App Engine

Åñòü âîçìîæíîñòü èñïîëüçîâàòü Google Docs êàê êëèåíòà ê MsAnalytical Services

http://www.panorama.com/google/

Êóäðÿâöåâ Þðèé, [email protected] (ÂÌèÊ ÌÃÓ)BI â "îáëàêàõ" 24 àïðåëÿ 2009 6 / 33

Page 7: BI in Clouds

Panorama PowerApps

Ñåðâèñ, ïîçâîëÿþùèé èñïîëüçîâàòü Panorama Analytics âîâíåøíèõ ïðèëîæåíèÿõ/ñàéòàõ

Íàïðèìåð, ãðàôè÷åñêèé àíàëèç íà ñàéòå ó÷åòà ëè÷íûõôèíàíñîâ

Àíàëèòè÷åñêèå êóáû â òàêîì ñëó÷àå ñîáèðàþòñÿ íà "îáëàêå"Google

OLAP êàê ñåðâèñ èëè OLAP 2.0

Êóäðÿâöåâ Þðèé, [email protected] (ÂÌèÊ ÌÃÓ)BI â "îáëàêàõ" 24 àïðåëÿ 2009 7 / 33

Page 8: BI in Clouds

×òî èçìåíèëîñü çà ãîä?

Èíòåãðàöèÿ â Google Docs óñèëèëàñü, óâåëè÷èâàåòñÿêîëè÷åñòâî ïîëüçîâàòåëåé

Íå âèäíî, ÷òîáû PowerApps øèðîêî èñïîëüçîâàëèñü

Êóäðÿâöåâ Þðèé, [email protected] (ÂÌèÊ ÌÃÓ)BI â "îáëàêàõ" 24 àïðåëÿ 2009 8 / 33

Page 9: BI in Clouds

Èñïîëüçîâàíèå "îáëàêà" êàêïëàòôîðìû äëÿ MPP Õðàíèëèù

Êóäðÿâöåâ Þðèé, [email protected] (ÂÌèÊ ÌÃÓ)BI â "îáëàêàõ" 24 àïðåëÿ 2009 9 / 33

Page 10: BI in Clouds

×òî ïðåäîñòàâëÿþò âû÷èñëèòåëüíûå "îáëàêà"?

Âîçìîæíîñòü âçÿòü â àðåíäó ëþáîå êîëè÷åñòâî ñåðâåðîâçàäàííîé êîíôèãóðàöèè

Çàáîòà î ïîääåðæêå ñåðâåðîâ ëîæèòñÿ íà ïðîâàéäåðà "îáëàêà"

Âû ïîëó÷àåòå ip ñåðâåðà è root äîñòóï. Ìîæíî çàëèâàòüãîòîâûå îáðàçû ÎÑ

Äåøåâî )

Êóäðÿâöåâ Þðèé, [email protected] (ÂÌèÊ ÌÃÓ)BI â "îáëàêàõ" 24 àïðåëÿ 2009 10 / 33

Page 11: BI in Clouds

Çà÷åì ýòî íóæíî äëÿ MPP ÕÄ?

Óäîáñòâî òåñòèðîâàíèÿ òåõíîëîãèè

Ëåãêîå ìàñøòàáèðîâàíèå

Îòñóòñòâèå ïðîáëåì ñ êîíôèãóðàöèåé ñåðâåðîâ

Îñíîâíûå àïîëîãåòû: Vertica, AsterData

Êóäðÿâöåâ Þðèé, [email protected] (ÂÌèÊ ÌÃÓ)BI â "îáëàêàõ" 24 àïðåëÿ 2009 11 / 33

Page 12: BI in Clouds

Ïðîáëåìû

Áåçîïàñíîñòü äàííûõ

Íàäåæíîñòü ïðîâàéäåðà

Ïåðåíîñ äàííûõ â "îáëàêî"

Êóäðÿâöåâ Þðèé, [email protected] (ÂÌèÊ ÌÃÓ)BI â "îáëàêàõ" 24 àïðåëÿ 2009 12 / 33

Page 13: BI in Clouds

Áåçîïàñíîñòü, ÷ 1

Çà ýòîò ãîä ïðèäóìàíû âàðèàíòû ðåøåíèÿ: Äîáàâëåíèå ÷àñòèîáëàêà â VPN

http://www.databasecolumn.com/2009/03/securing-your-data-in-the-cloud.html

Êóäðÿâöåâ Þðèé, [email protected] (ÂÌèÊ ÌÃÓ)BI â "îáëàêàõ" 24 àïðåëÿ 2009 13 / 33

Page 14: BI in Clouds

Áåçîïàñíîñòü, ÷ 2

Øèôðîâàíèå ÷àñòè "âàæíûõ" äàííûõ ïåðåä joinîì

Êóäðÿâöåâ Þðèé, [email protected] (ÂÌèÊ ÌÃÓ)BI â "îáëàêàõ" 24 àïðåëÿ 2009 14 / 33

Page 15: BI in Clouds

Ïåðåíîñ äàííûõ â îáëàêî

Ñåðâèñû õðàíåíèÿ äàííûõ � îïëàòà çà ãèãàáàéòû äàííûõ,õðàíÿùèõñÿ íà îáëàêå.Ñêîðîñòü èìïîðòà = ñêîðîñòè ñåòåâîãî êàíàëà äî ïðîâàéäåðà.

Êóäðÿâöåâ Þðèé, [email protected] (ÂÌèÊ ÌÃÓ)BI â "îáëàêàõ" 24 àïðåëÿ 2009 15 / 33

Page 16: BI in Clouds

×òî èçìåíèëîñü çà ãîä?

Óâåëè÷èëîñü êîëè÷åñòâî ïðîâàéäåðîâ "îáëàêîâ": Amazon EC2,AppNexus, GoGrid, Rackspace Cloud

Ó Vertica îêîëî 5òè êëèåíòîâ â "îáëà÷íîì" âàðèàíòå

Êóäðÿâöåâ Þðèé, [email protected] (ÂÌèÊ ÌÃÓ)BI â "îáëàêàõ" 24 àïðåëÿ 2009 16 / 33

Page 17: BI in Clouds

Map/Reduce è (èëè) MPP ÕÄ?

Êóäðÿâöåâ Þðèé, [email protected] (ÂÌèÊ ÌÃÓ)BI â "îáëàêàõ" 24 àïðåëÿ 2009 17 / 33

Page 18: BI in Clouds

×òî òàêîå Map Reduce

Map/Reduce � âû÷èñëèòåëüíàÿ ïàðàäèãìà + ïðîãðàììíàÿïëàòôîðìà èñïîëíåíèÿ çàäà÷ íà êëàñòåðå ñåðâåðîâ, ñîçäàííàÿ âêîìïàíèè GoogleÇàäà÷à äåêîìïîçèðóåòñÿ íà 2õ ôàçû � map è reduce (ýêâèâàëåíòíîmap è fold â ôóíêöèîíàëüíûõ ßÏ)

Map-ïðîöåññû çàïóñêàþòñÿ íàä ïîäìíîæåñòâàìè èñõîäíûõäàííûõ è âûïîëíÿþòñÿ àáñîëþòíî íåçàâèñèìî äðóã îò äðóãà.

Reduce-ïðîöåññû îáðàáàòûâàþò ðåçóëüòàòû map-ôàçû,ðàçáèâàÿ èõ ïî çíà÷åíèÿì êëþ÷åé íà íåïåðåñåêàþùèåñÿ áëîêè,÷òî òàêæå ïîçâîëÿåò âûïîëíÿòü èõ íåçàâèñèìî.

Òàêèì îáðàçîì, êàæäàÿ èç ôàç ìîæåò îáðàáàòûâàòüñÿ íàëþáîì êîëè÷åñòâå ñåðâåðîâ ïàðàëëåëüíî.

Open-Source ïðîåêò Map/Reduce ïëàòôîðìû � Apache Hadoop

Êóäðÿâöåâ Þðèé, [email protected] (ÂÌèÊ ÌÃÓ)BI â "îáëàêàõ" 24 àïðåëÿ 2009 18 / 33

Page 19: BI in Clouds

Ñõåìà Map Reduce

http://labs.google.com/papers/mapreduce.html

Êóäðÿâöåâ Þðèé, [email protected] (ÂÌèÊ ÌÃÓ)BI â "îáëàêàõ" 24 àïðåëÿ 2009 19 / 33

Page 20: BI in Clouds

Ïðèìåð Map-Reduce, Àëãîðèòì ïîäñ÷åòà ñëîâ â òåêñòå

map � ðàçáèâàåò áëîê ñòðîê íà ñëîâà, äëÿ êàæäîãî ñëîâàãåíåðèðóåò ïàðó (ñëîâî, 1)reduce � äëÿ êàæäîãî ñëîâà, ñêëàäûâàåò ïîëó÷åííûå 1, âû÷èñëÿÿòàêèì îáðàçîì êîëè÷åñòâî ïîâòîðåíèé ñëîâà

Êóäðÿâöåâ Þðèé, [email protected] (ÂÌèÊ ÌÃÓ)BI â "îáëàêàõ" 24 àïðåëÿ 2009 20 / 33

Page 21: BI in Clouds

Ïî÷åìó âîçíèêëî ïðîòèâîñòîÿíèå M/R è MPP ÕÄ

M/R èñïîëüçóåòñÿ äëÿ ðåøåíèÿ àíàëèòè÷åñêèõ çàäà÷ íàñâåðõáîëüøèõ îáúåìàõ äàííûõ � òà æå îáëàñòü, ÷òî è MPP DWH.2 çàìå÷àòåëüíûõ ïîçèöèè â äèñêóññèè

Map Reduce � ïëàòôîðìà, ÷üå ñóùåñòâîâàíèå îáóñëîâëåíîíåäîñòàòî÷íîé ãðàìîòíîñòüþ ðàçðàáîò÷èêîâ, íå çíàþùèõèñòîðèè ðàçâèòèÿ ïàðàëëåëüíûõ ÑÓÁÄ

Map Reduce � ñðåäñòâî ðàáîòû ñ áîëüøèìè îáúåìàìè,áåñêîíå÷íî ìàñøòàáèðóåìîå è íå èìåþùåå îãðàíè÷åíèÿ â âèäåñõåì è òèïèçàöèè. Çà÷åì íóæíû ÐÑÓÁÄ?

Êóäðÿâöåâ Þðèé, [email protected] (ÂÌèÊ ÌÃÓ)BI â "îáëàêàõ" 24 àïðåëÿ 2009 21 / 33

Page 22: BI in Clouds

Ñàìûé èçâåñòíûé ïðèìåð Map/Reduce àíàëèòèêè - Hive

Ñîöèàëüíàÿ ñåòü Facebook � äåñÿòêè ìèëëèîíîâ ïîëüçîâàòåëåé

Èñïîëüçóåò Hadoop äëÿ àíàëèçà äàííûõ î ïîëüçîâàòåëÿõ

2,5 Ïåòàáàéòà äàííûõ

 äåíü äîáàâëÿåòñÿ 15 Òá äàííûõ

Òàðãåòèðîâàíèå ðåêëàìû

Àíàëèòè÷åñêèå çàïðîñû

Èíñòðóìåíòàëüíûå ïàíåëè

Text Mining

Íàïèñàíà ñïåöèàëüíàÿ êîìïîíåíòà � Hive, òðàíñôîðìèðóþùàÿSQL çàïðîñ â Map Reduce çàäà÷ó

Hive òåïåðü ÿâëÿåòñÿ îäíèì èç ïîäïðîåêòîâ Hadoop

Êóäðÿâöåâ Þðèé, [email protected] (ÂÌèÊ ÌÃÓ)BI â "îáëàêàõ" 24 àïðåëÿ 2009 22 / 33

Page 23: BI in Clouds

Ñõåìà ÕÄ Facebook

http://www.slideshare.net/jsensarma/hadoop-hive-talk-at-iitdelhi

Êóäðÿâöåâ Þðèé, [email protected] (ÂÌèÊ ÌÃÓ)BI â "îáëàêàõ" 24 àïðåëÿ 2009 23 / 33

Page 24: BI in Clouds

Ïî÷åìó Facebook èñïîëüçóåò Hadoop:

Öåíà

Èñïîëüçîâàíèå óæå íàïèñàíîãî êîäà(îáðàáîòêà òåêñòà íàPython) äëÿ ETL

Ìàñøòàáèðóåìîñòü

Ãèáêîñòü ñõåìû õðàíåíèÿ

Âñå ðàâíî íóæåí Oracle RAC äëÿ îíëàéí çàïðîñîâhttp://www.dbms2.com/2009/04/15/cloudera-presents-the-mapreduce-bull-case/

Êóäðÿâöåâ Þðèé, [email protected] (ÂÌèÊ ÌÃÓ)BI â "îáëàêàõ" 24 àïðåëÿ 2009 24 / 33

Page 25: BI in Clouds

C äðóãîé ñòîðîíû: MPP ÕÄ â Ebay

5 Ïòá äàííûõ â Terradata

10êè òûñÿ÷ ñåðâåðîâ

Ïîñëå òåñòèðîâàíèÿ Hadoop: ìåíüøå çàãðóæàåò ïðîöåññîðà èòðåáóåò áîëüøå ñåðâåðîâ � äîðîæå

Êóäðÿâöåâ Þðèé, [email protected] (ÂÌèÊ ÌÃÓ)BI â "îáëàêàõ" 24 àïðåëÿ 2009 25 / 33

Page 26: BI in Clouds

Êîãäà ñòîèò èñïîëüçîâàòü MPP ÕÄ èëè Map/Reduce

Ñòàòüÿ Ñòîóíáðåéêåðà ñîòîâàðèùè:A comparison of approaches to large-scale data analysis: MapReduce vsDBMS Benchmarkshttp://database.cs.brown.edu/sigmod09/Ñðàâíåíèå Hadoop, DBMS X (ïîñòðî÷íîå MPP ÕÄ) è Vertica(ïîêîëîíî÷íîå MPP ÕÄ) íà ðÿäå òèïè÷íûõ çàäà÷:

Ïîèñê çíà÷åíèÿ ïî òåêñòîâîé ìàñêå

Ðàñ÷åò àãðåãàòà

Ñëîæíàÿ UDF, ïîäñ÷åò êîëè÷åñòâà âíóòðåííèõ ññûëîê âíàáîðå HTML ñòðàíèö

Êóäðÿâöåâ Þðèé, [email protected] (ÂÌèÊ ÌÃÓ)BI â "îáëàêàõ" 24 àïðåëÿ 2009 26 / 33

Page 27: BI in Clouds

Âûâîäû òåñòèðîâàíèÿ:

Çàãðóçêà äàííûõ â Hadoop áûñòðåå âñåãî

Ïîèñê è àãðåãàöèÿ � áûñòðåå â DBMS X è êëàñòåðå

UDF áûñòðåå â Hadoop

Vertica áûñòðåå âñåõ íà 100 óçëîâîì êëàñòåðå )

Êóäðÿâöåâ Þðèé, [email protected] (ÂÌèÊ ÌÃÓ)BI â "îáëàêàõ" 24 àïðåëÿ 2009 27 / 33

Page 28: BI in Clouds

Map/Reduce êàê ñðåäñòâî ðàñøèðåíèÿ MPP ÕÄ

SQL/MR â ÕÄ Aster Data � ñîâìåùåíèå SQL è Map/Reduce âçàïðîñàõ ê õðàíèëèùó×àñòü çàäà÷ â SQL, ÷àñòü íà óäîáíîì ÿçûêå ïðîãðàììèðîâàíèÿ âMRhttp://www.asterdata.com/blog/index.php/2009/04/02/enterprise-class-mapreduce/

Êóäðÿâöåâ Þðèé, [email protected] (ÂÌèÊ ÌÃÓ)BI â "îáëàêàõ" 24 àïðåëÿ 2009 28 / 33

Page 29: BI in Clouds

Ïðèìåð èñïîëüçîâàíèÿ M/R äëÿàíàëèòèêè. Ñîçäàíèå OLAP êóáîâ

Êóäðÿâöåâ Þðèé, [email protected] (ÂÌèÊ ÌÃÓ)BI â "îáëàêàõ" 24 àïðåëÿ 2009 29 / 33

Page 30: BI in Clouds

Ïðèìåð èñïîëüçîâàíèÿ M/R äëÿ àíàëèòèêè. Ñîçäàíèå OLAP êóáîâ

Êóäðÿâöåâ Þðèé, [email protected] (ÂÌèÊ ÌÃÓ)BI â "îáëàêàõ" 24 àïðåëÿ 2009 30 / 33

Page 31: BI in Clouds

Ïðèìåð èñïîëüçîâàíèÿ M/R äëÿ àíàëèòèêè.Ñõåìà îòâåòîâ íàçàïðîñû

Êóäðÿâöåâ Þðèé, [email protected] (ÂÌèÊ ÌÃÓ)BI â "îáëàêàõ" 24 àïðåëÿ 2009 31 / 33

Page 32: BI in Clouds

Íå âàæíî, êàêîé OLAP-ñåðâåð èñïîëüçîâàòü

Ñêîðîñòü ïîñòðîåíèÿ êóáîâ ðàñòåò ëèíåéíî ñ óìåíüøåíèåìîáúåìà äàííûõ íà óçëå

Ñëîæíàÿ çàäà÷à ñîçäàíèÿ OLAP-êóáà ëåãêî ïåðåïèñûâàåòñÿ âMR

Êóäðÿâöåâ Þðèé, [email protected] (ÂÌèÊ ÌÃÓ)BI â "îáëàêàõ" 24 àïðåëÿ 2009 32 / 33

Page 33: BI in Clouds

Ñïàñèáî! Âîïðîñû?

Êóäðÿâöåâ Þðèé, [email protected] (ÂÌèÊ ÌÃÓ)BI â "îáëàêàõ" 24 àïðåëÿ 2009 33 / 33