Data Lake: A simple introduction
![Page 1: Data Lake: A simple introduction](https://reader034.fdocuments.in/reader034/viewer/2022042706/58ac0ccb1a28ab33178b4d3b/html5/thumbnails/1.jpg)
© 2016 IBM Corporation
Learn more about Data Lakes on ibm.com: https://ibm.biz/Bdswi9
IBM’s Data Lake – A Basic Definition
1st June 2016
Mandy Chessell CBE FREng CEng FBCS, Distinguished Engineer, Master Inventor, Analytics Group CTO Office
![Page 2: Data Lake: A simple introduction](https://reader034.fdocuments.in/reader034/viewer/2022042706/58ac0ccb1a28ab33178b4d3b/html5/thumbnails/2.jpg)
Data blues & skills issues
• A disproportionate portion of the time spent in an analytics project is about data preparation: acquiring / preparing / formatting / normalizing the data
• In addition to raw data, augmented data / analytical assets can significantly speed up the analytics process and partially bridge the talent gap
![Page 3: Data Lake: A simple introduction](https://reader034.fdocuments.in/reader034/viewer/2022042706/58ac0ccb1a28ab33178b4d3b/html5/thumbnails/3.jpg)
A growing demand …
Business Teams want
• Open access to more information
• More powerful analysis and visualization tools
IT Teams are
• Concerned about cost.
• Concerned about governance and regulatory requirements.
![Page 4: Data Lake: A simple introduction](https://reader034.fdocuments.in/reader034/viewer/2022042706/58ac0ccb1a28ab33178b4d3b/html5/thumbnails/4.jpg)
Big Data Lakes or Swamps?
• As we collect data
  ◦ Can we preserve clarity?
  ◦ Do we know what we are collecting?
  ◦ Can we find the data we need?
• Are we creating a data swamp?
• How do we build trust in big data?
  ◦ Do we know what data is being used for?
![Page 5: Data Lake: A simple introduction](https://reader034.fdocuments.in/reader034/viewer/2022042706/58ac0ccb1a28ab33178b4d3b/html5/thumbnails/5.jpg)
"The need for increased agility and accessibility for data analysis is the primary driver for data lakes," said Andrew White, vice president and distinguished analyst at Gartner. "Nevertheless, while it is certainly true that data lakes can provide value to various parts of the organization, the proposition of enterprise wide data management has yet to be realized."
http://www.gartner.com/newsroom/id/2809117
![Page 6: Data Lake: A simple introduction](https://reader034.fdocuments.in/reader034/viewer/2022042706/58ac0ccb1a28ab33178b4d3b/html5/thumbnails/6.jpg)
IBM’s Data Lake – designed for data access – with safeguards
IBM’s Data Lake = Efficient Management, Governance, Protection and Access.
Data Lake (System of Insight)
Information Management and Governance Fabric
Data Lake Services
Data Lake Repositories
![Page 7: Data Lake: A simple introduction](https://reader034.fdocuments.in/reader034/viewer/2022042706/58ac0ccb1a28ab33178b4d3b/html5/thumbnails/7.jpg)
Users supported by IBM’s Data Lake
[Diagram] Data Lake (System of Insight): Information Management and Governance Fabric; Data Lake Services; Data Lake Repositories
Users: Line of Business Teams; Analytics Teams; Information Curator; Governance, Risk and Compliance Team; Data Lake Operations
Enterprise IT: Other Data Lakes; Systems of Engagement; Systems of Automation; Systems of Record; New Sources
![Page 8: Data Lake: A simple introduction](https://reader034.fdocuments.in/reader034/viewer/2022042706/58ac0ccb1a28ab33178b4d3b/html5/thumbnails/8.jpg)
The subsystems inside IBM’s Data Lake
[Diagram] Data Lake (System of Insight) subsystems: Information Management and Governance Fabric; Catalogue; Self-Service Access; Enterprise IT Data Exchange; Analytics Engines; Data Lake Repositories
Users: Line of Business Teams; Analytics Teams; Information Curator; Governance, Risk and Compliance Team; Data Lake Operations
Enterprise IT: Other Data Lakes; Systems of Engagement; Systems of Automation; Systems of Record; New Sources
![Page 9: Data Lake: A simple introduction](https://reader034.fdocuments.in/reader034/viewer/2022042706/58ac0ccb1a28ab33178b4d3b/html5/thumbnails/9.jpg)
View from the user community - fraud
• Conform to regulations
• Investigate fraud case
• Develop new fraud models
• Detect and prevent fraud
![Page 10: Data Lake: A simple introduction](https://reader034.fdocuments.in/reader034/viewer/2022042706/58ac0ccb1a28ab33178b4d3b/html5/thumbnails/10.jpg)
The role of the catalogue
[Diagram] Information Governance Catalogue: curation of metadata about stores, models and definitions. Other labels: Data Stores; Sand Box.
• Search for, locate and download data and related artifacts.
• Provision sandboxes.
• Add additional insight into data sources through automated analysis.
• Develop data management models and implementations.
• Define governance policies, rules and classifications. Monitor compliance.
• View lineage (business and technical) and perform impact analysis.
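As an illustration of the search and impact-analysis roles listed above, here is a minimal Python sketch of a catalogue that stores metadata and lineage links. The `Catalogue` class, its method names and the example data set names are all hypothetical; this is not the API of IBM's Information Governance Catalogue.

```python
from collections import deque


class Catalogue:
    """Hypothetical in-memory catalogue: metadata plus lineage links."""

    def __init__(self):
        self.entries = {}      # data set name -> metadata dict
        self.downstream = {}   # data set name -> set of derived data sets

    def register(self, name, description, classification, derived_from=()):
        """Record metadata for a data set and the sources it was derived from."""
        self.entries[name] = {
            "description": description,
            "classification": classification,
            "derived_from": tuple(derived_from),
        }
        for parent in derived_from:
            self.downstream.setdefault(parent, set()).add(name)

    def search(self, term):
        """Return names of data sets whose name or description mentions term."""
        term = term.lower()
        return sorted(
            name
            for name, meta in self.entries.items()
            if term in name.lower() or term in meta["description"].lower()
        )

    def impact(self, name):
        """Impact analysis: every data set derived, directly or not, from name."""
        seen, queue = set(), deque([name])
        while queue:
            current = queue.popleft()
            for child in self.downstream.get(current, ()):
                if child not in seen:
                    seen.add(child)
                    queue.append(child)
        return sorted(seen)


catalogue = Catalogue()
catalogue.register("payments_raw", "Raw card payment events", "sensitive")
catalogue.register("payments_clean", "Validated payment events", "sensitive",
                   derived_from=["payments_raw"])
catalogue.register("fraud_scores", "Model scores for suspicious payments",
                   "confidential", derived_from=["payments_clean"])
print(catalogue.search("payment"))
print(catalogue.impact("payments_raw"))
```

Storing lineage as parent-to-child links keeps impact analysis a simple breadth-first walk; a real catalogue would also persist the metadata and track technical and business lineage separately.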
![Page 11: Data Lake: A simple introduction](https://reader034.fdocuments.in/reader034/viewer/2022042706/58ac0ccb1a28ab33178b4d3b/html5/thumbnails/11.jpg)
Governance ensures proper management and use of information
[Diagram] Information Governance
Policy: Policy Administration; Policy Implementation; Policy Enforcement; Policy Monitoring
Disciplines: Compliance; Standards; Protection; Lifecycle; Quality
Information: Information Values Quality; Information Dependencies; Information Requirements; Information Supply Chain Integrity; Information Identification; Information Retention; Information Usage; Information Privacy; Information Architecture; Information Disposal
Questions:
• Are people/systems operating properly?
• Is data quality sufficient for use?
• Is data kept for appropriate length of time?
• Is data properly protected from loss or inappropriate use?
• Are systems built to appropriate standards?
![Page 12: Data Lake: A simple introduction](https://reader034.fdocuments.in/reader034/viewer/2022042706/58ac0ccb1a28ab33178b4d3b/html5/thumbnails/12.jpg)
Data lake security
• The data lake’s repositories are only accessed by authorized processes.
• People access the data from the data lake through the services.
  ◦ Identified through a common authentication mechanism (e.g. LDAP)
  ◦ Data classified in the catalog
  ◦ Access granted by business owners
  ◦ Access controlled by data lake services
  ◦ All activity monitored by probes that store log information in the audit data zone.
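The access rules above can be sketched as a single service-layer check. This is a minimal Python illustration under stated assumptions: `DataLakeServices`, its fields and the sample users are hypothetical, the set of authenticated users stands in for a real mechanism such as LDAP, and the audit data zone is modelled as a plain list.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class DataLakeServices:
    """Hypothetical service layer that mediates every read of the repositories."""

    # Stand-in for a common authentication mechanism (for example LDAP).
    authenticated: set[str] = field(default_factory=set)
    # Catalog classifications: data set name -> classification label.
    classifications: dict[str, str] = field(default_factory=dict)
    # Access granted by business owners, stored as (user, data set) pairs.
    grants: set[tuple[str, str]] = field(default_factory=set)
    # Stand-in for the audit data zone: an append-only list of log records.
    audit_zone: list[dict] = field(default_factory=list)

    def grant(self, owner: str, user: str, dataset: str) -> None:
        """A business owner grants a user access to a data set."""
        self.grants.add((user, dataset))
        self._audit(owner, dataset, "grant", True)

    def read(self, user: str, dataset: str) -> bool:
        """Allow a read only if the user is authenticated, the data set is
        classified in the catalog, and a business owner has granted access."""
        allowed = (
            user in self.authenticated
            and dataset in self.classifications
            and (user, dataset) in self.grants
        )
        self._audit(user, dataset, "read", allowed)
        return allowed

    def _audit(self, actor: str, dataset: str, action: str, allowed: bool) -> None:
        """Probe: record every activity, whether or not it was allowed."""
        self.audit_zone.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "actor": actor,
            "dataset": dataset,
            "action": action,
            "allowed": allowed,
        })


services = DataLakeServices(
    authenticated={"alice"},
    classifications={"claims": "sensitive"},
)
print(services.read("alice", "claims"))   # denied: no grant yet
services.grant("claims_owner", "alice", "claims")
print(services.read("alice", "claims"))   # allowed after the owner's grant
```

Denied requests are logged as well as allowed ones, which matches the slide's point that all activity is monitored.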
IBM’s Data Lake = Efficient Management, Governance, Protection and Access.
Data Lake
Information Management and Governance Fabric
Data Lake Services
Data Lake Repositories
![Page 13: Data Lake: A simple introduction](https://reader034.fdocuments.in/reader034/viewer/2022042706/58ac0ccb1a28ab33178b4d3b/html5/thumbnails/13.jpg)
IBM’s Data Lake – example deployment options
[Diagram] Data Lake (System of Insight) subsystems: Information Management and Governance Fabric; Catalogue; Self-Service Access; Enterprise IT Data Exchange; Analytics Engines
Users: Line of Business Teams; Analytics Teams; Information Curator; Governance, Risk and Compliance Team; Data Lake Operations
Enterprise IT: Other Data Lakes; Systems of Engagement; Systems of Automation; Systems of Record; New Sources
Example products shown: InfoSphere Streams; InfoSphere Information Server; Cognos; Watson Explorer; Cloudant; PureData/BLU; InfoSphere BigInsights; InfoSphere Master Data Management; Watson Analytics; InfoSphere Information Server, Optim and Guardium; SPSS
![Page 14: Data Lake: A simple introduction](https://reader034.fdocuments.in/reader034/viewer/2022042706/58ac0ccb1a28ab33178b4d3b/html5/thumbnails/14.jpg)
IBM’s Data Lake
• As organizations experiment with analytics they discover:
  ◦ Creating new analytics requires access to historical data from many systems.
  ◦ This data includes valuable and sensitive data that is core to the organization’s operation.
  ◦ Hadoop is a flexible platform for storing many types of data but is not necessarily fast enough for the production deployment of some analytics. Data needs to be reformatted and copied onto specialist analytics platforms such as Netezza.
• A data lake provides:
  ◦ Single extraction of data from operational systems and distribution to multiple analytics platforms.
  ◦ Cataloguing and governance of the data in the analytics platforms.
  ◦ Simple interfaces for the line of business to access the information they need.
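The "single extraction, multiple platforms" idea can be sketched in a few lines of Python. The function name, the platform names and the record fields below are hypothetical, and the loaders are plain lists rather than real Hadoop or warehouse clients; the sketch only shows the shape of the pattern, not IBM's implementation.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Record = Dict[str, object]
Reformat = Callable[[Record], Record]
Load = Callable[[List[Record]], None]


def extract_once_and_distribute(
    dataset: str,
    extract: Callable[[], Iterable[Record]],
    platforms: Dict[str, Tuple[Reformat, Load]],
    catalogue: Dict[str, List[str]],
) -> int:
    """Read the operational system once, then reformat and load the same
    records into each analytics platform, cataloguing where the copies live."""
    records = list(extract())  # the only call made against the source system
    for platform, (reformat, load) in platforms.items():
        load([reformat(record) for record in records])
        catalogue.setdefault(dataset, []).append(platform)
    return len(records)


# Hypothetical usage: two in-memory "platforms" with different formats.
hadoop_store: List[Record] = []
warehouse_store: List[Record] = []
catalogue: Dict[str, List[str]] = {}

count = extract_once_and_distribute(
    "payments",
    extract=lambda: [{"id": 1, "amount": "10.5"}, {"id": 2, "amount": "3"}],
    platforms={
        # Keep the raw form for the flexible store.
        "hadoop": (lambda r: dict(r), hadoop_store.extend),
        # Convert amounts to numbers for the specialist analytics platform.
        "warehouse": (
            lambda r: {"id": r["id"], "amount": float(r["amount"])},
            warehouse_store.extend,
        ),
    },
    catalogue=catalogue,
)
print(count, catalogue)
```

Extracting once keeps load off the operational system, and recording each copy in the catalogue is what lets the copies be governed later.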
IBM’s Data Lake = Efficient Management, Governance, Protection and Access.
Data Lake
Information Management and Governance Fabric
Data Lake Services
Data Lake Repositories
![Page 15: Data Lake: A simple introduction](https://reader034.fdocuments.in/reader034/viewer/2022042706/58ac0ccb1a28ab33178b4d3b/html5/thumbnails/15.jpg)
Governing and managing Big Data for Analytics and Decision Makers
• An introduction to IBM’s Data Lake solution
http://www.redbooks.ibm.com/redpieces/abstracts/redp5120.html?Open
![Page 16: Data Lake: A simple introduction](https://reader034.fdocuments.in/reader034/viewer/2022042706/58ac0ccb1a28ab33178b4d3b/html5/thumbnails/16.jpg)
Designing and Operating a Data Reservoir
• Description of the behaviour and processes that make up a data lake from IBM (aka data reservoir)
• Blog
  ◦ 5 things to know about a data reservoir: https://www.ibm.com/developerworks/community/blogs/5things/entry/5_things_to_know_about_data_reservoir?lang=en
• Redbook
  ◦ http://www.redbooks.ibm.com/Redbooks.nsf/RedpieceAbstracts/sg248274.html?Open
![Page 17: Data Lake: A simple introduction](https://reader034.fdocuments.in/reader034/viewer/2022042706/58ac0ccb1a28ab33178b4d3b/html5/thumbnails/17.jpg)
Ethics for Big Data and Analytics
✓ Context – For what purpose was the data originally surrendered? For what purpose is the data now being used? How far removed from the original context is its new use?
✓ Consent & Choice – What are the choices given to an affected party? Do they know they are making a choice? Do they really understand what they are agreeing to? Do they really have an opportunity to decline? What alternatives are offered?
✓ Reasonable – Is the depth and breadth of the data used and the relationships derived reasonable for the application it is used for?
✓ Substantiated – Are the sources of data used appropriate, authoritative, complete and timely for the application?
✓ Owned – Who owns the resulting insight? What are their responsibilities towards it in terms of its protection and the obligation to act?
✓ Fair – How equitable are the results of the application to all parties? Is everyone properly compensated?
✓ Considered – What are the consequences of the data collection and analysis?
✓ Access – What access to data is given to the data subject?
✓ Accountable – How are mistakes and unintended consequences detected and repaired? Can the interested parties check the results that affect them?
http://www.ibmbigdatahub.com/whitepaper/ethics-big-data-and-analytics
![Page 18: Data Lake: A simple introduction](https://reader034.fdocuments.in/reader034/viewer/2022042706/58ac0ccb1a28ab33178b4d3b/html5/thumbnails/18.jpg)
Common Information Models for an Open, Analytical and Agile World
• To drive maximum value from complex IT projects, IT professionals need a deep understanding of the information their projects will use. Too often, however, IT treats information as an afterthought: the “poor stepchild” behind applications and infrastructure. That needs to change. This book will help you change it.
• Using a complete case study, the authors explain what CIMs are, how to build them, and how to maintain them. You learn how to clarify the structure, meaning, and intent of any information you may exchange, and then use your CIM to improve integration, collaboration, and agility.
• In today’s mobile, cloud, and analytics environments, your information is more valuable than ever. To build systems that make the most of it, start right here.
![Page 19: Data Lake: A simple introduction](https://reader034.fdocuments.in/reader034/viewer/2022042706/58ac0ccb1a28ab33178b4d3b/html5/thumbnails/19.jpg)
Data Lake: Taming the Data Dragon (White Paper)
Taming the data dragon leads to significant benefits across the enterprise, from improved productivity to increased effectiveness in sales and marketing. A data lake accepts data flows from any source and brings them into a common platform for use. Data is stored in its raw, unrefined state and located, processed, refined and extracted as required. However, governance needs to be applied to the data lake to ensure it becomes a trusted data source, rather than a formless landing area in which data is stored without consideration of its validity, value or shelf life.
Download Now: https://ibm.biz/Bdswiu