Transcript of Webinar Series: Azure Advanced Technologies In Practice: Big Data with Analytics on Cloud
26/9/2019
Webinar Series:
Azure Advanced Technologies In Practice
Big Data with Analytics on Cloud
Khaleel Demeri, Cloud Technical Specialist
Get in touch: [email protected]
Series of 10 Webinars starting on 24 July 2019 and ending in October 2019
Based on the new Logicom’s Azure Advanced Solutions Catalogue
One, 1-hour, Webinar for each Advanced Solution in the Catalogue
Simple and consistent webinar structure: Business Need, Proposed Solution, Solution Abstract, Solution Details, Solution Demo (where possible), Q & A
Easy registration for each webinar in the series at https://cloud.logicom.net/azure-advanced-technologies-in-practice/
Watch the webinar recordings on demand at https://cloud.logicom.net/webinars/
Share reference solutions for business problems with combinations of Azure services
What is Webinar Series About?
Webinar Series Details
Azure Advanced Technologies In Practice
Azure Stack Hybrid Cloud Platform
Build Serverless Applications on Cloud
HPC Video Rendering in Cloud
Securing & Monitoring Hybrid Cloud Environments
Development & Testing in Cloud
Build IoT Solutions on Cloud
Big Data with Analytics on Cloud
Build Intelligent Chatbots on Cloud
Deliver Virtual Applications on Cloud
Deliver Virtual Desktops on Cloud
Real life Business Requirement
Proposed Solution Abstract
Proposed Solution Details
o Solution Characteristics & Business Benefits
o Solution Architectural Components
o Deployment Guidance
o Solution Demo
o Solution Use cases
o Indicative Configuration
Q & A
Contoso is a sales and marketing company that creates incentive programs. These programs reward customers, suppliers, salespeople, and employees. The company has large amounts of data from multiple sources and wants to improve the insights it gains through data analytics.
The company needs a modern approach to analyzing data, so that decisions are made using the right data at the right time.
✓ Combine different kinds of data sources into a cloud-scale platform
✓ Load data using a highly parallelized approach that can support thousands of incentive programs, without the high costs of deploying and maintaining on-premises infrastructure
✓ Transform data into a common structure, to make the data consistent and ready for analysis and reporting
✓ Greatly reduce the time needed to gather and transform data, so you can focus on analyzing the data
✓ Easily get insights and reporting to make better decisions
✓ Make sure the solution is highly available and highly scalable
Business Need
Requirements
How can we help?
Big Data with Analytics on Cloud
Solution Abstract
✓ What is Big Data & Analytics?
✓Why Big Data & Analytics Solution?
✓ Big Data & Analytics Solution Workflow
✓ Solution Components & Definitions
Big Data is a collection of extremely large data sets that may be analyzed computationally to reveal patterns, trends, and associations, especially relating to human behavior and interactions that may affect a business.
A big data solution architecture is designed to handle the ingestion, processing, and analysis of data that is too large or complex for traditional database systems, to deliver better experiences and make better decisions.
What is Big Data & Analytics?
Big Data & Analytics General Diagram
Data Sources
Data Collection, Ingestion and Processing
Orchestration
✓ Establish a data warehouse to be a single source of truth for your data
✓ Use semantic modeling and powerful visualization tools for simpler data analysis
✓ Extract insights to make better decisions and seize business opportunities
Why Big Data & Analytics Solution?
Analytics & Reporting
Big Data with Analytics Components & Definitions
Data sources: All big data solutions start with a collection of data sources. Examples: relational databases, static files produced by applications (such as web server log files), and real-time data sources (such as IoT devices), among many others. Data sources are usually neither collected in one place nor ready for analysis.
Data storage: Data for processing operations is typically stored in a distributed file store that can hold high volumes of large files in various formats. This is where the data from different resources can be placed.
Data Processor: Because the data sets are so large, often a big data solution must process data files using long-running batch jobs to filter, aggregate, and otherwise prepare the data for analysis. Usually these jobs involve reading source files, processing them, and writing the output to new files.
Analytical data store: Serves the processed data in a structured format that can be queried using analytical tools, ensuring the data is ready for analysis and reporting.
Analysis and reporting service: Provides insights into the data through querying, analysis, and reporting.
Orchestration service: A big data solution's repeated data processing operations are encapsulated in workflows; an orchestration technology is needed to automate these workflows.
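The component roles above can be sketched as a toy pipeline in plain Python. The function names (`ingest`, `process`, `serve`) are our own illustrations, not Azure APIs; in the real solution these stages map to managed services such as Data Factory, Data Lake, and SQL Data Warehouse.

```python
# Toy sketch of the big data pipeline stages described above.
# All names are illustrative; a real solution would use managed
# Azure services instead of these in-memory functions.

def ingest(sources):
    """Data sources -> data storage: collect raw records in one place."""
    raw = []
    for source in sources:
        raw.extend(source)
    return raw

def process(raw_records):
    """Data processor: batch job that filters and normalizes raw records."""
    return [
        {"program": r["program"].strip().lower(), "amount": float(r["amount"])}
        for r in raw_records
        if r.get("amount") is not None
    ]

def serve(records):
    """Analytical data store: structured, query-ready aggregates."""
    totals = {}
    for r in records:
        totals[r["program"]] = totals.get(r["program"], 0.0) + r["amount"]
    return totals

def run_pipeline(sources):
    """Orchestration: here the workflow is just the fixed order of stages."""
    return serve(process(ingest(sources)))
```

The orchestration step is trivially a function call chain here; in production it is a scheduled, monitored workflow with retries, which is exactly what an orchestration service provides.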
Big Data with Analytics on Cloud
Azure Solution Details
✓ Azure Resources
✓ Solution Characteristics and Business Benefits
✓ Solution Architectural Components
✓ Key points of Consideration
✓ Implementation steps
✓ Solution Demo
✓ Indicative Configuration
✓ Solution Use cases
Related Azure Resources
Deploy Big Data with Analytics in Azure
✓ Up to 99.9% availability SLA for individual solution services
✓ Store & process data in volumes too large for a traditional database
✓ Transform unstructured data of any kind and size for analysis and reporting
✓ No fixed limits on file size, account size, or the number of files
✓ Optimized for massive throughput to query and analyze any amount of data
✓ Azure AD authentication and role-based access control to secure data
✓ Achieve performance through parallelism and dynamic scaling
✓ Integrate seamlessly with other Azure services
Solution Characteristics
Business Benefits
✓ Bring together all data like business transactions, social media, sensor data and have it in one place transformed and ready for analysis
✓ Save time using Azure managed services that help analyze data immediately and make quick decisions based on the learnings
✓ Create new business opportunities by discovering trends and patterns from your big data while paying per usage
✓ Pay as you go, only for what is consumed
Optional Azure Resources
Azure SQL Database
Azure Database for MySQL
Azure Analysis Services
Azure Data Lake Analytics
Azure Data Lake Storage
Azure SQL Data Warehouse
Azure Data Factory
Azure Active Directory
. . .
What is Azure Data Lake?
Why use Azure Data Lake?
Azure Data Lake includes all the capabilities required to make it easy for developers, data scientists, and analysts to store data of any size, shape, and speed, and to do all types of processing and analytics across platforms and languages. It includes two services, Data Lake Store and Data Lake Analytics:
o Data Lake Store, a no-limits data lake store that powers big data analytics and can store different shapes and formats of data
o Data Lake Analytics, a no-limits analytics job service to power intelligent action and drive the data processing
✓ Start in seconds, scale instantly, pay per usage
✓ Store and analyze petabyte-size files and trillions of objects
✓ A single place for data, regardless of whether it’s a file system, object data, or both
✓ Debug and optimize big data programs with ease using Azure managed services
✓ Enterprise-grade security, auditing, and support that gives the simplicity and power you need
What is Azure SQL Data Warehouse?
Why use Azure SQL Data Warehouse?
Azure SQL Data Warehouse is a distributed system for storing and analyzing large datasets, making them ready for extracting insights. Its use of massively parallel processing makes it suitable for running high-performance analytics.
Azure SQL Data Warehouse lets you independently scale compute and storage, and pause and resume the data warehouse within minutes, so companies can seamlessly create a hub for analytics with native connectivity to data integration and visualization services.
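The idea behind massively parallel processing can be illustrated in a few lines of Python: rows are hash-distributed across a number of "distributions", each distribution computes a partial aggregate in parallel, and the partials are combined. This is a conceptual sketch of the technique, not the actual SQL Data Warehouse engine.

```python
# Conceptual sketch of massively parallel processing (MPP):
# hash-distribute rows, aggregate each shard in parallel, then combine.
from concurrent.futures import ThreadPoolExecutor

def distribute(rows, n_distributions, key):
    """Assign each row to a shard by hashing its distribution key."""
    shards = [[] for _ in range(n_distributions)]
    for row in rows:
        shards[hash(row[key]) % n_distributions].append(row)
    return shards

def partial_sum(shard):
    """Each distribution aggregates only its own rows."""
    totals = {}
    for row in shard:
        totals[row["region"]] = totals.get(row["region"], 0) + row["sales"]
    return totals

def parallel_sum(rows, n_distributions=4):
    shards = distribute(rows, n_distributions, key="region")
    with ThreadPoolExecutor(max_workers=n_distributions) as pool:
        partials = pool.map(partial_sum, shards)
    # Combine the partial aggregates into the final result.
    combined = {}
    for p in partials:
        for region, total in p.items():
            combined[region] = combined.get(region, 0) + total
    return combined
```

Because each shard is aggregated independently, adding distributions (or compute nodes, in the real service) scales the work horizontally, which is why MPP suits high-performance analytics over very large tables.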
Industry leading SQL engine
Industry leading compliance
Built-in advanced security features
Integrated data processing
Global availability
Fully managed infrastructure
Truly elastic by design
Massive query concurrency
What is Azure Data Factory?
Why use Azure Data Factory?
Azure Data Factory is a cloud data integration service, to compose data storage, movement, and processing services into automated data workflows via pipelines.
Create, schedule, and manage data integrations at scale with Azure Data Factory—a hybrid data integration (ETL) service. Work with data wherever it lives, in the cloud or on-premises, with enterprise-grade security.
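The key idea is that a pipeline is declared as data (activities in order) and executed by the service. The snippet below is a simplified illustration of that declarative style in Python; the pipeline structure shown is our own invention for clarity, not the real Data Factory JSON schema.

```python
# Simplified illustration of a declarative pipeline in the spirit of
# Data Factory: the pipeline is data, and a small runner executes its
# activities in order. This is NOT the actual ADF schema.
import json

pipeline = json.loads("""
{
  "name": "CopyAndTransform",
  "activities": [
    {"name": "CopySales",  "type": "copy",      "source": "sqldb",    "sink": "datalake"},
    {"name": "CleanSales", "type": "transform", "input": "datalake",  "output": "warehouse"}
  ]
}
""")

def run(pipeline, handlers):
    """Execute each activity with the handler registered for its type."""
    log = []
    for activity in pipeline["activities"]:
        handlers[activity["type"]](activity)
        log.append(activity["name"])
    return log
```

Separating the pipeline definition from the runner is what lets a service like Data Factory schedule, retry, and monitor workflows without any orchestration code living in the data stores themselves.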
What is Microsoft Power BI?
Why use Microsoft Power BI?
Power BI is a cloud-based business analytics service that gives a single view of the most critical business data. Monitor the health of your business using a live dashboard and create rich interactive reports.
Microsoft Power BI helps you stay up to date with the information that matters to you. Power BI dashboards help you keep a finger on the pulse of your business; they display tiles that can be clicked to explore further with reports. Connect to multiple datasets to bring all of the relevant data together in one place.
✓ Connect to data from any public, private or corporate source
✓ Provide a 360° view of business with live, visual dashboards
✓ Analyze millions of rows of data with Excel in-memory performance
✓ Visualize hierarchical, financial and geospatial data with new charts
✓ Publish data from Excel directly to Power BI
What is Azure Active Directory?
Why use Azure Active Directory?
Azure Active Directory (Azure AD) is Microsoft’s multi-tenant, cloud-based directory, and identity management service. Azure AD combines core directory services, application access management, and identity protection in a single solution, offering a standards-based platform that helps developers deliver access control to their apps, based on centralized policy and rules.
The multi-tenant, geographically distributed, and high availability design of Azure AD means that you can rely on it for your most critical business needs.
✓ Get seamless access to any application from virtually any location or device
✓ Collaborate securely with partners and customers
✓ Increase IT efficiency and cut down help desk costs
✓ Enhance security and respond to advanced threats in real time
✓ Secure application access by enforcing rules-based MFA policies
✓ Take advantage of high availability and reliability
✓ Create and manage a single identity for each user across your entire enterprise
*Indicative Diagram
• Azure Data Lake Analytics billing is per analytics unit (Details)
• Azure SQL Data Warehouse compute power can be paused and resumed as needed
• You get industry-leading compliance with more than 20 government and industry certifications, including HIPAA, to protect your data and keep it sovereign (Details)
• Batch processing on a recurring schedule and partitioning data simplify data ingestion and the troubleshooting of failures
• The SQL Data Warehouse provides connection security, authentication and authorization via Azure AD or SQL Server authentication, and encryption (Details)
• Real-Time data can be collected using Azure services like Azure Event Hubs and Azure IoT Hub
• You can process Real-Time data using Azure Stream Analytics
• Data Lake Store is optimized for Big Data and provides very high throughput and parallel processing
• Azure Blob Storage can be used as an alternative to Azure Data Lake Store for simpler projects
• Microsoft provides many technology choices, including third-party and open-source products
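The point about partitioning for batch ingestion can be made concrete with a small sketch: grouping records into date partitions means a recurring batch job that fails can be retried for just the affected partition instead of reprocessing everything. The function below is illustrative only; the timestamp format is an assumption.

```python
# Sketch of date-based partitioning for recurring batch ingestion.
# A failed run can be retried per partition instead of end to end.
from collections import defaultdict

def partition_by_day(records):
    """Group records into partitions keyed by calendar day."""
    partitions = defaultdict(list)
    for record in records:
        # Assumes ISO-8601 timestamps, e.g. "2019-09-26T10:00:00".
        day = record["timestamp"][:10]
        partitions[day].append(record)
    return dict(partitions)
```

In the Azure solution the same idea appears as time-sliced folders in Data Lake Store and scheduled pipeline windows in Data Factory.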
1. Create a resource group
2. Create an Azure Data Lake Storage account
3. Configure Data Lake Analytics
   a) Manage role-based access control
   b) Optional: Manage jobs [1]
4. Create an Azure SQL Data Warehouse
   a) Create a server-level firewall rule
   b) Connect to the server as server admin
   c) Optional: Run queries [2]
5. Create and configure a data factory
   a) Create a pipeline
   b) Trigger the pipeline manually
   c) Trigger the pipeline on a schedule
   d) Monitor the pipeline
   e) Create an activity to copy data from the data sources to Data Lake Store
   f) Create an activity to transform data using Azure Data Lake Analytics [3]
6. Visualize SQL warehouse data with Power BI
Deployment Guide
[1] We will use Data Factory pipelines to manage this; listed for reference and guidance.
[2] To be used after the data is transformed using Data Lake Analytics.
[3] Data Lake Analytics G2 is relatively new and this step is in Preview at this time. There are also many alternate ways to transform data in Data Factory. (Details)
*Deployment Guide is an example of using the Data Factory workflows with other managed Azure services for the optimum Big Data performance.
Big Data with Analytics on Cloud
Demo
✓ Create a Data Factory
✓ Create a pipeline to collect data
✓ Link Azure SQL DB input dataset
✓ Link Azure Blob output dataset
✓ Run and monitor the pipeline
✓ Check the output
Variations of Extract, Transform, and Load (ETL)
With larger volumes of data and a greater variety of formats, big data solutions generally use variations of ETL, such as extract, load, and transform (ELT), where the raw data is loaded first and transformed inside the analytical store.
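The difference is only where the transformation runs, which a toy contrast makes obvious. The `clean` step below stands in for any transformation; both orderings produce the same result, but in ELT the store does the transforming, so it scales with the store.

```python
# Toy contrast of ETL vs. ELT: the same cleanup either happens before
# loading (ETL) or inside the analytical store after loading (ELT).

def clean(rows):
    """Stand-in transformation: trim, lowercase, drop empty rows."""
    return [r.strip().lower() for r in rows if r.strip()]

def run_etl(raw, store):
    store.extend(clean(raw))    # transform first, then load

def run_elt(raw, store):
    store.extend(raw)           # load raw data as-is...
    store[:] = clean(store)     # ...then transform inside the store
```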
Ability to perform batch processing for data in a data store or stream analytics for real-time messages when data is in motion
OLAP applications are based on a data warehouse that stores massive amounts of data in a format readily consumable by an analytics engine. Azure SQL Data Warehouse is used for this purpose.
The Azure services used for big data natively integrate with each other and with other Azure services to build end-to-end big data and advanced analytics solutions.
Build sophisticated analytics workflows quickly
Online Analytical Processing (OLAP)
In-place or in motion data processing
Microsoft Azure
Service type: SQL Data Warehouse
Requirements: Save 50 TB of processed data in a structured format in a massively parallel processing store, with a maximum of 3000 DWU per hour for compute
Region: West Europe
Description: Tier: Compute Optimized Gen2; Compute: 3000 DWU x 48 hours; Storage: 50 TB

Service type: Data Factory
Requirements: Orchestrate, manage, and monitor data flow between the services
Region: West Europe
Description: Azure Data Factory V2, Data Pipeline service type; 500 read/write operations; 500 monitoring operations; Azure Integration Runtime: 500 activity runs, 500 data movement units, 50,000 pipeline activities, 50,000 pipeline activities (external); Self-hosted Integration Runtime: 500 activity runs, 5,000 data movement units, 50,000 pipeline activities, 50,000 pipeline activities (external)

Service type: Azure Data Lake Storage
Requirements: Load 60 TB of data from different sources into a staging data area in Azure Data Lake Storage
Region: West Europe
Description: Pay-as-you-go: 60 TB storage, 5,000 read transactions, 1,000 write transactions

Service type: Data Lake Analytics
Requirements: Run up to 20 analytics units periodically for up to 24 hours to process the data and transform it into analytics-ready information
Region: West Europe
Description: Pay-as-you-go: 20 analytics units, 24 hours
Questions?
Logicom: Get in Touch