Analytics as a Service in SL

Enabling Analytics as a Service (AaaS): The Key Analytical Platforms and Workloads on IBM SoftLayer Cloud

Abstract

In the recent past, two dominant trends have gripped the IT industry. The first is accelerated IT infrastructure optimization, driven primarily by proven and promising cloud technologies. The second is the sheer volume of data being generated, collected, and analyzed in order to extract actionable insights in time, to enable correct and timely decision-making by business executives, and to make knowledge workers more efficient in their tasks. Established product vendors and academic researchers across the world are collaboratively conceiving and building a range of service assemblage and delivery platforms (SDPs); data virtualization, ingestion, analytics and visualization platforms; and application enablement platforms (AEPs), in order to speed up and simplify knowledge extraction and engineering from a variety of data heaps via real-time as well as batch processing. In support of knowledge discovery and dissemination, data management professionals are unearthing and sustaining design and architectural patterns, highly synchronized processes, evaluation metrics, key guidelines, and best practices. We have performed a variety of proofs of concept and pilots in the fast-growing big and fast data analytics domains and, based on the experience gained, have produced a repository of reusable assets. In this paper, we illustrate how data analytics is being exposed and delivered as a service via IBM SoftLayer Cloud for worldwide users in an affordable, amenable and accelerated fashion.

Introduction

Several disruptive developments are unfolding in parallel in the IT field. The device ecosystem is growing at an unprecedented rate towards billions of connected devices; the numbers of implantables, wearables, portables, and cyber-physical systems (CPS) are rising fast; business-critical operational, transactional and analytical systems are becoming pervasive; social sites are being embraced with greater alacrity by people across the world; digitization is being pursued more vigorously than ever before, resulting in trillions of digitized entities, smart objects and sentient materials; and scores of powerful scientific and technical experiments are being carried out.

Traditionally, business data has been the main source for analytics aimed at extracting business insights. Today the data size is massive; data scope, speed, and structure vary sharply; and the resulting value of data for any individual, innovator or institution will be decisive, provided that all the kinds of data being collected are analyzed intelligently. Given the strategic significance of data-driven insights, two major disciplines, big data analytics and fast data analytics, have become subjects of deeper research and study. Enabling technologies, platforms, and tools are available in plenty from product vendors worldwide for accelerating big and fast data analytics in a simplified and streamlined fashion. More precisely, there is an emphasis on crafting and composing insight-filled knowledge services that offer enhanced care, choice, comfort and convenience for people. In the ensuing sections, we throw some light on the two principal technologies enabling the smooth realization of analytics as a service (AaaS). We describe the analytics platforms and workloads that were modernized, migrated and deployed in IBM SoftLayer Cloud towards the ultimate aim of delivering analytics as a service.

The World of Big and Fast Data Analytics

Big data analytics is now moving beyond the realm of intellectual curiosity to make tangible and trendsetting impacts on business operations, offerings and outlooks. It is no longer hype or a buzzword and is set to become a core tenet for every sort of business enterprise that wants to stay relevant and accountable to its stakeholders and end-users. Big data analytics is a generic and horizontally applicable idea that can be leveraged across all kinds of business domains, and hence it is poised to become a trendsetter for businesses worldwide. Real-time analytics is the pressing requirement today, and everyone is working on fulfilling this critical need. The emerging use cases include using real-time data such as sensor data to detect abnormalities in plant and machinery, and batch processing of sensor data collected over a period to conduct root-cause and failure analysis of plant and machinery.

Public Clouds for Big and Real-time Data Analytics - Most traditional data warehousing and business intelligence (BI) projects to date have involved collecting, cleansing and analyzing data extracted from on-premises business-critical systems. This age-old practice is about to change. For the foreseeable future, however, it is unlikely that many organizations will move their mission-critical systems or data (customer, confidential and corporate) to public cloud environments for analysis. Businesses are steadily adopting the cloud for operational and transactional purposes. Packaged and cloud-native applications are the ones primarily found fit for clouds, and they are doing exceedingly well in their new residences. The biggest potential for cloud computing is the affordable and adept processing of data that already exists in cloud centers. All sorts of functional web sites, applications and services are bound to be cloud-based sooner rather than later. Clouds are fast being strengthened in their positioning as the converged, heavily optimized and automated, dedicated and shared, virtualized and software-defined environment for IT infrastructures (servers, storage and networking), business infrastructure, and management software solutions and applications.
Therefore physical assets of every kind are being seamlessly integrated with cloud-based services in order to behave smartly. That is, ground-level sensors and actuators are increasingly tied to cloud-based software that makes their operations and outputs distinctive. All these developments indicate that future data analytics will flourish in clouds. Public clouds these days natively provide all kinds of big data analytics platforms and tools on their infrastructures in order to deliver data analytics at high speed and at an affordable cost. WAN optimization technologies are maturing fast and substantially reduce network latency when huge amounts of data are transmitted from one system to another among geographically distributed clouds. Federated, open, connected, and interoperable cloud schemes are fast capturing attention, and hence we can expect the concept of the inter-cloud to be realized soon through open, industry-strength standards and deeper automation. With the continued adoption of new capabilities such as software-defined compute, storage and networking, cloud-based data analytics is set to grow immensely. In short, clouds are being positioned as the core, central and cognitive environment for all kinds of complex tasks.

Hybrid Clouds for Specific Cases - It is anticipated that in the years ahead the value of hybrid clouds will climb sharply, since a mixed and multi-site IT environment is more appropriate for most of the emerging scenarios. For the analytics space, a viable hybrid cloud use case is to filter out sensitive information from data sets shortly after capture and then leverage the public cloud to perform any complex analytics on them.
For example, when terabytes of medical data are analyzed to identify reliable healthcare patterns that predict susceptibility to a particular disease, the identity details of patients are not relevant. In this case, a simple filter can strip out names, addresses, social security numbers, etc. before the anonymized set is pushed to secure cloud data storage.

All kinds of software systems are steadily being modernized and moved to cloud environments, especially public clouds, to be subscribed to and used as a service over the public web. The other noteworthy factor is that a variety of social sites aimed at capturing and captivating different segments of people across the world are emerging and joining mainstream computing. We therefore hear about, read about and even use social media, networking, and computing. One statistic says that the widely used Facebook generates at least 8 terabytes of data every day. Similarly, other social sites produce large amounts of personal, social and professional data, apart from musings, blogs, opinions, feedback, reviews, multimedia files, comments, compliments, complaints, advertisements, and other articulations. These poly-structured data play a bigger role in shaping the data analytics domain. Other valuable trends include the movement of enterprise-class operational, transactional, commercial, and analytics systems to public clouds. A well-known example is www.salesforce.com, the founding public cloud providing CRM as a service. Thus much enterprise data now originates in public clouds. With public clouds projected to grow fast, cloud data is being presented as another viable opportunity for cloud-based data analytics.
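The hybrid-cloud filtering step described above for medical data (removing names, addresses and social security numbers before upload) can be sketched as follows. This is a minimal illustration only: the field names, the `[REDACTED]` marker and the record layout are assumptions made for the example, not details from our PoC, and dropping direct identifiers alone does not guarantee anonymity when other fields can be combined to re-identify a person.

```python
import re
from typing import Iterable, Iterator

# Hypothetical field names; a real data set would define its own schema.
IDENTIFYING_FIELDS = {"name", "address", "ssn"}

# US social security numbers in the common NNN-NN-NNNN form.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def anonymize(record: dict) -> dict:
    """Return a copy of the record without directly identifying fields."""
    cleaned = {}
    for key, value in record.items():
        if key in IDENTIFYING_FIELDS:
            continue  # drop the whole field
        if isinstance(value, str):
            # Mask SSN-like strings that leaked into free-text fields.
            value = SSN_PATTERN.sub("[REDACTED]", value)
        cleaned[key] = value
    return cleaned


def anonymized_stream(records: Iterable[dict]) -> Iterator[dict]:
    """Filter records one at a time, shortly after capture."""
    for record in records:
        yield anonymize(record)


if __name__ == "__main__":
    captured = [
        {"name": "Jane Doe", "address": "1 Main St", "ssn": "123-45-6789",
         "age": 54, "diagnosis": "type 2 diabetes",
         "notes": "SSN 123-45-6789 confirmed at intake."},
    ]
    for row in anonymized_stream(captured):
        print(row)  # only rows like this one would leave the private site
```

Running the filter on the private side keeps the identifying fields on premises, while the public cloud receives only the reduced records.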

The Contemporary Analytics in Hybrid Clouds

Apart from traditional business analytics, the above-mentioned trends call for newer kinds of analytics that leverage big and real-time data. There are domain-specific and domain-agnostic analytics categories. For example, the case for predictive and prescriptive analytics, and for operational, security and performance analytics, is increasingly being made as different and distributed data sources emerge. Every industry vertical has its own big data analytics. With differing data velocities, real-time / streaming analytics is bound to become mandatory. A few vital parameters determine the appropriateness of cloud environments for powerful data analytics:

The Data Volume and Velocity

The Impacts on Compute, Storage and Network Resources

The Sensitivity of data and Regulatory /Compliance Requirements

The Scope of Analytics

The Types of Environments

Why the Next-Generation Data Analytics Applications and Platforms in Cloud Environments?

Cloud-based data analytics has been picking up fast as organizations seek to reap the originally envisaged benefits of the cloud paradigm. Here is a list of key benefits that accrue from moving to the cloud.

Agility & Affordability - No capital investment in large-scale IT infrastructure. Just use and pay

Big & Fast Data Platforms - Deploying and using any kind of big data platform (generic or specific, open or commercial-grade, etc.) for analytics is quick and easy

End-to-end Hadoop Platforms – Data virtualization, ingestion, processing, mining, analytics, and information visualization tasks are being performed by these platforms

Data Management Systems – Parallel, Clustered, Distributed SQL databases, NoSQL and NewSQL databases are made available in Clouds

Data Warehouse Systems – Data warehouse as a service (DWaaS) capabilities have recently been realized

Social Sites, mobile application stores, etc. – The popular social media and network applications are being run on public clouds

WAN Optimization Technologies - There are WAN optimization products and platforms for efficiently transmitting data over the Internet infrastructure

Business Applications in Clouds - Along with enterprise information systems (EISs), business-critical packaged applications such as ERP, CMS, SCM, KM, etc. are also getting deployed in clouds

Cloud Integrators, Brokers & Orchestrators – There are products and platforms for seamless interoperability among different and distributed systems, services and data

Operational, Transactional and Analytical Systems are modernized, migrated and hosted in Clouds

Device / Sensor / Machine Integration with Cloud-native as well as Cloud-enabled Applications, Services and Data

Cloud-based Analytical Platforms

We have performed a number of proofs of concept (PoCs) in order to gain a deeper understanding of cloud-based big and fast data analytics. The following sections describe the various platforms, databases, and tools that have been made to run in IBM SoftLayer Cloud to simplify and streamline the provision of analytics as a service to clients and customers worldwide.

Big Data Analytics Platforms in IBM SoftLayer Cloud

Increasingly, individuals, innovators and institutions are taking advantage of the agility and cost efficiencies that cloud infrastructures provide. Several other advantages are associated with the cloudification of enterprise IT infrastructures. Hadoop is the predominant framework for big data analytics. The maturity and stability of Hadoop-compliant data analytics platforms are pushing companies towards big data analytics. As noted earlier, the cloud infrastructure is being positioned as the most appropriate one for big data analytics. There are also several open source as well as commercial-grade implementations of the Hadoop specifications in the market, such as Cloudera, Hortonworks, and MapR. IBM InfoSphere BigInsights is a favored and full-fledged commercial implementation with Apache Hadoop as the base.

Designed specifically for mission-critical environments, Cloudera Enterprise includes CDH (Cloudera's Distribution Including Apache Hadoop), a widely used open source Hadoop-based platform, as well as advanced system management and data management tools. Cloudera Enterprise includes Cloudera Manager to help you easily deploy, manage, monitor, and diagnose issues with your cluster. Cloudera Manager is critical for operating clusters at scale. Cloud environments are becoming increasingly popular for critical Apache Hadoop workloads, given their flexibility and elasticity. With Cloudera Director, you can unlock the full potential of Hadoop in the cloud, without compromise. The CDH reference architecture is given below.

SoftLayer Cloud not only provides potentially unlimited resources for your high-performance computing cluster, but also makes the cluster easy to manage with Cloudera Managed Hadoop. Similarly, we have deployed the Hortonworks and MapR Hadoop platforms in SoftLayer Cloud. A typical cloud-based solution comprises storage, processing and management components deployed on SoftLayer Cloud, an extensible, efficient, and elastic environment for processing your data. The other benefits include extreme flexibility, high performance, agility, and pay-per-use pricing that eliminates upfront costs. IBM InfoSphere BigInsights is also available on SoftLayer Cloud, which brings the following benefits.

Accelerates and simplifies cluster deployment – Take advantage of big data analytics without the need for an on-premise infrastructure.

Scales as your business demands – Keep infrastructure costs in line with the changing needs of the business.

Provides advanced tools to reduce time to value – Gain value from Big SQL, BigSheets, text analytics and more.

Optimizes performance and enhances security – Experience speed and reliability with a dedicated bare-metal infrastructure.

Offers expertise and best practices – Benefit from a dedicated cloud operations team that deploys clusters based on best practices.

Thus Hadoop-based platforms are being steadily taken to cloud environments in order to deliver big data analytics with nimbleness and suppleness.

Real-time Analytics Platforms in IBM SoftLayer Cloud

Real-time analytics on fast and streaming data, like big data analytics, can be comfortably accomplished in cloud environments. In this section, we explain how a couple of platforms were methodically modernized and migrated to an IBM SoftLayer cloud center in order to understand the concerns, challenges and changes associated with cloud-based real-time analytics.

Delivering Real-time Applications via SoftLayer Cloud-based VoltDB - With the data being generated and captured growing to unprecedented volumes, traditional data analytics platforms and infrastructures are bound to face a variety of constraints. That means we need robust and resilient algorithms and IT solutions for big and fast data. Several product vendors, having recognized the brewing challenges, are proactively bringing forth a range of big data analytics systems that facilitate the methodical transition of captured and consolidated data to information and then to knowledge.

Data virtualization, databases, warehouses, data marts and cubes, and business intelligence (BI) and visualization solutions are critical to knowledge extraction and engineering, and to realizing the growing family of smarter systems and services behind the smarter planet vision. VoltDB is a high-performance and scalable relational database management system (RDBMS) for big data, high-velocity OLTP and real-time analytics. VoltDB, classed as a NewSQL database, is a very fast database designed to run on modern scale-out computing infrastructures. Unlike legacy RDBMS products and NoSQL data stores, VoltDB enables high-velocity applications without requiring complex and costly sharding layers or compromising transactional data integrity (ACID) to gain performance and scale. VoltDB provides:

Database throughput reaching millions of operations per second

On-demand scaling

High availability, fault tolerance and database durability

Real-time data analytics

VoltDB is deployed in SoftLayer Cloud in order to showcase its real-time and real-world capabilities of producing actionable insights.

Apache Storm on IBM SoftLayer Cloud for Real-time Analytics

Not only data size and structure but also data speed matters a great deal these days. Specific use cases that demand fast data analytics are emerging across industry verticals. Data are being massaged, encapsulated and delivered as messages. Data and event messages are emerging as the formalized building block to be received, opened, parsed, and used for a variety of deeper and decisive analyses. There are data streams (multimedia) and events from newer data sources such as sensors, machines, operational systems, platforms, etc., and they need to be systematically captured and analyzed immediately in order to extract both tactically and strategically sound insights that empower decision-makers, and even systems, to decide on the next course of action with confidence and clarity. While clouds are being positioned as the core and optimized IT infrastructure, there are several open source as well as commercial-grade platforms for accomplishing and automating real-time and streaming analytics and its associated tasks.

Apache Storm is one such real-time analytics platform: a free and open source distributed real-time computation system. Storm makes it easy to reliably process unbounded streams of data, doing for real-time processing what Hadoop did for batch processing. Storm is simple and can be used with any programming language. Storm has many use cases: real-time analytics, online machine learning, continuous computation, distributed RPC, ETL, and more. Storm is fast: a benchmark clocked it at over a million tuples processed per second per node. It is scalable, fault-tolerant, guarantees your data will be processed, and is easy to set up and operate. Storm integrates with the queuing and database technologies you already use. A Storm topology consumes streams of data and processes those streams in arbitrarily complex ways, repartitioning the streams between each stage of the computation however needed. We have deployed an instance of Apache Storm in IBM SoftLayer Cloud and chosen a small use case in order to understand and describe how cloud-based Storm functions and delivers on its originally envisaged goals.
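The topology idea described above (a source of tuples, processing stages, and repartitioning of the stream between stages) can be illustrated with a small single-process sketch. This is plain Python and not the Storm API: the function names, the sample sentences and the hash-based routing are assumptions chosen for the illustration, and the word count stands in for whatever use case a real topology would serve.

```python
import zlib
from collections import Counter
from typing import Iterable, Iterator, List


def sentence_spout() -> Iterator[str]:
    """Source of the stream; a real source would read from a queue."""
    yield from ["storm processes streams", "streams of tuples", "storm is fast"]


def split_bolt(sentences: Iterable[str]) -> Iterator[str]:
    """First stage: emit one tuple (here, a word) per token."""
    for sentence in sentences:
        yield from sentence.split()


def fields_grouping(word: str, num_tasks: int) -> int:
    """Route equal words to the same task, using a stable hash."""
    return zlib.crc32(word.encode("utf-8")) % num_tasks


def run_topology(num_count_tasks: int = 2) -> List[Counter]:
    """Wire source -> split stage -> repartition -> counting tasks."""
    counters = [Counter() for _ in range(num_count_tasks)]
    for word in split_bolt(sentence_spout()):
        task = fields_grouping(word, num_count_tasks)
        counters[task][word] += 1
    return counters


if __name__ == "__main__":
    for task_id, counts in enumerate(run_topology()):
        print(task_id, dict(counts))
```

Because every occurrence of a word is routed to the same counting task, each task can keep its counts locally. That is the reason for repartitioning by key between stages; in a distributed system the tasks would run on different nodes.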

High-Performance Big Data Analytics in SoftLayer Cloud

High performance is being demanded everywhere these days. Valid concerns have been expressed in different quarters that cloud environments do not guarantee high performance. Therefore hosting high-performing platforms on clouds is being touted as one viable mechanism for ensuring the high performance of cloud-hosted services and workloads.

Big data analytics (BDA) is emerging as a data-intensive activity that mandates high-end IT infrastructures and integrated platforms to simplify and streamline the tasks typically associated with data analytics. There are several viable options these days, ranging from mainframes, clusters, grids and appliances to supercomputers, for accelerating and accomplishing data analytics efficiently. Hadoop platforms are the most sought-after for enabling cost-effective analysis of multi-structured data mountains. In short, high-performance computing (HPC) is the most appropriate computing model for approaching the infrastructural challenges posed by BDA. In this paper, we have described how the Netezza software solution can be systematically moved to IBM SoftLayer Cloud, a leading public cloud offering, configured there, and used for accomplishing next-generation real-time analytics with a low total cost of ownership (TCO) and a high return on investment (RoI). In our PoC-induced asset document, we have given the relevant details of a sample application in order to demonstrate the power of cloud-based Netezza in fulfilling the various requirements of high-performance data analytics.

Streaming Analytics in IBM SoftLayer Cloud

Stream computing continuously integrates and analyzes data in motion to deliver real-time analytics. It further enables organizations to detect insights (risks and opportunities) in high-velocity data that can only be detected and acted on at a moment's notice. High-velocity flows of data from real-time sources such as market data, machines, smartphones, sensors and actuators, clickstreams, and even transactions remain largely un-navigated. IBM Cloud Analytics Application Services delivers high-performance clusters for running enterprise-grade big data and analytics workloads on a dedicated bare-metal infrastructure pre-installed with industry-leading big data software. It offers real-time analytic processing: store less, analyze more, and make better decisions faster. IBM InfoSphere Streams is the supported software for this cloud analytics service. IBM InfoSphere Streams is an advanced analytic platform that allows user-developed applications to quickly ingest, analyze and correlate information as it arrives from thousands of real-time sources. The solution can handle very high data throughput rates, up to millions of events or messages per second.
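The "store less, analyze more" idea can be made concrete with a small sketch that analyzes readings as they arrive and keeps only a few running numbers instead of the raw stream. The sketch below uses Welford's online algorithm for mean and variance; it is a generic illustration and not how InfoSphere Streams is implemented, and the threshold, warm-up length and sample readings are assumptions chosen for the example.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class RunningStats:
    """Welford's online algorithm: mean and variance without storing the stream."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the current mean

    def update(self, value: float) -> None:
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)

    @property
    def stddev(self) -> float:
        """Sample standard deviation of the values seen so far."""
        return math.sqrt(self.m2 / (self.count - 1)) if self.count > 1 else 0.0


def detect_outliers(stream: Iterable[float], threshold: float = 3.0,
                    warmup: int = 5) -> Iterator[float]:
    """Yield readings far from the running mean, judged before they are absorbed."""
    stats = RunningStats()
    for value in stream:
        if stats.count >= warmup and stats.stddev > 0:
            if abs(value - stats.mean) > threshold * stats.stddev:
                yield value
        stats.update(value)


if __name__ == "__main__":
    readings = [10.0, 10.1, 9.9, 10.2, 9.8, 10.0, 25.0, 10.1]
    print(list(detect_outliers(readings)))
```

Each reading is inspected once and then discarded; only `count`, `mean` and `m2` persist. In this simple version a flagged reading is still folded into the statistics, which a production system might choose not to do.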

Many organizations need to process large amounts of data in real time for real-time analytics, for real-time ETL, or to respond to events instantaneously. On-the-fly analysis of big data streams is emerging as a distinct need in many industry verticals. We have deployed DataTorrent in IBM SoftLayer Cloud and verified how it delivers on its promises for big data streaming analytics. DataTorrent is an enterprise-grade software platform that enables businesses to perform any sort of data processing or transformation on structured or unstructured data, all in real time as the data is streamed into a data center. Leveraging Hadoop 2.0, DataTorrent is a YARN-native application platform. It can be installed directly onto an existing Hadoop cluster, connect directly to all incoming live data sources, and perform any type of processing or transformation of your data in memory as it streams in. DataTorrent handles all of the scaling and fault tolerance of the system, leaving enterprises to focus on just their business logic.

DataTorrent supports today's most demanding, mission-critical, big-data streaming applications. It enables you to quickly develop applications that ingest massive amounts of data from various sources in real time and perform highly scalable computations in real time. With DataTorrent, you can leverage your existing Hadoop environment for real-time stream processing. We employed a sample application in order to show readers how cloud-based real-time analytics applications can be implemented in a streamlined manner.

End-to-end Big Data Analytics Platform in IBM SoftLayer Cloud

In general, Hadoop platforms perform pre-processing, processing and analytics for knowledge discovery. An end-to-end big data analytics platform, however, involves data collection, virtualization, ingestion, analytics and visualization modules, so that the whole flow can be accomplished quickly and securely, ideally with a single click. Datameer is one such platform.

Datameer is an end-to-end big data analytics platform purpose-built for Hadoop that enables the fastest time from raw data to new insights. Its mission is to eliminate the complexity of the tasks associated with big data analytics and empower everyone to make data-driven decisions in minutes, not months. There is no need for a data scientist or multiple technical tools to model, integrate, cleanse, prepare, analyze and visualize your data. Datameer is a one-stop shop for getting all your data into Hadoop, analyzing that data, discovering knowledge, and visualizing the resulting insights in a preferred form and format. Datameer can handle all kinds of data from multiple sources, as illustrated in the picture below. Datameer has been successfully installed in the IBM SoftLayer Cloud environment and tested with a sample application in order to demonstrate its capability.

HBase, a NoSQL Database in IBM SoftLayer Cloud

HBase is a column-oriented database management system that runs on top of the Hadoop Distributed File System (HDFS). HBase is a NoSQL database, is well suited for sparse data sets, and does not support a structured query language like SQL. An HBase system comprises a set of tables. Each table must have an element defined as a primary key (the row key, in HBase terminology), and all access attempts to HBase tables must use this key. An HBase column represents an attribute of an object, and many attributes can be grouped together into what are known as column families. With HBase, you must predefine the table schema and specify the column families. However, new columns can be added to families at any time, which makes the schema flexible and able to adapt to changing application requirements.
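The data model just described (access by key, column families fixed up front, columns added freely within a family) can be pictured with a small in-memory imitation. This is a teaching sketch in plain Python, not the HBase client API; the class name `ToyTable`, the table name and the sample sensor columns are hypothetical.

```python
from typing import Dict, Iterable


class ToyTable:
    """In-memory imitation of an HBase-style table (not the HBase client API)."""

    def __init__(self, name: str, column_families: Iterable[str]) -> None:
        self.name = name
        # Column families are part of the schema and fixed at creation time.
        self.column_families = set(column_families)
        # row key -> column family -> column qualifier -> value
        self._rows: Dict[str, Dict[str, Dict[str, str]]] = {}

    def put(self, row_key: str, family: str, qualifier: str, value: str) -> None:
        """Store one cell; new qualifiers need no schema change."""
        if family not in self.column_families:
            raise KeyError(f"unknown column family: {family}")
        row = self._rows.setdefault(row_key, {})
        row.setdefault(family, {})[qualifier] = value

    def get(self, row_key: str) -> Dict[str, Dict[str, str]]:
        """All access goes through the row key."""
        return self._rows.get(row_key, {})


if __name__ == "__main__":
    table = ToyTable("sensors", ["reading", "meta"])
    table.put("sensor-001", "reading", "temperature", "21.5")
    table.put("sensor-001", "reading", "humidity", "40")  # new column, same family
    print(table.get("sensor-001"))
```

Rows that never receive a given column simply store nothing for it, which is why this layout suits sparse data sets.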

HBase is part of every standard Hadoop distribution and was installed in IBM SoftLayer Cloud. There are certain usage scenarios in which big data analytics (BDA) is well accomplished with the help of a cloud-based HBase database. We developed a small application to test how productive HBase is in remote clouds.

There are several other competent and high-end NoSQL databases in the marketplace. Cassandra (originally developed at Facebook), Google BigTable, etc. are some of the highly popular database management systems moving into cloud environments in order to tackle the data explosion as well as data variety, viscosity, and variability.

The Apache Cassandra database is the correct choice when you need scalability and high availability without compromising performance. Linear scalability and proven fault-tolerance on commodity hardware or cloud infrastructure make it the perfect platform for mission-critical data. Cassandra's support for replicating across multiple datacenters is best-in-class, providing lower latency for your users and the peace of mind of knowing that you can survive regional outages.

Cassandra's data model offers the convenience of column indexes with the performance of log-structured updates, strong support for denormalization and materialized views, and powerful built-in caching. Cassandra has also been deployed in IBM SoftLayer Cloud. Basho Riak is another NoSQL database made available in the SoftLayer cloud. Similarly, other renowned databases such as MongoDB are being taken to the cloud to benefit from its infrastructural innovations.

ScaleBase Distributed Database Management System

ScaleBase brings elasticity, scalability and continuous high availability to MySQL databases and applications in public, private and hybrid cloud environments. ScaleBase enables instant and transparent MySQL scale-out, leveraging the power of smaller, less expensive servers working together. Policy-based data distribution (automated sharding), powered by the ScaleBase Analysis Genie, together with intelligent load balancing and replication-aware read/write splitting, allows the operational load and throughput to grow, increases application performance, and protects against varying usage peaks and load spikes.
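The two ideas named above, sharding and replication-aware read/write splitting, can be sketched generically in Python. This is not ScaleBase's implementation or API: the `ShardRouter` class, the CRC32-based key hashing, the "starts with SELECT" test for reads, and the server names are all assumptions made for illustration.

```python
import itertools
import zlib


class ShardRouter:
    """Toy router: picks a shard by key, then a primary or a replica."""

    def __init__(self, shards):
        # shards: list of {"primary": str, "replicas": [str, ...]}
        self.shards = shards
        # Round-robin over replicas; fall back to the primary if none exist.
        self._replica_cycles = [
            itertools.cycle(s["replicas"] or [s["primary"]]) for s in shards
        ]

    def shard_for(self, key):
        # A stable hash keeps a given key on the same shard across runs.
        return zlib.crc32(str(key).encode("utf-8")) % len(self.shards)

    def route(self, key, statement):
        index = self.shard_for(key)
        is_read = statement.lstrip().upper().startswith("SELECT")
        if is_read:
            return next(self._replica_cycles[index])  # reads go to replicas
        return self.shards[index]["primary"]          # writes go to the primary


router = ShardRouter([
    {"primary": "db0-primary", "replicas": ["db0-replica-a", "db0-replica-b"]},
    {"primary": "db1-primary", "replicas": ["db1-replica-a"]},
])
write_target = router.route(42, "INSERT INTO orders VALUES (42, 'book')")
read_target = router.route(42, "SELECT * FROM orders WHERE customer_id = 42")
```

Both statements for customer 42 land on the same shard, but the write is sent to that shard's primary while the read is served by one of its replicas, which is how read traffic can grow without loading the primary.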

ScaleBase automated failover and failback ensure business continuity and protection from both unexpected and expected outages, and simplify ongoing maintenance tasks, such as software and hardware upgrades, without impacting application or database availability. The ability to migrate an application from a hosted environment with a single growing database to a virtualized environment with smaller, more manageable data nodes gives companies agility, flexibility and competitiveness. ScaleBase was purpose-built for cloud deployment. It can be run on private clouds and is available on public clouds. We completed the initial preparations to migrate the ScaleBase solution to the IBM SoftLayer public cloud, made the necessary configuration changes, and ran a small sample application to check how ScaleBase functions in an online, off-premise and on-demand cloud environment. This forms a major part of our strategy of empowering public cloud offerings to be high-performing and elastic for data- and process-intensive applications.

Aerospike In-Memory NoSQL Database in IBM SoftLayer Cloud

Versatile in-memory computing, NoSQL and NewSQL databases, parallel file systems, etc. are the prominent IT solutions being enabled to run in elastic clouds to fulfil the varying needs of the big data world. Aerospike is an open-source distributed NoSQL database optimized for in-memory and SSD-based indexing and data storage. It is a modern database built from the ground up to push the limits of flash storage, processors and networks. It was designed to operate with predictable low latency at high throughput and with uncompromising reliability, offering both high availability and ACID guarantees. It substantially simplifies developers' work, as there is no need to incorporate logic for sharding or for cluster changes, and it addresses the perpetual need to avoid data loss and downtime. Aerospike is ideal for real-time big data or context-driven applications that must sense and respond right now. Aerospike operates at in-memory speed and global scale with enterprise-grade reliability. Identical Aerospike servers scale out to form a shared-nothing cluster, which transparently partitions data and parallelizes processing across nodes. Because the nodes in the cluster are identical, you can start with two and simply add more hardware. The cluster scales linearly.
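The idea of transparently partitioning data across identical nodes can be illustrated with a short Python sketch. It is a simplified model and not Aerospike's actual algorithm: the partition count, the SHA-1 hash, the round-robin partition-to-node assignment, and the node names are all assumptions made for illustration.

```python
import hashlib
from collections import Counter

PARTITIONS = 4096  # assumed fixed partition count for this illustration


def partition_of(key):
    # Hash the key to one of a fixed number of partitions.
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % PARTITIONS


def owner_of(key, nodes):
    # Partitions are dealt out round-robin, so identical nodes share the load.
    return nodes[partition_of(key) % len(nodes)]


def partitions_per_node(nodes):
    # Count how many partitions each node owns under this assignment.
    return Counter(nodes[p % len(nodes)] for p in range(PARTITIONS))


two_nodes = partitions_per_node(["node-1", "node-2"])
four_nodes = partitions_per_node(["node-1", "node-2", "node-3", "node-4"])
```

With two nodes each one owns half of the partitions, and with four nodes each owns a quarter, so the application never has to know where a key lives. A production system would also try to minimize how many partitions move when a node joins, which this simple modulo assignment does not attempt.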

We have migrated an instance of the Aerospike database to the IBM SoftLayer cloud environment and configured it to deliver on its promises. We have worked on a sample application in order to gain a deeper understanding of how well Aerospike meets the goals of new-generation data-intensive workloads.

NewSQL Databases in IBM SoftLayer Cloud

Essentially, NewSQL combines the best features from both worlds – maintaining the transactional integrity of traditional database systems while providing high-end scalable performance of NoSQL systems. This combination of performance and scale is crucial in transaction-intensive environments. NoSQL-based data systems are riding a seismic wave of success with the promise of scalability. NewSQL databases seek to overtake NoSQL with the added bonus of high-speed transactional integrity.

VoltDB is a NewSQL database that has been successfully deployed in IBM SoftLayer Cloud and subjected to a variety of small-scale tests in order to verify whether it delivers its stated capabilities. Other popular NewSQL databases such as Clustrix, NuoDB, etc. are quickly gaining market share and mind share. These are conveniently hosted and delivered as a service via cloud environments.

Database as a Service (DBaaS)

Today’s applications are expected to manage a variety of structured and unstructured data, accessed by massive networks of users, devices, and business locations, or even sensors, vehicles and Internet-enabled goods. Companies of all sizes, from startups to mega-users like Samsung, Hothead Games, and Fidelity Investments, use Cloudant to manage data for large or fast-growing web and mobile applications in ecommerce, online education, gaming, financial services, and other industries.

Cloudant is best suited for applications that need a database to handle a massively concurrent mix of low-latency reads and writes. Its data replication & synchronization technology also enables continuous data availability, as well as off-line application usage for mobile or remote users. In a large organization, it can take several weeks for a DBMS instance to be provisioned for a new development project, which limits innovation and agility. DBaaS enables instant provisioning of your data layer, so that you can begin new development whenever you need.

Unlike Do-It-Yourself (DIY) databases, DBaaS solutions like Cloudant provide—and guarantee—a specific level of data layer performance and up time. This eliminates risk of service delivery failure for you and your project. The Cloudant database as a service (DBaaS) is the first data management platform to leverage the availability, elasticity, and reach of the cloud to create a global data delivery network (DDN) that enables applications to scale larger and remain available to users wherever they are.

Data Warehouse as a Service (DWaaS)

IBM dashDB is a fully managed data warehousing service in the cloud. It is a powerful, agile data warehousing solution that puts an analytics powerhouse at your fingertips. IBM dashDB allows you to break free from the bonds of infrastructure when your business demands it. It can help extend your existing infrastructure into the cloud, or help you start new self-service data warehousing capabilities. It is powered by high-performance in-memory and in-database technology that delivers answers as fast as you can think. IBM dashDB provides the simplicity of an appliance with the elasticity and agility of the cloud for organizations of any size. IBM dashDB is designed to meet your expectations of enterprise security. You can gain instant access to critical business insights without a hefty upfront infrastructure investment. You can simply load, analyze, and visualize your data in minutes. Thus the prospects for providing data warehouse as a service are brightening.

IBM Watson Analytics in SoftLayer Cloud

As most of us know, Watson Analytics is a natural language-based cognitive service that can provide instant access to predictive and visual analytic tools for businesses. It is designed to make advanced and predictive analytics easy for anyone to acquire and use. Watson Analytics offers self-service analytics, including access to easy-to-use data refinement and data warehousing services that make it easier for business users to acquire and prepare data, beyond simple spreadsheets, for analysis and visualization. IBM Watson Analytics automates steps like data preparation, predictive analysis, and visual storytelling for business professionals across data-intensive disciplines like marketing, sales, operations, finance and human resources. SoftLayer is integrating the latest IBM Power Systems into its cloud infrastructure in order to fulfil the infrastructural needs of cost-effective high-performance computing. The IBM Watson system is intended to run efficiently on IBM Power Systems, and hence Watson Analytics as a service via the SoftLayer cloud for worldwide users is expected to become a reality soon.

Containerized Analytics as a Service in IBM SoftLayer Cloud

The concept of containerization for packaging and sandboxing mission-critical applications is catching the attention of developers as well as system administrators. Bundling a software module along with its binaries, libraries, configuration details and other dependencies into a single package is one way to achieve faster and error-free deployment and delivery of software workloads. This pragmatic idea has spread further, and these days all kinds of mobile, cloud, social, embedded, middleware, database, enterprise and IoT applications are being methodically containerized using the sandbox aspect (a subtle and smart isolation technique) to eliminate restricting dependencies on underlying operating systems. Such compact, sandboxed and contained applications are being prescribed as a sought-after and appropriate solution for meeting portability, extensibility, manoeuvrability, sustainability and security needs.

With the rapid maturing of the Docker technology, a new paradigm of “containers as a service (CaaS)” is emerging and evolving. That is, containers are being readied, hosted and delivered as a service over the public Web. All the necessary procedures to deliver application-aware containers as a service are being meticulously applied to containers to make them ready for the forthcoming service era. That is, knowledge-filled, service-oriented, cloud-based, composable, and cognitive containers are being proclaimed as one of the principal ingredients for the establishment and sustenance of the smarter planet vision. Precisely speaking, applications are containerized and exposed as services to be discovered and used by a variety of consumers for a growing set of use cases. Big and fast data analytics via Hadoop, Apache Storm, Spark, etc. are fast maturing and stabilizing. VMs are widely used for enabling Hadoop as a service. Now, with the faster adoption of containerization, the prospects are brightening for data analytics via portable, substitutable, composable, and replaceable containers, which are well known for faster provisioning, live migration, etc. In short, containers are destined for cloud environments.

The integration of Hadoop YARN with Docker will allow multiple clusters to utilize the same hardware resources. We have built YARN containers through the Dockerization steps and hosted them in IBM SoftLayer Cloud. We have carried out sample work in order to understand how containerized big data workloads and analytical platforms ensure higher efficiency, and the new offering of containerized analytics as a service via the SoftLayer cloud therefore seems imminent.

Conclusion

Data has become a strategic asset that lets any organization plan ahead precisely and proceed with utmost confidence and clarity. Data-driven enterprises are regarded as the ones destined for continued success, sagaciously overcoming all kinds of unexpected business challenges and changes. That is, any enterprise that systematically subjects the data gleaned from its different and distributed sources to a series of IT-enabled, deeper analytics processes, with the help of end-to-end platforms for extracting actionable insights, is bound to attain and retain greater success in its long and arduous journey. With the steady increase in data sources, it becomes clear that organizations must strengthen their capabilities in order to capture all the data emanating from different and distributed systems, subject it to a series of deeper and decisive investigations to extract actionable insights in time, and disseminate the extracted insights to those concerned so that they can choose the correct course of action to steer the organization on its journey. In this white paper, we have explained how IBM SoftLayer can support every step of squeezing actionable insights out of your big and real-time data. The cloud concept represents highly optimized and organized IT that enables every kind of IT capability and competency to be provided as a service to the increasingly connected world via the open, public and cheap Internet infrastructure.

Authors

Pethuru Raj & Skylab Vanga IBM Global CAMS Center of Excellence IBM India, Manyata Tech Park, Bangalore E-mails: [email protected], [email protected]