Post on 07-Aug-2018
Cloud computing has been generating considerable hype these
days. Every participant in the datacenter and IT ecosystem has
been rolling out cloud initiatives and strategies, from hardware
vendors, ISVs, and SaaS providers to Web 2.0 companies; start-ups
and incumbents are equally active.
Cloud computing promises to transform IT infrastructure and
deliver scalability, flexibility, and efficiency, as well as new
services and applications that were previously unthinkable.
Despite all of this activity, cloud computing remains as
amorphous today as its name suggests. However, one critical
trend shines through the cloud: Big Data. Indeed, it's the core
driver in cloud computing and will define the future of IT.
BIG DATA: THE PERFECT STORM
Cloud computing has been driven fundamentally by the need to
process an exploding quantity of data. Data is no longer measured
in gigabytes but in exabytes as we are "Approaching the
Zettabyte Era" (see footnote 1). Moreover, data types
(structured, semi-structured, and unstructured) continue to
proliferate at an alarming rate as more information is digitized,
from family pictures to historical documents to genome mapping to
financial transactions to utility metering. The list is truly
unbounded. But today, data is not only being generated by users
and applications. It is increasingly machine-generated, and such
data is growing exponentially and leading the charge in the Big
Data world. In a recent article, The Economist called this
phenomenon the "Data Deluge"
(http://www.economist.com/opinion/displaystory.cfm?story_id=15579717).
One can argue that Web 2.0 companies have been pushing the
upper bounds of large-scale data processing more than anyone.
That being said, this data explosion is not sparing any vertical
industry: financial, health care, biotech, advertising, energy,
telecom, etc. All are grappling with this perfect storm. Below are
just a few stats:
- Two years ago, Google was processing more than 400PB of data/month in just one application
- The New York Times is processing an 11-million-story archive dating back to 1851
- eBay processes more than 50TB/day in its data warehouse
- CERN is processing 2GB/second for its most recent particle accelerator
- Facebook crunches 15TB/day into a 2.5PB data warehouse
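To compare these figures on one scale, the rates quoted above can be converted to TB/day. The following is only a rough sketch: it assumes decimal (SI) units and a 30-day month, neither of which the sources specify, and the helper name `tb_per_day` is purely illustrative.

```python
# Decimal (SI) byte units; an assumption, since the quoted stats do not
# say whether decimal or binary units are meant.
GB = 10**9
TB = 10**12
PB = 10**15

SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_MONTH = 30  # assumption: month length is not given in the text


def tb_per_day(total_bytes, period_days):
    """Average daily volume in TB for total_bytes spread over period_days."""
    return total_bytes / period_days / TB


daily_rates = {
    "Google": tb_per_day(400 * PB, DAYS_PER_MONTH),
    "eBay": tb_per_day(50 * TB, 1),
    "CERN": tb_per_day(2 * GB * SECONDS_PER_DAY, 1),
    "Facebook": tb_per_day(15 * TB, 1),
}

# Print the rates from largest to smallest.
for name, rate in sorted(daily_rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<10}{rate:>12,.1f} TB/day")

# Days of ingest at 15TB/day that add up to 2.5PB (ignores compression
# and retention policies, which the text does not describe).
facebook_days_to_fill = (2.5 * PB) / (15 * TB)
```

Under these assumptions, the single Google application works out to roughly 13,300 TB/day and the CERN stream to about 173 TB/day, and Facebook's 2.5PB warehouse corresponds to roughly 167 days of ingest at the quoted rate.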
Without question, data represents the competitive advantage of
any enterprise, and every organization is now encumbered with
the task of storing, managing, analyzing, and extracting value
from this exponential data growth as inexpensively as
possible.
CLOUD COMPUTING: BIG DATA IS THE FUTURE OF IT
Winter 2009 | Ping Li | ping@accel.com
Previous computing platform transitions had technology
dislocations similar to cloud computing but along different
dimensions. The shift from mainframe to client-server was
fueled by disruptive innovation in computing horsepower that
enabled distributed microprocessing environments. The
following shift to web applications/web services during the last
decade was enabled by the open networking of applications and
services through the internet buildout. While cloud computing
will leverage these prior waves of technology (computing and
networking), it will also embrace deep innovations in storage/
data management to tackle big data.
Along these lines, many of the early uses of cloud computing
have been focused less on computing and more on storage.
For example, a significant portion of the initial applications on
AWS were primarily leveraging just S3, with applications
executing behind the firewall. Popular storage applications, like
Jungle Disk and SmugMug, were early AWS customers. This
explosion of data has driven enterprises (and consumers for
that matter) to find cheap, on-demand storage in unlimited
quantities, which cloud storage promises to deliver. Until
now, massive tape archives in the middle of nowhere (like Iron
Mountain) have been the only means to achieve that cheap
storage. However, enterprises today need more; they need
quick-access data retrieval for multiple reasons, from
compliance to business analytics. It is simply no longer
sufficient to have cold data; rather, it needs to be online and
resilient (and cheap, of course); hence, the accelerating shift
towards storing every piece of data in memory or on disks
(Data Domain smartly rode this trend).
The need to balance data availability/usability and cost
effectiveness has prompted significant innovation in both
on-premise and hosted cloud storage: cloud storage systems
(Caringo, EMC Atmos, and ParaScale, to name just a few) and
flash-based storage systems (Fusion IO, Nimble Storage,
Pliant, etc.) are just some current examples. Furthermore,
hierarchical storage management (HSM, which has always
sounded great but has been implemented only rarely) will
become an important element in storage workflows.
Enterprises will require seamless capability to move data
across different tiers of storage (both on-premise and into the
cloud) based on policy and data type to minimize retrieval
costs. As cloud computing matures, true cloud applications will
be (re)written to leverage hierarchical and cloud-like storage
tiers to retrieve data dynamically from different storage layers.
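The policy-driven tiering described above can be sketched in a few lines. This is a hypothetical illustration only: the tier names, the age thresholds, and the `PINNED_TO_FLASH` rule are assumptions made for the example, not any vendor's actual HSM policy.

```python
from dataclasses import dataclass


@dataclass
class DataObject:
    name: str
    data_type: str          # e.g. "transactional", "log", "archive"
    days_since_access: int


# Hypothetical policy: data types that always stay on the fastest tier.
PINNED_TO_FLASH = {"transactional"}

# Hypothetical age thresholds (in days) for moving data to cheaper tiers.
HOT_MAX_AGE_DAYS = 7
WARM_MAX_AGE_DAYS = 90


def choose_tier(obj: DataObject) -> str:
    """Pick a storage tier from data type first, then from access recency."""
    if obj.data_type in PINNED_TO_FLASH:
        return "flash"
    if obj.days_since_access <= HOT_MAX_AGE_DAYS:
        return "flash"
    if obj.days_since_access <= WARM_MAX_AGE_DAYS:
        return "disk"
    return "cloud"


objects = [
    DataObject("orders.db", "transactional", 400),
    DataObject("clicks-today.log", "log", 0),
    DataObject("clicks-last-month.log", "log", 30),
    DataObject("2005-invoices.tar", "archive", 1500),
]

# Map each object name to the tier the policy selects for it.
placement = {obj.name: choose_tier(obj) for obj in objects}
```

A real system would also weigh retrieval latency and per-tier pricing; the sketch only shows how policy and data type together can drive placement.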
1 Source: Approaching the Zettabyte Era. Cisco, 16 June 2008.
http://www.cisco.com/en/US/solutions/collateral/ns341/ns525/ns537/ns705/ns827/white_paper_c11-481374_ns827_Networking_Solutions_White_Paper.html
A NEW CLOUD STACK
In order for cloud computing to become a mainstream approach,
a new cloud stack (like mainframe and OSI) will likely emerge.
Just like prior computing platform transitions (client/server, web
services, etc.), core platform capabilities, such as security, access
control, application management, virtualization, systems
management, provisioning, availability, etc., will be a prerequisite
before IT organizations are able to adopt the cloud completely.
Clearly, this stack will exist in a different representation than
prior platform layers to embrace a cloud environment. Simply
replicating the current computing stack but allowing it to reside
off-premise will not achieve the scale, capabilities, and
economies of cloud computing. In particular, this new cloud
framework needs the ability to process data in increasingly
greater orders of magnitude and do it at a fraction of the cost
by leveraging commodity, multi-threaded servers for storage and
computing. In many ways, this cloud stack has been implemented
already, albeit in a primitive form, at large-scale internet
datacenters.
The challenge of processing terabytes of data daily at Google,
Facebook, and Amazon drove them to adopt a new data
architecture, which is essentially Martian to traditional enterprise
datacenter architects. No longer are ACID and relational
databases back-ending transactional applications. Internet
datacenters quickly encountered the scaling limitations of SQL
databases as the volume of data exploded. Instead, high-
performance, scalable/distributed non-SQL data stores are being
developed internally and implemented at scale. Bigtable and
Cassandra are among the many variants, and this non-database
database trend has proliferated to the point of having its own
conference: NoSQL. Database caching layers (e.g., NorthScale's
Memcached) are also being implemented to further drive
application performance, and caching is now accepted as a
standard tier in datacenters.
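To make the caching tier concrete, here is a minimal sketch of the common "cache-aside" read pattern. It is an illustration under stated assumptions: a plain Python dict stands in for a Memcached cluster, `slow_database_lookup` is a hypothetical stand-in for a SQL query, and none of the names below come from any Memcached client library.

```python
class CacheAside:
    """Read-through helper: check the cache first, fall back to the database."""

    def __init__(self, load_from_db):
        self._cache = {}            # stand-in for a Memcached cluster
        self._load = load_from_db   # function that performs the slow lookup
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self._cache:
            self.hits += 1
            return self._cache[key]
        # Cache miss: pay for the database lookup once, then remember it.
        self.misses += 1
        value = self._load(key)
        self._cache[key] = value
        return value

    def invalidate(self, key):
        """Drop a key after the underlying row changes."""
        self._cache.pop(key, None)


db_calls = []


def slow_database_lookup(user_id):
    # Hypothetical database query; records each call so hits can be observed.
    db_calls.append(user_id)
    return {"id": user_id, "name": f"user-{user_id}"}


cache = CacheAside(slow_database_lookup)
first = cache.get(42)    # miss: goes to the database
second = cache.get(42)   # hit: served from the cache
```

Only the first read reaches the database; repeated reads of hot keys are served from memory, which is how a caching tier takes read load off the back-end store.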
Managing non-transactional data has become even more
daunting. From log files to click stream data to web indexing,
internet data centers are collecting massive volumes of data that
need to be processed cheaply in order to drive monetization
value. Hadoop is an open source data management framework
that has become widely deployed for massively parallel
computation and distributed file storage in a cloud environment.
Hadoop has allowed the largest web properties (Yahoo!,
LinkedIn, Facebook, etc.) to store and analyze any data in near
real-time at a fraction of the cost that traditional data
management and data warehouse approaches could even
contemplate. Although the framework has roots in internet
datacenters, Hadoop is quickly penetrating broader enterprise use
cases. The diverse set of participants at Hadoop World NYC,
hosted by Cloudera, clearly points to this trend.
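The programming model that Hadoop popularized, MapReduce, can be illustrated with a small single-process sketch. This is not the Hadoop API: the function names and the "user page" log format are assumptions for the example, and a real job would spread the map and reduce phases across many machines.

```python
from collections import defaultdict


def map_phase(line):
    """Emit (page, 1) for each click-stream record of the form 'user page'."""
    _user, page = line.split()
    yield page, 1


def shuffle(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Collapse all values for one key into a single count."""
    return key, sum(values)


def run_job(lines):
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


clicks = ["alice /home", "bob /home", "alice /cart"]
page_views = run_job(clicks)
```

Because each map call looks at one record and each reduce call looks at one key, both phases can be parallelized across commodity servers, which is where the cost advantage over a traditional data warehouse comes from.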
SECURING THE CLOUD
Given this data-intensive nature, any widely adopted cloud
computing platform will inevitably account for richer security
requirements. The security challenges will be focused less on
point network and data level security, although high-bandwidth
encryption solutions and sophisticated key management will be
needed to match the massively parallel computational cloud
environments. In this case, the primary security challenges will
stem from control. User authentication will become
increasingly challenging as applications are federated outside
the firewall because of SaaS adoption. In addition, managing
and reconciling user identities across individual user directories
for each SaaS/Cloud application will present further security
issues. Much like web applications in the '90s created an SSO
layer, cloud computing is essentially abstracting a web services
interface for infrastructure IT, and it will demand a similar
unified authentication/entitlement layer.
In addition to federated user authentication, cloud computing
will also require data authentication and security. Imperva's
database firewall is an example of an increasingly important
cloud security product. As applications reside in different
public and private clouds, it will be critical for the cloud
applications to be able to talk to each other. This will drive
the need for ensuring data authentication and policy control for
the volumes of data flowing between cloud applications.
Moreover, given the multi-tenancy paradigm of cloud
environments, policy granularity will be paramount to ensure
security and compliance. Data integration across cloud
platforms will be more of an obstacle than application
integration, as applications have become more open/standard.
Standard data APIs will emerge as part of the new cloud
stack to allow disparate environments to talk to each other and
avoid vendor lock-in. Data migration challenges are perhaps
the greatest factor today for locking users to a particular cloud
platform.
Over time, these APIs and layers will harden and will become
tailored, depending on use case and workload for particular
applications. The adoption of these new frameworks will
ultimately make cloud computing safe and broaden its
penetration into enterprises of all sizes.
WHAT'S BREWING IN A CLOUD?
Despite constant comparisons to grid and utility computing,
cloud computing has the potential to address a much broader
set of applications and use cases beyond the limited HPC
environments served traditionally by grid computing. This
breadth of cloud computing is engendered by a new set of
underlying technology forces. Virtualization technologies,
high-powered commodity servers, low-cost/high-bandwidth
connectivity, concurrent/multi-threaded programming models,
and open source software stacks are all technology building
blocks that can deliver the high performance and scalability of
grid/utility computing but, importantly, do so
with underlying commodity resources.
These technology drivers enable applications and users to be
abstracted cleanly from particular IT infrastructure resources
(computing, storage, networking, etc.) in new and powerful
ways; location independence and multi-tenancy are two critical
elements, among others. Unlike traditional HPC grid
environments, which were designed for a specific application in a
single company, cloud computing enables disparate applications
and entities to harness a shared pool of resources. In addition,
applications can be broken up in the cloud: for example, computing
resources may reside on the client while the data is accessed
portably from multiple cloud locations.
Many different definitions of cloud computing have surfaced.
Rather than posit yet another, it is more useful to note several
characteristics present in any cloud instance: (i) self-provisioning
(either by user, developer, or IT); (ii) elasticity (on-demand
allocation of any computing, storage, and networking resources);
(iii) multi-anything (multi-user, multi-application, multi-session,
etc.); and (iv) portability (applications are abstracted from physical
infrastructure and can be migrated easily). These capabilities
allow enterprises to shift IT resources from capex to opex, a
usage-based model that is particularly appealing under recent
economic constraints.
These cloud prerequisites will yield a powerful set of use cases
beyond grid computing that are unique to cloud platforms. Cloud
computing will reach its full potential in the future when a whole
new set of applications (never possible before) is created that is
purpose-built for the cloud. For example, one can envision
powerful collaboration applications emerging that enable internal
enterprise and external users to cooperate seamlessly in ways that
would have been previously impossible with users and data isolated
on disparate enterprise islands. It's likely these innovative
applications will require new programming models and
potentially languages yet to be hardened.
STILL IN THE EARLY DAYS
Despite the high energy surrounding cloud computing and early
cloud offering successes, such as Amazon Web Services, cloud
computing for enterprise services is definitely still in its
formative stages. In contrast, however, consumers have already
adopted cloud computing technologies. One could argue that web
companies like Google, Yahoo!, Facebook, and Salesforce are
examples of consumers leveraging cloud computing. These Web
2.0/SaaS offerings clearly exhibit the core cloud characteristics
outlined above, and in turn are delivering new, value-added
services previously considered unthinkable. Interestingly, this
time the consumers, via their use of Web 2.0 services, have been
teaching enterprises, typically the early adopters of technology,
the effectiveness of cloud computing.
Today, the enterprise use of cloud computing represents opposite
ends of the spectrum: (i) Web 2.0 start-ups seeking to launch
applications quickly and cheaply, and (ii) compute-intensive
enterprises that need batch processing for bursty, large-scale
applications. Although these users are driving the early adoption
of cloud technology, it's unlikely these limited use cases will
establish cloud computing as a pervasive platform. Cloud
computing instead will need to penetrate mainstream IT
infrastructure slowly and offer a broader set of enterprise
applications.
It is important to note here that these Web 2.0 start-ups represent
a powerful trend in the role of developers in driving cloud
computing adoption. Many early users of cloud computing are
examples of developers launching applications without
requiring the involvement of IT (in the case of a Web 2.0 start-up,
they don't have an IT department). Increasingly,
empowering developers and line-of-business owners to
innovate and deploy new applications without the shackles of
IT will be a motivating driver for cloud adoption. No longer do
users need to have IT's blessing and time to get their job done.
This developer-centric nature was a primary motivator of
VMware's strategic acquisition of SpringSource. In addition to
inheriting significant Java technology, VMware now has a
distinct opportunity to transition SpringSource's dominant Java
developer mindshare to develop onto VMware's private cloud
platform. Amazon Web Services has experienced tremendous
success from its developer-centric platform APIs. Unlike
traditional hosting providers that cater to IT/operations,
Amazon went after developers first and has only recently
begun to add the functionality that will appeal to broader
enterprise IT.
Within enterprises, there are early signs of developers (QA
environments, batch processing, and developer prototyping)
and line-of-business/departmental users leveraging cloud computing.
It is not uncommon for new platform technologies to start at
the fringes of IT before mainstream adoption takes place.
Unlike typical three-tier traditional enterprise datacenters, the
internet datacenters of Facebook, Google, etc. were not
encumbered by legacy enterprise stacks, applications, and IT
rules, which in turn enabled them to be built from the ground
up with cloud stacks to elastically handle large-scale consumer
transactions for multiple applications. Therefore, and
unsurprisingly, Amazon's internet datacenters were easily
adapted to become the first and leading public computing
provider. It will certainly take significant time/effort for
enterprise IT infrastructure gatekeepers to evolve their current
architectures to embrace a new cloud platform. Luckily,
enterprises can reap the technology innovation from internet
data centers (much of which is open source) to accelerate this
transition.
MORE THAN ONE FLAVOR
There have been analogies drawn between cloud computing
and public utilities (electric, gas, etc.) where the value is all
about economies of scale. According to this hypothesis, the
world will only have a few cloud providers that reach
maximum efficient scale. It is quite unlikely that this will
happen. Multiple cloud models will emerge depending on the
user, the workload, and the application. For example, certain
developers will prefer to interface with a cloud provider at a
higher level of abstraction, such as Google App Engine, as
opposed to a more bare-metal API, such as Rackspace.
Alternatively, an application may choose to run on MSFT
Azure to leverage SQL/MSFT services or Salesforce Force.com for
CRM integration and distribution advantages. Today, one can
break cloud platforms into roughly two camps: developer-
centric (Amazon, MSFT) and IT-centric (EMC, VMware).
Cloud platforms will remain distinct and diverse as long as they
continue to deliver unique value-add for their particular use cases
and users.
To drive this cloud diversity point further, the concept of a "cloud
within a cloud" is also emerging, where distinct services,
such as data warehousing, can be built atop a more generic cloud
platform to provide a higher-layer cloud service.
In addition, private clouds behind the firewalls present yet
another flavor of cloud computing as enterprises leverage the
benefits of cloud frameworks while maintaining security/control
as well as the compliance of their internal datacenters. Lastly,
hybrid clouds that bridge private and public clouds on a
permanent or temporary basis (also known as cloud bursting)
will come to fruition for certain applications or as a migration
path for enterprises. Several start-ups (Cirtas, CloudSwitch, and
Zetta among them) are building products that make the cloud
safe for enterprises. Innovation will abound to solve the
specific issues in all of these various cloud environments.
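The cloud bursting idea mentioned above can be sketched as a simple placement rule: keep work in the private cloud while capacity remains and overflow the rest to a public provider. The function name, the abstract "capacity units," and the first-fit rule are all assumptions for illustration; real hybrid products also weigh data locality, security policy, and cost.

```python
def place_jobs(job_sizes, private_capacity):
    """Assign each job to 'private' while it fits, otherwise burst to 'public'.

    job_sizes and private_capacity are in the same hypothetical capacity units.
    """
    placement = []
    used = 0
    for size in job_sizes:
        if used + size <= private_capacity:
            used += size
            placement.append("private")
        else:
            # Not enough private headroom left: burst this job to the public cloud.
            placement.append("public")
    return placement


# Four jobs against a private cloud with 10 units of capacity.
burst_plan = place_jobs([4, 4, 4, 2], private_capacity=10)
```

In this example the third job does not fit in the remaining private capacity and is sent to the public cloud, while the smaller fourth job still fits privately.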
LOOKING AHEAD
To further parse all this, I hosted a cloud computing panel with an
esteemed group of technology thought leaders at Accel's 15th
Stanford Technology Symposium. Needless to say, these
panelists had plenty of deep insights, opinions, and predictions
about cloud computing.
The panel brought together technologists who view cloud
computing through distinctly different lenses: private cloud
innovators, public cloud providers, cloud-enabling technology
solutions, and cloud infrastructure applications. In wrapping up
the panel session, I asked each speaker to conjure up a single
prediction for cloud computing in the next few years. Here's what
the experts said:
Jonathan Bryce, CTO/Founder, Mosso (Rackspace): "I think
cloud computing is going to be a mindshift; it's going to take a
while. But I think an economy like this is actually a huge
opportunity for entrepreneurs... I think this is a time when
resources are scarce; that's when great businesses end up getting
built. And I think part of what's going to enable some of those
businesses is cloud computing, and being able to get started with
a lower barrier to entry, lower price point, all of those kind of
things."
Mike Olson, CEO/Co-founder, Cloudera: "I think that a lot of
what's been said around here about data is really right on. I
predict that in the next 10 years, computer science as computer
science isn't really going to be the place that smart young guys
are going to find tremendously rewarding careers. I think that the
application of these new compute systems to large data in the
sciences will advance humankind substantially. I think that
science will be done maybe not even in the lab on the wet bench
anymore, but with data, with computer systems looking at vast
amounts of data."
Raghu Ramakrishnan, Chief Scientist for Audience and Research
Fellow, Yahoo! Research: "So a lot of the companies that are
out there today (Yahoo!, Facebook, Google), they're all
exposing data APIs. Imagine what's going to happen once
large clouds are routinely available to build your own
application and you start aggregating your own data, and you
have the opportunity to fuse that with all the data that's out
there. Someone's going to figure out the next big thing, by
taking 2 + 2 and coming up with 20."
Mike Schroepfer, VP Engineering, Facebook: "...one of the
things that is going to happen is that people are going to figure
out that we need a more blended workload between the cloud
and the client. We've been operating kind of in the cycle of
reincarnation and computer science, moved toward most of the
computing happening in the cloud, and my browser effectively
being its own terminal. You know, in the last 2 or 3 years, the
speed and capability of browsers has been outpacing that of
most chips. You're seeing 2x to 4x improvements in core
performance on the engines and VMs in those browsers year on
year, which is way outpacing the speed of chip design... So I
believe that there will be a couple of people who will figure out
ways to blend computation and storage on the client, more
gracefully with that on the server, but still provide you with all
of the benefits of basically access to my data anywhere I need,
and the kind of reliability of the cloud."
Jayshree Ullal, President and CEO, Arista Networks: "Well,
there's a technology impact but I actually think it's going to
really make CIOs rethink their jobs. Today, you can have a
server administrator, an application administrator, a network
administrator, and they're all silos but you need your general
practitioner. And that's really missing right now in the cloud.
So if I had to make a prediction, less on the technology, more
on the operational side, I would say for the deployment of this,
it's got to be a generalized IT person, whether that's the CIO or
somebody he or she appoints."
Rich Wolski, Professor of Computer Science, University of
California, Santa Barbara and CTO/Founder, Eucalyptus
Systems: "...there's another revolution coming that's going to
intersect the cloud revolution and that has to do with data
simulation... pretty much everything you own is going to be
trying to send you data. And you're going to need, personally,
a great deal of storage and compute capacity to be able to deal
with that. I think the cloud is going to make that revolution that
much quicker to come to us."
These predictions depict cloud computing as still being in its
formative phases, but suggest that it will emerge as a fundamental
breakthrough in datacenter and IT infrastructure in the years to
come. Despite the current macro headwinds, deep innovation
and market opportunities in cloud computing will persist. Once
this economic storm passes, I'm convinced the sun will shine
through, and cloud computing is sure to have many silver
linings.
Ping Li is a partner at Accel Partners in Palo Alto
and focuses primarily on Information Technology
infrastructure and digital media platforms.