Cloud computing Hugh Shanahan, Department of Computer Science, Royal Holloway, University of London CCC 2011, Huazhong Agricultural University, Wuhan 20 Sep 2011

Page 1:

Cloud computing

Hugh Shanahan, Department of Computer Science, Royal Holloway, University of London

CCC 2011, Huazhong Agricultural University, Wuhan

20 Sep 2011

Page 2:

The amount of Biological data is exploding

• The raw data for a human genome corresponds to hundreds of Gigabytes.

• Cost of sequencing a human genome has fallen from $100 M to ~$3000 (July 2011).

• Main bottleneck is now reconstructing the genome from the data generated.

• Much of the original Central Dogma is now seen as a simplification

  • RNA now seen to play a fundamental role

    • miRNA

  • How DNA is stored is also crucial

Page 3:

Greater exploration

• Thousands of genomes being scanned

• Different species

• Cancer genomics

• Methylome

• RNA-seq

• Have not got time to talk about metabolomics/proteomics ...

Page 4:

Caveat to sequence data

• There are many different companies building equipment to perform sequencing.

• They all have their own biases and sources of systematic error.

• The data generated is discrete in nature, which tends to make people think it is accurate.

• It could be as susceptible to systematic biases as microarrays are.

• Interpretation and analysis are the real bottleneck.

Page 5:

The era of Big Data

• Biological data - Petabytes now, expected to be Exabytes (millions of Terabytes) by 2020.

• High Energy Physics - Large Hadron Collider producing Petabytes of data per year

• Square Kilometre Array (full operation 2024) - one Exabyte a day

• Haven’t even mentioned Google or Bing yet....
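
To put these figures on a common scale, here is a short Python sketch of the unit arithmetic. It assumes decimal (SI) prefixes, which the slides do not specify, and the constant and function names are illustrative only.

```python
# Decimal (SI) byte units. Assumption: the slides do not say whether
# SI or binary prefixes are meant.
TERABYTE = 10**12
PETABYTE = 10**15
EXABYTE = 10**18

SECONDS_PER_DAY = 24 * 60 * 60


def sustained_rate_bytes_per_s(bytes_per_day: float) -> float:
    """Average rate needed to produce (or absorb) a given daily volume."""
    return bytes_per_day / SECONDS_PER_DAY


# How many Terabytes make up one Exabyte.
terabytes_per_exabyte = EXABYTE // TERABYTE

# Average data rate implied by "one Exabyte a day", in Terabytes per second.
ska_rate_tb_per_s = sustained_rate_bytes_per_s(1 * EXABYTE) / TERABYTE

print(f"1 Exabyte = {terabytes_per_exabyte:,} Terabytes")
print(f"1 Exabyte/day is about {ska_rate_tb_per_s:.1f} Terabytes/s sustained")
```

Under these assumptions, one Exabyte a day corresponds to an average of roughly 11.6 Terabytes every second.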

Page 6:

Problems - Solutions - Cloud Computing ?

• Data sets this size cannot be moved about on the Internet.

• Data must be analysed, not just retrieved.

• Many people want access to this data, many of whom

  • are not computational scientists

  • may not have financial resources to buy powerful computers

  • may want access to best software, best practices etc.

• data to be updated in a timely fashion
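
The claim that data sets of this size cannot be moved about on the Internet can be checked with a rough calculation. The sketch below is an idealised estimate only: the 100 Mbit/s sustained link speed and the 300 Gigabyte genome size are assumptions chosen for illustration (the slides say only "hundreds of Gigabytes"), and protocol overhead is ignored.

```python
GIGABYTE = 10**9
PETABYTE = 10**15

# Assumption: a sustained 100 Mbit/s connection, not a figure from the talk.
LINK_BITS_PER_S = 100 * 10**6


def transfer_time_seconds(size_bytes: float, bandwidth_bits_per_s: float) -> float:
    """Idealised time to move size_bytes over a link, ignoring all overhead."""
    return size_bytes * 8 / bandwidth_bits_per_s


# Hypothetical raw data for one genome: 300 Gigabytes.
genome_hours = transfer_time_seconds(300 * GIGABYTE, LINK_BITS_PER_S) / 3600

# One Petabyte over the same link, expressed in years.
petabyte_years = transfer_time_seconds(1 * PETABYTE, LINK_BITS_PER_S) / (3600 * 24 * 365)

print(f"One genome (300 GB): about {genome_hours:.1f} hours")
print(f"One Petabyte: about {petabyte_years:.1f} years")
```

With these assumed numbers a single genome takes several hours to transfer and a Petabyte takes years, which is the motivation for moving the computation to the data rather than the data to the computation.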

Page 7:

Solutions - Cloud Computing ?

• Cloud computing may be the solution.

• Data centre for cloud co-located with data generation.

• Processing as well as data retrieval done at data generation centre.

Page 8:

Cloud Computing Definition - “If it looks like a duck”

• Features of cloud computing are:

  • Computing is mostly done at a data centre provided by a vendor

  • Client-side computing is minimal

  • Servers at data centre make heavy use of virtualisation (as opposed to Grids)

  • Client can select number of instances of VM and data usage

  • Client pays on a per-use basis - “Somebody’s Credit Card is being used”

• The computing is treated as a utility rather than a resource.

Page 9:

Cloud providers

• Amazon Web Services (AWS)

  • Provides Linux or Windows VMs

  • You get a command line.

• Microsoft - Azure

  • More complicated method of submission

• Open Source - Eucalyptus (stability ?)

• Other providers out there ...

Page 10:

Advantages

• The data centre can be located where the data is generated, and the data accessed from anywhere (in theory).

• Data could be kept up to date.

• Analysis tools could be kept up to date

• Services can be developed which go significantly beyond a simple command line interface (Azure works along these lines).

• Scalability - if you want 1 or 100 VMs you can get them.

Page 11:

Disadvantages

• At present vendors do not provide a tailored environment for scientific clients.

• A VM (regardless of OS) is effectively a blank canvas, so you have to upload all the binaries, libraries and data that you need.

• Data may not be in the correct configuration - storage/compute tradeoff.

• Like any utility, you have to watch usage carefully!

• Vendor lock-in.

• Security.

• Legal issues - licensing, nationality of vendor and data centre.

Page 12:

Show me the money

• Commercial clouds charge on a per-use basis.

  • Disk space

  • CPU time

    • Amazon and Microsoft charge for the time a VM is deployed

    • Google tries to charge per CPU cycle.

• Move from once-off payment model to rolling costs.
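
The shift from a once-off payment to rolling costs can be illustrated with a toy cost model. This is a sketch only: the rates, the workload and the purchase price below are hypothetical placeholders, not any vendor's actual prices, and the comparison ignores the running costs of owned hardware (power, administration, replacement).

```python
from dataclasses import dataclass


@dataclass
class Rates:
    """Hypothetical per-use rates; not real vendor pricing."""
    per_vm_hour: float    # charge for each hour a VM is deployed
    per_gb_month: float   # charge for each Gigabyte stored for a month


def monthly_cost(rates: Rates, vm_count: int, hours_per_vm: float, storage_gb: float) -> float:
    """Rolling monthly bill: deployed VM time plus disk space."""
    compute = vm_count * hours_per_vm * rates.per_vm_hour
    storage = storage_gb * rates.per_gb_month
    return compute + storage


def months_to_break_even(one_off_cost: float, monthly: float) -> float:
    """Months after which cumulative rolling costs match a once-off purchase."""
    return one_off_cost / monthly


# Illustrative numbers only.
rates = Rates(per_vm_hour=0.10, per_gb_month=0.10)
bill = monthly_cost(rates, vm_count=10, hours_per_vm=100, storage_gb=500)
break_even = months_to_break_even(one_off_cost=6000, monthly=bill)

print(f"Monthly bill: {bill:.2f}")
print(f"Break-even against a once-off purchase: {break_even:.0f} months")
```

With these placeholder values the bill is 150 per month and the break-even point against a 6000 purchase is 40 months. The point is the shape of the calculation: usage has to be monitored continuously, because the bill scales with deployed VM time and stored data rather than being fixed up front.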

Page 13:

Big Data - a new discipline ?

[Diagram: Big Data, with the labels Machine Learning / Pattern Recognition, Hardware, Quality Control, and Finance / Accounting]

Page 14:

Conclusions

• Microarray data gives us a first insight into the dynamic cell.

• Sequence and other omic data sets are expanding into Petabytes.

• Big Data is upon us.

• Cloud computing is not a panacea.

• Cloud computing may democratise access to Big Data.