Aspera bt-big-data-cloud

Enabling The Big Data Cloud for HPC and Collaboration With High-Speed Data Transport

Transcript of Aspera bt-big-data-cloud

Page 1: Aspera bt-big-data-cloud

Enabling The Big Data Cloud for HPC and Collaboration With High-Speed Data Transport

Page 2: Aspera bt-big-data-cloud

PRESENTER AND AGENDA

PRESENTER

Daniel Kumi
Director, New Market Development
[email protected]

AGENDA

• Who and Why Aspera?
• WAN Transport
• Wireless Transport
• Customer Use Cases
• Cloud and Big Data – Transfer Challenges for HPC and Collaboration
• Aspera On Demand
• BT-Aspera Discussion

Page 3: Aspera bt-big-data-cloud

ASPERA’S MISSION

Creating next-generation transport technologies

that move the world’s digital assets at maximum speed,

regardless of file size, transfer distance and network conditions.

Page 4: Aspera bt-big-data-cloud

Aspera: moving the world’s digital assets at maximum speed

Expanded to Asia PAC and Latin America through direct and channel

50% YOY growth in revenue and employees

Over 10,000 licenses sold, and over 1,500 customers world wide

Patents issued or pending in 32 countries

Continuing to innovate: fasp3™, fasp-MC™, mobile transport, cloud enablement

Page 5: Aspera bt-big-data-cloud

Aspera Ecosystem of Partners

Page 6: Aspera bt-big-data-cloud

Life Sciences

Page 7: Aspera bt-big-data-cloud

BIG DATA TRANSFER CHALLENGE

Page 8: Aspera bt-big-data-cloud

What Happened to my Bandwidth?

WAN link between Paris and Seattle:

• 1000 Mbps
• 170 ms RTT
• 0.001% packet loss rate

WAN Throughput is 1000 Mbps

Max TCP Throughput ~29 Mbps

Where’s my 970 Mbps?

At 29 Mbps:

• 50 GB transfer will take 4 hrs
• 1 TB transfer will take 3.3 days
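The slide does not say how the ~29 Mbps ceiling was derived. One common way to estimate a single TCP flow's steady-state limit from RTT and loss is the Mathis et al. approximation; the sketch below applies it to the slide's link parameters. The 1460-byte MSS and the constant C = sqrt(3/2) are assumptions, not values from the presentation, and the result should land in the same tens-of-Mbps range as the slide's figure rather than match it exactly.

```python
import math


def mathis_tcp_ceiling_bps(mss_bytes: int, rtt_seconds: float, loss_rate: float) -> float:
    """Approximate steady-state throughput of one TCP flow, in bits per second.

    Mathis et al. approximation: rate ~ C * MSS / (RTT * sqrt(p)), with C = sqrt(3/2).
    """
    c = math.sqrt(3 / 2)
    return c * (mss_bytes * 8) / (rtt_seconds * math.sqrt(loss_rate))


# Slide values: 170 ms RTT and 0.001% packet loss (p = 0.00001).
# ASSUMPTION: a 1460-byte MSS, typical for Ethernet; the slide does not state it.
estimate_bps = mathis_tcp_ceiling_bps(mss_bytes=1460, rtt_seconds=0.170, loss_rate=0.00001)
link_bps = 1000e6  # the 1000 Mbps WAN link from the slide

print(f"Estimated single-flow TCP ceiling: {estimate_bps / 1e6:.1f} Mbps")
print(f"Share of the 1000 Mbps link used:  {estimate_bps / link_bps:.1%}")
```

Because RTT sits in the denominator, doubling the distance roughly halves the ceiling regardless of how much bandwidth the link offers.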

Page 9: Aspera bt-big-data-cloud

BIG-DATA and WAN TRANSFER WITH TCP

TCP WAS DESIGNED IN THE EARLY 80’S
• When data was small & bandwidth was limited
• Fantastic for reliable data delivery
• Not fast enough for big-data

TCP IS THE ENGINE THAT DRIVES
• FTP, HTTP & HTTPS
• RSYNC, SCP & DICOM
• CIFS & NFS

TCP DOES NOT LIKE NETWORK LATENCY/RTT
• Geographic distance increases latency
• Network congestion increases latency

TCP DOES NOT LIKE PACKET LOSS
• Loss is caused by congestion
• Different network capacity
• Wireless and satellite communications

Page 10: Aspera bt-big-data-cloud

The Aspera Solution

So if TCP doesn’t work, what’s the answer?

Page 11: Aspera bt-big-data-cloud

Same WAN Scenario with Aspera

WAN link between Paris and Seattle:

• 1000 Mbps
• 170 ms RTT
• 0.001% packet loss rate

WAN is 1000 Mbps

Max TCP Throughput ~29 Mbps

Max Aspera Throughput ~995 Mbps (gain of x34)

ROI measured in $$ cost of not using 971 Mbps

At 995 Mbps:

• 50 GB transfer will take ~7 mins (versus ~4 hrs with TCP)
• 1 TB transfer will take 2.4 hrs (versus 3.3 days with TCP)
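The transfer times on this slide follow from dividing the payload size by the sustained rate. The sketch below redoes that arithmetic for the two rates quoted (29 Mbps for TCP, 995 Mbps for Aspera). It assumes decimal units (1 GB = 10^9 bytes) and ignores protocol overhead, so the results are approximate and may differ slightly from the rounded slide figures.

```python
GB = 10**9   # ASSUMPTION: decimal units; the slide does not say which it uses
TB = 10**12


def transfer_seconds(size_bytes: int, rate_mbps: float) -> float:
    """Time to move size_bytes at a sustained rate given in megabits per second."""
    return size_bytes * 8 / (rate_mbps * 1e6)


def describe(seconds: float) -> str:
    """Render a duration in the largest convenient unit."""
    if seconds >= 86400:
        return f"{seconds / 86400:.1f} days"
    if seconds >= 3600:
        return f"{seconds / 3600:.1f} hrs"
    return f"{seconds / 60:.1f} min"


for label, size in (("50 GB", 50 * GB), ("1 TB", TB)):
    tcp = transfer_seconds(size, 29)     # max TCP throughput from the slide
    fasp = transfer_seconds(size, 995)   # max Aspera throughput from the slide
    print(f"{label}: TCP {describe(tcp)}, Aspera {describe(fasp)}, speedup x{tcp / fasp:.0f}")
```

The speedup is simply the ratio of the two rates, which is where the slide's "gain of x34" comes from.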

Page 12: Aspera bt-big-data-cloud

FASP™ — HIGH-PERFORMANCE DATA TRANSPORT

MAXIMUM LINE-RATE WAN TRANSFER SPEED
• Transfer performance scales with bandwidth independent of transfer distance and resilient to packet loss
• Optimal end-to-end throughput efficiency

CONGESTION AVOIDANCE AND POLICY CONTROL
• Automatic, full utilization of available bandwidth
• On-the-fly prioritization and bandwidth allocation

UNCOMPROMISING SECURITY AND RELIABILITY
• Secure, user/endpoint authentication
• AES-128 cryptography in transit & at-rest

SCALABLE MANAGEMENT, MONITORING AND CONTROL
• Real-time progress, performance and bandwidth utilization
• Detailed transfer history, logging, and manifest

ENTERPRISE-CLASS FILE DELIVERY
• Transfers up to thousands of times faster than FTP/HTTP(S)
• Precise and predictable transfer times
• Extreme scalability (concurrency and throughput)

Page 13: Aspera bt-big-data-cloud

fasp Bandwidth ROI

FTP      Across US     US – EU       US – ASIA     Satellite
1 GB     1 – 2 hrs     2 – 4 hrs     4 – 20 hrs    8 – 20 hrs
10 GB    15 – 20 hrs   20 – 40 hrs   Impractical   Impractical
100 GB   Impractical   Impractical   Impractical   Impractical

fasp™    2 Mbps     10 Mbps    45 Mbps    100 Mbps   200 Mbps   1 Gbps
1 GB     70 min.    14 min.    3.2 min.   1.4 min.   42 sec.    8.4 sec.
10 GB    11.7 hrs   140 min.   32 min.    14 min.    7 min.     1.4 min.
100 GB   -          23.3 hrs   5.3 hrs    2.3 hrs    1.2 hrs    14 min.

FTP: Limited by Distance & Packet Loss, Not B/W

Aspera: Scales Linearly with Bandwidth

Distance & Packet Loss Independent

FASP vs TCP PERFORMANCE

Page 14: Aspera bt-big-data-cloud

6 Gbps Scalable WAN Throughput

~6 Gbps Big-Data Throughput
• Latency independent
• Loss independent

x3000 improvement vs. TCP
• 1 TB data moved in 20 min
• 2 days with TCP over LAN conditions

Scale to ~10Gbps with IQ Accelerator

Page 15: Aspera bt-big-data-cloud

High Speed Mobile Data Transfer with fasp-AIR™

fasp-AIR SDK – maximum data transfer speed and predictability for mobile devices

• Embeddable software library allows app developers to integrate superior transport capabilities into their own applications, such as faster and more predictable downloads/uploads.
• Available for Android and iOS on Aspera Developer Network
• Designed for wireless networks with high latency, high packet loss environments
• Integrated transfer queuing, pause, resume and progress reporting
• Achieves significant performance improvements for upload and download speeds over 3G, 4G and 802.11 g/n.

Page 16: Aspera bt-big-data-cloud

fasp-AIR Benchmarks on Verizon 4G

In some cases (highlighted in orange), speeds will vary greatly, depending on available bandwidth and the underlying condition of the wireless network.

Page 17: Aspera bt-big-data-cloud

CUSTOMER USE CASES: NCBI/NIH, HUTCHINSON

Page 18: Aspera bt-big-data-cloud

Large-scale Global Collaboration: 1000 Genomes

Petabytes of data transferred monthly
• Files range in size from KBs to many GBs

Repository contents
• 2,500 genomes from 27 populations
• Several types of variations: SNPs, small insertions and deletions, structural variants, and copy number variants

Available on web - 4 locations
• 1000genomes.org, AWS, NCBI, and EBI websites
• Technology web sites use:
  • Aspera Connect Server
  • Aspera Developers’ Network and SDK
• Researchers across all locations use:
  • Aspera Connect client (freely distributable with server license)

[Diagram labels: NIH, Upload/Download, Data, Cloud]

Page 19: Aspera bt-big-data-cloud

Researcher to Researcher Collaboration

Use case : Genomic research

Genomic research results sharing
• Research made available to collaborators
• Research published globally

Workflow
• Illumina > Storage > Researcher > Aspera
• Publish one-to-many

Collaboration options
• Person-to-person, one-to-many (faspex server)
• Publish-subscribe (faspex or connect server)

Seattle

Faspex in use by world-renowned Cancer Research Center in Seattle, WA

Page 20: Aspera bt-big-data-cloud

CLOUD COMPUTING & BIG DATA

Page 21: Aspera bt-big-data-cloud

CLOUD COMPUTING: WHY IS IT SO COMPELLING?

THE POTENTIAL OF INFINITE COMPUTING RESOURCES, ON DEMAND
• Eliminates the need to plan ahead
• Allows companies to meet demand
• Without the lead-time bottleneck

THE ELIMINATION OF AN UP-FRONT COMMITMENT
• Reduce capital outlay and investment risk
• Start small & increase h/w resources to match need
• Auto-scale to meet demand

PAY-FOR-USE RESOURCE MODEL
• CPUs by the hour
• Storage by the day
• Bandwidth by the GB

Page 22: Aspera bt-big-data-cloud

SO? WHAT CAN I DO WITH IT?

STORAGE FOR ARCHIVE & D/R
• Near-line for editing, creative apps and processing
• B2B / B2C data workflow
• Offsite storage for disaster recovery and business continuity

DATA & CONTENT DISTRIBUTION
• OTT, play out, release, project & event specific marketing
• Collaborative data exchange
• CDN and global delivery

DATA PROCESSING & CONTENT CREATION
• Compute Intensive: 10’s, 100’s, 1000’s of CPU cores
• Transcoding, rendering, encoding, watermarking
• Big-data analytics & HPC

Page 23: Aspera bt-big-data-cloud

GETTING IN AND OUT OF THE CLOUD

KNOWING WHEN TO CHOOSE THE RIGHT TOOL

Page 24: Aspera bt-big-data-cloud

CHALLENGES OF STORING BIG FILES IN THE CLOUD?

BEWARE THE OBJECT STORE:
• Not like traditional NAS or SAN
• Bigger, better, but possibly much more complex
• a.k.a. Google File System, Amazon S3, Hadoop Distributed File System
• Simple read/write of data “blobs”, indexed by a key
• Multiple replicas are distributed across storage for durability and optimized for access
• Should work well for storing large numbers of files

UNDERSTAND CHUNKS, BLOCKS and BLOBS
• You need to deal with chunks, blocks and blobs
• “Chunk” sizes are small (64 MB/128 MB)
• Large media files must be “chunked” (1 TB file = transporting and reassembling 10,000+ chunks!)
• Multi-chunk APIs impede workflow and are complex
• Data I/O uses the standard HTTP(S) protocol
  • VERY SLOW at distance
  • Single HTTP stream slow even locally (<100 Mbps)
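To make the chunking point concrete, the sketch below counts how many chunks a 1 TB file becomes at the two chunk sizes named on the slide and shows the byte ranges a client would have to track. It assumes binary units (1 TB = 2^40 bytes, 1 MB = 2^20 bytes); the function names are illustrative and not part of any storage API.

```python
from typing import Iterator, Tuple

MiB = 2**20  # ASSUMPTION: binary units; the slide does not say which it uses
TiB = 2**40


def chunk_count(file_bytes: int, chunk_bytes: int) -> int:
    """Number of chunks needed to cover the file (the last chunk may be short)."""
    return (file_bytes + chunk_bytes - 1) // chunk_bytes


def chunk_ranges(file_bytes: int, chunk_bytes: int) -> Iterator[Tuple[int, int, int]]:
    """Yield (part_number, start_offset, end_offset_exclusive) for each chunk."""
    for part, start in enumerate(range(0, file_bytes, chunk_bytes), start=1):
        yield part, start, min(start + chunk_bytes, file_bytes)


for chunk_mib in (64, 128):
    print(f"1 TB at {chunk_mib} MB per chunk: {chunk_count(TiB, chunk_mib * MiB)} chunks")

# A small illustrative file: 10 MB plus one byte, split into 4 MB chunks.
for part, start, end in chunk_ranges(10 * MiB + 1, 4 * MiB):
    print(f"part {part}: bytes {start}..{end - 1}")
```

Each of those parts has to be transported, acknowledged and reassembled in order, which is the workflow complexity the slide is warning about.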

BIG-DATA SERVICES WILL NEED A HIGH-SPEED BRIDGE TO THE CLOUD
• Large files moved at full bandwidth capacity with global access
• Overcome the WAN and storage bottleneck
• Support files of any size or quantity
• Transparent to the end user/data owner (GUI, command line, API, browser, etc.)
• No hardware to support B2B, B2C, C2B workflow

Page 25: Aspera bt-big-data-cloud

FIRST MAJOR BOTTLENECK: WAN TRANSFER

Page 26: Aspera bt-big-data-cloud

SECOND MAJOR BOTTLENECK: LOCAL HTTP I/O

2nd Bottleneck — Data Center

1st Bottleneck - WAN

Page 27: Aspera bt-big-data-cloud

S3 & BIG-DATA: UNDERSTAND THE CONSTRAINTS

Page 28: Aspera bt-big-data-cloud

S3 & BIG-DATA: MEET ASPERA’s DIRECT-TO-S3

client

cargo downloader

mobile apps

connect plug-in

point-to-point

Page 29: Aspera bt-big-data-cloud

OVERCOMING BOTH BOTTLENECKS

#1: TRANSFER DATA TO EC2 OVER WAN

Method                                           Effective throughput
HTTP transfer over WAN (single stream)           <10 Mbps
HTTP transfer over WAN (15 parallel streams)     <10 to 100 Mbps
Aspera fasp transfer over WAN to EC2             up to 1 Gbps (per EC2 Extra Large Instance)

Typical internet conditions: 50–250 ms latency & 0.1–3% packet loss

#2: TRANSFER DATA FROM EC2 TO S3

Method                                           Effective throughput
Standard single-stream HTTP                      10 to 100 Mbps
Aspera S3 Proxy with parallel I/O HTTP streams   up to 1 Gbps (per EC2 Extra Large Instance)

ASPERA + AWS | ~10 TB transferred per 24 hours | PER EC2 INSTANCE
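The two hops on this slide run in series, so the slower one sets the end-to-end rate. The sketch below models that as a simple minimum and converts a sustained rate into a daily volume, using the upper-bound figures quoted on the slide. The scenario names and the min() model are illustrative assumptions; real pipelines have additional overheads.

```python
def end_to_end_mbps(wan_mbps: float, storage_mbps: float) -> float:
    """Two hops in series: the slower hop limits the whole path."""
    return min(wan_mbps, storage_mbps)


def tb_per_day(rate_mbps: float) -> float:
    """Decimal terabytes moved in 24 hours at a sustained rate in Mbps."""
    return rate_mbps * 1e6 / 8 * 86400 / 1e12


# Upper-bound rates taken from the slide: (WAN hop to EC2, EC2-to-S3 hop), in Mbps.
scenarios = {
    "single-stream HTTP on both hops": (10, 100),
    "Aspera fasp over WAN, single-stream HTTP to S3": (1000, 100),
    "Aspera fasp over WAN plus Aspera S3 Proxy": (1000, 1000),
}

for name, (wan, storage) in scenarios.items():
    rate = end_to_end_mbps(wan, storage)
    print(f"{name}: {rate:.0f} Mbps end to end, about {tb_per_day(rate):.2f} TB per day")
```

Speeding up only the WAN hop leaves the path limited by the storage hop, which is why the slide addresses both; at a sustained 1 Gbps the daily volume works out to roughly the ~10 TB per instance quoted above.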

Page 30: Aspera bt-big-data-cloud

ASPERA DIRECT-TO-S3 — LINE RATE ACCESS TO THE CLOUD

UNRIVALED ASPERA PERFORMANCE
• Built on Aspera fasp™ technology for maximum transfer speed, regardless of file size, transfer distance and network conditions
• Precise bandwidth control ensures the available bandwidth is utilized to achieve maximum transfer speeds, while being fair to other business-critical network traffic

SEAMLESS INTEGRATION WITH S3
• Integrated with S3 multi-part HTTP for maximum “last foot” performance
• Simple configuration of S3 credentials, for both shared and dedicated docroot
• Transfers directly into S3 are seamless and transparent to user

ENTERPRISE-GRADE SECURITY AND RELIABILITY
• Secure authentication with encryption in transit & at rest (AES-128, FIPS 140-2, HIPAA compliant)
• Packet-level data integrity verification
• Automatic resume of partial or failed transfers
• Full support for AWS S3 server-side encryption at rest

INTEROPERATES WITH ALL ASPERA HOST OPTIONS
• Any platform (Windows, Linux, Mac, UNIX, iOS, Android)
• Any Aspera Clients (CLI, Desktop, Point-to-Point, Mobile, Web, Embedded)
• Any Aspera Servers (Enterprise, Connect, faspex)

Page 31: Aspera bt-big-data-cloud

ASPERA FOR AWS: DIRECT-TO-S3

1. Upload using typical multi-part HTTP client

2. fasp high-speed upload Direct-to-S3

[Diagram labels: Client, Dallas, TX; Aspera Client; fasp; Aspera Transfer Server; HTTP multipart; Herndon, VA; Scale out]

Page 32: Aspera bt-big-data-cloud

HYBRID CLOUD DEPLOYMENT (PUBLIC/PRIVATE)

Shares app transparently communicates with Aspera server Nodes in cloud and in enterprise

User browses content across authorized shares

High-speed data transfers with Datacenter

High-speed data transfers with Direct-to-S3

[Diagram labels: Client, NY, NY; Shares; fasp; Node (two instances); DMZ; Herndon, VA; Datacenter, Emeryville, CA]

Page 33: Aspera bt-big-data-cloud

ASPERA SOFTWARE ON DEMAND

KEY FEATURES
• On demand high-performance data transport to and from remote infrastructures
• Unlimited scale out of transfer capacity with additional AMIs
• Support for all Aspera Server software and use cases
• Additional Client Options: Mobile, Outlook Plug-in & Cargo (Aspera faspex)
• Flexible Storage Options: Local, EBS, AWS S3
• Seamlessly interoperates with on-premise Aspera deployments
• Integrated Management and Monitoring

APPLICATIONS AND USE CASES
• High Performance Computing On Demand
• Content Aggregation, Transformation and Distribution
• Time-boxed event or project-based collaboration, ad-hoc distribution or content ingest

Aspera Console: Global transfer monitoring, reporting & control

Aspera Shares: Global person-to-person file transfer & exchange

Aspera faspex: Global person-to-person file ingest & distribution

Aspera Server: Universal file transfer server, supports desktop, web, mobile & embedded

Page 34: Aspera bt-big-data-cloud

Aspera software product & technology portfolio

Transport

Distribute

Complete portfolio of servers and end point clients for high-speed digital content delivery and distribution.

Enterprise and Connect Server
• Universal file transfer server and web-based interface and directory listing

Client and Point-to-point
• Uni- and bi-directional transfer clients

Connect
• Web browser plug-in for high-speed uploads and downloads

Mobile
• High-speed transfer for mobile devices

Sync
• Highly scalable, multidirectional file replication and synchronization

Collaborate

Global person-to-person and project-based exchange and collaboration of files and directories, of any size, over any distance, over any network.

faspex Server
• Secure digital delivery and collaborative file transfers with remote users and partners
• Integrated e-mail notifications for delivery and successful download
• Comprehensive administration, user management & access control

faspex Multi-Server / HA
• Automated bi-directional relays between sites and multiple servers
• 3-tier architecture with support for clustering and high availability

Cargo
• Automated client downloads

Automate

Web-based application and SDK for creating and managing automated workflows, from simple file forwarding, to complex process orchestration.

Orchestrator
• Intuitive graphical workflow designer
• File processing decision tree and flow
• Rich and flexible plug-in architecture for third-party process integration
• Comprehensive library of plug-ins for transcoding, virus checking, quality checking, archive, notifications
• High volume processing
• Detailed dashboard, workflow, and step-level progress reporting
• Open development framework for designing and integrating processing and automation pipelines

Our unique, patented transport technologies provide unparalleled speed, efficiency, concurrency and bandwidth control over any size, distance, and network

fasp™: Patented, file-based bulk data transport

fasp-AIR™: Uploads and downloads over 3G, LTE and Wi-Fi networks

fasp3™: Next-gen protocol for any bulk data

fasp-MC™: High-speed delivery over multicast

Aspera On-Demand S3|Direct: High-speed transfer direct to cloud storage (S3)

Console transport management: Centralized web-based management, monitoring, and reporting

Page 35: Aspera bt-big-data-cloud

Aspera fasp™ software environment

Page 36: Aspera bt-big-data-cloud

ASPERA DEVELOPER NETWORK

A complete set of SDKs provides developers with guides, reference information, and sample code to assist them with integrating Aspera technology into their own applications. Aspera fasp™ technology can be used in desktop, network-based, and web applications in place of FTP, HTTP, or custom TCP-based copy protocols.

ASPERA MOBILE APIs

Android SDK
Aspera Android SDK provides a Java API to transfer files using fasp-AIR™.

iPhone SDK
Aspera iPhone SDK provides an Objective-C API to transfer files using fasp-AIR.

ASPERA APPLICATION APIs

faspex™ Web API
The Aspera faspex Web API provides a set of services that enables users to create and receive digital deliveries via a Web interface, while taking advantage of fasp high-speed transfer technology.

OTHER INFORMATION

Supporting Tools and Libraries
Supporting tools and libraries let you perform other common tasks surrounding file transfers.

General Reference
Reference on error codes, log file locations, configuration files and more.

ASPERA TRANSFER APIs

Aspera Web Services
A SOAP-based web service API that allows initiation, monitoring and control of fasp-based file transfers.

Aspera Web
JavaScript API exposed by the Aspera Connect client. It allows integration of fasp-based file transfers into web applications.

Connect 2.8 Developer Preview 2
Introducing the new Connect 2.8 developer preview! Integrate the functionality of Aspera Connect 2.8, a fasp-based file transfer client, into your own web applications, while customizing it to your unique brand.

fasp Manager
A class library that allows initiation, monitoring and control of fasp-based file transfers.

Aspera Multicast SDK
A Java class library that allows initiation and management of IP multicast based data transmissions using Aspera fasp-MC™.

Page 37: Aspera bt-big-data-cloud

Aspera software product & technology portfolio

Transport

Distribute

Complete portfolio of servers and clients for high-speed data delivery and distribution.

Enterprise and Connect Server
• Universal file transfer server and web-based interface and directory listing

Client and Point-to-point
• Uni- and bi-directional transfer clients

Connect
• Web browser plug-in

Mobile
• High-speed transfer for mobile devices

Sync
• Highly scalable, multidirectional file replication and synchronization

Collaborate

Global person-to-person and project-based exchange and collaboration of files and directories.

faspex™ Server

• Secure digital delivery and collaborative file transfers with remote users and partners

• Web, email, mobile client options

• Comprehensive administration, user management & access control

faspex™ Multi-Server / HA

• Automated bi-directional relays between sites

• 3-tier architecture with support for clustering, HA

Cargo

• Automated package downloads

Automate

Web-based application and SDK for creating and managing automated file-based workflows.

Orchestrator

• Intuitive graphical workflow designer

• File processing decision tree and flow

• Rich and flexible plug-in architecture for third-party process integration

• Comprehensive library of plug-ins for transcoding, A/V, QC, archive, notifications

• High volume processing

• Detailed dashboard, workflow, and step-level progress reporting.

• Open development framework for designing and integrating automation pipelines

Our unique, patented transport technologies provide unparalleled speed, efficiency, concurrency and bandwidth control over any size, distance, and network

fasp™: Patented, file-based bulk data transport

fasp-AIR™: Uploads and downloads over 3G, LTE and Wi-Fi networks

fasp3™: Next-gen protocol for any bulk data

fasp-MC™: High-speed delivery over multicast

Aspera On-Demand S3|Direct: High-speed transfer direct to cloud storage (S3)

Console transport management: Centralized web-based management, monitoring, and reporting

[Diagram labels: APIs]

Page 38: Aspera bt-big-data-cloud

BT-ASPERA DISCUSSION

Page 39: Aspera bt-big-data-cloud

THANK YOU!

Daniel Kumi
Director, New Market Development
[email protected]