A Possible Approach for Big Data Access to Support Climate Science
Mark Foster
Hugh LaMaster NASA Ames Research Center
ESnet/Internet2 Focused Technical Workshop:
Improving Mobility & Management for International Climate Science
July 15, 2014
Workshop Presentation Context
• This presentation is intended to facilitate the exchange of ideas related to Big Data access and the constraints that can arise:
  – Trusted Internet Exchange
  – Security
  – Bandwidth
• This presentation does not represent any type of Agency policy, project, or endorsement
• Diagrams and notes within this presentation are not planned for implementation; they are for discussion within this workshop
Summary/Overview
• NASA Supercomputing – NAS and NCCS
  – resources
  – select transfer characteristics
  – existing challenges
• TIC – Trusted Internet Connection
  – goals, motivation (driven by DHS for all federal agencies)
  – what does this mean for current and near-term science data transfers?
• Science DMZ and Data Transfer Nodes
  – friction-free transfers for large datasets
  – sit at boundary of inside/outside
  – express path for approved traffic, regular path for default
  – static: use/user designations known in advance (proactive)
  – dynamic: traffic types (reactive)
  – an opportunity for dynamic flow management w/ SDN
• Futures
  – clouds with clear skies
  – internal clusters, external clusters
  – constrained/specific user community vs unrestricted access
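The static, proactive case in the overview (designations known in advance, express path for approved traffic, regular path by default) can be pictured as a simple prefix lookup. The sketch below is illustrative only: the prefixes are reserved documentation ranges, and `APPROVED_PEERS` and `choose_path` are hypothetical names, not part of any NASA or TIC design.

```python
import ipaddress

# Hypothetical allow-list of partner prefixes known in advance (static case).
# These are reserved documentation ranges, not real partner networks.
APPROVED_PEERS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def choose_path(remote_addr: str) -> str:
    """Return 'express' for pre-approved peers and 'regular' for everything else."""
    addr = ipaddress.ip_address(remote_addr)
    # Membership is False when the address and network differ in IP version.
    if any(addr in net for net in APPROVED_PEERS):
        return "express"
    return "regular"


for peer in ("192.0.2.17", "198.51.100.5", "2001:db8::1"):
    print(peer, "->", choose_path(peer))
```

A dynamic (reactive) variant would make the same decision from observed traffic characteristics rather than from a fixed list.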
Computing, Communications Environment Evolving
• Growing performance of Wide Area Networks (WANs) – 10/40/100 Gbps
  – WAN host-to-host performance has exceeded firewall (FW) appliance performance consistently for the last 10 years
• TIC mandate specifies required elements of border
  – Requires SBU data processing/storage elements to be inside/behind TIC
• Growing sophistication of security threats
  – Threat environment requires Defense-in-Depth, hardening user hosts and servers; firewall appliances can’t protect against all threats
• OMB mandate to use commercial cloud computing and storage where possible for low/moderate-security data
  – Cloud resources are available over WAN; external cloud use for internal computing increases pressure on LAN/WAN border security elements
  – FedRAMP compliant – commercial services brought inside NASA auth boundary still have monitoring/border protection requirements
NASA major supercomputing facilities: NAS and NCCS
• Distributed access:
  – Earth and Space Science datasets from widely distributed sources
  – Results transferred back to widely distributed sites
  – Some data at supercomputing facilities for processing; many sets stored elsewhere
• NCCS facility at NASA Goddard Space Flight Center:
  – major weather/climate/oceanographic modeling and data assimilation
  – worldwide climate research
  – approx. 590 TeraFLOPS computing, 4 PetaBytes of online storage
• NAS facility at NASA Ames Research Center:
  – premier NASA supercomputing facility since 1983, focus on simulation for aerospace (CFD) and science (weather, climate, space science/solar dynamics/astrophysics)
  – approx. 4 PetaFLOPS computing, 14 PetaBytes of online storage
Climate Related Data
• Remote Sensing Data
• Assimilated Datasets (validation data)
• Model Output
• Climate Projections
Web portals: access to this data provided by tools and distributed systems that hold the data sets. A useful start.
Growth in types and sizes presents access challenges.
EOSDIS Portal
NASA Earth Exchange Portal
Science/High Performance Computing Requirements in a Nutshell
• Science datasets moving over WANs are often tens to hundreds of TeraBytes
• Large science flows are typically earth science, astro- and solar physics; these flows are sometimes referred to as “elephant flows”
• Network Round Trip Time (RTT) ranges from 1-2 ms (UC Berkeley, Stanford) through 8 ms (JPL) and 68 ms (NCSA) to 200 ms (University of Oslo)
• Good network performance over large RTT requires end-to-end network and host tuning, zero packet loss, optimizations like Jumbo Frames
• Consumer and commercially oriented desktop/laptop/handheld device networks and security appliances are engineered for a massive number of tiny to small flows (“mouse flows”)
• Consumer/commercial switches/appliances often drop packets or have buffers that are far too small and ill-behaved to work well on elephant flows
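Why loss and packet size matter so much at long RTT can be illustrated with the Mathis et al. approximation for loss-based (Reno-style) TCP: throughput ≤ (MSS/RTT)·(C/√p), with C ≈ √(3/2). The slides do not name this model; it is brought in here as a rough rule of thumb. The loss rate and the MSS values (1460 bytes for a 1500-byte MTU, 8960 bytes for Jumbo Frames) are assumptions, and `mathis_bound_bps` is a hypothetical helper name.

```python
import math


def mathis_bound_bps(mss_bytes: int, rtt_s: float, loss_rate: float) -> float:
    """Approximate upper bound on single-stream TCP throughput in bits/s."""
    c = math.sqrt(3.0 / 2.0)  # constant from the Mathis et al. model
    return (mss_bytes * 8 / rtt_s) * (c / math.sqrt(loss_rate))


# RTTs taken from the slide (upper end of the 1-2 ms range for nearby sites).
RTT_MS = {"UC Berkeley/Stanford": 2, "JPL": 8, "NCSA": 68, "University of Oslo": 200}
ASSUMED_LOSS = 1e-5  # assumption: one lost packet in 100,000

for site, rtt_ms in RTT_MS.items():
    standard = mathis_bound_bps(1460, rtt_ms / 1000, ASSUMED_LOSS)
    jumbo = mathis_bound_bps(8960, rtt_ms / 1000, ASSUMED_LOSS)
    print(f"{site:22s} RTT {rtt_ms:3d} ms: "
          f"{standard / 1e6:8.1f} Mbps (1500 MTU), {jumbo / 1e6:8.1f} Mbps (9000 MTU)")
```

Under this model the bound falls in proportion to RTT and rises in proportion to MSS, which is one way to read the slide's emphasis on zero loss and Jumbo Frames for distant sites.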
Example Elephant Flows
• Top: all traffic (2 days) via NREN => NAS
• Bottom: same 2-day time, NCSA => NAS
– 700 Mbps average over 48 hours
– 5 minute peaks to ~2.4 Gbps
– Roughly 14 TB dataset in ~32 hours
– Elephant flow was ~70% of total volume during that 2 day interval
– Network has necessary headroom to handle these peaks (of roughly 5 Gbps)
– Application: astrophysics/solar physics
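The figures above can be cross-checked with simple arithmetic. This is only a sketch: it assumes decimal units (1 TB = 10^12 bytes, 1 Mbps = 10^6 bit/s), and the helper names are hypothetical.

```python
def average_rate_mbps(terabytes: float, hours: float) -> float:
    """Average rate in Mbps needed to move `terabytes` in `hours` (decimal units)."""
    bits = terabytes * 1e12 * 8
    return bits / (hours * 3600) / 1e6


def volume_tb(rate_mbps: float, hours: float) -> float:
    """Volume in TB moved at a sustained `rate_mbps` for `hours` (decimal units)."""
    return rate_mbps * 1e6 * hours * 3600 / 8 / 1e12


print(f"14 TB in 32 h  -> {average_rate_mbps(14, 32):.0f} Mbps average")
print(f"14 TB in 48 h  -> {average_rate_mbps(14, 48):.0f} Mbps average")
print(f"700 Mbps, 48 h -> {volume_tb(700, 48):.1f} TB")
```

Moving 14 TB in about 32 hours works out to a little under 1 Gbps sustained, while a 700 Mbps average over the full 48 hours corresponds to roughly 15 TB, so the bullets are mutually consistent to within rounding.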
Example Elephant Flows (2)
• NCSA=>NAS
  – 8 hours at 2.0-4.2 Gbps
  – 9000-byte packets
• NAS=>UCSC
  – About 40 mins at 2.0-2.8 Gbps
  – 1500-byte packets
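One way to see what the packet size means for devices in the path is to convert these rates to packets per second. The sketch below treats the stated size as the whole packet and ignores framing overhead, which is a simplifying assumption; `packets_per_second` is a hypothetical helper.

```python
def packets_per_second(rate_gbps: float, packet_bytes: int) -> float:
    """Packets per second at a given rate, ignoring framing overhead."""
    return rate_gbps * 1e9 / (packet_bytes * 8)


# (flow, packet size in bytes, low and high rate in Gbps) as given on the slide
FLOWS = [
    ("NCSA=>NAS", 9000, 2.0, 4.2),
    ("NAS=>UCSC", 1500, 2.0, 2.8),
]

for name, size, low, high in FLOWS:
    print(f"{name}: {packets_per_second(low, size):,.0f} to "
          f"{packets_per_second(high, size):,.0f} packets/s at {size} bytes")
```

At the same bit rate, 1500-byte packets mean six times as many packets per second as 9000-byte packets, so per-packet processing costs on hosts and appliances rise accordingly.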
DHS TIC Architecture Requirements
• SBU data processing/storage elements to be inside/behind TIC
• All traffic monitored (e.g. via optical splitter)
• Limited WAN border/TIC locations
• Science external connectivity is unusual to DHS
• Most civilian Federal agency connectivity looks similar to business IT
DHS TIC Architecture Requirements (continued)
• Ingress and egress data flows of all (TCP/UDP) connections must be routed through the same physical TIC location (Symmetric Routing through TICs)
• TIC links leading to local client-computer LANs have to be configured such that a stateful firewall appliance or “stack” (w/ IDS, IPS, web proxy, VPN, etc.) may be inserted in the path
• Packet capture and retention requirements
  – 24-hour full packet capture at link capacity is a requirement
  – access to previous 24 hrs required
• Centralized response management
  – Ability of centralized agency directive to block an address (or address range) and have it take effect immediately
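To give a sense of scale for the 24-hour full packet capture requirement, the sketch below estimates worst-case storage for a link running at line rate. It assumes sustained full utilization, decimal terabytes, and no compression or indexing overhead; the link speeds are the 10/40/100 Gbps WAN rates mentioned earlier in the deck, and `capture_storage_tb` is a hypothetical helper.

```python
def capture_storage_tb(link_gbps: float, hours: float = 24, directions: int = 2) -> float:
    """Worst-case capture volume in TB for a link saturated in `directions` directions."""
    bytes_per_second = link_gbps * 1e9 / 8
    return bytes_per_second * hours * 3600 * directions / 1e12


for gbps in (10, 40, 100):
    one_way = capture_storage_tb(gbps, directions=1)
    both = capture_storage_tb(gbps, directions=2)
    print(f"{gbps:3d} Gbps link: {one_way:,.0f} TB one direction, "
          f"{both:,.0f} TB both directions per 24 h")
```

Real links rarely run at line rate for a full day, so these are upper bounds rather than expected volumes.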
Enterprise Routing (notional)
[Figure: notional enterprise routing diagram. Labels: external peers, External Peering Network, TIC Boundary, TIC-1 through TIC-n, symmetric ingress/egress, Internal Wide Area Network, BP, LAN.]
Legend: TIC-n = Trusted Internet Connection #n; BP = Center Border Protection Services (FW, IDS, Content Filter)
Science Border/WAN Architectural Goals and Designs
• DTN – Science DMZ
  – Special border DMZ data transfer hosts optimized for WAN performance
  – Many supercomputer/big data centers implement this now
  – Requires close cooperation w/ Security to get both performance and security
• On-demand path reservation
  – ESnet OSCARS provides VLAN-based reservations today within ESnet
  – Goal: signal end-to-end path from DTN host across LAN, I2, ESnet, transport nets
  – OSCARS connection via NREN provides path across ESnet for augmented access for NEX today
• Improved ease-of-data-access among partners
  – Integrated Globus access with DTN/Science DMZ; integrate PIV/token authentication
  – Improved data exportation (Who can read data? Who can change it? Re-exportation?)
  – Cloud storage architecture and high-speed access: both external commercial and FedRAMP compliant that is inside auth perimeter
Reference Science DMZ Architecture
Site: http://fasterdata.es.net/science-dmz/science-dmz-architecture/
A Possible Science DMZ Architecture within the TIC context
[Figure: possible Science DMZ architecture. Labels: external partners, WAN, Science net exchange fabric, SciDMZ switch/router, FW, IDS, DTN, perfSONAR, TIC-n, TIC Boundary, External Peering Network, Internal Wide Area Network, science project resources.]
This diagram does not reflect a NASA plan or architecture. It is for discussion purposes only.
Science DMZ/Data Transfer Node
• Operational problems it solves:
  – Inability to control features and defaults that supercomputing vendors support
  – Inability to control end users' environment, both network and host
  – Effort required to coordinate all system configurations and parameters in the supercomputing environment
• Science DMZ border nodes can be configured for optimal WAN transfers
  – Improved utilization of underlying WAN (E2E Jumbo Frames, big buffers)
  – May also integrate easier external user authentication (Globus, PIV)
  – May also integrate end-to-end reservations; additional security features
Desired access among partners
• Globus Online/GridFTP users would like to use their Globus credentials
• PIV card users would like to use PIV single-sign-on capability
• Users would like to allow easier data sharing between supercomputers and other facilities that they use
• Security issues to be resolved
  – Re-exportation of data
  – Third-party control of sharing of semi-confidential data
  – Trust among Globus user communities
• Implementation on Science DMZ would allow limited trust of credentials without expanding trust to high-value internal resources
• Establish coordination via Identity, Credential, and Access Management group (ICAM)
On demand path reservation
• Multiple approaches
  – Software Defined Networking (SDN) with OpenFlow, ESnet OSCARS (assisted setup of VLAN paths), manually provisioned VLANs, policy-based routing
• OSCARS used to support NEX <-> EDC path
• NASA Ames/CET lab has access to experimental 40/100G capabilities but is not yet equipped to provide SDN switching capability at those speeds
  – Possible test partners include CENIC, Internet2, NSF CC-NIE recipients, ESnet
  – Establish how to provision paths without endangering operational traffic
  – Integrate with end-user system (probably Science DMZ server)
• Enable Science DMZ users to easily establish more optimal path end-to-end
Existing SDN in the WAN supports NASA Earth Exchange
[Screenshot: MyESnet OSCARS circuit page for es.net-4003 (GPN - NASA, VLAN 3025, 200M, 08-01-2013 to 08-01-2014), timestamped 2014-01-24 19:09, with traffic graphs (A to Z delivered, Z to A delivered) and a path through sunn-cr5, sacr-cr5, denv-cr5 and kans-cr5.]
• Existing static OSCARS VLAN path NAS-NREN-(ESnet VLAN)-EDC
  – NEX data fetch EDC => HEC
  – “200 Mbps”, occasionally 650M/1000M
  – Avoids low performance default route, long RTT
• SDN goal for WAN – allow project DTN host-host signaling through multiple domains
• ESnet OSCARS traffic – EDC => NAS
  – 14 TB/2 days
  – 650 Mbps avg
  – RTT 43ms
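The RTT and rate quoted for this path indicate how much data must be in flight to keep it full, which is the bandwidth-delay product (rate × RTT). The sketch below applies it to the EDC => NAS numbers; it assumes decimal units and a single stream, and `bdp_bytes` is a hypothetical helper.

```python
def bdp_bytes(rate_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes in flight needed to sustain rate_bps at rtt_s."""
    return rate_bps * rtt_s / 8


RTT_S = 0.043  # 43 ms RTT from the slide

for label, rate_mbps in (("reserved", 200), ("observed average", 650), ("occasional peak", 1000)):
    window = bdp_bytes(rate_mbps * 1e6, RTT_S)
    print(f"{label:17s} {rate_mbps:5d} Mbps -> about {window / 1e6:.2f} MB in flight")
```

A sender whose TCP window or socket buffer is smaller than this value cannot reach the corresponding rate on a single stream, which is part of why DTN host tuning matters on this path.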
Possible Futures – Clouds, etc.
• Internal vs External Clusters; clustered Science DMZ DTNs
• Cluster Federation (identity, authorization, access) among participating organizations
• Virtualized network services on VM clouds
• SDX – software defined exchange: coordinated access to clusters and distributed storage capabilities