Post on 26-Jan-2015
Data Tactics Corporation
04/10/2023
Cloud Overview

Intro to Cloud Technologies: Agenda
1. Intro to Data Tactics
2. Cloud
3. Data
4. Hardware
5. Example Cloud Solution
Data Tactics: Who Are We?
• Established in 2005
– Created by a group of seasoned engineers, analysts, and management specialists who have worked together for over thirty years
– A Minority-Owned Small Business registered with CCR and ORCA
– TS Facility Clearance
• Locations in McLean, VA & Aberdeen Proving Ground, MD
– Advanced Lab Facilities
– Integrated Development, Test, Integration, and Evaluation Facilities
– Host six (6) clouds for the Army and DARPA
– Demonstration Rooms
– One Sponsored, Certified and Accredited SCIF
• Prime Contract Vehicles
– Army RDECOM BAA
– Army I2WD BAA
– GSA Alliant Small Business
– Subcontracts with several LSI firms across DoD and IC
• Certifications
– ISO 9001 – Quality Management Systems (May 2010)
– ISO 27000 – Information Security Management Systems (May 2010)
Data Tactics: What We Do
• Data Architecture
– Innovation and Design
– Assessment and Benchmarking
– Collaboration and Uniformity
• Data Engineering
– Discovery, Ingestion, and Cleansing
– Scientific Analysis
– Large-Scale Computation and Platforms
• Data Management
– Security and Assurance
– Infrastructure and Administration
– Visualization and Dissemination
Data Tactics Solutions Spectrum
Data Tactics: Our Family
• Over One Hundred Fifty Employees
– Deeply experienced, very successful leadership team, rich in relationships
– 90% TS/SCI cleared, many with polygraph(s)
– Employee retention near 90%
• Steeped in and Dedicated to the Data Tactics Vision
• High percentage of Military and Intelligence Community veterans
– Personnel Certifications
• ITIL V3 Foundation
• PMI-certified project managers
• CISSP-certified security managers
• Cloudera Certified Engineers – 35% of Technical Staff
• Software Certifications – Java, Solaris, Linux, Microsoft, Oracle, VMware, IRIX
• Hardware Certifications – Riverbed, EMC, SUN, Dell
• Architecture – SOA, DoDAF, other Modeling
• Over 10% of Staff are “Data Scientists”
• Three world-class semantic researchers
– 25% have Advanced Degrees, including Doctorates
Data Tactics: Cloud Experience
• 5 Clouds on SIPRNET
– 3 at our secure facility in Tyson’s
– GISA, Ft. Bragg
– Afghanistan
• 4 at TS/SCI
– AF TENCAP on JWICS
– NRL on JWICS
– DARPA
– INSCOM
– DSC (pending)
• Over a dozen at the Unclassified/FOUO level
– Supporting real-world missions on contract
• CX-I Cloud in Afghanistan
– At various levels of complexity
• Cloud domains are where we live; data is the hard problem
Cloud - The Easy Part
• According to NIST: Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources that can be rapidly provisioned and released with minimal management effort or service provider interaction.
NIST Definition
• Five essential characteristics of Cloud Computing:
1. Broad Network Access: Capabilities are available over the network and accessed through standard mechanisms that promote use by heterogeneous thin or thick client platforms
2. Rapid Elasticity: Capabilities can be elastically provisioned and released
3. Measured Service: Cloud systems automatically control and optimize resource use by leveraging a metering capability
4. On Demand Self Service: A consumer can unilaterally provision computing capabilities, such as server time and network storage, as needed automatically without requiring human interaction with each service provider.
5. Resource Pooling: The provider’s computing resources are pooled to serve multiple consumers using a multi-tenant model.
• From a service-model perspective, cloud is also divided into:
– SaaS, PaaS, IaaS
• Software as a Service [SaaS]
– A type of reusable bundle of functionality (typically business-related and infrastructure-related) which may be accessed by other cloud computing components, software, or end users directly to create meta-applications. These bundles of functionality execute within the cloud.
NIST Definition (Cont.)
• The Run-Time Platform [PaaS]
– A solution stack as a service that facilitates deployment and run-time of applications that require specialized run-time environments: J2EE (clustered), .NET (clustered), web technologies (basic servlet, web services; basic or clustered), virtualization, HPC programs, grid
• The Cloud Infrastructure [IaaS]
– Compute service, storage service, data parallelization service, remote access service, management service, and security service
• Deployment Models are divided into: Public, Private, Hybrid, Community
– Public Cloud: The cloud infrastructure is provisioned for open use by the general public.
– Private Cloud: The cloud infrastructure is provisioned for exclusive use by a single organization comprising multiple consumers (e.g., business units).

NIST Definition (Cont.)
– Hybrid Cloud: The cloud infrastructure is a composition of two or more distinct cloud infrastructures (private, community, or public) that remain unique entities, but are bound together by standardized or proprietary technology that enables data and application portability.
– Community Cloud: The cloud infrastructure is provisioned for exclusive use by a specific community of consumers from organizations that have shared concerns (e.g., mission, security requirements, policy, and compliance considerations).
Beyond the NIST Definition
• Limited to three service models
– IaaS, PaaS, SaaS
• Does not identify the capabilities of distributed systems
– Big Data’s revolutionary capabilities
– Linear scaling
– Logarithmic reduction in processing times
– Dramatic cost reductions to accomplish tasks formerly in the realm of traditional HPC
Evolution of Cloud in the IC/DoD
– JIOC-I
– DCGS-A
• Version 2
• Version 3
• Rainmaker
• DSC
– DARPA
– I2 Pilot
• Customer requirements drive the solution
– Not cookie-cutter
• Budget, data, performance, security, and existing components are all drivers
IC Cloud ‘Flavors’
• Compute Cloud (or Data Cloud)
– Handles the 3 Vs – volume, velocity, and variety of data
– Petabyte to exabyte scale with linear scaling
– BigTable-like construct and supporting capabilities
• Hadoop Distributed File System (HDFS)
• MapReduce
• ZooKeeper
• Accumulo
• Utility Cloud
– Contains the suite of services/apps that interact with the Compute Cloud
• Ozone Widget Framework (OWF)
• Web Tier – Data Visualization Tools
• Auditing
• Storage Cloud
– Scalable storage for large files – Imagery, FMV
– Shared Directories
• Most IC cloud implementations are a combination of all three flavors
Data and Utility Cloud Example
LAYER 4 – Cloud Analytics
LAYER 3 – Cloud Services
LAYER 2 – Cloud Software
LAYER 1 – Cloud Hardware
[Diagram: Hardware, Utility Cloud, GHOSTMACHINE]
Cloud Stack Feature List Example
• User Capability
– Thin-Client Dataspace Retrieval via Query
• Textual or geospatial query
• Display with time-wheel, geospatial map display, link-graph display
– Thin-Client Search for Resolved Entities
• Display with Document, Entity Graph, Timeline, Timewheel, and Entity Viewers
– User Upload of Analyst Products to Cloud
– Persistent Data Query and Alerting (on ingest)
– Integrated Chat Widget, and Widget for Querying External Systems
– Uniform Widget Experience and Data Sharing between Widget Views
• Data Sourcing
– Flexible, secure Data Ingest Architecture
– Ingest processes three examples of data formats:
• Unstructured artifacts (e.g., free-form report)
• Semi-structured artifacts (e.g., email message)
• Structured artifacts (e.g., RDB table, XML document)
Example: Cloud Stack Feature List (Cont.)
• System Capability
– Advanced Analytics for Entity Extraction, Resolution, and Link Analysis
– API for Retrieval of Artifacts, Metadata, and Semantic Indices
– API for Application Access to Analytics Results
– Cloud-to-Cloud Integration
• Between applications running inside the cloud, and
• Between applications running inside and applications running outside the cloud
• Multiple messaging formats and tools supporting inter-cloud integration for data, services, and resources
– Cloud Computing Infrastructure, Scalable Storage with Double Redundancy
– Security Infrastructure with Role- and User-Based Access Control
• Management Capability
– Log Viewer App: Ability to Monitor Users and Activities in the Cloud
– User Management App: Ability to Define Access for Users
– Ingest Monitor Widget: Ability to Track Progress of Data Ingest
– Bulk Export/Import App for Dataspace
– Cloud Management System for Monitoring and Control
– Workflow Management System
Data – The Hard Part
“The need to securely share actionable, timely, and relevant classified information among state, local, tribal, and private sector partners in support of homeland security is critical as we work together to address evolving threats,” said Secretary Napolitano.
The data architecture is divided between 1) a Mission Dataspace and 2) an Operational Dataspace.
• Mission Dataspace
– The primary business driver (or mission) for our customers is to support the Intelligence Community (IC) with a “solution for intelligence data integration, search, exploration, enrichment, exploitation, management, and sharing”
• The data on which these activities are undertaken is referred to as mission data and is stored in the Mission Dataspace.
• Operational Dataspace
– A location to persist operational data – data that is directly used/created by infrastructure application software to support its operation/execution, which in turn supports the mission
• Includes input data (information entering the system for the first time from a system or end user), work and message queues, temporary results, configuration files, and any purely transient information
– Typically this data has a very narrow purpose (that of supporting a particular business or infrastructure application).
The Dataspace can be implemented using 1) HDFS, 2) Cloudbase, 3) Cassandra, 4) MySQL, 5) FS (local, SAN), or 6) Oracle (limited).
Data Architecture
Data Models
Unified Dataspace Example
The Wild
• Data sources with rich data & semantic context locked in domain silos
• Data tightly coupled to data models
• Data models tightly coupled to storage models
• Silos isolated by:
– Implementation technology
– Storage structure
– Data representation
– Data modality
[Diagram: Segment 1 (Artifact Description) over unstructured data, Segment 2 (Data Description) over structured data, and Segment 3 (Model Description), providing rich data and semantic context, with integration, enrichment, exploitation, and exploration across all sources]
• Structure
– Segment 1: Artifact Description Framework (ADF)
• Universal store for unstructured data (documents)
• Indexes
– Segment 2: Data Description Framework (DDF)
• Universal store for structured data (entities, attributes, relationships)
– Segment 3: Model Description Framework (MDF)
• Universal store for data / knowledge models
– Reference Data
• Used to “normalize” data in other segments
• Used to support business functionality (e.g., lists of alternative name spellings for search, dictionaries)
– Inverted Indexes
• Specialized indexes to support business functionality (search, analytics)

Mission Dataspace Data Model
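The inverted-index idea above can be sketched in a few lines. This is a toy builder only – the artifact IDs, sample text, and helper names are illustrative, not part of the actual Dataspace API – but it shows how tokens map back to the artifacts that contain them:

```python
# Minimal inverted-index sketch (illustrative; the real cloud keeps
# distributed index shards, not an in-memory dict).
from collections import defaultdict

def build_inverted_index(artifacts):
    """Map each token to the set of artifact IDs containing it."""
    index = defaultdict(set)
    for artifact_id, text in artifacts.items():
        for token in text.lower().split():
            index[token].add(artifact_id)
    return index

def search(index, token):
    """Return the artifact IDs whose text contains the token."""
    return index.get(token.lower(), set())

docs = {
    "rpt-001": "convoy observed near Kabul",
    "rpt-002": "convoy departed at dawn",
}
idx = build_inverted_index(docs)
print(search(idx, "convoy"))  # both report IDs
```

A production index would also carry positions and term statistics to support ranking; this sketch keeps only the token-to-artifact mapping.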
Data Description Framework (DDF)
• DDF looks at data in the following ways:
– Mention: A chunk of data, either physically located within a tangible artifact or contained within an analyst’s mind
• E.g., “Washington” at offset x in file Y
– Sign: A representation of all disambiguated mentions that are identical except for their indexicality
• E.g., “Washington”
– Concept: An abstract idea, defined explicitly or implicitly by a source data model
• E.g., City, Person, Name, Address, Photo
– Predicate: An abstract idea used to express a relationship between “things”
• E.g., isCity, isPerson, hasName, hasAddress, hasPhoto
– Term: A disambiguated sign abstracted from the source artifact or asserting analyst
• E.g., Washington the Person; Washington the Location
– Statement: Encodes a binary relationship between a subject (term) and an object, mediated by a predicate
• E.g., [Washington, Person] hasPhoto [GeorgeWashingtonImage.jpg]
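The DDF vocabulary above maps naturally onto a few small record types. This is an illustrative sketch only – the field names are assumptions, not the actual DDF schema – but it shows how the same sign (“Washington”) yields two distinct terms once disambiguated by concept:

```python
# Toy encoding of the DDF notions (Mention -> Sign -> Term -> Statement).
# Class names mirror the slide's vocabulary; the fields are assumptions.
from dataclasses import dataclass

@dataclass(frozen=True)
class Mention:
    text: str        # the raw chunk of data, e.g. "Washington"
    artifact: str    # the artifact it occurs in
    offset: int      # where in the artifact

@dataclass(frozen=True)
class Term:
    sign: str        # shared surface form, e.g. "Washington"
    concept: str     # disambiguating concept, e.g. "Person" or "Location"

@dataclass(frozen=True)
class Statement:
    subject: Term
    predicate: str   # e.g. "hasPhoto"
    obj: str

george = Term("Washington", "Person")
state = Term("Washington", "Location")   # same sign, different term
stmt = Statement(george, "hasPhoto", "GeorgeWashingtonImage.jpg")
print(stmt.subject.concept)  # Person
```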
Operational Dataspace
• End user storage (documents, preferences, products)
• System events/traps
• Performance/resource utilization metrics/history
• Application log messages
• Messaging infrastructure message persistence
• Data Surveillance: watch patterns, subscriptions, and notification profiles; may also need some working space
• Temporary indexes as well as final index sections (shards)
• Persistence of distributed state in case of total failure
• Directory for digital certificates (LDAP)
• Directory for security authorizations (LDAP)
• Security audit events
• Threat assessment results
• Vulnerability assessment results
• “Scratch” area used by various applications
• Working area to move files in/out of cloud
• Policies, rules, configurations, etc
• CM repository
Good Questions to Ask
Examples:
1. What are your requirements for cloud computing?
   1. Integrate a federated workforce into headquarters business processes
      1. How many?
   2. Enterprise storage capabilities
      1. For HQ, or regions across the world/country?
   3. Provide analytics for discovering and creating knowledge
   4. Sharing information
2. What are the handling requirements for your data?
   1. Classified/LES
   2. US Persons
   3. Title 6, 10, and/or 50
   4. ICD 501/503
   5. MOUs
3. What is the anticipated security level associated with your cloud vision?
   1. PL2, PL3, PL4?
4. What are the complexities associated with your data in its current state?
   1. Unstructured documents on a shared drive
   2. Structured data in a legacy mainframe
   3. Semi-structured documents with strict handling procedures (stored in ECM?)
   4. Amount of data (GB vs. TB)
5. What is your budget?
   1. Open-source vs. open-source-plus-COTS solution
6. What is your timeline?
   1. The solution can be driven by speed of delivery vs. functional requirements, as an example
      1. Leverage existing cloud solutions as a starting point, rather than a final product
7. What components (software & hardware) are available for reuse?
   1. Servers, SANs, networking gear
   2. Metadata extractors
   3. One-way guards
   4. VM licenses
A Real-World Example
Building up the Cloud
Distributed Common Ground System – Army (DCGS-A) Standard Cloud (DSC)
Business Need for DSC
Bridge the whole IC and all Services with an open data and processing capability – the Dataspace
Break the Data Barriers
• End data silos and their proliferation
• Provide a universal data storage and computational fabric
• Make data ingest faster and simpler
• Allow data to be endlessly reshaped / reused
• Search, enrich, integrate/fuse, and exploit within and across all data sources and domains

Stop Moving Data, Start Using Data
• Ingest once, reuse endlessly
• Move computation to the data (and not data to the computation)
• Build highly sophisticated exploitation tools and applications
• Create quick mashups and mission applications
• Surf around and explore the entire Intel Dataspace
• Connect all the dots in any way that makes sense from any mission perspective
• Change your mind and do it again, and again… in new ways without messing up what you already have

Achieve Previously Unachievable Scale
• Go bigger, faster, larger
• Realize a truly large-scale data store
• Embrace an unbounded diversity of data, processing, and applications
• Achieve orders of magnitude greater processing power
• Expose familiar usage metaphors (e.g., Google, Amazon)

Get More Bang for the Buck
• Deploy using fully automated procedures
• Avoid almost all SW licenses
• Stay up and running with an inherently robust design that uses commodity HW

Do New Science and Develop New Practice Around Intelligence
• Explore data and processing at an entirely new scale and discover new insights and phenomena
• Cultivate an ever-growing, increasingly rich, and productive Dataspace
DSC Software Stack
• Software as a Service
– Client Services
– PREFS / OWF / Safemove / OpenFire
– V3 / MFWS / DIB / GeoSpatial / BC
– GeoServer / Element Index / AntiVirus / AIDE
– DSMS / ASLI / ActiveMQ / Alerting
– Ozone Widgets
• Platform as a Service
– MapReduce / HDFS / Flume / Oozie
– Cloudbase / Katta / Zookeeper
– JVM / Apache HTTP/Proxy / Tomcat
– Condor / Cloud Management System
– Logging / Auditing / Nagios / Ganglia
• Infrastructure as a Service
– HPSA (Puppet?) / HPNA
– DNS / DHCP / NFS / NTP
– Linux / LDAP / MySQL / CAS
– Servers / SAN / Network / Facilities
Facility
• The DSC production hardware is housed in a single twenty-foot Performance-Optimized Data center (POD)
• The POD is configured to maximize its hardware payload while taking into consideration:
– Overall power availability
– Individual device power consumption and power dissipation
– Individual device weight
– Individual device heat generation
Infrastructure – Hardware Profile
• Two rack types
– Compute – 222 servers
– Management – 6 servers
• 1,824 cores
• ~100,000 MIPS (assuming Java, 50 CPI)
• 1.035 PB disk storage (raw)
• 13.92 TB physical memory (RAM)
• Environmental support
– Active power w/ backup generator
– Two live coolers w/ backup cooler
Compute Server – Profile
• Processor:
– Two quad-core X5570 2.93 GHz Intel Xeon CPUs -> 8 cores per server
• Memory configuration – varies:
– 25 (of 222) nodes with 144 GB memory via 18 8 GB DIMMs [approx. $14K]
– 75 (of 222) nodes with 72 GB memory via 18 4 GB DIMMs [approx. $10K]
– 122 (of 222) nodes with 36 GB memory via 18 2 GB DIMMs [approx. $8K]
– 25 × $14K + 75 × $10K + 122 × $8K = $2.076M
• Storage:
– Eight 500 GB 6G SAS 7.2K 2.5-in MDL disk drives
– RAID 5
• Power:
– N+N 750 W power supplies
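As a quick sanity check, the per-tier prices quoted above (the slide's approximations, not current list prices) do sum to the $2.076M total across all 222 compute nodes:

```python
# Verify the memory-tier cost arithmetic from the slide.
tiers = [
    (25, 14_000),   # 144 GB nodes, approx. $14K each
    (75, 10_000),   # 72 GB nodes,  approx. $10K each
    (122, 8_000),   # 36 GB nodes,  approx. $8K each
]
total = sum(count * price for count, price in tiers)
print(f"${total / 1e6:.3f}M")  # $2.076M, matching the slide
```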
Network Architecture
Key design features:
• Separation of mission and management/operational data to ensure security and performance of the solution
– Using VLANs
• Connection to the DSC cloud will be restricted to entry-point nodes for a single security choke point (Cloud Access Point Nodes)
– Greatly simplifies boundary security
– The POD internal network will be non-routable (10.x.x.x) with external access only through the two entry points
• Redundant paths from servers and enclosures via stacking cables to redundant switches at the core provide resiliency against core-switch failures as well as cabling faults
– Each node has access to two independent switches in the enclosure

Network Architecture (cont’d)
• Compute racks and a management rack
• SAN connected to all 6 management nodes
• 10 GbE core switches
• Single access point through NAC
• Leaf switches interconnect all nodes
DSC Cloud Elevations
• 222 Compute Nodes – 1,776 cores, 12 TB RAM, 888 TB DFS
• 6 Management Nodes
• 2 SANs – 168 TB
• Network Nodes
DSC 1.5.3 UI
• SandStorm Common Map Widget
• Time Line

DSC 1.5.3 UI (Cont.)
• ICast
• Cloud Text Analytics
• Element Graph Viewer
BACKUP SLIDES
Definition: Node Types
• Cloud Head Node:
– These nodes are responsible for executing the various cloud service “masters”
– These masters are collocated because many of them work together. One node may be responsible for running the Mission Dataspace version of these services, and another node may support the Operational Dataspace. A third node may act as a failover.
• Cloud Access Point Node:
– These nodes host the Web Infrastructure Services (web servers and proxies) and portions of the Application and Systems Integration Subsystem, and act as a physical gateway into the cloud.

Definition: Node Types (Cont.)
• Cloud Infrastructure Nodes:
– These nodes are divided into two categories:
• Low-level infrastructure applications such as DNS, DHCP, and NTP (part of Core Infrastructure Services)
• Cloud services (workers corresponding to the Cloud Head Nodes, see above)
• Cloud Management Nodes:
– These nodes run general-purpose applications used to manage the cloud, such as: Identity and Access Management Subsystem, Cloud Management System, Map Server, Chat Server, Cloud Logging Subsystem, and Cloud Monitoring and Metering.
• Cloud Client Node:
– These run business applications that use cloud services, such as Ingest and Analytics.
Definition: Node Types (Cont.)
• HDFS – Hadoop Distributed File System
• HDFS Master / NameNode
– Executes the file system namespace operations, such as reading, writing, renaming, and deleting files
– The NameNode server is responsible for mapping file blocks onto the DataNode servers
• HDFS Workers / DataNodes
– Functions include storing and retrieving file blocks from the native operating system’s file system
– Coordinate with the NameNode to perform block creation, deletion, and replication
– Called by MR jobs to serve read/write requests in a highly distributed manner
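The NameNode's block-mapping role described above can be sketched in miniature. Real HDFS keeps this map in the NameNode's memory and places replicas rack-aware (default replication factor 3); the round-robin placement and node names below are purely illustrative:

```python
# Toy NameNode: file names map to block IDs, each block to the DataNodes
# holding a replica.
class ToyNameNode:
    def __init__(self, replication=3):
        self.replication = replication
        self.block_map = {}  # filename -> list of (block_id, [datanodes])

    def add_file(self, name, num_blocks, datanodes):
        blocks = []
        for i in range(num_blocks):
            # Round-robin replica placement; real HDFS is rack-aware.
            replicas = [datanodes[(i + r) % len(datanodes)]
                        for r in range(self.replication)]
            blocks.append((f"{name}_blk{i}", replicas))
        self.block_map[name] = blocks

    def locate(self, name):
        """Return the block list a client would use to read the file."""
        return self.block_map[name]

nn = ToyNameNode()
nn.add_file("report.txt", 2, ["dn1", "dn2", "dn3", "dn4"])
print(nn.locate("report.txt"))
```

Clients only ask the NameNode *where* blocks live; the actual byte transfer happens directly against the DataNodes, which is what keeps the master lightweight.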
Structured Storage Service
• Cloud Structured Storage Service
– Responsible for providing a highly scalable and highly available logical structured storage capability, similar to what is traditionally known as a database
– Can support BILLIONS of rows and MILLIONS of columns
– Columns can be added at run time to accommodate data
– Based on:
• NSA Cloudbase GOTS
• Cassandra
Cloudbase
• Cloudbase is a Java-based, distributed database, based on the Google BigTable design, created at the NSA
• Based on a Master/Worker model
– One Master (daemon) – keeps the overall metadata
– Multiple Workers (TabletServers) – store database tables in HDFS
• Uses HDFS to:
– Store tables (data)
– Store recovery logs, write-ahead logs
• Supports cell-level security
– Security markings defined by the application, stored and enforced by Cloudbase
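The cell-level security model above can be illustrated in miniature: each cell carries a visibility expression, and a read succeeds only if the caller's authorizations satisfy it. Real Cloudbase/Accumulo visibility expressions support AND, OR, and grouping; this sketch assumes AND-only ("&") expressions for brevity, and the cell layout is illustrative:

```python
# Toy cell-level visibility check in the spirit of Cloudbase/Accumulo.
def visible(expression, authorizations):
    """True if every label in an '&'-joined expression is held by the caller."""
    labels = {label.strip() for label in expression.split("&")}
    return labels <= set(authorizations)

def read_cell(cell, authorizations):
    """Return the value only if the caller may see it, else None."""
    return cell["value"] if visible(cell["vis"], authorizations) else None

cell = {"row": "r1", "family": "loc", "value": "Kabul", "vis": "SECRET&REL_USA"}
print(read_cell(cell, ["SECRET", "REL_USA"]))  # Kabul
print(read_cell(cell, ["SECRET"]))             # None
```

The key property is that the check happens at the storage layer per cell, so one table can safely hold data at many markings.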
Cassandra
• Cassandra is a highly scalable, eventually consistent, distributed, structured key-value store – implemented as a Distributed Hash Table (DHT)
• Datastore for Synthesys and Hypergraph analytics
• P2P distribution model – which drives the consistency model – means there is no single point of failure
– Each peer server is responsible for a portion of the Distributed Hash Table (a range of keys)
– Each peer keeps track of all members in the cluster
• Writes directly to the local file system – no HDFS
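The peer-to-peer keyspace partitioning above is essentially consistent hashing: each peer owns a slice of a hash ring, and any node can locate a key's owner without a central coordinator. A minimal sketch (real Cassandra adds virtual nodes and replication; the peer names are illustrative):

```python
# Toy consistent-hash ring: a key is served by the first peer at or after
# its position on the ring.
import hashlib
from bisect import bisect_right

def ring_position(value, ring_size=2**16):
    digest = hashlib.md5(value.encode()).hexdigest()
    return int(digest, 16) % ring_size

class ToyRing:
    def __init__(self, peers):
        self.ring = sorted((ring_position(p), p) for p in peers)

    def owner(self, key):
        positions = [pos for pos, _ in self.ring]
        # Wrap around the ring if the key hashes past the last peer.
        i = bisect_right(positions, ring_position(key)) % len(self.ring)
        return self.ring[i][1]

ring = ToyRing(["peer-a", "peer-b", "peer-c"])
print(ring.owner("some-row-key"))
```

Because ownership is a pure function of the hash, adding or removing a peer only remaps the keys in that peer's slice, which is what makes the scheme elastic.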
Data Parallelization
• Cloud Processing Parallelization Service
– Supports the ability to easily leverage the large amount of processing resources available across multiple nodes
• Instead of being limited to one node or a small number of nodes preconfigured in some type of physical cluster
– On DSC this may mean:
• Parallelizing the processing of a file/source (e.g., ingest) to improve overall performance
• Parallelizing the processing of data in the database in support of some analytic (exploitation or enrichment) or indexing, thereby improving overall performance and response time
– Based on the Hadoop MapReduce facility for data-centric parallelization
– Note – algorithmic parallelization (à la MPI) is a likely future need
MapReduce (MR)
• A facility to parallelize processing of files, with fault tolerance
• Implemented as a Master/Worker model
• MR Job Tracker
– Determines how to parallelize a job/application (using a job configuration) and then schedules the work on a set of distributed Task Trackers (workers), each of which executes a portion of the job/application in parallel, monitoring them and re-executing failed tasks
– Tries to assign the work to the node where the data is located in HDFS
• MR Task Tracker
– Task Trackers are given the application software to execute and specifications for which “data split” they need to process
– Periodically report progress/health back to the Job Tracker
• MR Job/Application
– A job/application needs to be divided into various structural elements: a Mapper, a Reducer, a Partitioner, a Reporter, and an Output Collector
– The job/application logic reads/writes data from the Dataspace via the DSMS MR Helper API
• DSC Job Service:
– Facilitates the submission and monitoring of MR jobs via a UI
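The Mapper/Partitioner/Reducer flow described above can be simulated in a single process. Hadoop's contribution is running each phase fault-tolerantly across distributed Task Trackers; the phases themselves look like this word-count sketch:

```python
# Single-process simulation of the map -> shuffle -> reduce flow.
from collections import defaultdict

def map_phase(documents):
    """Mapper: emit (key, value) pairs, here (word, 1)."""
    for doc in documents:
        for word in doc.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    """Partitioner/shuffle: group all values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reducer: fold each key's values into a result."""
    return {key: sum(values) for key, values in groups.items()}

docs = ["the convoy moved", "the convoy stopped"]
counts = reduce_phase(shuffle(map_phase(docs)))
print(counts["convoy"])  # 2
```

In a real job each of these callables runs on a different set of nodes, and the shuffle moves data over the network between the map and reduce waves.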
Cloud Logging
• Cloud Logging Subsystem (CLS)
– A proper logging facility is an extremely important service in an ultra-large-scale environment
– Key functionality of the CLS includes:
• Support for custom and legacy applications
• Support for specialized cryptographic operations (e.g., encryption, digital signatures)
• Log “interaction” functionality, including searching, reporting, analysis, viewing, etc.
• Log management, including rotating and archiving
– Three modes:
• Java API
• Command-line bulk loader
• Log4J adapter
– The Security Auditing Subsystem leverages the capabilities of the CLS
• Keeps security audit separate
– Based on Cloudera Flume (essentially collectors and sinks)
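The collector-and-sink model noted above can be sketched as events fanning out from a collector to one or more sinks, with a predicate keeping the security-audit stream separate, as the slide describes. The class and field names here are illustrative, not Flume's API:

```python
# Toy collector/sink pipeline in the spirit of the CLS design.
class Collector:
    def __init__(self):
        self.sinks = []

    def add_sink(self, sink, predicate=lambda event: True):
        """Attach a sink; the predicate filters which events reach it."""
        self.sinks.append((sink, predicate))

    def log(self, event):
        for sink, predicate in self.sinks:
            if predicate(event):
                sink.append(event)

app_log, audit_log = [], []
collector = Collector()
collector.add_sink(app_log)                                   # everything
collector.add_sink(audit_log,
                   predicate=lambda e: e.get("security", False))  # audit only

collector.log({"msg": "ingest started"})
collector.log({"msg": "login failed", "security": True})
print(len(app_log), len(audit_log))  # 2 1
```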
Cloud Monitoring and Metering
• Holistic Monitoring
– Ability to provide a unified/consistent presentation of the health of all monitored components (business application software, infrastructure software, operating systems, hardware, network devices), whether custom-developed or third-party acquired
• Control
– Ability to control/change the behavior of a monitored component without restarting it
• Near Real Time
– Ability to alert system/network/security administrators in near real time in response to an event from “inside” a component
• Historic Trending
– Ability to store performance (including resource utilization) and health data of various components over time for analysis
• All devices (software and hardware) in the cloud are monitored, either using agents (push model) or by polling (pull model, or agentless)
– All DSC components will include a JMX agent to report their health and support some control (where appropriate)
– Based on Nagios agents and JMX agents
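The pull-model (agentless polling) side described above amounts to a poller that checks each component's health endpoint and raises alerts on failure, treating an unreachable or crashing check as unhealthy. The component names and checks below are simulated:

```python
# Toy pull-model health poller.
def poll(components):
    """components: name -> health-check callable returning True when healthy."""
    alerts = []
    for name, check in components.items():
        try:
            healthy = check()
        except Exception:
            healthy = False  # a crashing check counts as unhealthy
        if not healthy:
            alerts.append(name)
    return alerts

components = {
    "namenode": lambda: True,
    "tablet-server-07": lambda: False,  # simulated failed worker
    "chat-server": lambda: 1 / 0,       # simulated check that crashes
}
print(poll(components))  # ['tablet-server-07', 'chat-server']
```

A push-model agent inverts this: the component reports on its own schedule, and the monitor alerts when reports stop arriving.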
Cloud Management System
• Cloud Management System (CMS)
– Oversees the efficient operation of the cloud computing environment
• Condor
– Process control and monitoring – restarts a process if a failure occurs
– Distributed process pool
– Can start distributed processes from any node in the cloud
– Integrated with the DSC Cloud Management System
• DSC Cloud Management System (CMS)
– Defines a hierarchy of services and dependencies
– Starts/stops cloud services (via Condor)
• As a defined group
• Individually
– Views the status of running services via Nagios and exposed JMX beans
• HP Network Automation
– Monitors and configures network devices
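The CMS's hierarchy of services and dependencies implies starting services in dependency order: a service comes up only after everything it requires. A minimal sketch using a depth-first topological sort (the service names and edges are illustrative, not the actual DSC hierarchy):

```python
# Compute a valid start order from a service-dependency map.
def start_order(dependencies):
    """dependencies: service -> list of services that must start first."""
    order, visiting, done = [], set(), set()

    def visit(service):
        if service in done:
            return
        if service in visiting:
            raise ValueError(f"dependency cycle at {service}")
        visiting.add(service)
        for dep in dependencies.get(service, []):
            visit(dep)
        visiting.discard(service)
        done.add(service)
        order.append(service)

    for service in dependencies:
        visit(service)
    return order

deps = {
    "hdfs": [],
    "zookeeper": [],
    "cloudbase": ["hdfs", "zookeeper"],
    "web-tier": ["cloudbase"],
}
print(start_order(deps))
```

Stopping a defined group is the same walk in reverse, which is why encoding the hierarchy once gives the CMS both operations.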