Data Tactics DHS Introduction to Cloud Technologies wtc

Data Tactics Corporation
06/07/2022

Transcript of Data Tactics DHS Introduction to Cloud Technologies wtc

Page 1

Data Tactics Corporation

04/10/2023

Page 2

Cloud Overview
Intro to Cloud Technologies: Agenda

1. Intro to Data Tactics
2. Cloud
3. Data
4. Hardware
5. Example Cloud Solution

Page 3

Data Tactics: Who Are We?

• Established in 2005
– Created by a group of seasoned engineers, analysts, and management specialists who have worked together for over thirty years
– A Minority-Owned Small Business registered with CCR and ORCA
– TS Facility Clearance

• Locations in McLean, VA & Aberdeen Proving Ground, MD
– Advanced Lab Facilities
– Integrated Development, Test, Integration, and Evaluation Facilities
– Host six (6) clouds for the Army and DARPA
– Demonstration Rooms
– One Sponsored, Certified and Accredited SCIF

• Prime Contract Vehicles
– Army RDECOM BAA
– Army I2WD BAA
– GSA Alliant Small Business
– Subcontracts with several LSI firms across DoD and IC

• Certifications
– ISO 9001 – Quality Management Systems (May 2010)
– ISO 27000 – Information Security Management Systems (May 2010)

Page 4

Data Tactics: What We Do

• Data Architecture
– Innovation and Design
– Assessment and Benchmarking
– Collaboration and Uniformity

• Data Engineering
– Discovery, Ingestion, and Cleansing
– Scientific Analysis
– Large-Scale Computation and Platforms

• Data Management
– Security and Assurance
– Infrastructure and Administration
– Visualization and Dissemination

Data Tactics Solutions Spectrum

Page 5

Data Tactics: Our Family

• Over One Hundred Fifty Employees
– Leadership team: deeply experienced, very successful, rich in relationships
– 90% TS/SCI cleared, many with polygraph(s)
– Employee retention near 90%
– 25% have Advanced Degrees and Doctorates

• Steeped in and Dedicated to the Data Tactics Vision
• High percentage of Military and Intelligence Community veterans

• Personnel Certifications – 35% of Technical Staff
– ITIL V3 Foundation
– PMI certified project managers
– CISSP certified security managers
– Cloudera Certified Engineers

• Software Certifications
– Java, Solaris, Linux, Microsoft, Oracle, VMware, IRIX

• Hardware Certifications
– Riverbed, EMC, SUN, Dell

• Architecture
– SOA, DoDAF, other modeling

• Over 10% of Staff are “Data Scientists”
• Three WORLD-class semantic researchers

Page 6

Data Tactics: Cloud Experience

• 5 Clouds on SIPRNET
– 3 at our secure facility in Tyson’s
– GISA, Ft. Bragg
– Afghanistan

• 4 at TS/SCI
– AF TENCAP on JWICS
– NRL on JWICS
– DARPA
– INSCOM
– DSC (pending)

• Over a dozen at the Unclassified/FOUO level
– Supporting real-world missions on contract
– CX-I Cloud in Afghanistan
– At various levels of complexity

• Cloud domains are where we live
• Data is the hard problem

Page 7

Cloud – The Easy Part

Page 8

NIST Definition

• According to NIST: Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources that can be rapidly provisioned and released with minimal management effort or service provider interaction.

• Five essential characteristics of Cloud Computing:

1. Broad Network Access: Capabilities are available over the network and accessed through standard mechanisms that promote use by heterogeneous thin or thick client platforms.

2. Rapid Elasticity: Capabilities can be elastically provisioned and released.

3. Measured Service: Cloud systems automatically control and optimize resource use by leveraging a metering capability.

4. On-Demand Self-Service: A consumer can unilaterally provision computing capabilities, such as server time and network storage, as needed automatically without requiring human interaction with each service provider.

5. Resource Pooling: The provider’s computing resources are pooled to serve multiple consumers using a multi-tenant model.

Page 9

NIST Definition (cont.)

• From a service-model perspective, cloud is also divided into SaaS, PaaS, and IaaS.

• Software as a Service [SaaS]
– A type of reusable bundle of functionality (typically business-related and infrastructure-related) which may be accessed by other cloud computing components, software, or end users directly to create meta-applications. These bundles of functionality execute within the cloud.

• The Run-Time Platform [PaaS]
– A solution stack as a service that facilitates deployment and run-time of applications that require specialized run-time environments: J2EE (clustered), .NET (clustered), Web technologies (basic servlet, Web Services; basic or clustered), virtualization, HPC programs, Grid.

• The Cloud Infrastructure [IaaS]
– Compute service, storage service, data parallelization service, remote access service, management service, and security service.

Page 10

NIST Definition (cont.)

• Deployment Models are divided into: Public, Private, Hybrid, Community

– Public Cloud: The cloud infrastructure is provisioned for open use by the general public.

– Private Cloud: The cloud infrastructure is provisioned for exclusive use by a single organization comprising multiple consumers (e.g., business units).

– Hybrid Cloud: The cloud infrastructure is a composition of two or more distinct cloud infrastructures (private, community, or public) that remain unique entities, but are bound together by standardized or proprietary technology that enables data and application portability.

– Community Cloud: The cloud infrastructure is provisioned for exclusive use by a specific community of consumers from organizations that have shared concerns (e.g., mission, security requirements, policy, and compliance considerations).

Page 11

Beyond the NIST Definition

• Limited to three service models
– IaaS, PaaS, SaaS

• Does not identify the capabilities of distributed systems
– Big Data revolutionary capabilities
– Linear scaling
– Logarithmic reduction in processing times
– Dramatic cost reductions to accomplish tasks formerly in the realm of traditional HPC

Page 12

Evolution of Cloud in the IC

• Evolution of Cloud in the IC/DoD
– JIOC-I
– DCGS-A
  • Version 2
  • Version 3
  • Rainmaker
  • DSC
– DARPA
– I2 Pilot

• Customer requirements drive the solution
– Not cookie-cutter
– Budget, data, performance, security, and existing components are all drivers

Page 13

IC Cloud ‘Flavors’

• Compute Cloud (or Data Cloud)
– Handles the 3 Vs – volume, velocity, and variety of data
– Petabyte-to-exabyte scale with linear scaling
– BigTable-like construct and supporting capabilities
  • Hadoop Distributed File System (HDFS)
  • MapReduce
  • Zookeeper
  • Accumulo

• Utility Cloud
– Contains the suite of services/apps that interact with the Compute Cloud
  • Ozone Widget Framework (OWF)
  • Web tier – data visualization tools
  • Auditing

• Storage Cloud
– Scalable storage for large files – imagery, FMV
– Shared directories

• Most IC cloud implementations are a combination of all three flavors

Page 14

Data and Utility Cloud Example

LAYER 4 – Cloud Analytics
LAYER 3 – Cloud Services
LAYER 2 – Cloud Software
LAYER 1 – Cloud Hardware

[Diagram: Utility Cloud / GHOSTMACHINE hardware stack]

Page 15

Cloud Stack Feature List Example

• User Capability
– Thin Client Dataspace Retrieval via Query
  • Textual or geospatial query
  • Display with time-wheel, geospatial map display, link-graph display
– Thin Client Search for Resolved Entities
  • Display with Document, Entity Graph, Timeline, Timewheel, Entity Viewers
– User Upload of Analyst Products to Cloud
– Persistent Data Query and Alerting (on ingest)
– Integrated Chat Widget, and Widget for Querying External Systems
– Uniform Widget Experience and Data Sharing between Widget Views

• Data Sourcing
– Flexible, Secure Data Ingest Architecture
– Ingest processes three examples of data formats:
  • Unstructured artifacts (e.g., free-form report)
  • Semi-structured artifacts (e.g., email message)
  • Structured artifacts (e.g., RDB table, XML document)

Page 16

Example: Cloud Stack Feature List (cont.)

• System Capability
– Advanced Analytics for Entity Extraction, Resolution, and Link Analysis
– API for Retrieval of Artifacts, Metadata, and Semantic Indices
– API for Application Access to Analytics Results
– Cloud-to-Cloud Integration
  • Between applications running inside the cloud, and
  • Between applications running inside the cloud and applications running outside the cloud
  • Multiple messaging formats and tools supporting inter-cloud integration for data, services, and resources
– Cloud Computing Infrastructure, Scalable Storage with Double Redundancy
– Security Infrastructure with Role- and User-Based Access Control

• Management Capability
– Log Viewer App: Ability to Monitor Users and Activities in the Cloud
– User Management App: Ability to Define Access for Users
– Ingest Monitor Widget: Ability to Track Progress of Data Ingest
– Bulk Export/Import App for the Dataspace
– Cloud Management System for Monitoring and Control
– Workflow Management System

Page 17

Data – The Hard Part

“The need to securely share actionable, timely, and relevant classified information among state, local, tribal, and private sector partners in support of homeland security is critical as we work together to address evolving threats,” said Secretary Napolitano.

Page 18

Data Architecture

The data architecture is divided between 1) a Mission Dataspace and 2) an Operational Dataspace.

• Mission Dataspace
– The primary business driver (or mission) for our customers is to support the Intelligence Community (IC) with a "solution for intelligence data integration, search, exploration, enrichment, exploitation, management, and sharing."
– The data on which these activities are undertaken is referred to as mission data and is stored in the Mission Dataspace.

• Operational Dataspace
– A location to persist operational data – data that is directly used/created by infrastructure application software to support its operation/execution, which in turn supports the mission.
– Includes input data (information entering the system for the first time from a system or end user), work and message queues, temporary results, configuration files, and any purely transient information.
– Typically this data has a very narrow purpose (that of supporting a particular business or infrastructure application).

A Dataspace can be implemented using 1) HDFS, 2) Cloudbase, 3) Cassandra, 4) MySQL, 5) FS (local, SAN), or 6) Oracle (limited).

Page 19

Unified Dataspace Example

The Wild
• Data sources with rich data & semantic context locked in domain silos
• Data tightly coupled to data-models
• Data-models tightly coupled to storage models

Silos isolated by
• Implementation technology
• Storage structure
• Data representation
• Data modality

[Diagram: data models flowing from silos into a unified Dataspace – Segment 1 (Artifact Description), Segment 2 (Data Description), Segment 3 (Model Description); unstructured and structured data with rich semantic and data context; integration, enrichment, exploitation, and exploration across all sources]

Page 20

Mission Dataspace Data Model

• Structure
– Segment 1: Artifact Description Framework (ADF)
  • Universal store for unstructured data (documents)
  • Indexes
– Segment 2: Data Description Framework (DDF)
  • Universal store for structured data (entities, attributes, relationships)
– Segment 3: Model Description Framework (MDF)
  • Universal store for data / knowledge models
– Reference Data
  • Used to “normalize” data in other segments
  • Used to support business functionality (e.g., lists of alternative name spellings for search, dictionaries)
– Inverted Indexes
  • Specialized indexes to support business functionality (search, analytics)

[Diagram: same unified Dataspace graphic as the previous page – Segments 1–3, unstructured and structured data, rich semantic and data context, and integration, enrichment, exploitation, and exploration across all sources]

Page 21

Data Description Framework (DDF)

• The DDF looks at data in the following ways:
– Mention: A chunk of data, either physically located within a tangible artifact or contained within an analyst’s mind
  • E.g., “Washington” at offset x in file Y
– Sign: A representation of all disambiguated mentions that are identical except for their indexicality
  • E.g., “Washington”
– Concept: An abstract idea, defined explicitly or implicitly by a source data-model
  • E.g., City, Person, Name, Address, Photo
– Predicate: An abstract idea used to express a relationship between “things”
  • E.g., isCity, isPerson, hasName, hasAddress, hasPhoto
– Term: A disambiguated sign abstracted from the source artifact or asserting analyst
  • E.g., Washington Person; Washington Location
– Statement: Encodes a binary relationship between a subject (term) and an object, mediated by a predicate (see the sketch below)
  • E.g., [Washington, Person] hasPhoto [GeorgeWashingtonImage.jpg]
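
Illustrative sketch only (not the actual DDF API): the Term and Statement names below are assumptions used to show how a statement encodes a subject term, a predicate, and an object, mirroring the example above.

// Hypothetical encoding of DDF-style terms and statements (Java 16+ records).
// Class and field names are illustrative assumptions, not the DDF API.
public final class DdfExample {

    record Term(String sign, String concept) {            // e.g., ("Washington", "Person")
        @Override public String toString() { return "[" + sign + ", " + concept + "]"; }
    }

    record Statement(Term subject, String predicate, String object) {
        @Override public String toString() { return subject + " " + predicate + " [" + object + "]"; }
    }

    public static void main(String[] args) {
        Term washingtonPerson = new Term("Washington", "Person");
        Statement s = new Statement(washingtonPerson, "hasPhoto", "GeorgeWashingtonImage.jpg");
        System.out.println(s);   // [Washington, Person] hasPhoto [GeorgeWashingtonImage.jpg]
    }
}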

Page 22

Operational Dataspace

• End user storage (documents, preferences, products)

• System events/traps

• Performance/resource utilization metrics/history

• Application log messages

• Messaging infrastructure message persistence

• Data Surveillance: watch patterns, subscriptions and notification profiles. May also need some working space

• Temporary indexes as well as final index sections (shards)

• Persistence of distributed state in case of total failure

• Directory for digital certificates (LDAP)

• Directory for security authorizations (LDAP)

• Security audit events

• Threat assessment results

• Vulnerability assessment results

• “Scratch” area used by various applications

• Working area to move files in/out of cloud

• Policies, rules, configurations, etc.

• CM repository

Page 23

Good Questions to Be Asked

Page 24

Examples

1. What are your requirements for Cloud Computing?
   1. Integrate a federated workforce into headquarters business processes
      1. How many?
   2. Enterprise storage capabilities
      1. For HQ, or for regions across the world/country?
   3. Provide analytics for discovering and creating knowledge
   4. Sharing information

2. What are the handling requirements for your data?
   1. Classified/LES
   2. US Persons
   3. Title 6, 10, and/or 50
   4. ICD 501/503
   5. MOUs

3. What is the anticipated security level associated with your cloud vision?
   1. PL2, PL3, PL4?

4. What are the complexities associated with your data in its current state?
   1. Unstructured documents on a shared drive
   2. Structured data in a legacy mainframe
   3. Semi-structured documents with strict handling procedures (stored in ECM?)
   4. Amount of data (GB vs. TB)

5. What is your budget?
   1. Open source vs. open source plus COTS solution

6. What is your timeline?
   1. The solution can be driven by speed of delivery vs. functional requirements; for example, leverage existing cloud solutions as a starting point rather than as a final product

7. What components (software & hardware) are available for reuse?
   1. Servers, SANs, networking gear
   2. Metadata extractors
   3. One-way guards
   4. VM licenses

Page 25

A Real-World Example

Building up the Cloud

Distributed Common Ground System – Army (DCGS-A) Standard Cloud (DSC)

Page 26

Business Need for DSC

Bridge the whole IC and all Services with an open data and processing capability – the Dataspace

Break the Data Barriers
• End data silos and their proliferation
• Provide a universal data storage and computational fabric
• Make data ingest faster and simpler
• Allow data to be endlessly reshaped / reused
• Search, enrich, integrate/fuse, and exploit within and across all data sources and domains

Stop Moving Data, Start Using Data
• Ingest once, reuse endlessly
• Move computation to the data (and not data to the computation)
• Build highly sophisticated exploitation tools and applications
• Create quick mashups and mission applications
• Surf around and explore the entire Intel Dataspace
• Connect all the dots in any way that makes sense from any mission perspective
• Change your mind and do it again, and again… in new ways without messing up what you already have

Achieve Previously Unachievable Scale
• Go bigger, faster, larger
• Realize a truly large-scale data store
• Embrace an unbounded diversity of data, processing, and applications
• Achieve orders of magnitude greater processing power
• Expose familiar usage metaphors (e.g., Google, Amazon)

Get More Bang for the Buck
• Deploy using fully automated procedures
• Avoid almost all SW licenses
• Stay up and running with an inherently robust design that uses commodity HW

Do New Science and Develop New Practice Around Intelligence
• Explore data and processing at entirely new scale and discover new insights and phenomena
• Cultivate an ever-growing, increasingly rich, and productive Dataspace

Page 27

DSC Software Stack

Infrastructure as a Service
• Servers / SAN / Network / Facilities
• Linux / LDAP / MySQL / CAS

Platform as a Service
• MapReduce / HDFS / Flume / Oozie
• Cloudbase / Katta / Zookeeper
• JVM / Apache HTTP/Proxy / Tomcat
• HPSA (Puppet?) / HPNA
• DNS / DHCP / NFS / NTP
• Condor / Cloud Management System
• Logging / Auditing / Nagios / Ganglia

Software as a Service
• Client Services
• PREFS / OWF / Safemove / OpenFire
• V3 / MFWS / DIB / GeoSpatial / BC
• GeoServer / Element Index / AntiVirus / AIDE
• DSMS / ASLI / ActiveMQ / Alerting
• Ozone Widgets

Page 28

Facility

• The DSC System production hardware is housed in a single twenty-foot Performance Optimized Data Center (POD).

• The POD is configured to maximize its hardware payload while taking into consideration:
– Overall power availability
– Individual device power consumption and power dissipation
– Individual device weight
– Individual device heat generation

Page 29

Infrastructure – Hardware Profile

• Two rack types
– Compute – 222 servers
– Management – 6 servers

• 1,824 cores
• ~100,000 MIPS (assumed Java, 50 CPI)
• 1.035 PB disk storage (raw)
• 13.92 TB physical memory (RAM)

• Environmental support
– Active power w/ backup generator
– Two live coolers w/ backup cooler

Page 30

Compute Server – Profile

• Processor:
– Two quad-core Intel Xeon X5570 2.93 GHz CPUs -> 8 cores per server

• Memory configuration – varies:
– 25 (of 222) nodes with 144 GB memory via 18 × 8 GB DIMMs [approx. $14K]
– 75 (of 222) nodes with 72 GB memory via 18 × 4 GB DIMMs [approx. $10K]
– 122 (of 222) nodes with 36 GB memory via 18 × 2 GB DIMMs [approx. $8K]

• Storage:
– Eight 500 GB 6G SAS 7.2K 2.5in MDL disk drives
– RAID 5

• Power:
– N+N 750 W power supplies

Total compute server cost: 25 × $14K + 75 × $10K + 122 × $8K ≈ $2.076M

Page 31

Network Architecture

Key design features:

• Separation of the mission and management/operational data to ensure security and performance of the solution
– Using VLANs

• Connection to the DSC cloud will be restricted to entry-point nodes for a single security choke point (Cloud Access Point Nodes)
– Greatly simplifies boundary security
– The POD internal network will be non-routable (10.x.x.x), with external access only through the two entry points

• Redundant paths from servers and enclosures, via stacking cables to redundant switches at the core, provide resiliency from core switch failures as well as cabling faults
– Each node has access to two independent switches in the enclosure

Page 32

Network Architecture (cont’d)

[Diagram: Compute Rack and Management Rack; SAN connected to all 6 management nodes; 10GbE core switches; single access point through the NAC; leaf switches interconnect all nodes]

Page 33

DSC Cloud Elevations

• 222 Compute Nodes – 1,776 cores, 12 TB RAM, 888 TB DFS
• 6 Management Nodes
• 2 SANs – 168 TB
• Network Nodes

Page 34

DSC 1.5.3 U/I

Page 35

SandStorm
Common Map Widget

Time Line

DSC 1.5.3 U/I

Page 36

ICast

Page 37

Cloud Text Analytics

Page 38

Element Graph Viewer

Page 39

BACKUP SLIDES

Page 40

Definition: Node Types

• Cloud Head Node:
– These nodes are responsible for executing the various cloud service “masters.”
– These masters are collocated together, as many of them work together. One node may be responsible for running the Mission Dataspace version of these services, and another node may support the Operational Dataspace. A third node may act as a failover.

• Cloud Access Point Node:
– These nodes host the Web Infrastructure Services (web servers and proxies) and portions of the Application and Systems Integration Subsystem, and act as a physical gateway into the cloud.

Page 41

Definition: Node Types (cont.)

• Cloud Infrastructure Nodes:
– These nodes are divided into two categories:
  • Low-level infrastructure applications such as DNS, DHCP, and NTP (part of Core Infrastructure Services)
  • Cloud services (workers corresponding to the Cloud Head Nodes, see above)

• Cloud Management Nodes:
– These nodes run general-purpose applications used to manage the cloud, such as the Identity and Access Management Subsystem, Cloud Management System, Map Server, Chat Server, Cloud Logging Subsystem, and Cloud Monitoring and Metering.

• Cloud Client Nodes:
– These run business applications that use cloud services, such as Ingest and Analytics.

Page 42

Definition: Node Types (cont.)

• HDFS – Hadoop Distributed File System

• HDFS Master / NameNode
– Executes the file system namespace operations, such as reading, writing, renaming, and deleting files.
– The NameNode server is responsible for mapping file blocks onto the DataNode servers.

• HDFS Worker / DataNodes
– Functions include storing and retrieving file blocks from the native operating system file system.
– Coordinate with the NameNode to perform block creation, deletion, and replication.
– Called by MR jobs to serve read/write requests in a highly distributed manner (a client-side usage sketch follows below).
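
The sketch below is a generic Hadoop FileSystem client, not DSC code; it shows a write and a read flowing through the NameNode/DataNode services. The NameNode address and paths are assumptions.

// Minimal HDFS client sketch using the standard Hadoop FileSystem API.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsClientSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode.example:8020"); // assumed NameNode address

        FileSystem fs = FileSystem.get(conf);
        Path path = new Path("/dataspace/example/artifact.txt");  // assumed path

        // The NameNode records the namespace entry; the blocks themselves go to DataNodes.
        try (FSDataOutputStream out = fs.create(path, true)) {
            out.writeUTF("example artifact content");
        }

        // Reads are served directly by the DataNodes that hold the blocks.
        try (FSDataInputStream in = fs.open(path)) {
            System.out.println(in.readUTF());
        }

        fs.close();
    }
}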

Page 43

Structured Storage Service

• Cloud Structured Storage Service
– Responsible for providing a highly scalable and highly available logical structured storage capability, similar to what is traditionally known as a database
– Can support BILLIONS of rows and MILLIONS of columns
– Columns can be added at run time to accommodate data
– Based on:
  • NSA Cloudbase (GOTS)
  • Cassandra

Page 44

Cloudbase

• Cloudbase is a Java-based, distributed database, based on the Google BigTable design and created at the NSA

• Based on a Master/Worker model
– One Master (daemon) – keeps the overall metadata
– Multiple Workers (TabletServers) – store database tables in HDFS

• Uses HDFS to
– Store tables (data)
– Store recovery logs and write-ahead logs

• Supports cell-level security (see the write sketch below)
– Security markings are defined by the application, and stored and enforced by Cloudbase
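
Cloudbase itself is GOTS, so the sketch below uses the Apache Accumulo client API (the open-sourced descendant of the same BigTable design, listed earlier under the Compute Cloud) to show a write carrying a cell-level visibility marking. The instance name, ZooKeeper host, credentials, table, and labels are all illustrative assumptions.

// Hedged sketch: writing one cell with a visibility expression via Accumulo 1.x.
import org.apache.accumulo.core.client.BatchWriter;
import org.apache.accumulo.core.client.BatchWriterConfig;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.accumulo.core.client.security.tokens.PasswordToken;
import org.apache.accumulo.core.data.Mutation;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.security.ColumnVisibility;

public class CellSecurityWriteSketch {
    public static void main(String[] args) throws Exception {
        Connector conn = new ZooKeeperInstance("dsc", "zk1.example:2181")
                .getConnector("ingest_user", new PasswordToken("changeme"));

        BatchWriter writer = conn.createBatchWriter("mission_dataspace", new BatchWriterConfig());
        Mutation m = new Mutation("entity:washington_person");
        // The visibility expression is stored with the cell and enforced at scan time
        // against the scanning user's authorizations.
        m.put("attr", "hasPhoto", new ColumnVisibility("U&FOUO"),
              new Value("GeorgeWashingtonImage.jpg".getBytes()));
        writer.addMutation(m);
        writer.close();
    }
}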

Page 45

Cassandra

• Cassandra is a highly scalable, eventually consistent, distributed, structured key-value store, implemented as a Distributed Hash Table (DHT)
• Datastore for Synthesys and Hypergraph analytics
• P2P distribution model – which drives the consistency model – means there is no single point of failure
– Each peer server is responsible for a portion of the Distributed Hash Table (a range of keys)
• Writes directly to the local file system – no HDFS (see the sketch below)

[Diagram: each node owns a portion of the keyspace; the cluster keeps track of all members]
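
A minimal read/write sketch against Cassandra, using the DataStax Java driver and CQL (a later interface than the Thrift API of this era); the contact point, keyspace, and table are assumptions, not DSC's actual schema.

// Illustrative Cassandra sketch: the partition key determines which peer owns the row.
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class CassandraSketch {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect()) {

            session.execute("CREATE KEYSPACE IF NOT EXISTS ops WITH replication = "
                    + "{'class': 'SimpleStrategy', 'replication_factor': 1}");
            session.execute("CREATE TABLE IF NOT EXISTS ops.events (id text PRIMARY KEY, msg text)");

            // The partition key 'id' maps this row to a range of the DHT owned by one peer.
            session.execute("INSERT INTO ops.events (id, msg) VALUES ('evt-1', 'ingest complete')");

            ResultSet rs = session.execute("SELECT msg FROM ops.events WHERE id = 'evt-1'");
            for (Row row : rs) {
                System.out.println(row.getString("msg"));
            }
        }
    }
}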

Page 46

Data Parallelization

• Cloud Processing Parallelization Service
– Supports the ability to easily leverage the large amount of processing resources available across multiple nodes
  • Instead of being limited to one node or a small number of nodes preconfigured in some type of physical cluster
– On DSC this may mean:
  • Parallelizing the processing of a file/source (e.g., ingest) to improve overall performance
  • Parallelizing the processing of data in the database in support of some analytic (exploitation or enrichment) or indexing, thereby improving overall response time
– Based on the Hadoop MapReduce facility for data-centric parallelization
– Note: algorithmic parallelization (à la MPI) is a likely future need

Page 47

MapReduce (MR)

• A facility to parallelize processing of files, with fault tolerance
• Implemented as a Master/Worker model

• MR Job Tracker
– Determines how to parallelize a job/application (using a job configuration) and then schedules the work on a set of distributed Task Trackers (workers), each of which executes a portion of the job/application in parallel, monitoring them and re-executing failed tasks
– Tries to assign the work to the node where the data is located in HDFS

• MR Task Tracker
– Task Trackers are given the application software to execute and specifications on which “data split” they need to operate
– Periodically report progress/health back to the Job Tracker

• MR Job/Application
– A job/application is divided into structural elements: a Mapper, a Reducer, a Partitioner, a Reporter, and an Output Collector (a minimal example follows below)
– The job/application logic reads/writes data from the Dataspace via the DSMS MR Helper API

• DSC Job Service
– Facilitates the submission and monitoring of MR jobs via a UI
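
A minimal, generic word-count job written against the stock Hadoop MapReduce API; a real DSC job would additionally read/write the Dataspace through the DSMS MR Helper API, which is not shown here. Input and output paths are supplied on the command line.

// Generic word-count sketch: Mapper emits (word, 1), Reducer sums per word.
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountSketch {

    // Mapper: emits (word, 1) for every token in its input split.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reducer: sums the counts for each word.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count sketch");
        job.setJarByClass(WordCountSketch.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));     // e.g., an HDFS input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1]));   // output directory must not exist
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}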

Page 48

Cloud Logging

• Cloud Logging Subsystem (CLS)
– A proper logging facility is an extremely important service in an ultra-large-scale environment.
– Key functionality of the CLS includes:
  • Support for custom and legacy applications
  • Support for specialized cryptographic operations (e.g., encryption, digital signatures)
  • Log “interaction” functionality, including searching, reporting, analysis, viewing, etc.
  • Log management, including rotation and archiving
– Three modes (a usage sketch for the Log4J path follows below):
  • Java API
  • Command-line bulk loader
  • Log4J adapter
– The Security Auditing Subsystem leverages the capabilities of the CLS
  • Keeps security audit data separate
– Based on Cloudera Flume (essentially collectors and sinks)
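
For the Log4J adapter mode, application code keeps logging through Log4J as usual; routing of events into the CLS/Flume is assumed to be handled by an appender configured elsewhere and is not shown. A minimal caller:

// Minimal Log4J (1.x) usage sketch; class and method names are illustrative.
import org.apache.log4j.Logger;

public class IngestWorkerSketch {
    private static final Logger LOG = Logger.getLogger(IngestWorkerSketch.class);

    public void process(String artifactId) {
        LOG.info("Ingest started for artifact " + artifactId);
        try {
            // ... ingest work would happen here ...
            LOG.info("Ingest finished for artifact " + artifactId);
        } catch (Exception e) {
            LOG.error("Ingest failed for artifact " + artifactId, e);
        }
    }
}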

Page 49

Cloud Monitoring and Metering

• Holistic Monitoring
– Ability to provide a unified/consistent presentation of the health of all monitored components (business application software, infrastructure software, operating systems, hardware, network devices), whether they are custom-developed or third-party acquired

• Control
– Ability to control/change the behavior of a monitored component without restarting that component

• Near Real Time
– Ability to alert system/network/security administrators in near real time in response to an event from “inside” a component

• Historic Trending
– Ability to store performance (including resource utilization) and health data of various components over time for analysis

• All devices (software and hardware) in the cloud are monitored, either using agents (push model) or by polling (pull model, agent-less)
– All DSC components will include a JMX agent to report their health and support some control (where appropriate); a minimal MBean sketch follows below
– Based on Nagios agents and JMX agents
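
A minimal sketch of the kind of JMX agent described above: a standard MBean registered with the platform MBean server so a monitor can poll health and flip a control attribute without a restart. The bean name and attributes are illustrative assumptions, not the actual DSC management interface.

// Minimal JMX MBean sketch: pollable health counter plus a writable control attribute.
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class JmxHealthSketch {

    // Standard MBean interface: attributes readable (and one writable) over JMX.
    public interface IngestHealthMBean {
        long getProcessedCount();
        boolean isPaused();
        void setPaused(boolean paused);   // allows control without a restart
    }

    public static class IngestHealth implements IngestHealthMBean {
        private volatile long processedCount = 0;
        private volatile boolean paused = false;

        @Override public long getProcessedCount() { return processedCount; }
        @Override public boolean isPaused() { return paused; }
        @Override public void setPaused(boolean p) { this.paused = p; }
        public void recordProcessed() { processedCount++; }
    }

    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        IngestHealth health = new IngestHealth();
        server.registerMBean(health, new ObjectName("dsc.sketch:type=IngestHealth"));

        // Keep the JVM alive so a JMX console (e.g., jconsole) can poll the bean.
        while (true) {
            health.recordProcessed();
            Thread.sleep(1000);
        }
    }
}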

Page 50

Cloud Management System

• Cloud Management System (CMS)
– Oversees the efficient operation of the Cloud Computing Environment

• Condor
– Process control and monitoring – restarts processes if a failure occurs
– Distributed process pool
– Can start distributed processes from any node in the cloud
– Integrated with the DSC Cloud Management System

• DSC Cloud Management System (CMS)
– Defines a hierarchy of services and dependencies
– Starts/stops cloud services (via Condor)
  • As a defined group
  • Individually
– Views the status of running services via Nagios and exposed JMX beans

• HP Network Automation
– Monitors and configures network devices