Fundamentally Better Cloud Security - Broadcom Inc.

WHITEPAPER

AUTHOR

Deena Thomchick

SENIOR DIRECTOR OF CLOUD, BLUE COAT

ContentIQ

Cloud DLP

StreamIQ

ThreatScore

User Behavior Analysis

Data Governance

Contextual Analysis

Fundamentally Better Cloud Security

TAKING A DATA SCIENCE APPROACH

Introduction

Elastica CloudSOC uses the latest data science techniques, combining machine learning and advanced math, to provide fundamentally more intelligent and responsive security for the cloud. Our scientists are continually developing and tuning data science-driven engines and algorithms that take advantage of the expansive processing and storage resources available in the cloud. This highly flexible scientific approach enables Elastica CloudSOC to keep up with the speed of change while identifying, analyzing, and controlling more user activity, data, and apps with more accuracy.

Cloud Challenges for Security

Cloud adoption challenges IT departments to secure a vast, constantly changing landscape of cloud territory that they do not control.

• Cloud providers continually update and change their services without warning (as often as every other week for some services).

• End users regularly adopt new cloud apps without notifying IT.

• Individual end users control what content they choose to upload and share, often without fully understanding the risks associated with what they do.

• Third parties opportunistically uncover confidential company data accidentally shared with the public.

• Cyber criminals target cloud accounts as a means to access data, spread malware, or exfiltrate data.

• Visibility and control challenges associated with traditional security are duplicated and intensified in the cloud.

• How do you scale data governance and DLP to the vast amount of content being uploaded and stored in the cloud? How do you avoid duplicating efforts in defining and tuning policies and ensure minimal false positives?

• How do you detect cloud threat activity when it occurs outside your network infrastructure? How do you detect advanced threats or malicious user activity when no signatures exist to identify them?

Profoundly improved visibility, security, and DLP in the cloud by leveraging new data science techniques and elastic cloud resources

Fundamentally Better Cloud Security: Taking a Data Science Approach. Copyright © 2016 Symantec Corp. All Rights Reserved.

Cloud Offers More Resources for Security

On a positive note, the cloud opens up new possibilities for solving classic security problems. Elastica solutions are built upon innovative data science-driven engines that break through known constraints of traditional security solutions by leveraging the power of the cloud itself.

More Processing Power

Cloud resources are elastic, and the inherent efficiency of cloud provisioning means security solutions are no longer constrained by the limited processing power of an on-prem appliance.

More Storage

Unlike on-prem appliances, the cloud offers flexible, expanded storage options, making it easy to expand your storage as needed. Too often security appliances are handicapped by storage limitations, resulting in security based on incomplete intelligence databases and an inability to save enough log files for effective incident response investigations.

Critical Intelligence Necessary to Protect Your Organization

A highly effective and accurate cloud security system must be able to identify key information and to evaluate the contextual significance of that information in order to turn it into useful intelligence. Elastica CloudSOC solutions are based on data science-driven engines to address critical categories of intelligence essential to effectively protect your organization.

1. ACTIVITY

a. Knowledge: What is happening between my users and the cloud? What actions are my users taking in what cloud services?

b. Security: Is this activity a problem? Could hackers or malware be getting into my accounts?

2. CONTENT

a. Knowledge: What sensitive data from my organization is being kept in the cloud?

b. Security: Is any of this sensitive data exposed or at risk of being exposed?

What IT must do when users adopt cloud apps

Identify and govern confidential data in the cloud to stay compliant and protect intellectual property.

Protect against increasingly sophisticated and damaging cyber crime.

Unsupervised Machine Learning

for when you know you don’t know what you don’t know.

Unsupervised machine learning lets the machines do freeform data discovery. It is a great way to discover source data necessary to guide learning systems to make smart decisions, when you don’t specifically know what that source data should be.

Supervised Machine Learning

for when you know what you don’t know.

Supervised machine learning is a great way to analyze large quantities of source data and sort it into a foundation of knowledge that can be used by systems to make decisions and take actions. It enables a system to use a much larger set of source data, analyze it based on a larger set of characteristics, and process that big data to achieve more effective outcomes.

Data Science for Better Visibility, Control, and Response to Threats

You need deep visibility into real-time traffic: not just which apps users are accessing, but also exactly what they are doing within those apps. Getting to this level of granular and contextual knowledge is difficult. It requires a system with the ability to read the real meaning in volumes of traffic that uses obscure machine language identifiers to communicate with disparate systems. Additionally, this system must be adaptive, drawing on a foundation of knowledge maintained by a continually learning system, because these machine language identifiers can be changed at any time, without notice or documentation, by third-party cloud service development teams.

Elastica data scientists leverage the unique horsepower of cloud computing and machine learning to build a rich foundation of knowledge. Based on that foundation, they build contextual algorithms that can deliver a more detailed understanding of user behavior and cloud activity than is possible with other traffic analysis systems. Then they leverage cloud processing power to execute these advanced algorithms.

StreamIQ™ is the advanced extraction technology that enables Elastica to understand transactions performed by users in a cloud app in more granular detail than is possible in most traffic analysis systems, such as Next Generation Firewalls and Secure Web Gateways. This improved ability to identify the who, what, where, and when in traffic between your users and cloud accounts is critical to identifying and acting on potential threats to your organization.

Analyzing Cloud Activity with StreamIQ

In traffic analysis you need to track what activities are being performed by what users with what cloud apps in what context. You need details such as: What actions are being taken? Are they associated with a specific file with specific attributes that would make them important? Are these actions associated with a cloud app you consider risky? Is this activity normal for this user?

New cloud apps are popping up all the time and existing cloud apps are continually changing their programming. Any system would find it extremely difficult to keep up with this constantly shifting environment.

StreamIQ Intelligence Fuels Elastica CloudSOC Detect, Protect, and Investigate

Machine learning in StreamIQ drives more accurate and deeper real-time activity tracking for more cloud apps. Elastica solutions use the unique intelligence in StreamIQ to detect more threats, enforce protection with a more granular level of control, and investigate security incidents more effectively.

StreamIQ

• Identifies more details on granular transactions in live traffic

• Analyzes traffic and identifies instructions custom to many apps—sanctioned and unsanctioned

• Automatically updates to accommodate cloud app code changes to stay accurate

• Powers more accurate risk analysis based on better activity intelligence

• Enables more granular policy controls

• Provides more useful data for incident response investigations

Action indicators, object identifiers, and user information can be exceedingly difficult to identify in machine-readable text. Traditional approaches can’t keep up; as a result, they can track only a few gross identifiers and commonly break without warning when cloud services change their algorithms.

Elastica’s StreamIQ technology leverages both unsupervised and supervised machine learning and very deep content inspection to extract granular cloud activity, which fuels Elastica’s Protect, Detect, and Investigate applications.

Elastica uses both unsupervised and supervised machine learning to create StreamIQ, the intelligence engine and algorithms that fuel Elastica traffic analysis, CloudSOC ThreatScores for Detect, Protect rules for visibility and policy control, and the high-quality log data in Investigate for incident response.

Our scientists start with a few significant characteristics known to be associated with important traffic attributes. They use these as starting points for unsupervised machine learning that can identify significant instructions in machine code that would be very difficult or maybe impossible to find any other way. This foundational discovery of significant instructions is then fed into supervised machine learning systems that provide the content and contextual intelligence needed to turn this data into the foundation of knowledge in StreamIQ. Then powerful StreamIQ algorithms use this knowledge base to read traffic no other system can interpret. Because machines do this work fast, the Elastica system can keep up with a continually changing cloud landscape.

Essentially, this data science approach lets StreamIQ figure out what the machine code in cloud traffic actually means, so it can deliver a uniquely granular level of traffic intelligence to the Elastica solutions.

Significance

• The domain is cleared of portions (“12” and “dl”) that occlude the actual cloud app (“filesharing.cloudapp”).

• The action (Downloading as a ZIP) is not explicitly stated and must be inferred from multiple portions of the URL. For this application, downloading as a ZIP indicates that there will be one or more files comprising the ZIP, and we should search for each of them.

• The filenames are embedded in a hierarchy of data formats and are not near one another, increasing the difficulty of extracting them.

POST https://12.dl.filesharing.cloudapp.com/documents/unshared?session=KSGBYV8TQZX&t=zip&aqs=chrome..69i57.2678j0j1&sourceid=chrome&ie=UTF-8 HTTP/1.1
Host: 12.dl.filesharing.cloudapp.com
cookie: PREF=ID=08DHMNG54O2X:U=2Q7SPLK15OTW
content-length: 126
user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_2)
content-type: application/x-www-form-urlencoded;charset=UTF-8
accept: */*

token=9YDP70JR5ZCS&payload=[{"file":"passwords.txt","parent":"credentials","confirm":false,"expires":60},{"file":"id_rsa","parent":"credentials","confirm":false,"expires":60}]

Application: filesharing.cloudapp

Action: Downloading multiple files as a ZIP file

Files: passwords.txt and id_rsa

StreamIQ Traffic Analysis
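
The three extraction steps in the example above can be sketched in a few lines of Python. This is only an illustration of the idea, not how StreamIQ is implemented: the set of routing labels, the rule that maps "t=zip" to a ZIP download, and the function names are all assumptions made for this sketch, whereas the whitepaper describes these mappings as being learned rather than hand-written.

```python
import json
from urllib.parse import urlsplit, parse_qs

# Hypothetical routing labels that hide the real app name in a hostname.
ROUTING_LABELS = {"dl", "www", "api"}


def find_files(node):
    """Walk nested JSON and collect every string stored under a "file" key."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "file" and isinstance(value, str):
                found.append(value)
            else:
                found.extend(find_files(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_files(item))
    return found


def analyze(url, body):
    """Return the application, inferred action, and files for one request."""
    parts = urlsplit(url)
    labels = parts.hostname.split(".")[:-1]  # drop the top-level domain
    app = ".".join(
        label for label in labels
        if not label.isdigit() and label not in ROUTING_LABELS
    )
    query = parse_qs(parts.query)
    # The action is never stated outright; it is inferred from several URL parts.
    if "documents" in parts.path and query.get("t") == ["zip"]:
        action = "Downloading multiple files as a ZIP file"
    else:
        action = "unknown"
    form = parse_qs(body)
    files = find_files(json.loads(form["payload"][0])) if "payload" in form else []
    return {"application": app, "action": action, "files": files}


URL = ("https://12.dl.filesharing.cloudapp.com/documents/unshared"
       "?session=KSGBYV8TQZX&t=zip")
BODY = ('token=9YDP70JR5ZCS&payload='
        '[{"file":"passwords.txt","parent":"credentials"},'
        '{"file":"id_rsa","parent":"credentials"}]')

result = analyze(URL, BODY)
print(result)
```

A hand-written rule set like this breaks as soon as the cloud app renames a parameter, which is the maintenance problem the learning-based approach is meant to address.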

Security that Recognizes Risky Activity

Once you know what is happening in cloud apps, you must be able to identify whether that activity poses a risk. The key to activity-based security analysis lies in the ability to identify when activity represents abnormal user behavior that likely indicates a threat. Cloud activity that follows typical user behavior patterns indicates everything is probably normal. Malicious activity, whether caused by a malware attack, a hijacked account, or a malicious insider, usually manifests as abnormal activity that can be identified. For example, more frequent logins or uploads than normal for a particular user can indicate an account takeover. It may sound simple, but implementing it effectively requires extensive foundational knowledge and smart adaptive tracking systems. Otherwise, you have a system that doesn’t identify abnormalities very well, or that requires too much manual babysitting because it creates a lot of false positives.

User Behavior Analysis

Generic security controls based on user behavior rely on manually set event thresholds and simple defined actions. This is not true user behavior analysis because these simplistic controls are not set based on individual user behavior. Controls based on gross assumptions are relatively easy to set up but not very accurate, unless used judiciously and balanced by more nuanced user behavior analysis. An example of a useful generic behavior threshold control would be a rule to freeze access to an account if there were three failed user login attempts within a short period of time.
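
A generic threshold control of this kind is simple to express, which is part of its appeal. The sketch below is a hypothetical illustration, not CloudSOC code: the class name is invented, and the five-minute window is an assumed value standing in for "a short period of time".

```python
from collections import deque


class FailedLoginLock:
    """Freeze an account after too many failed logins inside a time window."""

    def __init__(self, max_failures=3, window_seconds=300):
        # window_seconds is an assumption; the text only says "a short period".
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        self._failures = {}  # user -> deque of failure timestamps (seconds)

    def record_failure(self, user, timestamp):
        """Record one failed login; return True if the account should freeze."""
        recent = self._failures.setdefault(user, deque())
        recent.append(timestamp)
        # Discard failures that have aged out of the window.
        while timestamp - recent[0] > self.window_seconds:
            recent.popleft()
        return len(recent) >= self.max_failures


lock = FailedLoginLock()
print(lock.record_failure("alice", 0))
print(lock.record_failure("alice", 10))
print(lock.record_failure("alice", 20))
```

The same fixed rule applies to every user, which works for failed logins but, as the next example shows, not for activity whose normal volume differs widely between users.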

Another common generic threshold control is to trigger a response if a user uploads more than a certain number of files within a particular time period. But how do you decide what number constitutes larger than normal when some users hardly ever upload files and others upload lots of files? If this arbitrary threshold is too high, it won’t catch genuinely malicious activity; if it is too low, it will trigger lots of false positives, creating extra work for IT and frustration for users.

For example, User A may typically batch upload 50 files to Salesforce every Friday but never uploads files to Google Drive, except one day when they batch upload 15 files. User B may rarely upload files to Google Drive, except one afternoon when they suddenly upload 10 files. A generic user behavior threshold based on 20 uploads in 10 minutes would falsely flag User A’s Salesforce behavior as potentially malicious, would not flag User A’s Google Drive uploads, and would not register User B’s behavior as abnormal at all.
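
The difference between the two approaches can be shown with a small sketch. The generic threshold of 20 comes from the example above; everything else is an assumption made for illustration, including the upload histories, the function names, and the use of a simple deviation-from-mean test as a stand-in for the far richer per-user modeling the whitepaper describes.

```python
from statistics import mean, pstdev

GENERIC_THRESHOLD = 20  # uploads per 10 minutes, taken from the example above


def generic_flag(uploads):
    """One fixed threshold applied to every user and every app."""
    return uploads > GENERIC_THRESHOLD


def baseline_flag(history, uploads, max_deviation=3.0):
    """Flag a count that is far from this user's own history for this app."""
    center = mean(history)
    spread = max(pstdev(history), 1.0)  # floor of 1.0 avoids dividing by zero
    return abs(uploads - center) / spread > max_deviation


# Hypothetical upload histories per (user, app), followed by the new count.
cases = {
    ("User A", "Salesforce"): ([50, 50, 50, 50], 50),
    ("User A", "Google Drive"): ([0, 0, 0, 0], 15),
    ("User B", "Google Drive"): ([1, 0, 2, 0], 10),
}

for (user, app), (history, uploads) in cases.items():
    print(user, app, "generic:", generic_flag(uploads),
          "baseline:", baseline_flag(history, uploads))
```

Under these assumed histories, the generic rule flags only the routine Salesforce batch, while the per-user baseline flags only the two unusual Google Drive uploads.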

[Figure: Usual behavior (repeated weekly): User A, more than 50 uploads; User B, fewer than 3 uploads. Anomalous behavior (single instance): User A, 15 uploads; User B, 10 uploads.]

User Behavior Analysis Intelligence Feeds Elastica CloudSOC Detect and Protect

Machine learning enables highly granular personal user behavior profiles to more accurately identify risky activity in cloud apps. Elastica solutions use intelligence based on user behavior analysis and ThreatScores to detect threats, automatically enforce policies, and provide better visibility into risky activity.

Elastica UBA Intelligence

• More aware of abnormal activity due to more granular understanding of typical user behavior

• Minimizes false positives through individualized and contextualized user behavior modeling

• Faster response with automated ThreatScore calculations

The Elastica system is designed to identify individual user behavioral patterns in context with app, time, objects, access method, etc. This user-specific, context-based method is much more accurate for identifying potentially malicious activity. However, a system that can provide a unique baseline for individual user behavioral patterns requires the ability to classify, analyze, and maintain a large volume of intelligence data. It requires a system able to adapt to changing patterns over time, to interpret the significance of deviations from normal, and to translate those deviations into usable, actionable information.

Individualized & Contextualized User Behavior Profiles

Elastica uses machine learning with expansive cloud processing and storage resources to power a self-training User Behavioral Analysis (UBA) engine. The UBA engine uses computational analysis algorithms to analyze transactional data from StreamIQ. UBA algorithms develop a confidence curve for normal behavior customized to individual users in context with specific actions, apps, and other attributes to create and maintain collections of highly accurate user behavior profiles.

This foundational baseline of normal activity opens up many more opportunities to accurately identify abnormal and potentially malicious activity without creating a deluge of false positives at the same time.

Identifying Suspicious Activity with ThreatScore

Once a system can identify what is normal, it becomes possible to identify what is abnormal and therefore suspicious. If only it were so simple. How far from normal must behavior drift before it becomes abnormal? How do you evaluate increasing levels of risk as abnormal activity increases? How can you enable the solution to automatically respond with appropriate levels of security controls?

Our scientists tackled this problem with another layer of data science to identify and measure the severity of activity that deviates from normal. CloudSOC Detect uses computational analysis of user behavior to identify and score the severity of incidents representing risk. It then correlates this user behavior score with threshold-defined triggers and detection of suspicious sequences of events to calculate a dynamic, continually updated ThreatScore for each user and action.

CloudSOC Detect displays a dynamic map of user behavior events with granular event ThreatScores and color coding to identify levels of risk severity for each user.
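
To make the idea of correlating several signals into one score concrete, here is a deliberately simplified sketch. It is not the ThreatScore algorithm, which the whitepaper does not specify: the 0 to 100 scale, the point weights, the event names, and the example sequence are all assumptions chosen for illustration.

```python
# Hypothetical ordered pattern that, taken together, resembles data exfiltration.
SUSPICIOUS_SEQUENCE = ("login_new_device", "bulk_download", "share_external")


def contains_sequence(events, pattern):
    """True if every step of pattern occurs in events, in order."""
    remaining = iter(events)
    return all(step in remaining for step in pattern)


def threat_score(deviation, triggers_fired, events):
    """Combine three signals into a single score capped at 100.

    deviation      -- how far behavior is from the user's baseline (assumed unit)
    triggers_fired -- number of threshold-defined triggers that fired
    events         -- ordered list of recent event names for the user
    """
    score = min(deviation, 5.0) * 10      # at most 50 points from deviation
    score += 15 * triggers_fired          # assumed weight per trigger
    if contains_sequence(events, SUSPICIOUS_SEQUENCE):
        score += 30                       # assumed weight for a sequence match
    return min(100, round(score))


quiet = threat_score(0.5, 0, ["login", "upload"])
risky = threat_score(
    9.0, 1, ["login_new_device", "view", "bulk_download", "share_external"]
)
print(quiet, risky)
```

Recomputing the score as each new event arrives is what would make such a value dynamic; a policy could then respond differently at different score levels.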

Data Science for More Accurate Data Governance

Your organization already has content in the cloud, probably much more than you realize. With the heavy adoption of Office 365, Google Apps, Box, Dropbox, Salesforce, AWS, etc., it is foreseeable that most of your content will eventually be housed in the cloud. You need to know which files and accounts contain sensitive, confidential, and/or compliance-governed content, who has access to that content, which users are associated with that content, and how much risk of exposure it faces. This is important because exposure can result in material losses for an organization through the loss of intellectual property and/or compliance violations.

The Elastica ContentIQ™ technology uses data science to tackle DLP and data governance, employing both unsupervised and supervised machine learning techniques as well as computational linguistics analysis to achieve more accurate content identification and classification.

Industry-leading solutions such as Symantec DLP offer layers of sophisticated technologies to accurately identify data, but most DLP offerings are not very accurate at identifying sensitive content due to a number of factors:

Limited Knowledge of Indicators

To identify indicators of sensitive data, some solutions rely solely on limited dictionaries for regular expression matching. This makes it difficult to identify data containing industry- or topic-specific terminology or terms in various languages.

Limited or No Contextual Analysis

Regular expression matching in many DLP solutions has zero or minimal ability to effectively evaluate an indicator, or multiple indicators, in context. For example, a 16-digit number can represent any number of things other than a credit card number, so a system that flags every file containing a 16-digit number will create many false positives. The same goes for phone numbers, social security numbers, etc. Without context, how accurately can a system identify data with even broader types of indicators, such as source code or legal content?

Lack of Customized Intelligence

Basic DLP solutions do not have the capability to customize their analysis of files based on typical form structures used by a particular organization. Symantec DLP and Elastica CASB are rare in their ability to offer this capability.

Performance Constraints

Sometimes a DLP solution is limited to small dictionaries and simple matching, or can only scan a subset of files, because it runs on an appliance with inherent processing, memory, and/or storage constraints that inhibit the ability to scan content in real time.

ContentIQ Starts by Building a Better Foundation of Knowledge

Interpreting language, design formats, and expression structures in a document, email, database field, etc. is key to effective DLP. To accurately identify and classify content you need to know a lot about characteristics that indicate specific types of data. Even the most brilliant algorithms are only as good as the underlying source data.

Today it is possible to identify a far more extensive volume of content indicators by leveraging newer data science techniques combined with bigger computational resources. Our data scientists use unsupervised machine learning for indicator discovery. We access public collections of big data containing different types of content for data mining. We apply the discovery and clustering capabilities of unsupervised machine learning to these collections to identify many more class-related terms, expressions, and characteristics than would be possible manually. Using cloud resources and automated systems, our programs maintain an up-to-date, robust foundation of ContentIQ indicator knowledge to feed our content identification and classification engine.

ContentIQ Uses Contextual Analysis to Deliver True Positives

Once a system has a rich source of foundational knowledge, the next step is to build models that can analyze the relationships between multiple indicators in order to evaluate their significance in context. Sophisticated contextual analysis will greatly increase a system’s ability to identify true positives that could easily be missed with traditional systems, and it will greatly reduce the incidence of false positives.

ContentIQ’s contextual analysis enables a system to accurately classify content by first identifying strong class indicators as well as terms, structures, and characteristics that may only be weakly connected to a classification category, then analyzing the collection of indicators in relation to each other. Many weak indicators with strong relationships help identify difficult-to-target content such as financial, legal, or design documents. They also help to prevent false positives by narrowing the likely meaning of a term or expression based on its relationship to additional identifiers that may be present in the same file.

For example, how do you classify a file that contains a 10-digit number? To most systems that number looks like this: ##########. Is that a phone number? IP address? Swedish or Danish national ID number? Record number? Account number? Part number? Inventory volume? Without contextual analysis this data will be flagged and the probability of a false positive is high, so manual analysis of this file will be necessary.

Take this example a little further to show how vast the non-contextual identification problem can be. Say a multinational company with customers located globally wants to make sure they don’t expose any national ID numbers. The characteristics of these numbers vary by country. U.S. and U.K. IDs contain 9 digits; in Sweden and Denmark IDs are 10 digits; in Russia, Turkey, and Norway they contain 11 digits; in Japan and Malaysia IDs contain 12 digits; in many other countries IDs are 13 digits. So this company needs a system that identifies ID numbers that could contain from 9 to 13 digits. A 9, 10, 11, or 12-digit item could also be a phone number. An IP address can also contain 10 digits once its separators are ignored. And any of these multi-digit numbers could be part of an address, or they could just be some random record number, part number, inventory volume number, or account number.

This problem is mitigated when expressions are analyzed in context with a collection of other indicators. For example, if a 10-digit number is identified AND it is structurally associated with a name AND Swedish language indicators appear, it is more likely to be a national ID. If a 10-digit number is identified AND computer or engineering indicators appear, it is more likely an IP address. If a 10-digit number is identified AND it is associated with a name AND a phone-related indicator appears AND/OR what structurally looks like an address appears, then it is likely a phone number.
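
The three rules in this paragraph can be written out directly. The sketch below is illustrative only: the indicator names are invented labels, and it assumes some earlier step has already detected which indicators are present in the file. ContentIQ is described as using learned relationship models rather than hand-written rules like these.

```python
import re

# Exactly ten consecutive digits, not part of a longer run of digits.
TEN_DIGITS = re.compile(r"(?<!\d)\d{10}(?!\d)")


def classify_ten_digit(text, indicators):
    """Classify a 10-digit number using other indicators found in the file.

    indicators is a set of hypothetical labels produced by earlier analysis:
    "name", "swedish_language", "engineering_terms", "phone_term", "address".
    """
    if not TEN_DIGITS.search(text):
        return "no 10-digit number"
    if {"name", "swedish_language"} <= indicators:
        return "national ID"
    if "engineering_terms" in indicators:
        return "IP address"
    if "name" in indicators and indicators & {"phone_term", "address"}:
        return "phone number"
    return "unclassified"


sample = "Ref 5555551212"
print(classify_ten_digit(sample, {"name", "swedish_language"}))
print(classify_ten_digit(sample, {"engineering_terms"}))
print(classify_ten_digit(sample, {"name", "phone_term"}))
print(classify_ten_digit(sample, set()))
```

With no supporting indicators the number stays unclassified instead of being flagged, which is how context reduces false positives.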

[Figure: a sample patient registration form (patient name, street address, Social Security number, phone numbers, employer, and insurance details) annotated with sample values such as 555 555-1212, 222-22-2222, and North Capitol St. NW, Washington DC 20001. Callouts: Swedish Language Indicators, National ID Number; Computer Language Indicators, IP Address; Name / Phone / Address Indicators, Phone Number.]

Simple Relationship Rules Are Not Enough

Some systems perform a very basic contextual analysis by using simple assumption-based rules, such as: if a file contains “SSN” or “Social Security Number” followed by a 9-digit number, identify it as containing PII. This approach is better but not sufficient. These systems will not correctly identify a significant amount of content because they rely on structural relationship definitions that are too simple.
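
A short sketch shows how brittle such a rule is. The regular expression below is one plausible way to write the rule just described, not a rule taken from any particular product, and the sample strings are made up for illustration.

```python
import re

# "SSN" or "Social Security Number", up to three separator characters,
# then exactly nine consecutive digits.
SIMPLE_RULE = re.compile(
    r"(?:SSN|Social Security Number)\D{0,3}(\d{9})(?!\d)",
    re.IGNORECASE,
)


def simple_rule_flags(text):
    """Return True if the simple label-plus-digits rule matches."""
    return bool(SIMPLE_RULE.search(text))


samples = [
    "SSN: 123456789",               # matches the rule as written
    "SSN 123-45-6789",              # same data with hyphens: missed
    "ID: 123-45-6789  Jane Doe",    # different label: missed
]
for text in samples:
    print(simple_rule_flags(text), text)
```

Each missed case could be patched with another pattern, but the number of label and format variations grows quickly, which is the argument for modeling relationships between many indicators instead.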

Sophisticated Relationship Models Are Required

The Elastica scientists take a computational approach to the challenge of analyzing relationships for ContentIQ. They leverage supervised machine learning to identify, codify, and prioritize relationships between a broad range of both strong and weak indicators. Then they distill this foundational relationship knowledge into computational models that power ContentIQ algorithms, accelerating the identification and classification of sensitive data and delivering true positives without triggering false positives.

Remediate and Control with ContentIQ

Once you can confidently identify sensitive content that is either already in the cloud or on its way to the cloud, you need to perform a risk analysis. How sensitive do you consider this class of content? Is it content that falls under compliance regulations? Is it valuable intellectual property? Is it data so sensitive that it should not be shared even within your own organization? ContentIQ will help answer these questions, making it easier to set your guidelines on these issues. Once you decide what you want to do, you can define customized ContentIQ profiles and build Elastica CloudSOC policies to automatically remediate and control where this content can be stored, enforce encryption on this data, set limitations on accessing or sharing this data, etc.

Identifying Content Unique to an Organization

Some organizations use specific types of data in formats unique to their organization. The ContentIQ machine learning capabilities used to discover content indicators from publicly available data sources can also be used to discover and leverage content indicators unique to an organization. The learning system is smart enough that CloudSOC customers need to feed just a few examples specific to their organization into their Elastica account as training profiles, and ContentIQ will learn to look for content in those formats.

ContentIQ Intelligence Enables Accurate Data Governance and DLP with Elastica CloudSOC Detect, Protect, and Investigate

Machine learning in ContentIQ powers a sophisticated computational linguistics approach to content analysis to more accurately identify and classify sensitive data. Elastica solutions use the unique intelligence in ContentIQ to detect sensitive data, protect against data leakage, and investigate security incidents more effectively.

• Identify compliance-related content such as PII, HIPAA, and PCI with fewer false positives, even when in nontraditional formats

• Identify classes of sensitive content (source code, design documents) with more accuracy

• Track sensitive data stored in sanctioned apps

• Identify sensitive data in traffic to many different unsanctioned and sanctioned apps

• Power more accurate identification of risky activity

• Enable more accurate data governance policy controls

• Provide useful data for incident response investigations

[Figure: a 9-digit number can be read as a 9-digit ZIP Code, a 9-digit Date, a 9-digit $ Amount, a 9-digit SSN, or a 9-digit Routing #. Sample values shown: ID: 123-45-6789, Jane Doe, ten million, 10/10/2010, 10,000,000.00.]

The Information You Need for Incident Response

Security incidents will occur. That’s the reality of today’s cloud threat landscape, and IT departments will at some point be scrambling to figure out what happened. This type of investigation can be challenging, if not impossible, with traditional perimeter security.

Typical Challenges Faced by Incident Response Investigations

Traditional data sources used to investigate security incidents pose significant challenges, such as:

• Appliances with limited historical data due to storage resource constraints

• Log data that doesn’t include enough granular information to answer important questions

• Vast quantities of redundant or irrelevant logs that require extensive manual effort to glean useful information

• Logs full of data designed to be read by machines, not humans, which makes the data they contain difficult to interpret

• Inability of most on-premises appliances to monitor cloud usage or activities by mobile users

Well Designed Intelligence Engines and Cloud Resources to the Rescue

The limitations of traditional systems for incident response can be solved by leveraging the cloud, applying data science-driven intelligence gathering, designing algorithms that can interpret the data, and building a system that presents that data in an intuitive, easy-to-interpret format. This is what Elastica Investigate, the unsung hero of CloudSOC, delivers.

Logs, your foundation of knowledge discovery for incident response, can only include the activity data that the original security system can read, so the quality of this data ultimately depends on the intelligence of your firewall, proxy, IPS, CASB, or other system. If the underlying intelligence of a system can only read coarse details in its traffic analysis, that’s all you’ll get from those logs. This is where the power of StreamIQ and ContentIQ really shines.

StreamIQ picks up detailed activity data that other traffic analysis systems can’t identify. Then it correlates activity details with multiple related attributes for contextual analysis and translates them from machine-readable form to human language. This results in logs that are uniquely full of useful information and easy to understand. Elastica logs automatically consolidate multiple related, less important actions under the one action that is contextually significant. For example, StreamIQ data in logs tells you which user was involved instead of just presenting IP addresses, and it creates a single record that this user logged into a particular account instead of creating multiple records separately tracking each step of the login process. ContentIQ gives you names and attributes of files that were involved in a transaction instead of presenting unintelligible object identifiers. In combination, you get logs that make it easy to track who was accessing what file in what app, what the attributes of that file were, what changes the user made to that file, and what permission settings were changed related to that file or account.

Sample Elastica log entries:

• Office 365: Bob Jones sent an email to Alice Smith with the subject “Billings” using Exchange on April 12, 2016, 11:32 AM

• Dropbox: ALERT [email protected] attempted to Share book.xlsx using Linux and Firefox v43 on April 12, 2016 11:34 AM

• Box: File “book.xlsx” has risk of PII and PCI violations from user [email protected]

• Google Drive: ALERT Bob Jones shared document “book.xlsx” on April 12, 2016 11:45 AM

• Office 365: Bob Jones user ThreatScore is now 97, changed for “Too many suspicious location changes” on April 12, 2016 11:59 AM
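The login consolidation described above can be sketched in a few lines. This is an illustrative toy rather than how StreamIQ works: the `RawEvent` fields, the step names, and the IP-to-user table are all hypothetical, chosen only to show several low-level records collapsing into one readable entry that names the user.

```python
from dataclasses import dataclass


@dataclass
class RawEvent:
    ip: str
    app: str
    step: str
    timestamp: str


# Hypothetical lookup tables, invented for this sketch.
IP_TO_USER = {"10.0.0.7": "Bob Jones"}
LOGIN_STEPS = ("load_login_page", "submit_credentials", "session_created")


def consolidate_logins(events):
    """Fold the low-level steps of each login into one readable record."""
    pending = {}  # (ip, app) -> number of login steps seen so far
    records = []
    for event in events:
        if event.step not in LOGIN_STEPS:
            continue
        key = (event.ip, event.app)
        pending[key] = pending.get(key, 0) + 1
        if event.step == "session_created":
            # Name the user when known; otherwise fall back to the raw IP.
            user = IP_TO_USER.get(event.ip, event.ip)
            records.append({
                "summary": f"{user} logged into {event.app} on {event.timestamp}",
                "raw_events": pending.pop(key),
            })
    return records
```

Given three raw events for one Dropbox login from 10.0.0.7, the function returns a single record whose summary reads "Bob Jones logged into Dropbox on ..." and notes that three raw events were folded into it.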

Pulling it All Together

The best threat intelligence in the world is useless if it can’t find threats or interpret them in a timely manner. The first thing you’ll notice when you get to the Investigate dashboard is a Query function. This is key, because wading through lots of irrelevant logs to find the ones you need is a waste of time. Investigate has a powerful but easy-to-use query feature in which you can combine a wide range of intuitive query terms with keywords to search by app, user, action, file, and so on. Or you can skip the query and use the rich set of data filtering options just beside the query feature.
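To make the idea of searching by app, user, action, or file concrete, here is a minimal sketch of field-based filtering over structured log records. It does not reproduce the Investigate query syntax, which this paper does not show; the `search` function, the field names, and the sample records are assumptions made for illustration.

```python
# Made-up structured log records, loosely modeled on the sample entries.
LOGS = [
    {"app": "Office 365", "user": "Bob Jones", "action": "send", "file": None},
    {"app": "Google Drive", "user": "Bob Jones", "action": "share", "file": "book.xlsx"},
    {"app": "Box", "user": "Alice Smith", "action": "upload", "file": "book.xlsx"},
]


def search(logs, **criteria):
    """Return the records whose fields match every criterion, ignoring case."""
    def matches(entry):
        return all(
            str(entry.get(field) or "").lower() == str(wanted).lower()
            for field, wanted in criteria.items()
        )
    return [entry for entry in logs if matches(entry)]
```

Calling `search(LOGS, user="bob jones", action="share")` returns only the Google Drive record, while `search(LOGS, file="book.xlsx")` returns both records that involve that file. Filtering of this kind is only practical because each record already carries named users and files rather than IP addresses and object identifiers.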

The Investigate interface pivots based on the data returned from your query or filter settings. It automatically populates data visualizations and presents relevant logs full of drill-down details, thanks to all the intelligence work done by StreamIQ and ContentIQ.

Data Science-Based Policy Controls

Layers of data science-driven systems, from StreamIQ and ContentIQ, to User Behavior Analysis, to ThreatScores, make it possible for CloudSOC to provide visibility and control over cloud apps with an accuracy not possible with previous CASB technologies.

In Elastica CloudSOC, policies can be defined with a unique level of granularity due to the detailed intelligence provided by StreamIQ and ContentIQ. In an optimal world, policy enforcement would be both automated and nuanced. Elastica Protect enables you to use the dynamic user ThreatScore rating system to trigger policy controls in a manner appropriate to varying levels of risk severity, from monitoring to alerts to blocking specific traffic to full user quarantines.
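The graduated response described above can be pictured as a simple threshold table. The sketch below is a hypothetical illustration, not Elastica Protect's policy engine: the 0 to 100 scale, the threshold values, and the action names are assumptions, since the paper gives only one sample score (97) and names the four response levels.

```python
# Hypothetical tiers, highest first: (minimum score, action).
# The thresholds are invented for this sketch.
TIERS = [
    (90, "quarantine_user"),
    (70, "block_traffic"),
    (40, "alert"),
    (0, "monitor"),
]


def action_for(threat_score):
    """Pick the most severe action whose threshold the score meets."""
    if not 0 <= threat_score <= 100:
        raise ValueError("threat_score is assumed to lie between 0 and 100")
    for minimum, action in TIERS:
        if threat_score >= minimum:
            return action
```

With these assumed thresholds, a score of 97 maps to `quarantine_user` and a score of 10 maps to `monitor`. Keeping the tiers in a data table rather than in nested conditionals means an administrator could retune the thresholds without touching the enforcement logic.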


Conclusion

Elastica Cloud Security is built on a foundation of data science and cloud resources to deliver fundamentally better cloud security. This approach enables Symantec solutions to move beyond many well-known limitations of traditional security systems. Layers of machine learning, computational analysis, and intelligent algorithms go into building the highly accurate and adaptive ContentIQ, StreamIQ, and UBA engines at the core of Elastica CloudSOC. ThreatScores are calculated based on these engines to facilitate practical everyday security management, big data visualization, and automated controls.

About the Author

Deena Thomchick is Senior Director of Cloud at Blue Coat. She’s spent more than 25 years in technology with a particular focus on security. Her background includes work on encryption, advanced threat protection, network security, and endpoint security.

About Blue Coat & Elastica Cloud Security

Blue Coat, Inc. is a leading provider of advanced web security solutions for global enterprises and governments, protecting 15,000 organizations including over 70 percent of the Fortune Global 500. Through the Blue Coat Security Platform, Blue Coat unites network, security, and cloud, protecting enterprises and their users from cyber threats, whether they are on the network, on the web, in the cloud, or mobile. Blue Coat was acquired by Bain Capital in May 2015. On June 12, 2016, Symantec and Blue Coat, Inc. announced that they had entered into a definitive agreement under which Symantec will acquire Blue Coat for approximately $4.65 billion in cash. The transaction has been approved by the Boards of Directors of both companies and is expected to close in the third calendar quarter of 2016.

Elastica, acquired by Blue Coat in November 2015, is the leader in Data Science Powered™ Cloud Access Security. Its CloudSOC™ platform empowers companies to confidently leverage cloud applications and services while staying safe, secure, and compliant. A range of Elastica Security Apps deployed on the extensible CloudSOC™ platform deliver the full life cycle of cloud application security, including auditing of shadow IT, real-time detection of intrusions and threats, protection against intrusions and compliance violations, and investigation of historical account activity for post-incident analysis.

For additional information, please visit elastica.net.


Copyright © 2016 Symantec Corp. All rights reserved. Symantec, the Symantec Logo, the Checkmark Logo, Blue Coat, and the Blue Coat logo are trademarks or registered trademarks of Symantec Corp. or its affiliates in the U.S. and other countries. Other names may be trademarks of their respective owners. This document is provided for informational purposes only and is not intended as advertising. All warranties relating to the information in this document, either express or implied, are disclaimed to the maximum extent allowed by law. The information in this document is subject to change without notice.