The Data Rich and Information Poor

Retention, Technology, Metrics, & IG

Presenter: John Cofrancesco

Date: 5/1/2019

Private & Confidential

Copyright ©2015 Active Navigation

Information Governance vs Records Management

➢ The nature of our business has changed in the last 24 months

• Has anyone asked you about cybersecurity?

• Have you been dealing more in completed records or your S-drive?

• Does your IT staff include you in data planning?

In records management we will always be an afterthought, with limited budgets and resources.

The progression of our information economy makes IG the unavoidable future, and the people and organizations that put themselves at the center of it will reap the rewards.

Big numbers are scary -- but do they mean anything?

1. Data volume is exploding: more data has been created in the past two years than in the entire history of the human race.

2. Our accumulated digital universe of data will grow from 4.4 zettabytes today to around 44 zettabytes, or 44 trillion gigabytes next year.

3. We are seeing a massive growth in video and photo data, where every minute up to 300 hours of video are uploaded to YouTube alone.

OR

1. Our data volume is exploding and the cost of our file shares is growing by Some Number a Year

2. Our accumulated digital universe will increase our potential e-discovery costs by Some Number a Year

3. We are seeing a massive growth in the number of systems we use, and the cost of those new systems in addition to legacy systems is Some Number a Year

People care about the things that they can use to connect to their world. Empower your IG program by focusing on what concerns your leadership.
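One way to replace a "Some Number a Year" placeholder with a figure leadership can relate to is simple compound-growth arithmetic. The sketch below is illustrative only: the function name and all four inputs (current volume, growth rate, cost per terabyte, horizon) are hypothetical and must come from your own organization's measurements, not from this deck.

```python
def project_storage_cost(current_tb, annual_growth_rate, cost_per_tb_year, years):
    """Project the yearly storage cost for each of the next `years` years.

    All inputs are assumptions supplied by the caller:
      current_tb         -- file-share volume today, in terabytes
      annual_growth_rate -- e.g. 0.5 for 50% growth per year
      cost_per_tb_year   -- fully loaded cost of keeping one terabyte for a year
    """
    costs = []
    volume = current_tb
    for _ in range(years):
        # Volume compounds each year; cost follows volume.
        volume *= 1 + annual_growth_rate
        costs.append(round(volume * cost_per_tb_year, 2))
    return costs
```

Calling the function with your own measured volume and unit cost gives one cost figure per year, which is the kind of number the second list above asks for.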

Failure Number 1

• HP TRIM

• Train the world

• They will “do records”

• Software Stinks

• It is not like riding a bike

• They hate “doing records”

Failure Number 2

• Better “open source” software

• Build it into their process

• They won’t know they are “doing records”

• Software still stinks

• Not my process

• They found out they are “doing records”

Traditional Solutions Fall Short

• No expertise; focused on adjacent use cases (migration, eDiscovery, identity and access management)

• Heavyweight architecture optimized for a different problem and does not scale

• Agents hard to deploy and maintain

• Full text indexes take too much to deploy, maintain and query

• Entire solution too costly to implement

• Insufficient decision support for confident actions; nothing gets disposed of

• Full feature set unavailable for management in place

• Data loss prevention works for data in motion

• eDiscovery requires a collection process and offline processing

• Inflexible classification engines (to support policies)

• Do not readily support customer nuances

• Cannot operate at file level

Proprietary Information of Active Navigation

Information Challenge

• Lots of data

• Legislation and regulation

• Customers’ ethical expectations

• Internal compliance

• Don’t move the data

• Cost pressure

• Malicious operators

• It should be easy


File vs Data Analytics & What is ‘Big Data?’

[Figure: grid of six-digit number strings and paired values]

Data Analytics


• Compare known data

• Allows you to make averages

• Gives value to structured data

• Take action outside the data

What is ‘Big Data?’


Shirts    Brand      A-Cost

12        The Gap    $15

2         JCrew      $50

Big Data Analytics

• Compares unrelated data

• Allows you to guess at reasoning

• Gives value to huge data sets

• Take action outside the data

File Analytics


• Found all the birthdates

• Cut across the documents

• Reports their location

• Lets you action them
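The File Analytics bullets above (find a pattern, cut across documents, report location) can be sketched in a few lines. This is a minimal illustration, not how any particular product works: the date pattern (MM/DD/YYYY or MM-DD-YYYY), the restriction to `.txt` files, and the name `find_dates` are all assumptions made for the example.

```python
import re
from pathlib import Path

# Assumed pattern: dates written as MM/DD/YYYY or MM-DD-YYYY.
DATE_PATTERN = re.compile(
    r"\b(0[1-9]|1[0-2])[/-](0[1-9]|[12]\d|3[01])[/-](19|20)\d{2}\b"
)


def find_dates(root):
    """Return (path, line number, matched text) for each date-like string under root."""
    hits = []
    # Cut across every text file in the tree, not one document at a time.
    for path in sorted(Path(root).rglob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="ignore")
        for line_no, line in enumerate(text.splitlines(), start=1):
            for match in DATE_PATTERN.finditer(line):
                # Report the location so a reviewer can act on the file.
                hits.append((str(path), line_no, match.group(0)))
    return hits
```

The output is a list of locations rather than an average or a trend, which is the difference from the two analytics styles described earlier: the result points back at specific files that someone can then secure, move, or delete.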

Data breach investment gap

[Figure: investment plotted against the breach timeline]

Time to gain entry: 3 to 6 days*

Time to exit: 256 to 388 days*

Free play time! 250 days*

*Ponemon Institute, 2015 Cost of Data Breach Study: Global Analysis

Solution Comparison vs Best Approach

Best approach need: Low deployment footprint for supportability and appropriate global investment
• FA Approach: 2-3% of content footprint
• eDiscovery: >20% of content footprint
• Identity and Access Mgt: Does not scale globally

Best approach need: Manage in place; migration or replication is not an option
• FA Approach: Designed for all actions in place
• eDiscovery: Take a copy for offline processing
• Identity and Access Mgt: Does not scale globally

Best approach need: Taking action is HARD; experience and solution designed for the job
• FA Approach: Charting and review against policies
• eDiscovery: Review on single matter only
• Identity and Access Mgt: Poor decision support for action

Best approach need: Flexible credentials for complex permissions environment
• FA Approach: Fully customizable credentials mgt
• eDiscovery: Take a copy for offline processing
• Identity and Access Mgt: Elevated service accounts required

Best approach need: Adaptable engine to meet specific and regional peculiarities
• FA Approach: Fully customizable across all locales
• eDiscovery: Review on single matter only
• Identity and Access Mgt: Applicable only for sensitive data cases

Best approach need: Decision support and review environment which connects SMEs, visually, to their data
• FA Approach: Charting and review against policies
• eDiscovery: Not available for SME review
• Identity and Access Mgt: Poor decision support for action

Best approach need: Ability to roll up and project progress across entire deployment
• FA Approach: Mgt reporting aggregates all data
• eDiscovery: No meaningful global reporting
• Identity and Access Mgt: Does not scale globally


File Analysis & File Analytics

“File analysis enables data architects, legal and security professionals, storage managers, and business analysts to understand and manage unstructured data stores, reduce risk and costs, and make better information management decisions for unstructured data” – Gartner 2017

“Technology Can Do The Heavy Lifting When Unmanaged Documents Need To Be Cleaned Up, Migrated, And Mined For Insights” – Forrester 2017

The future belongs to those who own the data

➢ Because of the move to cloud computing, it is unlikely that ECMs will be able to function as standalone systems

➢ Google, Amazon, and Microsoft will be the only main players left in the mainstream

➢ One or perhaps two ECMs will persist for local deployment and specialty

1. Did moving the box in which your data lived really improve your organization?

2. Do you feel like you are doing something to someone rather than for someone?

3. Why not build a system around the habits of your users rather than trying to change their habits?

The box matters less than the process: building a system around process will save on costs, improve the program, and provide opportunities to add value.

IG Capability Maturity Model

Kill the File Share

Start with a Pilot Project to Establish a Repeatable Process

Tasks (Gantt chart, weeks 1 to 8):

0.0 Hold Project Kickoff Session

1.0 Modify Index Configuration and Index Content

1.1 Hold review session on metadata and rules; revise metadata in DC

1.2 Set up permissions and index pilot site content

1.3 Preview indexed content via remote session; prepare for workshops

2.0 Conduct ROT and Sensitive Data Workshops with Business Unit

2.1 Schedule workshops, communicate objectives

2.2 Conduct workshops, provide reports for reviewers to markup

2.3 Review and markup reports, return for upload

2.4 Take actions based on markups

2.5 Compile reports, prepare and deliver management presentation

2.6 Refine project process as necessary

3.0 Prepare for Migration (Optional)

4.0 Migrate Cleansed and Restructured Files (Optional)


Roll Out Repeatable Process Across BUs, Enterprise

Rollout schedule (Gantt chart, weeks 1 to 22):

Business Unit 1 Pilot

Business Unit 2

Business Unit 3

Business Unit 4

Business Unit 5

Business Units 6-


Who Do You Need to Kill Your File Share?

[Diagram: Information Governance Policy Body, Enterprise Core Team, Business Unit Teams]

Organizations Involved: Information Governance Committee, Enterprise Core Team, Business Unit Teams

Information Governance Committee

Membership:
• Legal
• Compliance/Audit
• Risk Management
• IT / IT Security
• Chief Data Office
• Chief Admin Office
• Operations

Role:
• Set charter and goals
• Set policies
• Provide resources
• Monitor progress
• Insist on results

Enterprise Core Team

Membership:
• IT / IT Security
• Risk Management
• Chief Data Office
• Compliance
• Records Management
• Legal

Role:
• Determine requirements
• Recommend policies
• Establish model process
• Determine architecture
• Provide infrastructure
• Manage enterprise operations
• Provide training and consultation
• Compile enterprise reports

Business Unit Teams

Membership:
• IT
• Risk Management
• Records Manager
• Records Coordinators
• Operations

Role:
• Implement model process
• Index content
• Prepare reports
• Conduct workshops
• Apply policies to cleanse files
• Tailor metadata as necessary
• Organize, tag, migrate content
• Monitor policy compliance

Approach recommended by Gartner and Forrester


Best Practice Work Flow

Data Discovery

Inventory content in target repositories:
• File shares
• SharePoint
• ECM
• Cloud
• Exchange
• Google Drive
• Etc.

Data Cleansing
• Identify ROT and cleanse or quarantine
• Identify duplicates and cleanse
• Identify sensitive data and cleanse or secure

Data Modeling
• Develop rules for categorizing files into records schedules, knowledge sharing taxonomies, etc.
• Apply rules to auto-categorize content
• Review results and refine taxonomies and rules

Metadata Tagging
• Develop metadata fields based on taxonomies and rules
• Auto-tag content for RM and KM

Migration
• Configure structure and metadata in destination repository (e.g., SharePoint, ECM)
• Map structure and metadata between Discovery Center and destination
• Migrate files and metadata
• Monitor policy compliance

Approach recommended by Gartner, Forrester and supported by Discovery Center
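The "identify duplicates" step in the workflow above is commonly done by comparing content hashes rather than file names. The following is a minimal sketch under that assumption; it is not a description of how Discovery Center does it, and the function names `file_digest` and `find_duplicates` are made up for the example.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file's contents, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Group files under root by content hash; return groups with two or more files."""
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            groups[file_digest(path)].append(str(path))
    return [paths for paths in groups.values() if len(paths) > 1]
```

Each returned group is a set of byte-identical files. Which copy to keep is a policy decision for the reviewers, so the sketch only reports the groups and deletes nothing.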


Example Data


Total: 1,074,258 files, 965.71 GB

941 GB Shared Drives, 24.71 GB SharePoint

Redundant, Obsolete, Trivial

64% of files were remediation candidates, examples:

• Temporary files

• Email archives

• User identified and aging backups

• Created >10 yrs ago

Sensitive Files: 7563

Employment Data\1099s 19

Employment Data\Background Check 5

Employment Data\W-2 or W-4 8

Financial PII\Credit Data 7

Intellectual Property\Architectural Diagrams & Documents 186

Intellectual Property\M&A Documents 11

Intellectual Property\Models and Analytics 1488

Intellectual Property\Network Diagrams & Configurations 76

Intellectual Property\Pre-Product Launch Plans 10

Intellectual Property\Software and Application Code 4746

IT Related\Password Files 49

Medical Data\Health Records 13

NDA 719

PII\Canadian SIN 178

PII\Passports 48
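Categories such as PII\Canadian SIN are typically found by pattern matching plus a checksum to cut false positives. Canadian Social Insurance Numbers are nine digits and satisfy the Luhn check, so a hedged sketch of such a detector might look like the following. The regular expression, the accepted separators, and the function names are assumptions for illustration; passing the Luhn check shows a number is well formed, not that it was ever issued.

```python
import re

# Assumed layout: nine digits, optionally grouped 3-3-3 with spaces or hyphens.
SIN_CANDIDATE = re.compile(r"\b\d{3}[ -]?\d{3}[ -]?\d{3}\b")


def luhn_valid(digits):
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:
            # Double every second digit from the right; fold two-digit results.
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def find_sin_candidates(text):
    """Return nine-digit strings in text that pass the Luhn check."""
    hits = []
    for match in SIN_CANDIDATE.finditer(text):
        digits = re.sub(r"\D", "", match.group(0))
        if luhn_valid(digits):
            hits.append(match.group(0))
    return hits
```

Hits from a detector like this are candidates for review, which is why the workflow pairs detection with workshops where the business unit confirms or rejects each finding.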

>26% were surplus file duplicates

Last accessed >5 years: 54%

Created >5 years: 73%
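Age figures like the ones above can be approximated from file system timestamps. The sketch below is an assumption-laden illustration: it uses last-modified time as a stand-in for creation time (true creation time is not available portably), access times may be disabled or coarse on some systems, and the names `share_older_than` and `age_profile` are hypothetical.

```python
import time
from pathlib import Path

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def share_older_than(timestamps, years, now=None):
    """Fraction of timestamps that fall more than `years` years before `now`."""
    now = time.time() if now is None else now
    if not timestamps:
        return 0.0
    cutoff = now - years * SECONDS_PER_YEAR
    return sum(1 for ts in timestamps if ts < cutoff) / len(timestamps)


def age_profile(root, years=5):
    """Share of files under root not accessed or modified in the last `years` years."""
    stats = [p.stat() for p in Path(root).rglob("*") if p.is_file()]
    return {
        "last_accessed": share_older_than([s.st_atime for s in stats], years),
        # Modified time is used here as a proxy for creation time.
        "last_modified": share_older_than([s.st_mtime for s in stats], years),
    }
```

Running `age_profile` against a pilot share gives the two percentages a ROT workshop usually starts from.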

Dealing with risk gives you a seat at the table, but finding value makes you a leader

➢ Odds are your organization already collects the data it needs to be more successful, but being “data rich” means nothing if you cannot rationalize that data into information.

The example to follow:

• Rio Tinto Group, a mining company, had over 100 years of collected geological data from potential mines all around the world

• Rio began mining (puns are funny!) their data for areas where old technology could not achieve valuable outcomes but new technology would allow for mines to be profitable

• Rio finds one new mine from mining its old data every year

I’ll bet you know who has a well-supported IG program!

Organization Assessment Questionnaire

1. Is my organization’s data organized to meet our business needs?

2. Do we have, and follow, a process for managing our data?

3. Do we monitor our data to ensure it meets our changing requirements?

4. Does management understand both the risk and the value of our data?

5. Is our data used as a source for creating value?

Tech Assessment Questionnaire

1. Do we have a working policy?

2. Are our issues really technical or political?

3. Do we already own the tools we need?

4. What tools will we need in the future?

5. Does my plan for today cause problems for tomorrow?