We think about shared drives as an overwhelming task – no organization, no structure, mixed retention rules, personal ownership. But it isn’t impossible. And a lot of good can be done with relatively small effort. This is a good number to have handy as records managers to promote plans to get control of feral content.
Why Remediate Shared Drives?
Reduce Costs: never delete? Server costs increase logarithmically.
4/12/2016 7
Presenter
Presentation Notes
Example from client -- $450,000 in new storage in 2017 if they keep the same growth.
7. Business Document Assessment
When valued at one labor hour per document, the 10,389,381 documents represent $623.4 million of information assets, growing at $77.0 million annually.
Note: Information value calculated using the average Department-wide loaded labor rate of $60 per hour obtained from the Office of Finance to manage (includes create, update, publish, store, search, re-create, re-use of previous versions, etc.) the files.
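The arithmetic behind this valuation can be checked in a few lines. This is a minimal sketch: the function name `information_value` is an illustrative assumption, and the one-hour-per-document estimate is the slide's own simplification.

```python
def information_value(document_count: int, hours_per_document: float,
                      loaded_rate: float) -> float:
    """Value of a document set at a flat labor estimate per document."""
    return document_count * hours_per_document * loaded_rate


# Figures from the slide: 10,389,381 documents, 1 hour each, $60 per hour.
total = information_value(10_389_381, 1.0, 60.0)
print(f"${total / 1e6:.1f} million")  # prints "$623.4 million"

# At the same rate, $77.0 million of annual growth implies roughly
# this many new documents per year.
implied_new_documents = 77.0e6 / 60.0
print(f"{implied_new_documents:,.0f} documents per year")
```

Changing `hours_per_document` shows how sensitive the headline number is to that single assumption.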
[Chart: number of files and storage space (GB) over time, each with a linear trend line]
Presenter
Presentation Notes
Note that storage is increasing faster than the volume of files (the lines cross at 2014) because of the increasing size of individual files. One hour per document will be way too high for some (e.g. saving an email) and way too low for others (e.g. a complex PowerPoint).
Why Remediate Shared Drives?
Efficient retrieval of information
◦ eliminate clutter
◦ structure content logically (or move to ECM/SharePoint)
◦ appropriate tagging makes information findable
Encouraging information sharing: properly organized shared drives (or use of ECM/SharePoint) simplify security and sharing of information across the organization.
Reducing redundancy and versions
◦ reduces the risk of using the wrong version for business decisions
◦ improved information sharing reduces duplication (typically 30 to 50%)
Presenter
Presentation Notes
No use of hashing – strictly uses file names and sizes. As you’ll see later, hashing adds much more accuracy to the process.
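A name-and-size comparison of the kind described in this note can be sketched as follows. This is an illustration only, not the tool behind the slide: the function name, the sample paths and the sizes are all hypothetical.

```python
from collections import defaultdict


def candidate_duplicates(files):
    """Group (path, size) records by lower-cased file name and size.

    No hashing is used, so matches are candidates only.
    """
    groups = defaultdict(list)
    for path, size in files:
        # Take the last path component, accepting either slash style.
        name = path.replace("\\", "/").rsplit("/", 1)[-1].lower()
        groups[(name, size)].append(path)
    return {key: paths for key, paths in groups.items() if len(paths) > 1}


# Hypothetical inventory rows: (path, size in bytes).
sample = [
    ("S:/AP/2015/invoice_001.pdf", 48_213),
    ("S:/AP/Betty/invoice_001.pdf", 48_213),
    ("S:/AP/2015/invoice_002.pdf", 48_213),
]
dupes = candidate_duplicates(sample)
for (name, size), paths in dupes.items():
    print(name, size, paths)
```

Two different files can share a name and a size, which is why this approach is less accurate than comparing content hashes.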
Manual remediation
Basic Analysis Workflow
[Diagram: rules sort content into four streams]
• Certain ROT (Redundant, Outdated, Trivial)
• Likely ROT
• Sensitive content (quarantine)
• Classify and group remaining content
Presenter
Presentation Notes
Make an expansive decision tree that addresses the kind of content you are remediating: e.g. engineering will be different than HR. Classification rules are especially important.
Presenter
Presentation Notes
Go after the low hanging fruit.
Clear out the easy ROT first
Shouldn’t be on shares
• Personal content (policy)
• Wedding and vacation photos/videos
• Music libraries
• What have you seen?
Presenter
Presentation Notes
Saves storage money, cleans up folders for in-depth analysis, forces policy compliance. How many organizations have a policy and enforce it?
Clear out the easy ROT first
Likely junk
• Temp files, Thumbs.db, or system generated files
• Sort share by file type
• Look for EXE, TXT, MPG, EPS, ILL, DAT, ZIP, etc. that don’t belong
Presenter
Presentation Notes
AP shouldn’t have Illustrator files!
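Sorting a share by file type and flagging types that look out of place could be scripted along these lines. This is a rough sketch: the extension sets follow the slide's list, while the function names, the `.tmp` rule and the sample paths are assumptions for illustration.

```python
from collections import Counter
from pathlib import PurePosixPath

# Extensions the slide suggests looking for; adjust per department.
SUSPECT_EXTENSIONS = {".exe", ".txt", ".mpg", ".eps", ".ill", ".dat", ".zip"}
# Assumed examples of system-generated or temporary files.
JUNK_NAMES = {"thumbs.db"}
JUNK_EXTENSIONS = {".tmp"}


def extension_counts(paths):
    """Count files per lower-cased extension."""
    return Counter(PurePosixPath(p).suffix.lower() for p in paths)


def review_list(paths):
    """Return (path, reason) pairs for files worth a second look."""
    flagged = []
    for p in paths:
        path = PurePosixPath(p)
        ext = path.suffix.lower()
        if path.name.lower() in JUNK_NAMES or ext in JUNK_EXTENSIONS:
            flagged.append((p, "likely junk"))
        elif ext in SUSPECT_EXTENSIONS:
            flagged.append((p, "check whether it belongs"))
    return flagged


# Hypothetical listing of an accounts payable share.
listing = [
    "S:/AP/2015/invoice_001.pdf",
    "S:/AP/2015/Thumbs.db",
    "S:/AP/logo.eps",
    "S:/AP/~export.tmp",
    "S:/AP/2015/invoice_002.pdf",
]
counts = extension_counts(listing)
flagged = review_list(listing)
```

The output is a review list, not a delete list; whether a given type belongs on a share is a business decision.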
Clear out the easy ROT first
Orphan content
• Personal subject folders (the digital equivalent of desk drawers)
• Check activity on shares
• Separated staff more than RRS longest rule
• Consider keystone status
• Make share read only
Presenter
Presentation Notes
Betty managed accounts payable and retired in 2006. The odds are nothing in her personal file cabinet has a retention longer than 7 years. A quick review of the content shows nothing has been accessed since 2005. One-time deletion.
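The reasoning in this example can be written as a simple rule: the owner left longer ago than the longest retention period, and nothing has been touched since. The sketch below assumes hypothetical exact dates (the notes give only years), and the function names are illustrative.

```python
from datetime import date


def years_between(start: date, end: date) -> float:
    """Approximate elapsed years between two dates."""
    return (end - start).days / 365.25


def is_orphan_candidate(separated_on: date, last_accessed: date,
                        longest_retention_years: int, today: date) -> bool:
    """True when the owner left longer ago than the longest retention
    rule and nothing has been accessed since they left."""
    separated_long_enough = (
        years_between(separated_on, today) > longest_retention_years
    )
    untouched_since = last_accessed <= separated_on
    return separated_long_enough and untouched_since


# Betty's folder, with assumed day and month values.
betty = is_orphan_candidate(
    separated_on=date(2006, 6, 30),
    last_accessed=date(2005, 12, 31),
    longest_retention_years=7,
    today=date(2016, 4, 12),
)
print(betty)  # prints True
```

Last-accessed dates are only as trustworthy as the file system history, so a candidate still needs a quick human review before deletion.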
Clear out the easy ROT first
Redundancies
• Earlier versions
• Duplicates
• PDF and Word
• “copy of”
• Scans not needed
Presenter
Presentation Notes
Duplicates: Not all duplicates are bad duplicates, but within the same folder structure they usually have no value. The most common duplicates are “copy of” files and Word + PDF equivalents. A determination will need to be made on actions to take: delete, leave as is, move to a “copy” folder, etc. Build rules into your decision tree. In general, if there are multiple duplicates with Word and PDF versions, only the most recent Word version should be kept. If final versions (e.g. PDF) are intermingled with drafts, consider creating a “final version” folder, moving the final versions into it, and then cleaning up drafts according to your rules; it will simplify findability and eliminate version ambiguity. Versions may be useful, for example to show how an agreement evolved. If so, build that into your decision tree.
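The two most common patterns named here, “copy of” files and Word + PDF equivalents in the same folder, can be surfaced with a small grouping routine. This is a sketch under assumed naming habits; the function name and the sample paths are hypothetical.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def redundancy_groups(paths):
    """Group files in the same folder whose names differ only by
    extension or by a leading "copy of "."""
    groups = defaultdict(list)
    for p in paths:
        path = PurePosixPath(p)
        stem = path.stem.lower()
        if stem.startswith("copy of "):
            stem = stem[len("copy of "):]
        groups[(str(path.parent), stem)].append(path.name)
    return {key: sorted(names) for key, names in groups.items()
            if len(names) > 1}


# Hypothetical contract folder.
listing = [
    "S:/Contracts/HVAC/Agreement.docx",
    "S:/Contracts/HVAC/Agreement.pdf",
    "S:/Contracts/HVAC/Copy of Agreement.docx",
    "S:/Contracts/HVAC/Scope.docx",
]
groups = redundancy_groups(listing)
```

Which member of each group to keep is left to the decision tree; the routine only lists the candidates.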
Clear out the easy ROT first
• Delete outright based on approved disposition rules
• Make Folder “read only” and limit access to manager. No activity in a year, delete.
• Move content to quarantine; leave stub to “contact manager”.
• Formal approval for deletion.
• Print everything and file in binders.
Options for ROT Disposition
Presenter
Presentation Notes
Case and project files (could also be “folders” that represent aggregate documentation, such as a contract folder or personnel folder): It is particularly important to de-duplicate, eliminate drafts and versions, and be sure the “final” folder follows a standard folder taxonomy (work breakdown structure or file plan), typically a folder name with a designated year such as 1995 Invoices or 1999 Applicants. Working with the SME, delete, archive or quarantine folders exceeding the records retention policy. Other: depends on how you are using the share or SharePoint.
What’s wrong with this picture?
Presenter
Presentation Notes
PDF “final” is dated 3 days before Word “final”. Multiple HVAC agreement versions.
Second pass remediation
Case and project files
Folder names where date information exceeds retention
• Authors
• Date accessed
• Date Modified
• Date Created
• Status flags
Other obvious content assessment triggers
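Folder names that carry a year, such as 1995 Invoices, lend themselves to a simple retention check. The sketch below assumes a four-digit year somewhere in the folder name and a single retention period in years; both the pattern and the function names are illustrative.

```python
import re

# A four-digit year from 1900 to 2099 that is not part of a longer number.
YEAR_PATTERN = re.compile(r"(?<!\d)(?:19|20)\d{2}(?!\d)")


def folder_year(folder_name):
    """Return the first year found in the folder name, or None."""
    match = YEAR_PATTERN.search(folder_name)
    return int(match.group(0)) if match else None


def exceeds_retention(folder_name, retention_years, current_year):
    """True when the year in the folder name is older than the
    retention period allows."""
    year = folder_year(folder_name)
    return year is not None and current_year - year > retention_years


for name in ["1995 Invoices", "1999 Applicants", "2014 Invoices", "Templates"]:
    print(name, exceeds_retention(name, retention_years=7, current_year=2016))
```

Folders without a recognizable year return False and fall through to the other triggers, such as dates and status flags.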
Migrate
• ISO 15489
• Departmental
• One retention rule for a folder structure
• Differentiate drafts and final versions
• Security – promote sharing, eliminate duplicates
• Standardize metadata (folder and file names)
Create classification scheme/taxonomy
• Migrate old folder structure to new, “clean”
• Or migrate to ECM/SharePoint
• Or, build new structure and make old “read only”
Build new structure
Presenter
Presentation Notes
ECM migration: having folder structure and file names standardized means they can be converted to ECM metadata.
• Extract Contract number 12345 from Folder Name
• Validate against contract tracking Access database
• Pull party names from Access
• Extract document type from File Name
• Extract document date from File Name
ECM/SharePoint Migration rules
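Rules like these amount to parsing standardized names into metadata. The sketch below assumes a hypothetical convention (a five-digit contract number in the folder name, and file names shaped like `Amendment_2014-03-01.pdf`) and uses a plain dictionary as a stand-in for the contract tracking database; the party names are invented.

```python
import re
from datetime import date

# Assumed conventions, for illustration only.
CONTRACT_IN_FOLDER = re.compile(r"(?<!\d)(\d{5})(?!\d)")
FILE_NAME = re.compile(
    r"^(?P<doc_type>[A-Za-z ]+)_(?P<doc_date>\d{4}-\d{2}-\d{2})\.\w+$"
)


def build_metadata(folder_name, file_name, contract_lookup):
    """Derive ECM metadata from names, or return None if a rule fails."""
    folder_match = CONTRACT_IN_FOLDER.search(folder_name)
    if folder_match is None:
        return None
    number = folder_match.group(1)
    parties = contract_lookup.get(number)
    if parties is None:
        return None  # contract number not found in the tracking data
    file_match = FILE_NAME.match(file_name)
    if file_match is None:
        return None
    return {
        "contract_number": number,
        "parties": parties,
        "document_type": file_match["doc_type"],
        "document_date": date.fromisoformat(file_match["doc_date"]),
    }


# Stand-in for the contract tracking database.
lookup = {"12345": ["Example HVAC Co.", "Example Transit Agency"]}
meta = build_metadata("Contract 12345 HVAC Maintenance",
                      "Amendment_2014-03-01.pdf", lookup)
```

Items that return None would go to a review queue instead of being migrated with incomplete metadata.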
Defined taxonomy
Presenter
Presentation Notes
Work Breakdown Structure for light rail construction project management
Analysis Gotchas
Long path names (SharePoint, OneDrive issue, ECM issue)
Embedded links
Business systems point to directories
Security rules
Backup/restore software can reset dates
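The long path name issue can be checked before migration by projecting each path onto its destination and measuring the result. This is a sketch: the 260-character default is the classic Windows path limit and is only an assumption here, so the actual limit of the target system should be confirmed; the roots and paths are hypothetical.

```python
def over_limit(paths, source_root, target_root, max_length=260):
    """Return (new_length, original_path) pairs, longest first, for files
    whose path would exceed max_length once moved under target_root."""
    flagged = []
    for p in paths:
        if not p.startswith(source_root):
            continue  # outside the share being migrated
        new_path = target_root + p[len(source_root):]
        if len(new_path) > max_length:
            flagged.append((len(new_path), p))
    return sorted(flagged, reverse=True)


# Hypothetical example: one deeply nested file and one short one.
long_path = "S:/Legal/" + "a" * 250 + ".docx"
listing = [long_path, "S:/Legal/short.docx"]
too_long = over_limit(listing, "S:/Legal", "T:/Archive/Legal Department")
```

Running this against the full inventory early gives time to shorten folder names before the migration tool rejects them.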
Autoclassification and Shared Drive Remediation Software
How FACR Software Works
[Diagram: FACR processing stages]
• Ingest: directory start; all files
• Group: like content; business rules; confidence %
• Verify: workflow; human review
• Migrate: ECM; SharePoint; new file share; delete
Capabilities listed alongside the stages:
• Preprocess: OCR; photos
• Training document sets; semantic relations; visual/graphical match; regular expression; search content for PII, etc.; duplicates
• Quarantine; change rights; security classification; legal hold
• Queue for business SME; queue for Legal; direct to Migrate; link to databases; mass assign classification; duplicate handling
Presenter
Presentation Notes
Training Sets are used in ediscovery to accomplish predictive coding. PII: SS#, bank accounts, credit cards, etc.
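A regular expression search for PII of the kind listed here might look like the sketch below. It is not how any particular FACR product works: the patterns are simplified assumptions, and the card check uses the standard Luhn checksum to cut down false positives.

```python
import re

# Simplified patterns: a dashed US Social Security number, and a run of
# 13 to 19 digits optionally separated by spaces or hyphens.
SSN = re.compile(r"(?<!\d)\d{3}-\d{2}-\d{4}(?!\d)")
CARD = re.compile(r"(?<!\d)(?:\d[ -]?){12,18}\d(?!\d)")


def luhn_ok(digits: str) -> bool:
    """Luhn checksum: double every second digit from the right."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def find_pii(text: str):
    """Return (kind, matched_text) pairs for SSN-like and card-like values."""
    hits = [("ssn", m.group(0)) for m in SSN.finditer(text)]
    for m in CARD.finditer(text):
        digits = re.sub(r"\D", "", m.group(0))
        if luhn_ok(digits):
            hits.append(("card", m.group(0)))
    return hits


# 4111 1111 1111 1111 is a widely used test card number.
hits = find_pii("SSN 123-45-6789 on file; card 4111 1111 1111 1111.")
```

Pattern matches are leads for quarantine and review; part numbers and phone extensions can look like PII, so a human check is still needed.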
FACR Software
OUTPUT EXAMPLES
Exact Match duplication analysis
Hash value + Metadata
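Exact-match duplicate analysis by hash value can be sketched with the standard library. This is an illustration, not the vendor's method: SHA-256 is an assumed choice of hash, file size is the only metadata used, and the demo builds throwaway files in a temporary directory.

```python
import hashlib
import tempfile
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file's content, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def exact_duplicates(root):
    """Return lists of paths under root whose content is identical."""
    by_size = defaultdict(list)
    for p in Path(root).rglob("*"):
        if p.is_file():
            by_size[p.stat().st_size].append(p)
    by_hash = defaultdict(list)
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue  # a unique size cannot have an exact duplicate
        for p in paths:
            by_hash[(size, file_digest(p))].append(p)
    return [sorted(ps) for ps in by_hash.values() if len(ps) > 1]


# Demo: two identical files and one same-size file with different content.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "sub").mkdir()
    (Path(tmp) / "a.txt").write_bytes(b"hello")
    (Path(tmp) / "sub" / "b.txt").write_bytes(b"hello")
    (Path(tmp) / "c.txt").write_bytes(b"world")
    demo_groups = [[p.name for p in group] for group in exact_duplicates(tmp)]
```

Grouping by size first means only files that could possibly match are read and hashed, which matters on large shares.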
Presenter
Presentation Notes
Rule setting – in this case the pattern for employee number. Many of these can be combined together.
Presenter
Presentation Notes
Information Management Policy security risk analysis. This kind of report – done as a pilot or proof of concept – can scare the socks off management.
Presenter
Presentation Notes
Information Management policy output results for ROT
Migration Actions
Presenter
Presentation Notes
The calculated field produces the structure wanted in the target repository. “Calculated field values” is a metadata value; in this example, the data being migrated will be restructured in the destination by that calculated field. Here the calculated field is further defined as “Document Type”, so the end result is data organized by document type within the new repository.
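The effect of restructuring by a calculated field can be pictured as grouping records by one metadata value. The sketch below is an assumption about the general idea, not the software's actual configuration: the record layout, the `Unclassified` fallback and the target root are hypothetical.

```python
from collections import defaultdict


def plan_destinations(records, target_root, calculated_field="Document Type"):
    """Map each destination folder to the file names it will receive,
    using one metadata value per record to choose the folder."""
    plan = defaultdict(list)
    for record in records:
        value = record.get(calculated_field) or "Unclassified"
        plan[f"{target_root}/{value}"].append(record["name"])
    return dict(plan)


# Hypothetical records with a "Document Type" value already assigned.
records = [
    {"name": "Agreement.pdf", "Document Type": "Contract"},
    {"name": "inv_001.pdf", "Document Type": "Invoice"},
    {"name": "inv_002.pdf", "Document Type": "Invoice"},
    {"name": "notes.docx"},
]
plan = plan_destinations(records, "/sites/records")
```

Records with no value land in a catch-all folder so they can be classified before or after the move.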