BIG DATA NEEDS BIG CLASSIFICATION

Andy Carnahan 13 November 2014

Big Data needs Big Classification

Andy Carnahan, Customer and Information Services Manager

Wingecarribee Shire Council

Context

• The “records” battle is lost
• The federated states of search
• A tale of two indexes
• Why big classification will happen

Image: J Brew, Cathedral Termite Mound, CC, Wikimedia Commons

The extinct ECM model

Image: Louisa Anne Meredith, 'Tasmanian Tiger', 1880 (Tasmanian Library, SLT)

CONTEXT
• Manual Record Keeping
• Central registry
• Human Classification

CONTENT
• Physical Content
• Manageable Growth
• Single instance

The current ECM model

CONTEXT
• Manual Record Keeping
• Single instance ECM
• Human Classification

CONTENT
• Technology created
• Huge volumes
• Indexing speed/access

Image: Popular Science Monthly/Volume 8/February 1876

The new ECM model

CONTEXT
• Synthetic RK
• Semantic Indexing
• Machine Classification

CONTENT
• Technology created
• Federated Search
• Indexing speed

Image: Aviceda, Wikimedia Commons: Peregrine Falcon, Qld, Apr 2007

Big Data

Velocity

Volume

Variety

Classification must be performed at the speed of information creation, at a quality equal to a human's

Data does not (necessarily) reside in the RKMS

So long as the information is digitally available, it will be accessible in the future

Big Classification

What is big classification?

Human-competitive automatic topic indexing

Alyona Medelyan, PhD thesis in Computer Science, University of Waikato, July 2009

How’s your math?

• Our inputs:
– Central registry: 250 mail items/day, 100 classified/workflowed (40%) by 2 records staff
– 200 white collar staff: 5,000 emails/day
• The question: How many records staff do I need to manage the 5,000 daily emails?
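
One rough way to work the question above, sketched in Python. The slide gives the inputs but not the answer, so the two scaling assumptions here are illustrative only: that every email needs handling, and that an email takes the same effort as a mail item. `staff_needed` is a hypothetical helper, not something from the talk.

```python
import math

# Inputs taken from the slide.
RECORDS_STAFF = 2
MAIL_ITEMS_PER_DAY = 250
CLASSIFIED_PER_DAY = 100
EMAILS_PER_DAY = 5000


def staff_needed(items_per_day: float, items_per_person_per_day: float) -> int:
    """Whole number of staff needed to keep up with a daily volume."""
    return math.ceil(items_per_day / items_per_person_per_day)


# Assumption A: staff can only sustain the rate at which they fully
# classify/workflow items today (100 items across 2 people).
classified_per_person = CLASSIFIED_PER_DAY / RECORDS_STAFF
staff_at_classification_rate = staff_needed(EMAILS_PER_DAY, classified_per_person)

# Assumption B: staff can sustain the rate at which they handle all
# incoming mail today (250 items across 2 people).
handled_per_person = MAIL_ITEMS_PER_DAY / RECORDS_STAFF
staff_at_handling_rate = staff_needed(EMAILS_PER_DAY, handled_per_person)

print(staff_at_classification_rate)  # 100
print(staff_at_handling_rate)        # 40
```

Under either assumption the answer is tens of records staff, against the 2 in the registry today.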

Current Human situation

• General staff won’t do it
– Don’t understand metadata/classification schemas
– Not their job/too busy
– Maintain their own data sets for own productivity
• Records staff can’t do it
– Sheer quantity of information
– Limited access to information domain
– Not their job/too busy

Fields, haystacks and needles

Image: Claude Monet, Wheatstacks (End of Summer), 1890-91, The Art Institute of Chicago

The (old) centre of the Universe

Records

THE Central Registry

A small part of a bigger Universe

Enterprise Information Domain (Content)

• Application Data
• Databases
• Documents
• Intranet
• Extranet
• Corporate Intranet
• Social Media
• Team sites (SharePoint)
• Multimedia
• Email
• ECMS

The inadvertent ECMS

The deliberate ECMS

www.youragency.gov.org.com.net.au/about

Our mission is to organise the agency’s information and make it appropriately accessible and useful. (After Google’s mission statement, with “world’s” replaced by “agency’s” and “universally” by “appropriately”.)

ECMS

What is an ECMS?

The two components

CONTEXT: Metadata Database

CONTENT: Document Store

Context is King

CONTEXT: Metadata Database

CONTENT: Document Store

Extending to the domain

CONTEXT: Metadata Database

CONTENT: Document Store

Crawling the domain

CONTEXT: Metadata Database

CONTENT: Document Store

Exploring Context

Concordance -> content

Concordance Index (location)

Semantic -> context

Concordance Index (location, Content)

Semantic Index (Classification, Taxonomy, Thesaurus, Context)

The big imbalance

Concordance Index (location, Content): Machine mediated

Semantic Index (Classification, Taxonomy, Thesaurus, Context): Human mediated

Index (disambiguation)

The CONTENT index
• Concordance (location)
• Left brain
• Literal/Algorithmic
• Single path
• Speed

The CONTEXT index
• Semantic (meaning)
• Right brain
• Concepts/thesaurus
• Back of Book
• Precision

Concordance Indexing

Concordance Index (location, Content)

Concordance History

Concordance Index (location, Content)

• First recorded instance 1230 AD
• Hugh of St Cher
• 500 monks created an index to the location of every word in the Versio Vulgata (common Bible)
• Same methodology used by Google to index the web: URL instead of book/chapter/verse

Concordance Features

Concordance Index (location, Content)

• Needle-in-haystack searching
• Index requires no human assistance to build
• Index is now built at machine speed
• Access to results is at machine speed
• Mature, widely adopted
• Can find the needle, but we still need the haystacks
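
To make the concordance idea concrete, here is a minimal Python sketch of an index from each word to every location where it occurs. The function name `build_concordance`, the sample texts and the location keys are all invented for illustration; a location could equally be a book/chapter/verse reference or a URL, and real search engines do far more than this.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple


def build_concordance(documents: Dict[str, str]) -> Dict[str, List[Tuple[str, int]]]:
    """Map each word to a list of (location, word position) pairs.

    No human judgement is involved: the index is derived purely from
    the literal words in the content.
    """
    index: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
    for location, text in documents.items():
        words = re.findall(r"[a-z']+", text.lower())
        for position, word in enumerate(words):
            index[word].append((location, position))
    return dict(index)


# Hypothetical content: one "verse" style location and one URL style location.
documents = {
    "book1/chapter1/verse1": "The light was good",
    "https://example.org/a": "Good records need context",
}

concordance = build_concordance(documents)
print(concordance["good"])
# [('book1/chapter1/verse1', 3), ('https://example.org/a', 0)]
```

The lookup finds the needle (the literal word) quickly, but it says nothing about which haystack, or topic, each document belongs to.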

Semantic Indexing

Semantic Index (Classification, Taxonomy, Thesaurus, Context)

Semantic Indexing

• Injects meaning into search
• Search on concepts
• Enables multiple taxonomies in virtual views (pivot taxonomy)
• Disambiguates
• Emerging in research and commercial software

Semantic Index (Classification, Taxonomy, Thesaurus, Context)
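
A minimal sketch of "search on concepts", assuming a tiny hand-built thesaurus that maps terms to preferred concepts. The thesaurus entries, document IDs and the `build_semantic_index` helper are hypothetical, and the sketch does not attempt disambiguation or multiple taxonomies.

```python
from typing import Dict, Set

# Hypothetical thesaurus: term -> preferred concept (classification term).
THESAURUS: Dict[str, str] = {
    "rates": "Property Rates",
    "levy": "Property Rates",
    "pothole": "Road Maintenance",
    "resurfacing": "Road Maintenance",
}


def build_semantic_index(
    documents: Dict[str, str], thesaurus: Dict[str, str]
) -> Dict[str, Set[str]]:
    """Map each concept to the set of documents that mention any of its terms."""
    index: Dict[str, Set[str]] = {}
    for doc_id, text in documents.items():
        for raw_word in text.lower().split():
            word = raw_word.strip(".,;:!?")
            concept = thesaurus.get(word)
            if concept is not None:
                index.setdefault(concept, set()).add(doc_id)
    return index


documents = {
    "email-1": "Query about the annual levy.",
    "email-2": "Pothole outside the library",
    "letter-7": "Rates notice received",
}

semantic_index = build_semantic_index(documents, THESAURUS)
print(sorted(semantic_index["Property Rates"]))  # ['email-1', 'letter-7']
```

A search on the concept "Property Rates" returns email-1 even though the word "rates" never appears in it, which a purely literal concordance lookup would miss.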

Manual Semantic Indexing

Semantic Index (Classification, Taxonomy, Thesaurus, Context)

• Record Keepers continuously classifying incoming mail
• Confident Classification Threshold?
– Y: Add to indexes
– N: Get expert help; RK remembers how to classify that exception

Automatic Semantic Indexing

Semantic Index (Classification, Taxonomy, Thesaurus, Context)

• “robot” continuously crawls content domain
• Confident Classification Threshold?
– Y: Add to indexes
– N: Get human help; robot remembers how to classify that exception
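
The loop in this flowchart can be sketched in Python as below. This is an illustration only, not any vendor's implementation: the `ContextRobot` class, the keyword-vote "classifier", its confidence score and the exact-text memory are all stand-ins chosen to keep the example short. A real system would use a trained model and would retrain on the exceptions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class ContextRobot:
    threshold: float                      # minimum confidence to classify unaided
    ask_human: Callable[[str], str]       # records officer supplies the class
    keyword_rules: Dict[str, str] = field(default_factory=dict)  # word -> class
    remembered: Dict[str, str] = field(default_factory=dict)     # text -> class
    semantic_index: Dict[str, List[str]] = field(default_factory=dict)

    def classify(self, text: str) -> Tuple[Optional[str], float]:
        """Return (best class, confidence), using remembered exceptions first."""
        if text in self.remembered:
            return self.remembered[text], 1.0
        votes: Dict[str, int] = {}
        for word in text.lower().split():
            label = self.keyword_rules.get(word)
            if label is not None:
                votes[label] = votes.get(label, 0) + 1
        if not votes:
            return None, 0.0
        best = max(votes, key=lambda label: votes[label])
        return best, votes[best] / sum(votes.values())

    def process(self, doc_id: str, text: str) -> str:
        label, confidence = self.classify(text)
        if label is None or confidence < self.threshold:
            # N branch: get human help and remember the exception.
            label = self.ask_human(text)
            self.remembered[text] = label
        # Y branch (or after human help): add to the index.
        self.semantic_index.setdefault(label, []).append(doc_id)
        return label


def records_officer(text: str) -> str:
    # Hypothetical human decision, hard-coded for the example.
    return "Rates"


robot = ContextRobot(
    threshold=0.8,
    ask_human=records_officer,
    keyword_rules={"rates": "Rates", "pothole": "Roads", "road": "Roads"},
)
robot.process("d1", "pothole on my road")                # confident: Roads
robot.process("d2", "rates notice mentions road works")  # split vote: asks human
robot.process("d3", "rates notice mentions road works")  # remembered: no human
print(robot.semantic_index)
```

The design point is the threshold: the higher it is set, the more items fall to the human, and the more exceptions the robot accumulates to learn from.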

Semantic Indexing

Semantic Index (Classification, Taxonomy, Thesaurus, Context)

Source: Denise Bedford, AIIM 2013

The big balance

Concordance Index (location, Content): Machine mediated

Semantic Index (Classification, Taxonomy, Thesaurus, Context): Machine mediated

Big classification will happen

Image: Owen Carnahan, used with permission

2008 Predictions

2014 three suggestions

• Enterprise content management can only occur if context management (classification) is performed at machine speed.

• Machine classification must be performed at a quality similar to a records officer.

• The entire enterprise information domain is the responsibility of Records and Information Management Professionals.

The e-context future is coming…

• IBM’s Watson
• Wolfram Alpha (Siri)
• Cyc/Wikipedia
• Smartlogic
• TopQuadrant

The e-context future is coming…

• IBM’s Watson
• Wolfram Alpha (Siri)
• StoredIQ
• Recommind
• Pingar
• Cyc/Wikipedia
• Smartlogic
• TopQuadrant

Record Context keepers

• Most important people in the organisation
• “Training” the synthetic context agent
• Refining and enhancing the context engine
• ICT is the maintainer of the content engine and content store
• Records and Information Management role is much more rewarding

Record keeping now

• Record Keepers continuously classifying incoming mail
• Confident Classification Threshold?
– Y: Add to indexes (ECMS “small” data)
– N: Get expert help; RK remembers how to classify that exception

Diagram label: Manual Process

Automatic Semantic Indexing

• “robot” continuously crawls content domain
• Confident Classification Threshold?
– Y: Add to indexes
– N: Get expert help; robot remembers how to classify that exception

Diagram labels: Automated Process, Manual Process

Big classification

1. Federated search based on concordance indexing

2. Semantic search/classification based on context engine (machine speed/human quality)

Thank you!

Keep in touch
– LinkedIn: Andy Carnahan
– [email protected]
– [email protected]
– RIMPA list (www.rimpa.com.au)
– LG IT lists

Sometimes small data is good