BIG DATA NEEDS BIG CLASSIFICATION
Andy Carnahan 13 November 2014
Big Data needs Big Classification
Andy Carnahan, Customer and Information Services Manager
Wingecarribee Shire Council
Context
• The “records” battle is lost
• The federated states of search
• A tale of two indexes
• Why big classification will happen
J Brew – Cathedral Termite Mound, CC Wikimedia Commons
The extinct ECM model
Louisa Anne Meredith, 'Tasmanian Tiger', 1880 (Tasmanian Library, SLT)
CONTEXT
• Manual Record Keeping
• Central registry
• Human Classification
CONTENT
• Physical Content
• Manageable Growth
• Single instance
The current ECM model
CONTEXT
• Manual Record Keeping
• Single instance ECM
• Human Classification
CONTENT
• Technology created
• Huge volumes
• Indexing speed/access
Popular Science Monthly/Volume 8/February 1876
The new ECM model
CONTEXT
• Synthetic RK
• Semantic Indexing
• Machine Classification
CONTENT
• Technology created
• Federated Search
• Indexing speed
Aviceda, Wikimedia Commons: Peregrine Falcon, Qld, Apr 2007
Big Data
• Velocity
• Volume
• Variety
Big Classification
• Classification must be performed at the speed of information creation, at a quality equal to a human’s
• Data does not (necessarily) reside in the RKMS
• So long as the information is digitally available, it can and will be accessible in the future
What is big classification?
Human-competitive automatic topic indexing
Alyona Medelyan, PhD thesis in Computer Science, University of Waikato, July 2009
How’s your math?
• Our inputs:
  – Central registry: 250 mail items/day, 100 classified/workflowed (40%) by 2 records staff
  – 200 white collar staff: 5,000 emails/day
• The Question: How many records staff do I need to manage the 5,000 daily emails?
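One way to work the question through, as a rough sketch. The figures come from the slide; the assumptions are added here: that an email takes about as much effort to classify as a mail item, and (for the second figure) that the same 40% share of email is worth classifying. The name `staff_needed` is illustrative only.

```python
import math

# Figures taken from the slide
MAIL_ITEMS_PER_DAY = 250
CLASSIFIED_PER_DAY = 100   # 40% of incoming mail is classified/workflowed
RECORDS_STAFF = 2
EMAILS_PER_DAY = 5_000

# Throughput of one records officer: 100 items / 2 staff = 50 items per day
ITEMS_PER_STAFF_PER_DAY = CLASSIFIED_PER_DAY // RECORDS_STAFF


def staff_needed(items_per_day: int) -> int:
    """Records staff needed to classify items_per_day, rounded up.

    Assumption: an email takes the same effort to classify as a mail item.
    """
    return math.ceil(items_per_day / ITEMS_PER_STAFF_PER_DAY)


# Case 1: every email is classified
staff_if_all_classified = staff_needed(EMAILS_PER_DAY)

# Case 2 (assumption): only the same 40% share of email needs classifying
emails_to_classify = EMAILS_PER_DAY * CLASSIFIED_PER_DAY // MAIL_ITEMS_PER_DAY
staff_if_same_share = staff_needed(emails_to_classify)

print(staff_if_all_classified, staff_if_same_share)  # 100 40
```

Under either assumption the answer is far beyond the 2 records staff available.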
Current Human situation
• General staff won’t do it
  – Don’t understand metadata/Classification schemas
  – Not their job/too busy
  – Maintain their own data sets for own productivity
• Records staff can’t do it
  – Sheer quantity of information
  – Limited access to information domain
  – Not their job/too busy
Fields, haystacks and needles
Claude Monet, Wheatstacks (End of Summer), 1890-91, The Art Institute of Chicago
The (old) centre of the Universe
Records
THE Central Registry
A small part of a bigger Universe
Enterprise Information Domain (Content)
Application Data
Databases
Documents
Intranet
Extranet
Corporate Intranet
Social Media
Teamsites (Sharepoint)
Multimedia
ECMS
The deliberate ECMS
www.youragency.gov.org.com.net.au/about
Our mission is to organise the agency’s (not the world’s) information and make it appropriately (not universally) accessible and useful.
The two components
CONTEXT: Metadata Database
CONTENT: Document Store
Contenxt is King
CONTEXT: Metadata Database
CONTENT: Document Store
Extending to the domain
CONTEXT: Metadata Database
CONTENT: Document Store
Crawling the domain
CONTEXT: Metadata Database
CONTENT: Document Store
Concordance -> content
Concordance Index (location)
Semantic -> context
Concordance Index (location, Content)
Semantic Index (Classification, Taxonomy, Thesaurus, Context)
The big imbalance
Concordance Index (location, Content): Machine mediated
Semantic Index (Classification, Taxonomy, Thesaurus, Context): Human mediated
Index (disambiguation)
The CONTENT index
• Concordance (location)
• Left brain
• Literal/Algorithmic
• Single path
• Speed
The CONTEXT index
• Semantic (meaning)
• Right brain
• Concepts/thesaurus
• Back of Book
• Precision
Concordance Indexing
Concordance Index (location, Content)
Concordance History
Concordance Index (location, Content)
• First recorded instance: 1230 AD
• Hugh of St Cher: 500 monks created an index to the location of every word in the Versio Vulgata (common Bible)
• Same methodology used by Google to index the web, with URL instead of book/chapter/verse
Concordance Features
Concordance Index (location, Content)
• Needle-in-haystack searching
• Index requires no human assistance to build
• Index is now built at machine speed
• Access to results is at machine speed
• Mature, widely adopted
• Can find the needle, but we still need the haystacks
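A concordance index of the kind described here can be sketched in a few lines: every word is mapped to the locations where it occurs, with no human input. This is a minimal illustration, not any particular product's implementation; the document paths and the names `build_concordance` and `find` are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

Location = Tuple[str, int]  # (document path or URL, word position)


def build_concordance(documents: Dict[str, str]) -> Dict[str, List[Location]]:
    """Map every word to each place it occurs, purely mechanically."""
    index: Dict[str, List[Location]] = defaultdict(list)
    for path, text in documents.items():
        words = re.findall(r"[a-z0-9']+", text.lower())
        for position, word in enumerate(words):
            index[word].append((path, position))
    return dict(index)


def find(index: Dict[str, List[Location]], word: str) -> List[Location]:
    """Needle-in-haystack lookup: literal word in, locations out."""
    return index.get(word.lower(), [])


# Hypothetical content domain
documents = {
    "intranet/policy.txt": "Rates notices are issued quarterly",
    "ecms/letter-001.txt": "Query about rates and water rates",
}
concordance = build_concordance(documents)
print(find(concordance, "rates"))
```

The lookup is literal: it finds the word "rates" wherever it appears, but says nothing about what each document is about.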
Semantic Indexing
Semantic Index (Classification, Taxonomy, Thesaurus, Context)
Semantic Indexing
• Injects meaning into search
• Search on concepts
• Enables multiple taxonomies in virtual views (pivot taxonomy)
• Disambiguates
• Emerging in research and commercial software
Semantic Index (Classification, Taxonomy, Thesaurus, Context)
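Searching on concepts rather than literal words can be illustrated with a toy thesaurus that maps terms to a preferred classification term. This is only a sketch under assumptions: the thesaurus entries, file names, and function names are invented for illustration, and real semantic indexing is far richer than a word lookup.

```python
import re
from collections import defaultdict
from typing import Dict, Set

# Hypothetical thesaurus: term -> preferred classification term
THESAURUS = {
    "dog": "Animal control",
    "canine": "Animal control",
    "barking": "Animal control",
    "pothole": "Road maintenance",
    "resurfacing": "Road maintenance",
}


def concepts_for(text: str) -> Set[str]:
    """Return the concepts that the words of a text point to."""
    words = re.findall(r"[a-z]+", text.lower())
    return {THESAURUS[word] for word in words if word in THESAURUS}


def build_semantic_index(documents: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each concept to the documents classified under it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for path, text in documents.items():
        for concept in concepts_for(text):
            index[concept].add(path)
    return index


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Find documents that share a concept with the query."""
    results: Set[str] = set()
    for concept in concepts_for(query):
        results |= index.get(concept, set())
    return results


documents = {
    "a.txt": "Complaint about a barking dog",
    "b.txt": "Canine registration renewal",
    "c.txt": "Pothole on Main Street",
}
semantic_index = build_semantic_index(documents)
print(sorted(search(semantic_index, "dog")))  # ['a.txt', 'b.txt']
```

A search for "dog" also returns `b.txt`, which never uses that word, because both documents fall under the same concept.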
Manual Semantic Indexing
Semantic Index (Classification, Taxonomy, Thesaurus, Context)
Record Keepers continuously classifying incoming mail
Confident Classification Threshold?
Y: Add to indexes
N: Get expert help; RK remembers how to classify that exception
Automatic Semantic Indexing
Semantic Index (Classification, Taxonomy, Thesaurus, Context)
“robot” continuously crawls content domain
Confident Classification Threshold?
Y: Add to indexes
N: Get human help; robot remembers how to classify that exception
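The loop in this flowchart can be sketched as follows. The slides do not say how the robot scores confidence or how it remembers exceptions, so everything specific here is an assumption: the keyword-overlap score is a stand-in for a real classifier, the 0.5 threshold is arbitrary, and "remembering" is modelled by adding the exception's words to the vocabulary of the class the human chose. The sketch also assumes that an item classified with human help is then added to the indexes.

```python
from typing import Callable, Dict, List, Set, Tuple

CONFIDENCE_THRESHOLD = 0.5  # hypothetical value


class Robot:
    def __init__(self, keywords: Dict[str, Set[str]],
                 ask_human: Callable[[str], str]) -> None:
        self.keywords = {label: set(words) for label, words in keywords.items()}
        self.ask_human = ask_human
        self.index: Dict[str, List[str]] = {}

    def classify(self, text: str) -> Tuple[str, float]:
        """Stand-in classifier: share of the item's words known to a class."""
        words = set(text.lower().split())
        best_label, best_score = "", 0.0
        for label, vocabulary in self.keywords.items():
            score = len(words & vocabulary) / len(words) if words else 0.0
            if score > best_score:
                best_label, best_score = label, score
        return best_label, best_score

    def process(self, text: str) -> str:
        label, confidence = self.classify(text)
        if confidence < CONFIDENCE_THRESHOLD:
            # N branch: get human help, then remember the exception
            label = self.ask_human(text)
            self.keywords.setdefault(label, set()).update(text.lower().split())
        # Y branch (and, by assumption, after human help): add to indexes
        self.index.setdefault(label, []).append(text)
        return label


asked: List[str] = []


def records_officer(text: str) -> str:
    """Hypothetical human expert; records which items needed help."""
    asked.append(text)
    return "Animal control"


robot = Robot({"Rates": {"rates", "notice"}, "Roads": {"pothole", "road"}},
              records_officer)
first = robot.process("rates notice")            # confident, no human needed
second = robot.process("barking dog complaint")  # unknown, human is asked
third = robot.process("dog barking")             # now recognised by the robot
print(first, second, third, asked)
```

Only the second item reaches the human; the third is handled automatically because the robot learned from the exception.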
Semantic Indexing
Semantic Index (Classification, Taxonomy, Thesaurus, Context)
Denise Bedford, AIIM 2013
The big balance
Concordance Index (location, Content): Machine mediated
Semantic Index (Classification, Taxonomy, Thesaurus, Context): Machine mediated
Big classification will happen
Owen Carnahan - used with permission
2014 three suggestions
• Enterprise content management can only occur if context management (classification) is performed at machine speed.
• Machine classification must be performed at a quality similar to a records officer.
• The entire enterprise information domain is the responsibility of Records and Information Management Professionals.
The e-context future is coming…
• IBM’s Watson
• Wolfram Alpha (Siri)
• Cyc/Wikipedia
• Smartlogic
• TopQuadrant
The e-context future is coming…
• IBM’s Watson
• Wolfram Alpha (Siri)
• StoredIQ
• Recommind
• Pingar
• Cyc/Wikipedia
• Smartlogic
• TopQuadrant
Record Context keepers
• Most important people in the organisation
• “Training” the synthetic context agent
• Refining and enhancing the context engine
• ICT is the maintainer of the content engine and content store
• Records and Information Management role is much more rewarding
Record keeping now
Record Keepers continuously classifying incoming
Confident Classification Threshold?
Y: Add to indexes
N: Get expert help; RK remembers how to classify that exception
ECMS “small” data
Manual Process
Automatic Semantic Indexing
“robot” continuously crawls content domain
Confident Classification Threshold?
Y: Add to indexes
N: Get expert help; robot remembers how to classify that exception
Automated Process
Manual Process
Big classification
1. Federated search based on concordance indexing
2. Semantic search/classification based on context engine (machine speed/human quality)
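The first point, federated search over concordance indexes, can be sketched as one query fanned out to several repositories, each with its own word-to-location index. This is a rough illustration only; the repository names, documents, and the functions `make_source` and `federated_search` are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

SearchFn = Callable[[str], List[str]]


def make_source(documents: Dict[str, str]) -> SearchFn:
    """Build a small concordance index for one repository and return its search."""
    index: Dict[str, List[str]] = {}
    for location, text in documents.items():
        for word in set(text.lower().split()):
            index.setdefault(word, []).append(location)

    def search(term: str) -> List[str]:
        return index.get(term.lower(), [])

    return search


def federated_search(sources: Dict[str, SearchFn], term: str) -> List[Tuple[str, str]]:
    """Send one query to every repository and merge the hits."""
    hits: List[Tuple[str, str]] = []
    for name, search in sources.items():
        hits.extend((name, location) for location in search(term))
    return hits


# Hypothetical repositories in the enterprise information domain
sources = {
    "ECMS": make_source({"letter-001": "Query about water rates"}),
    "Intranet": make_source({
        "policy": "Rates notices are issued quarterly",
        "news": "Staff picnic on Friday",
    }),
}
results = federated_search(sources, "rates")
print(results)  # [('ECMS', 'letter-001'), ('Intranet', 'policy')]
```

The content stays where it was created; only the query travels, which is why the data need not reside in the RKMS.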
Thank you!
Keep in touch
– LinkedIn: Andy Carnahan
– [email protected]
– [email protected]
– RIMPA list (www.rimpa.com.au)
– LG IT lists
  • [email protected]
  • [email protected]
  • [email protected]
Sometimes small data is good