The Next Generation SharePoint: Powered by Text Analytics
PLATINUM SPONSOR
GOLD SPONSORS
THE NEXT-GENERATION SHAREPOINT:
POWERED BY TEXT ANALYTICS
Alyona Medelyan (Pingar), @zelandiya
AGENDA
• Information tasks
• Text analytics
• APIs
• Demos
• Conclusions
Information tasks
What do they cost us?
How does SharePoint help?
Avg. hours per week, by information task:
• Emails: 14.5
• Creating docs: 13.3
• Analyzing info: 9.6
• Searching: 9.5
• Reviewing: 8.8
• Gathering info: 8.3
• Organizing docs: 6.8
• Creating presentations: 6.7
• Creating images: 5.6
• Data entry: 5.6
• Doc approval: 4.3
• Publishing: 4.2
• Translating: 1
= $37K / year / person
Source: IDC, Hidden Cost of Information (2005)
SHAREPOINT SAVES TIME
• Interact with SP from Outlook
• Create docs collaboratively
• Customize search configuration
• Define Managed Metadata
• Configure forms
• Design Workflow
• Use sites, sets & libraries
Text Analytics
What is it and how does it work?
What tasks does it solve?
WHAT IS TEXT ANALYTICS?
Text Mining, Natural Language Processing
unstructured data
Opinion Mining, Business Intelligence, Document Organization, Data Extraction, Search
Machine Learning, Text Processing, Statistics, Linguistics
TEXT ANALYTICS SAVES MORE TIME
• Compose search reports
• Extract entities
• Redact
• Generate metadata
• Fill databases
• Cluster search results
• Summarize
• Mine opinions & sentiment
• Profanity check
… automatically
Text Analytics Software
What companies offer text analytics?
What are open source tools like?
TEXT ANALYTICS: GLOBAL PERSPECTIVE
User adoption grew by 25% in 2010, creating an $835 million market, because:
• Unstructured data grows (e.g. social) → Text analytics!
• Text analytics is central to effective information access
• Many successes in NLP: IBM Watson, Wolfram Alpha
Full report by Seth Grimes: http://altaplana.com/TA2011
APPLICATIONS OF TEXT ANALYTICS
• Law enforcement: 6%
• Military intelligence: 7%
• Insurance & fraud: 8%
• Content management: 8%
• Other: 9%
• Finance: 10%
• Online commerce: 11%
• Product design: 15%
• Life sciences: 15%
• E-discovery: 15%
• Customer service: 26%
• Competitive intelligence: 33%
• Research: 36%
• Brand management: 39%
• Customer experience management: 39%
• Search & info access: 39%
Source: http://altaplana.com/TA2011
SEARCH & INFO ACCESS: METADATA EXTRACTION
Document → Metadata
• Easy to extract: File type, name & location, creation & modification date, authors
• Difficult to extract: Keywords, people & companies mentioned, suppliers & addresses mentioned
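The "easy to extract" half of the slide above can be sketched with the Python standard library alone. This is a minimal illustration, not how SharePoint or any vendor does it; the function name `easy_metadata` and the returned keys are assumptions made for this example. Creation date and authors are left out on purpose: creation time is platform-dependent in file systems, and authors usually live inside the document format itself.

```python
from datetime import datetime, timezone
from pathlib import Path


def easy_metadata(path):
    """Collect metadata the file system already knows (hypothetical helper)."""
    p = Path(path)
    info = p.stat()
    return {
        "name": p.name,
        # File type guessed from the extension only.
        "type": p.suffix.lstrip(".").lower() or "unknown",
        "location": str(p.resolve().parent),
        "modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc).isoformat(),
        "size_bytes": info.st_size,
    }
```

Nothing here reads the document text, which is why keywords, people and companies need text analytics.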
SEARCH & INFO ACCESS: KEYWORD EXTRACTION
Document → Candidates → Properties → Scoring → Keywords
• Properties: Frequency, Position, Corpus stats, Relatedness
• Scoring: Heuristic scoring, Machine learning
Example document:
Hi All, As of today, MetaStock has several new functions. The most important new feature is the ability to display forward heat rate charts. Also, notice that the interface looks different -- this reflects and accommodates the new features. If you have any questions regarding this new version of MetaStock, please contact Bella Santuri.
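The candidates → properties → scoring pipeline on these slides can be sketched as follows. This is a toy version under stated assumptions: the stopword list, the n-gram candidate rule and the scoring formula (frequency plus an "earliness" bonus) are all invented for illustration, and corpus statistics and relatedness are left out. It is not the algorithm of Pingar or any other product named in this deck.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real systems use larger ones).
STOPWORDS = {
    "a", "all", "also", "and", "any", "as", "has", "have", "hi", "if", "is",
    "of", "please", "that", "the", "this", "to", "you",
}


def candidates(tokens, max_len=3):
    """Yield (phrase, start index) for n-grams that do not start or end with a stopword."""
    for start in range(len(tokens)):
        for n in range(1, max_len + 1):
            gram = tokens[start:start + n]
            if len(gram) < n:
                break  # ran past the end of the document
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            yield " ".join(gram), start


def extract_keywords(text, top_n=5):
    """Return the top_n (phrase, score) pairs under a simple heuristic score."""
    tokens = re.findall(r"[a-z]+", text.lower())
    frequency = Counter()
    first_position = {}
    for phrase, start in candidates(tokens):
        frequency[phrase] += 1
        first_position.setdefault(phrase, start)

    def score(phrase):
        # Properties used: frequency and relative position of first occurrence.
        earliness = 1.0 - first_position[phrase] / len(tokens)
        return frequency[phrase] + earliness

    ranked = sorted(frequency, key=score, reverse=True)
    return [(phrase, round(score(phrase), 3)) for phrase in ranked[:top_n]]
```

For simplicity the candidates may span sentence boundaries. Replacing `score` with a model trained on annotated documents would give the machine-learning variant of the scoring step.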
SEARCH & INFO ACCESS: NAMES EXTRACTION
Document → Names
If you have any questions regarding this new version of MetaStock, please contact Bella Santuri.
Training data (annotations) → Examples → Properties → Learning
• Properties: NLP, Heuristics, Text mining
• Learning: Machine Learning
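To see why names extraction is usually learned from annotated examples rather than hand-coded, here is a purely heuristic sketch. The rule (two or more consecutive capitalized words) and the names `NAME_PATTERN` and `extract_names` are assumptions for illustration only.

```python
import re

# Heuristic: two or more consecutive capitalized words, e.g. "Bella Santuri".
NAME_PATTERN = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)+\b")


def extract_names(text):
    """Return capitalized word sequences that look like person names."""
    return NAME_PATTERN.findall(text)
```

On the sentence from the slide this finds "Bella Santuri" and skips "MetaStock". On the full email, though, it would also flag the greeting "Hi All", which is the kind of error that properties learned from training data are meant to avoid.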
<SEARCH + TEXT ANALYTICS> COMPANIES
Pingar, BasisTech, AlchemyAPI, LanguageComputer, OpenCalais, Extractiv
BRAND & CUSTOMER MANAGEMENT: SENTIMENT ANALYSIS
Documents (reviews, tweets, surveys) → Sentiment Analysis → Visualization, Summary
Naïve approach: Sentiment-words dictionary!
• Negative: suck, terrible, awful
• Positive: fantastic, excellent, awesome
BUT:
"If you are reading this because it is your darling fragrance, please wear it at home exclusively, and tape the windows shut."
No sentiment words!
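The naïve dictionary approach and its failure on the fragrance review can be reproduced in a few lines. The two word lists are the ones from the slide; the function name and the "neutral" fallback are assumptions for this sketch.

```python
import re

# Word lists taken from the slide.
NEGATIVE = {"suck", "terrible", "awful"}
POSITIVE = {"fantastic", "excellent", "awesome"}


def naive_sentiment(text):
    """Count dictionary hits and label the text by the sign of the balance."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


review = ("If you are reading this because it is your darling fragrance, "
          "please wear it at home exclusively, and tape the windows shut.")
print(naive_sentiment(review))  # prints "neutral"
```

The review is clearly negative to a human reader, but it contains no dictionary word, so the count-based label comes out neutral.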
BRAND & CUSTOMER MANAGEMENT: SENTIMENT ANALYSIS
Documents (reviews, tweets, surveys) → Sentiment Analysis → Visualization, Summary
Training data (annotations) → Examples → Properties → Learning
• Properties: Presence, Position, Part-of-Speech, Negation
• Learning: Machine Learning, Generalization, Lexicon induction
Important:
• Identifying sentiment-bearing sentences
• Attaching sentiment to a topic!
SENTIMENT ANALYSIS COMPANIES
Attensity, AlchemyAPI, Lexalytics, Saplo, Medallia, SAS
RESEARCH: TEXT SUMMARIZATION
• Address: Hi All,
• Announcement: As of today, MetaStock has several new functions.
• Details: The most important new feature is the ability to display forward heat rate charts.
• More details: Also, notice that the interface looks different -- this reflects and accommodates the new features.
• Conclusion: If you have any questions regarding this new version of MetaStock, please contact Bella Santuri.
Extractive summary: As of today, MetaStock has several new functions.
Sentence compression: MetaStock has several new functions. The new interface looks different.
Abstractive summary: MetaStock has new features and a new interface.
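Of the three summary types above, the extractive one is the simplest to sketch: pick whole sentences and return them unchanged. The version below scores each sentence by the average document frequency of its content words. That scoring rule, the stopword list and the function names are assumptions for illustration, not the method of any vendor listed in this deck.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption).
STOPWORDS = {"a", "all", "and", "as", "has", "hi", "is", "of", "that", "the", "this", "to"}


def split_sentences(text):
    """Split on whitespace that follows sentence-final punctuation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(sentence):
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]


def extractive_summary(text, n_sentences=1):
    """Return the n highest-scoring sentences, kept in document order."""
    sentences = split_sentences(text)
    doc_freq = Counter(w for s in sentences for w in content_words(s))

    def score(sentence):
        words = content_words(sentence)
        return sum(doc_freq[w] for w in words) / len(words) if words else 0.0

    top = sorted(sentences, key=score, reverse=True)[:n_sentences]
    return [s for s in sentences if s in top]
```

Sentence compression and abstractive summarization require rewriting text rather than selecting it, so they are beyond a sketch of this size.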
TEXT SUMMARIZATION COMPANIES
Lexalytics, Pingar
COMPETITIVE INTELLIGENCE: ENTITY & ENTITY RELATION EXTRACTION
Companies: OpenCalais, Extractiv, Pingar, Evri, AlchemyAPI, Zemanta
FRAUD INVESTIGATION: NORMALIZATION OF DATES & NAMES
Companies: Cicero, BasisTech
OPEN-SOURCE TOOLS
• NLTK – Apache license, Book, Python & academic datasets, nltk.org
• LingPipe – Commercial licenses, Tutorials, Coreference & Chinese segmentation, alias-i.com/lingpipe
• OpenNLP – Apache license, Parsing, MaxEnt ML, incubator.apache.org/opennlp
• GATE – restricted GPL, Training courses, Applications & framework, gate.ac.uk
• Stanford NLP – full GPL, Online docs, Full library, nlp.stanford.edu
APIs
What’s an API and how does it work?
What are the advantages of the API model?
Which API is the right one for you?
ENGINE, API, ACCESS
• Engine: Software engine solves a specific task
• API: An interface that ensures communication
• Access: Developer creates an application
  • calls via a web service
  • includes API authentication
  • a call is an XML message describing the request
  • a protocol specifies how XML needs to be encoded: SOAP or REST
• SDK: usage examples
REST API ACCESS FROM A BROWSER
API request: http://search.yahooapis.com/WebSearchService/V1/webSearch?appid=YahooDemo&query=madonna&context=Italian+sculptors+and+painters+of+the+renaissance+favored+the+Virgin+Mary+for+inspiration
API response
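The request URL on this slide is just a base address plus URL-encoded parameters, so it can be assembled programmatically. The sketch below rebuilds it with the Python standard library. The parameter names (`appid`, `query`, `context`) are read off the slide's URL, while the function name is an assumption. No request is sent, since the demo service may no longer be available.

```python
from urllib.parse import urlencode

BASE_URL = "http://search.yahooapis.com/WebSearchService/V1/webSearch"


def build_request_url(appid, query, context):
    """Assemble a REST GET request: base URL + '?' + encoded key=value pairs."""
    params = {"appid": appid, "query": query, "context": context}
    return BASE_URL + "?" + urlencode(params)


url = build_request_url(
    appid="YahooDemo",
    query="madonna",
    context=("Italian sculptors and painters of the renaissance "
             "favored the Virgin Mary for inspiration"),
)
print(url)
```

The `appid` parameter plays the role of the API authentication mentioned earlier: the service identifies the caller by it.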
SOAP API ACCESS FROM VS2010
SOAP API ACCESS IN POWERSHELL
Read complete blog post “Bulk metadata extraction in SharePoint”: http://bit.ly/powershell-migrate
API = EASY INTEGRATION & FLEXIBILITY
• Integrate into existing architecture via any programming language
• Improve known flaws in the current system/process
• Minimize adoption barriers within the company: little or no training required for staff
• Only pay for the features you need
• Flexible deployment:
  • Host API on site = Secure data exchange
  • Access the API in the cloud = Save on tech support & hardware
WHICH API IS BEST FOR YOU?
"I need to take some text and get a list of the important entities/keywords/phrases."
Blog post on API comparison: faganm.com/blog
APIs compared: Y: Term Extractor, OpenCalais, BeliefNetworks, OpenAmplify, AlchemyAPI, Evri
Criteria: API restrictions, Supported languages, Quality of results, Semantic links, Synonyms/Duplicates
HOW TO CHOOSE AN API:
• Define a specific task
• Think of what features are important
• Get prepared:
  • Subscribe for API keys
  • Get SDKs
  • Learn libraries
• Find representative data
• Build a test framework
• Compare results
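The last three steps (representative data, a test framework, comparing results) can be sketched as a small evaluation harness. Everything here is hypothetical: each API is represented by a plain Python callable that takes text and returns keywords, and quality is measured as precision and recall against hand-picked "gold" keywords. A real wrapper would call the vendor's SDK or web service inside that callable.

```python
def precision_recall(predicted, gold):
    """Compare two keyword lists case-insensitively."""
    predicted = {p.lower() for p in predicted}
    gold = {g.lower() for g in gold}
    hits = len(predicted & gold)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    return precision, recall


def compare_apis(extractors, test_set):
    """Average precision/recall per extractor.

    extractors: dict of name -> callable(text) -> list of keywords
    test_set:   list of (text, gold keywords) pairs
    """
    report = {}
    for name, extract in extractors.items():
        scores = [precision_recall(extract(text), gold) for text, gold in test_set]
        n = len(scores) or 1  # avoid dividing by zero on an empty test set
        report[name] = (
            sum(p for p, _ in scores) / n,
            sum(r for _, r in scores) / n,
        )
    return report


# Stub extractors standing in for real API wrappers (hypothetical).
extractors = {
    "api_a": lambda text: ["MetaStock", "functions"],
    "api_b": lambda text: ["has"],
}
test_set = [("MetaStock has several new functions.", ["MetaStock", "functions"])]
for name, (precision, recall) in compare_apis(extractors, test_set).items():
    print(f"{name}: precision={precision:.2f} recall={recall:.2f}")
```

Criteria that are not about result quality, such as API restrictions or supported languages, still need to be checked by hand.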
METADATA EXTRACTION IN SHAREPOINT
Demo: Pingar’s add-on for SharePoint 2010, built using a text analytics API
INTEGRATING APIS INTO SCANNING
Video: Using Fuji Xerox SmartConnect and the Pingar API to scan documents in batch into SharePoint
http://www.youtube.com/watch?v=kluVp25upag
THE NEXT-GENERATION SHAREPOINT: POWERED BY TEXT ANALYTICS
• What can be automated?
  • Metadata extraction, Data entry, Opinion mining, Sanitization, Doc approval, Summarization, …
• How to integrate text analytics into existing SharePoint applications?
  • Easy! Via an API
• How to find the right text analytics API?
  • Review what’s available
  • Set up an experiment
  • Compare results
Thank you to all of our Sponsors