BigData@Chalmers Machine Learning Business …al, “Neural context embeddings for automatic...

BigData@Chalmers Machine Learning Business Intelligence, Culturomics and Life Sciences Devdatt Dubhashi LAB (Machine Learning, Algorithms, Computational Biology) D&IT Chalmers


Page 1:

BigData@Chalmers

Machine Learning: Business Intelligence, Culturomics and Life Sciences

Devdatt Dubhashi

LAB (Machine Learning, Algorithms, Computational Biology)

D&IT Chalmers

Page 6:

Entity Disambiguation

• Match names in text with the entities behind them
• Fundamental problem, addressed at annual competitions like SemEval
• Disambiguation is needed everywhere: databases, web mining, linguistics, …
• Used at Recorded Future (exemplified next!)

Page 12:

“Judge a man by the company he keeps.” - Euripides

Page 13:

Chris Anderson

[Figure: entities appearing around the name: India, Oxford Uni., Pakistan, TED, Future Publishing, San Francisco]

Page 14:

Chris Anderson

Page 15:

Graph Communities

Page 16:

Classification with Graph Kernels

Page 17:

Graph Embeddings and Kernels

[Figure: example graph on nodes 1 to 5 and its embedding, annotated 1/√ϑ]

• Embed a discrete combinatorial object (graph) into continuous Euclidean space
• Define a kernel based on the geometry of the Euclidean space
• V. Jethava et al., NIPS 2012, JMLR 2013
• T. Kerola, L. Hermansson, V. Jethava, F. Johansson, CIKM 2013
• F. Johansson, V. Jethava et al., ICML 2014
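
The two steps (embed the graph, then define a kernel from the geometry) can be illustrated with a minimal sketch. This is not the construction from the cited papers. It assumes a simple embedding in which each vertex i gets a unit vector u_i with u_i·u_j = A_ij/ρ, obtained by Cholesky-factorizing K = A/ρ + I, where ρ = (maximum degree + 1) is picked here only because it guarantees K is positive definite. The geometric summary (squared norm of the centroid) and the Gaussian kernel on top of it are likewise illustrative choices, and all function names are hypothetical.

```python
import math

def embed(adj):
    """Return one unit vector per vertex with u_i . u_j = adj[i][j] / rho."""
    n = len(adj)
    # Assumption: rho = max degree + 1 makes K strictly diagonally dominant,
    # hence positive definite, so the Cholesky factorization below succeeds.
    rho = max(sum(row) for row in adj) + 1
    K = [[1.0 if i == j else adj[i][j] / rho for j in range(n)] for i in range(n)]
    # Cholesky factorization K = L L^T; row i of L is the vector for vertex i.
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def centroid_norm_sq(vectors):
    """Geometric summary of an embedding: squared norm of the mean vector."""
    n = len(vectors)
    dim = len(vectors[0])
    centroid = [sum(v[d] for v in vectors) / n for d in range(dim)]
    return sum(x * x for x in centroid)

def graph_kernel(adj_a, adj_b, gamma=10.0):
    """Gaussian kernel on the geometric summaries of the two embeddings."""
    fa = centroid_norm_sq(embed(adj_a))
    fb = centroid_norm_sq(embed(adj_b))
    return math.exp(-gamma * (fa - fb) ** 2)

# Two toy graphs as adjacency matrices (illustrative data).
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
similarity = graph_kernel(triangle, path)
```

Because the kernel only sees a scalar summary of each embedding, it can compare graphs of different sizes. A kernel value like this could then be plugged into any kernel classifier.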

Page 18:

Demonstrator at Recorded Future

• Classifies names as ambiguous or unique
• Uses graph classification to classify occurrence graphs of names
• Achieved state-of-the-art results (CIKM 2013)
• Powerful extension for complete disambiguation in progress …
• Parallel/distributed implementation in GraphLab

[Figure: Ambiguous or Unique?]

Page 19:

Towards a knowledge-based culturomics

• Språkbanken (Swedish Language Bank), University of Gothenburg
• Language Technology, Lund University
• LAB Group, Department of Computer Science and Engineering, Chalmers University of Technology

Page 21:

Word Embeddings

Page 23:

Deep Learning (Neural Networks)

• Revolutionized vision and speech systems
• Dramatic improvements in image classification, near human level
• Skype real-time translation from English to Chinese

Page 24:

Word Embeddings capture meaning
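
One common way to read "capture meaning" is that words used in similar ways end up with vectors pointing in similar directions, which cosine similarity measures. The sketch below shows only that comparison. The three-dimensional vectors are made up for illustration and do not come from any trained model, and the names `vectors` and `nearest` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Made-up toy vectors; real embeddings have hundreds of dimensions.
vectors = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.85, 0.75, 0.2],
    "banana": [0.1, 0.2, 0.9],
}

def nearest(word):
    """Return the other word whose vector is most similar to `word`."""
    others = (w for w in vectors if w != word)
    return max(others, key=lambda w: cosine(vectors[word], vectors[w]))

closest_to_king = nearest("king")
```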

Page 25:

Dealing with information overload

Page 26:

Document summarization

Word vectors + Multiple Kernel Learning + Submodular optimization

M. Kågebäck, O. Mogren et al., “Extractive Summarization using Continuous Vector Space Models”, CVSC Workshop, EACL 2014
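
The submodular-optimization ingredient can be sketched as greedy sentence selection. The example below assumes each sentence is already represented by a vector (made-up values here) and uses a facility-location coverage score, a standard monotone submodular function when similarities are non-negative. That objective is an illustrative stand-in and may differ from the one in the cited paper, and the multiple kernel learning step is left out entirely. Function names are hypothetical.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def coverage(selected, sims):
    """Facility-location score: each sentence is credited with its best match in the summary."""
    if not selected:
        return 0.0
    return sum(max(row[j] for j in selected) for row in sims)

def greedy_summary(sentence_vectors, k):
    """Greedily add the sentence that increases coverage the most, k times."""
    n = len(sentence_vectors)
    sims = [[cosine(sentence_vectors[i], sentence_vectors[j]) for j in range(n)]
            for i in range(n)]
    selected = []
    for _ in range(min(k, n)):
        candidates = [j for j in range(n) if j not in selected]
        best = max(candidates, key=lambda j: coverage(selected + [j], sims))
        selected.append(best)
    return selected

# Made-up sentence vectors: sentences 0-1 share one topic, 2-3 another.
sentence_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
picked = greedy_summary(sentence_vectors, 2)
```

With two slots, the greedy procedure picks one sentence from each topic, since a second sentence on a topic already covered adds little to the score.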

Page 29:

Word sense induction

Instance cloud for 'power':
• him political her government god influence state came us act labour given council about authority
• energy unit system battery x performance high allows engine equipment processing systems failure management provide

M. Kågebäck, F. Johansson et al., “Neural context embeddings for automatic discovery of word senses” (NAACL 2015 Workshop on Vector Space Modeling for NLP)

• Used an innovative clustering technique
• Exploited word and context vectors
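
The general idea of inducing senses by clustering can be sketched as follows. The example assumes every occurrence of 'power' has already been turned into a context vector (the two-dimensional values are made up) and groups them with plain k-means using a deterministic farthest-point initialization. This is a generic stand-in, not the clustering technique of the cited paper, and the function names are hypothetical.

```python
def dist_sq(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def kmeans(points, k, iters=20):
    """Plain k-means; returns one cluster label per point."""
    # Deterministic initialization: start from the first point, then
    # repeatedly add the point farthest from the centers chosen so far.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist_sq(p, c) for c in centers)))
    labels = []
    for _ in range(iters):
        # Assign each point to its nearest center.
        labels = [min(range(k), key=lambda c: dist_sq(p, centers[c])) for p in points]
        # Move each center to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centers[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return labels

# Made-up context vectors for six occurrences of 'power':
# the first three stand for political contexts, the last three for energy contexts.
contexts = [[0.9, 0.1], [0.8, 0.2], [0.85, 0.15],
            [0.1, 0.9], [0.2, 0.8], [0.15, 0.95]]
senses = kmeans(contexts, k=2)
```

Each resulting cluster plays the role of one induced sense. The number of senses k is fixed by hand in this sketch.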

Page 30:

Senses of ‘paper’

Visualization using t-SNE

Medium

Essay

Scholarly article

Newspaper

Newspaper firm

Material

Page 35:

Probabilistic Regulation of Metabolism

• Prediction of metabolic changes due to genetic or environmental perturbations
• Diagnosing metabolic disorders
• Discovering novel drug targets

Page 36:

Genetic Regulation of Metabolism: Using Factor Graphs and Belief Propagation

Page 37:

Genetic regulatory network consisting of transcription factor genes, target genes and metabolic reactions
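
Belief propagation on such a network can be illustrated on the smallest possible case: a chain with one transcription factor T, one target gene G and one reaction R, each either off (0) or on (1). The sketch passes sum-product messages along the chain to get the marginal probability that the reaction is active, with and without a knockout of T. All probabilities are made-up numbers, and the variable and function names are hypothetical. Real regulatory networks are far larger and are not simple chains.

```python
# Made-up factor tables for a chain T -> G -> R (0 = off, 1 = on).
prior_t = [0.7, 0.3]                             # P(T)
gene_given_t = [[0.9, 0.1], [0.2, 0.8]]          # P(G | T), indexed [t][g]
reaction_given_g = [[0.95, 0.05], [0.3, 0.7]]    # P(R | G), indexed [g][r]

def pass_message(incoming, factor):
    """Sum-product step: sum over source states s of incoming[s] * factor[s][t]."""
    n_targets = len(factor[0])
    return [sum(incoming[s] * factor[s][t] for s in range(len(incoming)))
            for t in range(n_targets)]

def normalize(message):
    total = sum(message)
    return [x / total for x in message]

def reaction_marginal(evidence_t=None):
    """Marginal P(R), optionally conditioned on an observed state of T."""
    msg_t = list(prior_t)
    if evidence_t is not None:
        # Evidence zeroes out the states of T that contradict the observation.
        msg_t = [p if s == evidence_t else 0.0 for s, p in enumerate(msg_t)]
    msg_g = pass_message(msg_t, gene_given_t)      # message into G
    msg_r = pass_message(msg_g, reaction_given_g)  # message into R
    return normalize(msg_r)

baseline = reaction_marginal()
knockout = reaction_marginal(evidence_t=0)  # transcription factor switched off
```

In this toy model, switching T off lowers the probability that the reaction is active, which is the kind of perturbation prediction the approach targets.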

Page 38:

Privacy

• Data mining with Differential Privacy
• Programming language technology for differential privacy (Sands)
• Privacy policies for social networks (Schneider)
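
As a small illustration of differential privacy in data mining, the sketch below applies the standard Laplace mechanism to a counting query. A count changes by at most 1 when one record is added or removed (sensitivity 1), so adding Laplace noise with scale 1/ε gives ε-differential privacy for that single query. The data, the predicate and the function names are hypothetical. The noise is drawn as the difference of two exponential samples, which has a Laplace distribution.

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential samples."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)

def private_count(records, predicate, epsilon, rng=None):
    """Noisy count of records matching `predicate` (sensitivity of a count is 1)."""
    rng = rng or random.Random()
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)

# Made-up data: how many people are at least 37 years old?
ages = [25, 37, 41, 52, 19]
noisy = private_count(ages, lambda age: age >= 37, epsilon=1.0)
```

A smaller ε means more noise and stronger privacy. Repeating the query spends additional privacy budget, which this sketch does not track.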

Page 39:

Chalmers Machine Learning Summer School 2015

Page 40:

Big Data Analytics May 25-29

• Hadoop

• Spark

• Spotfire

Page 41:

• SVMs and Kernel Methods

• Graph Theoretic Methods

• Probabilistic Graphical Models

• Deep Learning

• Bayesian Decision Theory

• Reinforcement Learning

• Business intelligence

• Natural Language Technology

• Life Sciences

• Transport (Volvo)

• Infectious disease epidemiology

• Medical Imaging

• Political Science …

Page 42:

[Diagram: Data Science and related areas: Machine Learning, Probability and Statistics, Algorithms, Optimization, Databases, Multicores/GPUs, Security, Privacy, Parallel programming, Sparse modelling]

Page 43:

Chalmers Data-X ?

• Life Science and Engineering
• Transport
• Energy
• Smart Cities (Built Environment)
• Production
• Volvo Cars (connected cars, historical data)
• AstraZeneca (mining medical literature)
• Seal Software (mining legal contracts)

Page 44:

Data Science vs EScience

Data Science:
• Data-centric
• Probabilistic models
• GPUs
• Computational biology, NLP, social sciences …

EScience:
• Computation-centric
• Simulation
• Large clusters/grids
• Physics, turbulent flows, climate …