Janos Szendi-Varga - BDU 3.0bdu.hu/2017/ppts/2017/Szendi-Varga_Janos.pdf · 2017-12-12 · They are...

GraphAware ® The power of polyglot searching Janos Szendi-Varga graphaware.com @graph_aware


Page 1: The power of polyglot searching

Janos Szendi-Varga

graphaware.com

@graph_aware

Page 2: Most frequently used UI element

Search Go

Page 3: Evolution of Internet Search

https://moz.com/blog/the-evolution-of-search

Page 4: Slide from BDU 2016

Page 5: Evolution in corporate search

• We started to be polyglot
• Big data architecture is not a vision
• We hired Data Scientists
• We started to index things (Lucene)
• We started to use Solr, ElasticSearch, etc.
• It became part of our Big Data architecture
• We introduced Search Infrastructure

Page 6: The fundamentals of search infrastructure

?

Page 7: Challenges

• Search engines (Solr, ElasticSearch, etc.) are aggregate-oriented databases; they have limitations when it comes to connected data
• Typical setup: two users searching for the same thing will get the same results
• They are in the search 3.0-4.0 phase
• They are superstars of full-text search
• We need to extend this with graph-aided search
• We have to boost some search hits (c'mon, it is a recommender system)
• We have to filter out other hits or degrade their score
• We need Things, not Strings!!444!!!four!!!
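The boosting, filtering and degrading listed under Challenges can be illustrated with a minimal Python sketch. It assumes the full-text engine has already returned scored hits and that a graph query has produced a per-user multiplier for some documents. The names `Hit`, `rerank`, `graph_factor` and `blocked` are hypothetical and are not part of any GraphAware, Neo4j or ElasticSearch API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    doc_id: str
    score: float  # relevance score from the full-text engine


def rerank(hits, graph_factor, blocked=frozenset()):
    """Personalise full-text hits for one user.

    graph_factor: doc_id -> multiplier (assumed to come from a graph query);
                  > 1.0 boosts a hit, < 1.0 degrades it, missing means 1.0.
    blocked:      doc_ids to filter out entirely.
    """
    rescored = [
        Hit(h.doc_id, h.score * graph_factor.get(h.doc_id, 1.0))
        for h in hits
        if h.doc_id not in blocked
    ]
    return sorted(rescored, key=lambda h: h.score, reverse=True)


# Same query, same full-text hits, two different users.
hits = [Hit("a", 2.0), Hit("b", 1.5), Hit("c", 1.0)]
anonymous = rerank(hits, {})
personalised = rerank(hits, {"c": 3.0, "a": 0.5}, blocked={"b"})
print([h.doc_id for h in anonymous])
print([h.doc_id for h in personalised])
```

With an empty `graph_factor` the order is the plain full-text order, which corresponds to the "typical setup" where every user gets the same results. The multiplicative factor is only one possible design; an additive boost would work as well.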

Page 8: Example of graph-based search

Page 9: Knowledge graph

“A knowledge graph is a multi-relational graph composed of entities as nodes and relationships as edges with different types that describe facts in the world.”

It is about “understanding the world as you and I do”.
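To make the definition concrete, here is a minimal Python sketch of a multi-relational graph: entities are nodes, and each fact is a typed edge stored as a (subject, relation, object) triple. The `KnowledgeGraph` class and the relation names are illustrative assumptions, not the data model of any particular graph database.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Toy in-memory store of typed edges between entities."""

    def __init__(self):
        # (subject, relation) -> set of objects
        self._out = defaultdict(set)

    def add_fact(self, subject, relation, obj):
        self._out[(subject, relation)].add(obj)

    def objects(self, subject, relation):
        # .get avoids creating empty entries for unknown keys
        return set(self._out.get((subject, relation), ()))


kg = KnowledgeGraph()
kg.add_fact("Neil Armstrong", "CREW_OF", "Apollo 11")
kg.add_fact("Apollo 11", "OPERATED_BY", "NASA")
kg.add_fact("Apollo 11", "LANDED_ON", "Moon")

# Two-hop question: where did Neil Armstrong's mission land?
landing_sites = {
    site
    for mission in kg.objects("Neil Armstrong", "CREW_OF")
    for site in kg.objects(mission, "LANDED_ON")
}
print(landing_sites)
```

The two-hop lookup at the end is the kind of connected-data question that a pure full-text index does not answer directly.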

Page 12: Requirements of searching and KG

• Search infrastructure should be easily integrated into existing architecture
• New data sources should be easily added
• Should support the strategic goals, e.g. search-driven e-commerce
• Scalable
• Should provide personalised results
• Simple interface

Page 13: Steps to build KG

• Take a graph database (Neo4j, Cayley, OntoText GraphDB, etc.)
• Graph construction:
  • Knowledge extraction
    • from the internet (open data, grabbing)
    • from text (NLP)
    • from current databases (Master Data)
    • from logs
  • Knowledge Graph Construction
    • Have a good graph model
    • Connect the things together
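The "connect the things together" step can be sketched in Python under simple assumptions: each source (master data, logs, and so on) has already been turned into (subject, relation, object) facts, and entities are matched across sources by a normalised name. The `normalise` and `build_graph` helpers and the sample records are hypothetical; real entity matching usually needs more than case and whitespace folding.

```python
def normalise(name):
    """Canonical key for an entity: lower-case, single spaces."""
    return " ".join(name.lower().split())


def build_graph(sources):
    """Merge facts from several sources into one node set and one edge set.

    sources: source name -> list of (subject, relation, object) facts.
    """
    nodes = {}     # canonical key -> first display name seen
    edges = set()  # (subject key, relation, object key)
    for facts in sources.values():
        for subj, rel, obj in facts:
            for name in (subj, obj):
                nodes.setdefault(normalise(name), name)
            edges.add((normalise(subj), rel, normalise(obj)))
    return nodes, edges


sources = {
    "master_data": [("Laptop X1", "IN_CATEGORY", "Laptops")],
    "logs": [("user-42", "VIEWED", "laptop  x1")],
}
nodes, edges = build_graph(sources)
print(sorted(nodes))
print(sorted(edges))
```

Because both sources resolve to the same key for the product, the log event ends up attached to the master-data node instead of creating a duplicate.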

Page 16: Parts of the architecture

• Apache Kafka for streaming pipelines
  • Product topic
  • Search topic
  • Feedback topic
• Spark on the processing side
• Neo4j on the consuming side
• CQRS (Command Query Responsibility Segregation) pattern
• Push to ElasticSearch with GraphAware plugin
  • Neo4j Transaction Handler (afterCommit)
  • You can define mappings to ES
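The CQRS part of this architecture, where writes go to the graph and an after-commit hook replicates mapped properties to the search index, can be sketched in plain Python. This is an in-memory illustration only: `GraphStore` and `SearchIndex` are hypothetical stand-ins for Neo4j and ElasticSearch, and none of the names below belong to the GraphAware plugin or to the Neo4j transaction handler API.

```python
class GraphStore:
    """Write side (stand-in for the graph database)."""

    def __init__(self):
        self.nodes = {}
        self._after_commit = []

    def on_after_commit(self, callback):
        self._after_commit.append(callback)

    def commit(self, changes):
        """changes: node_id -> properties. Hooks run only after the write."""
        self.nodes.update(changes)
        for callback in self._after_commit:
            callback(changes)


class SearchIndex:
    """Read side (stand-in for the search engine)."""

    def __init__(self, mapping):
        self.mapping = mapping  # graph property -> index field
        self.docs = {}

    def replicate(self, changes):
        # Copy only the mapped properties into the index document.
        for node_id, props in changes.items():
            self.docs[node_id] = {
                field: props[prop]
                for prop, field in self.mapping.items()
                if prop in props
            }

    def search(self, field, term):
        term = term.lower()
        return sorted(
            node_id
            for node_id, doc in self.docs.items()
            if term in str(doc.get(field, "")).lower()
        )


graph = GraphStore()
index = SearchIndex(mapping={"name": "title"})
graph.on_after_commit(index.replicate)

graph.commit({"p1": {"name": "Graph Databases", "internal_cost": 12}})
print(index.docs)
print(index.search("title", "graph"))
```

Commands touch only `GraphStore` and queries touch only `SearchIndex`; the mapping decides which properties are exposed to search, so `internal_cost` never reaches the index.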

Page 17: Success story 1.

• Sharing Tribal Knowledge inside the company
• >20 offices
• >3000 employees
• Data sources:
  • Tableau dashboards (4000)
  • Knowledge posts (>1000)
  • Superset charts and dashboards (>6000)
  • Experiments and metrics (>5000)

Source: https://www.slideshare.net/ChristopherWilliams24/20170108scaling-tribalknowledge

Page 18: Success story 2.

• Half-century of collective NASA engineering knowledge
• It is called the Lessons Learned database
• They use it in the Mars mission project

Impact: “Neo4j saved well over two years of work and one million dollars of taxpayers funds.”

“When we had the [Apollo 1] fire, we took a step back and said okay, what lessons have we learned from this horrible tragedy? Now let’s be doubly sure that we are going to do it right the next time. And I think that fact right there is what allowed us to get Apollo done in the ‘60s.” —Dr. Christopher C. Kraft, Jr., Director of Flight Operations

Page 19: Resources

• Neo4j
• ElasticSearch
• GraphAware modules:
  • Neo4j to ElasticSearch
  • ElasticSearch Plugin
  • NLP plugin
• GitHub: github.com/graphaware
• Open data

Page 20

It is not rocket science!

Anonymous NASA scientist

Page 21: GraphAware

world’s #1 Neo4j consultancy

www.graphaware.com

@graph_aware