Changing the way people search with Apache Spark



Page 1: Changing the way people search with apache spark

Private and confidential. Copyright (C) 2016, Imaginea Technologies Inc. All rights reserved.

HOW APACHE SPARK IS CHANGING THE WAY PEOPLE SEARCH

INSIGHTS FROM IMAGINEA

Page 2: Changing the way people search with apache spark

SERGEY BRIN ONCE SAID …

My vision when we started Google 15 years ago was that eventually you wouldn't have to have a search query at all.

- Sergey Brin, Google

Page 3: Changing the way people search with apache spark

SEARCH TODAY IS PERSONAL

Page 4: Changing the way people search with apache spark

SEARCH TODAY IS CONTEXTUAL

Best places to see in Goa

Search from …

iPhone user, on a street in NY

Contextual results …

Best places in Goa, flight charges to Goa, places to stay, etc.

Best places to see in Goa

Search from …

iPhone user, on a street in Goa

Contextual results …

Places to visit with distance from your location, restaurants nearby, etc.
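The two cases above can be read as a simple rule: the same query yields different result sections depending on where the user is. Below is a minimal sketch of that idea in Python. The names `SearchContext` and `contextual_sections` are hypothetical, and the single location rule is an assumption made for illustration; a real search engine would weigh far more signals.

```python
from dataclasses import dataclass


@dataclass
class SearchContext:
    """Hypothetical container for what is known about the searcher."""
    query: str
    device: str
    user_location: str


def contextual_sections(ctx: SearchContext, destination: str) -> list[str]:
    """Pick result sections for a travel query (toy rule, assumed for illustration).

    If the user is already at the destination, favour nearby results;
    otherwise favour planning results such as flights and places to stay.
    """
    if ctx.user_location.strip().lower() == destination.strip().lower():
        return ["places to visit by distance", "restaurants nearby"]
    return ["best places", "flight charges", "places to stay"]


if __name__ == "__main__":
    query = "Best places to see in Goa"
    in_ny = SearchContext(query, device="iPhone", user_location="NY")
    in_goa = SearchContext(query, device="iPhone", user_location="Goa")
    print(contextual_sections(in_ny, "Goa"))
    print(contextual_sections(in_goa, "Goa"))
```

The point of the sketch is only that context is an input alongside the query text; the rule itself is a stand-in.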

Page 5: Changing the way people search with apache spark

SEARCH TOMORROW WILL BE ‘HUMAN LIKE’

Experiential Intelligence

Personal Assistant

Evolutionary Brain

Page 6: Changing the way people search with apache spark

Experiential Intelligence

Personal Assistant

Evolutionary Brain

Like humans, search engines will have an evolutionary brain that understands search behavior and learns from it

While machine learning is already being used by search engines, we still have a long way to go to understand & learn from ‘mass-scale’ human search patterns.

Deep learning techniques are evolving fast. It is becoming increasingly important to capture your customer’s imagination and attention with visuals, and search companies are taking notice.

Check out our Deep Learning Experiment >>

Page 7: Changing the way people search with apache spark

Experiential Intelligence

Personal Assistant

Evolutionary Brain

Search engines are fast becoming personal assistants by enabling meaningful contextual conversations

E.g., when you search for the status of your flight, the search engine tells you that an old friend is travelling on the same flight

Search engines today provide personalized responses to queries like “what’s the status of my flight”. Search engine crawling is transitioning from being web based to IoT based.

Search engines are truly moving away from being information providers to becoming personal assistants. In the near future, they may very well book your flight tickets, order a pizza and more.

Check out How to Crawl Apps with Deep Linking Techniques >>

Page 8: Changing the way people search with apache spark

Experiential Intelligence

Personal Assistant

Evolutionary Brain

Search engines will provide real-time opinions & experiences from customers across the globe

More & more people search online to learn about another person’s real-time experience: for example, how the food tastes today at a particular restaurant, or how congested a busy road is.

Search engines of the future will provide real-time information on people’s experiences. It’s almost like asking a customer how the coffee tastes today before you place your order.

Page 9: Changing the way people search with apache spark


CHANGING THE WAY PEOPLE SEARCH FOR CODE

OUR EXPERIMENT WITH APACHE SPARK

Page 10: Changing the way people search with apache spark

We learn code with Google

Today’s smart engineers learn with Google. We search for code syntax, use cases, and properties, and learn from the results. Many times we end up reading through irrelevant blogs & articles.

As engineers, we look for solutions to a problem, along with examples of how it was solved in the past & the context in which a particular class was used.

Page 11: Changing the way people search with apache spark

Can we make code search more contextual?

So, we asked ourselves: can we make code search more contextual & relevant for engineers, with real examples of how a particular piece of code was used in the past?

This opens up new opportunities to explore all possible use cases of a specific class.

We built KodeBeagle. It makes code search contextual.

Explore KodeBeagle >>

Page 12: Changing the way people search with apache spark

KodeBeagle leverages the power of Apache Spark to provide intelligent code suggestions

KodeBeagle shows the most idiomatic usages for any given code snippet. It leverages the abundantly available “standard” code on GitHub to learn interesting and useful coding patterns.

It makes code search easy using natural language query techniques. It summarizes new projects & files to aid quick learning.

Explore KodeBeagle >>

Page 13: Changing the way people search with apache spark

Why did we choose Apache Spark for contextual search?

Apache Spark is the next evolutionary change in the big data processing environment: it provides batch as well as streaming capabilities, making it a preferred platform for fast analysis.

We had to crawl through almost 1 billion lines of open source code from approximately 550,000 GitHub projects. Apache Spark gave us the processing speed and the flexibility required to build this platform.

Page 14: Changing the way people search with apache spark

How does the platform work?

Components: KodeBeagle Crawlers, HDFS Storage, Spark Compute Cluster, Elasticsearch Cluster, KodeBeagle.com

1. Crawlers clone GitHub repositories & store them in HDFS

2. Spark processes the data & stores it back in HDFS

3. Elasticsearch is loaded with the processed data from HDFS

4. A webserver, acting as a load balancer and a firewall, exposes the Elasticsearch server to the web
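The four stages above can be sketched end to end in plain Python. This is an illustration under loud assumptions, not KodeBeagle's actual code: plain dictionaries stand in for HDFS and Elasticsearch, an ordinary function stands in for the Spark job, and the choice to index Java `import` statements is an assumption made for the example. All function names are hypothetical.

```python
import re

# Matches plain Java imports such as "import java.util.List;" (static imports are ignored).
IMPORT_RE = re.compile(r"^\s*import\s+([\w.]+)\s*;", re.MULTILINE)


def crawl(repos):
    """Stage 1 stand-in: 'clone' repos and store every file in a dict playing the role of HDFS."""
    storage = {}
    for repo_name, files in repos.items():
        for path, source in files.items():
            storage[f"{repo_name}/{path}"] = source
    return storage


def process(storage):
    """Stage 2 stand-in for the Spark job: extract the imported types of each file."""
    return {path: sorted(set(IMPORT_RE.findall(src))) for path, src in storage.items()}


def load_index(processed):
    """Stage 3 stand-in for Elasticsearch: build an inverted index from type name to files."""
    index = {}
    for path, types in processed.items():
        for type_name in types:
            index.setdefault(type_name, []).append(path)
    return index


def search(index, type_name):
    """Stage 4 stand-in for the web-facing query: files that use the given type."""
    return sorted(index.get(type_name, []))


if __name__ == "__main__":
    repos = {
        "demo/app": {"Main.java": "import java.util.List;\nimport java.io.File;\nclass Main {}"},
        "demo/lib": {"Util.java": "import java.util.List;\nclass Util {}"},
    }
    index = load_index(process(crawl(repos)))
    print(search(index, "java.util.List"))
```

Each stage only reads the output of the previous one, which mirrors why the real pipeline can place HDFS between the crawlers, Spark and Elasticsearch.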

Page 15: Changing the way people search with apache spark

Discovering themes using topic modelling techniques

Topic modelling consists of a set of methods that collectively aim to discover the underlying themes within a set of documents. Identifying themes by hand does not work if one wishes to analyze a large corpus, say all the Java projects on GitHub or all the pages of Wikipedia.

To overcome this, we developed many probabilistic topic models. These models leverage the statistical properties of the underlying data to discover the themes or ‘topics’ in that data.
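Topic models work on word statistics, so source code first has to be turned into words. A minimal sketch of one way to do that is shown below: identifiers are split on camelCase and underscores and then counted. The function names and the splitting rule are assumptions made for illustration; the slides do not say how KodeBeagle tokenizes code.

```python
import re
from collections import Counter

# An acronym run ("HTTP" in "HTTPServer") or a word with an optional leading capital.
WORD_RE = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+")


def tokenize_source(source: str) -> list[str]:
    """Split source text into lower-cased word parts, ignoring digits and punctuation."""
    return [match.group(0).lower() for match in WORD_RE.finditer(source)]


def bag_of_words(source: str) -> Counter:
    """Word counts for one file: the statistics a topic model would consume."""
    return Counter(tokenize_source(source))


if __name__ == "__main__":
    snippet = "List<String> readFileLines(File inputFile)"
    print(tokenize_source("readFileLines"))  # ['read', 'file', 'lines']
    print(bag_of_words(snippet).most_common(1))  # [('file', 3)]
```

Note that language keywords such as `class` or `public` would be counted too; filtering them out is a common extra step that this sketch leaves out.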

Know More on Topic Modelling in NLP >>

Page 16: Changing the way people search with apache spark

Discovering intent using Latent Dirichlet Allocation (LDA) with Bayesian networks

LDA is a generative model which allows sets of observations to be explained by unobserved groups that explain why some parts of the data are similar. A Bayesian network is a kind of probabilistic graphical model that provides a principled way of representing and reasoning about possible relationships between random variables. We leveraged these techniques to tokenize the code repo into a collection of words & assign sensible values.
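To make the idea of "unobserved groups" concrete, here is a small collapsed Gibbs sampler for LDA in plain Python. It is a textbook-style sketch, not the model used in KodeBeagle: the function name `lda_gibbs`, the symmetric priors `alpha` and `beta`, and their default values are all assumptions, and a production system would more likely use a library implementation on Spark.

```python
import random


def lda_gibbs(docs, num_topics, iterations=100, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA over tokenized documents (lists of words).

    Returns (doc_topic, topic_word, vocab): per-document topic counts,
    per-topic word counts, and the vocabulary that indexes the word counts.
    """
    rng = random.Random(seed)
    vocab = sorted({word for doc in docs for word in doc})
    word_id = {word: i for i, word in enumerate(vocab)}
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics

    # Start from a random topic assignment for every token.
    assignments = []
    for d, doc in enumerate(docs):
        doc_assignments = []
        for word in doc:
            k = rng.randrange(num_topics)
            doc_assignments.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[word]] += 1
            topic_total[k] += 1
        assignments.append(doc_assignments)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for n, word in enumerate(doc):
                v = word_id[word]
                k = assignments[d][n]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Weight of each topic given all other assignments.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][n] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    return doc_topic, topic_word, vocab


if __name__ == "__main__":
    docs = [
        ["file", "read", "stream", "file"],
        ["http", "request", "server"],
        ["read", "file", "buffer"],
    ]
    doc_topic, topic_word, vocab = lda_gibbs(docs, num_topics=2)
    print(doc_topic)
```

The returned counts are the raw material for the "themes": words with high counts in the same row of `topic_word` are the ones the sampler grouped together.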

Know more about LDA & Bayesian Networks >>

Page 17: Changing the way people search with apache spark

Discovering content using repo summary with TASSAL

TASSAL is based on HIERSUM, which uses a hierarchical LDA-style model. It represents content specificity as a hierarchy of topic vocabulary distributions. It produces multiple ‘topical summaries’ to facilitate content discovery and navigation.

We parsed each repo & built an AST. We then trained the topic model, which involves running the topic sampling algorithm for multiple iterations and performing hyper-parameter optimization every k iterations. Once the model is trained, it can be applied to any repo for summarization.

This method does not need any prior information about the repos: it processes the information in each repo and identifies the files which summarize that repo.
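The training schedule described above (sample for many iterations, re-optimize hyper-parameters every k iterations) can be sketched as a small driver loop. The function `train_topic_model` and its two callbacks are hypothetical placeholders; the slides do not specify the sampler or the optimizer, so both are passed in rather than implemented.

```python
def train_topic_model(state, sample_step, optimize_hyperparameters, iterations, k):
    """Run `iterations` sampling sweeps, re-optimizing hyper-parameters after every k-th sweep.

    `sample_step` and `optimize_hyperparameters` each take the model state
    and return the updated state; both are supplied by the caller.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    for iteration in range(1, iterations + 1):
        state = sample_step(state)
        if iteration % k == 0:
            state = optimize_hyperparameters(state)
    return state


if __name__ == "__main__":
    # Toy callbacks that only count how often each phase runs.
    def count_sweep(state):
        return {**state, "sweeps": state["sweeps"] + 1}

    def count_optimization(state):
        return {**state, "optimizations": state["optimizations"] + 1}

    start = {"sweeps": 0, "optimizations": 0}
    print(train_topic_model(start, count_sweep, count_optimization, iterations=10, k=3))
```

Keeping the two phases as callbacks makes the schedule independent of the model, so the same loop could drive any sampler for which a hyper-parameter update is defined.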

Page 18: Changing the way people search with apache spark

IN SUMMARY …

Real-time streaming analytics makes experiential intelligence possible

Contextual search is the future: start considering what steps to take

Intent is more than just ‘keywords done better’


Page 20: Changing the way people search with apache spark

EXPERIENCE THE POWER OF APACHE SPARK WITH IMAGINEA

Imaginea is among the top contributors to Spark code

We have been building products on Spark since 2014

We are open-source contributors to Apache Hadoop and Zeppelin

To find out more, visit http://www.imaginea.com/apache-spark

Page 21: Changing the way people search with apache spark

Disclaimer

This document may contain forward-looking statements concerning products and strategies. These statements are based on management's current expectations and actual results may differ materially from those projected, as a result of certain risks, uncertainties and assumptions, including but not limited to: the growth of the markets addressed by our products and our customers' products, the demand for and market acceptance of our products; our ability to successfully compete in the markets in which we do business; our ability to successfully address the cost structure of our offerings; the ability to develop and implement new technologies and to obtain protection for the related intellectual property; and our ability to realize financial and strategic benefits of past and future transactions. These forward-looking statements are made only as of the date indicated, and the company disclaims any obligation to update or revise the information contained in any forward-looking statements, whether as a result of new information, future events or otherwise.

All Trademarks and other registered marks belong to their respective owners.

Copyright © 2012-2015, Imaginea Technologies, Inc. and/or its affiliates. All rights reserved.

Credits

Images under Creative Commons Zero license.
