5 Algorithms Every Web Developer Can Use and Understand

Sheldon Kreger, Web Engineer

Make state-of-the-art algorithms accessible and discoverable by everyone.

Sample algorithms

● Text Analysis: summarizer, sentence tagger, profanity detection

● Machine Learning: digit recognizer, recommendation engines

● Web: crawler, scraper, PageRank, emailer, HTML to text

● Computer Vision: image similarity, face detection, smile detection

● Audio & Video: speech recognition, sound filters, file conversions

● Computation: linear regression, spike detection, Fourier filter

● Graph: traveling salesman, maze generator, Theta*

● Utilities: parallel for-each, geographic distance, email validator

A marketplace for algorithms...

We host algorithms: anyone can turn their algorithms into scalable web services. Typical users: scientists, academics, domain experts.

We make them discoverable: anyone can use and integrate these algorithms into their solutions. Typical users: businesses, data scientists, app developers, IoT makers.

We make them monetizable: users pay for the algorithms they use. Typical scenarios: heavy-load use cases with a large user base.

Why?

Create something bigger: easily combine algorithms like building blocks, regardless of language.

Growing catalogue of algorithms: new algorithms every day, made usable by software developers.

Make applications smarter: smarter algorithms = cooler toys.

The Five Algorithms

- Sentiment Analysis
- Language Detection
- PageRank
- Nudity Detection
- Term Frequency-Inverse Document Frequency (TF-IDF)

Sentiment Analysis - Practical Applications

Businesses frequently seek feedback from consumers on the quality of their products, and large volumes of reviews take too much time to read manually.

Sentiment data can also be used in various forecasting applications, such as predicting political elections.

Sentiment Analysis - The Math

Basic sentiment analysis uses natural language processing (NLP), via a “bag of words” model, to spot keywords that signal strong emotions. Once those keywords are spotted, the document is classified as positive, negative, or neutral.

Statements can be mixed in nature, such as “I loved the food, BUT hated the service”. Separating the two opinions requires more advanced algorithms.
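A minimal bag-of-words sketch in Python, assuming a tiny hypothetical lexicon (`LEXICON`) of scored keywords; production sentiment analyzers use far larger lexicons or trained models. It also shows the weakness of the naive approach: the mixed review scores as neutral because its positive and negative keywords cancel out.

```python
import re

# Hypothetical, tiny sentiment lexicon: +1 for positive words, -1 for negative ones.
LEXICON = {
    "love": 1, "loved": 1, "great": 1, "excellent": 1, "good": 1,
    "hate": -1, "hated": -1, "terrible": -1, "awful": -1, "bad": -1,
}


def sentiment(text: str) -> str:
    """Classify text as positive, negative, or neutral by summing keyword scores."""
    words = re.findall(r"[a-z']+", text.lower())  # bag of words: word order is ignored
    score = sum(LEXICON.get(word, 0) for word in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(sentiment("The food was great and the staff were excellent"))  # positive
print(sentiment("I loved the food, BUT hated the service"))  # neutral: +1 and -1 cancel
```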

PageRank - Practical Applications

The assumption is that the more inbound links a page has from across the web, the more important its content.

The most famous application of PageRank is the Google search engine, whose initial success was based largely on PageRank.

PageRank is not limited to web pages: any data that can be modeled as a directed graph can be ranked with it.

PageRank - Graph Terminology

Node (vertex): an item in the graph.

Edge: a relationship (connection) between two nodes.

Directionality: a property of an edge indicating the direction of the relationship (for example, page A links to page B, but not necessarily the reverse).

PageRank - The Math
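For reference, a commonly cited simplified form of PageRank, with damping factor $d$ (typically 0.85), $N$ pages, $M(p_i)$ the set of pages linking to $p_i$, and $L(p_j)$ the number of outbound links on page $p_j$, is:

$$PR(p_i) = \frac{1 - d}{N} + d \sum_{p_j \in M(p_i)} \frac{PR(p_j)}{L(p_j)}$$

The Python sketch below computes it by power iteration on a small hypothetical link graph (`links`). Pages without outbound links spread their rank evenly over all pages, which is one common way of handling such "dangling" nodes; it is an illustration, not Google's or Algorithmia's implementation.

```python
def pagerank(graph: dict[str, list[str]], d: float = 0.85, iterations: int = 50) -> dict[str, float]:
    """Compute PageRank by power iteration; graph maps each page to the pages it links to."""
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}  # start from a uniform distribution
    for _ in range(iterations):
        new_rank = {node: (1 - d) / n for node in nodes}  # the (1 - d) / N term
        for node in nodes:
            targets = graph.get(node, [])
            if targets:
                # Each outbound link passes on an equal share: PR(p_j) / L(p_j).
                share = d * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page (no outbound links): spread its rank over all pages.
                for target in nodes:
                    new_rank[target] += d * rank[node] / n
        rank = new_rank
    return rank


# Hypothetical link graph: A links to B and C, B links to C, and so on.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

C ranks highest because three pages link to it, while D, which no page links to, keeps only the baseline $(1 - d)/N$.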

Nudity Detection - Practical Applications

Nudity detection algorithms reduce the need for manual moderation and deletion of inappropriate content.

In a CMS, this algorithm can help prevent users from uploading pornography.

Nudity Detection - The Math

1. Detect skin-colored pixels in the image.

2. Locate skin regions based on the detected pixels.

3. Detect faces in the image.

4. Calculate the ratio of skin-toned to non-skin-toned pixels in the image, taking into account the size of the face.

5. Classify the image as nude or not.

More information at: https://algorithmia.com/algorithms/sfw/NudityDetection
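Steps 1 and 4 can be sketched in Python as below. The skin test is one well-known RGB rule of thumb for skin under daylight, not necessarily what the Algorithmia algorithm uses. The face adjustment and the 0.4 threshold are hypothetical choices for illustration, and face detection (step 3) is assumed to happen elsewhere and report how many pixels the face covers.

```python
Pixel = tuple[int, int, int]  # (red, green, blue), each 0-255


def is_skin(pixel: Pixel) -> bool:
    """A well-known RGB skin-colour rule of thumb (uniform daylight illumination)."""
    r, g, b = pixel
    return (
        r > 95 and g > 40 and b > 20
        and max(r, g, b) - min(r, g, b) > 15
        and abs(r - g) > 15
        and r > g and r > b
    )


def skin_ratio(pixels: list[Pixel], face_pixels: int = 0) -> float:
    """Fraction of skin-coloured pixels, ignoring pixels that belong to a detected face."""
    skin = sum(1 for p in pixels if is_skin(p))
    skin = max(skin - face_pixels, 0)  # assumes the face pixels were all counted as skin
    total = max(len(pixels) - face_pixels, 1)  # avoid dividing by zero
    return skin / total


def looks_nude(pixels: list[Pixel], face_pixels: int = 0, threshold: float = 0.4) -> bool:
    """Hypothetical classifier: flag the image when the non-face skin ratio exceeds the threshold."""
    return skin_ratio(pixels, face_pixels) > threshold


# Toy "image": 6 skin-toned pixels and 4 blue pixels.
image = [(220, 170, 140)] * 6 + [(30, 120, 200)] * 4
print(looks_nude(image))                 # True: 6/10 of the pixels are skin
print(looks_nude(image, face_pixels=4))  # False: 2/6 once the face is excluded
```

Subtracting the face explains why step 4 accounts for face size: a close-up portrait is mostly skin without being nude.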

Nudity Detection - Vanilla PHP

Nudity Detection PHP

TF/IDF - Practical Applications

- Keyword extraction is used in search engines and content categorization algorithms.

- Creates great content recommendations!

- https://drupal.org/project/algorithmia

- https://wordpress.org/plugins/algorithmia

- https://algorithmia.com/recommends

TF/IDF - The Math

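TF-IDF computes a weight that reflects how important a term is to a document, by comparing how often the term is used in that document with how many documents in the whole set contain it.

The more often a term appears in a document, the higher its weight; the more documents in the set contain the term, the lower its weight.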

Thanks to toothpastefordinner.com for the comic.

TF/IDF - The Math

Assume you have a 100-word blog post that contains the word "JavaScript" 5 times.

Term Frequency = 5/100 = 0.05

Also assume your entire collection of blog posts has 10,000 documents, and the word "JavaScript" appears at least once in 100 of them.

Inverse Document Frequency = log₁₀(10,000/100) = log₁₀(100) = 2

For this document, this gives us the score:

TF-IDF = 0.05 * 2 = 0.1
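The same arithmetic in Python, as a minimal sketch that uses a base-10 logarithm to match the numbers above (other logarithm bases and smoothing variants are also common). `tf_idf_from_counts` reproduces the worked example, `tf_idf` computes the score from raw text, and `top_keywords` ranks a document's words by TF-IDF as a simple form of keyword extraction; the small corpus is hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tf_idf_from_counts(term_count: int, doc_length: int, n_docs: int, docs_with_term: int) -> float:
    tf = term_count / doc_length               # Term Frequency
    idf = math.log10(n_docs / docs_with_term)  # Inverse Document Frequency
    return tf * idf


def tf_idf(term: str, doc: str, corpus: list[str]) -> float:
    """TF-IDF of `term` in `doc`, with document frequencies taken from `corpus`."""
    term = term.lower()
    words = tokenize(doc)
    docs_with_term = sum(1 for d in corpus if term in tokenize(d))
    if not words or docs_with_term == 0:
        return 0.0
    return tf_idf_from_counts(Counter(words)[term], len(words), len(corpus), docs_with_term)


def top_keywords(doc: str, corpus: list[str], k: int = 3) -> list[str]:
    """Return the k highest-scoring distinct words of `doc` (ties broken alphabetically)."""
    scores = {w: tf_idf(w, doc, corpus) for w in set(tokenize(doc))}
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


print(tf_idf_from_counts(5, 100, 10_000, 100))  # 0.1, as in the worked example

corpus = [
    "the web and the browser",
    "javascript for the web",
    "pasta pasta recipes for the home cook",
    "php for the server",
]
print(top_keywords(corpus[2], corpus))  # ['pasta', 'cook', 'home']
```

"the" appears in every document, so its IDF is log₁₀(4/4) = 0 and it never surfaces as a keyword, while the repeated, rare word "pasta" scores highest.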

Language Detection - Practical Applications

- Web search, since search engines return sites in dozens of languages.

- Often required in conjunction with other Natural Language Processing (NLP) algorithms: data sets may include documents in other languages, and some algorithms only work in the language of their training data.

- Spam filtering services, so they can properly filter out messages in specific languages or from specific areas of origin.

Language Detection - The Math

Each language has a corpus at its core: a characteristic pattern of components that identifies it.

Profiling algorithms use this corpus to build a core set of words (a profile) that identifies the language.

The problem is that not all text is long enough to contain enough of those words to identify its language.

Instead, the 3-gram algorithm looks at groups of 3 letters rather than whole words, which copes better with short text; through the Algorithmia API, detection takes a single HTTP request.
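A minimal Python sketch of trigram-based detection, assuming tiny hypothetical training samples (`SAMPLES`); real language profiles are built from large corpora, and this is not the Algorithmia implementation. Each text is reduced to counts of overlapping 3-letter groups, and the input is assigned to the language whose trigram profile is most similar by cosine similarity (one of several possible similarity measures).

```python
import math
from collections import Counter


def trigrams(text: str) -> Counter:
    """Count overlapping 3-letter groups; padding with spaces lets word boundaries count too."""
    padded = " " + " ".join(text.lower().split()) + " "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two trigram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical, tiny training samples; real profiles come from large corpora.
SAMPLES = {
    "english": "the quick brown fox jumps over the lazy dog and then the dog sleeps in the sun",
    "spanish": "el rápido zorro marrón salta sobre el perro perezoso y luego el perro duerme en el sol",
    "german": "der schnelle braune fuchs springt über den faulen hund und dann schläft der hund in der sonne",
}
PROFILES = {lang: trigrams(sample) for lang, sample in SAMPLES.items()}


def detect_language(text: str) -> str:
    """Return the language whose trigram profile is most similar to the text."""
    grams = trigrams(text)
    return max(PROFILES, key=lambda lang: cosine(grams, PROFILES[lang]))


print(detect_language("the dog and the fox"))     # english
print(detect_language("el perro y el sol"))       # spanish
print(detect_language("der hund und die sonne"))  # german
```

Because the comparison works on letter groups rather than whole words, even a five-word input shares enough trigrams (such as "der" or " el") with the right profile to be classified.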

Algorithmia Credits

Sign up at https://algorithmia.com

Use code: FiveAlgorithmsBook

10,000 additional free API credits.