5 Algorithms Every Web Developer Can Use and Understand
Uploaded by matt-kiser
Sheldon Kreger, Web Engineer
Five Algorithms Every Web Developer Can Use and Understand
Make state-of-the-art algorithms
accessible and discoverable by
everyone.
Sample algorithms
● Text Analysis: summarizer, sentence tagger, profanity detection
● Machine Learning: digit recognizer, recommendation engines
● Web: crawler, scraper, pagerank, emailer, html to text
● Computer Vision: image similarity, face detection, smile detection
● Audio & Video: speech recognition, sound filters, file conversions
● Computation: linear regression, spike detection, fourier filter
● Graph: traveling salesman, maze generator, theta star
● Utilities: parallel for-each, geographic distance, email validator
A marketplace for algorithms...
We host algorithms
Anyone can turn their algorithms into scalable web services.
Typical users: scientists, academics, domain experts
We make them discoverable
Anyone can use and integrate these algorithms into their solutions.
Typical users: businesses, data scientists, app developers, IoT makers
We make them monetizable
Users of algorithms pay for the algorithms they use.
Typical scenarios: heavy-load use cases with a large user base
+ CLIENTS
Why?
Create something bigger
Easily combine algorithms like building blocks, regardless of language.
Growing Catalogue of Algorithms
New algorithms every day, made usable by software developers.
Make applications smarter
Smarter algorithms = cooler toys
The Five Algorithms
- Sentiment Analysis
- Language Detection
- PageRank
- Nudity Detection
- Term Frequency-Inverse Document Frequency
Sentiment Analysis - Practical Applications
Businesses frequently seek feedback
from consumers on the quality of
their products, and a large volume
of reviews takes too long to read
manually.
Sentiment data can also be used in
various forecasting applications,
such as political elections.
Sentiment Analysis - The Math
Basic sentiment analysis uses natural language
processing (NLP), via a “bag of words”, to
spot keywords that signal strong emotion.
Once spotted, these keywords are used to
classify a document as positive, negative, or
neutral.
Statements can be dual in nature, such as “I
loved the food, BUT hated the service”. This
requires more advanced algorithms to
separate the two.
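A minimal bag-of-words sketch of the idea above. The word lists, the `classify` and `classify_clauses` names, and the split on "but" are illustrative assumptions, not the algorithm hosted on Algorithmia.

```python
import re

# Tiny hypothetical keyword lists; a real system would use a much larger lexicon.
POSITIVE = {"love", "loved", "great", "good", "excellent", "amazing"}
NEGATIVE = {"hate", "hated", "bad", "terrible", "awful", "poor"}


def classify(text):
    """Label text by counting positive and negative keywords (bag of words)."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def classify_clauses(text):
    """Naive handling of dual statements: score each side of 'but' separately."""
    clauses = re.split(r"\bbut\b", text, flags=re.IGNORECASE)
    return [classify(clause) for clause in clauses]


review = "I loved the food, BUT hated the service"
print(classify(review))          # the two keywords cancel out
print(classify_clauses(review))  # one label per clause
```

Scored as a whole, the sample review comes out neutral because one positive and one negative keyword cancel; scoring each clause separately recovers both opinions.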
PageRank - Practical Applications
The assumption is that the more inbound links
a page has from across the web, the more
authoritative its content is likely to be.
The most famous application of PageRank is the
Google search engine, whose initial success was
based largely on PageRank.
PageRank is not limited to web pages. Any
data that can be modeled as a directed graph
can be ranked with PageRank.
PageRank - Graph Terminology
Node (vertex): Item in graph.
Edge: Relationship between two nodes.
Directionality: Property of an edge indicating
which way the relationship points (for example,
page A links to page B).
PageRank - The Math
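A common way to compute PageRank is power iteration over the link graph. The sketch below assumes the widely used damping factor of 0.85 and spreads the rank of pages with no outbound links evenly over all pages; the `pagerank` function and the sample graph are illustrative, not taken from the slides.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict mapping node -> list of linked nodes."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}  # start with equal rank everywhere

    for _ in range(iterations):
        # Every node receives a small baseline share of rank.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                # Split this node's rank evenly across its outbound edges.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node (no outbound edges): spread its rank over all nodes.
                share = damping * rank[node] / n
                for other in nodes:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical graph: an edge a -> b means "page a links to page b".
graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(graph)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this sample graph, page c has the most inbound links and ends up with the highest rank, while page d has none and ends up with the lowest.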
Nudity Detection - Practical Applications
Nudity detection algorithms minimize the need for manual moderation and deletion
of malicious content.
In a CMS, this algorithm can help prevent pornography from being uploaded by
users.
Nudity Detection - The Math
1. Detect skin-colored pixels in the image.
2. Locate skin regions based on the detected pixels.
3. Detect face in image.
4. Calculate ratio of skin toned vs non-skin toned pixels in image, taking into
account the size of the face.
5. Classify the image as nude or not.
More information at: https://algorithmia.com/algorithms/sfw/NudityDetection
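Steps 1, 4, and 5 above can be sketched in a few lines. This is a simplified illustration: it skips skin regions and face detection entirely, and the RGB skin rule and the 0.3 threshold are assumed values chosen for the example, not the parameters of the hosted algorithm.

```python
def is_skin(r, g, b):
    """Rough RGB rule of thumb for skin tones (an assumption for this sketch)."""
    return (
        r > 95 and g > 40 and b > 20
        and max(r, g, b) - min(r, g, b) > 15
        and abs(r - g) > 15
        and r > g and r > b
    )


def skin_ratio(pixels):
    """Fraction of pixels classified as skin; pixels is an iterable of (r, g, b)."""
    pixels = list(pixels)
    if not pixels:
        return 0.0
    return sum(is_skin(r, g, b) for r, g, b in pixels) / len(pixels)


def looks_nude(pixels, threshold=0.3):
    """Flag the image when the skin ratio exceeds a hypothetical threshold."""
    return skin_ratio(pixels) > threshold


# Synthetic "images" as flat pixel lists, so no image library is needed.
skin, blue = (220, 170, 140), (30, 60, 200)
print(looks_nude([skin] * 6 + [blue] * 4))  # mostly skin-toned pixels
print(looks_nude([skin] * 1 + [blue] * 9))  # mostly non-skin pixels
```

A ratio-only check like this misclassifies close-up portraits, which is why the full algorithm takes the size of the detected face into account.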
Nudity Detection - Vanilla PHP
Nudity Detection PHP
TF/IDF - Practical Applications
- Keyword extraction is used in search engines and content
categorization algorithms.
- Creates great content recommendations!
- https://drupal.org/project/algorithmia
- https://wordpress.org/plugins/algorithmia
- https://algorithmia.com/recommends
TF/IDF - The Math
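TF-IDF computes a weight that reflects how
important a term is to a document, by
comparing its frequency in that document with
how common it is across the document set.
The more often a term appears in the document,
the higher its weight; the more documents it
appears in, the lower its weight.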
Thanks toothpastefordinner.com for the comic.
TF/IDF - The Math
Assume you have a 100-word blog post with the word "JavaScript" in it 5 times.
Term Frequency = 5/100 = 0.05
Also assume your entire collection of blog posts has 10,000 documents, and the
word "JavaScript" appears at least once in 100 of these.
Inverse Document Frequency = log10(10,000/100) = 2
For this document, this gives us the score:
TF-IDF = 0.05 * 2 = 0.1
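The arithmetic above can be reproduced with a short sketch. The function names are illustrative, and the base-10 logarithm is assumed because it matches the result of 2 in the example.

```python
import math


def term_frequency(term_count, total_words):
    """Share of the document's words that are the term."""
    return term_count / total_words


def inverse_document_frequency(total_docs, docs_with_term):
    """Base-10 log of how rare the term is across the collection."""
    return math.log10(total_docs / docs_with_term)


def tf_idf(term_count, total_words, total_docs, docs_with_term):
    """Weight of the term for one document."""
    return term_frequency(term_count, total_words) * inverse_document_frequency(
        total_docs, docs_with_term
    )


# Numbers from the example: 5 uses in a 100-word post,
# and 100 of 10,000 posts mention the term.
print(tf_idf(5, 100, 10_000, 100))
```

If the term appeared in every one of the 10,000 posts, the inverse document frequency would be log10(1) = 0 and the score would drop to 0, however often the post used it.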
Language Detection - Practical Applications
Applications
- Web searching, as engines bring up sites in dozens of languages.
- May be required in conjunction with other Natural Language Processing (NLP)
algorithms. Data sets may include documents in several languages, and some
algorithms only work in the language of their training data.
- Spam filtering services, so they can properly filter out specific languages and areas
of origin.
Language Detection - The Math
Each language has a corpus at its core, a central pattern of components that uniquely
identifies it.
Profiling algorithms are used to build a core set of words that identifies each language.
The problem is that not all text is long enough to contain those words.
Instead, the 3-gram algorithm, available through the Algorithmia API with one HTTP
request, detects the language by looking at groups of 3 letters.
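A minimal local sketch of 3-gram detection, independent of the Algorithmia API. The two tiny training samples, the cosine similarity scoring, and the function names are assumptions made for illustration; a real detector would build its profiles from large corpora in many languages.

```python
import math
from collections import Counter


def trigrams(text):
    """Count overlapping groups of 3 characters, with padding at the edges."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    """Cosine similarity between two trigram count profiles."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


# Hypothetical training samples, far too small for real use.
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and then the dog runs away from the fox",
    "es": "el rápido zorro marrón salta sobre el perro perezoso y luego el perro huye del zorro",
}
PROFILES = {lang: trigrams(sample) for lang, sample in SAMPLES.items()}


def detect(text, profiles=PROFILES):
    """Return the language whose trigram profile is most similar to the text."""
    grams = trigrams(text)
    return max(profiles, key=lambda lang: cosine(grams, profiles[lang]))


print(detect("the dog and the fox"))
print(detect("el perro y el zorro"))
```

Because the profile is built from letter groups rather than whole words, even a short phrase yields enough trigrams to compare against each language.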
Algorithmia Credits
Sign up at https://algorithmia.com
Use code: FiveAlgorithmsBook
10,000k additional free API credits.