Internet Marketing: Part 2 of 12
George A. Rubsam (@GARubsam)
Background: Websites
• 60 trillion webpages and constantly growing.
• Google = Googol (1 followed by 100 zeros).
• Build your website for users (first) and search engines (second).
- Know what you are trying to accomplish with your website (Chapter 1).
- Know the audience for which you are building the site: needs, wants, internet behaviors, etc.
Background: Market Share
• Google U.S. market share has plateaued at 67%.
• Google European market share is 90%+.
Position Matters
• #1 position = 18.2% of all clicks
• #2 position = 10.1%
• #3 position = 7.2%
• #4 position = 4.8%
• #5 position and greater = <2%
• Top ten positions = 52.32%
• Google Search
How Search Engines Work: Crawling and Indexing
How Search Engines Work
• Start with a “Spider” or “Webcrawler”
- Software that reads webpages.
  • Text
  • Meta tags/HTML code
- It creates an index of content on each page.
- Follows every link and indexes those pages too.
- It indexes entire site maps/website architecture.
- Creates a searchable index database noting the content on every readable webpage.
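The crawl-and-index steps above can be sketched in a few lines of Python. This is only an illustration, not how any real search engine is built: the `PAGES` dictionary is a made-up three-page site held in memory (a real spider fetches pages over HTTP), and `PageReader` and `crawl` are hypothetical names.

```python
from collections import deque
from html.parser import HTMLParser

# Hypothetical in-memory "website": URL path -> HTML source.
PAGES = {
    "/": '<title>Shirts</title><p>Cotton shirts</p><a href="/coats">Coats</a>',
    "/coats": '<title>Coats</title><p>Wool coats</p><a href="/">Home</a>',
    "/orphan": "<p>No page links here</p>",
}


class PageReader(HTMLParser):
    """Collects the text and the outgoing links of one page."""

    def __init__(self):
        super().__init__()
        self.words = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        self.words.extend(data.lower().split())


def crawl(start):
    """Follow links from `start` and build a word -> pages index."""
    index = {}
    seen = {start}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        reader = PageReader()
        reader.feed(PAGES[url])
        for word in reader.words:
            index.setdefault(word, set()).add(url)
        for link in reader.links:
            if link in PAGES and link not in seen:
                seen.add(link)
                queue.append(link)
    return index


index = crawl("/")
print(sorted(index["coats"]))
```

The page "/orphan" never enters the index because no crawled page links to it.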
What is an Index?
• Example: The index at the back of a book lists keywords, people, concepts, events, page numbers, etc.
• Without an index, every webpage would have to be scanned, then ranked, which could take hours or days.
- Ex: You would have to scan every page of the book to find the relevant pages.
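To make the book-index comparison concrete, here is a minimal sketch in Python. The three sample "pages" in `docs` and the function names are invented for illustration. Both approaches return the same answer, but the index is built once and then answers each lookup directly, while the scan rereads every page on every query.

```python
# Hypothetical sample pages: page number -> text.
docs = {
    1: "evening dresses and gowns",
    2: "summer shirts",
    3: "formal evening wear",
}


def build_index(docs):
    """Build the 'back of the book': word -> set of page numbers."""
    index = {}
    for page, text in docs.items():
        for word in text.split():
            index.setdefault(word, set()).add(page)
    return index


def scan(docs, word):
    """No index: read every page, every time, to find the word."""
    return {page for page, text in docs.items() if word in text.split()}


index = build_index(docs)
print(index["evening"])        # one dictionary lookup
print(scan(docs, "evening"))   # full pass over all pages
```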
Indexed Data Trade Off
• Example: Shirt company expands product line to include coats.
• Google indexes the Internet in two to four weeks.
- Relevance to “coat” keyword searches could take weeks or months.
- Exceptions: Viral videos and lots of press.
• Users save time in search, but there could be a delay in being considered relevant to certain keywords.
What Spiders See
Indexing Keywords, Files & Multimedia
• Spiders easily read text files, but are challenged by images, video, or animations (e.g., Flash or Java).
• Algorithms look at the webpage content and HTML code associated with the file.
How Search Engines Work: Providing Answers
What is a Search Engine?
• A program that
- Accepts a query and searches an index database.
- Uses algorithms to find keyword combinations in the database that correspond with the query.
- Ranks all the webpages that are relevant to the keywords, not just the ones that matched.
- Provides a list of webpages in order of relevance.
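The query steps above can be sketched as follows. The tiny `INDEX` and the scoring rule (one point per query term found on a page) are assumptions chosen to keep the example short; real engines use far richer signals.

```python
from collections import Counter

# Hypothetical index database: keyword -> pages containing it.
INDEX = {
    "evening": {"a", "b"},
    "dresses": {"a", "c"},
    "shoes": {"d"},
}


def search(query, index):
    """Accept a query, look up each term, rank pages, return an ordered list."""
    scores = Counter()
    for term in query.lower().split():
        for page in index.get(term, ()):
            scores[page] += 1  # assumed scoring: one point per matching term
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda page: (-scores[page], page))


results = search("Evening dresses", INDEX)
print(results)
```

Page "a" matches both terms and ranks first; "b" and "c" match one term each; "d" matches none and is left out.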
Types of Queries
• Do
- Accomplishment: buy a plane ticket or shoes.
• Know
- Information: names of American designers, e.g., Ralph Lauren.
• Go
- Visit Bloomingdales (specific), department stores (less specific), shoe stores (least specific).
Keyword: Fashion
• Trends
• Magazines
• News (weather.com)
• Content Aggregators
• Wikipedia
• Images + Categories
Relevance Ranking Factors
Ranked most impactful to least
• Domain-Level, Link Authority Features
- Quantity of links, trust, domain-level PageRank, etc.
• Page-Level Link Metrics
- Quality/spamminess of linking sources, etc.
• Page-Level Keyword & Content-Based Metrics
- Content relevance scoring, on-page optimization of keyword usage, topic-modeling algorithm scores on content, content quantity/quality/relevance, etc.
• Page-Level, Keyword-Agnostic Features
- Content length, readability, uniqueness, load speed, etc.
• User Usage & Traffic/Query Data
- Search Engine Results Page (SERP), engagement metrics, clickstream data, visitor traffic/usage signals, quantity/diversity/CTR of queries, both on the domain and page level.
• Domain-Level Brand Metrics
- Usage of brand/domain name relative to mentions in news/media/press, browser data of usage for specific site/page.
• Domain-Level Keyword Usage
- Exact-match keyword domains, partial-keyword matches.
• Domain-Level, Keyword-Agnostic Features
- Domain name length
• Page-Level Social Metrics
- Quantity/quality of tweeted links, Facebook shares, Google +1s, etc. to the page.
Relevance Ranking Factors
• How many links from the site to other relevant sources?
• Trustworthy server? Long server history?
• Quality of sites linking back to your content.
• Content readable, unique or “scraped”?
• Are people engaging with the content?
• Is the site/brand mentioned in credible news sources?
• Does the site have exact or partial-keyword matches?
• Is the domain relevant to the query?
• On what social media platforms and how often is your site/brand mentioned?
Ranking Logic: Keyword Hits
Hit Type       Weight*
URL            100
Title Tag      95
Anchor Text    90
Text large     60
Text medium    30
Text small     10
*For context only and not accurate.
• Notice how it knows that “evening dresses” relates to gowns, cocktail dresses, formal dresses, etc.
• Google uses 200+ considerations to rank content.
Ranking Logic: Keyword Hits
Hit Type              Type Weight   No. of Hits   Weight x Hits
URL                   100           1             100
Title Tag             95            1             95
Anchor Text (links)   90            5             450
Text large font       60            1             60
Text medium font      30            3             90
Text small font       10            20            200
Relevance Score                                   995
For context only and not accurate.
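The arithmetic in the table can be reproduced with a short Python sketch. The weights and hit counts are copied from the slide, which itself labels them as illustrative rather than accurate; the dictionary keys and the `relevance_score` function are hypothetical names.

```python
# Illustrative weights from the slide (not real search-engine values).
WEIGHTS = {
    "url": 100,
    "title_tag": 95,
    "anchor_text": 90,
    "text_large": 60,
    "text_medium": 30,
    "text_small": 10,
}

# Number of keyword hits of each type found on the example page.
HITS = {
    "url": 1,
    "title_tag": 1,
    "anchor_text": 5,
    "text_large": 1,
    "text_medium": 3,
    "text_small": 20,
}


def relevance_score(hits, weights):
    """Sum of (weight x number of hits) across all hit types."""
    return sum(weights[hit_type] * count for hit_type, count in hits.items())


score = relevance_score(HITS, WEIGHTS)
print(score)  # 995
```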
Ranking Logic: Keywords
• Number of times a word appears on a page.
• Where on the page terms occur (distribution)
- URL, header, headlines, body copy, footer
• The main theme and topics (on-topic issues) of the page.
- Ray Ban Sunglasses referenced on TMZ vs. PopSugar.
• Relative distance between keywords (proximity)
- Evening dresses … … … … evening dresses
- Evening dresses … evening dresses (most relevant)
- Evening dresses, evening dresses, evening dresses, etc.
• The frequency between individual terms (occurrence)
- Evening dresses … evening … dresses … dress … related words (dinner, dancing, gowns, party planning, etc.)
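Two of the signals above, how often a word appears and how close two keywords sit to each other, can be measured with a small Python sketch. The sample sentences and function names are invented, and counting distance in words is an assumption; the slides do not say how search engines actually measure proximity.

```python
def term_frequency(words, term):
    """Number of times `term` appears in the list of words."""
    return words.count(term)


def min_distance(words, first, second):
    """Smallest gap, in words, between two different keywords (None if either is missing)."""
    best = None
    last_first = last_second = None
    for position, word in enumerate(words):
        if word == first:
            last_first = position
        if word == second:
            last_second = position
        if last_first is not None and last_second is not None:
            gap = abs(last_first - last_second)
            if best is None or gap < best:
                best = gap
    return best


close = "evening dresses for summer evening parties and formal dresses".split()
far = "evening gowns and long formal dresses".split()

print(term_frequency(close, "evening"))           # 2
print(min_distance(close, "evening", "dresses"))  # 1 (adjacent words)
print(min_distance(far, "evening", "dresses"))    # 5
```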
Search Engine Limitations
• Search engines struggle with completing login/online forms.
- CAPTCHA protects forms from bots.
• “Robots.txt” code purposely blocking search engines.
• “Nofollow links” code blocks search engines from following links.
• Minimally-exposed content may be deemed unimportant by the engine's index.
• Uncommon terms not normally used in search.
- Ex: “food cooling units” vs. “refrigerators”
• International languages
- Color vs. Colour
- Content in French when the majority of the visitors are from Japan.
• Mixed context signals
- Blog post reads “Mexico's Best Coffee” but content is about a Canadian vacation resort that serves great coffee.
Broken Link Structures
• Spiders can reach page A and see links to pages B and E.
• Without a link to them, pages C and D are not accessible.
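The A-to-E example can be checked with a short reachability sketch in Python. The `LINKS` map is an assumption that matches the slide: page A links to B and E, and nothing the spider can reach links to C or D.

```python
from collections import deque

# Hypothetical link structure: page -> pages it links to.
LINKS = {
    "A": ["B", "E"],
    "B": [],
    "C": ["D"],
    "D": [],
    "E": [],
}


def reachable(start, links):
    """Pages a spider can find by following links from `start`."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


found = reachable("A", LINKS)
missed = set(LINKS) - found
print(sorted(found))   # ['A', 'B', 'E']
print(sorted(missed))  # ['C', 'D']
```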
Limiting Search Engine Access
• The robots exclusion protocol (REP), or robots.txt, is a text file webmasters create to instruct robots (search engine software) how to crawl and index pages on their website.
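As a rough illustration of how a well-behaved crawler reads these instructions, the sketch below uses Python's standard `urllib.robotparser` module. The robots.txt contents, the bot name "ExampleBot", and the example.com URLs are all made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every robot is asked to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A polite crawler asks before fetching each URL.
allowed = parser.can_fetch("ExampleBot", "https://example.com/products/coats")
blocked = parser.can_fetch("ExampleBot", "https://example.com/private/report")

print(allowed)  # True
print(blocked)  # False
```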
Addendum
Resources
• Video: How Search Works
• Interactive Page: How Search Works
• Internet Archive: Wayback Machine
Anatomy of a Link
• "<a" tag indicates the start of a link.
• The link referral location tells the browser (and the search engines) where the link points.
• Next, the visible portion of the link for visitors, called anchor text in the SEO world, describes the page the link points to.
• "</a>" tag closes the link to constrain the linked text between the tags and prevent the link from encompassing other elements on the page.
• The crawlers can read this basic link and use it to calculate query-independent variables, and follow it to index the contents of the referenced page.
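The parts of a link described above can be pulled apart with Python's standard `html.parser` module. This is a sketch only: the sample HTML and example.com URLs are invented, `LinkParser` is a hypothetical name, and the `nofollow` check reflects the “Nofollow links” item from the Search Engine Limitations slide.

```python
from html.parser import HTMLParser


class LinkParser(HTMLParser):
    """Records each link's target (href), anchor text, and nofollow flag."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._current = None  # the link being read, if we are inside <a>...</a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            attributes = dict(attrs)
            rel = (attributes.get("rel") or "").lower().split()
            self._current = {
                "href": attributes.get("href"),
                "text": "",
                "nofollow": "nofollow" in rel,
            }

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data  # visible anchor text

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self.links.append(self._current)
            self._current = None


parser = LinkParser()
parser.feed(
    '<p>See <a href="https://example.com/dresses">evening dresses</a> and '
    '<a href="https://example.com/ads" rel="nofollow">sponsored picks</a>.</p>'
)
links = parser.links
for link in links:
    print(link["href"], "|", link["text"], "|", link["nofollow"])
```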
Ranking Factors Survey
Questions
Part 3 of 12 is up next.