Optimal SEO (Marianne Sweeny)


Given at UXPA-DC's User Focus Conference, Oct. 19, 2012

Transcript of Optimal SEO (Marianne Sweeny)

Page 1: Optimal SEO (Marianne Sweeny)

Some time ago, we fell asleep at the switch. Search engines are now “evaluating the merit” of our content and are not entirely clear about the criteria that they are using.


Page 2: Optimal SEO (Marianne Sweeny)

This presentation is about Google’s latest updates, Panda and Penguin, and how they impact the content that is retained by the search engines and presented in search results. We will look at:

1. What has happened with search engine technology over the years and what it is today.
2. Why we should care: how search engine technology impacts what we do, and how what we do can impact the performance of search engines.
3. What we can do about it.

Page 3: Optimal SEO (Marianne Sweeny)

Search engines came first. They have been around for over 70 years, since their early days as “information retrieval,” when text began to be electronically transformed in the late 1940s. However, information organization and retrieval goes back even further than that…

Page 4: Optimal SEO (Marianne Sweeny)

An argument could be made that “search engine” optimization came first, in the early days when great care was taken to present information in a “findable” fashion: a designated few made information available in a limited format to the limited few who would consume it and make it available to the masses. People optimized text for people.

Page 5: Optimal SEO (Marianne Sweeny)

Then came the beautiful places where information was organized in a standardized way so that people could find it, and helpful people to ask if we got lost. Early search engines used traditional information retrieval concepts and structured content repositories that were mediated by human-generated metadata. In Dialog and ProQuest, where SQL queries ruled, thought-processing bipeds associated tags, categories and abstracts with each content item. Database methods of linear query construction delivered the most success.

Page 6: Optimal SEO (Marianne Sweeny)

The first web page can still be found here: http://www.w3.org/History/19921103-hypertext/hypertext/WWW/TheProject.html

Then came the World Wide Web, altruistically developed by Tim Berners-Lee so that the military, industrial and scientific complexes could communicate with each other, be on the same page and save money in the long-distance exchange of information. This worked well until the medium was made available to the rest of us. The result…

Page 7: Optimal SEO (Marianne Sweeny)

Then came limitless growth, questionable quality and zero governance, with no end in sight:
• 1997: 15 million pages
• 2010: Google announces its 100 billion+ page index
• 2012: rumored 1 trillion URLs found

Page 8: Optimal SEO (Marianne Sweeny)

© Tefko Saracevic

Source: Saracevic 1997, Information Today

One thing that did not change was information retrieval (IR). Despite the technology advancements, the IR process remained the same.

Page 9: Optimal SEO (Marianne Sweeny)

Slide from LIS 544 / IMT 542 / INSC 544 by Jeff Huang and Shawn Walker

1. Documents were selected from the index based on the presence of query terms in the document text.
2. Documents containing more of the term(s) scored higher.
3. Longer documents were discounted.
4. Rare terms were weighted higher.
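The four rules above are close to what is usually called TF-IDF scoring. The sketch below is one minimal way to express them, not the formula any particular search engine used. The function names, the whitespace tokenizer and the exact weighting (`log(N / df) + 1`) are assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def rank_documents(query, docs):
    """Return (doc_index, score) pairs, best first, for docs matching the query."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    results = []
    for index, tokens in enumerate(tokenized):
        counts = Counter(tokens)
        score = 0.0
        for term in set(tokenize(query)):
            if counts[term] == 0:
                continue  # rule 1: only documents containing query terms can score
            # rules 2 and 3: more occurrences score higher, longer documents are discounted
            tf = counts[term] / len(tokens)
            # rule 4: terms found in fewer documents get a higher weight
            idf = math.log(n_docs / doc_freq[term]) + 1.0
            score += tf * idf
        if score > 0:
            results.append((index, score))
    return sorted(results, key=lambda pair: pair[1], reverse=True)


docs = [
    "panda panda update",
    "panda update and many other unrelated words here",
    "penguin links",
]
for doc_index, score in rank_documents("panda", docs):
    print(doc_index, round(score, 3))
```

Note that nothing in this scoring looks at who wrote the document, how it is laid out or how people behave after they click; it only counts words.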


Page 10: Optimal SEO (Marianne Sweeny)

The environment, devices, participants and content have changed. What does that mean for IR? For search engines?

Page 11: Optimal SEO (Marianne Sweeny)

IR’s locked-in legacies are centered on:
• text deconstruction
• the capacity for sequential instructions to derive meaning
• its reliance on systems that do not scale well and, while incorporating human behavior, do not fully understand it

Search engines today believe that it is perfectly natural for them to abstract the whole based on the nature of a small subset = “digital Maoism”

Page 12: Optimal SEO (Marianne Sweeny)

Using Google’s Latent Semantic Indexing, a machine-learning technique that manually maps relationships, a search for ~vacation turns up results for: hotels, rentals, travel, tourism, resorts… Machines know only what they are trained to know. Rules are based on an analysis of a subset and applied to the content corpus writ large. Machines have no sense of accountability when things go bad.


Page 13: Optimal SEO (Marianne Sweeny)

PageRank: a Stanford research project that was once greeted as a savior due to its simplicity and seeming incorruptibility. Both creators were PhD students in data mining. It was standard IR with the introduction of two human elements:

1. Random Surfer model
• At any time t, the surfer is on some page P
• At time t+1, the surfer follows an outlink from P uniformly at random
• The surfer ends up on some page Q (linked from page P)
• The process repeats indefinitely

2. Link = vote

Unfortunately, flaws in this system were soon revealed:
1. Those who were able to build links dictated relevance for the rest
2. The cottage industry of SEO started building links for reasons other than endorsing the merits of site content
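The random surfer and “link = vote” ideas can be sketched in a few lines of code. This is an illustration under stated assumptions, not Google’s implementation: the function name and the toy link graph are invented for this example, and the damping factor of 0.85 (the chance that the surfer follows a link rather than jumping to a random page) is the value commonly quoted for PageRank, not something stated on the slide.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate random-surfer scores for a dict of page -> list of outlinked pages."""
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # the surfer starts anywhere with equal chance

    for _ in range(iterations):
        # Assumed teleport step: with probability (1 - damping) the surfer jumps at random
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            outs = links.get(p, [])
            if outs:
                # Each outlink is a "vote": P splits its score evenly among its links
                share = damping * rank[p] / len(outs)
                for q in outs:
                    new_rank[q] += share
            else:
                # A page with no outlinks spreads its score over every page
                share = damping * rank[p] / n
                for q in pages:
                    new_rank[q] += share
        rank = new_rank
    return rank


# Hypothetical three-page web: A and B both link to C, and C links back to A
toy_web = {"A": ["C"], "B": ["C"], "C": ["A"]}
for page, score in sorted(pagerank(toy_web).items()):
    print(page, round(score, 3))
```

In the toy graph, C outranks A and A outranks B purely because of who links to whom. That is the weakness the slide describes: anyone able to manufacture links can manufacture votes.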


Page 14: Optimal SEO (Marianne Sweeny)

Google goes public around this time and the cash infusion enables expansion:
• Starts acquiring top computer scientists
• Purchases technology (Kaltix: personalized search, context-sensitive search)

This is the first step away from the PageRank model, though not entirely, as PageRank is part of Google’s locked-in technology foundation.

And the response from us thought-processing bipeds?

Page 15: Optimal SEO (Marianne Sweeny)

We’re constructing worse queries but feel that we’re getting better results. Which canary in what coal mine just died?


Page 16: Optimal SEO (Marianne Sweeny)

Sources:
• Using the Internet: Skill Related Problems in User Online Behavior; van Deursen & van Dijk; 2009
• Pew Internet Trust Study of search engine behavior: http://www.pewinternet.org/Reports/2012/Search-Engine-Use-2012/Summary-of-findings.aspx

In January 2002, 52% of all Americans used search engines. In February 2012 that figure grew to 73% of all Americans. On any given day in early 2012, more than half of adults using the internet use a search engine (59%). That is double the 30% of internet users who were using search engines on a typical day in 2004. And people’s frequency of using search engines has jumped dramatically.

Moreover, users report generally good outcomes and relatively high confidence in the capabilities of search engines:
• 91% of search engine users say they always or most of the time find the information they are seeking when they use search engines
• 73% of search engine users say that most or all the information they find as they use search engines is accurate and trustworthy
• 66% of search engine users say search engines are a fair and unbiased source of information
• 55% of search engine users say that, in their experience, the quality of search results is getting better over time, while just 4% say it has gotten worse
• 52% of search engine users say search engine results have gotten more relevant and useful over time, while just 7% report that results have gotten less relevant

And Google’s response…

Page 17: Optimal SEO (Marianne Sweeny)

Location on the page = good quality content

“The goal of many of our ranking changes is to help searchers find sites that provide a great user experience and fulfill their information needs. We also want the “good guys” making great sites for users, not just algorithms, to see their effort rewarded. To that end we’ve launched Panda changes that successfully returned higher-quality sites in search results. And earlier this year we launched a page layout algorithm that reduces rankings for sites that don’t make much content available “above the fold.”” Matt Cutts, http://googlewebmastercentral.blogspot.com/2012/04/another-step-to-reward-high-quality.html

UX run Amok: if not enough content appears above the fold, the page will be seen as less relevant? How many are dictating this for the rest of us? Where did they get this from?

“As we’ve mentioned previously, we’ve heard complaints from users that if they click on a result and it’s difficult to find the actual content, they aren’t happy with the experience. Rather than scrolling down the page past a slew of ads, users want to see content right away. So sites that don’t have much content “above-the-fold” can be affected by this change. If you click on a website and the part of the website you see first either doesn’t have a lot of visible content above-the-fold or dedicates a large fraction of the site’s initial screen real estate to ads, that’s not a very good user experience. Such sites may not rank as highly going forward.” http://insidesearch.blogspot.com/2012/01/page-layout-algorithm-improvement.html

Page 19: Optimal SEO (Marianne Sweeny)

Panda 1.0: Google’s first salvo against “spam” (shallow, thin content sites) in the form of content duplication and low-value original content (i.e. “quick, give me 200 words on Britney Spears’s vacation in the Maldives”). The biggest target was content farms. Biggest impact: keyword optimization and link building.

Keyword optimization: The shift in focus from text on the page to user experience makes optimizing for keywords counterintuitive. Biggest impact: shift from developer/shady SEO influence to usability/user experience focus. Average loss in positioning (% of keywords falling out of the top 10 search results): 70 to 90% for sites like merchantcircle.com, findarticles.com, buzzle.com, mahalo.com and ezinearticles.com (SISTRIX).

Link building: PageRank does not scale well to a 1 trillion page Web. Google cannot calculate PR fast enough to rerank sites. PR is now devalued as the strongest influence behind ranking. Biggest impact: link building for higher PR = “what’s the point?”

Panda 2.0: Change rolled out to all English-language queries in English-speaking countries (UK, Australia, etc.) and in countries where English-language results are stipulated. Ranking incorporates searcher “blocking” data (from a Google Chrome feature).

Panda 2.1: Having unique content is not enough; quality factors introduced (some below):
• Trustworthiness: would I trust this site with my credit card information?
• Uniqueness: is this saying what I’ve found somewhere else?
• Origination: does the person writing the content have “street cred”? Do I believe that this is an authoritative resource on this topic?
• Display: does the site look professional and polished?
• Professional: is the content well constructed, well edited and without grammatical or spelling errors?

Panda 2.2: Google goes after site scrapers that repurpose content not their own, or those who “outsource” content development and maintenance.

Panda 2.3: Bounce rate (whether the user engages with the page at all), click-through, conversion.

Page 20: Optimal SEO (Marianne Sweeny)

And Google sort of blames SEO for it (not outright, but in a passive-aggressive kind of way).

Google Patent: Methods and Systems for Identifying Manipulated Articles (November 2007)

Manipulation:
• Keyword stuffing (article text or metadata)
• Unrelated links
• Unrelated redirects
• Auto-generated in-links
• Guestbook pages (blog post comments)

Followed up by: Google Patent: Content Entity Management (May 2012)

Page 21: Optimal SEO (Marianne Sweeny)

February 2011: algorithm focused on content quality - originally thought to be aimed at content farms

June 2011: update to identify scraped or duplicated content

October 2011: unannounced update to rectify sites “unfairly impacted” by original updates

January 2012: sites with too much ad space above the fold are devalued

The slide lists approximately 10% of the changes that Google told us about, and what they tell us about likely represents 0.10% of the changes that they actually make. (source: http://insidesearch.blogspot.com)

Re: freshness bug fix: “This change turns off a freshness algorithm component in certain cases when it should not be affecting the search results.” Google will serve up the newer document when choosing between two (from a given site).

Page 22: Optimal SEO (Marianne Sweeny)

Where’s Heidi Klum when we need her? Google’s quality content bar is higher and more subjective than Project Runway.

Google: Arbiter of Content & Relevance
http://www.stonetemple.com/matt-cutts-and-eric-talk-about-what-makes-a-quality-site/

“Those other sites are not bringing additional value. While they’re not duplicates they bring nothing new to the table.”

Google’s advice to site owners: “If it is already a crowded space with entrenched players, consider focusing on a niche area initially, instead of going head to head with the existing leaders of the space.”

Page 23: Optimal SEO (Marianne Sweeny)

The Penguin update is a bit different because it is an aggressive move on Google’s part that starts with an algorithmic review. If a threshold is crossed, a human review takes place and most sites are then significantly demoted in rankings or removed altogether.

• Overly repetitive anchor text (“manipulative, repetitive anchor text”)
• Blog comments filled with spam (reviews/comments that contain links to “spam”). Google’s definition of spam is similar to the Supreme Court’s for porn: no explanation of what it is. The search engine spiders just know it when they see it.
• Obscene content
• Web “clusters”: multiple Web sites on the same host, from the same domain owner, linking to an article in an artificial way

Page 24: Optimal SEO (Marianne Sweeny)

Targets “exact match” keyword-ed links or aggressive anchor text to google
• Sites penalized had “moneyed keywords” in 65% of their incoming links
• Obviously aimed at the long-standing practice of outsourcing link building to 3rd world countries and the weed-like growth of useless directories (i.e. link farms)

Too many links from “related” sites
• Same niche
• Same domain host
• Same domain owner

Standard SEO signals
• Stuffed <title> and meta description
• Hidden text
• Unrelated links on and pointing to the page
• Computer-generated text (i.e. dynamically rendered product pages)
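To make the 65% figure concrete, here is a rough way to measure how much of a site’s incoming anchor text is made up of exact-match “moneyed keywords.” It is a sketch for auditing your own link profile, not a reconstruction of Penguin. The function name, the sample anchors and the exact-match rule are assumptions made for illustration, and Google has not published any threshold.

```python
def money_anchor_share(anchors, money_keywords):
    """Fraction of incoming links whose anchor text exactly matches a money keyword."""
    if not anchors:
        return 0.0
    keywords = {keyword.strip().lower() for keyword in money_keywords}
    # Exact match after trimming and lowercasing; partial matches are ignored here
    hits = sum(1 for anchor in anchors if anchor.strip().lower() in keywords)
    return hits / len(anchors)


# Hypothetical anchor text gathered from a backlink report
incoming_anchors = [
    "cheap hotels",
    "Cheap Hotels",
    "example.com",
    "click here",
]
share = money_anchor_share(incoming_anchors, ["cheap hotels"])
print(f"{share:.0%} of incoming links use exact-match money keywords")
```

A link profile that grew naturally tends to be dominated by brand names, bare URLs and filler such as “click here,” so a high share is worth a closer look.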


Page 25: Optimal SEO (Marianne Sweeny)


Page 26: Optimal SEO (Marianne Sweeny)

The search engines think that we’re superfluous because we don’t “get search.” That’s what I’m here to end. I want you to “get search.” We are information professionals, not mice! We’re going to use every neuron, synapse and gray cell to fight back. We will shift from trying to optimize search engine behavior to optimizing what the search engines consume, and move from search engine optimization to information optimization:
• We will Focus
• We will be Collaborative
• We will get Connected
• We will stay Current

Because we are user experience professionals, not Matt Cutts, Sergey Brin or Larry Page.

Page 27: Optimal SEO (Marianne Sweeny)


Page 28: Optimal SEO (Marianne Sweeny)

Tools:

Core Metadata: 20-30 terms that represent the intersection between client objectives and how their customers search for the product/service

Content analytics: top pages, bounce rate, visitor flow

Content audit: keep/kill/revise based on a thorough review using a manual audit or tools available through resources such as those from @content_insight

Page 30: Optimal SEO (Marianne Sweeny)

If it barks, sings, dances, plays, changes, whatever, annotate it with something the search engine can crawl, deconstruct, associate with a surrogate and store in the index.
• Relational content model: Next Steps as well as More Information using guided tours, Best Bets, produced views, etc.
• Best Bets: editorially assigned result that may not be chosen by the search engine
• Guided Tours: built on analysis of other user pathways and knowledge of the corpus
• Produced Views: page of assembled content items focused on a single subject
• Task List Drop Downs: “I Want To…” links to pages of assembled content focused on a single common task

Page 31: Optimal SEO (Marianne Sweeny)


Page 32: Optimal SEO (Marianne Sweeny)

This is a team effort.


Page 33: Optimal SEO (Marianne Sweeny)

It is not too soon to get started.
