Infinite Loops Dirty Architecture And Too Many Indexed URLs

INFINITE LOOPS & crawl rank DIRTY ARCHITECTURE Dawn Anderson

description

Dawn Anderson's Brighton SEO deck from April 2014. It looks at crawlability issues on large sites, in particular infinite URLs / infinite loops, dirty architecture and too many indexed URLs. I also wrote a blog post / article for the Brighton SEO newspaper which covers the information in this deck in a lot more detail. It is here: http://bit.ly/Ss6Lf1

Transcript of Infinite Loops Dirty Architecture And Too Many Indexed URLs

Page 1: Infinite Loops Dirty Architecture And Too Many Indexed URLs

INFINITE LOOPS & crawl rank DIRTY ARCHITECTURE

Dawn Anderson

Page 2: Infinite Loops Dirty Architecture And Too Many Indexed URLs

CAME TO THIS INDUSTRY VIA A DIFFERENT ROUTE

Page 3: Infinite Loops Dirty Architecture And Too Many Indexed URLs

I decided to add an additional dimension to the site

TO ‘EXPLODE’ NATURAL SEARCH TRAFFIC

Page 4: Infinite Loops Dirty Architecture And Too Many Indexed URLs

1.5 Million URLs

Page 5: Infinite Loops Dirty Architecture And Too Many Indexed URLs

Crawl Rate: Going Down

Indexation Levels: Going Up

Page 6: Infinite Loops Dirty Architecture And Too Many Indexed URLs

GOOGLE: only crawling 0.1% of our pages per day
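
To put those figures in perspective, here is a rough back-of-envelope calculation. It assumes that the 0.1% daily crawl applies to the 1.5 million URLs mentioned earlier, that the rate stays constant, and that no URL is revisited before the rest have been fetched. These are simplifying assumptions for illustration, not claims from the deck.

```python
from fractions import Fraction
import math


def urls_crawled_per_day(total_urls: int, daily_fraction: Fraction) -> Fraction:
    """URLs fetched per day at a constant crawl rate."""
    return total_urls * daily_fraction


def days_for_full_crawl(total_urls: int, daily_fraction: Fraction) -> int:
    """Days to fetch every URL once, assuming no URL is revisited early."""
    return math.ceil(total_urls / urls_crawled_per_day(total_urls, daily_fraction))


TOTAL_URLS = 1_500_000              # figure from the deck
DAILY_FRACTION = Fraction(1, 1000)  # 0.1% per day, figure from the deck

per_day = urls_crawled_per_day(TOTAL_URLS, DAILY_FRACTION)
days = days_for_full_crawl(TOTAL_URLS, DAILY_FRACTION)
print(f"{int(per_day):,} URLs per day, {days:,} days for one full pass")
```

Under these assumptions a single full pass over the site would take about 1,000 days. Exact fractions are used instead of floats so that the rounding in `math.ceil` is not thrown off by floating-point error.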

Page 7: Infinite Loops Dirty Architecture And Too Many Indexed URLs
Page 8: Infinite Loops Dirty Architecture And Too Many Indexed URLs

Infinite Loop Definition: An infinite loop is a sequence of instructions in a computer program which loops endlessly, either due to the loop having no terminating condition, having one that can never be met, or one that causes the loop to start over.
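
On a website the same thing can show up as an endless URL space. The sketch below is a hypothetical example, not taken from the deck: a calendar page at an invented `example.com` URL that always links to the following month. Every URL is new, so a crawler's "already seen" check never stops it; only an explicit page cap does.

```python
from urllib.parse import urlparse, parse_qs


def next_month_link(url: str) -> str:
    """Hypothetical calendar page: always links to the following month."""
    query = parse_qs(urlparse(url).query)
    year, month = int(query["y"][0]), int(query["m"][0])
    month += 1
    if month > 12:
        year, month = year + 1, 1
    return f"https://example.com/calendar?y={year}&m={month}"


def crawl(start_url: str, max_pages: int) -> list[str]:
    """Follow links until no unseen URL remains or the page cap is reached."""
    seen = set()
    visited = []
    queue = [start_url]
    while queue and len(visited) < max_pages:
        url = queue.pop(0)
        if url in seen:
            continue
        seen.add(url)
        visited.append(url)
        # There is no terminating condition on the site side: each page
        # always produces one more never-before-seen URL.
        queue.append(next_month_link(url))
    return visited


pages = crawl("https://example.com/calendar?y=2014&m=4", max_pages=24)
print(len(pages), pages[-1])
```

Without `max_pages` the loop in `crawl` would never exit, which is the crawler's view of an infinite loop in the site's code.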

Page 9: Infinite Loops Dirty Architecture And Too Many Indexed URLs

PENGUIN & PANDA updates came along

Page 10: Infinite Loops Dirty Architecture And Too Many Indexed URLs

TOO MANY URLS = SEO DEATH

‘WE’RE ALL ‘DOOMED’’

Page 11: Infinite Loops Dirty Architecture And Too Many Indexed URLs

CRAWL Budget

Roughly proportionate to PageRank

Pages with a lot of links get crawled more

Still applies in current search landscape

Page 12: Infinite Loops Dirty Architecture And Too Many Indexed URLs

CRAWL Rank

A ranking metric for ‘no’ to ‘low’ PageRank pages??

Pages crawled more often rank higher

Get ‘low’ to ‘no’ PageRank pages crawled more than competitors = YOU WIN

Page 13: Infinite Loops Dirty Architecture And Too Many Indexed URLs

CRAWL OPTIMISATION

FIND OUT WHERE Googlebot goes

AND KEEP WATCHING
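
One way to see where Googlebot goes is to count its requests per site section in the server access logs. The sketch below assumes logs in the common "combined" format and uses invented sample lines. It matches on the user-agent string only, which can be spoofed, so treat the result as an approximation.

```python
import re
from collections import Counter

# Pulls the request path and user agent out of a combined-format log line.
LOG_LINE = re.compile(
    r'"(?:GET|HEAD) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_hits_by_section(log_lines):
    """Count requests whose user agent mentions Googlebot, per first path segment."""
    counts = Counter()
    for line in log_lines:
        match = LOG_LINE.search(line)
        if match and "Googlebot" in match.group("agent"):
            path = match.group("path").split("?")[0]
            section = "/" + path.strip("/").split("/")[0]
            counts[section] += 1
    return counts


GOOGLEBOT = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
# Hypothetical sample lines for illustration only.
SAMPLE_LOG = [
    f'66.249.66.1 - - [10/Apr/2014:10:00:00 +0000] "GET /shoes/red-heels HTTP/1.1" 200 5120 "-" "{GOOGLEBOT}"',
    f'66.249.66.1 - - [10/Apr/2014:10:00:05 +0000] "GET /shoes/boots HTTP/1.1" 200 4980 "-" "{GOOGLEBOT}"',
    f'66.249.66.1 - - [10/Apr/2014:10:00:09 +0000] "GET /search?q=a&page=99 HTTP/1.1" 200 1200 "-" "{GOOGLEBOT}"',
    '203.0.113.7 - - [10/Apr/2014:10:01:00 +0000] "GET /shoes/boots HTTP/1.1" 200 4980 "-" "Mozilla/5.0 (Windows NT 6.1)"',
]

for section, hits in googlebot_hits_by_section(SAMPLE_LOG).most_common():
    print(section, hits)
```

Running this on a schedule and comparing the counts over time is one way to "keep watching".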

Page 14: Infinite Loops Dirty Architecture And Too Many Indexed URLs

CHECK & MONITOR for over-indexation

500 Page Website, 50,000 URLs in Google

YOU MAY HAVE DODGY CODE
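
When the indexed count is far above the number of real pages, it helps to see which URL parameters account for the surplus. The sketch below assumes you already have two lists: the pages the site is meant to have, and a sample of URLs found in the index. How you obtain them is up to you, and all URLs shown are invented.

```python
from collections import Counter
from urllib.parse import urlparse, parse_qsl


def unexpected_parameters(known_urls, indexed_urls):
    """Count query parameter names on indexed URLs that are not known pages."""
    known = set(known_urls)
    counts = Counter()
    for url in indexed_urls:
        if url in known:
            continue
        params = parse_qsl(urlparse(url).query, keep_blank_values=True)
        names = [name for name, _ in params] or ["(no parameters)"]
        counts.update(names)
    return counts


# Hypothetical data for illustration only.
known = ["https://example.com/shoes"]
indexed = [
    "https://example.com/shoes",
    "https://example.com/shoes?sort=price",
    "https://example.com/shoes?sort=name&sessionid=1",
    "https://example.com/shoes/",
]
print(unexpected_parameters(known, indexed).most_common())
```

A parameter that dominates the tally is a good first place to look for the code generating the extra URLs.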

Page 15: Infinite Loops Dirty Architecture And Too Many Indexed URLs

Shoes.sitemap.xml

Dresses.sitemap.xml

tshirts.sitemap.xml

Check THOROUGHLY, Name & Categorise XML Sitemaps

yoursite.sitemap.xml
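
Category sitemaps like these can be tied together with a sitemap index file, so that indexation can be checked one category at a time. This is a minimal sketch using the sitemaps.org index format; the base URL is a placeholder and the file names are the ones from the slide.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(base_url: str, sitemap_files: list[str]) -> str:
    """Return a sitemap index document listing one sitemap per category."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for name in sitemap_files:
        entry = ET.SubElement(root, "sitemap")
        ET.SubElement(entry, "loc").text = f"{base_url}/{name}"
    return ET.tostring(root, encoding="unicode")


index_xml = build_sitemap_index(
    "https://www.example.com",  # placeholder domain
    ["Shoes.sitemap.xml", "Dresses.sitemap.xml", "tshirts.sitemap.xml"],
)
print(index_xml)
```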

Page 16: Infinite Loops Dirty Architecture And Too Many Indexed URLs

DON’T BE AFRAID of hard 404s

Use 410s where you can

Giraffe

AVOID soft 404s
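
A soft 404 is a page that tells the visitor something is missing while the server still answers "200 OK". The sketch below is a crude heuristic for flagging candidates in your own crawl data. The phrase list is invented for illustration and would need tuning for a real site.

```python
# Illustrative phrases only; adjust to the wording your own templates use.
NOT_FOUND_PHRASES = ("page not found", "no longer available", "0 results")


def looks_like_soft_404(status_code: int, body: str) -> bool:
    """True when a '200 OK' response reads like an error page."""
    if status_code != 200:
        return False  # a real 404 or 410 is a hard error, not a soft one
    text = body.lower()
    return any(phrase in text for phrase in NOT_FOUND_PHRASES)


print(looks_like_soft_404(200, "<h1>Page Not Found</h1>"))
print(looks_like_soft_404(404, "<h1>Page Not Found</h1>"))
print(looks_like_soft_404(200, "<h1>Red heels</h1>"))
```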

Page 17: Infinite Loops Dirty Architecture And Too Many Indexed URLs

ENSURE THAT dynamic variables / parameters are validated

Don’t render just any old thing with a ‘200 OK’ response code or return a soft 404

HOW WILL YOU KNOW IF THERE’S A PROBLEM?

You won’t
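
A minimal sketch of that kind of validation: check the parameter value against what the site actually has before rendering anything, and pick the status code accordingly. The `category` parameter and the category sets are hypothetical, and a real application would plug this into its own routing layer.

```python
from http import HTTPStatus

VALID_CATEGORIES = {"shoes", "dresses", "tshirts"}  # hypothetical catalogue
RETIRED_CATEGORIES = {"hats"}                       # hypothetical: removed for good


def status_for_category(params: dict) -> HTTPStatus:
    """Choose a response status for a request like /products?category=..."""
    category = params.get("category")
    if category in VALID_CATEGORIES:
        return HTTPStatus.OK
    if category in RETIRED_CATEGORIES:
        return HTTPStatus.GONE       # 410: it existed once and will not return
    return HTTPStatus.NOT_FOUND      # 404: never render junk values as 200


for value in ("shoes", "hats", "shoes123", None):
    print(value, int(status_for_category({"category": value})))
```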

Page 18: Infinite Loops Dirty Architecture And Too Many Indexed URLs

AVOID A ‘JUMBLE SALE’

BUT

Page 19: Infinite Loops Dirty Architecture And Too Many Indexed URLs

Use robots.txt, nofollows, sitemaps, nav paths & cross-module internal linking

‘Herd’ Googlebot
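
Before relying on robots.txt to steer the crawler, it is worth checking that the rules block what you intend and nothing more. The sketch below uses Python's standard `urllib.robotparser` on an invented rule set. Note that this parser does simple prefix matching and may not mirror every extension Googlebot supports.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: keep crawlers out of search results and the basket.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Disallow: /basket
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt content held in a string (no network access needed)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
for url in (
    "https://www.example.com/shoes/red-heels",
    "https://www.example.com/search?q=red+shoes",
    "https://www.example.com/basket",
):
    print(url, parser.can_fetch("Googlebot", url))
```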

Page 20: Infinite Loops Dirty Architecture And Too Many Indexed URLs

Get Those Low Level Pages Crawled - Often

Whichever way you can

Pass equity to Siblings as Well as children

Page 21: Infinite Loops Dirty Architecture And Too Many Indexed URLs

Visit the internal links section in Google Webmaster Tools (GWT)

Most Important Page 1

Most Important Page 2

Most Important Page 3

IS THIS YOUR BLOG?? HOPE NOT
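
The same check can be run on your own crawl data. This sketch assumes you have a list of (source, target) internal link pairs, for example exported from a site crawler, and ranks pages by how many distinct pages link to them. The paths are invented.

```python
from collections import Counter


def most_linked_pages(internal_links, top_n=3):
    """Rank target pages by number of distinct internal links pointing at them."""
    unique_links = dict.fromkeys(internal_links)  # de-duplicate, keep order
    inlinks = Counter(
        target for source, target in unique_links if source != target
    )
    return inlinks.most_common(top_n)


# Hypothetical link graph for illustration only.
links = [
    ("/", "/blog"), ("/shoes", "/blog"), ("/dresses", "/blog"),
    ("/", "/shoes"), ("/dresses", "/shoes"),
    ("/", "/dresses"),
    ("/", "/blog"),       # duplicate link, counted once
    ("/blog", "/blog"),   # self link, ignored
]
print(most_linked_pages(links))
```

If a page like `/blog` tops the list ahead of the pages that matter commercially, the internal linking is sending its signals to the wrong place.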

Page 22: Infinite Loops Dirty Architecture And Too Many Indexed URLs

CANONICALISATION

In web search and search engine optimization (SEO), URL canonicalization deals with web content that has more than one possible URL. Having multiple URLs for the same web content can cause problems for search engines - specifically in determining which URL should be shown in search results.

Example:

• http://wikipedia.com

• http://www.wikipedia.com

• http://www.wikipedia.com/

• http://www.wikipedia.com/?source=asdf

All of these URLs point to the homepage of Wikipedia, but a search engine will only consider one of them to be the canonical form of the URL. (source - Wikipedia)
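
As a sketch of how a site might collapse such variants itself, the function below maps all four example URLs to one form. The choice of the `www` host as the preferred version and the treatment of `source` as a content-neutral parameter are assumptions made for this example. The result could feed a `rel="canonical"` tag or a 301 redirect.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumption: these parameters never change what the page shows.
IGNORED_PARAMS = {"source"}


def canonical_url(url: str, preferred_host: str = "www.wikipedia.com") -> str:
    """Normalise host, trailing slash and ignorable parameters."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host == preferred_host.removeprefix("www."):
        host = preferred_host          # bare domain -> preferred www host
    path = parts.path or "/"           # empty path -> "/"
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k not in IGNORED_PARAMS]
    return urlunsplit((parts.scheme, host, path, urlencode(kept), ""))


for variant in (
    "http://wikipedia.com",
    "http://www.wikipedia.com",
    "http://www.wikipedia.com/",
    "http://www.wikipedia.com/?source=asdf",
):
    print(variant, "->", canonical_url(variant))
```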

Page 23: Infinite Loops Dirty Architecture And Too Many Indexed URLs

Deal Well With duplicate & near duplicate content

Via canonicalization, 301s & Content Build Out

Page 24: Infinite Loops Dirty Architecture And Too Many Indexed URLs

STOP LYING & ‘GET FRESH’

Genuine ‘last modified dates’ are ALL important - FORGET PRIORITY
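
One way to keep last-modified dates honest is to change them only when the page content really changes. The sketch below assumes you store a content hash and a date per URL; the storage itself is left out. It then writes a sitemap entry with `<lastmod>` and no `<priority>`.

```python
import hashlib
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def refresh_lastmod(content: str, stored_hash: str, stored_lastmod: datetime,
                    now: datetime) -> tuple[str, datetime]:
    """Move the date forward only if the content differs from the stored hash."""
    new_hash = hashlib.sha256(content.encode("utf-8")).hexdigest()
    if new_hash == stored_hash:
        return stored_hash, stored_lastmod  # unchanged: keep the old date
    return new_hash, now


def sitemap_url_entry(loc: str, last_modified: datetime) -> str:
    """Build a <url> entry with a genuine <lastmod> and no <priority>."""
    entry = ET.Element("url")
    ET.SubElement(entry, "loc").text = loc
    ET.SubElement(entry, "lastmod").text = (
        last_modified.astimezone(timezone.utc).strftime("%Y-%m-%d")
    )
    return ET.tostring(entry, encoding="unicode")


# Hypothetical page and dates for illustration only.
first_seen = datetime(2014, 4, 1, tzinfo=timezone.utc)
today = datetime(2014, 4, 25, tzinfo=timezone.utc)
page_hash, page_date = refresh_lastmod("old copy", "", first_seen, first_seen)
page_hash, page_date = refresh_lastmod("old copy", page_hash, page_date, today)
print(sitemap_url_entry("https://www.example.com/shoes", page_date))
```

In the example the second call sees identical content, so the entry keeps the earlier date instead of claiming a fresh update.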

Page 25: Infinite Loops Dirty Architecture And Too Many Indexed URLs

"It's not that Google will penalize you, it's the opportunity cost for dirty architecture based on a finite crawl budget" (A.J. Kohn, Blind Five Year Old)

REMEMBER THIS

Page 26: Infinite Loops Dirty Architecture And Too Many Indexed URLs

Me: @dawnieando