Robots: Txt, Meta & X - The Snog, Marry & Avoid of the Web Crawling World - Brighton SEO Sep 2017

Post on 21-Jan-2018

Transcript of Robots: Txt, Meta & X - The Snog, Marry & Avoid of the Web Crawling World - Brighton SEO Sep 2017

#BrightonSEO, 15th September 2017

@chrisgreen87 www.strategiq.co

Chris Green, StrategiQ

@chrisgreen87 http://bit.ly/snog-marry-avoid

Robots: Txt, Meta & X - The Snog, Marry & Avoid of the Web Crawling World

How do we know the best way to manage Googlebot’s crawl/indexing?

There are many methods (we’re spoilt for choice really)

But the two most commonly misused are robots.txt and meta robots directives

Why?

To the casual observer they’re very similar ways of doing the same thing…

To block Google

But that’s not a helpful way of thinking of them

In many circumstances it can stop you getting the most out of your site

I’m going to run through a framework to help change this thinking

I’m going to run through a framework to help you make the right choices

But some words of warning

This is advanced stuff. One foot wrong & you could cause serious damage

They’re not always the first-choice

These are only part of your toolkit

There are so many “ifs” & “buts”

If we can finish today with slightly more understanding

And a different approach, then we’re onto a winner!

Time to introduce the robots

Robots.txt

Meta Robots

X-Robots

Possibly the most important SEO tools

But which do you...

Snog

Marry

Or avoid?

But what does a slightly s**t BBC 3 show have to do with SEO?

One site’s “snog” is another’s “marry”

Or perhaps even “avoid”

There are lots of thoughts on how to use these - many are wrong

I’m going to show you a way of simplifying things

To pick the right tool for the job

Know the problem you’re trying to fix!

Is it a crawl problem?

Google isn’t seeing enough of your site

Or an index problem?

Google’s indexing too much of it

We fix crawl problems with Robots.txt

User-agent: *

Disallow: /*mad-spider-trap*
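The wildcard in that rule can be checked before it goes live. Below is a minimal Python sketch of Googlebot-style pattern matching, where `*` matches any run of characters and a trailing `$` anchors the end of the URL path. The function names and test paths are hypothetical, and the sketch deliberately ignores `Allow` rules and longest-match precedence, so treat it as a rough sanity check rather than a full robots.txt parser.

```python
import re
from typing import Iterable


def pattern_to_regex(pattern: str):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any sequence of characters; a trailing '$' anchors
    the match to the end of the path. Everything else is literal.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' for each wildcard.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_disallowed(path: str, disallow_patterns: Iterable[str]) -> bool:
    """Return True if any Disallow pattern matches the URL path."""
    return any(pattern_to_regex(p).match(path) for p in disallow_patterns)


# Hypothetical paths, checked against the rule from the slide.
rules = ["/*mad-spider-trap*"]
print(is_disallowed("/shop/mad-spider-trap/page-9", rules))
print(is_disallowed("/about/", rules))
```

Running a list of known-good URLs through a check like this is one way to spot a wildcard that blocks more than intended.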

And we fix index problems with Meta Robots

www.domain.com/crap-page

<meta name="robots" content="NOINDEX, FOLLOW">
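To confirm a page really carries the tag, the directives can be read back out of the HTML. This is a small sketch using Python's standard `html.parser`; the class and function names are made up for illustration, and it only looks at `name="robots"` (not crawler-specific names such as `googlebot`).

```python
from html.parser import HTMLParser


class MetaRobotsParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key: (value or "") for key, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        # Directives are comma-separated and case-insensitive.
        for token in attr.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def meta_robots_directives(html: str) -> set:
    """Return the set of lower-cased meta robots directives in the HTML."""
    parser = MetaRobotsParser()
    parser.feed(html)
    return parser.directives


page = '<html><head><meta name="robots" content="NOINDEX, FOLLOW"></head></html>'
print(meta_robots_directives(page))
```

The same function can be pointed at the output of a site crawl to list which templates are, or are not, emitting `noindex`.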

Identifying the problem

Index problems are simple to spot

Does your site look too big (in Google’s eyes)?

But ID’ing a crawl problem... can be trickier

Look for spider traps

https://www.portent.com/blog/seo/field-guide-to-spider-traps-an-seo-companion.htm

Where does this “cost” you on crawl budget?

A word on crawl budget.

It’s “a thing”

http://searchengineland.com/google-explains-crawl-budget-means-webmasters-267597

But Google doesn’t publicise a site’s crawl budget

You can work out a version of it yourself

Thanks to Yoast for this - https://yoast.com/crawl-budget-optimization/

Look at GSC Crawl Stats: average pages crawled per day

Look in GSC at how big Google thinks your site is

Pages / Avg crawled per day = Crawl Score

9,781 / 1,458 = 6.7

The site has 6.7x as many pages as are getting crawled each day
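The arithmetic above is easy to script when checking several sites. A minimal sketch, assuming the two inputs are read by hand from GSC (the function name is hypothetical):

```python
def crawl_score(pages_on_site: int, avg_crawled_per_day: float) -> float:
    """Pages Google sees on the site divided by average pages crawled per day."""
    if avg_crawled_per_day <= 0:
        raise ValueError("avg_crawled_per_day must be greater than zero")
    return pages_on_site / avg_crawled_per_day


# Figures from the slide: 9,781 pages, 1,458 crawled per day on average.
score = crawl_score(9781, 1458)
print(round(score, 1))  # 6.7
```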

If you have a crawl score of 10, you have 10x the pages that Google is crawling daily

A pretty big crawl problem!

But, how big is big?

< 1,000 pages, crawl budget is less of a problem

https://webmasters.googleblog.com/2017/01/what-crawl-budget-means-for-googlebot.html?m=1

1,000 - 10,000 pages is moderate

10,000+ pages… things start to get “fun”

Crawl & Index problems aren’t mutually exclusive

Index bloat at scale can hurt crawl

Crawl issues can stop or slow the repair of index issues

Some example scenarios

eCommerce filters which are getting indexed (badly)

www.domain.com/shop/mens/trainers/size-12/red/

<meta name="robots" content="NOINDEX, FOLLOW">

User-agent: *

Disallow: /*size*

Disallow: /*red*

Not until index issue is cleared up*
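The reason for holding back the disallow is that a URL blocked in robots.txt is not fetched, so a meta robots tag on it never gets read. The sketch below spells out that interaction as a simple lookup; the function name and the outcome wording are illustrative assumptions, not Google's terminology.

```python
def expected_outcome(disallowed: bool, meta_noindex: bool) -> str:
    """Describe what a crawler can do with a page given both controls."""
    if disallowed:
        # The page is not fetched, so any meta robots tag goes unread.
        return "not crawled; noindex unseen, URL may stay indexed"
    if meta_noindex:
        return "crawled; noindex seen, page can drop out of the index"
    return "crawled; eligible for indexing"


for disallowed in (False, True):
    for meta_noindex in (False, True):
        print(disallowed, meta_noindex, "->", expected_outcome(disallowed, meta_noindex))
```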

*Unless...

User-agent: *

Noindex: /*size*

Disallow: /*size*

Noindex: /*red*

Disallow: /*red*

Google isn’t “cool” with this

https://www.seroundtable.com/google-do-not-use-noindex-in-robots-txt-20873.html

But it’s proved to work

https://www.deepcrawl.com/blog/best-practice/robots-txt-noindex-the-best-kept-secret-in-seo/

http://ohgm.co.uk/de-index-pages-blocked-robots-txt/

Only ~0.3% of the Majestic Million use this method

Don’t be too aggressive though!

Be aware that some filtered pages can be worth indexing

Blog taxonomies

www.domain.com/blog/blog-category/

www.domain.com/blog/tags-bloody-tags/

<meta name="robots" content="NOINDEX, FOLLOW">

eCommerce site without indexed filters but a 6x+ crawl score

User-agent: *

Disallow: /*filters*

(& meta robots just in case)

Other misc pages?

Just noindex

Anything else?

Back to my original premise

Meta Robots is my “marry”

Robots.txt is my “snog”

It really can make the difference

Meta robots replaced with robots.txt disallow:

But it is easy to screw it up

What about x-robots?

For when meta robots isn’t possible…

… assuming you can edit .htaccess
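X-Robots directives travel in the `X-Robots-Tag` HTTP response header rather than in the page markup, which is why they suit files with no `<head>` to edit. As a rough check that the server is sending what you expect, the header can be parsed like this. The function name and the example headers are hypothetical, and the sketch does not handle crawler-specific prefixes such as `googlebot: noindex`.

```python
def x_robots_directives(headers: dict) -> set:
    """Return lower-cased directives found in any X-Robots-Tag header."""
    directives = set()
    for name, value in headers.items():
        # Header names are case-insensitive.
        if name.lower() != "x-robots-tag":
            continue
        for token in value.split(","):
            token = token.strip().lower()
            if token:
                directives.add(token)
    return directives


# Example response headers for a hypothetical PDF.
response_headers = {
    "Content-Type": "application/pdf",
    "X-Robots-Tag": "noindex, nofollow",
}
print("noindex" in x_robots_directives(response_headers))  # True
```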

It’s not my “avoid” though

What is?

The lazy option!

User-agent: *

Disallow: /*pointless*

Disallow: /*disallow-rules*

Disallow: /*instead-of*

Disallow: /*fixing-the-problem.html

www.domain.com/200000-filtered-combos

<meta name="robots" content="NOINDEX, NOFOLLOW">

User-agent: *

Disallow: /we-should-write-better-content/

# but don’t want to prioritise

The “best choice” depends on your limitations

Do you have all the access you need?

Or enough buy-in?

Otherwise, some workarounds are better than doing nothing

Meta Robots via GTM

https://moz.com/blog/seo-changes-using-google-tag-manager

Robots.txt when there is no other option

User-agent: *

Disallow: /better-than-doing-nothing/

Key takeaways

Is it a crawl or index problem?

Check what you can change. And what you can’t…

Make the “best case” fix

Implement, crawl & check again!

Use this flowchart to help

http://bit.ly/bseo-flow

Thank you. http://bit.ly/snog-marry-avoid @chrisgreen87