Robots: Txt, Meta & X - The Snog, Marry & Avoid of the Web Crawling World - Brighton SEO Sep 2017
#BrightonSEO 15th September 2017
@chrisgreen87 www.strategiq.co
Chris Green, StrategiQ
http://bit.ly/snog-marry-avoid
How do we know the best way to manage Googlebot’s crawl/indexing?
There are many methods (we’re spoilt for choice really)
But the two most commonly misused are Robots.txt and Meta Robots directives
Why?
To the casual observer they’re very similar ways of doing the same thing…
To block Google
But that’s not a helpful way of thinking of them
In many circumstances it can stop you getting the most out of your site
I’m going to run through a framework to help change this thinking and to help you make the right choices
But some words of warning
This is advanced stuff
One foot wrong & you could cause serious damage
They’re not always the first choice
These are only part of your toolkit
There are so many “ifs” & “buts”
If we can finish today with slightly more understanding and a different approach, then we’re onto a winner!
Time to introduce the robots
Robots.txt
Meta Robots
X-Robots
Possibly the most important SEO tools
But which do you...
Snog
Marry
Or avoid?
But what does a slightly s**t BBC 3 show have to do with SEO?
One site’s “snog” is another’s “marry”
Or perhaps even “avoid”
There are lots of thoughts on how to use these - many are wrong
I’m going to show you a way of simplifying things
To pick the right tool for the job
Know the problem you’re trying to fix!
Is it a crawl problem?
Google isn’t seeing enough of your site
Or an index problem?
Google’s indexing too much of it
We fix crawl problems with Robots.txt
User-agent: *
Disallow: /*mad-spider-trap*
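A rough sketch of how a wildcard rule like the one above gets matched, assuming Google-style pattern semantics where `*` matches any run of characters and a trailing `$` anchors the end of the path. The function names and the sample paths are made up for illustration, and the sketch ignores Allow rules and rule precedence, so treat it as a way to sanity-check patterns rather than a full robots.txt parser.

```python
import re


def robots_pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal chunk, then join the chunks with '.*' for every '*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_disallowed(path: str, patterns: list[str]) -> bool:
    """Return True if any Disallow pattern matches the start of the path."""
    return any(robots_pattern_to_regex(p).match(path) for p in patterns)


rules = ["/*mad-spider-trap*"]
print(is_disallowed("/shop/mad-spider-trap/page-2", rules))  # True
print(is_disallowed("/about-us", rules))                     # False
```

Running candidate URLs from a crawl export through something like this before deploying a rule is one way to avoid blocking more than intended.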
And we fix index problems with Meta Robots
www.domain.com/crap-page
<meta name="robots" content="NOINDEX, FOLLOW">
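To check that a page really carries the directive, a small sketch using Python's standard-library HTML parser can pull the meta robots values out of a page's source. The class and function names here are illustrative, and it only reads the generic `robots` tag, not bot-specific ones such as `googlebot`.

```python
from html.parser import HTMLParser


class MetaRobotsParser(HTMLParser):
    """Collect the directives from any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # Directives are comma-separated and case-insensitive.
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )


def meta_robots(html: str) -> set[str]:
    """Return the set of meta robots directives found in the HTML."""
    parser = MetaRobotsParser()
    parser.feed(html)
    return parser.directives


page = '<html><head><meta name="robots" content="NOINDEX, FOLLOW"></head></html>'
print(sorted(meta_robots(page)))  # ['follow', 'noindex']
```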
Identifying the problem
Index problems are simple to spot
Does your site look too big?
(in Google’s eyes)
But ID’ing a crawl problem can be trickier
Look for spider traps
https://www.portent.com/blog/seo/field-guide-to-spider-traps-an-seo-companion.htm
Where does this “cost” you on crawl budget?
A word on crawl budget.
It’s “a thing”
http://searchengineland.com/google-explains-crawl-budget-means-webmasters-267597
But Google doesn’t publicise a site’s crawl budget
You can work out a version of it yourself
Thanks to Yoast for this - https://yoast.com/crawl-budget-optimization/
Look at GSC Crawl Stats: average pages crawled per day
Look at GSC: how big Google thinks your site is
Pages / Avg crawled per day = Crawl Score
9,781 / 1,458 = 6.7
6.7x more pages than are getting crawled each day
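The same sum as a tiny sketch, using the two figures from the slide. The function name is made up, and both inputs are numbers you read off Google Search Console by hand, not something this code fetches.

```python
def crawl_score(pages_known: int, avg_crawled_per_day: float) -> float:
    """Pages Google knows about divided by average pages crawled per day."""
    if avg_crawled_per_day <= 0:
        raise ValueError("average pages crawled per day must be positive")
    return pages_known / avg_crawled_per_day


score = crawl_score(9_781, 1_458)
print(round(score, 1))  # 6.7
```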
If you have a crawl score of 10, you have 10x the pages that Google is crawling daily
A pretty big crawl problem!
But, how big is big?
< 1,000 pages, crawl budget is less of a problem
https://webmasters.googleblog.com/2017/01/what-crawl-budget-means-for-googlebot.html?m=1
1,000 - 10,000 is moderate
10,000+ pages… things start to get “fun”
Crawl & Index problems aren’t mutually exclusive
Index bloat at scale can hurt crawl
Crawl issues can stop or slow the repair of index issues
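One way to see why: if robots.txt blocks a URL, a well-behaved crawler never fetches the page, so it never gets to read a noindex tag on it. The sketch below uses Python's standard-library robots.txt parser to show the blocked fetch. That parser matches plain path prefixes, so the example uses a hypothetical /shop/filter/ prefix rather than wildcard rules, and the URLs are invented for illustration.

```python
from urllib.robotparser import RobotFileParser


def crawler_can_fetch(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt lets the agent fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


robots_txt = """\
User-agent: *
Disallow: /shop/filter/
"""

# Blocked: the page is never fetched, so a noindex tag on it goes unread.
print(crawler_can_fetch(robots_txt, "https://www.domain.com/shop/filter/red/"))  # False
# Not blocked: the page can be fetched and its meta robots tag honoured.
print(crawler_can_fetch(robots_txt, "https://www.domain.com/shop/mens/"))        # True
```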
Some example scenarios
eCommerce filters which are getting indexed (badly)
www.domain.com/shop/mens/trainers/size-12/red/
<meta name="robots" content="NOINDEX, FOLLOW">
User-agent: *
Disallow: /*size*
Disallow: /*red*
Not until index issue is cleared up*
*Unless...
User-agent: *
Noindex: /*size*
Disallow: /*size*
Noindex: /*red*
Disallow: /*red*
Google isn’t “cool” with this
https://www.seroundtable.com/google-do-not-use-noindex-in-robots-txt-20873.html
But it’s proved to work
https://www.deepcrawl.com/blog/best-practice/robots-txt-noindex-the-best-kept-secret-in-seo/
http://ohgm.co.uk/de-index-pages-blocked-robots-txt/
Only ~0.3% of the Majestic Million use this method
Don’t be too aggressive though!
Be aware that some filtered pages can be worth indexing
Blog taxonomies
www.domain.com/blog/blog-category/
www.domain.com/blog/tags-bloody-tags/
<meta name="robots" content="NOINDEX, FOLLOW">
eCommerce site without indexed filters but a 6x+ crawl score
User-agent: *
Disallow: /*filters*
(& meta robots just in case)
Other misc pages?
Just noindex
Anything else?
Back to my original premise
Meta Robots is my “marry”
Robots.txt is my “snog”
It really can make the difference
Meta robots replaced with robots.txt disallow:
But it is easy to screw it up
What about x-robots?
For when meta robots isn’t possible…
… assuming you can edit htaccess
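X-Robots directives travel in an HTTP response header (`X-Robots-Tag`) instead of in the page's HTML, which is why they suit files like PDFs that have no head section. A minimal sketch of reading that header from a set of response headers follows. The function name and the sample headers are illustrative, and the sketch does not handle bot-specific values (such as a `googlebot:` prefix) or repeated headers.

```python
def x_robots_directives(headers: dict[str, str]) -> set[str]:
    """Return the directives from an X-Robots-Tag response header, if any."""
    for name, value in headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() == "x-robots-tag":
            return {d.strip().lower() for d in value.split(",") if d.strip()}
    return set()


response_headers = {
    "Content-Type": "application/pdf",
    "X-Robots-Tag": "noindex, follow",
}
print(sorted(x_robots_directives(response_headers)))  # ['follow', 'noindex']
```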
It’s not my “avoid” though
What is?
The lazy option!
User-agent: *
Disallow: /*pointless*
Disallow: /*disallow-rules*
Disallow: /*instead-of*
Disallow: /*fixing-the-problem.html
www.domain.com/200000-filtered-combos
<meta name="robots" content="NOINDEX, NOFOLLOW">
User-agent: *
Disallow: /we-should-write-better-content/
#but don’t want to prioritise
The “best choice” depends on your limitations
Do you have all the access you need?
Or enough buy-in?
Otherwise, some workarounds are better than doing nothing
Meta Robots via GTM
https://moz.com/blog/seo-changes-using-google-tag-manager
Robots.txt when no other option
User-agent: *
Disallow: /better-than-doing-nothing/
Key takeaways
Is it a crawl or index problem?
Check what you can change
And what you can’t…
Make the “best case” fix
Implement, crawl & check again!
Use this flowchart to help
http://bit.ly/bseo-flow
Thank you.
http://bit.ly/snog-marry-avoid
@chrisgreen87