Technical SEO Myths Facts And Theories On Crawl Budget And The Importance Of URL Importance...

Page 1: Technical SEO Myths Facts And Theories On Crawl Budget And The Importance Of URL Importance Optimization Pubcon Vegas 2016

#pubcon

Presented by: Dawn Anderson @dawnieando

‘Myths, Facts And Theories On Crawl Budget And The Importance Of ‘URL Importance Optimization’’

Dawn Anderson
• Move It Marketing
• University Lecturer – Digital Marketing
• From Manchester, UK (rains a lot)
• International SEO Consultant – 10+ yrs in SEO
• Pomeranian pooch lover – Bert
• Fascinated by crawling (practice & academia)
• Doesn’t fare well in YouTube screen grabs ;P
• Party trick: Remembering UK postcode areas (US Zip code equivalent)
• Search Awards Judge
• Twitter chatterer @dawnieando

Defining Crawl Budget

‘Host Load’ = What can you handle?
+
‘URL Scheduling’ = What is important to crawl & how often?
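The two halves of this definition can be pictured as a toy scheduler: ‘URL scheduling’ as a priority queue ordered by importance, ‘host load’ as a per-host cap on fetches per crawl cycle. This is a minimal sketch under stated assumptions, not Googlebot's scheduler; the importance scores, the caps and the `schedule` function are all hypothetical.

```python
import heapq
from collections import defaultdict
from urllib.parse import urlsplit


def schedule(urls_with_importance, host_load):
    """Return the URLs fetched in one crawl cycle, most important first.

    urls_with_importance: (url, importance) pairs; the scores are hypothetical.
    host_load: {hostname: max fetches this cycle}; the caps are hypothetical.
    """
    # 'URL scheduling': heapq is a min-heap, so negate importance to pop the highest first
    heap = [(-importance, url) for url, importance in urls_with_importance]
    heapq.heapify(heap)
    used = defaultdict(int)
    fetched = []
    while heap:
        _, url = heapq.heappop(heap)
        host = urlsplit(url).hostname
        # 'Host load': a URL is skipped this cycle once its host's cap is used up
        if used[host] < host_load.get(host, 0):
            used[host] += 1
            fetched.append(url)
    return fetched


urls = [
    ("https://example.com/", 0.9),
    ("https://example.com/category/", 0.6),
    ("https://example.com/old-tag/?page=3", 0.1),
    ("https://shop.example.org/", 0.8),
]
print(schedule(urls, {"example.com": 2, "shop.example.org": 1}))
```

With a cap of two fetches for example.com, the low-importance tag page is the one left waiting for a later cycle.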

Myths About Crawl Budget

Myth – It’s All About Just My Site, Right?

• NO – HOST LOAD is apportioned at an IP level and shared amongst all the sites hosted there
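To see which of your own hostnames compete for the same host load, group them by the IP they resolve to. The even split of capacity below is a hypothetical rule chosen only to make the sharing visible; Google does not publish how host load is divided, and the hostnames, IPs and capacity figure are illustrative.

```python
from collections import defaultdict


def hosts_by_ip(host_to_ip):
    """Group hostnames by the IP address they resolve to (a mapping you supply)."""
    groups = defaultdict(list)
    for host, ip in host_to_ip.items():
        groups[ip].append(host)
    return {ip: sorted(hosts) for ip, hosts in groups.items()}


def even_share(capacity_per_ip, host_to_ip):
    """Hypothetical rule: each IP's fetch capacity is split evenly across its hostnames."""
    share = {}
    for hosts in hosts_by_ip(host_to_ip).values():
        for host in hosts:
            share[host] = capacity_per_ip / len(hosts)
    return share


# Documentation-range IPs (RFC 5737); the staging site shares the live site's IP
sites = {
    "www.example.com": "203.0.113.10",
    "blog.example.com": "203.0.113.10",
    "staging.example.com": "203.0.113.10",
    "www.example.org": "198.51.100.7",
}
print(even_share(12, sites))
```

Under this toy rule, a crawlable staging site on the same IP takes a third of the capacity that the live site could otherwise use.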

Host Load – When Will This Matter?
• It’s more about server capacity than SEO TBH
• Your site is massive (e.g. similar in size to ‘Amazon’)
• Your site is massive and you’re on shared hosting
• You’re using a CDN and your site is massive
• You have lots of large subdomains sharing space
• You have crawlable test or staging sites
• You have ‘infinite loops’ and ‘spider traps’
• You keep throwing server errors during crawling

‘Average’ sites don’t normally hit the limit (‘host load’)
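‘Infinite loops’ and ‘spider traps’ often show up in crawled URLs as very deep paths, repeated path segments or stacks of parameters. A minimal sketch for flagging candidates from a list of crawled URLs follows; the three heuristics and both thresholds are assumptions to tune per site, and a flag only means ‘look at this’, not ‘this is a trap’.

```python
from urllib.parse import parse_qsl, urlsplit


def trap_signals(url, max_depth=8, max_params=3):
    """Return reasons a URL *might* be infinite-loop or spider-trap output.

    The heuristics and both thresholds are hypothetical; tune them to your own site.
    """
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    reasons = []
    if len(segments) > max_depth:
        reasons.append("deep path")
    if len(segments) != len(set(segments)):
        reasons.append("repeated path segment")
    if len(parse_qsl(parts.query)) > max_params:
        reasons.append("many parameters")
    return reasons


for u in ["https://example.com/shop/shoes/",
          "https://example.com/shop/shoes/shoes/shoes/",
          "https://example.com/calendar?y=2016&m=11&d=3&view=day"]:
    print(u, trap_signals(u))
```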

Myth – Google Search Console Crawl Stats Is Where It’s At, Right?

GSC Crawl Stats Is Not Really Just ‘Web Pages’

• Includes ALL CSS, JS, Zip, XML, PDF, AMP, HTML files crawled

• Pages are NOT just single webpages

https://support.google.com/webmasters/answer/35253

Not just ‘web pages’
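Your own logs show how much of a ‘pages crawled’ figure is really CSS, JS, XML or PDF. A minimal sketch, assuming you already have the requested paths from Googlebot's log lines; treating extensionless paths (and .htm) as HTML pages is a simplifying assumption.

```python
import posixpath
from collections import Counter
from urllib.parse import urlsplit


def file_type(path):
    """Bucket a requested path by extension; extensionless paths are treated as HTML pages."""
    ext = posixpath.splitext(urlsplit(path).path)[1].lower().lstrip(".")
    return {"": "html", "htm": "html"}.get(ext, ext)


def type_breakdown(paths):
    """Count Googlebot requests per file type."""
    return Counter(file_type(p) for p in paths)


paths = ["/", "/shoes/", "/about.html", "/static/site.css",
         "/static/app.js", "/static/app.js?v=2", "/sitemap.xml", "/guide.pdf"]
print(type_breakdown(paths))
```

Here only three of the eight requests are HTML pages, even though all eight would count towards the total.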

Visits By ALL The 10 Types Of Googlebots Are Recorded Together In GSC

• Web
• Image
• News
• Video
• Feature Phone
• Smartphone
• Mobile AdSense
• AdSense
• AdsBot
• App Crawler

ALL The Googlebot Family
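Because the family is lumped together in GSC, separating it in your own logs means classifying user-agent strings. The tokens below are ones Google has documented for its crawlers, but some (such as Googlebot-News) are robots.txt tokens rather than strings you will always see in logs, the Mobile AdSense rule is an assumption, and any user agent can be spoofed. Treat this as a rough sketch, not a definitive classifier.

```python
from collections import Counter

# Ordered rules: more specific tokens are tested first, because e.g.
# "AdsBot-Google-Mobile-Apps" also contains "AdsBot-Google" and
# "Googlebot-Image" also contains "Googlebot".
RULES = [
    ("App Crawler", lambda ua: "AdsBot-Google-Mobile-Apps" in ua),
    ("AdsBot", lambda ua: "AdsBot-Google" in ua),
    ("Mobile AdSense", lambda ua: "Mediapartners-Google" in ua and "Mobile" in ua),  # assumption
    ("AdSense", lambda ua: "Mediapartners-Google" in ua),
    ("Image", lambda ua: "Googlebot-Image" in ua),
    ("News", lambda ua: "Googlebot-News" in ua),
    ("Video", lambda ua: "Googlebot-Video" in ua),
    ("Feature Phone", lambda ua: "Googlebot-Mobile" in ua),
    ("Smartphone", lambda ua: "Googlebot" in ua and "Mobile" in ua),
    ("Web", lambda ua: "Googlebot" in ua),
]


def classify(user_agent):
    """Return a Googlebot-family label for a user-agent string, or None."""
    for label, matches in RULES:
        if matches(user_agent):
            return label
    return None


DESKTOP = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
SMARTPHONE = ("Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 "
              "(compatible; Googlebot/2.1; +http://www.google.com/bot.html)")
print(Counter(classify(ua) for ua in [DESKTOP, SMARTPHONE, SMARTPHONE, "Googlebot-Image/1.0"]))
```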

It Also Includes All 200 And 30X Responses

• That massive crawl you thought you just got on new or existing pages (200 OKs) could also be many, many 30X redirections

• Especially when using * wildcard redirections on large sites

• No 400, 500, robotted or unreachable URLs are recorded here

https://support.google.com/webmasters/answer/35253
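Splitting Googlebot's responses by status class in your own logs shows how much of a ‘crawl spike’ is redirects. A minimal sketch, assuming you have already extracted the status codes; following the slide, 200 and 30X responses are treated as what the GSC figure includes, and the sample codes are illustrative.

```python
from collections import Counter


def status_breakdown(status_codes):
    """Bucket Googlebot responses by status class."""
    buckets = Counter()
    for code in status_codes:
        if code == 200:
            buckets["200"] += 1
        elif 300 <= code < 400:
            buckets["30X"] += 1
        elif 400 <= code < 500:
            buckets["4XX"] += 1
        elif 500 <= code < 600:
            buckets["5XX"] += 1
        else:
            buckets["other"] += 1  # e.g. 201 or 204
    return buckets


codes = [200, 200, 301, 301, 301, 302, 404, 503]
b = status_breakdown(codes)
in_crawl_stats = b["200"] + b["30X"]  # what the slide says the GSC figure includes
print(b, in_crawl_stats, b["30X"] / in_crawl_stats)
```

In this sample, two thirds of the responses that would be counted are redirects rather than pages.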

GSC Doesn’t Even Show You WHAT URLs Have Been Crawled & When

It will likely be just a few URLs being crawled very often, some very rarely and most others somewhere in between – YOU NEED TO KNOW
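Per-URL crawl frequency is exactly what logs give you and GSC does not. A minimal sketch, assuming the (URL, timestamp) pairs have already been pulled from Googlebot lines in Apache/Nginx combined-log format; the sample hits are illustrative.

```python
from collections import defaultdict
from datetime import datetime

LOG_TIME = "%d/%b/%Y:%H:%M:%S %z"  # timestamp format used by the combined log format


def crawl_frequency(hits):
    """hits: iterable of (url, timestamp-string) pairs from Googlebot log lines.

    Returns {url: (hit_count, mean hours between hits, or None for a single hit)}.
    """
    times = defaultdict(list)
    for url, stamp in hits:
        times[url].append(datetime.strptime(stamp, LOG_TIME))
    report = {}
    for url, ts in times.items():
        ts.sort()
        gaps = [(b - a).total_seconds() / 3600 for a, b in zip(ts, ts[1:])]
        report[url] = (len(ts), sum(gaps) / len(gaps) if gaps else None)
    return report


hits = [
    ("/", "10/Nov/2016:00:00:00 +0000"),
    ("/", "10/Nov/2016:06:00:00 +0000"),
    ("/", "10/Nov/2016:12:00:00 +0000"),
    ("/old-tag/", "01/Nov/2016:09:30:00 +0000"),
]
for url, (count, gap) in sorted(crawl_frequency(hits).items(), key=lambda kv: -kv[1][0]):
    print(url, count, gap)
```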

REALITY – Server Logs & Log Analysis Is Where It’s At

AUTOMATE SERVER LOG RETRIEVAL VIA CRON JOB

grep Googlebot access_log >googlebot_access.txt
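The grep above keeps every line containing ‘Googlebot’, and anyone can send that user agent. Google describes verifying Googlebot with a reverse DNS lookup (host in googlebot.com or google.com) followed by a forward lookup that must return the same IP. A minimal Python sketch of both steps follows; it handles IPv4 only, and the verification function makes live DNS queries, so the demo does not call it.

```python
import socket


def googlebot_lines(lines):
    """Python equivalent of the grep: keep lines that mention 'Googlebot' anywhere."""
    return [line for line in lines if "Googlebot" in line]


def is_verified_googlebot(ip):
    """Reverse DNS lookup, then forward-confirm, as Google describes for verifying Googlebot.

    Makes live DNS queries (IPv4 only), so it is defined here but not called in the demo.
    """
    try:
        host = socket.gethostbyaddr(ip)[0]
    except OSError:  # socket.herror is a subclass of OSError
        return False
    if not host.endswith((".googlebot.com", ".google.com")):
        return False
    try:
        return ip in socket.gethostbyname_ex(host)[2]
    except OSError:  # socket.gaierror is a subclass of OSError
        return False


sample = [
    '66.249.66.1 - - [10/Nov/2016:06:25:14 +0000] "GET / HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.5 - - [10/Nov/2016:06:25:15 +0000] "GET / HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
]
print(len(googlebot_lines(sample)))
```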

Use Tools Or Just Export, Convert Data & Use Mr Mu’s Spreadsheet

Spreadsheet - https://goo.gl/1pToL8
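‘Export, convert data’ usually means turning raw log lines into columns a spreadsheet can sort and pivot. A minimal sketch, assuming Apache/Nginx combined-log format; the column layout the spreadsheet expects is not described here, so the CSV columns below are an assumption, and lines in other formats are simply skipped.

```python
import csv
import io
import re

# Apache/Nginx "combined" log format; other formats need a different pattern.
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)
FIELDS = ["ip", "time", "method", "url", "status", "bytes", "referrer", "agent"]


def log_to_csv(lines, out):
    """Write parsed log lines as CSV rows to the file-like `out`; returns rows written."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    written = 0
    for line in lines:
        match = COMBINED.match(line)
        if match:  # non-matching lines are skipped
            writer.writerow(match.groupdict())
            written += 1
    return written


line = ('66.249.66.1 - - [10/Nov/2016:06:25:14 +0000] "GET /shoes/ HTTP/1.1" 301 0 "-" '
        '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"')
buf = io.StringIO()
print(log_to_csv([line, "not a log line"], buf))
print(buf.getvalue())
```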

For The Avoidance Of Doubt – I Asked To Be Sure

Why Does This Matter? On A Large Site You Need To Be Able To See Through ‘Spider Eyes’

You need to see what Googlebot ‘REALLY’ thinks of your site

Myth – It’s The Number Of ‘Pages’ Crawled In GSC Crawl Stats Divided By Days

For all of the reasons in the previous 7+ slides
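A small worked example makes the gap concrete. The sketch assumes hits already parsed from your logs as (URL, status, file type) tuples over a hypothetical two-day window; ‘distinct HTML URLs returning 200’ is one reasonable alternative measure, not an official definition of crawl budget.

```python
def naive_daily_rate(total_requests, days):
    """The myth: every request in the GSC figure read as a 'page' crawled per day."""
    return total_requests / days


def html_pages_per_day(hits, days):
    """hits: (url, status, file_type) tuples parsed from your own logs.

    Distinct HTML URLs that returned 200 during the period, averaged per day.
    """
    pages = {url for url, status, kind in hits if status == 200 and kind == "html"}
    return len(pages) / days


hits = [
    ("/", 200, "html"), ("/", 200, "html"), ("/old/", 301, "html"), ("/older/", 301, "html"),
    ("/site.css", 200, "css"), ("/app.js", 200, "js"), ("/shoes/", 200, "html"), ("/guide.pdf", 200, "pdf"),
]
print(naive_daily_rate(len(hits), 2), html_pages_per_day(hits, 2))
```

The naive reading gives 4 ‘pages’ a day; only 2 distinct HTML pages were actually fetched successfully over the 2 days.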

Myth – Googlebot Crawls Through Your Website From One End To The Other, Then Starts Again
• This is where it gets complicated
• Web crawl efficiency is key
• There is an order to things
• Minimizing visibility of existing stale content is key too – the rest of the web is changing
• Fresh results are vital to searchers

“What I Think You Are Talking About Is Scheduling” (Illyes, Google)

Remember that time when Mr Mu kicked Andrey under the table?

(joking JJ)

Why Web Crawling Efficiency?

“WE ARE ALL PUBLISHERS”

THE NUMBER OF WEBSITES DOUBLED BETWEEN 2011 AND 2012, AND GREW AGAIN BY 1/3 IN 2014

The Content ‘Explosion’

“We don't index every one of those trillion pages -- many of them are similar to each other” (J Alpert, Google)

“There’s a needle in here somewhere”

“It’s an important needle too” … if only we could identify it

“So how many unique pages does the web really contain? We don't know; we don't have time to look at them all!” (J Alpert, Google)

The Duplicate Content ‘Penalty’ Myth
• ‘Real’ duplicates (matching content checksum) are filtered and not indexed

“Each content filter sends the retrieved web pages to Dupserver to determine if they are duplicates of other web pages”

http://www.google.ch/patents/US20120317089
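The ‘matching content checksum’ idea can be reproduced on your own pages to find exact duplicates before a search engine does. The patent does not specify the normalisation or the hash, so whitespace-collapsing, lower-casing and SHA-256 below are stand-ins, and the page texts are illustrative.

```python
import hashlib
import re
from collections import defaultdict


def content_checksum(text):
    """SHA-256 of normalised text (whitespace collapsed, lower-cased); normalisation is assumed."""
    normalised = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


def exact_duplicate_groups(pages):
    """pages: {url: extracted text}. Returns sorted groups of URLs whose checksums match."""
    groups = defaultdict(list)
    for url, text in pages.items():
        groups[content_checksum(text)].append(url)
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


pages = {
    "https://example.com/shoes": "Red shoes\n  Size 5",
    "https://example.com/shoes?sort=price": "red shoes size 5",
    "https://example.com/boots": "Brown boots",
}
print(exact_duplicate_groups(pages))
```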

Duplication & ‘The Battle To Be The Single URL / Content Fingerprint’

URL / CONTENT FINGERPRINT

REDIRECT

YOU HAVE THE POWER TO CHOOSE ‘THE ONE’

CANONICALIZATION, HREFLANG, CONSISTENT SIGNALS INTERNALLY
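Choosing ‘the one’ is easier to apply consistently once it is written down as a rule. A minimal sketch of such a rule follows; the policy (https, the www host and a list of parameters that never change content) and the `preferred_url` helper are hypothetical examples of a site's own choice, not a rule any search engine imposes.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical site policy: your own choices go here.
PREFERRED_SCHEME = "https"
PREFERRED_HOST = "www.example.com"
HOST_ALIASES = {"example.com", "www.example.com"}
IGNORED_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sessionid"}


def preferred_url(url):
    """Rewrite a URL variant to the single chosen version (fragment and port are dropped)."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if host in HOST_ALIASES:
        host = PREFERRED_HOST
    kept = sorted((k, v) for k, v in parse_qsl(parts.query) if k not in IGNORED_PARAMS)
    return urlunsplit((PREFERRED_SCHEME, host, parts.path or "/", urlencode(kept), ""))


def variants_to_redirect(urls):
    """Map each non-preferred variant to the URL it should 301 to."""
    return {u: preferred_url(u) for u in urls if preferred_url(u) != u}


print(variants_to_redirect([
    "https://www.example.com/shoes",
    "http://example.com/shoes?utm_source=mail",
    "HTTP://Example.com",
]))
```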

NON-PREFERRED VERSION ‘IMPOSTER INDEXATION’ & ‘TOO SIMILAR’ CONTENT

The wrong version of your URL is selected and indexed

Users may pick the wrong version of the duplicate content and link to that one, so signals are dissipated
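‘Too similar’ content is not caught by an exact checksum. One common near-duplicate technique is word shingling with Jaccard similarity; it is swapped in here for illustration and is not necessarily what Google uses. The shingle size and the 0.5 threshold are assumptions.

```python
def shingles(text, k=3):
    """Set of k-word shingles (texts shorter than k words yield one short shingle)."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}


def jaccard(a, b):
    """Shared shingles divided by all distinct shingles."""
    return len(a & b) / len(a | b) if a | b else 1.0


def too_similar(text_a, text_b, threshold=0.5, k=3):
    """Hypothetical threshold: flag page pairs whose shingle overlap reaches it."""
    return jaccard(shingles(text_a, k), shingles(text_b, k)) >= threshold


uk = "blue widget with free delivery to all uk addresses"
us = "blue widget with free delivery to all us addresses"
print(jaccard(shingles(uk), shingles(us)), too_similar(uk, us))
```

Changing a single word leaves 5 of the 9 distinct shingles shared, so the two pages are flagged as ‘too similar’ under this threshold.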

De-duping, URL Sorting & Scheduling

Original image – https://patentimages.storage.googleapis.com/US8666964B1/US08666964-20140304-D00004.png

https://www.google.com/patents/US8666964

Lots and lots of patents on crawling efficiency

Important Pages Are Crawled More Frequently

These pages are important and need to be up to date. They cannot be returned as stale data
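One way to picture ‘important pages are crawled more frequently’ is a function from an importance score to a revisit interval. The mapping below (geometric interpolation between one hour and thirty days) and its bounds are hypothetical; no search engine publishes such a function.

```python
def revisit_hours(importance, min_hours=1.0, max_hours=24.0 * 30):
    """Hypothetical mapping from importance in [0, 1] to hours between recrawls.

    Interpolates geometrically: importance 0 -> max_hours, importance 1 -> min_hours.
    """
    if not 0.0 <= importance <= 1.0:
        raise ValueError("importance must be in [0, 1]")
    return max_hours * (min_hours / max_hours) ** importance


for score in (1.0, 0.5, 0.0):
    print(score, round(revisit_hours(score), 1))
```

A geometric rather than linear scale is used so that small gains in importance at the top end buy large gains in freshness, which matches the idea that the most important pages must never be returned stale.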

Depth Of Crawl Is Greater In Higher Quality Sections Of Sites

• Important grandparents and parents beget ‘important’ children and grandchild URLs

• Higher quality site sections (descendants) get crawled more
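The ‘important parents beget important children’ idea can be sketched as scores flowing down a site tree. The inheritance rule (a child keeps its own score plus half of its parent's final score), the scores and the URLs are all hypothetical, and the sketch assumes a strict tree in which every page has one parent.

```python
def propagate(tree, own_score, root, inherit=0.5):
    """tree: {url: [child urls]}; own_score: {url: float} from page-level signals.

    Hypothetical rule: child score = own score + inherit * parent's final score.
    Assumes a strict tree (one parent per page, no cycles).
    """
    scores = {root: own_score.get(root, 0.0)}
    stack = [root]
    while stack:
        parent = stack.pop()
        for child in tree.get(parent, []):
            scores[child] = own_score.get(child, 0.0) + inherit * scores[parent]
            stack.append(child)
    return scores


tree = {"/": ["/guides/", "/tags/"], "/guides/": ["/guides/crawl-budget/"], "/tags/": ["/tags/misc/"]}
own = {"/": 1.0, "/guides/": 0.8, "/tags/": 0.1, "/guides/crawl-budget/": 0.5, "/tags/misc/": 0.5}
print(propagate(tree, own, "/"))
```

Two grandchildren with identical page-level scores end up apart because one sits under a stronger section.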

Low Quality Sites Get Crawled Less Frequently

They are low importance – https://support.google.com/webmasters/answer/35253

Myth – It’s Based Just On PageRank

“There’s a ‘shit-ton’ of other stuff going on which plays an important role” (Illyes, Google)

PageRank Has Become Just One Of Very Many Things

“WHATEVER YOU ARE THINKING… WHETHER IT BE ABOUT CRAWLING OR RANKING… IT (PAGERANK) HAS BECOME JUST ONE OF VERY MANY THINGS” (Andrey Lipattsev, Google, 2016)

It’s Mostly Driven By ‘Importance’

“SCHEDULING IS MOSTLY DRIVEN BY IMPORTANCE” (Illyes, Google)

IMPORTANCE MAY INCLUDE PAGERANK (Patents) … BUT IT IS ONLY A PART OF IT

RANKING IS ALSO DRIVEN BY IMPORTANCE (IN PART)
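‘PageRank is only a part of it’ can be illustrated as one term in a weighted sum of signals. Every signal name and weight below is hypothetical; the point is only that a page strong on PageRank alone can score below a page strong on everything else.

```python
# Hypothetical signals (each pre-scaled to 0..1) and weights; no search engine publishes these.
WEIGHTS = {
    "pagerank": 0.2,
    "internal_links": 0.3,
    "shallow_depth": 0.2,
    "in_sitemap": 0.1,
    "searcher_demand": 0.2,
}


def importance(signals, weights=WEIGHTS):
    """Weighted sum of normalised signals; missing signals count as 0."""
    return sum(w * signals.get(name, 0.0) for name, w in weights.items())


pagerank_only = {"pagerank": 1.0}
everything_else = {"internal_links": 1.0, "shallow_depth": 1.0, "in_sitemap": 1.0, "searcher_demand": 1.0}
print(importance(pagerank_only), importance(everything_else))
```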

Page (URL) Importance Is Mahoossively Important (May Include PageRank)

PAGE IMPORTANCE – The importance of a page independent of a query
• Location in Site (e.g. home page more important than parameter 3 level output)
• PageRank
• Page type / file type
• Internal PageRank
• Internal Backlinks
• In-site Anchor Text Consistency
• Relevance (content, anchors and elements) to a topic (Similarity Importance)
• Directives from in-page robots and robots.txt management
• Parent quality brushes off on child page quality
• Inclusion in XML sitemaps and the index

IMPORTANT PARENTS LIKELY SEEN TO HAVE IMPORTANT CHILD PAGES

Several Google Patents
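‘Internal PageRank’ can be estimated on your own link graph with the classic power-iteration PageRank. This is the textbook algorithm with the usual 0.85 damping factor, swapped in for illustration; it is not Google's current computation, and the link graph is made up.

```python
def internal_pagerank(links, damping=0.85, iterations=50):
    """links: {page: [pages it links to]} for your own site only.

    Classic power-iteration PageRank; pages with no outlinks spread their rank evenly.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new = {p: (1 - damping) / n + damping * dangling / n for p in pages}
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new[target] += share
        rank = new
    return rank


links = {"/": ["/a", "/b", "/c"], "/a": ["/"], "/b": ["/"], "/c": ["/", "/a"]}
for page, score in sorted(internal_pagerank(links).items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

The extra internal link from /c is enough to lift /a above its sibling /b.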

But… Importance Signals From Whom? 3 Types Of ‘Importance Signal Sender’

• SEARCHERS – Looking for results, creating queries, triggering impressions, demanding freshness
• WEBMASTERS – Hreflang, canonicalization, internal links, sitemap and index inclusion, information architecture, anchors, building content at a URL on a topic
• LINKERATI – Passing PageRank

AND WHY IS ‘IMPORTANCE’ SO IMPORTANT?

Concept Of Search Engine Embarrassment

A concept originally attributed mostly to Joel Wolf

Search Engine Embarrassment

Credit: Joel Wolf et al. – GOODNESS & BADNESS IN SEARCH ENGINE EMBARRASSMENT

Concept of using probability estimates to revisit web pages ‘just in time’ and based around limiting ‘likelihood of stale pages being exposed’ to searchers

Search Engine Embarrassment

Probability(Seen_Stale_Data) = Function(User_View_Rate, Document_Update_Rate, Web_Crawl_Interval)

Search Engine Embarrassment

User_View_Rate – Likelihood of the document being seen
+
Document_Update_Rate – How often it has material changes
+
Web_Crawl_Interval – How often it is currently crawled

COMBINED TO CALCULATE

Probability(Seen_Stale_Data) = Risk of Search Engine Embarrassment?

‘JUST IN TIME SMART CRAWLING’
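To make the function concrete, one simple model (an assumption, not Wolf et al.'s exact formulation) treats material changes as a Poisson process with rate λ per hour and recrawls every T hours. A page then stays fresh a time t after a crawl with probability e^(−λt); averaging over the interval gives P(stale) = 1 − (1 − e^(−λT)) / (λT), and multiplying by User_View_Rate gives stale impressions per hour, a rough ‘embarrassment’ score. All rates in the sketch are illustrative.

```python
import math


def p_stale(update_rate, crawl_interval):
    """Chance a searcher sees a stale copy at a random moment.

    Assumes material changes arrive as a Poisson process (update_rate per hour)
    and the page is recrawled every crawl_interval hours.
    """
    x = update_rate * crawl_interval
    if x == 0:
        return 0.0
    return 1.0 - (1.0 - math.exp(-x)) / x


def expected_stale_views(view_rate, update_rate, crawl_interval):
    """Stale impressions per hour: how often searchers would catch the engine out."""
    return view_rate * p_stale(update_rate, crawl_interval)


daily = 1 / 24  # one material change per day
print(expected_stale_views(100, daily, 24))  # popular page, crawled daily
print(expected_stale_views(1, daily, 24))    # rarely viewed page, crawled daily
print(expected_stale_views(100, daily, 2))   # popular page, crawled every 2 hours
```

Under this model, ‘just in time’ crawling means spending fetches where view rate × P(stale) is largest: the popular page justifies far more frequent crawling than the rarely viewed one with the same update rate.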

THEORY - Search Engine Embarrassment

Joel Wolf’s ‘Optimal Crawling Strategies For Web Search Engines’ (Search Engine Embarrassment) Paper Is Cited In This Google Patent

Triggering More ’Real Searcher Impressions’

A SMALL TEST

THE PAGES BECAME ARGUABLY MORE IMPORTANT

CRAWLING IMPROVED
RANKING IMPROVED
TRAFFIC IMPROVED

Myth – Don’t We Just Have To Make Random Changes To Get Crawled More?

NOT ALL CHANGE IS CREATED EQUAL

WHAT Changed? Was it important?

https://www.seroundtable.com/google-crawl-frequency-ranking-21153.html

HINTS & CRITICAL MATERIAL CHANGE

$C = \sum_{i=0}^{n-1} \mathrm{weight}_i \times \mathrm{feature}_i$
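The change formula can be read as a weighted sum of change features compared against a threshold for ‘critical material change’. The feature names, weights and threshold below are hypothetical (the formula gives their form, not their values); the sketch only shows why a refreshed date stamp or rotated ad block can fall well short of the threshold.

```python
# Hypothetical weights and threshold; the slide's formula names weights and features only.
CHANGE_WEIGHTS = {
    "main_content_changed": 0.6,
    "title_changed": 0.2,
    "links_changed": 0.15,
    "ad_block_changed": 0.05,
    "date_stamp_changed": 0.0,
}
THRESHOLD = 0.5


def change_score(features, weights=CHANGE_WEIGHTS):
    """C = sum of weight_i * feature_i; features are 0/1 flags or 0..1 values."""
    return sum(weights[name] * value for name, value in features.items() if name in weights)


def is_material_change(features):
    """True when the weighted change reaches the (hypothetical) threshold."""
    return change_score(features) >= THRESHOLD


print(is_material_change({"date_stamp_changed": 1, "ad_block_changed": 1}))
print(is_material_change({"main_content_changed": 1}))
```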

Randomization & Lying About ‘Change’ To Googlebot Won’t Help

• NOT ALL CHANGE IS IMPORTANT ENOUGH TO BE RECRAWLED
• DO NOT TRY TO MANIPULATE ‘CHANGE’
• You can’t get more crawl just by changing your pages alone & you may actually be doing your site harm
• WHY – Because… ‘hints’ & ‘thresholds’ are designed to pick up on this
• If every URL changes on every request, the header response will always say modified since (the current date)
• Randomization and shuffling could be preventing Googlebot from crawling the important pages
• Last-modified is taken into consideration, IF it is correct
• XML sitemap priority == ignored so don’t make it up
• XML sitemap change frequency == ignored so don’t make it up

‘IMPORTANCE’ BEATS ‘CHANGE’
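An honest Last-Modified is one driven by when the content last materially changed, not by when the response was generated. A minimal sketch of conditional GET handling on that basis follows; the `last_changed` value is a hypothetical CMS field, and a real server would also handle ETag/If-None-Match (RFC 7232), which is omitted here.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_response(last_changed, if_modified_since=None):
    """Return (status, headers) for a GET, based on when the content really last changed.

    last_changed: UTC datetime recorded when the page last had a material change
    (a hypothetical CMS field), never the time this response is generated.
    """
    headers = {"Last-Modified": format_datetime(last_changed, usegmt=True)}
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):  # malformed header: ignore it (older Pythons raise TypeError)
            since = None
        if since is not None:
            if since.tzinfo is None:  # "-0000" dates parse as naive; treat them as UTC
                since = since.replace(tzinfo=timezone.utc)
            # HTTP dates have one-second resolution, so drop microseconds before comparing
            if last_changed.replace(microsecond=0) <= since:
                return 304, headers
    return 200, headers


changed = datetime(2016, 11, 1, 9, 30, tzinfo=timezone.utc)
print(conditional_response(changed, "Tue, 01 Nov 2016 09:30:00 GMT"))
print(conditional_response(changed, "Mon, 31 Oct 2016 09:30:00 GMT"))
```

A crawler revisiting an unchanged page gets a cheap 304; stamping every response with the current time would turn every revisit into a full 200.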

‘Crawl Rank’ – Causation or Correlation?
• By getting your URLs crawled more frequently, do they automatically rank higher?
• “A lot of people confuse crawling with ranking” (John Mu)
• Crawl Rank – It seems this is more correlation than causation
• You got your URLs crawled more by making them more important (e.g. via internal linking strategies, canonicalization, hreflang, merging and improving thin content, updating with fresh and rich content on a topic) … and they subsequently ranked higher

“Often times, it is kind of a relationship that, when we think something is important we tend to crawl it more frequently and that might be more visible in search” John Mueller, Google

The Four Main Types Of Cannibalisation – SlideShare, @jonearnshaw: http://www.slideshare.net/jonathanearnshaw/seo-46813620

Consistently Avoiding Importance Cannibalisation

You must be consistently clear in emphasising the ‘importance’ of the right version of your ‘special ones’ (your key most important URLs).

Consistently avoiding ‘Mixed Signals’ & Skewed URL Importance

GOOGLE CAN GET CONFUSED AS TO WHICH PAGE IT SHOULD RANK FROM YOUR SITE FOR KEY TERMS – BE CLEAR ON TARGETS
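A quick way to spot mixed signals in your own internal linking is to check whether the same anchor text points at more than one target URL. A minimal sketch, assuming (anchor text, target URL) pairs exported from a crawl of your own site; case-insensitive matching of anchors is a simplifying assumption.

```python
from collections import defaultdict


def mixed_signals(internal_links):
    """internal_links: (anchor_text, target_url) pairs from a crawl of your own site.

    Returns anchors (lower-cased) that point at more than one target URL.
    """
    targets = defaultdict(set)
    for anchor, url in internal_links:
        targets[anchor.strip().lower()].add(url)
    return {anchor: sorted(urls) for anchor, urls in targets.items() if len(urls) > 1}


links = [
    ("Red Shoes", "/shoes/red/"),
    ("red shoes", "/sale/red-shoes/"),
    ("Boots", "/boots/"),
]
print(mixed_signals(links))
```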

Consistency - Avoiding ‘importance dissipation’ from generational cruft

Consider keeping the same URL for annual events and optimise the content for the current year

“Choose a URL structure that can stand the test of time” (John Mu, Google)

Cool URIs (And URLs) Don’t Change
• The iterative drip, drip, drip of importance
• Nurture & mature (grow) importance
• Consistent importance signals ongoing
• Think URL as well as URI

“…many, many things can change and your URIs can and should stay the same” (Sir Tim Berners-Lee)

COOL URIs DON’T CHANGE – https://www.w3.org/Provider/Style/URI

“allocate URIs which you will be able to stand by in 2 years, in 20 years, in 200 years” (Sir Tim Berners-Lee)

IMPORTANCE VIA CONSISTENCY

“all over the Web, webmasters are making decisions which will make it really difficult for themselves in the future” (Sir Tim Berners-Lee)

Don’t Let That Be You

THANK YOU
TWITTER – @dawnieando
GOOGLE+ – +DawnAnderson888
LINKEDIN – msdawnanderson
www.move-it-marketing.co.uk

Importance Via Internal Links

Most Important Page 1

Most Important Page 2

Most Important Page 3

IS THIS YOUR BLOG?? HOPE NOT

https://support.google.com/webmasters/answer/138752?hl=en
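One way to check whether your most important URLs are buried, for example behind a long chain of paginated blog archives, is to measure click depth from the home page with a breadth-first search over your internal links. The link graph below is illustrative; in practice it would come from a crawl of your own site.

```python
from collections import deque


def click_depth(links, home="/"):
    """Breadth-first search over internal links: fewest clicks from home to each URL.

    URLs that cannot be reached from home are absent from the result.
    """
    depth = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depth:
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth


links = {
    "/": ["/blog/", "/money-page-1/"],
    "/blog/": ["/blog/page/2/"],
    "/blog/page/2/": ["/blog/page/3/"],
    "/blog/page/3/": ["/money-page-2/"],
}
print(click_depth(links))
```

Here the second important page sits four clicks deep, reachable only through the pagination chain.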

Descending Importance Clues Via Internal Links (Breadcrumbs)

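[Figure: a single text output only breadcrumb, Home > Category > Sub > Product; the labels MOST, FEWER, FEWER mark internal links decreasing from the home page down the trail]

Image credit: https://www.smashingmagazine.com/2009/03/breadcrumbs-in-web-design-examples-and-best-practices/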
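The ‘most … fewer … fewer’ pattern falls out of how breadcrumbs link: every page links to each of its ancestors, so pages higher in the trail collect more internal links. A minimal sketch that counts those links, assuming one trail per page with the current page as the last, unlinked item; the trails are illustrative.

```python
from collections import Counter


def breadcrumb_inlinks(trails):
    """trails: one breadcrumb trail per page, e.g. ["Home", "Category", "Sub", "Product"].

    Each page links to every ancestor in its trail; count the resulting internal links.
    """
    counts = Counter()
    for trail in trails:
        for ancestor in trail[:-1]:  # the last item is the current page itself (not linked)
            counts[ancestor] += 1
    return counts


trails = [
    ["Home", "Shoes", "Running", "Product A"],
    ["Home", "Shoes", "Running", "Product B"],
    ["Home", "Shoes", "Trail", "Product C"],
    ["Home", "Shoes", "Trail", "Product D"],
    ["Home", "Shoes", "Running"],
    ["Home", "Shoes", "Trail"],
    ["Home", "Shoes"],
]
print(breadcrumb_inlinks(trails))
```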

YES? … YOU’RE IN
NO? … YOU’RE OUT
(sitemaps and index)

Importance By Inclusion (& Unimportance Via Exclusion)
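‘Importance by inclusion’ suggests listing only the URLs you want treated as important in XML sitemaps. A minimal sketch using the sitemaps.org namespace, assuming hypothetical CMS fields (`url`, `canonical`, `status`, `noindex`, `lastmod`); only 200-status, indexable, self-canonical URLs are included, and priority and changefreq are left out.

```python
import xml.etree.ElementTree as ET

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """pages: dicts with url, canonical, status, noindex and optional lastmod (hypothetical fields).

    Only 200-status, indexable, self-canonical URLs are included.
    """
    urlset = ET.Element("urlset", xmlns=NS)
    for page in pages:
        if page["status"] != 200 or page["noindex"] or page["canonical"] != page["url"]:
            continue  # exclusion: left out of the sitemap on purpose
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = page["url"]
        if page.get("lastmod"):
            ET.SubElement(url, "lastmod").text = page["lastmod"]
    return ET.tostring(urlset, encoding="unicode")


pages = [
    {"url": "https://www.example.com/shoes/", "canonical": "https://www.example.com/shoes/",
     "status": 200, "noindex": False, "lastmod": "2016-11-01"},
    {"url": "https://www.example.com/shoes/?sort=price", "canonical": "https://www.example.com/shoes/",
     "status": 200, "noindex": False},
    {"url": "https://www.example.com/basket/", "canonical": "https://www.example.com/basket/",
     "status": 200, "noindex": True},
    {"url": "https://www.example.com/old-shoes/", "canonical": "https://www.example.com/old-shoes/",
     "status": 301, "noindex": False},
]
print(build_sitemap(pages))
```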

Importance Via Consistently Indicating ‘Correct Version’ of Duplicates

• Canonicalisation
• Choose one https / http / non-www / www version and 301 redirect the others
• Eliminate ‘too similar’ URLs
• Consistency of internal link targets (right site version, right target for keywords / topics / topic intent / user intent)
• Right version inclusion in XML sitemaps
• Re-optimization / unpicking of 30X redirect chains internally and externally
• Review of internal links in GSC for ‘skew’
• Review of existing content to improve on topic for ‘importance’
• Save / nurture the URL (think for the long term in URL planning)
• Breadcrumbs
• Minimize boilerplate content
• Minimize regurgitated content in various parts of your site
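Unpicking redirect chains starts with resolving each redirecting URL to its final destination and counting the hops. A minimal sketch, assuming you have a {source: target} map of every 30X on the site; the 10-hop cap and the example URLs are assumptions, and loops are reported rather than followed.

```python
def flatten_redirects(redirects, max_hops=10):
    """redirects: {source_url: target_url} for every 30X on the site.

    Returns {source: (final_url, hops)}; raises ValueError on loops or overlong chains.
    """
    result = {}
    for source in redirects:
        seen = {source}
        url, hops = redirects[source], 1
        while url in redirects:
            if url in seen or hops >= max_hops:
                raise ValueError(f"redirect loop or overlong chain starting at {source}")
            seen.add(url)
            url, hops = redirects[url], hops + 1
        result[source] = (url, hops)
    return result


redirects = {
    "http://example.com/shoes": "https://example.com/shoes",
    "https://example.com/shoes": "https://www.example.com/shoes",
    "https://www.example.com/old-shoes": "https://www.example.com/shoes",
}
flat = flatten_redirects(redirects)
to_fix = {src: final for src, (final, hops) in flat.items() if hops > 1}
print(to_fix)  # sources that should point straight at their final URL
```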

SOURCES
• Scheduler For Search Engine Crawler – http://www.google.ch/patents/US20120317089
• We Knew The Web Was Big – https://googleblog.blogspot.co.uk/2008/07/we-knew-web-was-big.html
• https://www.youtube.com/watch?v=GVKcMU7YNOQ
• http://webpromo.expert/google-qa-duplicate-content/

• http://webpromo.expert/google-qa-crawlingrendering/
• https://twitter.com/dergal/status/777782401497980928
• Cool URIs Don’t Change – https://www.w3.org/Provider/Style/URI
• https://searchenginewatch.com/2016/04/06/webpromos-qa-with-googles-andrey-lipattsev-transcript/
• https://www.youtube.com/watch?v=Wcnz1kCoiks
• https://www.youtube.com/watch?v=MryA3F0ySew
• ‘Optimal Crawling Strategies For Web Search Engines’ – http://dl.acm.org/citation.cfm?id=511465