News Specific Crawl Errors


Page 1: News Specific Crawl Errors

Google Webmasters Central Tool

News Specific Crawl Errors

Page 2: News Specific Crawl Errors

Article Disproportionately Short

Causes:

• Too many snippets for related articles.
• Features such as ‘Send this article to friends’ with long descriptions.
• User comments.

If the problem persists, contact Google.

Page 3: News Specific Crawl Errors

Article Fragmented

Causes:

• Try formatting your articles into text paragraphs of a few sentences each.
• Make sure your sentences are well punctuated.
• Make sure you don't use frequent <br> and <p> tags within your paragraphs, and try to avoid breaking up the article body in general.
• Consider removing some of the non-article text from the article page.

If the problem persists, contact Google.

Page 4: News Specific Crawl Errors

Article Too Long

Causes: If user comments are being counted as part of the article, try:

• Enclosing them in an iframe.
• Dynamically fetching them with AJAX.
• Moving part of the comments to an adjacent page.

If the problem persists, contact Google.

Page 5: News Specific Crawl Errors

Article Too Short

Causes:

• Try formatting your articles into text paragraphs of a few sentences each.
• If the article content appears to contain too few words to be a news article, we won't be able to include it. Make sure your articles have more than 80 words.

If the problem persists, contact Google.
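The 80-word minimum can be checked before publishing. The following is a minimal sketch, assuming that words are whitespace-separated tokens; how Google actually counts words is not stated on the slide, and the names `MIN_WORDS` and `is_long_enough` are hypothetical.

```python
MIN_WORDS = 80  # threshold quoted on the slide


def count_words(text: str) -> int:
    """Count whitespace-separated tokens (an assumed definition of a word)."""
    return len(text.split())


def is_long_enough(text: str, min_words: int = MIN_WORDS) -> bool:
    """Return True if the article has MORE than min_words words."""
    return count_words(text) > min_words
```

An article body of exactly 80 words fails this check, because the slide asks for more than 80 words.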

Page 6: News Specific Crawl Errors

Date Not Found

Causes:

• Place a clear date and time for each of your articles between the article's title and the article's text, on a separate line of HTML. The date should specify when the article was first published.
• Remove any other dates from the HTML of the article page so that the crawler doesn't mistake them for the correct publication time.
• If you'd like to use a date meta tag, please contact us first. Date meta tags should be of the form <meta name="DC.date.issued" content="YYYY-MM-DD">, where the date is in W3C format, using either the "complete date" (YYYY-MM-DD) format, or the "complete date plus hours, minutes and seconds" (YYYY-MM-DDThh:mm:ssTZD) format with a time zone suffix.
• Create a News Sitemap. The <publication_date> tag will ensure we're able to pick the correct date for your articles.
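To illustrate the News Sitemap option, here is a rough sketch that builds one sitemap entry carrying a <publication_date> element. The namespace URIs and the other element names follow the News Sitemap schema as commonly documented and should be checked against Google's current documentation; the function name, dictionary keys, and example.com URL are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed namespaces: the standard sitemap protocol and Google's News extension.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
NEWS_NS = "http://www.google.com/schemas/sitemap-news/0.9"


def build_news_sitemap(articles):
    """Return a News Sitemap XML string for a list of article dicts."""
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("news", NEWS_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for article in articles:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = article["loc"]
        news = ET.SubElement(url, f"{{{NEWS_NS}}}news")
        publication = ET.SubElement(news, f"{{{NEWS_NS}}}publication")
        ET.SubElement(publication, f"{{{NEWS_NS}}}name").text = article["publication_name"]
        ET.SubElement(publication, f"{{{NEWS_NS}}}language").text = article["language"]
        # W3C-format date of first publication, e.g. "2024-01-15".
        ET.SubElement(news, f"{{{NEWS_NS}}}publication_date").text = article["published"]
        ET.SubElement(news, f"{{{NEWS_NS}}}title").text = article["title"]
    return ET.tostring(urlset, encoding="unicode")


example = build_news_sitemap([{
    "loc": "https://www.example.com/news/article-12345.html",  # hypothetical URL
    "publication_name": "Example Times",
    "language": "en",
    "published": "2024-01-15",
    "title": "Example headline",
}])
```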

Page 7: News Specific Crawl Errors

Empty Article

Causes:

• Make sure that the full text of each of your articles is available in the source code of your article pages (and not embedded in a JavaScript file or iframe, for example).
• Make sure that you're not using a style in the source code of your articles such as "display:none" or "visibility:hidden".
• Make sure the links to your articles lead directly to your article pages rather than to an intermediate page using a JavaScript redirect.
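A quick way to look for the hiding styles mentioned above is to scan the page source for them. This is only a sketch: it matches the two declarations as plain text, so it also fires on matches inside scripts or comments, and it cannot see rules in external stylesheets. The function name is hypothetical.

```python
import re

# Matches "display:none" and "visibility:hidden" with optional whitespace.
HIDDEN_STYLE = re.compile(r"display\s*:\s*none|visibility\s*:\s*hidden", re.IGNORECASE)


def find_hidden_styles(html: str) -> list:
    """Return every hiding declaration found in the raw HTML source."""
    return [match.group(0) for match in HIDDEN_STYLE.finditer(html)]
```

For example, `find_hidden_styles('<div style="display: none">text</div>')` returns a one-element list, which is a hint to inspect that element by hand.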

Page 8: News Specific Crawl Errors

Extraction Failed

Causes:

• Make sure that your title, body, and timestamp are easily crawlable (available as text and not as images, for instance). At this time, this error is primarily for informational purposes. We are actively working to improve our extraction methods so that you'll see this error less often.
• Submit a News Sitemap.

Page 9: News Specific Crawl Errors

Invalid Date Meta Tag

Causes:

• Date <meta> tags should be of the form <meta name="DC.date.issued" content="YYYY-MM-DD">, where the date is in W3C format (http://www.w3.org/TR/NOTE-datetime), using either the "complete date" (YYYY-MM-DD) or "complete date plus hours, minutes and seconds" (YYYY-MM-DDThh:mm:ss) format, with optional fraction and time zone suffixes. The date should specify when the article was first published.
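The two accepted date shapes can be checked with a regular expression plus a calendar check. This is a sketch under the assumption that the time zone suffix is either "Z" or an offset such as "+01:00", as in the W3C note; the function name is hypothetical, and it does not reproduce Google's own validation.

```python
import re
from datetime import datetime

# "Complete date", optionally followed by hh:mm:ss with an optional
# fraction and an optional time zone designator (Z or +hh:mm / -hh:mm).
W3C_DATE = re.compile(
    r"^\d{4}-\d{2}-\d{2}"
    r"(T([01]\d|2[0-3]):[0-5]\d:[0-5]\d(\.\d+)?(Z|[+-]\d{2}:\d{2})?)?$"
)


def is_valid_meta_date(value: str) -> bool:
    """Check the content value of a DC.date.issued meta tag."""
    if not W3C_DATE.match(value):
        return False
    try:
        # Reject impossible calendar dates such as month 13.
        datetime.strptime(value[:10], "%Y-%m-%d")
    except ValueError:
        return False
    return True
```

With this check, "2010-06-01" and "2010-06-01T14:30:00+01:00" pass, while "06/01/2010" and "2010-13-01" are rejected.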

Page 10: News Specific Crawl Errors

No Links Found

Causes:

• Make sure your article URLs contain at least a 3-digit number as specified in the following guidelines. Otherwise, consider submitting your articles through a News Sitemap.
• Make sure your articles are located within the domain of the site included in Google News.
• Check the page that generated the error and make sure it includes crawlable links to news articles. Googlebot-News is best able to crawl HTML links and is unable to crawl image links or links embedded in JavaScript. See our Webmaster Guidelines and tips for creating a Google-friendly site for information on how to ensure your links are crawlable.
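The first two checks above can be sketched as small URL tests. The slide defers the exact URL rule to Google's guidelines, so this sketch only checks for a run of three or more digits; the function names and example.com URLs are hypothetical.

```python
import re
from urllib.parse import urlparse


def has_article_number(url: str) -> bool:
    """True if the URL path or query contains a number of 3 or more digits."""
    parts = urlparse(url)
    return re.search(r"\d{3,}", parts.path + "?" + parts.query) is not None


def in_site_domain(url: str, site_domain: str) -> bool:
    """True if the URL's host is the site domain or one of its subdomains."""
    host = urlparse(url).hostname or ""
    site_domain = site_domain.lower()
    return host == site_domain or host.endswith("." + site_domain)
```

For example, `has_article_number("https://example.com/news/my-story")` is False, so that article would need a News Sitemap instead.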

Page 11: News Specific Crawl Errors

No Sentences Found

Causes:

• If the article content doesn't have punctuated sequences of contiguous words, we won't be able to include it in Google News. Make sure that the text of your articles is made up of sentences, and that you don't use frequent <br> or <p> tags within your paragraphs.
• Make sure that the full text of each of your articles is available in the source code of your article pages (and not embedded in a JavaScript file, for example).
• Make sure the links to your articles lead directly to your article pages rather than to an intermediate page using a JavaScript redirect.

Page 12: News Specific Crawl Errors

Noindex Tag Found

Causes:

• Remove the "noindex" <meta> tag from your article pages.
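To find pages that still carry the tag, the HTML can be scanned with Python's standard html.parser module. This is a sketch, assuming that the relevant meta names are "robots", "googlebot", and "googlebot-news"; the class and function names are hypothetical.

```python
from html.parser import HTMLParser

# Assumed set of meta names whose "noindex" directive affects news crawling.
ROBOTS_META_NAMES = {"robots", "googlebot", "googlebot-news"}


class RobotsMetaParser(HTMLParser):
    """Records whether any robots-style meta tag contains 'noindex'."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() not in ROBOTS_META_NAMES:
            return
        directives = [d.strip().lower() for d in attributes.get("content", "").split(",")]
        if "noindex" in directives:
            self.noindex = True


def has_noindex(html: str) -> bool:
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.noindex
```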

Off-site Redirect

Causes:

• All section pages and articles must be located within the domain of the site included in Google News.
• If you are not using off-site redirects, please make sure your site has not been modified by a third party.

Page 13: News Specific Crawl Errors

Page Too Large

Causes:

• The HTML source page can be up to 256 KB in size.
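The size limit is easy to test on the raw page bytes. This sketch assumes that 1 KB means 1024 bytes and that the limit applies to the uncompressed HTML source, neither of which the slide states; the names are hypothetical.

```python
MAX_PAGE_BYTES = 256 * 1024  # assumed meaning of "256 KB"


def page_size_ok(html_bytes: bytes) -> bool:
    """True if the HTML source is within the 256 KB limit."""
    return len(html_bytes) <= MAX_PAGE_BYTES
```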

Title Not Allowed

Causes:

• Often this problem can be fixed by setting the <title> tag on the HTML page to the title of the article, and repeating the title in a prominent place on the HTML page, such as in an <h1> tag.

Page 14: News Specific Crawl Errors

Title Not Found

Causes:

• Follow our title formatting recommendations.
• To make sure your articles display properly on mobile devices, don't include a leading number (which sometimes corresponds to an access key) in the anchor text of the title.

Page 15: News Specific Crawl Errors

Uncompression Failed

Causes:

• Please check your network and web server.

Unsupported Content Type

Causes:

• Articles must have a content-type of text/html, text/plain or application/xhtml+xml.
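The content-type rule can be checked against the Content-Type response header. This is a minimal sketch that drops parameters such as charset and compares the media type case-insensitively; the function name is hypothetical.

```python
# The three content types listed on the slide.
ALLOWED_TYPES = {"text/html", "text/plain", "application/xhtml+xml"}


def is_supported_content_type(header_value: str) -> bool:
    """Check a Content-Type header value such as 'text/html; charset=UTF-8'."""
    media_type = header_value.split(";", 1)[0].strip().lower()
    return media_type in ALLOWED_TYPES
```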

Page 16: News Specific Crawl Errors

Date Too Old

Causes:

• Make sure your article is less than 2 days old. Currently we are only collecting articles that are 2 days old or less.
• Follow the date formatting recommendations above.
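The 2-day window can be sketched as a comparison of timezone-aware timestamps. How Google measures article age is an assumption here (time elapsed since first publication), and the names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(days=2)  # window quoted on the slide


def is_recent_enough(published: datetime, now: datetime = None) -> bool:
    """True if the article is 2 days old or less.

    Both datetimes must be timezone-aware. A publication date in the
    future also returns True; this sketch does not flag it.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return now - published <= MAX_AGE
```

Passing `now` explicitly makes the check repeatable, for example when auditing a batch of articles against a fixed crawl time.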