The Botify report answers a virtually unlimited number of questions. Here are 30 you should be asking. Illustrations have been updated to show the current Botify interface.
**Virtually unlimited, really?**
Yes, that’s a bold statement, but a realistic one: there are around 50 indicators available for every single page in a Botify website analysis, built on a dozen pieces of data gathered for each page during the crawl. As a result, the crawl report, combined with the URL Explorer, which lets you extract and export any data, provides answers to a virtually unlimited number of questions.
Let’s set the scene first. Here is what the left menu looks like in the Botify report:
The different sections cover three sorts of indicators:
- Indicators related to the way pages can be accessed (those are the first three tabs: distribution, performance and HTTP status codes)
- Indicators related to page content metadata (HTML tags and canonical)
- Indicators related to internal linking (Inlinks for incoming links, Outlinks for outgoing links)
Note that there will be additional sections if optional analysis settings were used:
- A visits section if Google Analytics imports were enabled
- A sitemaps section if the sitemaps analysis option was enabled
And now, let’s walk through the main report sections and some of the questions they answer.
Distribution of all pages by page depth
Depth is the number of clicks needed to reach a page from the home page using the shortest path (strictly speaking, from the page where the crawl started, which is usually the home page).
- What’s the average depth of my website?
- What’s the maximum depth of my website?
- What does the distribution graph look like?
Any of these could hint at depth problems, which result in insufficient crawling by search engine robots.
Note that the depth distribution graph also shows SEO-compliant vs non-compliant pages, according to the most basic compliance criteria: read more about SEO compliant URLs.
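To make the depth definition concrete, here is a minimal sketch of how depth can be computed with a breadth-first search over a link graph. The `links` dictionary and the `page_depths` function are hypothetical names used for illustration; this is not a description of how the Botify crawler works internally.

```python
from collections import deque

def page_depths(links, start):
    """Return {url: depth}, where depth is the minimum number of clicks from start."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # in a BFS, the first visit uses a shortest path
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths

# Hypothetical link graph: each page maps to the pages it links to.
links = {
    "/": ["/category", "/about"],
    "/category": ["/product-1", "/product-2"],
    "/product-1": ["/product-2", "/"],
    "/product-2": ["/deep-page"],
}

depths = page_depths(links, "/")
average_depth = sum(depths.values()) / len(depths)
max_depth = max(depths.values())
```

Pages that cannot be reached from the start page never appear in `depths`, which matches the idea that a crawler only sees pages it can reach through links.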
Page download time performance
Download performance is the time needed to get the full page code, without requesting associated resources such as images. This ‘total delay’ for the HTML page only is what should be considered for SEO. We recently explained why performance matters.
- What is the average page download time on my website?
- Are some pages much slower than others?
- Can I find out which templates they correspond to?
- How much of the total delay does the ‘delay first byte received’ represent (that’s the time that elapses before the robot starts receiving data)? That one is for advanced performance improvements.
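As an illustration of the questions above, the sketch below works on a small list of per-page timings, such as one you might build from exported data. The tuple layout, the sample values and the function names are assumptions made for this example, not a Botify export format.

```python
from statistics import mean

# Hypothetical timings in milliseconds: (url, delay_first_byte, total_delay)
timings = [
    ("/", 120, 300),
    ("/category", 200, 500),
    ("/product-1", 900, 1200),
    ("/product-2", 150, 400),
]

def average_total_delay(rows):
    """Average time to download the HTML of a page."""
    return mean(total for _, _, total in rows)

def slow_pages(rows, threshold_ms):
    """URLs whose total delay exceeds the given threshold."""
    return [url for url, _, total in rows if total > threshold_ms]

def first_byte_share(rows):
    """Fraction of the cumulated total delay spent waiting for the first byte."""
    return sum(first for _, first, _ in rows) / sum(total for _, _, total in rows)
```

For instance, `slow_pages(timings, 1000)` lists the pages that take more than one second, which is a reasonable starting point for checking which templates they belong to.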
HTTP status codes (page not found, redirects etc.)
HTTP status code returned when the page is requested
‘Page not found’ errors, too many redirections… These are negative signals for search engines trying to assess your website’s quality. Not to mention that a high redirection rate has a negative impact on their robots’ crawl.
In addition, issues with HTTP 404s are not necessarily visible from a user’s perspective: a page can return HTTP 404 (Not found) and still return content, and appear as a valid page (like an HTTP 200 - OK) to the user! However, search engines won’t index it.
- What percentage of my pages return an HTTP 200 (OK) status code?
- What percentage of my pages are ‘not found’ (HTTP 404)?
- Do some pages respond with HTTP 403 (Forbidden)?
- What percentage of my pages are redirected? Are they permanent redirections (301) or temporary ones (302)?
- Where are they redirected to? Is there a large number of pages redirected to the same page? Are some redirected to pages ‘not found’?
- Are there redirection chains?
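The last two questions can be explored with a few lines of code once redirects are known. The sketch below assumes a hypothetical `redirects` mapping (redirected URL to its target) and a `statuses` mapping (URL to HTTP status code); both are illustrative structures, not Botify data formats.

```python
def follow_redirects(redirects, url, max_hops=10):
    """Return the list of URLs visited from url, following redirects."""
    chain = [url]
    while chain[-1] in redirects and len(chain) <= max_hops:
        target = redirects[chain[-1]]
        looped = target in chain  # redirect loop: record the target, then stop
        chain.append(target)
        if looped:
            break
    return chain

def redirect_chains(redirects):
    """Map each URL that needs more than one hop to its full chain."""
    chains = {}
    for url in redirects:
        chain = follow_redirects(redirects, url)
        if len(chain) > 2:  # start, at least one intermediate page, final page
            chains[url] = chain
    return chains

def redirects_to_not_found(redirects, statuses):
    """Redirected URLs whose final destination returns HTTP 404."""
    return [
        url for url in redirects
        if statuses.get(follow_redirects(redirects, url)[-1]) == 404
    ]

# Hypothetical crawl data.
redirects = {"/old": "/interim", "/interim": "/new", "/promo": "/", "/retired": "/missing"}
statuses = {"/new": 200, "/": 200, "/missing": 404}
```

The `max_hops` limit is a safety net chosen for this example; loops are also cut as soon as a URL appears twice in the chain.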
HTML tags related to page content
H1, Title, meta description tags
The simplest of SEO optimizations, so self-evident and necessary. Why would we neglect simple tags that clarify and emphasize the page content?
For each type of tag, we answer questions such as:
- How many are unique to each page, or common to a number of pages (duplicate)?
- Do some pages contain several tags of the same kind (two H1 tags, for instance)?
- In which pages is this type of tag missing?
- Can I get the list of pages with a tag common to at least 5 other pages? (and the tag content of course!)
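These checks boil down to grouping pages by tag content. Here is a minimal sketch, assuming a hypothetical `pages` dictionary that maps each URL to the list of values found for each tag; the structure and function names are made up for this example.

```python
from collections import defaultdict

def duplicate_tags(pages, tag, min_pages=2):
    """Map each tag value shared by at least min_pages pages to those pages."""
    groups = defaultdict(set)
    for url, tags in pages.items():
        for value in tags.get(tag, []):
            groups[value].add(url)
    return {
        value: sorted(urls)
        for value, urls in groups.items()
        if len(urls) >= min_pages
    }

def missing_tag(pages, tag):
    """Pages where the tag is absent or empty."""
    return [url for url, tags in pages.items() if not tags.get(tag)]

def multiple_tags(pages, tag):
    """Pages that contain the tag more than once."""
    return [url for url, tags in pages.items() if len(tags.get(tag, [])) > 1]

# Hypothetical extracted tags.
pages = {
    "/": {"title": ["Home"], "h1": ["Welcome"]},
    "/a": {"title": ["Products"], "h1": ["Products", "Our products"]},
    "/b": {"title": ["Products"], "h1": []},
}
```

A tag common to at least 5 other pages is a tag shared by 6 pages in total, so that question would translate to `duplicate_tags(pages, "title", min_pages=6)`.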
<link rel="canonical" href="http://www..." /> tags
This tag indicates that the current page is not the primary version of its content; the page referenced in the canonical tag is.
The indicator has three possible values: ‘different’ when the canonical tag points to another page (that’s the tag’s purpose); ‘equal’ when the page points to itself (which has no real benefit, but makes it possible to implement the tag systematically); and ‘not set’ when the page has no canonical tag.
If there are canonical tags pointing to alternate URLs, it means there is duplicate or near-duplicate content. That’s worth looking into. The other possibility is that canonical tags are implemented incorrectly. Either way, you will want to know!
- How many pages have a canonical tag pointing to another page?
- How many pages have a canonical tag pointing to another page ?
- How many pages point to the same canonical page? That’s the number of duplicates or pseudo-duplicates of the page.
- Do some canonical tags point to pages that are redirected (HTTP 3XX) or not found (404)?
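The three values of the canonical indicator can be reproduced with a simple comparison. The sketch below is an assumption about how such a classification could be written: it compares URLs as plain strings, without any normalization, and uses a hypothetical `canonicals` mapping from each page URL to the URL in its canonical tag (or `None`).

```python
from collections import Counter

def canonical_status(url, canonical):
    """Classify a page as 'not set', 'equal' or 'different'."""
    if not canonical:
        return "not set"
    return "equal" if canonical == url else "different"

def duplicates_per_canonical(canonicals):
    """Count, for each canonical URL, the other pages that point to it."""
    return Counter(
        canonical
        for url, canonical in canonicals.items()
        if canonical_status(url, canonical) == "different"
    )

# Hypothetical crawl data: page URL -> URL found in its canonical tag.
canonicals = {
    "/a": "/a",
    "/a?ref=1": "/a",
    "/a?ref=2": "/a",
    "/b": None,
}
```

With this sample, `/a` is the canonical page for two other URLs, which is the number of duplicates or pseudo-duplicates mentioned above.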
Internal incoming links, from a page’s perspective
Internal links are ‘votes’ for your own pages. The website’s internal linking structure should be in line with the content hierarchy and create a sort of relief map that puts important content forward. Inlinks provide a view from the vote recipients’ perspective.
- Which pages receive the highest number of internal links? Or more precisely the highest number of ‘follow’ links?
- Which pages receive the highest number of links and do not return HTTP 200 (OK)?
- Which pages receive only one link?
- Which pages receive the highest number of ‘no follow’ links?
- Which pages have a ‘no follow’ directive both in links and in a meta tag?
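Counting inlinks is mostly a matter of tallying link targets. The following sketch assumes a hypothetical list of internal links stored as `(source, target, is_follow)` tuples; the names and the data are illustrative only.

```python
from collections import Counter

# Hypothetical internal links: (source, target, is_follow)
links = [
    ("/", "/category", True),
    ("/", "/about", False),
    ("/category", "/product-1", True),
    ("/category", "/about", True),
    ("/product-1", "/category", True),
]

def inlink_counts(links, follow_only=False):
    """Number of internal links received by each page."""
    return Counter(
        target for _, target, follow in links
        if follow or not follow_only
    )

def single_inlink_pages(links):
    """Pages that receive exactly one internal link."""
    return sorted(url for url, n in inlink_counts(links).items() if n == 1)
```

`inlink_counts(links, follow_only=True).most_common(10)` would then give the ten pages receiving the most ‘follow’ links.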
**Links coming out of a page, going either inside the site (internal outlinks) or outside (external outlinks)**
As a complement to inlinks information, outlinks indicators show where internal ‘votes’ go. Are some votes wasted on content that doesn’t deserve them?
- Which pages contain the highest number of links?
- Which pages include broken links (links to pages ‘not found’ - HTTP 404)?
- Which pages link to redirected pages (HTTP 3XX)?
- Which pages send the highest amount of “link juice” outside of my website?
- Which pages contain ‘no follow’ links? To internal pages, or to external pages?
In all these metrics, the number of links in a page counts links to distinct URLs (what matters most), but the total number of links, which takes into account multiple links to the same page, is also available.
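The difference between the two counts is easy to see on a small example. The `outlinks` dictionary below is hypothetical sample data, and `outlink_counts` is a name chosen for this sketch.

```python
def outlink_counts(targets):
    """Return the total number of links and the number of distinct target URLs."""
    return {"total": len(targets), "distinct": len(set(targets))}

# Hypothetical outlinks: each page maps to the URLs it links to, in page order.
outlinks = {
    "/": ["/category", "/category", "/about", "https://example.com/"],
    "/category": ["/product-1", "/product-1", "/product-1"],
}

counts = {page: outlink_counts(targets) for page, targets in outlinks.items()}
```

Here the home page contains four links but only three distinct targets, and the category page links three times to a single URL.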
This post covers some of the most common questions. Let us know if there are scenarios you would like to explore. Leave a comment! We’ll do our best to oblige.