E-commerce? Publishing? Classifieds? Although on-site SEO essentials are universal, it makes sense to pay special attention to specific items and related indicators, depending on the type of website you are managing: some issues – such as page accessibility to robots and low quality content – can reach particularly large proportions on some website structures and have a huge impact on organic traffic.
Let’s take a look at a few issues that are particularly impactful for these three types of websites:
E-commerce websites often face several types of issues related to their size, their navigation structure, and their typical organic traffic pattern: they tend to get either mainly long tail traffic (large product catalog, low number of visits per page, on many product pages) or mainly middle tail traffic (high brand recognition, significant direct traffic, and most organic traffic on top category pages, through more generic, competitive search queries).
Typical issues include:
1) A significant portion of products are not explored by Google
This generally offers significant leverage: with a large product catalog, chances are only a portion of the products are known to search engines. Either the website already generates most of its traffic on product pages, and having more products crawled will have a certain, mechanical effect on this long tail traffic; or it doesn’t, and this is an important source of potential incremental traffic.
The goal is to make sure that Google explores all products.
Global indicator: ratio of crawled pages among product pages (tool: log analyzer).
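As an illustration, this crawl ratio can be computed directly from access logs. A minimal sketch, assuming a combined log format and a hypothetical `/product/<id>` URL pattern – adapt both to your own site and log layout:

```python
import re

GOOGLEBOT = re.compile(r"Googlebot", re.IGNORECASE)
PRODUCT_URL = re.compile(r"^/product/\d+")  # hypothetical product URL pattern

def product_crawl_ratio(log_lines, all_product_urls):
    """Share of known product URLs that appear in Googlebot hits."""
    crawled = set()
    for line in log_lines:
        if not GOOGLEBOT.search(line):
            continue
        # Combined log format: the request is the first double-quoted field.
        m = re.search(r'"(?:GET|HEAD) (\S+)', line)
        if m and PRODUCT_URL.match(m.group(1)):
            crawled.add(m.group(1))
    return len(crawled & set(all_product_urls)) / len(all_product_urls)
```

A ratio well below 1 over a long enough log window signals that part of the catalog is invisible to the search engine.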
What can be done:
Encourage product crawl:
→ Minimize website depth, as deeper pages are crawled less: for instance, reduce pagination by adding navigation subcategories and increasing the number of items per page.
Indicator: page depth for products (tool: website crawler).
→ Optimize internal “link juice” flow within the website to make sure all products receive more than just a couple of links. Typically, some products receive a single link, often from a long paginated list. Add complementary navigation criteria so a product can be listed in several lists, and add product-to-product links between similar products.
Indicators: number of incoming links per product page – both average and distribution, because of variability: a product linked from a “top products” section could receive many links, while a significant portion of products may receive a single link (website crawler).
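Both indicators – page depth and incoming links per page – can be derived from the internal link graph a crawler collects. A minimal sketch, assuming the graph is already available as a plain dict (a real crawler would build it by fetching pages):

```python
from collections import Counter, deque

def page_depths(links, home="/"):
    """Breadth-first click depth from the home page (home = depth 0).
    links: {url: [internally linked urls]}."""
    depth = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depth:
                depth[target] = depth[page] + 1
                queue.append(target)
    return depth

def incoming_links(links):
    """Number of distinct source pages linking to each page."""
    return Counter(t for targets in links.values() for t in set(targets))
```

Products stuck at high depth or with a single incoming link are the first candidates for extra navigation paths.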
Allow your web server to deliver more content to Google within the same time frame or “crawl budget”: deliver content faster, and avoid delivering again content which has not changed since it was last explored by the search engine.
→ Optimize performance. This is very specific to each website.
Indicator: load time performance for product pages (website crawler).
→ Implement HTTP 304 (Not Modified) status codes in response to requests that include an “If-Modified-Since” header. This allows a search engine crawler to get a fast response, with no content delivered, for product pages which haven’t changed since the last exploration.
Indicator: ratio of product pages returning HTTP 304 in search engine crawl (log analyzer).
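The If-Modified-Since / 304 exchange can be sketched as follows. The `respond` and `render` helpers are illustrative, not any specific framework’s API; the principle is the same whatever the stack:

```python
from email.utils import format_datetime, parsedate_to_datetime

def render(url):
    # Placeholder for real page rendering.
    return f"<html>{url}</html>".encode()

def respond(url, request_headers, last_modified_by_url):
    """Return (status, headers, body), honouring If-Modified-Since."""
    last_mod = last_modified_by_url[url]
    headers = {"Last-Modified": format_datetime(last_mod)}
    ims = request_headers.get("If-Modified-Since")
    if ims and parsedate_to_datetime(ims) >= last_mod:
        # Unchanged since the crawler's last visit: empty 304 body,
        # so crawl budget is freed for other pages.
        return 304, headers, b""
    return 200, headers, render(url)
```

Tracking last-modification dates per product page is the prerequisite; without a reliable date, the server must fall back to a full 200 response.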
Make sure Google explores a strategic subset of products:
→ Pay special attention to these products’ depth and number of incoming links.
Indicators: page depth, number of incoming links per page (website crawler).
→ Implement XML sitemaps for these strategic products. Caveat: if the number of products in sitemaps is much larger than Google’s crawl budget, then instead of encouraging a higher crawl ratio for these products, the sitemaps will most likely introduce some unpredictable rotation in Google’s index.
Indicators: crawl budget, crawl ratio for sitemaps, time Google needs to crawl sitemaps (log analyzer, Google Webmaster Tools).
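Generating the sitemap itself is straightforward. A minimal sketch, where the strategic selection is assumed to have already been made and the base URL is a placeholder:

```python
from xml.etree import ElementTree as ET

def build_sitemap(product_paths, base="https://www.example.com"):
    """Serialize an XML sitemap for the given product URL paths."""
    urlset = ET.Element(
        "urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
    )
    for path in product_paths:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = base + path
    return ET.tostring(urlset, encoding="unicode")
```

Keeping the file limited to the strategic subset – rather than dumping the whole catalog – is what avoids the index-rotation caveat mentioned above.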
2) Near duplicates within products
There can be many products which are almost the same, apart from a few details (color for clothing, minor technical characteristics for high tech products) that are not differentiators internet users are likely to include in search queries.
The goal is to make sure product pages present products that are differentiated enough to respond to different queries, while avoiding the negative impact undifferentiated content has on quality criteria.
What can be done:
Implement a notion of “meta product”: a master product page gathering the common characteristics, which will be better positioned than the products in the near-duplicate pool, which compete with each other. This will most certainly be justified only for a subset of products, which needs to be identified.
Indicators: HTML tags content and internal linking to identify products from the same list, pages with organic visits – active pages (website crawler).
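One common implementation of the master-product idea is to point each variant at the master with a rel=canonical tag. A minimal sketch of the grouping step; grouping by a shared base reference is an assumption here, as real grouping rules depend on the catalog:

```python
def canonical_tags(products):
    """products: list of (url, base_reference) pairs.
    Returns {variant_url: canonical_url}, electing the first URL
    seen for each base reference as the master page."""
    master_for_ref = {}
    for url, ref in products:
        master_for_ref.setdefault(ref, url)
    return {url: master_for_ref[ref] for url, ref in products}
```

In practice the master would be a dedicated meta-product page rather than an arbitrary variant, but the mapping logic is the same.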
3) Too many, or too few, navigation pages accessible to robots
Navigation pages are targets for top to middle tail SEO traffic queries (for instance, “Nike children’s shoes”). It’s an issue if they are not accessible to robots, or if too many are accessible through crawlable filter combinations.
The right balance must be found so that search engines see all navigation pages with potential for organic traffic, but are not swamped by additional pages that will waste search engine crawl and degrade global website quality indicators.
Indicators: number of navigation pages on the website, pages with organic traffic, HTML tags (website crawler which allows filtering data based on URL characteristics, such as URL parameters / parameter names).
Avoid creating a large number of low quality pages resulting from too many filter combinations: very similar pages created because they include filters which are not significant differentiators, pages with very few products or none, pages with filter combinations that don’t make sense for the user (when all possible combinations are generated automatically).
Best practice: allow only one filter at a time, or a small number of filter combinations hand-picked by product managers.
Indicator: number of navigation pages – very high, probably also disproportionately high compared to products (website crawler).
A variant of this issue can be caused by internal search pages linked from the website, with too many search criteria and, very often, duplicates due to similar queries with the same words in a different order.
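The hand-picked-combinations best practice can be enforced with a simple whitelist check before a filtered URL is linked or marked indexable. The filter names and the allowed set below are illustrative:

```python
# Whitelist of crawlable/indexable filter combinations (illustrative).
ALLOWED_FILTERS = {
    frozenset(),                   # the bare category page
    frozenset({"brand"}),          # one filter at a time
    frozenset({"brand", "size"}),  # a combination picked by product managers
}

def is_indexable(query_params):
    """True if this filter combination should be linked and indexable."""
    return frozenset(query_params) in ALLOWED_FILTERS
```

Non-whitelisted combinations can still exist for users, but should not be linked with crawlable URLs (or should carry a noindex), so they don’t consume crawl budget.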
Let’s leave aside news-specific SEO (Google News) and focus on “regular” search (like Google’s universal search). Publishing websites which regularly publish new articles have a continuously growing body of content, which poses some challenges. Very often, SEO concentrates on new articles, while a great deal of untapped potential lies in the bulk of older content.
1) Older articles get deeper and deeper in the website
This has to do with navigation and internal linking. Once they stop being linked from the home page, from top navigation, from a “most read” block, “hot topics” tags and the like, older articles get deeper and deeper in the website and become harder to reach for search engine robots. Typically, at this second stage in their life cycle – but by far the longest one – they are just linked from a long paginated list of articles, and related articles that may link to them are also deep. As a result, these older articles don’t reach their full potential, or don’t perform at all.
What can be done:
Reduce depth so that articles remain easily accessible for users and robots.
Indicator: page depth for articles (website crawler).
Make sure articles are promoted by links to and from related articles and tag pages (combined with next point).
Indicator: number of internal links to articles (website crawler).
2) Tag pages that are not “hot” any more are not accessible via top navigation
For similar reasons, tag (topic) pages which don’t include a recent article also get deeper and deeper, if they are only linked from articles.
What can be done:
Add navigation to tag pages from the website’s top navigation. This also efficiently reduces the depth of articles listed in tag pages, by providing one or several shorter permanent paths to older articles.
Indicators: page depth for tag pages, internal links to tag pages – if there are just a couple, the tag page is not justified (website crawler).
Typical issues are related to user-generated content, over which we have no control, and to the fact that content has a high rotation rate: many new pages are created on a daily basis, and they may expire quickly.
1) Search engine crawl does not focus on relevant ads
This implies, in particular, making sure new ads are crawled, and expired ads are not.
What can be done:
Encourage exploration of new ads:
→ Limit depth.
→ Use an XML sitemap.
Indicators: new ads crawled by search engines – never crawled before (log analyzer with this advanced functionality).
Make sure older ads are still crawled. Log analysis will help determine where to focus your efforts: it will show how old the oldest ads which generate visits are.
Indicators: crawl ratio for ads, age of active pages (log analyzer with these advanced functionalities, combined with website crawler).
Manage expired ads properly: expired ads are no longer linked on the website, but search engine robots explored them at an earlier date and know their URLs. They will continue to explore them, and possibly present them in results, unless the page returns an HTTP status code indicating the content is not there any more.
→ Return an HTTP 404 (Not Found) or HTTP 410 (Gone) status code;
→ Alternatively, redirect these ads to the parent category. These bulk redirects are likely to be treated as 404s, so the choice depends on the expected user experience.
Indicators: HTTP status codes in search engine crawl, orphan pages (log analyzer combined with website crawler).
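The decision logic for ad URLs can be sketched as a small function. The `Ad` record and the category-redirect alternative are illustrative; the status codes are the ones discussed above:

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Ad:
    path: str
    category_path: str
    expired: bool

def ad_response(ad: Optional[Ad]) -> Tuple[int, Optional[str]]:
    """(HTTP status, redirect location) for an ad page request."""
    if ad is None:
        return 404, None   # URL never existed
    if ad.expired:
        return 410, None   # content is gone for good
        # Alternative, depending on expected user experience:
        # return 301, ad.category_path
    return 200, None
```

Whichever branch is chosen for expired ads, it must be applied consistently, so log analysis shows a clean status-code picture for the search engine crawl.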
2) The way empty categories are managed sends confusing messages
A category page can at times be empty, as its ads list entirely depends on users. We should avoid creating categories that are likely to be often empty, or broaden their scope to minimize chances this will happen. But it can still happen, because of seasonal effects for instance, or market trends. So this should be carefully planned for.
If a category page returns HTTP 404 (Not Found) when there aren’t any ads, and HTTP 200 (OK) when there are some, its chances of ranking will be low: this “blinking” page which only exists part of the time won’t be considered reliable by search engines. The page should exist at all times, whether there are ads or not – in which case the page content can include links to similar ads.
What can be done:
Define business rules to decide under which conditions an empty page must be maintained (HTTP 200), and under which conditions it should be removed permanently. For instance, it could be removed if there haven’t been any ads for a significant length of time, and no traffic either.
Indicator: HTTP status code (website crawler with changes tracking between analyses, log analyzer which displays HTTP status code changes for pages crawled by search engines).
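Such a business rule can be as simple as a threshold check. A sketch with illustrative thresholds (180 days empty, zero visits in the last 90 days – both are assumptions to be tuned per site):

```python
def empty_category_status(days_empty, visits_last_90_days,
                          max_days_empty=180):
    """200 keeps the empty category page alive; 410 removes it
    permanently. Thresholds are illustrative assumptions."""
    if days_empty >= max_days_empty and visits_last_90_days == 0:
        return 410  # empty for a long time, no traffic: remove for good
    return 200      # keep the page up (and list similar ads as content)
```

Returning a stable 200 until the rule fires avoids the “blinking page” effect described above.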
3) Ads are semantically poor
Some ads may fail to include some important keywords, or include many abbreviations. Or there won’t be much differentiation between some ads. Unfortunately, there is not much we can do at the ad level.
What can be done:
Work on semantics at the category and subcategory levels. Individual ads will target very long tail traffic only.
Indicators: links to category and subcategory pages, anchor texts on these links to validate expressions and their diversity (website crawler).
A shorter version of this article was published at Brighton SEO (April 2015 edition) in the conference’s print publication.