How to decide what faceted pages you should index – #CrawlingMondays 2nd Episode

In the 2nd episode of Crawling Mondays, Aleyda goes through two main criteria to take into consideration when deciding whether to index faceted pages, which are especially common on e-commerce sites.

You can also watch this video and leave a comment on YouTube. To follow more updates on Crawling Mondays, subscribe to the YouTube channel and follow @CrawlingMondays on Twitter.

Video Transcription

“Hey, Aleyda, is it really worth it for us to start indexing this set of pages, usually filter pages? Or should we just canonicalize them towards their parent categories?” This is one of the most common questions that I get, and I’m pretty sure that you do too, after going through the first stages of the SEO process. Today, I would like to share with you the main criteria that I take into consideration to answer this question in the best possible way, in a way that is completely targeted and personalized, based on the specific characteristics and scenarios of my clients, because you definitely want to make the most out of each specific situation. You really want to be able to fulfill a demand, if it actually exists. Let’s take a look at them.

The criteria that we should take into consideration, I believe, as with everything in our business, should be about product-market fit: supply and demand. I know that many of us would like to have some sort of rule of thumb, and there are some, definitely. For example, sorting pages. Sorting filters like these ones here don’t change the products themselves. So the supply doesn’t change. It’s the same. It’s only the organization that changes. So the content itself doesn’t change; what we are offering to our users doesn’t change.

So realistically, there is very limited opportunity with this sort of sorting. So, there is this very easy-to-follow rule of thumb of not indexing these pages. And if we go to any e-commerce website, like here, for example, with price high to low, we will see that these particular pages are not self-canonicalizing. As you can see here, the canonical tag is pointing to the women’s backpack category page, instead of this filter page, which is only sorting the same products in a different way.
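
As a rough illustration of the check described above, here is a minimal Python sketch that reads the canonical tag from a page's HTML and reports whether the page is self-canonicalizing. The URLs, the HTML snippet, and the names `CanonicalFinder`, `normalize`, and `is_self_canonical` are assumptions made up for this example; they do not come from the video. It only uses the Python standard library and works on HTML you have already fetched.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class CanonicalFinder(HTMLParser):
    """Collects the href of the first <link rel="canonical"> tag."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        rel_values = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attr.get("href")


def normalize(url):
    """Compare URLs on host, path and query (scheme and fragment ignored)."""
    parts = urlsplit(url)
    return (parts.netloc.lower(), parts.path.rstrip("/"), parts.query)


def is_self_canonical(page_url, html):
    """Return True/False, or None when the page has no canonical tag."""
    finder = CanonicalFinder()
    finder.feed(html)
    if finder.canonical is None:
        return None
    # Resolve relative hrefs against the page URL before comparing.
    target = urljoin(page_url, finder.canonical)
    return normalize(target) == normalize(page_url)


# Hypothetical sorted listing page that points at its parent category.
sorted_url = "https://shop.example/womens-backpacks?sort=price-desc"
sorted_html = (
    '<html><head><link rel="canonical" '
    'href="https://shop.example/womens-backpacks"></head></html>'
)
print(is_self_canonical(sorted_url, sorted_html))
```

For the sorted URL in this sketch the check reports that the page is not self-canonicalizing, because the query string differs from the canonical target. A real audit would also need to consider the robots meta tag and HTTP headers, which this sketch ignores.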

And this is very straightforward, and it’s reasonable to not index these pages and canonicalize them to their parent categories, because the offering doesn’t change. However, there are other specific situations, like for example with leather backpacks, which we can already consider a second-level category, not necessarily the main category of a website. Of course, it will depend on whether we are only offering backpacks or any type of retail product. However, in any case, backpacks is usually a category, and leather backpacks will already be a subcategory. And in many, many situations and scenarios, I have seen people say, for e-commerce websites, “Oh, all the color filter pages, they should be noindex, canonicalizing to their parent categories to avoid content duplication and cannibalization issues.”

And in many situations and scenarios, it is okayish to do this, because we definitely don’t want to run into content duplication and cannibalization issues. And then the other alternative, instead of canonicalizing to the parent category, is to differentiate that content. But is that realistic? Is it worth it for you to generate and develop specific descriptions for each one of these color pages? Well, my answer will be, “It depends.” “On what will it depend?” “On this.” So for example, if we check with any keyword tool, I love the SEMrush Keyword Magic Tool, because it will show me, already segmented, the most important patterns here in a very easy-to-understand way, to identify the set of permutations with a specific term. Like for example here for leather backpacks, after ‘women’, the top term is ‘black’. And there is so much search potential here, potential traffic from these, whether ‘black leather backpacks’, ‘black leather backpack purse’, ‘black leather backpack women’s’, ‘men’s black leather backpacks’.

So many permutations with black leather backpacks. So, of course, if we canonicalize the black color page in the leather backpacks category to the parent category, we will be leaving out the opportunity to specifically address this query. We will definitely avoid any type of content duplication and cannibalization issues between color pages. But then we will be leaving this out. We won’t be leveraging that opportunity to specifically address these queries. So it’s important to validate, it’s important to check, and if there is enough search volume potential to bring enough traffic towards these particular filter pages, then it is very worth it that we specifically optimize these pages to become relevant towards these queries, to become unique, to become different, to make them worthy of being ranked in the best possible positions by Google.
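
The validation step described above can be sketched roughly as follows: take a keyword export, group the phrases by the color they contain, and sum the search volume per color. The figures below are placeholders invented for illustration, not real search volumes, and `volume_by_color` is a hypothetical helper, not a feature of any keyword tool.

```python
from collections import defaultdict

# Placeholder figures for illustration only, NOT real search volumes.
keywords = {
    "black leather backpack": 1000,
    "black leather backpack purse": 300,
    "black leather backpack women's": 200,
    "brown leather backpack": 400,
    "red leather backpack": 50,
}
colors = ["black", "brown", "red"]


def volume_by_color(keywords, colors):
    """Sum the volume of every phrase that contains each color as a word."""
    totals = defaultdict(int)
    for phrase, volume in keywords.items():
        words = phrase.lower().split()
        for color in colors:
            if color in words:
                totals[color] += volume
    return dict(totals)


totals = volume_by_color(keywords, colors)
# Print colors from highest to lowest combined volume.
for color, volume in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
    print(color, volume)
```

Ranking the colors this way makes it easier to see which facet pages have enough combined demand to justify their own optimized, indexable page and which ones do not.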

And if we take a look at the specific search results page for this query here, ‘black leather backpacks’, in this particular case, we can see how in the first position, we have Amazon, surprise, surprise. They have a specific landing page. And if we take a look at the specific page that Amazon has addressing this particular query, we can see that it’s not the typical e-commerce type of listing, where the color will be just a filter, this very common type of e-commerce listing page. No, it is a specific landing page, and we can see that they are specifically targeting this particular term. And of course this page is self-canonicalizing here. We can see how, of course, the title is specific, and is including the ‘black leather backpack’ term, here too in the meta description, the heading too, the main heading, et cetera.

So, of course, they are addressing that specific opportunity. Of course, this might not be worth it to do with every single color out there, but with the most important ones, the most popular ones, it is. And then let’s take a look at the fourth position, where we have Macy’s, on the other hand. And we can see too how Macy’s is ranking, but not with the black page. They are ranking with the generic leather backpack listing page. So we can see here, leather backpack, this page is focusing on its own specific term, nothing about black here that I can see. No color. Leather backpack, yes, but not specifically black. Why? Because they do have a specific color page, but of course we need to go to that particular page. We need to select it here. Let’s see. And yes, in this particular case, we see that the URL has changed. And let’s go and take a look at the title tag and everything, and let’s check out the canonical tag too.

So for example, of course here the title is specifically addressing the black term. Is it being included in the description? ‘Shop for a leather backpack, neutral leather backpack color’. Oh, in this case they don’t change the meta description. Let’s take a look at the canonical tag. So if we take a look at the canonical tag, it’s canonicalizing again to the parent leather backpack category that we saw before. It’s not including the ‘slash color normal black’ ID one here. No. No. So it’s not self-canonicalizing. And this is the reason why Macy’s is not specifically ranking with its particular black page, because this is not indexable. It’s canonicalizing back to the leather backpack one. And you are going to tell me, “Okay, so Macy’s is getting away with it, because they are not necessarily indexing their particularly relevant and specific black page to target that query, and it is okay enough to be ranked in the fourth … What? One, two, three, fourth position, so that should be okay.”

Well, it is Macy’s, again. And this is what happens with big brands, that they are able to get away with it, because their … leather backpack page is already worthy enough and relevant enough to rank for that too, et cetera, et cetera. There are many reasons regarding the relevance, popularity and authority of these types of pages. However, imagine what could happen if they were specifically addressing it, by indexing their specific black page. They might not necessarily be in the fourth position, but in the third, in the second, we don’t know. There are reasons, of course; I have no idea what’s going on with Macy’s. We shouldn’t necessarily say, “Well, they are doing this wrong because they don’t know.”

They certainly might know about it, but this is what happens with huge websites. They have so many constraints with the platform. So many priorities that need to be addressed, and not necessarily that much flexibility. They do have resources, but not necessarily flexibility. However, if they have the opportunity, they could start specifically indexing these very popular colors. Not necessarily all of them, again, because it’s not worth it to specifically optimize every single color page, since some of them won’t necessarily have as much potential, as high a search volume. So it won’t be worth it. But for black, definitely. So on one hand, we don’t want to leave out these potentially profitable and relevant queries that will bring tons of value to our facet pages. On the other hand though, it’s equally important that we make sure that these facet pages feature enough products, enough content in order to fulfill those queries.

Not only to provide the best possible experience to our users and customers, but also in order to avoid any type of thin content and content duplication issue, which is something that we definitely don’t like. So this is another requirement, another criterion to take into consideration. These pages need to feature enough products. If these pages, for example, have fewer than three products or five products, whatever we consider the minimum to fulfill our customers’ requirements, then it might not be worth it to index them, and we should leave them canonicalized to their parent categories, for example.
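
Putting the two criteria together, a decision rule could be sketched along these lines. This is only an illustration under stated assumptions: the product minimum uses five, one of the two values mentioned above, while the search volume threshold, the `Facet` class and the `should_index` function are hypothetical and would need to be set per site and market.

```python
from dataclasses import dataclass

MIN_PRODUCTS = 5            # the transcript mentions three or five as possible minimums
MIN_MONTHLY_SEARCHES = 500  # hypothetical threshold, to be set per site and market


@dataclass
class Facet:
    name: str
    monthly_searches: int  # demand: combined search volume for the facet's queries
    product_count: int     # supply: products listed on the facet page


def should_index(facet, min_searches=MIN_MONTHLY_SEARCHES, min_products=MIN_PRODUCTS):
    """Index only when both demand and supply meet their minimums."""
    enough_demand = facet.monthly_searches >= min_searches
    enough_supply = facet.product_count >= min_products
    return enough_demand and enough_supply


# Placeholder facets with made-up numbers, for illustration only.
facets = [
    Facet("black", monthly_searches=1500, product_count=40),
    Facet("red", monthly_searches=50, product_count=40),
    Facet("gold", monthly_searches=1500, product_count=2),
]
for facet in facets:
    action = "index and optimize" if should_index(facet) else "canonicalize to parent"
    print(f"{facet.name}: {action}")
```

Keeping both thresholds as parameters reflects the point that the right minimums depend on each site's customers and catalog rather than on a universal rule.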

So if we take these two criteria into consideration, again product-market fit, supply and demand, the connection between these two, we are going to be able to target, and profit from, and make the most out of these potentially really profitable queries on one hand, but also to avoid content duplication and thin content issues, and to make sure that our web structure is correctly targeted to fulfill our customers’ behaviors and demand towards our products and services.

I hope that you have liked today’s episode, and of course there’s so much more regarding this topic. For example, if we decide to noindex the facet pages at some point, if it is not worth it, because the search volume of the queries is not big enough, or because maybe we don’t have a good volume of products to feature in these pages, what is really the best way to noindex them? Is it to canonicalize them to their parent categories? Should we instead noindex these pages, or should we also, at the same time, completely avoid their crawling on the website?

There’s so much more here, but that will be for a future episode. And thank you very much again for watching this. Don’t forget to subscribe, in case you haven’t done it yet. And please follow Crawling Mondays on Twitter too, to stay updated on any new videos, and also to provide a few ideas for future episodes, if you want to. Thank you very much. And until next Monday.