A Market Wide Hreflang Implementation Extension Alternative via Robots.txt
Gary Illyes asked for alternatives to make hreflang simpler.
But why? To summarize, this is what is wrong with hreflang at the moment:
- People think hreflang is a definitive signal for language/geolocation when it is not (it needs to align with other signals), so site owners get frustrated when Google doesn’t take it into account.
- People also think they need to implement hreflang on every single URL of the site, or end up doing so because of the technical constraints of their platforms: most CMSs still don’t offer direct or flexible features for it, so you have to rely on plugins or add-ons that are not very flexible either. This is not necessary: hreflang is only needed on indexable pages with actual international overlay issues. Implementing it across every single URL adds unnecessary complexity and triggers an excess of errors when some of those pages become non-indexable, canonicalized, or redirected, or start returning errors.
- Google provides no official reporting on what it sees (the report was removed from Google Search Console) or on how it actually assigns geolocation to pages. This makes it hard to troubleshoot or to understand whether there’s an actual problem.
- Complexity also increases because hreflang annotations have to be implemented at the URL level rather than at a market level, where Google itself could apply the language and/or country value to all indexable pages within the scope.
The combination of all this makes hreflang implementation unnecessarily complex. So, here’s a proposal: What about leveraging robots.txt to specify hreflang at a market-wide level per property, while keeping the core elements of the current hreflang specification?
What?
Market Wide Hreflang Implementation makes it possible to specify hreflang language (and, optionally, country) values at a market-wide scope by declaring the values for each Web version at the property level in robots.txt, rather than, as required today, at the page level, where every page has to list each of its alternates.
This simplifies hreflang implementation and minimizes issues while keeping its most important functionality: specifying, in robots.txt, the language/country target of the pages within the scope of each property included, as well as their alternates.
How would this work?
The market-wide hreflang implementation alternative would be specified in robots.txt while leveraging the core of the current hreflang specification:
A. For a multilingual site in subdirectories under a gTLD: example.com/en/ in English, example.com/es/ in Spanish, example.com/de/ in German.
In example.com/robots.txt the language market-wide hreflang specification will be:
User-agent: *
hreflang_scope: https://example.com/en/
hreflang="en"
hreflang_scope: https://example.com/es/
hreflang="es"
hreflang_scope: https://example.com/de/
hreflang="de"
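To make the format concrete, here is a minimal Python sketch of how a tool might read these directives. The directive names `hreflang_scope` and `hreflang=` come from the proposal and are not supported by any crawler today. The function name `parse_market_hreflang` and the pairing rule (each `hreflang=` line belongs to the nearest preceding `hreflang_scope` line) are assumptions inferred from the example above.

```python
import re

# Hypothetical directives from the proposal; no crawler implements them.
SCOPE_RE = re.compile(r'^hreflang_scope:\s*(\S+)\s*$', re.IGNORECASE)
# Accept the value with straight quotes, typographic quotes, or no quotes.
VALUE_RE = re.compile(r'^hreflang=["“”]?([A-Za-z-]+)["“”]?\s*$', re.IGNORECASE)


def parse_market_hreflang(robots_txt: str) -> dict:
    """Return {scope URL: hreflang value} for each scope/value pair found."""
    mapping = {}
    pending_scope = None
    for raw in robots_txt.splitlines():
        line = raw.split('#', 1)[0].strip()  # drop robots.txt comments
        if not line:
            continue
        scope = SCOPE_RE.match(line)
        if scope:
            pending_scope = scope.group(1)
            continue
        value = VALUE_RE.match(line)
        # Assumption: a value applies to the nearest preceding scope line.
        if value and pending_scope is not None:
            mapping[pending_scope] = value.group(1).lower()
            pending_scope = None
    return mapping


EXAMPLE_A = """\
User-agent: *
hreflang_scope: https://example.com/en/
hreflang="en"
hreflang_scope: https://example.com/es/
hreflang="es"
hreflang_scope: https://example.com/de/
hreflang="de"
"""

print(parse_market_hreflang(EXAMPLE_A))
```

Lines the sketch does not recognize, such as `User-agent: *`, are simply skipped, so the new directives can sit next to existing robots.txt rules.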
B. For a multi-country site in subdirectories under a gTLD: example.com/en-us/ in English for the US, example.com/es-es/ in Spanish for Spain, and example.com/de-de/ in German for Germany.
In example.com/robots.txt the country market-wide hreflang specification will be:
User-agent: *
hreflang_scope: https://example.com/en-us/
hreflang="en-us"
hreflang_scope: https://example.com/es-es/
hreflang="es-es"
hreflang_scope: https://example.com/de-de/
hreflang="de-de"
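The point of declaring scopes is that a crawler could derive the language/country value of any page from its URL. The proposal does not define the matching rule, so the Python sketch below assumes longest-prefix matching on the scope URL. The names `hreflang_for` and `other_markets` are hypothetical, and the `SCOPES` mapping is what example B would yield once read.

```python
from typing import Dict, Optional, Tuple

# Scope -> value pairs as declared in example B (hypothetical directives).
SCOPES: Dict[str, str] = {
    "https://example.com/en-us/": "en-us",
    "https://example.com/es-es/": "es-es",
    "https://example.com/de-de/": "de-de",
}


def hreflang_for(url: str, scopes: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Return (scope, value) for the longest scope containing url, else None."""
    matches = []
    for scope in scopes:
        base = scope.rstrip("/")
        # Match on a path boundary so https://example.com does not
        # capture https://example.com.other.tld/ by accident.
        if url == base or url.startswith(base + "/"):
            matches.append(scope)
    if not matches:
        return None
    best = max(matches, key=len)  # assumption: the most specific scope wins
    return best, scopes[best]


def other_markets(url: str, scopes: Dict[str, str]) -> Dict[str, str]:
    """Return the other declared scopes; this does not pick the alternate page."""
    hit = hreflang_for(url, scopes)
    if hit is None:
        return {}
    return {s: v for s, v in scopes.items() if s != hit[0]}


print(hreflang_for("https://example.com/es-es/precios/", SCOPES))
# ('https://example.com/es-es/', 'es-es')
print(hreflang_for("https://example.com/blog/", SCOPES))
# None
```

Note that the sketch only resolves which market a URL belongs to and which other markets exist. How a specific page would be paired with its equivalent in another market is left open here, since the proposal does not spell it out.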
C. For a multi-country site across separate domains: example.com (a gTLD) in English for the US and example.co.uk (a ccTLD) in English for the UK.
In example.com/robots.txt the country market-wide hreflang specification will be:
User-agent: *
hreflang_scope: https://example.com
hreflang="en-us"
hreflang_scope: https://example.co.uk
hreflang="en-gb"
In example.co.uk/robots.txt the country market-wide hreflang specification will be:
User-agent: *
hreflang_scope: https://example.co.uk
hreflang="en-gb"
hreflang_scope: https://example.com
hreflang="en-us"
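In example C both robots.txt files carry the same declarations, which mirrors the return-link requirement of current hreflang annotations. Whether the proposal would make this reciprocity mandatory is an assumption. Under that assumption, a consistency check could look like the Python sketch below, where `find_mismatches` is a hypothetical helper that takes the already parsed declarations of each file.

```python
from typing import Dict, List


def find_mismatches(declarations: Dict[str, Dict[str, str]]) -> List[str]:
    """declarations maps each robots.txt URL to its {scope: hreflang} pairs.

    Returns human-readable problems; an empty list means all files agree.
    """
    seen_by_scope: Dict[str, Dict[str, str]] = {}
    for source, mapping in declarations.items():
        for scope, value in mapping.items():
            seen_by_scope.setdefault(scope, {})[source] = value

    problems = []
    for scope, seen in sorted(seen_by_scope.items()):
        # A scope declared in one file should be declared in every file.
        for source in sorted(set(declarations) - set(seen)):
            problems.append(f"{source} does not declare {scope}")
        # The same scope should carry the same value everywhere.
        values = sorted(set(seen.values()))
        if len(values) > 1:
            problems.append(f"{scope} has conflicting values: {values}")
    return problems


BOTH = {"https://example.com": "en-us", "https://example.co.uk": "en-gb"}

consistent = {
    "https://example.com/robots.txt": dict(BOTH),
    "https://example.co.uk/robots.txt": dict(BOTH),
}
broken = {
    "https://example.com/robots.txt": dict(BOTH),
    # The UK file forgot to list the US property.
    "https://example.co.uk/robots.txt": {"https://example.co.uk": "en-gb"},
}

print(find_mismatches(consistent))  # []
print(find_mismatches(broken))
# ['https://example.co.uk/robots.txt does not declare https://example.com']
```

With one file per property instead of annotations on every page, a check like this only has to compare a handful of declarations.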
Why this hreflang alternative/extension?
- It minimizes implementation complexity by specifying the language and, optionally, the country of a Web property per market, as well as its alternates, through the robots.txt of each language/country version rather than at the page level.
- It leverages the core elements of the current hreflang specification by keeping its syntax and values, which makes implementation easier because it relies on already known elements and an already widely used standard, robots.txt.
- It’s less prone to execution errors, and makes it easier to troubleshoot them.
- Another simple option would be to bring back the Google Search Console geolocation feature, although I imagine the goal is an option that is easy to standardize, which that feature wouldn’t allow.
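Because the proposal keeps the existing hreflang values (a language code, optionally followed by a region code), the declared values could be sanity-checked in one place. The Python sketch below is a shape check only, under stated assumptions: `check_value` is a hypothetical helper, it does not verify that the codes actually exist in ISO 639-1 or ISO 3166-1, and it ignores script subtags.

```python
import re

# Shape only: two-letter language, optionally a hyphen and a two-letter region.
VALUE_SHAPE = re.compile(r'^[a-z]{2}(-[a-z]{2})?$')

# The ISO 3166-1 region code for the United Kingdom is GB, not UK.
COMMON_MISTAKES = {"en-uk": "en-gb"}


def check_value(value: str) -> str:
    """Return 'ok' or a short description of what looks wrong."""
    normalized = value.strip().lower()
    if normalized in COMMON_MISTAKES:
        return f"{value}: did you mean {COMMON_MISTAKES[normalized]}?"
    if not VALUE_SHAPE.match(normalized):
        return f"{value}: not a language or language-region code"
    return "ok"


for candidate in ["en-gb", "en-uk", "english"]:
    print(check_value(candidate))
# ok
# en-uk: did you mean en-gb?
# english: not a language or language-region code
```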
Another hreflang implementation option is the JSON-based alternative specified by Dave Smart here.
What is clear is that, whatever the alternative, bringing back the Google Search Console hreflang report would be ideal: it would help identify the status of the implementation and provide direct information for troubleshooting.