A Survey of Rel Values on the Web
One of the interesting things about sharing an office with Jyri is that our free-association stream-of-consciousness conversations often lead to places worth exploring further.
On Friday Jyri and I started wondering about the link rel values documented in the XFN 1.1 profile, which include not only the relatively commonplace `me` and `friend` values, but also unconventional values such as `colleague`, `muse`, and `spouse`. But how frequently are the lesser-known rel values really used? Rather than speculate blindly, I wrote a simple mapreduce to check the web and find out for sure.
The mapreduce scanned approximately 177 million recently crawled HTML documents, parsing and counting `rel` values in link and anchor tags along the way. In those 177M documents, I found just over 19 billion `<link>` and `<a>` tags in total. Of those 19B tags, 1.8 billion contained a non-empty `rel` attribute.
Following the HTML5 rules for space-separated tokens, I split each rel value on `[\s\t\n\r\f]` and extracted each individual value. In total, over 1.9B instances of rel values were found, or an average of just over 10 per HTML document (with some tags having more than one rel value).
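The counting step described above can be sketched in a few lines of Python. This is only an illustration, not the actual mapreduce: the `RelCounter` class and `count_rel_values` helper are hypothetical names, the standard-library `html.parser` stands in for whatever parser the crawl pipeline used, and lowercasing each token is an assumption on my part.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# HTML5 "space characters": space, tab, line feed, form feed, carriage return.
HTML_SPACE = re.compile(r"[ \t\n\f\r]+")


class RelCounter(HTMLParser):
    """Counts rel tokens on <link> and <a> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.tags_seen = 0       # every <link> and <a> start tag
        self.tags_with_rel = 0   # tags whose rel holds at least one token
        self.values = Counter()  # individual rel tokens

    def handle_starttag(self, tag, attrs):
        if tag not in ("link", "a"):
            return
        self.tags_seen += 1
        rel = dict(attrs).get("rel")
        # Assumption: tokens are lowercased before counting.
        tokens = [t.lower() for t in HTML_SPACE.split(rel or "") if t]
        if tokens:
            self.tags_with_rel += 1
            self.values.update(tokens)


def count_rel_values(html):
    """Parse one HTML document and return the populated counter object."""
    parser = RelCounter()
    parser.feed(html)
    parser.close()
    return parser
```

In a mapreduce setting, the per-document `values` counters would be emitted by the mappers and summed by the reducers; `Counter` objects can be added together directly, which makes the reduce step a one-liner.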
I found a staggering 1.8M unique rel value strings in use, with many used only once or twice across the whole sample. The top 6 most frequently used rel values accounted for 80% of all usage, and the top 11 alone were responsible for 90%. In fact, fewer than 1,000 of the most frequently used rel values are enough to cover 99% of all usage. In other words, the tail is long indeed, with the remainder of those 1.8M unique rel values accounting for less than 1% of the total usage.
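For readers who want to reproduce this kind of coverage figure on their own data, here is a minimal sketch. It assumes the counts are already available as a `collections.Counter` keyed by rel value; the function name `values_needed` is made up for this example.

```python
from collections import Counter


def values_needed(counts, fraction):
    """Return how many top-ranked values cover `fraction` of all usage."""
    total = sum(counts.values())
    running = 0
    # most_common() yields (value, count) pairs from most to least frequent.
    for rank, (_, n) in enumerate(counts.most_common(), start=1):
        running += n
        if running >= fraction * total:
            return rank
    return len(counts)
```

Calling `values_needed(counts, 0.80)` and `values_needed(counts, 0.99)` on the full tally is the sort of calculation behind the "top 6" and "fewer than 1,000" figures above.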
In passing, I noticed that approximately 3 million rel value strings also contained a comma character, presumably cases where the author mistakenly thought that `,` would act as a delimiter. However, since these cases account for just 0.18% of all rel value strings, they have little impact on the overall totals.
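A check like this is easy to express in code. The sketch below assumes the share is measured over raw `rel` attribute strings (before splitting on whitespace); the helper name `comma_share` is hypothetical.

```python
def comma_share(rel_attributes):
    """Fraction of raw rel attribute strings that contain a comma."""
    rel_attributes = list(rel_attributes)
    if not rel_attributes:
        return 0.0
    with_comma = sum(1 for rel in rel_attributes if "," in rel)
    return with_comma / len(rel_attributes)
```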
Here are the top 25 rel values found in `<link>` and `<a>` tags in a moderately sized sample of the web today:
| Rank | Value | Count |
|---|---|---|
| 1 | nofollow | 832980014 |
| 2 | stylesheet | 338648161 |
| 3 | tag | 168764800 |
| 4 | alternate | 109150404 |
| 5 | icon | 69183607 |
| 6 | chapter | 56395793 |
| 7 | forum | 55920646 |
| 8 | shortcut | 53906964 |
| 9 | bookmark | 30683701 |
| 10 | archives | 25381711 |
| 11 | category | 24361195 |
| 12 | external | 19181232 |
| 13 | search | 14227485 |
| 14 | edituri | 8109835 |
| 15 | apple-touch-icon | 6753583 |
| 16 | help | 4842211 |
| 17 | prev | 4537344 |
| 18 | next | 4390373 |
| 19 | pingback | 4302068 |
| 20 | wlwmanifest | 4125573 |
| 21 | contents | 3959350 |
| 22 | contact | 3504587 |
| 23 | service.post | 2678873 |
| 24 | top | 2502015 |
| 25 | me | 2501273 |
The most frequently used values are not surprising at all. The `nofollow` value is used as a hint to search engines that the target of an `<a>` tag should not be used in ranking calculations. The `stylesheet` value is used on `<link>` tags to indicate that the target is an external CSS document. The `tag` value is a microformat used to indicate a category for the page, as popularized by sites such as Technorati and Delicious. And `alternate` is frequently used to facilitate the autodiscovery of an RSS or Atom feed for a given site.
Further down we learn that, as OpenID continues to gain adoption, the `openid.server` and `openid.delegate` rel values come in at #35 and #43 respectively, which is impressive, since each is needed only once per page. And even the newer OpenID2-style tags are not far behind, with `openid2.provider` and `openid2.local_id` reaching #51 and #837 respectively.
Near and dear to my heart, I was pleased to see the `search` rel value, the OpenSearch discovery mechanism, ranked so high at #13. Again, these discovery links are only needed once per page, so this is a sign of strong adoption. Admittedly, not all `rel="search"` links are OpenSearch related, but I have another more comprehensive analysis of OpenSearch documents that shows similarly pervasive adoption rates.
Even the newly agreed-upon `canonical` rel value makes a showing at #271, and will surely rise to the top 25 or so over the next year or two.
And the XFN rel values? The `contact` rel value is the most common at #22, with `me` and `friend` just behind at #25 and #28 respectively. Filling out the list are `acquaintance` (#58), `met` (#68), `colleague` (#84), `co-worker` (#126), `neighbor` (#180), `muse` (#196), `co-resident` (#232), `parent` (#255), `sibling` (#414), `sweetheart` (#446), `spouse` (#570), `crush` (#794), `kin` (#834), and `child` (#879), with `date` bringing up the rear at #1086.
This survey indicates that rel values are both widely and meaningfully used, with adoption driven by a wide array of needs, such as semantic markup, search engine hints, client-side rendering, discovery and identity protocols, blogging, and content that can later be edited.
But more importantly, we learned that a full 0.0003% of all the links have declared, for all the world to see, that some URI out there is their source of inspiration, their Calliope, their Erato, their muse.