Distribution "popularity" [LWN.net]

There has been quite a bit of press, and some hand-wringing, over reports that Linux Mint has overtaken Ubuntu as the "most popular" Linux distribution. The reports are based on the DistroWatch rankings, which some—though notably not the DistroWatch folks—seem to think indicates the popularity of various distributions. While it's a bit hard to imagine that untold legions of Ubuntu users have switched to Linux Mint en masse, it does have a non-zero probability of being true. But there aren't, and really can't be, any numbers to back that up. Is popularity even really the best measure of a distribution?

The "rankings" that have spawned the uproar are simple page-hit counts. Each unique IP address that lands on DistroWatch's page for a given distribution increments the count for the day. It is, at best, a count of the amount of "buzz" a particular distribution has over the past one, three, six, and twelve months. It can also be fairly easily manipulated by someone who has unfettered access to a large number of IP addresses or a botnet—as well as by over-exuberant distribution fans—though there is no evidence to suggest that's what's happening here. As DistroWatch says, those numbers are:
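The counting scheme described above can be sketched in a few lines of Python. This is a hedged illustration of the general idea (count each unique IP at most once per page per day, then average over days), not DistroWatch's actual implementation; the log format and field names are assumptions for the example.

```python
from collections import defaultdict

def hits_per_day(log_entries):
    """Average unique-IP hits per day for each distribution page.

    Each (day, ip, page) triple is counted at most once, mimicking
    a DistroWatch-style HPD metric (an assumption, not their code).
    log_entries: iterable of (day, ip, page) tuples.
    """
    seen = set()
    counts = defaultdict(int)   # page -> total counted hits
    days = defaultdict(set)     # page -> days on which the page was hit
    for day, ip, page in log_entries:
        key = (day, ip, page)
        if key in seen:
            continue            # same IP, same page, same day: skip
        seen.add(key)
        counts[page] += 1
        days[page].add(day)
    # average over the days each page was actually observed
    return {page: counts[page] / len(days[page]) for page in counts}

log = [
    ("2011-11-28", "198.51.100.7", "mint"),
    ("2011-11-28", "198.51.100.7", "mint"),    # repeat visit: not recounted
    ("2011-11-29", "198.51.100.7", "mint"),    # new day: counted again
    ("2011-11-28", "203.0.113.9",  "mint"),
    ("2011-11-28", "203.0.113.9",  "ubuntu"),
]
print(hits_per_day(log))  # {'mint': 1.5, 'ubuntu': 1.0}
```

The sketch also makes the manipulation problem obvious: anything that can present many distinct IP addresses (a botnet, or simply a large enthusiastic fan base behind different networks) inflates the count directly.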

[A] light-hearted way of measuring the popularity of Linux distributions and other free operating systems among the visitors of this website. They correlate neither to usage nor to quality and should not be used to measure the market share of distributions.

But, for whatever reason, Mint shows up at the top of the list for average number of hits per day (HPD) for each of the four periods. In fact, Ubuntu has "slipped" to fourth place over the last month with Fedora and openSUSE taking second and third place respectively. Mint shows nearly three times the number of HPD that any of the rest of the top four do. That's interesting, perhaps, but not meaningful. It is a self-selected "poll" that could be fairly easily manipulated—likely unintentionally.

The ranking is also heavily skewed toward desktop distributions, as can be seen by the numbers for server-oriented distributions like Red Hat (which ranks below things like GhostBSD, Zorin, and Tiny Core) or SUSE (which ranks a bit lower still). Both of those distributions have sales figures that likely show rather more popularity than a reading of the DistroWatch numbers would suggest. In short, even a brief look at the rankings page should be enough to deter anyone from deriving conclusions that result in headlines like "Ubuntu sees massive slide in popularity, Mint sprints ahead ... but why?".

Part of the problem here is that it is somewhere between difficult and impossible to get accurate figures for distribution usage. In fact, it goes well beyond just distributions; accurately counting users of any free or proprietary software is well-nigh impossible. Vendors who sell their software have some advantage, but even they don't know how many users there are. Microsoft can undoubtedly report how many copies of Windows it sold in the last month (quarter, year, ...), but that most certainly doesn't count the number of Windows users. That number is likely to be much higher because of unlicensed users, whose ranks probably dwarf the not insubstantial number of pre-installed systems that get wiped to run other operating systems.

The usual methods to try to track users, like phoning home with some kind of unique ID, are intrusive. For free software, those mechanisms are unlikely to be tolerated by many users, and even users of proprietary software may find ways to avoid being counted. Companies selling software count their users in terms of dollars (euros, ...) so, other than being able to report inflated piracy numbers as "lost sales", there is no real need for additional counting. Free software projects and distributions are different.

Those who work on free projects would certainly like to feel that their work is being used and appreciated. That's not unreasonable at all, but is popularity really the right measure of that? Even if it can be reliably measured, popularity just measures ... well ... what's popular—not what works best, solves the most problems, or anything else. Does it really matter if Ubuntu has X million users and Linux Mint has X/4 million—or the reverse? In both cases, the distributions are serving a substantial number of people and, presumably, solving lots of their problems.

There are some "active counting" efforts by various distributions but, as would be expected for free software projects, they are "opt-in" services. Fedora and openSUSE both use smolt to gather semi-anonymized installation data. Debian and Ubuntu use popcon to generate information on the popularity of various packages. While users are asked to enable these counting mechanisms at install time, it's not clear how many actually do so.

Since directly measuring users is difficult, distributions often use indirect (and fairly inaccurate) methods to try to get a handle on their number of users. Both Fedora and openSUSE count unique IP-address connections to their update servers and have fairly detailed pages that outline what they are counting (openSUSE, Fedora). Ubuntu has been notoriously lax in providing any real information on its methodology—without being shy about producing numbers like 20 million Ubuntu users—but one would guess it is doing something similar.

That kind of data collection isn't really accurate to generate a "number of users" figure, though it may be fine as an estimate. Assuming the methodology remains the same, it may also serve as a reasonable indicator of trends in the number of users. If Fedora 16 has 50% more unique IPs getting updates, that's a pretty good indicator that F16 has been adopted more widely. Comparing F16's raw numbers to those of openSUSE 11.3, for example, is much less useful.

But obsessing over estimated numbers—or illusory trends based on web page hits—seems counterproductive. While it is harder to generate numbers, the measure of a community distribution really should be how vibrant its community is. Are new people showing up, filing bugs, participating in development or design discussions, packaging new software, translating existing software, taking on new tasks, running for elected positions, and so on? Those are certainly measures of growth, though numerically hard to quantify.

Focusing on a "zero sum" game for Linux distributions is equally counterproductive. While the GNOME 3 and Unity decisions made by various distributions have generated a lot of noise (and likely some distribution and desktop environment switches), it's pretty hard to justify a "Ubuntu users are running to Mint because of Unity" stance on anything other than anecdotal evidence. If the suggested trend is even real, it could be that Mint is attracting many of the first-time Linux users that Ubuntu once did, or that it is attracting more than Ubuntu currently is. That could be due to the "buzz" factor for Mint these days, for example. Not all (or even most) growth of Linux distributions needs to come at the expense of other distributions.

Unlike the choice between Windows and OS X (or Linux and either of those), the choice between Linux distributions is far less susceptible to concerns about lock-in. Part of what free software enables is relatively easy migration between distributions, with full data and application portability, which undoubtedly leads to some "distro hopping". But it's also true that providing that freedom can attract new users. We've seen it over the past 20 years, even if the growth on the desktop is not up to what most had hoped for. Focusing on serving existing users, while attracting new ones, rather than worrying about pumping up popularity numbers, is a much more likely road to success.