The great decline in Wikipedia pageviews (condensed version)


13


To keep this post manageable in length, I have only included a small subset of the illustrative examples and discussion. I have published a longer version of this post, with more examples (but the same intro and concluding section), on my personal site.

Last year, during the months of June and July, as my work for MIRI was wrapping up and I hadn't started my full-time job, I worked on the Wikipedia Views website, aimed at easier tabulation of the pageviews for multiple Wikipedia pages over several months and years. It relies on a statistics tool called stats.grok.se, created by Domas Mituzas, and maintained by Henrik.

One of the interesting things I noted as I tabulated pageviews for many different pages was that the pageview counts for many already popular pages were in decline. Pages of various kinds peaked at different historical points. For instance, colors have been in decline since early 2013. The world's most populous countries have been in decline since as far back as 2010!

Defining the problem

The first thing to be clear about is what these pageviews count and what they don't. The pageview measures are taken from stats.grok.se, which in turn uses the pagecounts-raw dump provided hourly by the Wikimedia Foundation's Analytics team, which in turn is obtained by processing raw user activity logs. The pagecounts-raw measure is flawed in two ways:

  • It only counts pageviews on the main Wikipedia website and not pageviews on the mobile Wikipedia website or through Wikipedia Zero (a pared down version of the mobile site that some carriers offer at zero bandwidth costs to their customers, particularly in developing countries). To remedy these problems, a new dump called pagecounts-all-sites was introduced in September 2014. We simply don't have data for views of mobile domains or of Wikipedia Zero at the level of individual pages for before then. Moreover, stats.grok.se still uses pagecounts-raw (this was pointed to me in a mailing list message after I circulated an early version of the post).
  • The pageview count includes views by bots. The official estimate is that about 15% of pageviews are due to bots. However, the percentage is likely higher for pages with fewer overall pageviews, because bots have a minimum crawling frequency. So every page might have at least 3 bot crawls a day, resulting in a minimum of 90 bot pageviews even if there are only a handful of human pageviews.

Therefore, the trends I discuss will refer to trends in total pageviews for the main Wikipedia website, including page requests by bots, but excluding visits to mobile domains. Note that visits from mobile devices to the main site will be included, but mobile devices are by default redirected to the mobile site.

How reliable are the metrics?

As noted above, the metrics are unreliable because of the bot problem and the issue of counting only non-mobile traffic. German Wikipedia user Atlasowa left a message on my talk page pointing me to an email thread suggesting that about 40% of pageviews may be bot-related, and discussing some interesting examples.

Relationship with the overall numbers

I'll show that for many pages of interest, the number of pageviews as measured above (non-mobile) has declined recently, with a clear decline from 2013 to 2014. What about the total?

We have overall numbers for non-mobile, mobile, and combined. The combined number has largely held steady, whereas the non-mobile number has declined and the mobile number has risen.

What we'll find is that the decline for most pages that have been around for a while is even sharper than the overall decline. One reason overall pageviews haven't declined so fast is the creation of new pages. To give an idea, non-mobile traffic dropped by about 1/3 from January 2013 to December 2014, but for many leading categories of pages, traffic dropped by about 1/2-2/3.

Why is this important? First reason: better context for understanding trends for individual pages

People's behavior on Wikipedia is a barometer of what they're interested in learning about. An analysis of trends in the views of pages can provide an important window into how people's curiosity, and the way they satisfy this curiosity, is evolving. To take an example, some people have proposed using Wikipedia pageview trends to predict flu outbreaks. I myself have tried to use relative Wikipedia pageview counts to gauge changing interests in many topics, ranging from visa categories to technology companies.

My initial interest in pageview numbers arose because I wanted to track my own influence as a Wikipedia content creator. In fact, that was my original motivation with creating Wikipedia Views. (You can see more information about my Wikipedia content contributions on my site page about Wikipedia).

Now, when doing this sort of analysis for individual pages, one needs to account for, and control for, overall trends in the views of Wikipedia pages that are occurring for reasons other than a change in people's intrinsic interest in the subject. Otherwise, we might falsely conclude from a pageview count decline that a topic is falling in popularity, whereas what's really happening is an overall decline in the use of (the non-mobile version of) Wikipedia to satisfy one's curiosity about the topic.

Why is this important? Second reason: a better understanding of the overall size and growth of the Internet.

Wikipedia has been relatively mature and has had the top spot as an information source for at least the last six years. Moreover, unlike almost all other top websites, Wikipedia doesn't try hard to market or optimize itself, so trends in it reflect a relatively untarnished view of how the Internet and the World Wide Web as a whole are growing, independent of deliberate efforts to manipulate and doctor metrics.

The case of colors

Let's look at Wikipedia pages on some of the most viewed colors (I've removed the 2015 and 2007 columns because we don't have the entirety of these years). Colors are interesting because the degree of human interest in colors in general, and in individual colors, is unlikely to change much in response to news or current events. So one would at least a priori expect colors to offer a perspective into Wikipedia trends with fewer external complicating factors. If we see a clear decline here, then that's strong evidence in favor of a genuine decline.

I've restricted attention to a small subset of the colors, that includes the most common ones but isn't comprehensive. But it should be enough to get a sense of the trends. And you can add in your own colors and check that the trends hold up.

Page namePageviews in year 2014Pageviews in year 2013Pageviews in year 2012Pageviews in year 2011Pageviews in year 2010Pageviews in year 2009Pageviews in year 2008TotalPercentageTags
Black 431K 1.5M 1.3M 778K 900K 1M 958K 6.9M 16.1 Colors
Blue 710K 1.3M 1M 987K 1.2M 1.2M 1.1M 7.6M 17.8 Colors
Brown 192K 284K 318K 292K 308K 300K 277K 2M 4.6 Colors
Green 422K 844K 779K 707K 882K 885K 733K 5.3M 12.3 Colors
Orange 133K 181K 251K 259K 275K 313K 318K 1.7M 4 Colors
Purple 524K 906K 847K 895K 865K 841K 592K 5.5M 12.8 Colors
Red 568K 797K 912K 1M 1.1M 873K 938K 6.2M 14.6 Colors
Violet 56K 96K 75K 77K 69K 71K 65K 509K 1.2 Colors
White 301K 795K 615K 545K 788K 575K 581K 4.2M 9.8 Colors
Yellow 304K 424K 453K 433K 452K 427K 398K 2.9M 6.8 Colors
Total 3.6M 7.1M 6.6M 6M 6.9M 6.5M 6M 43M 100 --
Percentage 8.5 16.7 15.4 14 16 15.3 14 100 -- --
 

Since the decline appears to have happened between 2013 and 2014, let's examine the 24 months from January 2013 to December 2014:

 

MonthViews of page BlackViews of page BlueViews of page BrownViews of page GreenViews of page OrangeViews of page PurpleViews of page RedViews of page VioletViews of page WhiteViews of page YellowTotal Percentage
201412 30K 41K 14K 27K 9.6K 28K 67K 3.1K 21K 19K 260K 2.4
201411 36K 46K 15K 31K 10K 35K 50K 3.7K 23K 22K 273K 2.5
201410 37K 52K 16K 34K 10K 34K 51K 4.5K 25K 26K 289K 2.7
201409 37K 57K 16K 35K 9.9K 37K 45K 4.8K 27K 29K 298K 2.8
201408 33K 47K 14K 34K 8.5K 31K 38K 3.9K 21K 22K 253K 2.4
201407 33K 47K 14K 30K 9.3K 31K 37K 4.2K 22K 22K 250K 2.3
201406 32K 49K 14K 31K 10K 34K 39K 4.9K 23K 22K 259K 2.4
201405 44K 55K 17K 37K 10K 51K 42K 5.2K 26K 26K 314K 2.9
201404 34K 60K 17K 36K 14K 38K 47K 5.8K 27K 28K 306K 2.8
201403 37K 136K 19K 51K 14K 123K 52K 5.5K 30K 31K 497K 4.6
201402 38K 58K 19K 39K 13K 41K 49K 5.6K 29K 29K 321K 3
201401 40K 60K 19K 36K 14K 40K 50K 4.4K 27K 28K 319K 3
201312 62K 67K 17K 44K 12K 48K 48K 4.4K 42K 26K 372K 3.5
201311 141K 96K 20K 65K 11K 68K 55K 5.3K 71K 34K 566K 5.3
201310 145K 102K 21K 69K 11K 77K 59K 5.7K 71K 36K 598K 5.6
201309 98K 80K 17K 60K 11K 53K 51K 4.9K 45K 30K 450K 4.2
201308 109K 87K 20K 57K 20K 57K 60K 4.6K 53K 28K 497K 4.6
201307 107K 92K 21K 61K 11K 66K 65K 4.6K 61K 30K 520K 4.8
201306 115K 106K 22K 69K 13K 73K 64K 5.5K 70K 33K 571K 5.3
201305 158K 122K 24K 79K 14K 83K 69K 11K 77K 39K 677K 6.3
201304 151K 127K 28K 83K 14K 86K 74K 12K 78K 40K 694K 6.4
201303 155K 135K 31K 92K 15K 99K 84K 12K 80K 43K 746K 6.9
201302 152K 131K 31K 84K 28K 95K 84K 17K 77K 41K 740K 6.9
201301 129K 126K 32K 81K 19K 99K 84K 9.6K 70K 42K 691K 6.4
Total 2M 2M 476K 1.3M 314K 1.4M 1.4M 152K 1.1M 728K 11M 100
Percentage 18.1 18.4 4.4 11.8 2.9 13.3 12.7 1.4 10.2 6.8 100 --
Tags Colors Colors Colors Colors Colors Colors Colors Colors Colors Colors -- --

 

As we can see, the decline appears to have begun around March 2013 and then continued steadily till about June 2014, at which numbers stabilized to their lower levels.

A few sanity checks on these numbers:

  • The trends appear to be similar for different colors, with the notable difference that the proportional drop was higher for the more viewed color pages. Thus, for instance, black and blue saw declines from 129K and 126K to 30K and 41K respectively (factors of four and three respectively) from January 2013 to December 2014. Orange and yellow, on the other hand, dropped by factors of close to two. The only color that didn't drop significantly was red (it dropped from 84K to 67K, as opposed to factors of two or more for other colors), but this seems to have been partly due to an unusually large amount of traffic in the end of 2014. The trend even for red seems to suggest a drop similar to that for orange.
  • The overall proportion of views for different colors comports with our overall knowledge of people's color preferences: blue is overall a favorite color, and this is reflected in its getting the top spot with respect to pageviews.
  • The pageview decline followed a relatively steady trend, with the exception of some unusual seasonal fluctuation (including an increase in October and November 2013).

One might imagine that this is due to people shifting attention from the English-language Wikipedia to other language Wikipedias, but most of the other major language Wikipedias saw a similar decline at a similar time. More details are in my longer version of this post on my personal site.

Geography: continents and subcontinents, countries, and cities

Here are the views of some of the world's most populated countries between 2008 and 2014, showing that the peak happened as far back as 2010:

Page namePageviews in year 2014Pageviews in year 2013Pageviews in year 2012Pageviews in year 2011Pageviews in year 2010Pageviews in year 2009Pageviews in year 2008TotalPercentageTags
China 5.7M 6.8M 7.8M 6.1M 6.9M 5.7M 6.1M 45M 9 Countries
India 8.8M 12M 12M 11M 14M 8.8M 7.6M 73M 14.5 Countries
United States 13M 15M 18M 18M 34M 16M 15M 129M 25.7 Countries
Indonesia 5.3M 5.2M 3.7M 3.6M 4.2M 3.1M 2.5M 28M 5.5 Countries
Brazil 4.8M 4.9M 5.3M 5.5M 7.5M 4.9M 4.3M 37M 7.4 Countries
Pakistan 2.9M 4.5M 4.4M 4.3M 5.2M 4M 3.2M 28M 5.7 Countries
Bangladesh 2.2M 2.9M 3M 2.8M 2.9M 2.2M 1.7M 18M 3.5 Countries
Russia 5.6M 5.6M 6.5M 6.8M 8.6M 5.4M 5.8M 44M 8.8 Countries
Nigeria 2.6M 2.6M 2.9M 3M 3.5M 2.6M 2M 19M 3.8 Countries
Japan 4.8M 6.4M 6.5M 8.3M 10M 7.3M 6.6M 50M 10 Countries
Mexico 3.1M 3.9M 4.3M 4.3M 5.9M 4.7M 4.5M 31M 6.1 Countries
Total 59M 69M 74M 74M 103M 65M 59M 502M 100 --
Percentage 11.7 13.8 14.7 14.7 20.4 12.9 11.8 100 -- --

Of these countries, China, India and the United States are the most notable. China is the world's most populous. India has the largest population with some minimal English knowledge and legally (largely) unfettered Internet access to Wikipedia, while the United States has the largest population with quality Internet connectivity and good English knowledge. Moreover, in China and India, Internet use and access have been growing considerably in the last few years, whereas it has been relatively stable in the United States.

It is interesting that the year with the maximum total pageview count was as far back as 2010. In fact, 2010 was so significantly better than the other years that the numbers beg for an explanation. I don't have one, but even excluding 2010, we see a declining trend: gradual growth from 2008 to 2011, and then a symmetrically gradual decline. Both the growth trend and the decline trend are quite similar across countries.

We see a similar trend for continents and subcontinents, with the peak occurring in 2010. In contrast, the smaller counterparts, such as cities, peaked in 2013, similarly to colors, and the drop, though somewhat less steep than with colors, has been quite significant. For instance, a list for Indian cities shows that the total pageviews for these Indian cities declined from about 20 million in 2013 (after steady growth in the preceding years) to about 13 million in 2014.

Some niche topics where pageviews haven't declined

So far, we've looked at topics where pageviews have been declining since at least 2013, and some that peaked as far back as 2010. There are, however, many relatively niche topics where the number of pageviews has stayed roughly constant. But this stability itself is a sign of decay, because other metrics suggest that the topics have experienced tremendous growth in interest. In fact, the stability is even less impressive when we notice that it's a result of a cancellation between slight declines in views of established pages in the genre, and traffic going to new pages.

For instance, consider some charity-related pages:

Page namePageviews in year 2014Pageviews in year 2013Pageviews in year 2012Pageviews in year 2011Pageviews in year 2010Pageviews in year 2009Pageviews in year 2008TotalPercentageTags
Against Malaria Foundation 5.9K 6.3K 4.3K 1.4K 2 0 0 18K 15.6 Charities
Development Media International 757 0 0 0 0 0 0 757 0.7 Pages created by Vipul Naik Charities
Deworm the World Initiative 2.3K 277 0 0 0 0 0 2.6K 2.3 Charities Pages created by Vipul Naik
GiveDirectly 11K 8.3K 2.6K 442 0 0 0 22K 19.2 Charities Pages created by Vipul Naik
International Council for the Control of Iodine Deficiency Disorders 1.2K 1 2 2 0 1 2 1.2K 1.1 Charities Pages created by Vipul Naik
Nothing But Nets 5.9K 6.6K 6.6K 5.1K 4.4K 4.7K 6.1K 39K 34.2 Charities
Nurse-Family Partnership 2.9K 2.8K 909 30 8 72 63 6.8K 5.9 Pages created by Vipul Naik Charities
Root Capital 3K 2.5K 414 155 51 1.2K 21 7.3K 6.3 Charities Pages created by Vipul Naik
Schistosomiasis Control Initiative 4K 2.7K 1.6K 191 0 0 0 8.5K 7.4 Charities Pages created by Vipul Naik
VillageReach 1.7K 1.9K 2.2K 2.6K 97 3 15 8.4K 7.3 Charities Pages created by Vipul Naik
Total 38K 31K 19K 9.9K 4.6K 5.9K 6.2K 115K 100 --
Percentage 33.4 27.3 16.3 8.6 4 5.1 5.4 100 -- --

For this particular cluster of pages, we see the totals growing robustly year-on-year. But a closer look shows that the growth isn't that impressive. Whereas earlier, views were doubling every year from 2010 to 2013 (this was the take-off period for GiveWell and effective altruism), the growth from 2013 to 2014 was relatively small. And about half the growth from 2013 to 2014 was powered by the creation of new pages (including some pages created after the beginning of 2013, so they had more months in a mature state in 2014 than in 2013), while the other half was powered by growth in traffic to existing pages.

The data for philanthropic foundations demonstrates a fairly slow and steady growth (about 5% a year), partly due to the creation of new pages. This 5% hides a lot of variation between individual pages:

Page namePageviews in year 2014Pageviews in year 2013Pageviews in year 2012Pageviews in year 2011Pageviews in year 2010Pageviews in year 2009Pageviews in year 2008TotalPercentageTags
Atlantic Philanthropies 11K 11K 12K 10K 9.8K 8K 5.8K 67K 2.1 Philanthropic foundations
Bill & Melinda Gates Foundation 336K 353K 335K 315K 266K 240K 237K 2.1M 64.9 Philanthropic foundations
Draper Richards Kaplan Foundation 1.2K 25 9 0 0 0 0 1.2K 0 Philanthropic foundations Pages created by Vipul Naik
Ford Foundation 110K 91K 100K 90K 100K 73K 61K 625K 19.5 Philanthropic foundations
Good Ventures 9.9K 8.6K 3K 0 0 0 0 21K 0.7 Philanthropic foundations Pages created by Vipul Naik
Jasmine Social Investments 2.3K 1.8K 846 0 0 0 0 5K 0.2 Philanthropic foundations Pages created by Vipul Naik
Laura and John Arnold Foundation 3.7K 13 0 1 0 0 0 3.7K 0.1 Philanthropic foundations Pages created by Vipul Naik
Mulago Foundation 2.4K 2.3K 921 0 1 1 10 5.6K 0.2 Philanthropic foundations Pages created by Vipul Naik
Omidyar Network 26K 23K 19K 17K 19K 13K 11K 129K 4 Philanthropic foundations
Peery Foundation 1.8K 1.6K 436 0 0 0 0 3.9K 0.1 Philanthropic foundations Pages created by Vipul Naik
Robert Wood Johnson Foundation 26K 26K 26K 22K 27K 22K 17K 167K 5.2 Philanthropic foundations
Skoll Foundation 13K 11K 9.2K 7.8K 9.6K 5.8K 4.3K 60K 1.9 Philanthropic foundations
Smith Richardson Foundation 8.7K 3.5K 3.8K 3.6K 3.7K 3.5K 2.9K 30K 0.9 Philanthropic foundations
Thiel Foundation 3.6K 1.5K 1.1K 47 26 1 0 6.3K 0.2 Philanthropic foundations Pages created by Vipul Naik
Total 556K 533K 511K 466K 435K 365K 340K 3.2M 100 --
Percentage 17.3 16.6 15.9 14.5 13.6 11.4 10.6 100 -- --

 

The dominant hypothesis: shift from non-mobile to mobile Wikipedia use

The dominant hypothesis is that pageviews have simply migrated from non-mobile to mobile. This is most closely borne by the overall data: total pageviews have remained roughly constant, and the decline in total non-mobile pageviews has been roughly canceled by growth in mobile pageviews. However, the evidence for this substitution doesn't exist at the level of individual pages because we don't have pageview data for the mobile domain before September 2014, and much of the decline occurred between March 2013 and June 2014.

What would it mean if there were an approximate one-on-one substitution from non-mobile to mobile for the page types discussed above? For instance, non-mobile traffic to colors dropped to somewhere between 1/3 and 1/2 of their original traffic level between January 2013 and December 2014. This would mean that somewhere between 1/2 and 2/3 of the original non-mobile traffic to colors has shifted to mobile devices. This theory should be at least partly falsifiable: if the sum of traffic to non-mobile and mobile platforms today for colors is less than non-mobile-only traffic in January 2013, then clearly substitution is only part of the story.

Although the data is available, it's not currently in an easily computable form, and I don't currently have the time and energy to extract it. I'll update this once the data on all pageviews since September 2014 is available on stats.grok.se or a similar platform.

Other hypotheses

The following are some other hypotheses for the pageview decline:

  1. Google's Knowledge Graph: This is the hypothesis raised in Wikipediocracy, the Daily Dot, and the Register. The Knowledge Graph was introduced in 2012. Through 2013, Google rolled out snippets (called Knowledge Cards and Knowledge Panels) based on the Knowledge Graph in its search results. So if, for instance, you only wanted the birth date and nationality of a musician, Googling would show you that information right in the search results and you wouldn't need to click through to the Wikipedia page. I suspect that the Knowledge Graph played some role in the decline for colors seen between March 2013 and June 2014. On the other hand, many of the pages that saw a decline don't have any search snippets based on the Knowledge Graph, and therefore the decline for those pages cannot be explained this way.
  2. Other means of accessing Wikipedia's knowledge that don't involve viewing it directly: For instance, Apple's Siri tool uses data from Wikipedia, and people making queries to this tool may get information from Wikipedia without hitting the encyclopedia. The usage of such tools has increased greatly starting in late 2012. Siri itself was released with the third generation iPad in September 2012 and became part of the iPhone released the next month. Since then, it has shipped with all of Apple's mobile devices and tablets.
  3. Substitution away from Wikipedia to other pages that are becoming more search-optimized and growing in number: For many topics, Wikipedia may have been clearly the best information source a few years back (as judged by Google), but the growth of niche information sources, as well as better search methods, have displaced it from its undisputed leadership position. I think there's a lot of truth to this, but it's hard to quantify.
  4. Substitution away from coarser, broader pages to finer, narrower pages within Wikipedia: While this cannot directly explain an overall decline in pageviews, it can explain a decline in pageviews for particular kinds of pages. Indeed, I suspect that this is partly what's going on with the early decline of pageviews (e.g., the decline in pageviews of countries and continents starting around 2010, as people go directly to specialized articles related to the particular aspects of those countries or continents they are interested in).
  5. Substitution to Internet use in other languages: This hypothesis doesn't seem borne out by the simultaneous decline in pageviews for the English, French, and Spanish Wikipedia, as documented for the color pages.

It's still a mystery

I'd like to close by noting that the pageview decline is still very much a mystery as far as I am concerned. I hope I've convinced you that (a) the mystery is genuine, (b) it's important, and (c) although the shift to mobile is probably the most likely explanation, we don't yet have clear evidence. I'm interested in hearing whether people have alternative explanations, and/or whether they have more compelling arguments for some of the explanations proffered here.