Protiguous

Question

Recently, this paper has received a lot of attention (e.g. from WSJ). Basically, the authors conclude that Facebook will lose 80% of its members by 2017.

They base their claims on an extrapolation of the SIR model, a compartmental model frequently used in epidemiology. Their data is drawn from Google searches for "Facebook", and the authors use the demise of Myspace to validate their conclusion.

Question:

Are the authors making a "correlation does not imply causation" mistake? This model and logic may have worked for Myspace, but is it valid for any social network?

Update: Facebook hits back

In keeping with the scientific principle "correlation equals causation," our research unequivocally demonstrated that Princeton may be in danger of disappearing entirely.
We don’t really think Princeton or the world’s air supply is going anywhere soon. We love Princeton (and air),” and adding a final reminder that “not all research is created equal – and some methods of analysis lead to pretty crazy conclusions.

Well, the number of Facebook searches may spike upwards based on this article. ;) — RobertF, Jan 23 at 17:25
@Glen Mr. Develin appears to have thoroughly missed the point of the study. Firstly, it is not simply forecasting a trend in searches, but using them to validate and calibrate a model from the well-known SIR family, which is thought to be a good descriptor of fad adoption and abandonment. Second, his "clever" counterexamples fail because unlike Facebook, neither Princeton nor air are used primarily online. He chants the correlation-causation chant, but the correlation is over MySpace to Facebook, not over Facebook's historical data. Also, there is a conflict of interest. — Superbest, Jan 24 at 0:00
The analysis is tongue-in-cheek. The point of extrapolation as if nothing changes is valid, as the two answers have described. — Glen, Jan 24 at 0:35
This doesn't answer the question but is merely a bunch of personal opinions, totally unrelated to statistics. — ziggystar, Jan 24 at 8:58

Peter Mortensen · Answer 1 · 2014-01-27 14:12:54Z

The answers so far have focused on the data itself, which makes sense with the site this is on, and the flaws about it.

But I'm a computational/mathematical epidemiologist by inclination, so I'm also going to talk about the model itself for a little bit, because it's also relevant to the discussion.

In my mind, the biggest problem with the paper is not the Google data. Mathematical models in epidemiology handle messy data all the time, and to my mind the problems with it could be addressed with a fairly straightforward sensitivity analysis.

The biggest problem, to me, is that the researchers have "doomed themselves to success" — something that should always be avoided in research. They do this in the model they decided to fit to the data: a standard SIR model.

Briefly, a SIR model (which stands for susceptible (S) infectious (I) recovered (R)) is a series of differential equations that track the health states of a population as it experiences an infectious disease. Infected individuals interact with susceptible individuals and infect them, and then in time move on to the recovered category.

This produces a curve that looks like this:

Beautiful, is it not? And yes, this one is for a zombie epidemic. Long story.

In this case, the red line is what's being modeled as "Facebook users". The problem is this:

In the basic SIR model, the I class will eventually, and inevitably, asymptotically approach zero.

It must happen. It doesn't matter if you're modeling zombies, measles, Facebook, or Stack Exchange, etc. If you model it with a SIR model, the inevitable conclusion is that the population in the infectious (I) class drops to approximately zero.

There are extremely straightforward extensions to the SIR model that make this not true — either you can have people in the recovered (R) class come back to susceptible (S) (essentially, this would be people who left Facebook changing from "I'm never going back" to "I might go back someday"), or you can have new people come into the population (this would be little Timmy and Claire getting their first computers).

Unfortunately, the authors didn't fit those models. This is, incidentally, a widespread problem in mathematical modeling. A statistical model is an attempt to describe the patterns of variables and their interactions within the data. A mathematical model is an assertion about reality. You can get a SIR model to fit lots of things, but your choice of a SIR model is also an assertion about the system. Namely, that once it peaks, it's heading to zero.

Incidentally, Internet companies do use user-retention models that look a heck of a lot like epidemic models, but they're also considerably more complex.

Yes, I missed other models too. I am not aware of epidemiology models, but I am aware about S-curve models used in marketing. There was one review article (Meade, Islam, Technological Forecasting - Model Selection, Model Stability and Combining Models, Management Science, 1998, Vol 44, No 8.) which listed like 30 different models. Most of these models have similar reasoning, instead of susceptible, infectious and recovered they use the terms early adopter and immitator (or similar). The model is then the solution to some differential equation. — mpiktas, Jan 24 at 18:59
You hardly need to justify talking about the statistical model here on Cross Validated (CV)...Are you suggesting that not talking about the model is a flaw of CV itself? Either way, clarification would help if you want to promote awareness, or to critique constructively at all in that regard, really. Alternately, if it's a tangent not worth clarifying, how is it worth mentioning at all? As for the (inadvertent?) suggestion that Facebook users are zombies...I have no objections. (Even though I am one! :) — Nick Stauner, 2 days ago
(+1) This was my primary gripe with their article. They assumed a model that necessarily predicts a crash and then validated the model by cherry-picking a single site that exhibited the behavior they were predicting (MySpace). The meaningful reps for this sort of model is the number of comparable sites, and they tested it on one. — guy, 2 days ago
@NickStauner No, it was merely an observation that most of the critiques here (and indeed, in the rest of the internets) were on the data itself. Which means sense because the data itself is something most users here could easily criticize, while the actual details of the model aren't something I'd expect "Average statistician/Machine learning expert" to necessarily have encountered. — Fomite, 2 days ago

Adrian · Answer 2 · 2014-01-23 23:09:48Z

My primary concern with this paper is that it focuses primarily on Google search results. It is a well-established fact that smartphone use is on the rise (Pew Internet, Brandwatch), and traditional computer sales are declining (possibly just due to old computers still functioning) (Slate,ExtremeTech), as more people use smartphones to access the internet. Considering there is a native Facebook app for (at least) iOS, Android, Blackberry, and Windows Phone, it's no surprise that the number of Google queries for "facebook" has fallen significantly. If users no longer need to open a browser and mistype "facebook.com" in the URL bar, then that would definitely negatively impact the number of searches. In fact, the number of FB users who use the app has gone up significantly (TechCrunch, Forbes).

I think this study is just some "huh, interesting correlation" that got taken too far by alarmist media outlets; "Did you know the world is changing? How unexpected!"

Very well put, like you said smart phone use is on the rise and facebook gets tremendous amounts of monthly visits from cellphones/smartphones. Just because people are not searching for it does not mean it will cause a decrease in facebook usage, the way people are using facebook is changing/changed. They are no longer searching for it, they are just clicking the icon on their phone and going to it. — MCP_infiltrator, Jan 24 at 13:27
I was just about to answer the same about smartphone and Google searches — syed mohsin, Jan 25 at 0:41
"Considering there is a native Facebook app for (at least) iOS, Android, Blackberry, and Windows Phone, it's no surprise that the number of Google queries for "facebook" has fallen significantly" ... search is on smart phones as well, and there's this thing called a bookmark that is implemented on every browser. — Jeffrey Blattman, yesterday

Peter Mortensen · Answer 3 · 2014-01-27 13:43:23Z

Well, this paper establishes the fact that the number of Google searches on Facebook fits a certain curve nicely. So at best it can predict that the searches on Facebook will decline by 80%. Which might be feasible, because Facebook might become so ubiquitous that nobody would need to search about it.

The problem with such type of models is that they assume that no other factors can influence the dynamics of the observed variable. This assumption is hard to justify when dealing with data related to people. For example, this model assumes that Facebook cannot do anything to counter the loss of its users, which is a very questionable assumption to make.

And mpiktas's first paragraph touches on a good point as well - the authors are using Google search queries as a proxy for the number of Facebook accounts. Why not go straight to the account data? It's not difficult to find: news.yahoo.com/number-active-users-facebook-over-230449748.html — RobertF, Jan 23 at 18:55
Although to be fair, graphing the data from the above article does show the number of active users was close to peaking in 2013. — RobertF, Jan 23 at 18:56
Factors other than patient to patient infection dynamics can influence disease spread (such as public health programs). That doesn't stop the underlying model being useful. I don't think the exact date of Facebook's demise (which no doubt can be influenced) is as interesting as the idea/model that social networks spread like diseases. — david25272, Jan 24 at 0:38
@david25272 These kind of models are certainly useful, there is a whole literature in marketing concerning S-curves which uses similar approaches. For example I suspect that Bass model and its counterparts might fit the same data pretty well too. — mpiktas, Jan 24 at 5:14

G M · Answer 4 · 2014-01-24 18:09:21Z

Google Trend in my opinion can't produce a good data set for this case of study. Google trend shows how often a term is searched with Google so there are at least two reasons for raising some doubts about the prevision:

We don't know if the user searches on Google Facebook to log in or if he searches information about Facebook

Facebook is not only a site is a phenomenon, with many articles, books and a film about it and Facebook Inc. on May 18, 2012 began selling stock to the public and trading on the NASDAQ. Google Trend shows you both: the searches for the site and the searches for the "phenomenon". New things always have a great impact to the mass, TV had a great impact to the mass now no one write articles about it but is still one of the most used appliance.

Most users don't search "facebook" on Google to login

With mobile applications and Bookmarks a user with a decent knowledge of internet search "facebook" on Google only the first time then he usually saves the page as a bookmark or download the application. The graph below is the Google trend for Wikipedia, it seems that we will not use Wikipedia in the future. Obviously this is not true we simply don't access to wikipedia typing "wikipedia" we simply search and then use the wikipedia page or we use the bookmark to access to it.

Don't forget autocomplete on browser history in the address bar. I type the letter "f" into Chrome or Firefox and it autocompletes to facebook.com as the first suggestion. This feature has been active for several years now. — paul, yesterday
Most users don't search "facebook" on Google to login ... I bet a bounty of 50 that this is indeed the purpose of the majority of those searches. — Evgeni Sergeev, 21 hours ago
@EvgeniSergeev I bet with you too! your hypothesis does not contradict my statement, I think this is the reason of those searches but in fact is not the most used method to access to facebook (and this is what matter for the study) one simple fact is that last year Facebook mobile users surpassed desktop users — G M, 15 hours ago

Nick Stauner · Answer 5 · 2014-01-25 13:20:20Z

A few basic issues stand out with this paper:

It assumes correlation of search engine queries about a rising social network with the membership increases. This may have correlated in the past, but may not in the future.
There are very few new large social networks. You can almost count them on one hand. Friendster, Myspace, Facebook, Google+. Also, Stack Exchange, Tumblr, and Twitter function similarly to social networks. Is anyone predicting Twitter is over? Quite to the contrary, it seems to have major momentum. There is not much mention or study of other ones to see if they fit. In a way we are talking about, does a trend exist among 5-7 data points? (The number of social networks.) It's just too little data to make any conclusion about the future.
Facebook displaced Myspace. That was the chief dynamic. It doesn't consider the idea that one infection is displacing another, it tends to consider them separately. What is displacing Facebook? Google+? Twitter? The interaction and "defection" of customers from one "brand" or "product" to the other is the critical phenomenon in this area.
Social networks coexist. One can be a member of multiple sites. It is true that members may tend to prefer one over the other.
It would seem a much better model is that there is a consolidation going on, like in economics, such as with automobiles, radio makers, web sites, etc. As in any new disruptive technology, there are many competitors in the beginning, and then, later, the field narrows, they tend to consolidate, there are buyouts and mergers, and some die out in the competition. We already see examples of this, e.g. Yahoo buying out Tumblr recently.
A similar concept might be with television networks consolidating and being owned by large conglomerates, e.g. major media companies owning many media assets. Indeed, Myspace was bought out by News Corporation.
The way to go is to look for more analogies between economics and infections (biology). Companies acquiring customers from competitors and the uptake of products do indeed have many epidemiological parallels. There are strong parallels to evolutionary "red queen" races [see the book, Red Queen by Ridley]. There might be connections to a field called bionomics.
Another basic model is products that compete with each other and have various "barriers to entry" for customers to switch from one brand to another. It is true the cost of switching is very low in cyberspace. It's similar to brands of beers competing for customers, etc.
In an asymptotic model, it is much more likely that a network increases its members toward some asymptotic maximum and then it tends to plateau. Early in the plateau, it will not be apparent that it is a plateau.

That all said, I think it has some very valid and engaging ideas and is likely to spur much further research. It's groundbreaking, pioneering, and it just needs to be adjusted a bit in its claims. I am delighted in this use of Stack Exchange and collaborative wisdom/collective intelligence analyzing this paper. (Now if only reporters researching the subject would read this whole page carefully before preparing their simplistic sound bites.)

btw re terminology. "barriers to entry" is used to refer to companies wanting to release new products and compete in a new area, similar concept applies "on other side of transaction" to customers switching products but maybe there is a different term there? anyway the authors need to tie in their ideas withmarketing which is indeed using more "viral" models. also a key concept in this area [should have mentioned this above] is market share. — vzn, 2 days ago
ps maybe a much more relevant question which is supported by other recent research in this area: is facebook growth coming to an end. usage is down in teenager demographics for example (which is quite notable because its initial rise was due to teenagers). several recent studies/experts confirm this. therefore, looking at demographic group shifts is also key to understanding social network usage trends. also, facebook is attempting to expand internationally after "saturating" in US & there the barriers are things like fewer internet networks, cellphones/computers, etc... — vzn, yesterday

EngrStudent · Answer 6 · 2014-01-26 21:34:22Z

The question isn't "if" but "when".

That it will end is already guaranteed.http://www.ted.com/talks/geoffrey_west_the_surprising_math_of_cities_and_corporations.html

I take umbrage with the use of the SIR model. It comes with assumptions.

One of the assumptions is that eventually everyone is "recovered". Infections are not perpetual, while technology adoption can be (consider the automobile for example).

If the business is doomed to eventually die, then when going through death throes the relationships between susceptible, infected, and recovered might be adequately modeled by a particular SIR model. This does not mean the model is descriptive of any of the seasons before end-of-life. It does not take into account other forces - the context. Facebook was part of the context of end of "Myspace" and so while an SIR was appropriate for Myspace-only use, it was not for Social-Network use because many users had accounts on both, and switched to FB-dominant usage.

I dug through the zombie-model, and even through some non-zombie SIR fits, and a time and population punctuated-windowed SIR is more appropriate there. It is not a universal model, and it has strengths and weaknesses. That means that the SIR is imperfect even for the systems that it was engineered to model. Such fundamental imperfection for its target suggests that without careful use, application outside the target area can be, ceteris paribus, more problematic than other model.

edubriguenti · Answer 7 · 2014-01-24 18:56:09Z

If we take a look at the map of social networks, there are some cases that epidemic model applies.

http://vincos.it/world-map-of-social-networks/

The article could have some other examples (Friendster and Orkut are a good example of massive declination of its users) and also taking into account the fact that normally people migrate to other social network that offers better or new services.

Facebook inovates the way people comunicate. Comparing with Orkut, an user needed to enter another person profile to see their updates. On the other hand on facebook the feeds are now on his own timeline. That's a major change.

This model and logic may have worked for MySpace, but is it valid for any social network?

IMHO, people don't leave Social Network. They migrate, based on a better service, functionality or experience.

The question is: Will there be a better Social Network ? Maybe Google +.

This answer does not appear to address the questions, which are (1) a statistical one about possibly confusing correlation with causation and (2) whether a predictive model can be expected to apply universally. If I am misunderstanding, perhaps it is because it is not at all apparent what the referent of "this" is in the first sentence. — whuber♦, Jan 24 at 16:25
@whuber This answer says there is no correlation as long as people still need social networks. Unless there is a better alternative to Facebook (which the paper in the question does not take into account), then Facebook will be king. Statistically, the "Social network" need has only grown, and people have simply migrated from one social network to another. The use of social networks has only grown so far. — Tiberiu-Ionuț Stan, Jan 24 at 21:32
@Tiberiu-Ionuț Stan Your comment might be correct but it consists only of unsupported remarks about social networks; it does not seem to contain any statistical reasoning nor to throw any additional light on the question. In particular, I still cannot see any specific reference in this particular answer to correlation or causation. Remember, we're not here to debate the future of Facebook or the quality of social networks, but rather we have been asked to evaluate the statistical arguments in the paper in question. — whuber♦, Jan 24 at 22:05
@whuber I'm trying to evaluate the statistical arguments of the paper showing the reasons behind the results. The paper do not take account others OSNs and emerging trents, only numbers. I'm just adding information. IMO this is the same as technical and fundamental analysis in stock market (both are ok). I'm trying to explain the facts behind the change, not only numbers and graphs. — edubriguenti, Jan 24 at 22:21

Nebu · Answer 8 · 2014-01-25 15:48:38Z

up vote2down vote

To answer your question

This model and logic may have worked for MySpace, but is it valid for any social network?

Probably not. Historical data can only predict future events if the 'environment' is similar. This paper assumes that the total of Google users and queries is a constant, which of course it is not. Now this article may say more about Google than about Facebook.

However, based on the rapid rise and fall of many other social networks like MySpace and others I think one can safely assume that there is a big chance Facebook will no longer be the dominant social network in 5 years.

edited 2 days ago

answered Jan 24 at 16:25

Nebu
292

Predictions don't depend entirely on environmental similarity (depending on what you mean by "environment" of course). Nonetheless, your answer seems internally inconsistent. It is not safe to assume Facebook's future will resemble other social networks' courses on that basis alone, much less within such a short time frame. – Nick Stauner 2 days ago

My prediction on Facebooks future is an opinion for which i use one argument. My opinion is clearly not based on statistics or models. The prediction in the paper discussed here is based on statistics and models with historical reference. I do not see why my answer is internally inconsistent. – Nebu 2 days ago

Your opinion's one supporting argument sounds an awful lot like the logic behind the model that you criticize for that very same logic. If the (generational?) environment isn't similar enough now to when the model fit Myspace, why is it similar enough to base any opinion on Myspace's history? Furthermore, is Facebook really just another social network that will behave like every other? It's different enough for doubt in plenty of ways, as are the times, so again, I don't see how it's safe to assume its chance of a similar fate is big within such a short time frame. – Nick Stauner 2 days ago

Protiguous

Pages

Thursday, January 30, 2014

Gears API Blog: Hello HTML5

Hello HTML5

Tin Isles: Should you trust lastpass.com?

Using Amazon EC2 to mine Dogecoin

Monday, January 27, 2014

Is Facebook coming to an end? - Cross Validated

This question has an open bounty worth +300 reputation from LessFaceMoreBook endingin 4 days.

8 Answers