Monday, January 27, 2014

Is Facebook coming to an end? - Cross Validated

Recently, this paper has received a lot of attention (e.g. from WSJ). Basically, the authors conclude that Facebook will lose 80% of its members by 2017.
They base their claims on an extrapolation of the SIR model, a compartmental model frequently used in epidemiology. Their data is drawn from Google searches for "Facebook", and the authors use the demise of Myspace to validate their conclusion.
Question:
Are the authors making a "correlation does not imply causation" mistake? This model and logic may have worked for Myspace, but is it valid for any social network?
In keeping with the scientific principle "correlation equals causation," our research unequivocally demonstrated that Princeton may be in danger of disappearing entirely.
“We don’t really think Princeton or the world’s air supply is going anywhere soon. We love Princeton (and air),” and adding a final reminder that “not all research is created equal – and some methods of analysis lead to pretty crazy conclusions.”
16 
Well, the number of Facebook searches may spike upwards based on this article. ;) –  RobertF Jan 23 at 17:25
7 
9 
@Glen Mr. Develin appears to have thoroughly missed the point of the study. First, it is not simply forecasting a trend in searches, but using them to validate and calibrate a model from the well-known SIR family, which is thought to be a good descriptor of fad adoption and abandonment. Second, his "clever" counterexamples fail because, unlike Facebook, neither Princeton nor air is used primarily online. He chants the correlation-causation chant, but the correlation is over MySpace to Facebook, not over Facebook's historical data. Also, there is a conflict of interest. –  Superbest Jan 24 at 0:00 
4 
The analysis is tongue-in-cheek. The point of extrapolation as if nothing changes is valid, as the two answers have described. –  Glen Jan 24 at 0:35 
5 
This doesn't answer the question but is merely a bunch of personal opinions, totally unrelated to statistics. –  ziggystar Jan 24 at 8:58

This question has an open bounty worth +300 reputation from LessFaceMoreBook ending in 4 days.

The question is widely applicable to a large audience. A detailed canonical answer is required to address all the concerns.
Thank you for the answers and comments.
The answers so far have focused on the data itself, which makes sense with the site this is on, and the flaws about it.
But I'm a computational/mathematical epidemiologist by inclination, so I'm also going to talk about the model itself for a little bit, because it's also relevant to the discussion.
In my mind, the biggest problem with the paper is not the Google data. Mathematical models in epidemiology handle messy data all the time, and to my mind the problems with it could be addressed with a fairly straightforward sensitivity analysis.
The biggest problem, to me, is that the researchers have "doomed themselves to success" — something that should always be avoided in research. They do this in the model they decided to fit to the data: a standard SIR model.
Briefly, a SIR model (which stands for susceptible (S) infectious (I) recovered (R)) is a series of differential equations that track the health states of a population as it experiences an infectious disease. Infected individuals interact with susceptible individuals and infect them, and then in time move on to the recovered category.
This produces a curve that looks like this:
[Figure: SIR model curves for a zombie epidemic]
Beautiful, is it not? And yes, this one is for a zombie epidemic. Long story.
In this case, the red line is what's being modeled as "Facebook users". The problem is this:
In the basic SIR model, the I class will eventually, and inevitably, asymptotically approach zero.
It must happen. It doesn't matter if you're modeling zombies, measles, Facebook, or Stack Exchange, etc. If you model it with a SIR model, the inevitable conclusion is that the population in the infectious (I) class drops to approximately zero.
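To make this concrete, here is a minimal sketch of the basic SIR model in Python, integrated with plain forward Euler steps. The parameter values (beta, gamma, the initial infected fraction) and the function name are illustrative assumptions, not the values or code from the paper; the point is only that the I fraction rises, peaks, and then decays toward zero.

```python
def simulate_sir(beta=0.5, gamma=0.1, i0=0.001, days=400, dt=0.1):
    """Integrate the basic SIR model with forward Euler steps.

    s, i, r are population fractions. beta is the contact ("adoption")
    rate and gamma the recovery ("abandonment") rate. All defaults are
    illustrative assumptions.
    """
    s, i, r = 1.0 - i0, i0, 0.0
    history = [(0.0, s, i, r)]
    for step in range(1, round(days / dt) + 1):
        new_infections = beta * s * i * dt  # flow S -> I
        new_recoveries = gamma * i * dt     # flow I -> R
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((step * dt, s, i, r))
    return history


history = simulate_sir()
# Locate the peak of the I curve, then look at where I ends up.
peak_time, _, peak_i, _ = max(history, key=lambda row: row[2])
final_i = history[-1][2]
print(f"peak I  = {peak_i:.3f} at t = {peak_time:.1f}")
print(f"final I = {final_i:.2e}")
```

Changing beta and gamma moves the peak around, but there is no flow back into S or I, so the long-run value of I is always essentially zero.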
There are extremely straightforward extensions to the SIR model that make this not true — either you can have people in the recovered (R) class come back to susceptible (S) (essentially, this would be people who left Facebook changing from "I'm never going back" to "I might go back someday"), or you can have new people come into the population (this would be little Timmy and Claire getting their first computers).
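As a rough sketch of those two extensions, the Python code below adds a return rate xi (recovered people becoming susceptible again) and a turnover rate mu (newcomers entering S while people leave every class at the same rate). The function names, parameter names and values are assumptions chosen for illustration and are not taken from the paper.

```python
def simulate(beta, gamma, xi=0.0, mu=0.0, i0=0.001, days=20000, dt=0.1):
    """Forward-Euler SIR with two optional extensions.

    xi: rate at which recovered people become susceptible again (R -> S).
    mu: turnover rate; newcomers enter S and people leave every class.
    Returns the final (s, i, r) fractions.
    """
    s, i, r = 1.0 - i0, i0, 0.0
    for _ in range(round(days / dt)):
        infections = beta * s * i
        recoveries = gamma * i
        returns = xi * r
        ds = mu - infections + returns - mu * s
        di = infections - recoveries - mu * i
        dr = recoveries - returns - mu * r
        s, i, r = s + ds * dt, i + di * dt, r + dr * dt
    return s, i, r


def endemic_infected(beta, gamma, xi=0.0, mu=0.0):
    """Long-run I fraction of the extended model (0 with no extension)."""
    if xi + mu == 0.0:
        return 0.0
    s_star = (gamma + mu) / beta  # from dI/dt = 0
    return max(0.0, (1.0 - s_star) * (xi + mu) / (xi + mu + gamma))


BETA, GAMMA = 0.5, 0.1  # illustrative values only
basic = simulate(BETA, GAMMA)
sirs = simulate(BETA, GAMMA, xi=0.01)
births = simulate(BETA, GAMMA, mu=0.001)
for name, result in [("basic SIR", basic), ("R -> S", sirs), ("newcomers", births)]:
    print(f"{name:>10}: long-run I = {result[1]:.4f}")
```

With either extension switched on, I settles at a positive level instead of zero, which is why the choice between these models, rather than the fit itself, decides whether "Facebook dies" comes out of the analysis.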
Unfortunately, the authors didn't fit those models. This is, incidentally, a widespread problem in mathematical modeling. A statistical model is an attempt to describe the patterns of variables and their interactions within the data. A mathematical model is an assertion about reality. You can get a SIR model to fit lots of things, but your choice of a SIR model is also an assertion about the system. Namely, that once it peaks, it's heading to zero.
Incidentally, Internet companies do use user-retention models that look a heck of a lot like epidemic models, but they're also considerably more complex.
3 
Yes, I missed other models too. I am not familiar with epidemiological models, but I am aware of the S-curve models used in marketing. There was one review article (Meade, Islam, Technological Forecasting - Model Selection, Model Stability and Combining Models, Management Science, 1998, Vol 44, No 8.) which listed about 30 different models. Most of these models use similar reasoning; instead of susceptible, infectious and recovered, they use terms such as early adopter and imitator (or similar). The model is then the solution to some differential equation. –  mpiktas Jan 24 at 18:59
1 
You hardly need to justify talking about the statistical model here on Cross Validated (CV)...Are you suggesting that not talking about the model is a flaw of CV itself? Either way, clarification would help if you want to promote awareness, or to critique constructively at all in that regard, really. Alternately, if it's a tangent not worth clarifying, how is it worth mentioning at all? As for the (inadvertent?) suggestion that Facebook users are zombies...I have no objections. (Even though I am one! :) –  Nick Stauner 2 days ago
4 
zombies are awesome! ... until they bite you :P –  Joe DF 2 days ago
8 
(+1) This was my primary gripe with their article. They assumed a model that necessarily predicts a crash and then validated the model by cherry-picking a single site that exhibited the behavior they were predicting (MySpace). The meaningful reps for this sort of model is the number of comparable sites, and they tested it on one. –  guy 2 days ago 
7 
@NickStauner No, it was merely an observation that most of the critiques here (and indeed, in the rest of the internets) were on the data itself. Which makes sense because the data itself is something most users here could easily criticize, while the actual details of the model aren't something I'd expect "Average statistician/Machine learning expert" to necessarily have encountered. –  Fomite 2 days ago
My primary concern with this paper is that it focuses primarily on Google search results. It is a well-established fact that smartphone use is on the rise (Pew Internet, Brandwatch), and traditional computer sales are declining (possibly just due to old computers still functioning) (Slate, ExtremeTech), as more people use smartphones to access the internet. Considering there is a native Facebook app for (at least) iOS, Android, Blackberry, and Windows Phone, it's no surprise that the number of Google queries for "facebook" has fallen significantly. If users no longer need to open a browser and mistype "facebook.com" in the URL bar, then that would definitely negatively impact the number of searches. In fact, the number of FB users who use the app has gone up significantly (TechCrunch, Forbes).
I think this study is just some "huh, interesting correlation" that got taken too far by alarmist media outlets; "Did you know the world is changing? How unexpected!"
16 
Welcome to the site –  steffen Jan 23 at 21:05 
1 
Thank you for the edits. You cleaned it up good. :) –  Adrian Jan 23 at 21:57
3 
Very well put, like you said smart phone use is on the rise and facebook gets tremendous amounts of monthly visits from cellphones/smartphones. Just because people are not searching for it does not mean it will cause a decrease in facebook usage, the way people are using facebook is changing/changed. They are no longer searching for it, they are just clicking the icon on their phone and going to it. –  MCP_infiltrator Jan 24 at 13:27
1 
I was just about to answer the same about smartphone and Google searches –  syed mohsin Jan 25 at 0:41
 
"Considering there is a native Facebook app for (at least) iOS, Android, Blackberry, and Windows Phone, it's no surprise that the number of Google queries for "facebook" has fallen significantly" ... search is on smart phones as well, and there's this thing called a bookmark that is implemented on every browser. – Jeffrey Blattman yesterday
Well, this paper establishes the fact that the number of Google searches on Facebook fits a certain curve nicely. So at best it can predict that the searches on Facebook will decline by 80%. Which might be feasible, because Facebook might become so ubiquitous that nobody would need to search about it.
The problem with this type of model is that it assumes that no other factors can influence the dynamics of the observed variable. This assumption is hard to justify when dealing with data related to people. For example, this model assumes that Facebook cannot do anything to counter the loss of its users, which is a very questionable assumption to make.
3 
And mpiktas's first paragraph touches on a good point as well - the authors are using Google search queries as a proxy for the number of Facebook accounts. Why not go straight to the account data? It's not difficult to find: news.yahoo.com/number-active-users-facebook-over-230449748.html –  RobertF Jan 23 at 18:55
 
Although to be fair, graphing the data from the above article does show the number of active users was close to peaking in 2013. –  RobertF Jan 23 at 18:56 
3 
Factors other than patient to patient infection dynamics can influence disease spread (such as public health programs). That doesn't stop the underlying model being useful. I don't think the exact date of Facebook's demise (which no doubt can be influenced) is as interesting as the idea/model that social networks spread like diseases. –  david25272 Jan 24 at 0:38
2 
@david25272 These kinds of models are certainly useful; there is a whole literature in marketing concerning S-curves which uses similar approaches. For example, I suspect that the Bass model and its counterparts might fit the same data pretty well too. –  mpiktas Jan 24 at 5:14
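For readers who have not met the Bass model mentioned in this comment, here is a minimal Python sketch of its closed-form adoption curve. The coefficients p (innovation) and q (imitation) are illustrative assumptions, not values fitted to any Facebook or Google data, and the function name is hypothetical.

```python
import math


def bass_adoption(t, p=0.03, q=0.38):
    """Cumulative adopter fraction F(t) of the Bass diffusion model.

    Closed-form solution of dF/dt = (p + q*F) * (1 - F) with F(0) = 0,
    where p is the innovation coefficient and q the imitation coefficient.
    The default values are illustrative assumptions.
    """
    decay = math.exp(-(p + q) * t)
    return (1.0 - decay) / (1.0 + (q / p) * decay)


years = range(0, 31, 5)
curve = [bass_adoption(t) for t in years]
for t, f in zip(years, curve):
    print(f"t = {t:2d}: adopted fraction = {f:.3f}")
```

Note the contrast with the basic SIR model: the cumulative Bass curve is an S-curve that saturates and never declines, so fitting it to the rising part of the same data would imply a plateau rather than a collapse.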
In my opinion, Google Trends cannot produce a good data set for this kind of study. Google Trends shows how often a term is searched on Google, so there are at least two reasons to doubt the prediction:
  • We don't know whether a user searches Google for "Facebook" in order to log in or to find information about Facebook
Facebook is not only a site but a phenomenon, with many articles, books, and a film about it; Facebook Inc. also began selling stock to the public and trading on the NASDAQ on May 18, 2012. Google Trends shows both: searches for the site and searches for the "phenomenon". New things always have a great impact on the public. TV had a great impact too; now no one writes articles about it, but it is still one of the most used appliances.
  • Most users don't search for "facebook" on Google to log in
With mobile applications and bookmarks, a user with a decent knowledge of the internet searches for "facebook" on Google only the first time, and then usually saves the page as a bookmark or downloads the application. The graph below is the Google Trends curve for Wikipedia; taken at face value, it suggests that we will not use Wikipedia in the future. Obviously this is not true: we simply don't reach Wikipedia by typing "wikipedia". We search for something and then open the Wikipedia page, or we use a bookmark.
[Figure: Google Trends graph for "wikipedia"]
2 
Don't forget autocomplete on browser history in the address bar. I type the letter "f" into Chrome or Firefox and it autocompletes to facebook.com as the first suggestion. This feature has been active for several years now. –  paul yesterday
1 
Most users don't search "facebook" on Google to login ... I bet a bounty of 50 that this is indeed the purpose of the majority of those searches. –  Evgeni Sergeev 21 hours ago
 
@EvgeniSergeev I'd bet with you too! Your hypothesis does not contradict my statement. I think logging in is the reason for those searches, but it is not the most common way to access Facebook (and this is what matters for the study). One simple fact is that last year Facebook's mobile users surpassed its desktop users. – G M 15 hours ago
A few basic issues stand out with this paper:
  • It assumes correlation of search engine queries about a rising social network with the membership increases. This may have correlated in the past, but may not in the future.
  • There are very few new large social networks. You can almost count them on one hand. Friendster, Myspace, Facebook, Google+. Also, Stack Exchange, Tumblr, and Twitter function similarly to social networks. Is anyone predicting Twitter is over? Quite to the contrary, it seems to have major momentum. There is not much mention or study of other ones to see if they fit. In a way we are talking about, does a trend exist among 5-7 data points? (The number of social networks.) It's just too little data to make any conclusion about the future.
  • Facebook displaced Myspace. That was the chief dynamic. It doesn't consider the idea that one infection is displacing another, it tends to consider them separately. What is displacing Facebook? Google+? Twitter? The interaction and "defection" of customers from one "brand" or "product" to the other is the critical phenomenon in this area.
  • Social networks coexist. One can be a member of multiple sites. It is true that members may tend to prefer one over the other.
  • It would seem a much better model is that there is a consolidation going on, like in economics, such as with automobiles, radio makers, web sites, etc. As in any new disruptive technology, there are many competitors in the beginning, and then, later, the field narrows, they tend to consolidate, there are buyouts and mergers, and some die out in the competition. We already see examples of this, e.g. Yahoo buying out Tumblr recently.
  • A similar concept might be with television networks consolidating and being owned by large conglomerates, e.g. major media companies owning many media assets. Indeed, Myspace was bought out by News Corporation.
  • The way to go is to look for more analogies between economics and infections (biology). Companies acquiring customers from competitors and the uptake of products do indeed have many epidemiological parallels. There are strong parallels to evolutionary "red queen" races [see the book, Red Queen by Ridley]. There might be connections to a field called bionomics.
  • Another basic model is products that compete with each other and have various "barriers to entry" for customers to switch from one brand to another. It is true the cost of switching is very low in cyberspace. It's similar to brands of beers competing for customers, etc.
  • In an asymptotic model, it is much more likely that a network increases its members toward some asymptotic maximum and then it tends to plateau. Early in the plateau, it will not be apparent that it is a plateau.
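One standard way to write down the plateau idea in the last bullet is logistic growth. This is offered as an illustrative assumption, not as a model from the paper: N(t) is the number of members, r the growth rate, K the asymptotic maximum, and N_0 the initial membership.

```latex
\frac{dN}{dt} = r N \left(1 - \frac{N}{K}\right),
\qquad
N(t) = \frac{K}{1 + \frac{K - N_0}{N_0}\, e^{-rt}},
\qquad
\lim_{t \to \infty} N(t) = K
```

Unlike the I class of a basic SIR model, N never declines here; growth simply slows as N approaches K, which is why the first stretch of a plateau is hard to tell apart from a temporary slowdown.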
That all said, I think it has some very valid and engaging ideas and is likely to spur much further research. It's groundbreaking, pioneering, and it just needs to be adjusted a bit in its claims. I am delighted in this use of Stack Exchange and collaborative wisdom/collective intelligence analyzing this paper. (Now if only reporters researching the subject would read this whole page carefully before preparing their simplistic sound bites.)
2 
btw re terminology. "barriers to entry" is used to refer to companies wanting to release new products and compete in a new area, similar concept applies "on other side of transaction" to customers switching products but maybe there is a different term there? anyway the authors need to tie in their ideas with marketing which is indeed using more "viral" models. also a key concept in this area [should have mentioned this above] is market share. –  vzn 2 days ago 
1 
ps maybe a much more relevant question which is supported by other recent research in this area: is facebook growth coming to an end. usage is down in teenager demographics for example (which is quite notable because its initial rise was due to teenagers). several recent studies/experts confirm this. therefore, looking at demographic group shifts is also key to understanding social network usage trends. also, facebook is attempting to expand internationally after "saturating" in US & there the barriers are things like fewer internet networks, cellphones/computers, etc... –  vzn yesterday 
The question isn't "if" but "when".
I take umbrage with the use of the SIR model. It comes with assumptions.
One of the assumptions is that eventually everyone is "recovered". Infections are not perpetual, while technology adoption can be (consider the automobile for example).
If the business is doomed to eventually die, then when going through death throes the relationships between susceptible, infected, and recovered might be adequately modeled by a particular SIR model. This does not mean the model is descriptive of any of the seasons before end-of-life. It does not take into account other forces - the context. Facebook was part of the context of end of "Myspace" and so while an SIR was appropriate for Myspace-only use, it was not for Social-Network use because many users had accounts on both, and switched to FB-dominant usage.
I dug through the zombie-model, and even through some non-zombie SIR fits, and a time and population punctuated-windowed SIR is more appropriate there. It is not a universal model, and it has strengths and weaknesses. That means that the SIR is imperfect even for the systems that it was engineered to model. Such fundamental imperfection for its target suggests that without careful use, application outside the target area can be, ceteris paribus, more problematic than other models.
If we take a look at the map of social networks, there are some cases where the epidemic model applies.
The article could have used other examples (Friendster and Orkut are good examples of a massive decline in users) and could also have taken into account the fact that people normally migrate to another social network that offers better or new services.
Facebook innovated in the way people communicate. On Orkut, a user needed to visit another person's profile to see their updates. On Facebook, by contrast, the updates now appear on the user's own timeline. That's a major change.
This model and logic may have worked for MySpace, but is it valid for any social network?
IMHO, people don't leave social networks. They migrate, based on better service, functionality, or experience.
The question is: will there be a better social network? Maybe Google+.
2 
This answer does not appear to address the questions, which are (1) a statistical one about possibly confusing correlation with causation and (2) whether a predictive model can be expected to apply universally. If I am misunderstanding, perhaps it is because it is not at all apparent what the referent of "this" is in the first sentence. –  whuber Jan 24 at 16:25
2 
@whuber This answer says there is no correlation as long as people still need social networks. Unless there is a better alternative to Facebook (which the paper in the question does not take into account), then Facebook will be king. Statistically, the "Social network" need has only grown, and people have simply migrated from one social network to another. The use of social networks has only grown so far. – Tiberiu-Ionuț Stan Jan 24 at 21:32 
2 
@Tiberiu-Ionuț Stan Your comment might be correct but it consists only of unsupported remarks about social networks; it does not seem to contain any statistical reasoning nor to throw any additional light on the question. In particular, I still cannot see any specific reference in this particular answer to correlation or causation. Remember, we're not here to debate the future of Facebook or the quality of social networks, but rather we have been asked to evaluate the statistical arguments in the paper in question. –  whuber Jan 24 at 22:05
 
@whuber I'm trying to evaluate the statistical arguments of the paper by showing the reasons behind the results. The paper does not take into account other OSNs and emerging trends, only numbers. I'm just adding information. IMO this is the same as technical and fundamental analysis in the stock market (both are ok). I'm trying to explain the facts behind the change, not only numbers and graphs. –  edubriguenti Jan 24 at 22:21 
To answer your question
This model and logic may have worked for MySpace, but is it valid for any social network?
Probably not. Historical data can only predict future events if the 'environment' is similar. This paper assumes that the total of Google users and queries is a constant, which of course it is not. Now this article may say more about Google than about Facebook.
However, based on the rapid rise and fall of many other social networks like MySpace and others I think one can safely assume that there is a big chance Facebook will no longer be the dominant social network in 5 years.
 
Predictions don't depend entirely on environmental similarity (depending on what you mean by "environment" of course). Nonetheless, your answer seems internally inconsistent. It is not safe to assume Facebook's future will resemble other social networks' courses on that basis alone, much less within such a short time frame. – Nick Stauner 2 days ago
 
My prediction about Facebook's future is an opinion for which I use one argument. My opinion is clearly not based on statistics or models. The prediction in the paper discussed here is based on statistics and models with historical reference. I do not see why my answer is internally inconsistent. –  Nebu 2 days ago
 
Your opinion's one supporting argument sounds an awful lot like the logic behind the model that you criticize for that very same logic. If the (generational?) environment isn't similar enough now to when the model fit Myspace, why is it similar enough to base any opinion on Myspace's history? Furthermore, is Facebook really just another social network that will behave like every other? It's different enough for doubt in plenty of ways, as are the times, so again, I don't see how it's safe to assume its chance of a similar fate is big within such a short time frame. –  Nick Stauner 2 days ago 
