Being a Big Data Realist

We live in the second machine age, according to Erik Brynjolfsson, the director of the MIT Initiative on the Digital Economy. What the steam engine did for muscle power, computers are now doing for mental power.

Part of this is made possible by the vast amounts of data being generated – 2.5 quintillion bytes every day, according to various sources. Per IBM industry insights, 90% of the data that exists today was created in the last two years. This incredible growth is analogous to waking up in two years and discovering another ocean – an entire new world of potential value and opportunity.

Enter big data: the process through which this ocean is made valuable. Big data depends on mining massive amounts of information for trends and insights that lead to better decision making and greater efficiency. And while this new abundance is attracting $130 billion a year in eager investment, there are icebergs to avoid and rip currents to navigate. As optimists, we see the immediate potential, but we should not be blind to lurking dangers.

Not all data is equally valuable, and no data is valuable without analysis – the human way.

Big data is valuable, in part, because its sheer volume allows granular trends to be identified within a firehose of information. Volume in and of itself is a strong (if erroneous) indicator of credibility – how can you disagree with 11,232 tweets or 1,782 reviews? The evidence seems overwhelming.

One of the key challenges anyone faces when trying to create value from big data is verifying quality – not just overall, but with regard to the specific question you are trying to answer. Here are some important questions to ask of your data:

  • Is this data derived from a representative population, or does it suffer from selection bias? For example, people who take the time to review a restaurant or hotel tend to have strong feelings – either positive or negative – and can skew any results towards the extreme. Extremists of every stripe are much louder in digital form.
  • Is this data authentically human? The answer seems obvious, but unfortunately it is no longer certain. Twitter confirmed that over 50,000 Russian bots reached nearly 150 million users during the 2016 presidential election. While their impact should not be discounted, does this make Twitter an accurate reflection of public sentiment?
  • What are my assumptions in collecting this data to answer my question? Every use of big data relies on hypotheses. For instance, the city of New York is trying to understand patterns of homeless squatting through the use of cellphone location data. How many homeless people have cellphones?
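
To make the selection-bias question concrete, here is a minimal sketch in Python. Everything in it is a made-up assumption for illustration: the population counts, the `review_probability` function, and its `base` and `extreme_boost` parameters are hypothetical, not measurements of any real review site. It simply assumes that customers with stronger feelings are more likely to post a review, and then compares what the reviews show with what the full population actually thinks.

```python
def review_probability(rating, base=0.05, extreme_boost=0.15):
    """Hypothetical chance that a customer holding this 1-5 rating posts a review.

    Assumption: the further a rating is from the neutral 3, the more
    likely the customer is to bother writing anything at all.
    """
    return base + extreme_boost * abs(rating - 3)


def extreme_share(counts):
    """Fraction of all counted opinions that are a 1 or a 5."""
    return (counts[1] + counts[5]) / sum(counts.values())


# Hypothetical population: how many customers truly hold each rating.
population = {1: 50, 2: 150, 3: 400, 4: 300, 5: 100}

# Expected number of reviews actually posted at each rating.
reviews = {rating: count * review_probability(rating)
           for rating, count in population.items()}

true_share = extreme_share(population)
observed_share = extreme_share(reviews)

print(f"Extreme opinions among all customers: {true_share:.0%}")
print(f"Extreme opinions among posted reviews: {observed_share:.0%}")
```

Under these invented numbers, extreme opinions make up about 15% of customers but about 32% of posted reviews. The volume of reviews says nothing about this gap; only knowing who chose to speak does.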

There are many icebergs lurking in this data ocean, ready to capsize any research findings. If the data your insights rest on is subject to manipulation (human or otherwise) and faulty assumptions, then those insights will at best be worthless and at worst be dangerous, leading to false findings and wrong decisions. Never has it been so easy to be so wrong.

The quality and value of the data you have will always limit the types of questions you can answer, but quality data alone provides no value. Like electrons, data is only valuable when harnessed and used to power something. Data powers decision making, but it needs to be analyzed and turned into actionable insights.

The rewards for having the capacity to do so are real. A 400-company survey conducted by Bain & Company in 2013 found that only 4% of companies were strong at big data analytics: “these are the companies that are already using analytics insights to change the way they operate or to improve their products and services.” These same companies were twice as likely to be in the top 25% of financial performance in their industries, three times more likely to execute “decisions as intended,” and five times more likely to make decisions faster.

Building an analytical capability is crucial to leveraging the potential of big data. Without it, you can be pulled about by strong currents within the data, instead of using skepticism and empiricism to guide you. The more we as a society focus on leveraging the abundance of information around us, the greater the need for these traits.

At Handshake, we are pioneering new methods of leveraging data to drive actionable insights and finding ways to model intangibles like influence and reputation. With humility, we know we’ll only be as strong as our data and as insightful as our own expertise. We depend on both to provide value to our clients.

As Erik Brynjolfsson tells it, “Pablo Picasso once berated computers saying, well, they’re not very interesting. All they do is provide answers. And, you know, he had a point, that the really interesting and important part of work is asking the right questions.” Asking the right questions and using the right data to answer them is still an essential human task – one at which we’ll sometimes fail, and from which we’ll hopefully learn, as we continue exploring this new ocean.