Myths about “Big Data”

If I could borrow the neuralyzer for just one day, I would erase the phrase “Big Data” from the memories of all people.

Since you are reading this, I assume you’ve heard of Big Data and, if you are like most people, you have more questions than answers. People selling Big Data solutions make a lot of (false) promises and build a lot of hype around it, so this is an attempt to shed some light on the reality.

Myth #1: Big Data is an actual thing, a tool, a piece of software that you can install and do something with.

Myth buster #1: It is not a tool, a platform, an API or a database. Big Data is actually a kind of problem that specifically affects people in the data world (database admins, data warehouse admins, data governance people, IT infrastructure folks). When the data that their organisation generates (or consumes) is bigger (or coming in faster) than they can handle efficiently and cost-effectively, they have a “Big Data” problem on their hands. That’s all there is to it. Nothing more.

Myth #2: If you set up a Hadoop (Big Data) platform in your organisation, it will automatically spit out insights that will transform your business.

Myth buster #2: Hadoop and other so-called Big Data platforms are software tools that solve a specific class of problems, i.e., they make it easy for you to store and manipulate data on multiple smaller servers instead of just one massive server (like traditional database servers). What this gives you (as a person in the data world) is the ability to scale horizontally across multiple servers as your organisation grows. They don’t have any built-in capability to spit out business insights if you throw more data at them. That’s all there is to it. Nothing more.
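To make “scale horizontally” a bit more concrete, here is a minimal Python sketch of spreading records across several small servers by hashing a key. This is a generic illustration, not how Hadoop itself places data (HDFS splits files into blocks and replicates them across nodes). The names `pick_server` and `distribute` and the `customer-N` keys are made up for the example.

```python
import hashlib
from collections import defaultdict


def pick_server(key: str, num_servers: int) -> int:
    """Map a record key to a server index using a stable hash."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_servers


def distribute(keys, num_servers):
    """Group record keys by the (hypothetical) server that would store them."""
    placement = defaultdict(list)
    for key in keys:
        placement[pick_server(key, num_servers)].append(key)
    return dict(placement)


# Invented record keys, purely for illustration.
keys = [f"customer-{i}" for i in range(1000)]

three = distribute(keys, 3)  # the same data on 3 servers...
six = distribute(keys, 6)    # ...and on 6 servers

for label, placement in (("3 servers", three), ("6 servers", six)):
    busiest = max(len(v) for v in placement.values())
    print(f"{label}: busiest server holds {busiest} records")
```

Notice what the sketch does and does not do: adding servers reduces how much each one has to hold, but nothing in it produces a business insight. Real systems also have to deal with moving data when the number of servers changes, which this toy scheme ignores.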

Myth #3: Everyone has a Big Data problem and everyone at the CXO level should be worried about it.

Myth buster #3: As I said in myth buster #1, the Big Data situation is a problem in the data folks’ realm. It should never be a concern at the CXO level. Unfortunately, the hype starts at the top of the organisation and is forced down the hierarchy.

Myth #4: A Big Data platform is a cheap alternative to traditional data warehouse solutions.

Myth buster #4: In the slide decks of people who sell these platforms, you will most definitely see a slide that shows that Hadoop (being open source) is so much cheaper compared to proprietary platforms. But if you add up all the costs (infrastructure, cloud or on-premises hosting, support licenses, external consultants, up-skilling your existing team, development, etc.), the difference could be insignificant.

Myth #5: A Big Data platform (aka, Data Lake) is where you dump all your data and worry about it later.

Myth buster #5: Only insane people will even think of doing this. Why would anyone spend thousands of dollars creating a brand new data repository, only to aimlessly dump today’s data in the hope that it might be useful tomorrow? Data is the most precious commodity that an organisation has, and it has to be treated with utmost care and upfront planning. It is very common to hear statements like “just dump everything you have in a data lake, because it’s cheap, and you can worry about how to use the data later”. I implore you not to fall for this theory. Always, always start with a plan for what you want to do with your data.

Myth #6: Big Data is a new problem.

Myth buster #6: The problem of insufficient storage and processing capacity for the amount of data you have (or want to have in the future) is not a new problem. Available capacity has been playing catch-up with organisations’ data needs ever since computers were invented. Technologies like Hadoop make it possible to scale horizontally, assuming you are willing to spend time, effort and sweat.

Myth #7: A Big Data platform will make Data Analysts’ lives easy.

Myth buster #7: This is probably the worst myth, and one I hear all the time. There are two distinct kinds of people in an organisation…

1. Folks who are responsible for collecting, storing and managing data. (These are the people I talk about in myth buster #1 above.)

2. The ones that use the data and do something with it. I’ll put these people in the broad bucket called Data Analysts (aka, Data Scientists). These are the ones that take data from one or more sources, do some data wrangling, munging and massaging in order to dig out answers to business questions about “how to save money” or “how to make more money”.

The share of problems Data Analysts solve that need huge amounts of data is so minuscule that they hardly ever think in terms of big data. The (statistical and machine learning) algorithms that they use often don’t need huge data.
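As a rough illustration of why many analyses get by without huge data, the Python sketch below compares an average computed over a full dataset with the same average computed over a small random sample. The data is synthetic and invented for the example (uniformly random “order values”), and the sizes are arbitrary, so treat it as a sketch under those assumptions rather than proof for any particular business problem.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is repeatable

# Synthetic "order values", invented purely for illustration.
population = [rng.uniform(0, 100) for _ in range(200_000)]

# A simple random sample of 1,000 rows (0.5% of the data).
sample = rng.sample(population, 1_000)

full_mean = statistics.fmean(population)
sample_mean = statistics.fmean(sample)

print(f"mean over all {len(population):,} rows: {full_mean:.2f}")
print(f"mean over a {len(sample):,}-row sample: {sample_mean:.2f}")
```

For a summary statistic like this, the sample estimate typically lands close to the full-data figure. That will not hold for every question (rare events, for instance, need more data), but it is the kind of reasoning that lets analysts work comfortably on a laptop.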

If you understand these myths, you might then ask, “There seems to be so much buzz around this, so there has to be something here that we can use. What is it?”

Yes, there is something!!

For an organisation, data is often the most under-utilised asset. If you really want to see how data can be useful in your organisation, you have to embrace a different paradigm: becoming a “Data Driven Organisation”. Big data is just a small challenge once you start thinking about how data can be the new oil that runs your organisation.

I’ll write in a future post about my thoughts on how to become a Data Driven Organisation.

Hope this is useful!


Image credit: http://load-tv.com/12-awesome-movie-technologies-we-want-science-to-actually-create
