Whether it’s the annual ‘hype cycle’ or fog in the users’ minds, Big Data has bumped off cloud computing to top the charts. Werner Vogels, chief technology officer of Amazon and one of the Big Data generators and solution providers, was in Bangalore recently and spoke to Forbes India.
Q. A lot of businesses use analytics and some of them confuse it with Big Data. What is the difference, according to you?
A: Big Data is different [from analytics] because in traditional business intelligence, you knew beforehand what kind of questions you wanted to ask and what kind of data you needed to answer them, and then you collected that data. In the world of Big Data, this is turned on its head. We don’t really know what exactly we need to ask. You just collect as much information as possible; you may combine that with other data streams. You don’t really have a data model.
Why don’t I start looking at customers in different segments, geographies; or customers that only spend time researching books versus customers who are looking at DVDs? The data model is created on the fly. We ask, for example, how many steps a customer takes before he buys something, or, if the customer comes in and types in ‘Sony’, whether that means he knows what he has come to buy or is still in the research stage. This kind of understanding of the customer may help you understand that if the customer is doing these two things, he is going to buy. So you make the process easy and seamless for him. If he takes these other two actions, he is on a research path: he is comparing products, so you make [the comparison] easier for him. What we have today and what we didn’t have in the past is the location information. Whether a customer is at home, on the road talking about books with his friends, or at a store looking at DVDs, his actions are different depending on where he is. So, understanding the customers’ behaviour is a big driver of Big Data.
Q. That is from a business point of view. What needs to be done on the technology front?
A: In reality, Big Data is not just analytics. It’s actually a whole pipeline of things. How do we collect the data, how do we get the data into this pipeline, where does this data come from: from our website, from the brick-and-mortar stores, from the devices? How do we get it to the place where we want to operate on it and how do we store it? We are talking about large amounts of information here. Does it go into a relational database, does it go into a key-value store? All these different options, as a technologist, involve issues we have to think about. Then we get to the next step, how to organise the data: there are multiple data streams that come from different places, different devices, different applications. And the last piece of the pipeline is: How do we share this data? This whole pipeline requires a lot of innovation.
Q. Which segment of this pipeline has the most challenges?
A: I think it’s the analytics; we are still looking for more tools. The more technology-savvy companies have figured it out themselves but that cannot be said of a majority of enterprises. We see a whole new set of tools coming up this year and next. We will see tremendous innovation happening there.
We will see more platforms targeted at a particular segment. A good one in our case is a platform for life sciences companies to do Big Data. All these life sciences companies use more or less the same style of processing. A company like Unilever makes use of platforms to develop toothpastes and deodorants. You would think deodorants are just research into what smells nice but it turns out that’s not the case. There is a lot of genomics research going on to understand the microbes and how they interact with particular genes and how they trigger sweat production. Things like these are actually Big Data genomics problems.
Again, in gum decay, it turns out that genetics play a role in the interaction between the toothpaste and microbes. Making toothpaste is a genomics problem these days. In the past, companies like Unilever used to take two to three years; they could leisurely develop new products. But the new economy is so competitive that two to three years is no longer a possibility; so they are looking to speed up every possible part of product development.
Q. How are automotive companies, especially in India, looking at Big Data?
A: Let’s take Tata Motors. They are instrumenting all their trucks with GPS, with sensors, with telecom equipment to stream the data packs to the Amazon cloud for analytics, to understand not only where the trucks are but also the health of the trucks. Should we be doing preventive maintenance? If we can avoid truck breakdowns, the cost the company saves is tremendous. And so here we have thousands of trucks and that’s a Big Data problem.
Q. Do businesses understand what they can do with Big Data?