Wednesday, August 05, 2015

How Localytics Uses Big Data to Improve Mobile App Development and Marketing

Transcript of a BriefingsDirect discussion on how big data helps an analytics company improve data-driven marketing on a variety of platforms.

Listen to the podcast. Find it on iTunes. Get the mobile app for iOS or Android. Download the transcript. Sponsor: HP.

Dana Gardner: Hello, and welcome to the next edition of the HP Discover Podcast Series. I'm Dana Gardner, Principal Analyst at Interarbor Solutions, your host and moderator for this ongoing discussion on IT innovation and how it’s making an impact on people’s lives.

Our next big data case study interview highlights how Localytics uses data and associated analytics to help providers of mobile applications improve their applications -- and also allow them to better understand the uses for their apps and dynamic customer demands.
To learn more about how big data helps mobile application developers better their products and services, please join me in welcoming our guest, Andrew Rollins, Founder and Chief Software Architect at Localytics, based in Boston. Welcome, Andrew.

Andrew Rollins: Thank you for having me.

Gardner: Tell us about your organization. You founded it to do what?

Rollins: We founded the company in 2008, two other guys and I. We set out initially to make mobile apps. If you remember back in 2008, this is when the iPhone App Store launched. So there was a lot of excitement around mobile apps at that time.

We initially started looking at different concepts for apps, but then, over a period of a couple months, discovered that there really weren't a whole lot of services out there for mobile apps. It was basically a very bare ecosystem, kind of like the Wild, Wild West.

We ended up focusing on whether there was a services play in this industry and we settled on analytics, which we then called Localytics. The analogy we like to use is, at the time it was a little bit of a gold rush, and we wanted to sell the pickaxes. So that’s what we did.

Gardner: That makes a great deal of sense, and it has certainly turned into a gold rush. For those folks who do the mining, creating applications, what is it that they need to know?

Analytics and marketing

Rollins: That’s a good question. Here's a little back story on what we do. We do analytics, but we also do marketing. We're a full-service solution, where you can measure how your application is performing out in the wild. You can see what your users are doing. You can do anything from funnel analysis to engagement analysis, things like that.

From there, we also transition into the marketing side of things, where you can manage your push notifications and your in-app messaging.

For people who are making mobile apps, often they want to look at key metrics and then how to drive those metrics. That means a lot of A/B testing, funnel analysis, and engagement analysis.

It means not only analyzing these things, but making meaningful interactions, reaching out to customers via push notifications, getting them back in the app when they are not using the app, identifying points of drop-off, and messaging them at the right time to get them back in.

An example would be an e-commerce app. You've abandoned the shopping cart. Let’s get you back in the application via some sort of messaging. Doing all of that, measuring the return on investment (ROI) on that, measuring your acquisition channels, measuring what your users are doing, and creating that feedback loop is what we advocate mobile app developers do.
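To make the funnel and abandoned-cart ideas above concrete, here is a minimal Python sketch. It is not Localytics code: the event names, the four-step funnel, and the sample data are all assumptions chosen for illustration, and a production pipeline would also need to consider event order and time windows.

```python
# Hypothetical funnel for an e-commerce app; step names are illustrative only.
FUNNEL = ["view_product", "add_to_cart", "start_checkout", "purchase"]


def funnel_counts(events):
    """Count users who reached each funnel step.

    `events` is an iterable of (user_id, event_name) pairs. A user counts
    toward a step only if they also logged every earlier step.
    """
    seen = {}
    for user_id, name in events:
        seen.setdefault(user_id, set()).add(name)
    counts = []
    for i in range(len(FUNNEL)):
        required = set(FUNNEL[: i + 1])
        counts.append(sum(1 for names in seen.values() if required <= names))
    return counts


def drop_off_rates(counts):
    """Fraction of users lost between each pair of consecutive steps."""
    rates = []
    for prev, cur in zip(counts, counts[1:]):
        rates.append(0.0 if prev == 0 else (prev - cur) / prev)
    return rates


def abandoned_cart_users(events):
    """Users who added to cart but never purchased: candidates for a message."""
    carted, purchased = set(), set()
    for user_id, name in events:
        if name == "add_to_cart":
            carted.add(user_id)
        elif name == "purchase":
            purchased.add(user_id)
    return carted - purchased


# Made-up sample data: user 1 abandons the cart, user 2 buys, user 3 only browses.
SAMPLE_EVENTS = [
    (1, "view_product"), (1, "add_to_cart"),
    (2, "view_product"), (2, "add_to_cart"),
    (2, "start_checkout"), (2, "purchase"),
    (3, "view_product"),
]
```

The step with the largest drop-off rate is where a publisher would look first, and the abandoned-cart set is the audience for the kind of re-engagement message Rollins mentions.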

Gardner: You're able to do data-driven marketing in a way that may not have been very accessible before, because everything that’s done with the app is digital and measurable. There are logs, servers -- and so somewhere there's going to be a trail. It’s not so much marketing as it is science. We've always thought of marketing as perhaps an art and less of a science. How do you see this changing the very nature of marketing?

Rollins: Everything ultimately that you are doing really does need to be data-driven. It's very hard to work off of just intuition alone. So that's the art and science. You come out with your initial hypothesis, and that’s a little bit more on the craft or art side, where you're using your intuition to guide you on where to start.

From there, you have to use the data to iterate. I'm going to try this, this, and this, and then see which works out. That would be like a typical multivariate kind of testing.

Determine what works out of all these concepts that you're trying, and then you iterate on that. That's where measuring anything you do, any kind of interaction you have with your user, and then using that as feedback to then inform the next interaction is what you have to be doing.

Gardner: And this is also a bit revolutionary when it comes to software development. It wasn't that long ago that the waterfall approach to development might leave years between iterations. Now, we're thinking about constantly updating, iterating, getting a feedback loop, and condensing the latency of that feedback loop so that we really can react as close to real-time as possible.

What is it about mobile apps that's allowed for a whole different approach to this notion of connectedness and feedback loops to an app audience?

Mobile apps are different

Rollins: This brings up a good point. A lot of people ask why we have a mobile app analytics company. Why did we do that? Why is typical web analytics not good enough? It kind of speaks to something that you're talking about. Mobile apps are a little bit different than the regular web, in the sense that you do have a cycle that you can push apps out on.

You release to, let’s say, the iPhone App Store. It might take a couple of weeks before your app goes out there. So you have to be really careful about what you're publishing, because your turnaround time is not that of the web.

However, there are certain interactions you can have, like on the messaging side, where you have an ability to instantly go back and forth. Mobile apps are a different kind of market. It requires a little different understanding than the traditional approach.

... We consume the data in a real-time pipeline. We're not doing background batch processing that you might see in something like Hadoop. We're doing a lot of real-time pipeline stuff, such that you can see results within a minute or two of it being uploaded from a device. That's largely where HP Vertica comes in, and why we ended up using Vertica, because of its real-time nature. It’s about the scale.
Gardner: If I understand correctly, you have access to the data from all these devices, you are crunching that, and you're offering reports and services back to your customers. Do they look to you as also a platform provider or just a data-service provider? How do the actual hosting and support services for these marketing capabilities come about?

Rollins: We tend to cater more toward the high end. A lot of our customers are large app publishers that have an ongoing application, let’s say a shopping application or news application.

In that sense, when we bring people on board, oftentimes they tend to be larger companies that aren’t necessarily technically savvy yet about mobile, because it's still new for some people. We do offer a lot of onboarding services to make sure they integrate their application correctly, measure it correctly, and are looking at the right metrics for their industry, as compared to other apps in that industry.

Then, we keep that relationship open as they go along and as they see data. We iterate on that with them. Because of the newness of the industry it does require education.

Gardner: And where is HP Vertica running for you? Do you run it on your own data center? Are you using cloud? Is there a hybrid? Do you have some other model?

Running in the cloud

Rollins: We run it in the cloud. We are running on Amazon Web Services (AWS). We've thought a lot about whether we should run it in a separate data center, so that we can dictate the hardware, but presently we are running it in AWS.

Gardner: Let’s talk about what you can do when you do this correctly. Because you have a capacity to handle scale, you've developed speed, and you understand the requirements in the market, what are your customers getting from the ability to do all this?

Rollins: It really depends on the customer. Something like an e-commerce app is going to look heavily at things like where users are dropping off and what's preventing them from making that purchase.

Another application, like news, which I mentioned, will look at something different, usually something more along the lines of engagement. How long are they reading an article for? That matters to them, so that they can give those numbers to advertisers.

So the answer to that largely depends on who you are and what your app is.

Gardner: I suppose another benefit of developing these insights, as specific and germane as they might be to each client, is the ability to draw different types of data in. Clearly, there's the data from the App Store and from the app itself, but if we could join that data with some other external datasets, we might be able to determine something more about why users drop off, or why they are spending more money or time doing certain things.

So is there an opportunity, and do you have any examples of where you've been able to go after more datasets and then be able to scale to that?

Rollins: This is something that's come up a lot recently. In the past year, we have our own products that we're launching in this space, but the idea of integrating different data types is really big right now.

You have all these different silos -- mobile, web, and even your internal server infrastructure. If you're a retail company that has a mobile app, you might even have physical stores. So you're trying to get all this data in some collective view of your customer.

You want to know that Sally came to your store and purchased a particular kind of item. Then, you want to be able to know that in your mobile app. Maybe you have a loyalty card that you can tie across the media and then use that to engage with her meaningfully about stuff that might interest her in the mobile app as well.

"We noticed that you bought this a month ago. Maybe you need another one. Here is a coupon for it."

Other datasets

That's a big thing, and we're looking at a lot of different ways of doing that by bringing in other datasets that might not be from just a mobile app itself.

We're not even focused on mobile apps any more. We're really just an app analytics company, and that means the web and desktop. We ship in Windows, for example. We deal with a lot of Microsoft applications. Tying together all of that stuff is kind of the future.

Gardner: For those organizations that are embarking on more of a data-driven business model, that are looking for analytics and platforms and requirements, is there anything that you could offer in hindsight, having traveled this path and worked with HP Vertica? What should they keep in mind when they're looking to move into a capability, whether it's on-prem or cloud? What advice could you offer them?

Rollins: The journey that we went through was with various platforms. At the end of the day, be aware of what the vendor of the big-data platform is pitching, versus the reality of it.

A lot of times, prototyping is very easy, but actually going to large scale is fairly difficult. At scale, you have to know what each technology is good at, and how you bring together multiple technologies to accomplish what you want.

That means a lot of prototyping, a lot of stress testing and benchmarking. You really don’t know until you try it with a lot of these things. There are a lot of promises, but the reality might be different.

Gardner: Any thoughts about Vertica’s track record, given your length of experience?

Rollins: They're really good. I'm both impressed with the speed of it as compared to other things we have looked at, as well as the features that they release. Vertica 7 has a bunch of great stuff in it. Vertica 6, when it came out, had a bunch of great stuff in it. I'm pretty happy with it.
Gardner: I'm afraid we will have to leave it there. We've been learning about how Localytics uses big data to improve data-driven marketing for a variety of mobile application creators and distributors.

I'd like to thank our guest, Andrew Rollins, Founder and Chief Software Architect at Localytics, based in Boston. Thank you, Andrew.

Rollins: Thank you very much for having me.

Gardner: And thanks to you, our audience, for joining as well. I'm Dana Gardner, Principal Analyst at Interarbor Solutions, your host for this ongoing series of HP-sponsored discussions. Thanks again for joining, and do come back next time.


Transcript of a BriefingsDirect discussion on how big data helps an analytics company improve data-driven marketing on a variety of platforms. Copyright Interarbor Solutions, LLC, 2005-2015. All rights reserved.

Tuesday, May 26, 2015

Big Data Helps Conservation International Proactively Respond to Species Threats in Tropical Forests

Transcript of a BriefingsDirect discussion on how a conservation group, partnering with HP, brings real-time environmental data into the hands of environmental policy makers.

Listen to the podcast. Find it on iTunes. Get the mobile app for iOS or Android. Download the transcript. Sponsor: HP.

Dana Gardner: Hello, and welcome to the next edition of the HP Discover Podcast Series. I'm Dana Gardner, Principal Analyst at Interarbor Solutions, your host and moderator for this ongoing sponsored discussion on IT innovation and how it’s making an impact on people's lives.

Once again, we're focusing on how companies are adapting to the new style of IT to improve IT performance, gain new insights and deliver better user experiences, as well as better overall business results.

Our next innovation case study interview highlights how Conservation International (CI) in Arlington, Virginia uses new technology to pursue more data about what's going on in tropical forests and other ecosystems around the world.

As a non-profit, they have a goal of a sustainable planet, but we're going to learn how they've learned to measure what was once unmeasurable -- and then to share that data to promote change and improvement.
To learn how, we're joined by Eric Fegraus, Director of Information Systems at Conservation International. Welcome, Eric.

Eric Fegraus: Thank you, Dana. It’s a pleasure to be here.

Gardner: First, tell us the relationship with technology. Conservation International recently announced HP Earth Insights. What is that all about?

Fegraus: HP Earth Insights is a partnership between Conservation International and HP and it's really about using technology to accelerate the work and impact of some of the programs within Conservation International. What we've been able to do is bring the analytics and a data-driven approach to build indices of wildlife communities in tropical forests and to be able to monitor them in near-real-time.

Gardner: I'm intrigued by this concept of being able to measure what was once unmeasurable. What do you mean by that?

Fegraus: This is really a telling line. We really don’t know what’s happening in tropical forests. We know some general things. We can use satellite imagery and see how forests are increasing or decreasing from year to year and from time period to time period. But we really don't know the finer scale measurements. We don't know what's happening within the forest or what animal species are increasing or are decreasing.

There's some technology that we have out in the field that we call camera traps, which take images or photos of the animals as they pass by. There are also some temperature sensors in them. Through that technology and some of the data analytics, we're able to actually evaluate and monitor those species over time.

Inference points

Gardner: One of the interesting concepts that we've seen is that for a certain quantity of data, let's say 10,000 data points, you can get orders of magnitude more inference points. How does that work for you, Eric? Even though you're getting a lot of data, how does that translate into even larger insights?

Fegraus: We have some of the largest datasets in our field in terms of camera trapping data and wildlife communities. But within that, you also have to have a modeling approach to be able to utilize that data, use some of the best statistics, transform that into meaningful data products, and then have the IT infrastructure to be able to handle it and store it. Then, you need the data visualization tools to have those insights pop out at you.

Gardner: So, not only are you involved with HP in terms of the Earth Insights Project, but you're a consumer of HP technology. Tell us a little bit about Vertica and HP Haven, if that also is something you are involved with?

Fegraus: Yes. All of our servers are HP ProLiant servers. We've created an analytical space within our environment using the HP ProLiant servers, as well as HP Vertica. That's really the backbone of our analytical environment. We're also using R and we're now exploring with Distributed R within the Vertica context.

We’re using the HP Cloud for data storage and back up and we’re working on making the cloud a centerpiece for data exchange and analysis for wildlife monitoring. In terms of Haven, we're exploring other parts of Haven, in particular HP Autonomy, and a few other concepts, to help with unstructured data types.

Gardner: Eric, let’s talk a little bit about what you get when you do good data analytics and how it changes the game in a lot of industries, not just conservation. I'm thinking about being able to project into people’s understanding of change.

So for someone to absorb an understanding that things need to happen in order for things to improve, there is a sense of convincing. What is big data bringing to the table for you when you go to governments or companies and try to promulgate change in these environments?

Fegraus: From our perspective, what we want to do is get the best available data at the right spatial and temporal scales, the best science, and the right technology. Then, when we package all this together, we can present unbiased information to decision makers, which can lead to hopefully good sustainable development and conservation decisions.

These decision makers can be public officials setting conservation policies or making land use decisions. They can be private companies seeking to value natural capital or assess the impacts of sourcing operations in sensitive ecosystems.

Of course, you never have control over which way legislation and regulations can go, but our goal is to bring that kind of factual information to the people that need it.

Astounding results

Gardner: And one of the interesting things for me is how people are using different data sets from areas that you wouldn't think would have any relationship to one another, but then when you join and analyze those datasets, you can come up with astounding results. Is this the case with you? Are you not only gathering your own datasets but finding the means to jibe that with other data and therefore come up with other levels of empirical analysis?

Fegraus: We are. A lot of the analysis today has been focused on the data that we've collected within our network. Obviously, there are a lot of other kinds of big data sets out there, for example, provided by governments and weather services, that are very relevant to what we're doing. We're looking at trying to utilize those data sets as best we can.
Of course, you also have to be careful. One of the key things we want to do is look for patterns, but we want to make sure that the patterns we're seeing, and the correlations we detect, all make sense within our scientific domain. You don’t want to create false correlations and improbable correlations.

Gardner: And among those correlations that you have been able to determine so far, about 12 percent of species are declining in the tropical forest. This information is thanks to your Tropical Ecology Assessment and Monitoring (TEAM) and HP Earth Insights. And there are many cases not yet perceived as being endangered. So maybe you could just share some of the findings, some of the outcome from all this activity.

Fegraus: We've actually worked up a paper, and that’s one of the insights. It’s telling, because species are ranked by “whether they are considered endangered or not.” So for species that are considered “least concern” according to the International Union for Conservation of Nature (IUCN), we assume that they are doing okay.

So you wouldn’t expect to find that those species are actually declining. That can really serve as an early warning, a wake-up call, to protected-area managers and government officials in charge of those areas. There are actually some unexpected things happening here. The things that we thought were safe are not that safe.

Gardner: And, for me, another telling indicator was that on an aggregate basis, some species are being measured and there isn’t any sense of danger or problem, but when you go localized, when you look at specific regions and ecosystems, you develop a different story. Does your data gathering give you more tactical insights that are specific to a place?

Fegraus: That’s one of the really nice things about the TEAM Network, a partnership between Conservation International, the Wildlife Conservation Society and the Smithsonian Institution. In a lot of the work that TEAM does, we really work across the globe. Even though we're using the same methodologies, the same standards, whether we are in the Amazon or whether we're in a forest in Asia or Indonesia, we can have results that are important locally.

Then, as you aggregate them through sub-national level efforts, national-levels, or even continental levels, that's where we're trying to have the data flow up and down those spatial scales as needed.

For example, even though a particular species may be endangered worldwide we may find that locally, in a particular protected area, that species is stable. This provides important information to the protected area manager that the measures that are in place seem to be working for that species. It can really help in evaluating practices, measuring conservation goals and establishing smart policy.
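The point about results differing across spatial scales can be illustrated with a small Python sketch. The site names and counts below are made up for illustration, and the calculation is a plain percent change on raw counts; it is not the statistical modeling that TEAM actually applies to camera-trap data.

```python
def percent_change(first, last):
    """Percent change from the first survey count to the latest one."""
    return (last - first) * 100.0 / first


def local_trends(counts):
    """Trend per site. `counts` maps site -> (first_count, latest_count)."""
    return {site: percent_change(f, l) for site, (f, l) in counts.items()}


def aggregate_trend(counts):
    """Trend after pooling every site's counts together."""
    total_first = sum(f for f, _ in counts.values())
    total_last = sum(l for _, l in counts.values())
    return percent_change(total_first, total_last)


# Hypothetical detection counts for one species at three protected areas.
TOY_COUNTS = {
    "site_a": (100, 100),  # stable locally
    "site_b": (200, 120),  # declining
    "site_c": (100, 80),   # declining
}
```

With these toy numbers the pooled trend is negative while `site_a` shows no change, which is the kind of local signal a protected-area manager would want to see alongside the aggregate.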

Sense of confidence

Gardner: I've also spoken to some folks who express a sense of relief that they can go at whatever data they want and have a sense of confidence that they have systems and platforms that can handle the scale and the velocity of that data. It is sort of a freeing attitude that they don’t have to be concerned at the data level. They can go after the results and then determine the means to get the analysis that they need.

Is that something that you also share, that with your partnership with HP and with others, that this is about the determination of the analysis and the science, and you're not limited by some sort of speeds-and-feeds barrier?

Fegraus: This gets to a larger issue within the conservation community, the non-profits, and the environmental consulting firms. Traditionally, IT and technology has been all about keeping the lights on and making sure everyone has a laptop. There's a saying that people can share data, but the problem has really been bringing the technology, analytics, and tools to the programs that are mission critical, bringing all of this to business driven programs that are really doing the work.

One of the great outcomes of this is that we've pushed that technology to a program like TEAM and we're getting the cutting-edge technology that a program like TEAM needs into their hands, which has really changed the dynamic, compared to the status quo.

Gardner: So scale really isn't the issue any longer. It's now about your priorities and your requirements for the scientific activity?

Fegraus: Yes. It's making sure that technology meets the requirements in scientific and program objectives. And that's going to vary quite a bit depending on the program and the group that we were talking about, but ultimately it’s about enabling and accelerating the mission critical work of organizations like Conservation International.
Gardner: We've been discussing new data gathering and analysis programs to better determine tropical forest impacts for species and other conservation goals, and we've been learning this from our guest, Eric Fegraus, Director of Information Systems at Conservation International based in Arlington, Virginia. Thanks so much, Eric.

Fegraus: Thank you so much, Dana.

Gardner: And I'd like to thank our audience as well, for joining us for this special new style of IT discussion.

I'm Dana Gardner, Principal Analyst at Interarbor Solutions, your host for this ongoing series of HP-sponsored discussions. Thanks again for listening, and come back next time.


Transcript of a BriefingsDirect discussion on how a conservation group, partnering with HP, brings real-time environmental data into the hands of environmental policy makers. Copyright Interarbor Solutions, LLC, 2005-2015. All rights reserved.

Wednesday, April 22, 2015

ECommerce Portal Avito Uses Big Data to Master Just-in-Time Ad Fraud Detection

Transcript of a BriefingsDirect discussion on how a Russian ecommerce and search engine site leverages big data analytics to identify fraud.

Listen to the podcast. Find it on iTunes. Get the mobile app for iOS or Android. Download the transcript. Sponsor: HP.

Dana Gardner: Hello, and welcome to the next edition of the HP Discover Podcast Series. I'm Dana Gardner, Principal Analyst at Interarbor Solutions, your host and moderator for this ongoing sponsored discussion on IT innovation and how it’s making an impact on people’s lives.
Our next innovation case study interview highlights how Avito, a Russian eCommerce site and portal, uses big data technology to improve fraud detection, as well as better understand how their users adapt to new advertising approaches.

To learn more about how big data offers new insights to the eCommerce portal user experience, please join me in welcoming Nikolay Golov, Chief Data Warehousing Architect at Avito in Moscow. Welcome.

Nikolay Golov: Hi.

Gardner: Tell us a little bit about your site and your business at Avito. It sounds like the Craigslist of Russia.

Golov: Yes, Avito is a Russian Craigslist. It's a big site and also the biggest search engine for some goods. We at Avito have more searches, for example, from iPhones than Google or Yandex. Yandex is a Russian Google.

Gardner: Does Avito cover all types of goods, services, business-to-business commerce?

Golov: On Avito, you can sell almost anything that can be bought in the market. You can sell cars, you can sell houses, or rent them, for example. You can even find boats or business jets. We right now have about three business jets listed.

Gardner: So quite a diversity. What are your big data needs? It sounds as if in a country as large as Russia -- with that many goods and services -- you have a high-volume-of-data issue.

Size advantage

Golov: The first main advantage of Avito is its size. Everybody in Russia knows that if you want to buy or sell something, the best place for it is Avito. It’s first.

Second is speed. It is very easy to use. We have a very easy interface. So we must keep these two advantages. But there are also some people who want to use Avito to sell weapons, drugs, and prohibited medicines. It's absolutely critical for Avito to keep it all clean, to prevent such items from appearing in the queries of our visitors.

We're growing very fast, and if we rely on moderators, we'll have to increase our moderation expenses linearly as we grow. So, the only way to avoid a linear increase in expenses is to use automation.

Gardner: In order to rapidly decide which should or should not be appearing on your site, you’ve decided to use a data warehouse that provides a streaming real-time data automation effect. Tell me what your requirements are for that technology?

Golov: We have various requirements. For example, we need to be able to perform fast fraud detection. The warehouse has to have very little delay. Hours are not permitted, it must be 10 minutes, no more.

Second, we have to have data for long periods of time to train our data mining algorithms, to create reports, and to analyze trends. So our data warehouse has to be very big. It has to store months, possibly years, of data. It has to be fast, or only slightly delayed, and it has to be big.

Third, we're developing very quickly. We're adding some new services, and we're integrating with partners. Not long ago, for example, we added information from Google AdWords to optimize banners. So the warehouse must be very flexible. It must be able to grow in all three ways.

Gardner: How long have you been using HP Vertica and how did you come to choose that particular platform?

Golov: Well over a year now. We chose Vertica for two main advantages. First is the speed of loading data. The I/O speed provided by Vertica is awesome.
Second is its ability to upgrade, thanks to the commodity hardware. So if you have some new requirements that require you to increase performance, you can just buy new hardware -- commodity hardware -- and its power just increases.

It’s great and it can be done really fast. Vertica was the winner.

Measuring the impact

Gardner: Do you have a sense of what the performance and characteristics of Vertica and your data warehouse have gotten for you? Do you have a sense of reduced fraud by X percent or better analytics that have given you a business advantage of some sort? Are there ways to measure the true impact?

Golov: During the last year, Avito grew really fast. We had a moderation team of about 250 people at the beginning of this process. Now, we have the same moderation team, but the number of items has increased two-fold. I suppose that's one of the best measures that can be used.

Gardner: Fair enough. Now, looking to the future, when you're working in a business where your margins, your business, your revenue comes from the ability to provide advertisement placements, improving the performance and the value on the actual distribution of ads and the costs associated is critical.

In addition to rapid fraud detection and protection, is there a value from your analytics that refines the business algorithms and therefore the retail value to your customers?

Golov: We're creating more products. The main aim of them is to create our own tool for optimizing the directions of advertising. We have banners, marketing campaigns, and SMS. So we've achieved some results in our reporting and in fraud prevention. We'll continue to work in that direction, and we are planning to add some new types of functionality to our data warehouse.

Gardner: It certainly seems that a data warehouse delivers a tactical benefit but then over time moves to a strategic benefit. The more data, inference, and understanding you have of your processes, the more powerful you can become as a total business.

Golov: Yes. One of my teachers in data warehouses explained the role of data warehouses in an enterprise. It’s like a diesel engine inside a ship. It just works, works, and works, and it’s hot around it. You can create various tools to increase it, to make it better.

But there must always be something deep inside that continuously provides all of the associated tools with power and strong data services from all sides of the business.

Gardner: I wonder about others who are listening to you and saying, "We really need to have that core platform in order to build out these other values over time." Do you have any lessons that you have learned that you might share? That is to say, if you're starting out to develop your own data warehouse and your own business intelligence (BI) and analytics capabilities, do you have any advice?

Be flexible

Golov: First, you have to be flexible. If you ask a business about changing, they'll tell you that they can’t, that it will be absolutely this way, every time. And in two months, it will still change. If you're not ready to change your data warehouse to get the needed data and analytics, it will be a disaster. That's first.

Second, there always will be errors in data, there will be gaps, and it's absolutely critical to start building a data warehouse together with an automated data quality system that will automatically control and monitor the quality of all the data. This will help you to see the problems when they occur.
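As a rough illustration of the kind of automated check described here, the following Python sketch scans a batch of loaded rows for gaps (days with no data) and for missing values. It is only a sketch under stated assumptions: the row layout, the column names, and the `check_daily_load` helper are hypothetical and are not taken from Avito's system.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class QualityReport:
    missing_days: list   # dates in the range with no rows loaded at all
    null_counts: dict    # column name -> number of rows where it is None


def check_daily_load(rows, start, end, required_columns):
    """Scan loaded rows for gaps (empty days) and missing values.

    `rows` is a list of dicts; each dict has a `day` key holding a date.
    """
    seen_days = {row["day"] for row in rows}
    missing_days = []
    day = start
    while day <= end:
        if day not in seen_days:
            missing_days.append(day)
        day += timedelta(days=1)

    null_counts = {col: 0 for col in required_columns}
    for row in rows:
        for col in required_columns:
            if row.get(col) is None:
                null_counts[col] += 1

    return QualityReport(missing_days, null_counts)


# Hypothetical sample load: March 3 is missing and one price is null.
rows = [
    {"day": date(2015, 3, 1), "item_id": 1, "price": 100.0},
    {"day": date(2015, 3, 2), "item_id": 2, "price": None},
    {"day": date(2015, 3, 4), "item_id": 3, "price": 250.0},
]
report = check_daily_load(
    rows, date(2015, 3, 1), date(2015, 3, 4), ["item_id", "price"]
)
print(report.missing_days)
print(report.null_counts)
```

Running a check like this after every load, and alerting when either list is non-empty, is one simple way to see problems when they occur rather than weeks later.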

Gardner: I'm afraid we'll have to leave it there. We've been discussing how Avito, a large e-commerce portal and super retail site in Moscow, has been deploying a data warehouse and BI capability to not only prevent fraud, but also to grow its business through a better understanding of its customers and processes.

So, a big thank you to our guest, Nikolay Golov, Chief Data Warehousing Architect at Avito. Thank you so much.

Golov: Thanks a lot.
Gardner: And I'd like to thank our audience as well for joining us today for our special big data innovation discussion.

I'm Dana Gardner, Principal Analyst at Interarbor Solutions, your host for this ongoing series of HP-sponsored discussions. Thanks for listening, and come back next time.


Transcript of a BriefingsDirect discussion on how a Russian ecommerce and search engine site leverages big data analytics. Copyright Interarbor Solutions, LLC, 2005-2015. All rights reserved.


Tuesday, March 31, 2015

Novel Consumer Retail Behavior Analysis From InfoScout Relies on Big Data Chops from HP Vertica

Transcript of a BriefingsDirect discussion on how a consumer research and data analysis firm gleans rich marketing data from customers' shared sales receipts.

Listen to the podcast. Find it on iTunes. Download the transcript. Get the mobile app for iOS or Android. Sponsor: HP.

Dana Gardner: Hello, and welcome to the next edition of the HP Discover Podcast Series. I'm Dana Gardner, Principal Analyst at Interarbor Solutions, your host and moderator for this ongoing sponsored discussion on IT innovation and how it’s making an impact on people's lives.

Our next big data innovation case study interview highlights how InfoScout in San Francisco gleans new levels of accurate insights into retail buyer behavior by collecting data directly from consumers’ sales receipts.

In order to better analyze actual retail behaviors and patterns, InfoScout provides incentives for buyers to share their receipts, but InfoScout is then faced with the daunting task of managing and cleansing that essential data to provide actionable and understandable insights.

To learn more about how big -- and even messy -- data can be harnessed for near real time business analysis benefits, please join me in welcoming our guests, Tibor Mozes, Senior Vice President of Data Engineering at InfoScout. Welcome, Tibor.
Tibor Mozes: Good morning. Thanks for having us.

Gardner: I'm glad you're with us. We're also joined today by Jared Schrieber, the Co-founder and CEO at InfoScout, based in San Francisco. Welcome, Jared.

Jared Schrieber: Glad to be here.

Gardner: Jared, let’s start with you. We don’t often get the option of choosing how the best data comes to us. In your business, you've been able to uniquely capture strong data, but you need to treat it a lot to use it and you also need a lot of that data in order to get good trend analysis. So the payback is that you get far better information on essential buyer behaviors, but you need a lot of technology to accomplish that.

Tell us why you wanted to get to this specific kind of data and then your novel way of acquiring it, please.

Consumer panels

Schrieber: A quick history lesson is in order. In the market research industry, consumer purchase panels have been around for about 50 years. They started with diaries in people’s homes, where they had to write down exactly every single product that they bought, day-in day-out, in this paper diary and mail it in once a month.

About 20 years ago, with the advent of modems in people’s homes, leading research firms like Nielsen would send a custom barcode scanner into people’s homes and ask them to scan each product they bought and then thumb into the custom scanner the regular price, the sales price, any coupons or deals that they got, and details about the overall shopping trip, and then transfer that electronically. That approach has not changed in the last 20 years.

With the advent of smartphones and mobile apps, we saw a totally new way to capture this information from consumers that would revolutionize how and why somebody would be willing to share their purchase information with a market research company.

Gardner: Interesting. What is it about mobile that is so different from the past, and why does that provide more quality data for your purposes?

Schrieber: There are two reasons in particular. The first is, instead of having consumers scan the barcode of each and every item they purchase and thumb in the pricing details, we're able to simply have them snap a picture of their shopping receipt. So instead of spending 20 minutes after a grocery shopping trip scanning every item and thumbing in the details, it now takes 15 seconds to simply open the app, snap a picture of the shopping receipt, and be done.

The second reason is why somebody would be willing to participate. Using smartphone apps we can create different experiences for different kinds of people with different reward structures that will incentivize them to do this activity.

For example, our Shoparoo app is a next-generation school fundraiser akin to Box Tops for Education. It allows people to shop anywhere, buy anything, take a picture of their receipt, and then we make an instant donation to their kid’s school every time.

Another app is more of a Tamagotchi game called Receipt Hog, where if you download the app, you have adopted a virtual runt. You feed it pictures of your receipt and it levels-up into a fat and happy hog, earning coins in a piggy bank along the way that you can then cash-out from at the end of the day.

These kinds of experiences are a lot more intrinsically and extrinsically rewarding to the panelists and have allowed us to grow a panel that’s many times larger than the next largest panel ever seen in the world, tracking consumer purchases on a day-in day-out basis.

Gardner: What is it that you can get from these new input approaches and incentivization through an app interface? Can you provide me some sort of measurement of an improved or increased amount of participation rates? How has this worked out?

Leaps and bounds

Schrieber: It's been phenomenal. In fact, our panel is still growing by leaps and bounds. We now have 200,000 people sharing with us their purchases on a day-in day-out basis. We capture 150,000 shopping trips a day. The next largest panel in America captures just 10,000 shopping trips a day.

In addition to the shopping trip data, we're capturing geolocation information, Facebook likes and interests from these people, demographic information, and more and more data associated with their mobile device and the email accounts that are connected to it.

Gardner: So yet another unanticipated consequence of the mobility trend that’s so important today.

Tibor, let’s go to you. The good news is that Jared has acquired this trove of information for you. The bad news is that now you have to make sense of it. It’s coming in, in some interesting ways, as almost a picture or an image in some cases, and at a great volume. So you have velocity, variability, and volume. So what does that mean for you as the Vice President of Data Engineering?

Mozes: Obviously this is a growing panel. It’s creating a growing volume of data that has created a massive data pipeline challenge for us over the years, and we had to engineer the pipeline so that it is capable of processing this incoming data as quickly as possible.

As you can imagine, our data pipeline has gone through an evolution. We started out with a simple solution at the beginning with MySQL and then we evolved it using Elastic MapReduce and Hive.

But we felt that we wanted to create a data pipeline that’s much faster, so we can bring data to our customers much faster. That’s how we arrived at Vertica. We looked at different solutions and found Vertica a very suitable product for us, and that’s what we're using today.

Gardner: Walk me through the process, Tibor. How does this information come in, how do you gather it, and where does the data go? I understand you're using the HP Vertica platform as a cloud solution in the Amazon Web Services Cloud. Walk me through the process for the data lifecycle, if you will.

Mozes: We use AWS for all of our production infrastructure. Our users, as Jared mentioned, typically download one of our several apps, and after they complete a receipt scan from their grocery purchases, that receipt is immediately uploaded to our back-end infrastructure.

We try to OCR that image of the receipt, and if we can’t, we use Amazon Mechanical Turk to try to make sense of the image and turn that image into text. At the end of the day, when an image is processed, we have a fairly clean version of that receipt in a text format.

Next phase

In the next phase, we have to process the text and try to attribute various items on the receipt and make the data available in our Vertica data warehouse.

Then, our customers, using a business intelligence (BI) platform that we built especially for them, can analyze the data. The BI platform connects to Vertica, so our customers can analyze various metrics of our users and their shopping behavior.
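The flow described above (try OCR first, fall back to human transcription, then parse the text into line items before loading) can be sketched in Python. This is a minimal sketch, not InfoScout's implementation: `fake_ocr` and `fake_human` are hypothetical stand-ins for the real OCR and Mechanical Turk steps, and the "price is the last token on the line" parsing rule is an assumption made only for illustration.

```python
from typing import Callable, Optional


def transcribe_receipt(
    image: bytes,
    ocr: Callable[[bytes], Optional[str]],
    human_transcribe: Callable[[bytes], str],
) -> tuple:
    """Return (text, source): try automated OCR, fall back to people."""
    text = ocr(image)
    if text:  # OCR produced something usable
        return text, "ocr"
    return human_transcribe(image), "human"


def parse_line_items(text: str) -> list:
    """Split receipt text into (description, price) pairs.

    Assumes one item per line with the price as the last token.
    """
    items = []
    for line in text.splitlines():
        parts = line.rsplit(None, 1)
        if len(parts) != 2:
            continue
        description, price = parts
        try:
            items.append((description.strip(), float(price)))
        except ValueError:
            continue  # not an item line (store header, greeting, etc.)
    return items


# Hypothetical stand-ins for the real OCR and human transcription services.
def fake_ocr(image):
    if image == b"blurry":
        return None
    return "MILK 2% GAL 3.49\nCAT FOOD 12.99\nTHANK YOU"


def fake_human(image):
    return "BREAD 2.50"


clear_text, clear_source = transcribe_receipt(b"clear", fake_ocr, fake_human)
blurry_text, blurry_source = transcribe_receipt(b"blurry", fake_ocr, fake_human)
print(clear_source, parse_line_items(clear_text))
print(blurry_source, parse_line_items(blurry_text))
```

Passing the OCR and human steps in as callables keeps the fallback logic testable without touching either external service.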

Gardner: Jared, back to you. There's an awful lot of information on a receipt. It’s supposed to be very complex, given not just the date and the place and the type of retail organization, but all the different SKUs, every item that’s possibly being bought. How do you attack that sort of a data problem from a schema and cleansing and extract, transform, load (ETL) and then making it therefore useful?

Schrieber: It’s actually a huge challenge for us. It's quite complex, because every retailer’s receipt is different: the way that they structure the receipt, the level of specificity about the items on the receipt, and the existence of product codes, whether they are public product codes like the kind you see on a barcode for a soda product, an internal product code that retailers use as a stock keeping unit, or just a short description on the receipt.

One of our challenges as a company is to figure out the algorithmic methods that allow us to identify what each one of those codes and short descriptions actually represents in terms of a real world product or category, so that we can make sense of that data on behalf of our client. That’s one of the real challenges associated with taking this receipt-based approach and turning that into useful data for our clients on a daily basis.

Gardner: I imagine this would be of interest to a lot of different types of information and data gathering. Not only are pure data formats and text formats being brought into the mix, as has been the case for many years, but this image-based approach, the non-structured approach.

Any lessons learned here in the retail space that you think will extend to other industries? Are we going to be seeing more and more of this image-based approach to analysis gathering?

Schrieber: We certainly are. As an example, just take Google Maps and Google Street View, where they're driving around in cars, capturing images of house and building numbers, and then associating that to the actual map data. That’s a very simple example.

A lot of the techniques that we're trying to apply in terms of making sense of short descriptions for products on receipts are akin to those being used to understand and perform social-media analytics. When somebody makes a tweet, you try to figure out what that tweet is actually about and means, with those abbreviated words and shortened character sets. It’s very, very similar types of natural language processing and regular expression algorithms that help us understand what these short descriptions for products actually mean on a receipt.
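To make the regular expression idea concrete, here is a small Python sketch that expands abbreviated receipt descriptions and maps them to a category. Everything in it is assumed for illustration: the abbreviation table, the category patterns, and the `normalize` and `categorize` helpers are hypothetical, and a production system would need far larger dictionaries and real natural language processing.

```python
import re

# Hypothetical abbreviation table; real receipts vary by retailer.
ABBREVIATIONS = {
    "CHKN": "chicken",
    "BRST": "breast",
    "ORG": "organic",
    "MLK": "milk",
    "WHL": "whole",
    "ICRM": "ice cream",
}

# Hypothetical category rules, checked in order.
CATEGORY_PATTERNS = [
    (re.compile(r"\b(milk|ice cream|yogurt)\b"), "dairy"),
    (re.compile(r"\b(chicken|beef|pork)\b"), "meat"),
]

# Pack sizes such as "1GAL", "2.5 LB", or "6CT" are stripped before matching.
SIZE_PATTERN = re.compile(r"\b\d+(\.\d+)?\s?(oz|lb|gal|ct)\b", re.IGNORECASE)


def normalize(description):
    """Strip pack sizes and expand known abbreviations to lowercase words."""
    text = SIZE_PATTERN.sub(" ", description)
    tokens = re.findall(r"[A-Za-z]+", text)
    words = [ABBREVIATIONS.get(token.upper(), token.lower()) for token in tokens]
    return " ".join(words)


def categorize(description):
    """Return the first matching category, or 'unknown' if none match."""
    normalized = normalize(description)
    for pattern, category in CATEGORY_PATTERNS:
        if pattern.search(normalized):
            return category
    return "unknown"


for raw in ["ORG WHL MLK 1GAL", "BNLS CHKN BRST 2.5 LB", "PAPER TOWEL 6CT"]:
    print(raw, "->", normalize(raw), "->", categorize(raw))
```

Unknown tokens such as "BNLS" pass through unchanged, which is where the harder statistical or language-model techniques mentioned above would have to take over.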

Gardner: So we've had some very substantial data complexity hurdles to overcome. Now we have also the basic blocking and tackling of data transport, warehouse, and processing platform.

Going back to Tibor, once you've applied your algorithms, sliced and diced this information, and made it into something you can apply to a typical data warehouse and BI environment, how did you overcome these issues about the volume and the complexity, especially now that we're dealing with a cloud infrastructure?

Compression algorithms

Mozes: One of the benefits of Vertica, as we went into the discovery process, was the compression algorithms that Vertica is using. Since we have a large volume of data to deal with and build analytics from, it has turned out to be beneficial for us that Vertica is capable of compressing data extremely well. As a result of that, some of our core queries that require a BI solution can be optimized to run super fast.
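The conversation does not say which encodings InfoScout relies on. As one illustration of why sorted, column-oriented data compresses well, the Python sketch below applies run-length encoding, a technique column stores commonly use, to a hypothetical retailer column. The column contents and function names are assumptions for the example only.

```python
from itertools import groupby


def run_length_encode(column):
    """Collapse consecutive repeated values into (value, count) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(column)]


def run_length_decode(encoded):
    """Expand (value, count) pairs back into the original column."""
    return [value for value, count in encoded for _ in range(count)]


# A hypothetical retailer column, already sorted as a column store might keep it.
retailer_column = ["Costco"] * 3 + ["Target"] * 2 + ["Walmart"] * 4
encoded = run_length_encode(retailer_column)
print(encoded)
print(len(retailer_column), "values stored as", len(encoded), "runs")
```

A query that filters or counts by retailer can work on the handful of runs instead of every row, which is one reason compression and query speed tend to go together in this kind of engine.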

You also talked about the cloud solution, why we went into the cloud and what is the benefit of doing that. We really like running our entire data pipeline in AWS because it’s super easy to scale it up and down.

It’s easy for us to build a new Vertica cluster, if we need to evaluate something that’s not in production yet, and if the idea doesn’t work, then we can just pull it down. We can scale Vertica up, if we need to, in the cloud without having to deal with any sort of contractual issues.
Schrieber: To put this in context, now we're capturing three times as much data every day as we were six months ago. The queries that we're running against this have probably gone up 50X to 100X in that time period as well. So when we talk about needing to scale this up quickly, that’s a prime example as to why.

Gardner: What has happened in just the last six months that’s required that ramp up? Is it just because of the popularity of your model, the impactfulness and effectiveness of the mobile app acquisition model, or is it something else at work here?

Schrieber: It’s twofold. Our mobile apps have gotten more and more popular and we've had more and more consumers adopt them as a way to raise money for their kid’s school or earn money for themselves in a gamified way by submitting pictures of their receipts. So that’s driven massive growth in terms of the data we capture.

Also, our client base has more than tripled in that time period as well. These additional clients have greater demands of how to use and leverage this data. As those increase, our efforts to answer their business questions multiply the number of queries that we are running against this data.

Gardner: That, to me, is a real proof point of this whole architectural approach. You've been able to grow by a factor of three in your client base in six months, but you haven’t gone back to them and said, "You'll have to wait for six months while we put in a warehouse, test it, and debug it." You've been able to just take that volume and ramp up. That’s very impressive.

Schrieber: I was just going to say, this is a core differentiator for us in the marketplace. The market research industry has to keep up with the pace of marketing, and that pace of marketing has shifted from months of lead time for TV and print advertising down to literally hours of lead time to be able to make a change to a digital advertising campaign, a social media campaign, or a search engine campaign.

So the pace of marketing has changed and the pace of market research has to keep up. Clients aren’t willing to wait for weeks, or even a week, for a data update anymore. They want to know today what happened yesterday in order to make changes on-the-fly.

Reports and visualization

Gardner: We've spoken about your novel approach to acquiring this data. We've talked about the importance of having the right platform and the right cloud architecture to both handle the volume as well as scale to a dynamic rapidly growing marketplace.

Let’s talk now about what you're able to do for your clients in terms of reports, visualization, frequency, and customization. What can you now do with this cloud-based Vertica engine and this incredibly valuable retail data in a near real-time environment for your clients?

Schrieber: A few things on the client side. Traditional market research providers of panel data have to put very tight guardrails on how clients can access and run reports against the data. These queries are very complex. The numerators and denominators for every single record of the reports are different and can be changed on-the-fly.

If, all of a sudden, I want to look at anyone who shopped at Walmart in the last 12 months that has bought cat food in the last month and did so at a store other than Walmart, and I want to see their purchase behavior and how they shop across multiple retailers and categories, and I want to do that on-the-fly, that gets really complex. Traditional data warehousing and BI technologies don't support allowing general business-analyst users to be able to run those kinds of queries and reports on-demand, yet that’s exactly what they want.

They want to be able to ask those business questions and get answers. That’s been key to our strategy, which is to allow them to do so themselves, as opposed to coming back to them and saying, "That’s going to be a pretty big project. It will require a few of our engineers. We'll come back to you in a few weeks and see what we can do." Instead, we can hand them the tools directly in a guided workflow to allow them to do that literally on-the-fly and have answers in minutes versus weeks.
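The Walmart and cat food question above translates into a segment query along the following lines. This is a sketch only, run against an in-memory SQLite database so that it is self-contained. The `purchases` table, its columns, the sample rows, and the 30-day and 365-day windows standing in for "last month" and "last 12 months" are all assumptions, not InfoScout's schema or Vertica SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchases ("
    "shopper_id INTEGER, retailer TEXT, category TEXT, purchase_date TEXT)"
)
conn.executemany(
    "INSERT INTO purchases VALUES (?, ?, ?, ?)",
    [
        (1, "Walmart", "snacks", "2014-11-20"),
        (1, "Target", "cat food", "2015-03-10"),
        (2, "Walmart", "cat food", "2015-03-12"),   # cat food only at Walmart
        (3, "Target", "cat food", "2015-03-15"),    # never shopped at Walmart
        (4, "Walmart", "soda", "2013-12-01"),       # Walmart visit too old
        (4, "Kroger", "cat food", "2015-03-05"),
    ],
)

# Shoppers who bought cat food somewhere other than Walmart in the last
# 30 days AND shopped at Walmart at least once in the last 365 days.
SEGMENT_QUERY = """
SELECT DISTINCT cat.shopper_id
FROM purchases AS cat
WHERE cat.category = 'cat food'
  AND cat.retailer <> 'Walmart'
  AND cat.purchase_date >= date(:as_of, '-30 days')
  AND cat.purchase_date <= :as_of
  AND EXISTS (
      SELECT 1
      FROM purchases AS w
      WHERE w.shopper_id = cat.shopper_id
        AND w.retailer = 'Walmart'
        AND w.purchase_date >= date(:as_of, '-365 days')
        AND w.purchase_date <= :as_of
  )
ORDER BY cat.shopper_id
"""


def walmart_shoppers_buying_cat_food_elsewhere(connection, as_of):
    """Return shopper ids in the segment as of an ISO date string."""
    rows = connection.execute(SEGMENT_QUERY, {"as_of": as_of})
    return [row[0] for row in rows]


segment = walmart_shoppers_buying_cat_food_elsewhere(conn, "2015-03-28")
print(segment)
```

The resulting shopper list is only the denominator. Each report built on top of it then needs its own aggregation over those shoppers' full purchase history, which is why these questions get expensive quickly.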

Gardner: Tibor, how does that translate into the platform underneath? If you're allowing for a business analyst type of skill set to come in and apply their tools, rather than deep SQL queries or other more complex querying tools, what is it that you need from your platform in order to accommodate that type of report, that type of visualization, and the ability to bring a larger set of individuals into this analysis capability?

Mozes: Imagine that our BI platform can throw out very complex SQL queries. Our BI platform essentially is using, under the hood, a query engine that's going to run queries against Vertica. Because, as Jared mentioned, the questions are so complex, some of the queries that we run against Vertica are very different than your typical BI use cases. They're very specialized and very specific.

One of the reasons we went with Vertica is its ability to compute very complex queries at a very high speed. We look at Vertica not as simply another SQL database that scales very well and that’s very fast, but we also look at it as a compute engine.

So as part of our query engine, we are running certain queries and certain data transformations that would be very complicated to run outside Vertica.

We take advantage of the fact that you can create and run custom UDFs that are not part of ANSI 99 SQL. We also take advantage of some of the special functions that are built into Vertica, allowing data to be sessionized very easily.

Analyzing behavior

Jared can talk about some of the use cases where we like to analyze users’ entire shopping trips. In order to do that, we have to stitch together different points in time that the user has gone through and shopped at various locations. And using some of the built-in functions in Vertica that are not standard SQL, we can look at shopping journeys, we call them trip circuits, and analyze user behavior along the trip.
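The source does not name the Vertica functions involved, so the following is plain Python that only illustrates the general idea of sessionization: consecutive stops by the same shopper are stitched into one trip circuit unless the gap between them exceeds a threshold. The three-hour gap, the tuple layout, and the `build_trip_circuits` name are assumptions chosen for the example.

```python
from datetime import datetime, timedelta


def build_trip_circuits(stops, max_gap=timedelta(hours=3)):
    """Group each shopper's stops into trip circuits by time gap.

    `stops` is a list of (shopper_id, retailer, timestamp) tuples.
    Returns {shopper_id: [circuit, ...]}, each circuit being the list of
    retailers visited in order.
    """
    circuits = {}
    last_seen = {}
    # Sort by shopper, then by time, so each shopper's stops are contiguous.
    for shopper_id, retailer, ts in sorted(stops, key=lambda s: (s[0], s[2])):
        shopper_circuits = circuits.setdefault(shopper_id, [])
        previous = last_seen.get(shopper_id)
        if previous is None or ts - previous > max_gap:
            shopper_circuits.append([])  # gap too large: start a new circuit
        shopper_circuits[-1].append(retailer)
        last_seen[shopper_id] = ts
    return circuits


# Hypothetical receipts from two shoppers.
stops = [
    ("a", "Walmart", datetime(2014, 11, 28, 7, 30)),
    ("a", "Target", datetime(2014, 11, 28, 6, 0)),
    ("a", "Best Buy", datetime(2014, 11, 28, 9, 0)),
    ("a", "Kroger", datetime(2014, 11, 29, 17, 0)),
    ("b", "Walmart", datetime(2014, 11, 28, 8, 0)),
]
trip_circuits = build_trip_circuits(stops)
print(trip_circuits)
```

Doing the same grouping inside the database, rather than in application code like this, avoids moving every receipt out of the warehouse just to assign it to a trip.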

Gardner: Tibor, what other ways can you be using and exploiting the Vertica capabilities in the deliverables for your clients?

Mozes: Another reason we decided to go with Vertica is its ability to optimize very complex queries. As I mentioned, our BI platform is using a query engine under the hood. So if a user asks a very complicated business question, our BI platform turns that question into a very complicated query.

One of the big benefits of using Vertica is being able to optimize these queries on the fly. It’s easy to do this by running the database optimizer to build custom projections, making queries run much faster than we could before.

Gardner: I always think it's more impactful for us to learn through an example rather than just hear you describe this. Do you have any specific InfoScout retail client use cases where you can describe how they've leveraged your solution and how some of these both technical and feature attributes have benefited them -- an example of someone using InfoScout and what it's done for them?

Schrieber: We worked with a major retailer this holiday season to track in real time what was happening for them on Thanksgiving Day and Black Friday. They wanted to understand their core shoppers, versus less loyal shoppers, versus non-core shoppers, how these people were shopping across retailers on Thanksgiving Day and Black Friday, so that the retailer could try to respond in more real time to the dynamics happening in the marketplace.

You have to look at what it takes to do that, for us to be able to get those receipts, process them, get them transcribed, get that data in, get the algorithms run to be able to map it to the brands and categories and then to calculate all kinds of metrics. The simplest ones are market share; the most complex ones have to do with what Tibor had mentioned: the shopper journey or the trip circuit.

We tried to understand, when this retailer was the shopper's first stop, what were they most likely to buy at that retailer, how much were they likely to spend, and how is that different than what they ended up buying and spending at other retailers that followed? How does that contrast to situations where that retailer was the second stop or the last stop of the day in that pivotal shopping day that is Black Friday?
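A first-stop comparison like the one described here can be sketched in Python once receipts are grouped into trip circuits. The data, the two-way "first" versus "later" split, and the `average_spend_by_stop_position` helper are hypothetical and only illustrate the shape of the calculation.

```python
from collections import defaultdict


def average_spend_by_stop_position(circuits, retailer):
    """Average spend at `retailer`, split by whether it was the first stop.

    `circuits` is a list of trip circuits, each an ordered list of
    (retailer, spend) tuples. Returns {"first": avg, "later": avg} with
    only the positions that actually occurred.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for circuit in circuits:
        for index, (stop_retailer, spend) in enumerate(circuit):
            if stop_retailer != retailer:
                continue
            position = "first" if index == 0 else "later"
            totals[position] += spend
            counts[position] += 1
    return {position: totals[position] / counts[position] for position in totals}


# Hypothetical Black Friday trip circuits.
circuits = [
    [("Walmart", 120.0), ("Target", 40.0)],
    [("Target", 60.0), ("Walmart", 30.0)],
    [("Walmart", 80.0)],
    [("Best Buy", 200.0), ("Target", 20.0), ("Walmart", 50.0)],
]
spend_by_position = average_spend_by_stop_position(circuits, "Walmart")
print(spend_by_position)
```

The same breakdown could be extended to basket contents per position, which is closer to the "what were they most likely to buy" question in the example.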

For them to be able to understand where they were winning and losing among what kinds of shoppers who were looking for what kinds of products and deals was an immense advantage to them -- the likes of which they never had before.

Decision point

Gardner: This must be a very sizable decision point for them, right? This is going to help you decide where to build new retail outlets, for example, or how to structure the experience of the consumer walking through that particular brick-and-mortar environment.

When we bring this sort of analysis to bear, this isn’t refining at a modest level. This could be a major benefit to them in terms of how they strategize and grow. This could be something that really deeply impacts their bottom line. Is that not the case?

Schrieber: It has implications as to what kinds of categories they feature in their television, display advertising campaigns, and their circulars. It can influence how much space they give in their store to each one of the departments. It has enormous strategic implications, not just tactical day-to-day pricing decisions.

Gardner: Now, that was a retail example. I understand you also have clients that are interested in seeing how a brand works across a variety of outlets or channels. Is there another example you can provide of somebody who is looking to understand a brand impact at a wider level, across a geography for example?

Schrieber: I'll give you another example that relates to this. A retailer and a brand were working together to understand why the brand sales were down at this particular retailer during the summer time. To make it clear for you, this is a brand of ice-cream. Ice cream sales should go up during the summer, during the warmer months, and the retailer couldn’t understand why their sales were underperforming for this brand during the summer.

To figure this out, we had to piece-together, along the shopper journey over time, not only in the weeks during the summer months, but year round to understand this dynamic of how they were shopping. What we were able to help the client quickly discover was that during the summer months people eat more ice-cream. If they eat more ice-cream, they're going to want larger pack sizes when they go and buy that ice-cream. This particular retailer tended to carry smaller pack sizes.

So when the summer months came around, even though people had been buying their ice-cream at this retailer in the winter and spring, they now wanted larger pack sizes and they were finding them at other retailers, and switching their spend over to these other retailers.

So for the brand, the opportunity was a selling story to the retailer to give the brand more freezer space and to carry an additional assortment of products to help drive greater sales for that brand, but also to help the retailer grow their ice cream category sales as well.

Idea of architecture

Gardner: So just that insight could really help them figure that out. They probably wouldn’t have been able to do it any other way.

We've seen some examples of how impactful this can be and how much a business can benefit from it. But let’s go back to the idea of the architecture. For me, one of my favorite truths in IT is that architecture is destiny. That seems to be the case with you, using the combination of AWS and HP Vertica.

It seems to me that you don’t have to suffer the costs of a large capital outlay of having your own data center and facilities. You're able to acquire these very advanced capabilities at a price point that's significantly less from a capital outlay and perhaps predictable and adjustable to the demand.

Is that something you then can pass along? Tell me a little bit about the economics of how this architectural approach works for you?

Mozes: One of the benefits of using AWS is that it’s very easy for us to adjust our infrastructure on demand, as we see fit. Jared has referred to some of the examples that we had before. We did a major analysis for a large retailer on Black Friday, and we had some special promotions to our mobile app users going on at that point. Imagine that our data volume would grow tremendously from one day to the next couple of days, and then after when the promotion is over and the big shopping season is over, our volume would come down somewhat.

When you run an infrastructure in the cloud in combination with online data storage and data engine, it's very easy to scale it up and down. It’s very cost efficient to run an operation where you can just add additional computing power as you need, and then when you don’t need that anymore, you can scale it down.

We did this during a time period when we had to bring a lot of fresh data online quickly. We could just add additional nodes, and we saw very close to linear scalability by increasing our cluster size.

Schrieber: On the business side, the other advantage is we can manage our cash flows quite nicely. If you think about running a startup, cash is king, and not having to do large capital outlays in advance, but being able to adjust up and down with the fluctuations in our businesses, is also valuable.

Gardner: We're getting close to the end of our time. I wonder if you have any other insights into the business benefits from an analytics perspective of doing it this way. That is to say, incentivizing consumers, getting better data, being able to move that data and then analyze it at an on-demand infrastructure basis, and then deliver queries in whole new ways to a wider audience within your client-base.

I guess I'm looking for how this stands up both to the competitive landscape, but also to the past. How new and how innovative is this in marketing? Then we'll talk about where we go next. Let’s try to get a level set as to how new and how refreshing this is, given what the technology enables both at cloud basis and the mobility basis and then the core stuff, the underlying analytics platform basis.

Product launch

Schrieber: We have an example that's going on right now around a major new product launch for a very large consumer goods company. They chose us to help monitor this launch, because they were tired of waiting for six months for any insight in terms of who is buying it, how they were discovering it, how they came about choosing it over the competition, how their experience was with the product, and what it meant for their business.

So they chose to work with us for this major new brand launch, because we could offer them visibility within days or weeks of launching that new product in the market to help them understand who were the people who were buying, was it the target audience that they thought it was going to be, or was it a different demographic or lifestyle profile than they were expecting. If so, they might need to change their positioning or marketing tactics and targeting accordingly.

How are these people discovering the products? We're able to trigger surveys to them in the moment, right after they've made that purchase, and then flow that data back through to our clients to help them understand how these people are discovering it. Was it a TV advertisement? Was it discovered on the shelf or display in the store? Did a friend tell them about it? Was their social media marketing campaign working?

We're also able to figure out what these people were buying before. Were they new to this category of product? Or did they not use this kind of product before and were just giving it a try? Were they buying a different brand and have now switched over from that competitor? And, if so, how did they like it by comparison, and will they repeat purchase? Is this brand going to be successful? Is this meeting needs?

These are enormous decisions. Often, hundreds of millions of dollars are spent by major consumer goods companies on new brand launches. Getting this quick feedback in terms of what’s working and what’s not, who to target with what kind of messaging, and what it’s doing to the marketplace in terms of stealing share from competitors or driving new people to the product category can influence major investment decisions: whether we need to build the new manufacturing facility, whether we need to change our marketing campaigns, or whether we should go ahead and invest in that TV Super Bowl ad, because this really has a chance to go big.

These are massive decisions that these companies can now make in a timely manner, based on this new approach of capturing and making use of the data, instead of waiting six months on a new product launch. They're now waiting just weeks and are able to make the same kinds of decisions as a result.

Gardner: So, in a word it’s unprecedented. You really just haven’t been able to do this before.

Schrieber: It’s not been possible before at all, and I think that’s really what’s fueling the growth in our business.

Look to the future

Gardner: Let’s look to the future quickly. We hear a lot about the Internet of Things. We know that mobile is only partially through its evolution. We're going to see more smartphones in more hands doing more types of transactions around the globe. People will be using their phones for more of what we have thought of as traditional business in commerce. So that opens up a lot more information that’s generated, and therefore a need to gather and then analyze it.

So where do we go next? How does this generate additional novel capabilities, and then where do we go perhaps in terms of verticals? We haven’t even talked about food or groceries, hospitality, or even health care.

So without going too far -- this could be another hour conversation in itself -- maybe we could just tease the listener and the reader with where the potential for this going forward is.

Schrieber: If you think about Internet of Things as it relates to our business, there are a couple of exciting developments. One is the use of things like beacons inside of stores. Now we can know exactly which aisle people have walked down and what shelf they’ve stood in front of, and what product they've interacted with. That beacon is communicating with their smartphone and that smartphone is tied to our user account in a way that we're surveying these individuals or triggering surveys to them, in-the-moment, as they shop.

That’s not something that’s been doable before. It’s something that the Internet of Things, and very specifically beacons linking with smartphones, will allow us to do going forward. That will open up entirely new fields of research and consumer understanding about how people shop and make decisions at the shelf.

The same is true inside the home. We talk about the Internet of Things as it relates to smart refrigerators or smart laundry machines, etc. Understanding daily lifestyle activities and how people make the choice of which product to use and how to use them inside their home is a field of research that is under-served today, and one that the Internet of Things is really going to open up in the years to come.

Gardner: Just quickly, what are other retail sectors or vertical industries where this would make a great deal of sense?

Schrieber: I have a friend who runs an amazing business called Wavemark, which is basically an Internet of Things for medical devices and medical consumables inside of hospitals and care facilities, with the ability to track inventory in real time, tying it to patients and procedures, tying it back to billing and consumption.

Making all of that data available to the medical device manufacturers, so that they can understand how and when their products are being used in the real world in practice, is revolutionizing that industry. We're seeing it in healthcare, and I think we're going to see it across every industry.

Engineering perspective

Gardner: Last word to you, Tibor. Given what Jared just told us about the greater applicability, the model, the architecture, comes back to mind for me: the cloud, the mobile device, the data, the engine, the ability to deal with that velocity, volume, and variability at a cost point that is doable and scales up and down. Are there any thoughts about this from an engineering perspective and where we go next?

Mozes: We see that with all these opportunities bubbling up, the amount of data that we have to process on a daily basis is just going to continually grow at an exponential rate. We continue to get additional information on shopping behavior and more data from external data sources. Our data is just going to grow. We will need to engineer everything to be as scalable as possible.

Gardner: Very good. I'm afraid we will have to leave it there. We've been learning about how InfoScout in San Francisco gleans new levels of accurate insights into consumer behavior by collecting data directly from sales receipts.

In order to better analyze that data and use it, we have seen how they have used an architecture based on the AWS public cloud, the infrastructure as a service and data as a service capability, but built on HP Vertica as the engine for analytics and for delivery of the analysis.

InfoScout is faced with the daunting task of managing and cleansing this data and they've been able to scale very impressively over the past six months using Vertica in the cloud.
To learn more, we've been here with our two guests, and I’d really like to thank them. Tibor Mozes, Senior Vice President of Data Engineering at InfoScout. Thank you so much, Tibor.

Mozes: Thank you.

Gardner: And also Jared Schrieber, Co-founder and CEO at InfoScout. Thank you so much, Jared.

Schrieber: Pleasure, Dana. Thank you.

Gardner: And a big thank you as well to our audience for joining us for this special new style of big data discussion.

I'm Dana Gardner, Principal Analyst at Interarbor Solutions, your host for this ongoing series of HP-sponsored discussions. Thanks again for joining, and don’t forget to come back next time.



Transcript of a BriefingsDirect discussion on how a consumer research and data analysis firm gleans rich marketing data from customers' shared sales receipts. Copyright Interarbor Solutions, LLC, 2005-2015. All rights reserved.
