
Wednesday, November 18, 2015

Big Data Enables Top User Experiences and Extreme Personalization for Intuit TurboTax

Transcript of a BriefingsDirect discussion on how TurboTax uses big data analytics to improve performance despite high data volumes during peak usage.

Listen to the podcast. Find it on iTunes. Get the mobile app. Download the transcript. Sponsor: Hewlett Packard Enterprise.

Dana Gardner: Hello, and welcome to the next edition of the HPE Discover Podcast Series. I'm Dana Gardner, Principal Analyst at Interarbor Solutions, your host and moderator for this ongoing discussion on IT innovation and how it’s making an impact on people’s lives.

Our next big data innovation case study highlights how Intuit uses deep-data analytics to gain a 360-degree view of its TurboTax application's users’ behavior and preferences. Such visibility allows for rapid applications improvements and enables the TurboTax user experience to be tailored to a highly detailed degree.

Here to share how analytics paves the way to better understanding of end-user needs and wants, we're joined by Joel Minton, Director of Data Science and Engineering for TurboTax at Intuit in San Diego. Welcome to Briefings Direct, Joel.

Joel Minton: Thanks, Dana, it’s great to be here.
Gardner: Let’s start at a high-level, Joel, and understand what’s driving the need for greater analytics, greater understanding of your end-users. What is the big deal about big-data capabilities for your TurboTax applications?

Minton: There were several things, Dana. We were looking to see a full end-to-end view of our customers. We wanted to see what our customers were doing across our application and across all the various touch points that they have with us to make sure that we could fully understand where they were and how we can make their lives better.

We also wanted to be able to take that data and then give more personalized experiences, so we could understand where they were, how they were leveraging our offerings, but then also give them a much more personalized application that would allow them to get through the application even faster than they already could with TurboTax.

And last but not least, there was the explosion of available technologies to ingest, store, and gain insights that was not even possible two or three years ago. All of those things have made leaps and bounds over the last several years. We’ve been able to put all of these technologies together to garner those business benefits that I spoke about earlier.

Gardner: So many of our listeners might be aware of TurboTax, but it’s a very complex tax return preparation application that has a great deal of variability across regions, states, localities. That must be quite a daunting task to be able to make it granular and address all the variables in such a complex application.

Minton: Our goal is to remove all of that complexity for our users and for us to do all of that hard work behind the scenes. Data is absolutely central to our understanding that full end-to-end process, and leveraging our great knowledge of the tax code and other financial situations to make all of those hard things easier for our customers, and to do all of those things for our customers behind the scenes, so our customers do not have to worry about it.

Gardner: In the process of tax preparation, how do you actually get context within the process?

Always looking

Minton: We're always looking at all of those customer touch points, as I mentioned earlier. Those things all feed into where our customer is and what their state of mind might be as they are going through the application.

To give you an example, as a customer goes through our application, they may ask us a question about a certain tax situation.

When they ask that question, we know a lot more later on down the line about whether that specific issue is causing them grief. If we can bring all of those data sets together so that we know they asked the question three screens back, and then they're spending more time on a later screen, we can try to make that experience better, especially in the context of those specific questions that they have.
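A friction signal like the one Minton describes can be sketched as a simple pass over the event stream, flagging users who asked a help question and then lingered on a later screen. The event fields and the 30-second baseline below are illustrative assumptions, not Intuit's actual schema.

```python
# Flag users who asked a help question and then spent well over a baseline
# amount of time on a later screen (hypothetical event schema).
def find_friction(events, baseline_secs=30):
    """events: list of dicts with keys user, type, screen, seconds."""
    asked = {}    # user -> screen where they asked a question
    flagged = []
    for e in events:
        if e["type"] == "help_question":
            asked[e["user"]] = e["screen"]
        elif e["type"] == "screen_time":
            # The user asked earlier and is now stuck on a later screen.
            if (e["user"] in asked and e["screen"] > asked[e["user"]]
                    and e["seconds"] > baseline_secs):
                flagged.append((e["user"], asked[e["user"]], e["screen"]))
    return flagged

events = [
    {"user": "u1", "type": "help_question", "screen": 4, "seconds": 0},
    {"user": "u1", "type": "screen_time", "screen": 7, "seconds": 95},
    {"user": "u2", "type": "screen_time", "screen": 7, "seconds": 12},
]
print(find_friction(events))  # [('u1', 4, 7)]
```

In a real pipeline the same join would run over billions of logged touch points, but the shape of the signal, a question event correlated with later dwell time, is the same.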

As I said earlier, it's all about bringing all the data together and making sure that we leverage that when we're making the application as easy as we can.

Gardner: And that's what you mean by a 360-degree view of the user: where they are in time, where they are in a process, where they are in their particular individual tax requirements?

Minton: And all the touch points that they have, not only with things on our website, but also with things across the Internet, and also with our customer-care employees and all the other touch points we use to try to solve our customers’ needs.

Gardner: This might be a difficult question, but how much data are we talking about? Obviously you're in sort of a peak-use scenario where many people are in a tax-preparation mode in the weeks and months leading up to April 15 in the United States. How much data and how rapidly is that coming into you?

Minton: We have a tremendous amount of data. I'm not going to go into the specifics of the complete size of our database because it is proprietary, but during our peak times of the year during tax season, we have billions and billions of transactions.

We have all of those touch points being logged in real-time, and we basically have all of that data flowing through to our applications that we then use to get insights and to be able to help our customers even more than we could before. So we're talking about billions of events over a small number of days.

Gardner: So clearly for those of us that define big data by velocity, by volume, and by variety, you certainly meet the criteria and then some.

Unique challenges

Minton: The challenges are unique for TurboTax because we're such a peaky business. We have two peaks that drive a majority of our experiences: the first peak when people get their W-2s and they're looking to get their refunds, and then tax day on April 15th. At both of those times, we're ingesting a tremendous amount of data and trying to get insights as quickly as we can so we can help our customers as quickly as we can.

Gardner: Let’s go back to this concept of user experience improvement process. It's not just something for tax preparation applications but really in retail, healthcare, and many other aspects where the user expectations are getting higher and higher. People expect more. They expect anticipation of their needs and then delivery of that.

This is probably only going to increase over time, Joel. Tell me a little bit about how you're solving this issue of getting to know your user and then being able to be responsive to an entire user experience and perception.

Minton: Every customer is unique, Dana. We have millions of customers who have slightly different needs based on their unique situations. What we do is try to give them a unique experience that closely matches their background and preferences, and we try to use all of that information that we have to create a streamlined interaction where they can feel like the experience itself is tailored for them.

It’s very easy to say, “We can’t personalize the product because there are so many touch points and there are so many different variables.” But we can, in fact, make the product much more simplified and easy to use for each one of those customers. Data is a huge part of that.

Specifically, our customers, at times, may be having problems in the product, finding the right place to enter a certain tax situation. They get stuck and don't know what to enter. When they get in those situations, they will frequently ask us for help and they will ask how they do a certain task. We can then build code and algorithms to handle all those situations proactively and be able to solve that for our customers in the future as well.

So the most important thing is taking all of that data and then providing a super-personalized experience based on the experience we see for that user and for other users like them.

Gardner: In a sense, you're a poster child for a lot of elements of what you're dealing with, but really on a significant scale above the norm, the peaky nature, around tax preparation. You desire to be highly personalized down to the granular level for each user, the vast amount of data and velocity of that data.

What were some of your chief requirements at your architecture level to be able to accommodate some of this? Tell us a little bit, Joel, about the journey you’ve been on to improve that architecture over the past couple of years?

Lot of detail

Minton: There's a lot of detail behind the scenes here, and I'll start by saying it's not an easy journey. It’s a journey that you have to be on for a long time and you really have to understand where you want to place your investment to make sure that you can do this well.

One area where we've invested heavily is our big-data infrastructure, being able to ingest all of the data in order to be able to track it all. We've also invested a lot in being able to get insights out of the data, using Hewlett Packard Enterprise (HPE) Vertica as our big data platform and being able to query that data as close to real time as possible to actually get those insights. I see those as the meat and potatoes that you have to have in order to be successful in this area.

On top of that, you then need to have an infrastructure that allows you to build personalization on the fly. You need to be able to make decisions in real time for the customers and you need to be able to do that in a very streamlined way where you can continuously improve.

We use a lot of tactics using machine learning and other predictive models to build that personalization on the fly as people are going through the application. That is some of our secret sauce, and I will not go into more detail, but that’s what we're doing at a high level.

Gardner: It might be off the track of our discussion a bit, but being able to glean information through analytics and then create a feedback loop into development can be very challenging for a lot of organizations. Is DevOps a cultural parallel path along with your data-science architecture?
I don’t want to go down the development path too much, but it sounds like you're already there in terms of understanding the importance of applying big-data analytics to the compression of the cycle between development and production.

Minton: There are two different aspects there, Dana. Number one is making sure that we understand the traffic patterns of our customer and making sure that, from an operations perspective, we have the understanding of how our users are traversing our application to make sure that we are able to serve them and that their performance is just amazing every single time they come to our website. That’s number one.

Number two, and I believe more important, is the need to actually put the data in the hands of all of our employees across the board. We need to be able to tell our employees the areas where users are getting stuck in our application. This is high-level information. This isn't anybody's financial information at all, but just a high-level, quick stream of data saying that these people went through this application and got stuck on this specific area of the product.

We want to be able to put that type of information in our developers’ hands, so that as a developer is building a part of the product, she can say, "I see that these types of users get stuck at this part of the product. How can I improve the experience as I'm developing it, taking all of that data into account?"

We have an analyst team that does great work around doing the analytics, but in addition to that, we want to be able to give that data to the product managers and to the developers as well, so they can improve the application as they are building it. To me, a 360-degree view of the customer is number one. Number two is getting that data out to as broad of an audience as possible to make sure that they can act on it so they can help our customers.

Major areas

Gardner: Joel, I speak with HPE Vertica users quite often, and there are two major areas where I hear them talk rather highly of the product. The first has to do with the ability to assimilate, dealing with the variety issue to bring data into an environment where it can be used for analytics. The second is query performance, amid the great complexity of many parameters, at speed and scale.

Your applications for TurboTax are across a variety of platforms. There is a shrink-wrap product from the legacy perspective. Then you're more along the mobile lines, as well as web and SaaS. So is Vertica something that you're using to help bring the data from a variety of different application environments together and/or across different networks or environments?

Minton: I don't see different devices that someone might use as a different solution in the customer journey. To me, every device that somebody uses is a touch point into Intuit and into TurboTax. We need to make sure that all of those touch points have the same level of understanding, the same level of tracking, and the same ability to help our customers.

Whether somebody is using TurboTax on their computer or they're using TurboTax on their mobile device, we need to be able to track all of those things as first-class citizens in the ecosystem. We have a fully-functional mobile application that’s just amazing on the phone, if you haven’t used it. It's just a great experience for our customers.

From all those devices, we bring all of that data back to our big data platform. All of that data can then be queried, because you want to answer many questions, such as: When do users flow across different devices, and what is the experience that they're getting on each device? When are they able to just snap a picture of their W-2, import it really quickly on their phone, and then jump right back into their computer and finish their taxes with great ease?

We need to be able to have that level of tracking across all of those devices. The key there, from a technology perspective, is creating APIs that are generic across all of those devices, and then allowing those APIs to feed all of that data back to our massive infrastructure in the back-end so we can get those insights through reporting and other methods as well.
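A device-agnostic event API of the kind Minton describes might normalize every touch point into one common envelope before it is shipped to the back-end. The field names here are invented for illustration, not Intuit's actual API.

```python
import json
import time

def make_event(user_id, device, name, properties=None):
    """Build a device-agnostic event envelope (illustrative field names)."""
    return {
        "user_id": user_id,          # same identity across devices
        "device": device,            # e.g. "web", "ios", "android", "desktop"
        "event": name,
        "properties": properties or {},
        "ts": int(time.time()),
    }

# The same API shape serves every client, so downstream reporting can
# follow one user across devices as a first-class citizen.
web = make_event("u42", "web", "w2_entered")
mobile = make_event("u42", "ios", "w2_photo_imported")
print(json.dumps([e["device"] for e in (web, mobile)]))
```

Because every client emits the same envelope, the back-end can join a user's web and mobile sessions on `user_id` without any per-device special casing.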

Gardner: We've talked quite a bit about what's working for you: a database column store, the ability to get a volume variety and velocity managed in your massive data environment. But what didn't work? Where were you before and what needed to change in order for you to accommodate your ongoing requirements in your architecture?

Minton: Previously we were using a different data platform, and it was good for getting insights for a small number of users. We had an analyst team of 8 to 10 people, and they were able to do reports and get insights as a small group.

But when you talk about moving to what we just discussed, a huge view of the customer end-to-end, hundreds of users accessing the data, you need to be able to have a system that can handle that concurrency and can handle the performance that’s going to be required by that many more people doing queries against the system.

Concurrency problems

So we moved away from our previous vendor that had some concurrency problems and we moved to HPE Vertica, because it does handle concurrency much better, handles workload management much better, and it allows us to pull all this data.

The other thing that we've done is that we have expanded our use of Tableau, which is a great platform for pulling data out of Vertica and then being able to use those extracts in multiple front-end reports that can serve our business needs as well.

So in terms of using technology to be able to get data into the hands of hundreds of users, we use a multi-pronged approach that allows us to disseminate that information to all of these employees as quickly as possible and to do it at scale, which we were not able to do before.

Gardner: Of course, getting all your performance requirements met is super important, but also in any business environment, we need to be concerned about costs.

Is there anything about the way that you were able to deploy Vertica, perhaps using commodity hardware, perhaps a different approach to storage, that allowed you to both accomplish your requirements, goals in performance, and capabilities, but also at a price point that may have been even better than your previous approach?

Minton: From a price perspective, we've been able to really make the numbers work and get great insights for the level of investment that we've made.

How do we handle just the massive cost of the data? That's a huge challenge that every company is going to have in this space, because there's always going to be more data that you want to track than you have hardware or software licenses to support.

So we've been very aggressive in looking at each and every piece of data that we want to ingest. We want to make sure that we ingest it at the right granularity.

Vertica is a high-performance system, but you don't need absolutely every detail that you’ve ever had from a logging mechanism for every customer in that platform. We do keep a lot of detail information in Vertica, but we're also really smart about what we move in there from a storage perspective and what we keep outside in our Hadoop cluster.

Hadoop cluster

We have a Hadoop cluster that stores all of our data, and we consider that our data lake; it basically holds all of our customer interactions, top to bottom, at the granular detail level.

We then take data out of there and move things over to Vertica, in both an aggregate as well as a detail form, where it makes sense. We've been able to spend the right amount of money for each of our solutions to be able to get the insights we need, but to not overwhelm both the licensing cost and the hardware cost on our Vertica cluster.
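The tiering described here, full detail in the Hadoop data lake and aggregates plus selected detail in Vertica, can be sketched with a toy rollup. The event tuples and daily-count aggregation are simplified assumptions, not Intuit's pipeline.

```python
from collections import defaultdict

def rollup_daily(detail_events):
    """Aggregate raw lake events into per-day, per-event counts bound for
    the warehouse; the full detail stays behind in the lake."""
    counts = defaultdict(int)
    for day, _user, event_name in detail_events:
        counts[(day, event_name)] += 1
    return dict(counts)

lake = [
    ("2015-04-14", "u1", "efile_submit"),
    ("2015-04-14", "u2", "efile_submit"),
    ("2015-04-15", "u1", "refund_check"),
]
# Only this small aggregate (plus hand-picked detail tables) is loaded into
# the warehouse, keeping license and storage costs in check.
print(rollup_daily(lake))
```

The point of the sketch is the cost lever: the lake keeps every row, while the warehouse holds a far smaller, query-friendly summary plus only the detail that earns its keep.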

The combination of those things has really allowed us to be successful to match the business benefit with the investment level for both Hadoop and with Vertica.

Gardner: Measuring success quantitatively at the platform level, as you have been, is important, but there's also a qualitative benefit that needs to be examined and even measured when you're talking about things like process improvements, eliminating bottlenecks in user experience, or eliminating anomalies for certain types of individual personalized activities, a bit more qualitative than quantitative.

Do you have any insight, either anecdotal or examples, where being able to apply this data analytics architecture and capability has delivered some positive benefits, some value to your business?

Minton: We basically use data to try to measure ourselves as much as possible. So we do have qualitative, but we also have quantitative.

Just to give you a few examples, our total aggregate number of insights that we've been able to garner from the new system versus the old system is a 271 percent increase. We're able to run a lot more queries and get a lot more insights out of the platform now than we ever could on the old system. We have also had a 41 percent decrease in query time. So employees who were previously pulling data and waiting twice as long had a really frustrating experience.

Now, we're actually performing much better and we're able to delight our internal customers to make sure that they're getting the answers they need as quickly as possible.

We've also increased the size of our data mart in general by 400 percent. We've massively grown the platform while improving performance. So all of those quantitative numbers are just a great story about the success that we have had.

From a qualitative perspective, I've talked to a lot of our analysts and a lot of our employees, and they've all said that the solution we have now is head and shoulders above what we had previously. Mostly that’s because during those peak times, when we're running a lot of traffic through our systems, all the users hit the platform at the same time, and previously nobody got any work done because of the concurrency issues.

Better tracking

Because we have much better workload tracking now with Vertica and our new platform, we're actually able to handle that concurrency, get the highest-priority workloads out quickly, and then follow along with the lower-priority workloads, running them all in parallel.
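The prioritized-workload behavior Minton describes (in Vertica's case handled by its workload-management features) can be illustrated with a toy priority queue: higher-priority queries drain first, lower-priority ones follow. This is a conceptual sketch, not the product's actual mechanism.

```python
import heapq

def run_in_priority_order(queries):
    """queries: list of (priority, name) tuples; a lower number means a
    higher priority. High-priority workloads drain first."""
    heap = list(queries)
    heapq.heapify(heap)
    order = []
    while heap:
        _priority, name = heapq.heappop(heap)
        order.append(name)
    return order

jobs = [(2, "analyst_adhoc"), (1, "exec_dashboard"), (3, "batch_backfill")]
print(run_in_priority_order(jobs))
# ['exec_dashboard', 'analyst_adhoc', 'batch_backfill']
```

At peak, the effect is that an executive dashboard refresh isn't starved behind a long-running backfill, while nothing is dropped outright.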

The key is being able to run, especially at those peak loads, and be able to get a lot more insights than we were ever able to get last year.
Gardner: And that peak load issue is so prominent for you. Another quick aside, are you using cloud or hybrid cloud to support any of these workloads, given the peak nature of this, rather than keep all that infrastructure running 365, 24×7? Is that something that you've been doing, or is that something you're considering?

Minton: Sure. In a lot of our data warehousing solutions, we do use cloud at points in our systems. A lot of our large-scale serving activities, as well as our large-scale ingestion, does leverage cloud technologies.

We don't have it for our core data warehouse. We want to make sure that we have all of that data in-house in our own data centers, but we do ingest a lot of the data just as pass-throughs in the cloud, just to allow us to have more of that peak scalability that we wouldn’t have otherwise.

Gardner: We're coming up toward the end of our discussion time. Let’s look at what comes next, Joel, in terms of where you can take this. You mentioned some really impressive qualitative and quantitative returns and improvements. We can always expect more data, more need for feedback loops, and a higher level of user expectation and experience. Where would you like to go next? How do you go to an extreme focus even more on this issue of personalization?

Minton: There are a few things that we're doing. We built the infrastructure that we need to really be able to knock it out of the park over the next couple of years. Some of the things that are just the next level of innovation for us are going to be, number one, increasing our use of personalization and making it much easier for our customers to get what they need when they need it.

So doubling down on that and increasing the number of use cases where our data scientists are actually building models that serve our customers throughout the entire experience is going to be one huge area of focus.

Another big area of focus is getting the data even more real-time. As I discussed earlier, Dana, we're a very peaky business, and the faster we can get data into our systems, the faster we're going to be able to report on that data and get insights that are going to help our customers.

Our goal is to have even more real-time streams of that data and be able to get that data in so we can get insights from it and act on it as quickly as possible.

The other side is just continuing to invest in our multi-platform approach to allow the customer to do their taxes and to manage their finances on whatever platform they are on, so that it continues to be mobile, web, TVs, or whatever device they might use. We need to make sure that we can serve those data needs and give the users the ability to get great personalized experiences no matter what platform they are on. Those are some of the big areas where we're going to be focused over the coming years.

Recommendations

Gardner: Now you've had some 20/20 hindsight into moving from one data environment to another, which I suppose is the equivalent of keeping the airplane flying while changing the wings. Do you have any words of wisdom for those who might be having concurrency issues, or scale, velocity, and variety issues with their big data, when it comes to moving from one architecture platform to another? Any recommendations you can make to help them, perhaps in ways that you didn't necessarily get the benefit of?

Minton: To start, focus on the real business needs and competitive advantage that your business is trying to build and invest in data to enable those things. It’s very easy to say you're going to replace your entire data platform and build everything soup to nuts all in one year, but I have seen those types of projects be tried and fail over and over again. I find that you put the platform in place at a high-level and you look for a few key business-use cases where you can actually leverage that platform to gain real business benefit.

When you're able to do that two, three, or four times on a smaller scale, then it makes it a lot easier to make that bigger investment to revamp the whole platform top to bottom. My number one suggestion is start small and focus on the business capabilities.

Number two, be really smart about where your biggest pain points are. Don’t try to solve world hunger when it comes to data. If you're having a concurrency issue, look at the platform you're using. Is there a way within your current platform to solve these issues without going big?

Frequently, what I find in data is that it’s not always the platform's fault that things are not performing. It could be the way that things are implemented and so it could be a software problem as opposed to a hardware or a platform problem.
So again, I would have folks focus on the real problem and the different methods that you could use to actually solve those problems. It’s kind of making sure that you're solving the right problem with the right technology and not just assuming that your platform is the problem. That’s on the hardware front.

As I mentioned earlier, looking at the business use cases and making sure that you're solving those first is the other big area of focus I would have.

Gardner: I'm afraid we will have to leave it there. We've been learning about how Intuit uses deep-data analytics to gain a 360-degree view of its TurboTax application users’ behavior and preferences. And we have heard about how such visibility allows for rapid application improvements, providing an extreme level of personalization and enabling users of TurboTax to experience a higher degree of customization, something tailored directly to their situation.

So join me in thanking Joel Minton, Director of Data Science and Engineering for TurboTax at Intuit in San Diego. Thanks so much, Joel.

Minton: Thank you, Dana. I really enjoyed it.

Gardner: And I'd also like to thank our audience for joining this big-data innovation case study discussion. I'm Dana Gardner, Principal Analyst at Interarbor Solutions, your host for this ongoing series of HPE-sponsored discussions. Thanks again for listening, and come back next time.

Listen to the podcast. Find it on iTunes. Get the mobile app. Download the transcript. Sponsor: Hewlett Packard Enterprise.

Transcript of a discussion on how TurboTax uses big data analytics to improve performance despite high data volumes during peak usage. Copyright Interarbor Solutions, LLC, 2005-2015. All rights reserved.

You may also be interested in:

Monday, July 20, 2015

How Big Data Powers GameStop to Gain Retail Advantage and Deep Insights into its Markets

Transcript of a BriefingsDirect discussion on how a gaming retailer uses big data to gather insights into sales trends and customer wants and needs.

Listen to the podcast. Find it on iTunes. Get the mobile app for iOS or Android. Download the transcript. Sponsor: HP.

Dana Gardner: Hello, and welcome to the next edition of the HP Discover Podcast Series. I'm Dana Gardner, Principal Analyst at Interarbor Solutions, your host and moderator for this ongoing sponsored discussion on IT innovation and how it’s making an impact on people’s lives.

Once again, we're focusing on how companies are adapting to the new style of IT to improve IT performance and deliver better user experiences, as well as better business results.

Our next innovation case study interview highlights how GameStop, based in Grapevine, Texas, uses big data to improve how it conducts its business and serves its customers. To learn more about how they deploy big data and use the resulting analytics, we are joined by John Crossen, Data Warehouse Lead at GameStop. Welcome, John.

John Crossen: Thank you for having me.
Gardner: Tell us a little bit about GameStop. Most people are probably familiar with the retail outlets that they see, where you can buy, rent, trade games, and learn more about games. Why is big data important to your organization?

Crossen: We wanted to get a better idea of who our customers are, how we can better serve our customers and what types of needs they may have. With prior reporting, we would get good overall views of here’s how the company is doing or here’s how a particular game series is selling, but we weren’t able to tie that to activities of individual customers and possible future activity of future customers, using more of a traditional SQL-based platform that would just deliver flat reports.

So, our goal was to get a more 360-degree view of our customer, and we realized pretty quickly that, using our existing toolsets and methodologies, that wasn’t going to be possible. That’s where Vertica ended up coming into play to drive us in that direction.

Gardner: Just so we have a sense of this scale here, how many retail outlets does GameStop support and where are you located?

Crossen: We're international. There are approximately 4,200 stores in the US and another 2,200 internationally.

Gardner: And in terms of the type of data that you are acquiring, is this all internal data or do you go to external data sources and how do you to bring that together?

Internal data

Crossen: It's primarily internal data. We get data from our website. We have the PowerUp Rewards program that customers can choose to join, and we have data from individual cash registers in all of those stores.

Gardner: I know from experience in my own family that gaming is a very fast-moving industry. We’ve quickly gone from different platforms to different game types and different technologies when we're interacting with the games.

It's a very dynamic changeable landscape for the users, as well as, of course, the providers of games. You are sort of in the middle. You're right between the users and the vendors. You must be very important to the whole ecosystem.

Crossen: Most definitely, and there aren’t really many game retailers left anymore. GameStop is certainly the preeminent one. So a lot of customers come not just to purchase a game, but get information from store associates. We have Game Informer Magazine that people like to read and we have content on the website as well.

Gardner: Now that you know where to get the data and you have the data, how big is it? How difficult is it to manage? Are you looking for real-time or batch? How do you then move forward from that data to some business outcome?

Crossen: It’s primarily batch at this point. The registers close at night, and we get data from registers and loads that into HP Vertica. When we started approximately two years ago, we didn't have a single byte in Vertica. Now, we have pretty close to 24 terabytes of data. It's primarily customer data on individual customers, as well Weblogs or mobile application data.
Gardner: I should think that when you analyze which games are being bought, which ones are being traded, which ones are price-sensitive and move at a certain price or not, you're really at the vanguard of knowing the trends in the gaming industry -- even perhaps before anyone else. How has that worked for you, and what are you finding?

Crossen: A lot of it is just based on determining who is likely to buy which series of games. You won't market the next Call of Duty to somebody who's buying children's games. We're not going to ask people to buy Call of Duty 3 rather than My Little Pony 6.

The interesting thing, at least with games and video game systems, is that when we sell them new, there's no price movement. Every game is the same price in any store. So we have to rely on other things like customer service and getting information to the customer to drive game sales. Used games are a bit of a different story.

Gardner: Now back to Vertica. Given that you've been using this for a few years and you have such a substantial data lake, what is it about Vertica that works for you? What are you learning here at the conference that intrigues you about the future?

Quick reports

Crossen: The initial push with HP Vertica was just to get reports fast. We had processes that literally took a day to run to accumulate data. Now, in Vertica, we can pull that same data out in five minutes. I think that if we spend a little bit more time, we could probably get it faster than half of that.

The first big push was just speed. The second wave after that was bringing in data sources that were unattainable before, such as web-click data, a tremendous amount of data, loading it into SQL and then being able to query it out of SQL. That wasn't doable before, and Vertica made it possible. At first, it was faster data; then it was acquiring new data and finding different ways to tie data elements together that we hadn't tied before.
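To make the web-click example concrete, here is a hedged sketch of the kind of aggregate query that becomes practical once raw click data sits in Vertica. The table and column names (web_clicks, session_id, event, click_ts) are invented for illustration, not GameStop's actual schema.

```sql
-- Hypothetical: daily sessions and purchase events over the last 30 days,
-- computed directly from raw web-click rows. All names are illustrative.
SELECT DATE_TRUNC('day', click_ts) AS day,
       COUNT(DISTINCT session_id)  AS sessions,
       SUM(CASE WHEN event = 'purchase' THEN 1 ELSE 0 END) AS purchases
FROM   web_clicks
WHERE  click_ts >= CURRENT_DATE - 30
GROUP  BY 1
ORDER  BY 1;
```

Because a column store reads only the columns a query names, a scan like this can stay fast even when the click table holds billions of rows.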

Gardner: How about visualization of these reports? How do you serve up those reports and do you make your inference and analytics outputs available to all your employees? How do you distribute it? Is there sort of an innovation curve that you're following in terms of what they do with that data?

Crossen: As far as a platform, we use Tableau as our visualization tool. We've used an ad-hoc environment to write direct SQL queries to pull data out, but Tableau serves as the primary tool.

Gardner: In that data input area, what integration technologies are you interested in? What would you like to see HP do differently? Are you happy with the way SQL, Vertica, Hadoop, and other technologies are coming together? Where would you like to see that go?

Crossen: A lot of our source systems are either SQL Server-based or just flat files. For flat files, we use the Copy Command to bring data in, and that's very fast. With Vertica 7, they released the Microsoft SQL Connector.

So we're able to use our existing SQL Server Integration Services (SSIS) data flows and change the output from another SQL table to direct it into Vertica. It uses the Copy Command under the covers, and that's been a major improvement. Before that, we had to stage the data somewhere else and then use the Copy Command to bring it in, or try to use Open Database Connectivity (ODBC) to bring it in, which wasn't very efficient.
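For readers unfamiliar with the Copy Command Crossen mentions, a minimal flat-file load looks roughly like the following. The table name, file path, and option choices are hypothetical; check the Vertica COPY documentation for the options available in your version.

```sql
-- Hypothetical bulk load of a nightly register extract.
-- COPY bulk-loads rows into column storage, which is why it is so much
-- faster than row-by-row INSERTs over ODBC.
COPY store_sales (store_id, register_id, customer_id, sku, qty, amount, sold_at)
FROM '/data/extracts/registers_20150528.csv'
DELIMITER ','
NULL ''
REJECTED DATA '/data/rejects/registers_20150528.bad'
DIRECT;
```

The DIRECT hint writes straight to disk-based storage, the typical choice for large batch loads in Vertica of this era; REJECTED DATA captures malformed rows for later inspection instead of failing the load.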

20/20 hindsight

Gardner: How about words of wisdom from your 20/20 hindsight? Others are also thinking about moving from a standard relational database environment towards big data stores for analytics and speed and velocity of their reports. Any advice you might offer organizations as they're making that transition, now that you’ve done it?

Crossen: Just to better understand how a column-store database works, and how that's different from a traditional row-based database. It's a different mindset in everything from data modeling to how you lay out tables.
For example, in a row database you would tend to freak out if you had a 700-column table. In a column store, that doesn't really matter. So get into the right mindset of how a column-store database works, and don't try to duplicate a row-based system in the column-store system.
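A small sketch of the mindset shift Crossen describes, using an invented schema: in a column store, a query pays only for the columns it touches, so a very wide table costs nothing extra at query time.

```sql
-- Hypothetical wide table: hundreds of per-customer attributes.
CREATE TABLE customer_profile (
    customer_id INT,
    join_date   DATE,
    attr_001    VARCHAR(50),
    -- ... hundreds more attribute columns ...
    attr_700    VARCHAR(50)
);

-- This query reads only two columns from disk, no matter how many
-- columns the table defines. In a row store, every row fetched would
-- drag all 700 columns along with it.
SELECT customer_id, join_date
FROM   customer_profile
WHERE  join_date >= '2015-01-01';
```

This is why the 700-column table that alarms a row-store modeler is routine in a column store: width only costs you on the columns you actually query.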

Gardner: Great. I am afraid we’ll have to leave it there. I’d like to thank our guest, John Crossen, the Data Warehouse Lead at GameStop in Grapevine, Texas. I appreciate your input.

Crossen: Thank you.

Gardner: And thanks also to our audience for joining us for this special new style of IT discussion. I'm Dana Gardner, Principal Analyst at Interarbor Solutions, your host for this ongoing series of HP-sponsored discussions. Thanks again for listening, and come back next time.

Listen to the podcast. Find it on iTunes. Get the mobile app for iOS or Android. Download the transcript. Sponsor: HP.

Transcript of a BriefingsDirect discussion on how a gaming retailer uses big data to gather insights into sales trends and customer wants and needs. Copyright Interarbor Solutions, LLC, 2005-2015. All rights reserved.

You may also be interested in:

Friday, May 29, 2015

How Tableau Software and Big Data Come Together: Strong Visualization Embedded on an Agile Analytics Engine

Transcript of a BriefingsDirect discussion on the interaction between a high-performance data analytics engine and insights presentation software that together gives users an unprecedented view into their businesses.

Listen to the podcast. Find it on iTunes. Get the mobile app for iOS or Android. Download the transcript. Sponsor: HP.

Dana Gardner: Hello, and welcome to the next edition of the HP Discover Podcast Series. I'm Dana Gardner, Principal Analyst at Interarbor Solutions, your host and moderator for this ongoing sponsored discussion on IT innovation and how it’s making an impact on people's lives.

Our next big data innovation discussion interview highlights how Tableau Software and HP Vertica come together to provide visualization benefits for those seeking more than just big-data analysis. They're looking for ways to improve their businesses effectively and productively.

So, in order to learn more, we're joined by Paul Lilford, Global Director of Technology Partners for Tableau Software, based in Seattle. Welcome, Paul.

Paul Lilford: Thanks, Dana. It’s great to be here.

Gardner: We're also here with Steve Murfitt, Director of Technical Alliances at HP Vertica. Welcome, Steve.

Steve Murfitt: Thank you. Great to be here.
Gardner: Why is the tag-team between Tableau and Vertica so popular? Every time I speak with someone using Vertica, they inevitably mention that they're delivering their visualizations through Tableau. This seems to be a strong match.

Lilford: We’re a great match primarily because Tableau’s mission is to help people see and understand data. We're made more powerful by getting to large data, and Vertica is one of the best at storing that. Their columnar format is a natural format for end users, because they don’t think about writing SQL and things like that. So, Tableau, as a face to Vertica, empowers business users to self serve and deliver on a depth of analytics that is unmatched in the market.

Gardner: Now, we can add visualization to a batch report just as well as to a real-time, streamed report. What is it about visualization that seems to be more popular in a higher-density data, real-time analysis environment?

Lilford: The big thing there, Dana, is that batch visualization will always be common. What's a bigger deal is data discovery, the new reality for companies. It leads to becoming data-driven as an organization and making better-informed decisions, rather than taking a packaged report and trying to make a decision that maybe tells you how bad you were in the past or how good you might be in the future. Now, you can actually have a conversation with your data and cycle back and forth between insights and decisions.

The combination of our two technologies allows users to do that in a seamless drag-and-drop environment. From a technical perspective, the more data you have, the deeper you can go. We’re not limiting a user to any kind of threshold. We're not saying, this is the way I wrote the report, therefore you can go consume it.

We’re saying, "Here is a whole bunch of data that may be a subject area or grouping of subject areas, and you're the finance professional or the HR professional. Go consume it and ask the questions you need answered." You're not going to an IT professional to say, "Write me this report and come back three months from now and give it to me." You’re having that conversation in real time in person, and that interactive nature of it is really the game changer. 

Win-win situation

Gardner: And the ability for the big-data analysis to be extended across as many consumer types in the organization as possible makes the underlying platform more valuable. So this, from HP's perspective, must be a win-win. Steve?

Murfitt: It definitely is a win-win. When you have a fantastic database that performs really well, it's kind of uninteresting to show people just tables and columns. If you can have a product like Tableau and you can show how people can interact with that data, deliver on the promise of the tools, and try to do discovery, then you’re going to see the value of the platform.

Gardner: Let’s look to the future. We've recently heard about some new and interesting trends for increased volume of data with the Internet of Things, mobile, apps being more iterative and smaller, therefore, more data points.

As the complexity kicks in and the scale ramps up, what do you expect, Paul, for visualization technology and the interactivity that you mentioned? What do you think we're approaching? What are some of the newer aspects of visualization that makes this powerful, even as we seek to find more complexity?

Lilford: There are a couple of things. Hadoop, if you go back a year and a half or so, has been moving from a cold-storage technology to more of a discovery layer. One of the trends in visualization is predictive content becoming part of everyday life.

Tableau democratizes business intelligence (BI) for the business user. We made BI an everyday thing for the business user. Predictive is in a place similar to where BI was a couple of years ago: you go to the data scientist to do it. Not that the data scientist's value wasn't there, but it was becoming a bottleneck, because you had to run everything through a predictive model before you could give it to someone. I think that's changing.

So I think that predictive element is more and more part of the continuum here. You're going to see more forward-looking, more forecast-based, more regression-based, more statistical things brought into it. We'll continue to innovate with some new visuals, but the other big frontier is unstructured data.

This is the other big key, because 80 percent of the world's data is unstructured. How do you consume that content? Do you still structure it or can you consume it where it sits, as it sits, where it came in and how it is? Are there discoverers that can go do that?

You’re going to continue see those go. The biggest green fields in big data are predictive and unstructured. Having the right stores like Vertica to scale that is important, but also allowing anyone to do it is the other important part, because if you give it to a few technical professionals, you really restrict your ability to make decisions quickly.

Gardner: Another interesting aspect, when I speak to companies, is the way that they're looking at their company as an analytics and data provider, internally and externally. The United States Postal Service views itself in that fashion, as an analytics entity, but it's also looking for business models: how to take data, and the analysis of data that it might be privy to, and make that available as a new source of revenue.

I would think that visualization is something that you want to provide to a consumer of that data, whether they're internal or external. So we're all seeing the advent of data as a business for companies that may not have even considered it, but could.

Most important asset

Lilford: From our perspective, it's a given that data is a service. Data is the most important asset that most companies have. It's where the value is. Becoming data-driven isn't just a tagline that we talk about or people talk about. If you want to make decisions that move your business, you have to be a data provider.

The best example I can maybe give you, Dana, is healthcare. I came from healthcare and when I started, there was a rule -- no social. You can't touch it. Now, you look at healthcare and nurses are tweeting with patients, "Don’t eat that sandwich. Don't do this."
Data has become a way to lower medical costs in healthcare, which is the biggest expense. How do you do that? They use social and digital data to do that now, whereas five, seven years ago, we couldn't do it. It was a privacy thing. Now, it's a given part of government, of healthcare, of banking, of almost every vertical. How do I take this valuable asset I’ve got and turn it into some sort of product, market, or market advantage, whatever that is?

Gardner: Steve, anything more to offer on the advent or acceleration of the data-as-a-business phenomena?

Murfitt: If you look at what companies have been doing for such a long time, they have been using the tools to look at historical data to measure how they're doing against budget. As people start to make more data available, what they really want to do is compare themselves to their peers.

Doing well against your budget doesn't tell you whether you're gaining or losing market share, or how well you're really doing. So as more data is shared and more data is available, being able to compare yourself to peers and to averages, to measure yourself not only internally but externally, is going to help people make their decisions.

Gardner: Now, for those organizations out there that have been doing reports in a more traditional way, that recognize the value of their data and the subsequent analysis, but have yet to dabble deeply in visualization, what are some good rules of the road for beginning a journey toward visualization?

What might you consider in terms of how you set up your warehouse or your analysis engine, and then make tools available to your constituencies? What are some good beginning concepts to consider?

Murfitt: One of the most important things is to start small, prove it, and scale from there. The days of boiling the ocean to try to come up with analytics, only to find out it didn't work, are over.

Organizations want to prove it, and one of the cool things about doing that visually is now the person who knows the data the best can show you what they're trying to do, rather than trying to push a requirement out to someone and ask "What is it you want?" Inevitably, something’s lost in translation when that happens or the requirement changes by the time it's delivered.

Real-time conversation

You now have a real-time, interactive, iterative conversation with both the data and business users. If you’re a technical professional, you can now focus on the infrastructure that supports the user, the governance, and security around it. You're not focused on the report object anymore. And that report object is expensive.

It doesn’t mean that for compliance things the financial reports go away, it means you've right sized that work effort. Now, the people who know the data the best deliver the data, and the people who support the infrastructure the best support that infrastructure and that delivery.

It’s a shift. Technologies today do scale Vertica as a great scalable database. Tableau is a great self-service tool. The combination of the two allows you to do this now. If you go back even seven years, it was a difficult thing. I built my career being a data warehouse BI guy. I was the guy writing reports and building databases for people, and it doesn’t scale. At some point, you’re a bottleneck for the people who need to do their job. I think that's the biggest single thing in it.

Gardner: Another big trend these days is people becoming more used to doing things from a mobile device. Maybe it’s a “phablet,” a tablet, or a smartphone. It’s hard to look at a spreadsheet on those things more than one or two cells at a time. So visualizations and exercising your analytics through a mobile tier seem to go hand in hand. What should we expect there? Isn't there a very natural affinity between mobile and analysis visualization?

Lilford: We have mobile apps today, but I think you're going to see a fast evolution in this. Most visuals work better on a tablet. Right-sizing that for the phone is going to continue to happen, scaling that with the right architecture behind it, because devices are limited in what they can hold themselves.

I think you'll see a portability element come to it, but at the same time, these are early days. Machines are generating data at a rate at which it's almost impossible to consume. Those devices themselves are going to be the game changer.

My kids use iPads; they know how to do it. There's a whole new workforce in the making that has grown up on devices like these. Devices are just going to get better at supporting it. We're in the very early phases of it. I think we have a strong offering today, and it's only going to get stronger in the future.

Gardner: Steve, any thoughts about the intersection between Vertica, big data, and the mobile visualization aspect of that?

Murfitt: The important thing is having the platform that can provide the performance. When you're on a mobile device, you still want the instant access, and you want it to be real-time access. This is the way the market is going. If you go with the old, more traditional platforms that can’t perform when you're in the office, they're not going to perform when you are remote.

It’s really about building the infrastructure, having the right technology to be able to deliver that performance and that response and interactivity to the device wherever they are.

Working together

Gardner: Before we close, I just wanted to delve a little bit more into the details of how HP Vertica and Tableau software work. Is this an OEM, a partnership, co-selling, co-marketing? How do you define it for those folks out there who either use one or the other or neither of you? How should they progress to making the best of a Vertica and Tableau together?

Lilford: We're a technology partnership. It's a co-selling relationship, and we do that by design. We're a best-in-breed technology; we do what we do better than anyone else. Vertica is one of the best databases, and they do what they do better than anyone else. The combination of the two gives customers options to solve problems, and the whole reason we partner is to solve customer issues.

We want to do it as best-in-breed. That's a lot of what the new-stack technologies are about. It's no longer a single vendor building a huge solution stack. It's the best database, with the best Hadoop storage, with the best visualization, with the best BI tools on top of it. That's where you get a better total cost of ownership (TCO) overall, because now you're not invested in one player that has to deliver all of this. You're invested in the best of what each does, and you're delivering in real time for people.

Gardner: Last question, Steve, about the degree of integration here. Is this something that end-user organizations can do themselves, or are there professional-services organizations involved? What degree of integration between Vertica and Tableau visualization is customary?

Murfitt: Tableau connects very easily to Vertica. There's a dropdown on the database connector saying, "Connect to Vertica." As long as they have the driver installed, it works. And the way the interface works, they can start querying and getting value from the data straight away.

Gardner: Very good. I'm afraid we will have to leave it there. We've been learning about how Tableau Software and HP Vertica come together to provide a strong visualization capability on top of a highly scalable, agile, near-real-time analytics engine. I'd like to thank our guests: Paul Lilford, Global Director of Technology Partners at Tableau Software in Seattle. Thank you, Paul.
Lilford: Thank you.

Gardner: And we have been here with Steve Murfitt, Director of Technical Alliances at HP Vertica. Thank you, Steve.

Murfitt: Thank you.

Gardner: And a big thank you also to our audience for joining the discussion.

I'm Dana Gardner, Principal Analyst at Interarbor Solutions, your host for this ongoing series of HP-sponsored discussions. Thanks again for listening, and come back next time.

Listen to the podcast. Find it on iTunes. Get the mobile app for iOS or Android. Download the transcript. Sponsor: HP.

Transcript of a BriefingsDirect discussion on the interaction between a high-performance data analytics engine and insights presentation software that together gives users an unprecedented view into their businesses. Copyright Interarbor Solutions, LLC, 2005-2015. All rights reserved.

You may also be interested in: