Showing posts with label Hadoop. Show all posts

Wednesday, October 30, 2013

Learn How Visible Measures Tracks an Expanding Universe of Video and Viewer Use Big Data

Transcript of a BriefingsDirect podcast on how one company is able to track video viewing on the Internet in real time, despite massive amounts of data flowing in continuously.

Listen to the podcast. Find it on iTunes. Download the transcript. Sponsor: HP.

Dana Gardner: Hello, and welcome to the next edition of the HP Discover Performance Podcast Series. I'm Dana Gardner, Principal Analyst at Interarbor Solutions, your moderator for this ongoing discussion of IT innovation and how it’s making an impact on people’s lives.

Once again, we’re focusing on how IT leaders are improving their business performance for better access, use and analysis of their data and information. This time we’re coming to you directly from the recent HP Vertica Big Data Conference in Boston.

Our next innovation case study interview examines how video advertising solutions provider Visible Measures delivers impactful metrics on video use and patterns. To learn more about how Visible Measures measures, please join me now in welcoming our guest, Chris Meisl, Chief Technology Officer at Visible Measures Corp., based in Boston. Welcome. [Disclosure: HP is a sponsor of BriefingsDirect podcasts.]

Chris Meisl: Thanks for having me, Dana.

Gardner: Tell us a little bit about video metrics. It seems that this is pretty straightforward, isn't it? You just measure the number of downloads and you know how many people are watching a video -- or is there more to it?

Meisl: You'd think it would be that straightforward. Video is probably the fastest growing component of the Internet right now. Video consumption is accelerating unbelievably. When you measure a video, you're looking not only at whether someone viewed the video, but at how far they got into it. Did they rewind it, stop it, or replay certain parts? What happened at the end? Did they share it?

There are all kinds of events that can happen around a video. It's not like in the display advertising business, where you have an impression and you have a click. With video, you have all kinds of interactions that happen.

You can really measure engagement in terms of how much people have actually watched the video, and how they've interacted with a video while it's playing.

Gardner: This is an additional level of insight beyond what happened traditionally with television, where you need a Nielsen box or some other crude, if I could use that term, way of measuring. This is much more granular and precise.

Census based

Meisl: Exactly. The cable industry tried to do this on various occasions with various set-top boxes that would "phone home" with various information. But for the most part, like Nielsen, it's panel-based. On the Internet, you can be more census-based. You can measure every single video, which we do. So we now know about over half a billion videos and we've measured over three trillion video events.

Because you have this very deep census data of everything that's happened, you can use standard and interesting statistical processes to figure out exactly what's happening in that space, without having to extend a relatively small panel. You know what everyone is doing.

Gardner: And of course, this extends not only to programming or entertainment level of video, but also to the advertising videos that would be embedded or precede or follow from those. Right?

Meisl: Exactly. Advertising and video are interesting, because it's not just standard television-style advertising. In standard television advertising, there are 30-second spots that are translated into the Internet space as pre-roll, post-roll, mid-roll, or what have you. You're watching the content that you really want to watch, and then you get interrupted by these ads. This is something that we at Visible Measures didn't like very much.

We're promoting this idea of content marketing through video, and content marketing is a very well-established area. We're trying to encourage brands to use those kinds of techniques using the video medium.

That means that brands will tell more extensive stories in maybe three- to five-minute video segments -- that might be episodic -- and we then deliver that across thousands of publishers, measure the engagement, measure the brand-lift, and measure how well those kinds of video-storytelling features really help the brand to build up the trust that they want with their customers in order to get the premium pricing that that brand has over something much more generic.

Gardner: Of course, the key word there was "measures." In order to measure, you have to capture, store, and analyze. Tell us a little bit about the challenges that you faced in doing that at this scale with this level of requirements. It sounds as if even the real-time elements of being able to feed back that information to the ad servers is important, too.

Meisl: Right. The first part that you have to do is have a really comprehensive understanding of what's going on in the video space.

Visible Measures started with measuring all video that’s out there. Everywhere we can, we work with publishers to instrument their video players so that we get signals while people are watching videos on their site.

For the publishers that don't want to allow us to instrument their players, we can use more traditional Google-style spidering techniques to capture information on the view count, comment count, and things like that. We do that on a regular basis, a few times a day or at least once a day, and then we can build up metrics on how the video is growing on those sites.

Massive database

So we ended up building this massive database of video -- and we would provide information, or rather insight, based on that data, to advertisers on how well their campaigns were performing.

Eventually, advertisers started to ask us to just deliver the campaign itself, instead of giving just the insight that they would then have to try to convince various other ad platforms to use in order to get a more effective campaign. So we started to shift a couple of years ago into actual campaign delivery.

Now, we have to do more of a real-time analysis, because as you mentioned, you want to, in real time, figure out the best ways to target the best sites to send that video to, and the best way to tune that campaign in order to get the best performance for the brand.

Gardner: And so faced with these requirements, I assume you did some proofs of concept (POCs). You looked around the marketplace for what’s available and you’ve come up with some infrastructure that is so far meeting your needs.

Meisl: Yes. We started with Hadoop, because we had to build this massive database of video, and we would then aggregate the information in Hadoop and pour that into MySQL.

We quickly got to the point where it would take us so long to load all that information into MySQL that we were just running out of hours in the day. It took us 11 hours to load MySQL. We couldn’t actually use the MySQL. It was a sharded MySQL cluster. We couldn’t actually use it while it was being loaded. So you’d have to have two banks of it.

You only have a 12-hour window. Otherwise, you’ve blown your day. That's when we started looking around for alternate solutions for storing this information and making it available to our customers. We elected to use HP Vertica -- this was about four years ago -- because that same 11-hour load took two hours in Vertica. And we're not going to run out of money buying hard drives, because they compress it. They have impressive compression.
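
Editor's note: The load-window arithmetic in this answer can be checked with a few lines of Python. This is only a rough sketch; the hour figures are the ones quoted in the interview, while the constant and function names are illustrative assumptions and do not come from Visible Measures' systems.

```python
# Hour figures as quoted in the interview.
LOAD_WINDOW_HOURS = 12   # "You only have a 12-hour window."
MYSQL_LOAD_HOURS = 11    # "It took us 11 hours to load MySQL."
VERTICA_LOAD_HOURS = 2   # "that same 11-hour load took two hours in Vertica"


def headroom_hours(window_hours: float, load_hours: float) -> float:
    """Hours left in the window once the load has finished."""
    return window_hours - load_hours


def speedup(old_hours: float, new_hours: float) -> float:
    """How many times faster the new load is than the old one."""
    return old_hours / new_hours


if __name__ == "__main__":
    print("MySQL headroom (h):", headroom_hours(LOAD_WINDOW_HOURS, MYSQL_LOAD_HOURS))
    print("Vertica headroom (h):", headroom_hours(LOAD_WINDOW_HOURS, VERTICA_LOAD_HOURS))
    print("Load speedup:", speedup(MYSQL_LOAD_HOURS, VERTICA_LOAD_HOURS))
```

On these numbers, the MySQL load left about one hour of slack in the window, while the Vertica load left about ten.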

Now, as we move more into the campaign delivery for the brands that we represent, we have to do our measurement in real time. We use Storm, a real-time stream-processing platform, which writes to Vertica as the events happen.

So we can ask questions of Vertica as they happen. That allows our ad service, for example, to have much more intelligence about what's going on with campaigns that are in-flight. It allows us to do much more sophisticated fraud detection. There are all kinds of possibilities that you can only pursue if you have access to the data as soon as it is generated.

Gardner: Clearly if a load takes 11 hours, you're well into the definition of big data. But I'm curious, for you, what constitutes big data? Where does big data begin, as distinct from medium or non-big data?

Several dimensions

Meisl: There are several dimensions to big data. Obviously, there's the size of it. We process what we receive, maybe half a billion events per day, and we might peak at near a million events a minute. There is quite a bit of lunchtime video viewing in America, but typically in the evening, there is a lot more.
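
Editor's note: As a rough sanity check on those figures, the sketch below compares the average per-minute rate implied by "half a billion events per day" with the quoted peak of "near a million events a minute." The numbers are the round figures from the interview, and the names are illustrative assumptions only.

```python
# Round figures as quoted in the interview.
EVENTS_PER_DAY = 500_000_000        # "maybe half a billion events per day"
PEAK_EVENTS_PER_MINUTE = 1_000_000  # "near a million events a minute"

MINUTES_PER_DAY = 24 * 60


def average_events_per_minute(events_per_day: int) -> float:
    """Average rate if the daily volume were spread evenly over the day."""
    return events_per_day / MINUTES_PER_DAY


def peak_to_average_ratio(peak_per_minute: int, events_per_day: int) -> float:
    """How far the peak minute sits above the all-day average."""
    return peak_per_minute / average_events_per_minute(events_per_day)


if __name__ == "__main__":
    avg = average_events_per_minute(EVENTS_PER_DAY)
    ratio = peak_to_average_ratio(PEAK_EVENTS_PER_MINUTE, EVENTS_PER_DAY)
    print(f"average: {avg:,.0f} events/minute")
    print(f"peak is about {ratio:.1f}x the average")
```

On these round numbers, the average works out to roughly 347,000 events a minute, so the peak is a little under three times the average, which is consistent with viewing being concentrated at lunchtime and in the evening.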

The other aspect of big data is the nature of what's in that data, the unstructured nature, the complexity of it, the unexpectedness of the data. You don't know exactly what you're going to get ahead of time.

For information that’s coming from our instrumented players, we know what that’s going to be, because we wrote the code to make that. But we receive feeds from all kinds of social networks. We know about every video that's ever mentioned on Twitter, videos that are mentioned on Facebook, and other social arenas.

All of that's coming in via all kinds of different formats. It would be very expensive for us to have to fully understand those formats, build schemas for them, and structure it just right.

So we have an open-ended system that goes into Hadoop and can process that in an open-ended way. So to me, big data is really its volume plus the very open-ended, unknown payloads in that data.

Gardner: How do you know you're succeeding here? Clearly, going from 11 hours to two hours is one metric. Are there other metrics of success that you look to -- they could be economic, performance, or concurrent query volumes?

Tell me what you define as a successful analytics platform.

Meisl: At the highest level, it's going to be about revenue and margin. But in order to achieve the revenue and margin goals that we have, obviously we need to have very efficient processes for doing the campaign delivery and the measurement that we do.

As a measurement company, we measure ourselves and watch how long it takes to generate the reports that we need, or how responsive we are to our customers for any kind of ad-hoc queries or special custom reports that they want.

We're continuously looking at how well we optimize delivery of campaigns and we're continuously improving that. We have corporate goals to improve our optimization quarter-over-quarter.

In order to do that, you have to keep coming up with new things to measure and new ways to interpret the data, so you can figure out exactly which video you want to deliver to the right person, at the right time, in the right context.

Looking down the road

Gardner: Chris, we're here at the Big Data Conference for HP Vertica and its community. Looking down the road a bit, what sort of requirements do you think you are going to need later? Are there milestones or is there a road map that you would like to see Vertica and HP follow in order to make sure that you don't run out of runway again sometime?

Meisl: Obviously, we want HP and Vertica to continue to scale up, so that it is still a cost-effective solution as the volume of data will inexorably rise. It's just going to get bigger and bigger and bigger. There's no going back there.

In order to be able to do the kind of processing that we need to do without having to spend a fortune on server farms, we want Vertica, in particular, to be very efficient at the kinds of queries that it needs to do, and proficient at loading the data and at accommodating the questions we ask of it.

In addition to that, what's particularly interesting about Vertica is its analytic functions. It has a very interesting suite of analytic functions that extends beyond the normal standard SQL analytic functions based on time series and pattern matching. This is very important to us, because we do fraud detection, for example. So you want to do pattern matching on that. We do pacing for campaigns, so you want to do time series analysis for that.

We look forward to HP and Vertica really pushing forward on new analytic capabilities that can be applied to real-time data as it flows into the Vertica platform.

Gardner: I'm afraid we'll have to leave it there. We've been learning about how Visible Measures measures and how they put together an analytic capability for video at some of the highest scales I've heard of. We've also learned how they have deployed HP Vertica as their analytics platform to provide better analytics and deliver better insights to their customers.

So, a big thank you to our guest, Chris Meisl, Chief Technology Officer at Visible Measures. Thank you, sir.

Meisl: Thank you, Dana.

Gardner: And thanks also to our audience for joining us for this special HP Discover Performance podcast, coming to you directly from the recent HP Vertica Big Data Conference in Boston.

I'm Dana Gardner, Principal Analyst at Interarbor Solutions, your host for this ongoing series of HP sponsored discussions. Thanks again for joining, and come back next time.

Transcript of a BriefingsDirect podcast on how one company is able to track video viewing on the Internet in real time, despite massive amounts of data flowing in continuously. Copyright Interarbor Solutions, LLC, 2005-2013. All rights reserved.

Thursday, October 17, 2013

Democratic National Committee Leverages Big Data to Turn Politics into Political Science

Transcript of a BriefingsDirect podcast on how a political campaign used big data to better understand and predict voter behavior and what was going on on the ground during the 2012 national elections.

Listen to the podcast. Find it on iTunes. Download the transcript. Sponsor: HP.

Dana Gardner: Hello, and welcome to the next edition of the HP Discover Performance Podcast Series. I'm Dana Gardner, Principal Analyst at Interarbor Solutions, your moderator for this ongoing discussion of IT innovation and how it’s making an impact on people’s lives.

Once again, we’re focusing on how IT leaders are improving their business performance for better access, use and analysis of their data and information. This time, we're coming to you directly from the recent HP Vertica Big Data Conference in Boston.

Our next innovation case study interview focuses on the big-data problem in the realm of political science. We'll learn how the Democratic National Committee (DNC) leveraged big data to better understand and predict voter behavior and alliances in the 2012 U.S. national elections.

To learn more about how the DNC pulled vast amounts of data together to predict and understand voter preferences and understanding of the issues, please join me in welcoming Chris Wegrzyn, Director of Data Architecture at the DNC, based in Washington, DC. Welcome, Chris. [Disclosure: HP is a sponsor of BriefingsDirect podcasts.]

Chris Wegrzyn: Hello. Thank you for having me.

Gardner: Like a lot of organizations, you had different silos of data and information, and you weren't able to do the analysis properly because of the distributed nature of the data and information. What did you do that allowed you to bring all that data together, and then also get the data assembled to bring out better analysis?

Wegrzyn: In 2008, we received a lot of recognition for being a data-driven campaign and making some great leaps in how we improved efficiency by understanding our organization.

Coming out of that, those of us on the inside were saying this was great, but we have only really skimmed the surface of what we can do. We focused on some sets of data, but they're not connected to what people were doing on our website, what people were doing on social media, or what our donors were doing. There were all of these different things, and we weren’t looking at them.

Really, we couldn’t look at them. We didn't have the staff structure, but we also didn't have the technology platform. It’s hard to integrate data and do it in a way that is going to give people reasonable performance. That wasn't available to us in 2008.

So, fast forward to where we were preparing for 2012. We knew that we wanted to be able to look across the organization, rather than at individual isolated things, because we knew that we could be smarter. It's pretty obvious to anybody. It isn’t a competitive secret that, if somebody donates to the campaign, they're probably a good supporter. But unless you have those things brought together, you're not necessarily pushing that information out to people, so that they can understand.

We were looking for a way that we could bring data together quickly and put it directly into the hands of our analysts, and HP Vertica was exactly that kind of solution for us. The speed and the scalability meant that we didn't have to worry about making sure that everything was properly transformed and didn't have to spend all of this time structuring data for performance. We could bring it together and then let our analysts figure it out using SQL, which is very powerful, but pretty simple to learn.

Better analytic platform

Gardner: Until the fairly recent past, it wasn't practical, both from a cost and technology perspective, to try to get at all the data. But it has gotten to that point now. So when you are looking at all of the different data that you can bring to bear on a national election, in a big country of hundreds of millions of people, what were some of the issues you faced?

Wegrzyn: We hadn’t done it before. We had to figure it out as we were going along. The most important realization that we made was that it wasn't going to be a huge technology effort that was going to make this happen. It was going to be about analysts. That’s a really generic term. Maybe it's data scientists or something, but it's about people who were going to understand the political challenges, understand something about the data, and go in and find answers.

We structured our organization around being analyst-centric. We needed to build those tools and platforms, so that they could start working immediately and not wait on us on the technology side to build the best system. It wasn’t about building the best system, but it was about getting something where we could prototype rapidly.

Nothing that we did was worth doing if we couldn't get something into somebody's hands in a week and then start refining it. But we had to be able to move very, very quickly, because we were just under a constant time-crunch.

Gardner: I would imagine that in the final two months and weeks of an election, things are happening very rapidly. To have a better sense of what the true situation on the ground is gives you an opportunity to best react to it.

It seems that in the past, it was a gut instinct. People were very talented and were paid very good money to be able to try to distill this insight from a perspective of knowledge and experience. What changed when you were able to bring the HP Vertica platform, big data, and real-time analysis to the function of an election?

Wegrzyn: Just about everything. There isn't a part of the campaign that was untouched by us, and in a lot of those places where gut ruled, we were able to bring in some numbers. This came down from the top campaign manager, Jim Messina. Out of the gate, he was saying that we have to put analytics in every part of the organization and we want to measure everything. That gave us the mission and the freedom to go in and start thinking how we could change how this operates.

But the campaign was driven. We tested emails relentlessly. A lot of our program was driven by trying to figure out what works and then quantify that and go out and do more. One of our big successes is the most traditional of the areas of campaigns nowadays, media buying.

More valuable

There have been a bunch of articles that have come up recently talking about what the campaign did. So I'm not giving anything away. We were able to take what we understood about the electorate and who we wanted to communicate with, rather than taking the traditional TV-buying approach, which was: we're going to buy this broad demographic band, buy a lot of TV news, and buy a lot of the stuff that's expensive and has high ratings amongst the big demographics. That’s a lot of wasted money.

We were able to know more precisely who the people are that we want to target, which was the biggest insight. Then, we were able to take that and figure out -- not the super creepy "we know exactly what you are watching" level -- but at an aggregate level, what the people we want to target are watching. So we could buy that, rather than buying the traditional stuff. That's like an arbitrage opportunity. It’s cheaper for us, but it's way more valuable.

So we were able to buy the right stuff, because we had this insight into what our electorate was like, and I think it made a big difference in how we bought TV.

Gardner: The results of your big data activities are apparent. As I recall, Governor Romney's campaign, at one point, had a larger budget for media, and spent a lot of that. You had a more effective budget with media, and it showed.

Another indication was that on election night, right up until the exit polls were announced, the Republican side didn't seem to know very clearly or accurately what the outcome was going to be. You seemed to have a better sense. So the stakes here are extremely high. What’s going to be the next chapter for the coming elections, in two, and then four years along the cycle?

Wegrzyn: That’s a really interesting question, and obviously it's one that I have had to spend a lot of time thinking about. The way that I think about the campaign in 2012 was one giant fancy office tower. We call it the Obama Campaign. When you have problems or decisions that have to be made, that goes up to the top and then back down. It’s all a very controlled process.

We are tipping that tower on its side now for 2014. Instead of having one big organization, we have to try to do this to 50, 100, maybe hundreds of smaller organizations that are going to have conflicting priorities. But the one thing that they have in common now is they saw what we did on the last campaign and they know that that's the future.

So what we have to do is take that and figure out how we can take this thing that worked very well for this one big organization, one centralized organization, and spread it out to all of these other organizations so that we can empower them.

They're going to have smaller staffs. They're going to have different programs. How do we empower them to use the tools that we used and the innovations that we created to improve their activity? It’s going to be a challenge.

Gardner: It’s interesting, there are parallels between what you're facing as a political organization, with federation, local districts for Congress, races in the state level, and then of course to the national offices as well. This is a parallel to businesses. Many businesses have a large centralized organization and they also have distributed and federated business units, perhaps in other countries for global companies.

Feedback loop

Is there a feedback loop here, whereby one level of success, like you well demonstrated in 2012, leads to more of the federated, on-the-ground, distributed gathering and utilization of data that also then feeds back to the larger organization, so that there's a virtuous adoption pattern that will benefit across the ecosystem? Is that something you are expecting?

Wegrzyn: Absolutely. Even within the campaign, once people knew that this tool was available, that they could go into HP Vertica and just answer any question about the campaign's operation, it transformed the way that people were thinking about it. It increased people's interest in applying that to new areas. They were constantly coming at us with questions like, "Hey, can we do this?" We didn't know. We didn’t have enough staff to do that yet.

One of our big advantages is that we've already had a lot of adoption throughout campaigns of some of the data gathering. They understand that we have to gather this data. We don't know what we are going to do with it, but we have them understanding that we have to gather it. It's really great, because now we can start doing smart things with it.

And then they're going to have that immediate reaction like, "Wow, I can go in there now and I can figure out something smart about all of the stuff that I put in and all of the stuff that I have been collecting. Now I want more." So I think we're expecting that it will grow. Sometimes I lose sleep about how that’s going to just grow and grow and grow.

Gardner: We think about that virtuous adoption cycle, more-and-more types of data, all the data, if possible, being brought to bear. We saw at the Big Data Conference some examples and use cases for the HAVEn approach for HP, which includes Vertica, Hadoop, Autonomy IDOL, Security, and ArcSight types of products and services. Does that strike a chord with you that you need to get at the data, but now that definition of the data is exploding and you need to somehow come to grips with that?

Wegrzyn: That's something that we only started to dabble in, things like text analysis, like what Autonomy can do with that unstructured data, stuff that we only started to touch on during the campaign, because it’s hard. We make some use of Hadoop in various parts of our setup.

We're looking to a future, where we bring in more of that unstructured intelligence, that information from social media, from how people are interacting with our staff, with the campaign in trying to do something intelligent with that. Our future is bringing all of those systems, all of those ideas together, and exposing them to that fleet of analysts and everybody who wants it.

Gardner: Well, great. I'm afraid we'll have to leave it there. We've been learning about how big data problems were handled in a handy fashion in the realm of political science. In fact, making it more scientific.

We've seen how the Democratic National Committee leveraged big data to better understand and predict voter behavior and what was going on on the ground during the 2012 national elections. We have seen how they've deployed HP Vertica analytics platform to better provide analytics and insights for their various analysts and the participants in the campaign.

So a big thank you to our guest, Chris Wegrzyn, Director of Data Architecture for the DNC in Washington, DC. Thanks so much, Chris.

Wegrzyn: Thank you.

Gardner: And thanks also to our audience for joining this special HP Discover Performance Podcast coming to you from the recent HP Vertica Big Data Conference in Boston. 

I'm Dana Gardner, Principal Analyst at Interarbor Solutions, your host for this ongoing series of HP sponsored discussions. Thanks again for joining, and come back next time.

Transcript of a BriefingsDirect podcast on how a political campaign used big data to better understand and predict voter behavior and what was going on on the ground during the 2012 national elections. Copyright Interarbor Solutions, LLC, 2005-2013. All rights reserved.

Tuesday, August 06, 2013

HP Vertica General Manager Sets Sights on Next Generation of Anywhere Analytics Platform

Transcript of a BriefingsDirect podcast on how HP Vertica is evolving to meet the needs of enterprises as data continues to grow.

Listen to the podcast. Find it on iTunes. Download the transcript. Sponsor: HP.

Dana Gardner: Hello, and welcome to the next edition of the HP Discover Performance Podcast Series. I'm Dana Gardner, Principal Analyst at Interarbor Solutions, your moderator for this ongoing discussion of IT innovation and how it’s making an impact on people’s lives.

Once again, we’re focusing on how IT leaders are improving their business performance for better access, use and analysis of their data and information. This time we’re coming to you directly from the HP Vertica Big Data Conference in Boston and we're delighted to welcome the General Manager of HP Vertica to his debut on BriefingsDirect.

Please join me in welcoming Colin Mahony, General Manager at HP Vertica. Good to have you with us, Colin. [Disclosure: HP is a sponsor of BriefingsDirect podcasts.]

Colin Mahony: Thanks, Dana. It’s great to be here. I appreciate you having me.

Gardner: Well, it's been well over two years since HP acquired Vertica and, as we begin the inaugural 2013 Big Data Conference, how would you best characterize how Vertica has evolved since its founding back in 2005?

Mahony: Oh, wow. We’ve evolved quite a bit. It’s been a busy couple of years here, certainly post the acquisition. But I think at a high level, we’ve really shifted and expanded from being an MPP column store, very narrowly-focused database company, really into an analytic platform company.

With that comes several developments, obviously on the product side, but also as an organization, going through that maturation in terms of being able to operate at a global scale across the spectrum of what you would expect an analytics provider to offer.

Gardner: And how do you characterize the difference between a store and a platform? Are there many ecosystem players or is this an organic evolution of your capabilities or both?

Mahony: It’s both, the ecosystem and the tools that you interact with. And of course, we support a very rich and vibrant ecosystem of business-intelligence (BI) tools, extract, transform and load (ETL) tools, and other types of management tools. It's not just the ecosystem around it, but also looking within our own products.

So it's adding a lot of the capabilities like backup and recovery, additional analytics capabilities beyond just standard SQL with the SDKs that Vertica supports, the ability to run both the procedural and the other types of code within the product, being able to express things like MapReduce beyond what a traditional database system would do.

Since the founding of the company, we've tried to take the best part of the database world and the best parts of the SQL world, but address the most challenging issues that traditional databases have had. So whether it is scalability or it’s being able to run things beyond SQL or it’s just the performance, those are all the things that we have taken into account while we built Vertica, and I think we have always been on the fast track to a platform.

We knew it would be a journey and we knew that building a product and a platform from the bottom up is not an easy thing, but we also knew that once we got there, once we sort of crossed that chasm, if you will, then all those decisions that we made in the beginning about this product and building an engine from the bottom up would pay off.

Platform modularity

For probably the last year, that's where we’ve been. Right now, we're seeing that it’s easy to add functionality to the platform because of the modularity of the platform, and we can add that functionality without giving up any of the performance.

For me, it’s probably the most exciting time. Being part of HP offers us so many things that make it a lot easier to become a platform, not only on the development side, but a much greater ecosystem, a global scale, being able to support customers globally 24/7.

Gardner: This is a large conference. I'm pretty impressed with the attendance, but for our audience, this might be an introduction. Tell our listeners and readers a bit more about yourself and your background?

Mahony: I've been with Vertica since the beginning. In fact, long before Vertica, my background has always been databases. I've always loved computer science, and had a minor in computer science in my undergraduate degree. In my first job out of school, I was taking databases -- it's one of our competitors now, so I won't name them -- but I was using their database, and working with civilian US Government clients, and getting a lot of information published up to the web in the earliest days of the web.

I had a couple of other roles, but they were always very technology focused. Then I got my MBA on the business side and went into venture capital for seven years. That's where I met Mike Stonebraker, the founder of Vertica.

I just loved the idea, everything I knew about databases and the challenges of traditional database and everything I knew about the new world order of information -- at the time we didn’t even talk about the term big data -- it just seemed to align really well.

So I decided to leave the dark side of venture capital and I jumped into something that I have been incredibly passionate about. If you look at that lifecycle, even my own background with Vertica and where we’ve come, it’s just been great. The timing was great and, as always, it takes a lot more than just great technology and great people.

There is definitely a lot of luck and timing, and I had the fortune of stepping into the right market at the right time, being part of a great team, and learning from a lot of great people along the way.

This is our first user conference. It’s ironic that we've never had one before, but I think this is also a testament to that scale I was referring to with what HP can bring. We have wanted a user conference since the beginning. Obviously, it takes some critical mass to get there, which we now have, but it also takes the support of an organization that knows how to do these conferences and understands the value of them.

So it's just wonderful to be here. It’s wonderful to see all of these partners, customers, employees and friends of Vertica and HP here in Boston, of course Vertica’s hometown, so truly exciting.

Gardner: You mentioned the marketplace and the timing. I have to go back to that because in 2005, while scale and performance were very important, this whole notion of big data being so prevalent in the market really hadn't happened yet. What’s the state of the union, if you will, with this marketplace? Do more and more IT functions and business functions begin and end with big data? It seems to be at the center of so many things.

Exponential growth

Mahony: It is. To go back to the founding of Vertica, I remember when Mike Stonebraker was giving the early presentations on the need for it. He talked a lot about the exponential growth of data and how that was outpacing any laws like Moore’s law or other hardware laws. So much information was being created, there was no way that just using more parallelized hardware was going to be able to address the issue.

The state of the union back then was, just as you said, there was no such thing as big data, but I think Mike, as a visionary, knew what was going to happen in the industry. And it has happened.

It wasn’t a long time ago, but I remember that I was trying to find our first sample dataset that was over a terabyte and we had a difficult time finding it. When we would talk to the early customers, they looked at us like we were crazy when we were asking about a terabyte.

We have an easy time now finding terabytes of data. The state of the union today is that what's driving so much around big data is that you have obviously the volume, variety, and velocity that we talk about often, but what's really driving those three things is human information, whether it's social media, tweets, or expressive content that’s just so prevalent right now, as well as machine information.

If you look at the traditional structured database market by any number, it’s a small percentage of the amount of data that’s out there. The strength of Vertica, and really the strength of HP overall, is that we have the best assets for the unstructured human information in Autonomy, as well as the best assets when it comes to machine information and large data.

That has some structure. It’s semi-structured information, but it’s not your traditional transaction system. The power of all of that data comes together when you can have an engine that applies some structure to it and then is able to deliver the analytics that the organization needs. It's both IT as well as line of business, and even this new category we often talk about, which is the data scientist.

One of the great things about this show here is that we’ve got Billy Beane of Moneyball fame as our keynote speaker. The reason that we wanted Billy to come speak here is that Moneyball is exactly what’s happening right now in the world when it comes to big data.

You have the data scientist or the statistician, you have the line of business folks, and you have IT. They all have a part to play in the success of how information is used in companies. By bringing them together and by making the software that much easier for them to come together and solve these problems, you can create very real and differentiated value within an organization.

So Moneyball is exactly what’s happening, certainly in corporate America, but also in government and in many other institutions that want to leverage information to be more efficient and create a competitive advantage.

Gardner: Before we delve into the latest and greatest with Vertica, let’s put some context around this. It’s only been a few months since the HP Discover 2013 Conference in Las Vegas where the HAVEn Initiative was announced. This puts Vertica in a very prominent place among other HP properties, technologies, platforms and approaches to solving this big data issue. Recap for us, if you would, what HAVEn is and why Vertica formed such an important pillar for this larger HP initiative?

Big-data lake

Mahony: What companies are looking for is this notion of the big-data lake. To me, it can mean many different things, but at the end of the day, companies want to take all the information assets that they have and they want to put them into a safe place, but a place where access to that information can be used by many different constituencies, whether it's IT, line of business, or data scientist.

So the notion of having a safe place, a harbor, or a port is what we announced as HP HAVEn, which is HP’s big data platform. It is primarily for analytics, but it can be used for just about anything when it comes to information and data.

What's so important about information right now is that there are different constituencies in the companies that want to take the information. First of all they want to capture all the information, not just structured, not just unstructured, but 100 percent of their information.

They want to get it to a place where they can leverage it and use it for a lot of different use cases, but the first part is get that information into the right place. For us, that is one of three components of HAVEn, which is the connectors.

We have over 700 connectors as part of HAVEn coming from Autonomy, coming from our Enterprise Security Group, the ArcSight core Logger and those connectors. That can be human information, extreme log information, or traditional database structured information.

Step one is the connectors to get these components. Step two is to put that data into the best engine for that data. Vertica obviously is one component, but you also have the Autonomy IDOL Engine, you have the ArcSight Logger engine, and also open-source technologies like Hadoop, which is actually the H in HP HAVEn. So we’ve got a place to put the information.

Step three is any N number of applications. What I'm seeing happening in the industry right now is just like we went from mainframe to client-server, and client-server to LAN, we're in a period now where applications are being developed. They're certainly web-based and distributed, but they're also analytical in nature.

They're driven by vast volumes of information and they close the loop, meaning that the experiences that are happening with an application, if you're driving a car, or whatever it might be, information is being passed, closed loop, back to a system that can then optimize the experience. That is creating a new class of applications.

For that new class of applications, you need the platform to be able to drive those. What we're bringing together in HAVEn is Hadoop, Autonomy, Vertica, Enterprise Security, core assets, and the N number of applications.

At Discover, we announced some of our own internal applications, which are powered by the HAVEn platforms. We announced our HP Analytics offering, which is built using Hadoop, Vertica, Enterprise Security, and Autonomy assets.

About community

We're making some of our own applications, but this is about the community and getting people to be able to build new set of applications that can use these components to really change how people are interacting with their data.

That’s HAVEn, and I am always careful to point out to people that HAVEn itself is not a product, but it's a platform and it’s a broader platform than the one that is just Vertica, Autonomy, or Enterprise Security. It’s a platform where 1+1+1+1+1, instead of equaling 5, should equal 8 or 10 or 12, and that's the goal. Of course, it's also a roadmap into areas that each of these components are working on to bring those closer together. So it’s exciting.

Gardner: Let’s look a bit more specifically at Vertica and try to factor why it’s differentiated in the market, but then also get a sense of where it’s going.

One of the things that strikes me about the market nowadays is that there seems to be a sense of tradeoffs going on when organizations are trying to pick their data engine or their platform. They have a set of value on one side, but it’s opposed by value on the other. They can’t have everything. One size does not fit all.

So how are you at Vertica able to help people deal with these tradeoffs that they're facing when it comes to a next-generation data platform?

Mahony: Before I explain the tradeoffs, I couldn’t agree with you more, Dana. In fact, Vertica was founded on the premise that one size does not fit all. Using a single OLTP transactional database to do everything, including analytics, just doesn't make a lot of sense.

If you think about the areas where people have to trade off, usually it’s scale for performance or analytics functionality for performance. One of the things that I've spent a lot of time looking at, especially over the last couple of years, is some of the alternative platforms, not just for analytics, but for all of the different data needs.

You can take something like Hadoop as an example. Hadoop really is a distributed file system and has capabilities to run rudimentary analytics and transform processed data. But I think what people love about Hadoop is that it's really easy to load data into Hadoop. You don't have to define the schema or anything.

Instead of schema on write or load time, it’s schema on read time. People like that. They also like at least the perception that it is free, and the scalability of it. On the database side, what people love about the database is that you're going to get really good performance, because the data is structured. If you're using a NexGen MPP platform like Vertica, you'll get the performance and the scalability.
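The schema-on-write versus schema-on-read distinction described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the event fields (`user`, `seconds_watched`, `shared`) and both function names are hypothetical, and neither function reflects how Hadoop or Vertica actually loads data.

```python
import json

# Hypothetical raw events, kept exactly as they arrived.
RAW_LINES = [
    '{"user": "a", "seconds_watched": 42}',
    '{"user": "b", "seconds_watched": "17", "shared": true}',
]


def load_schema_on_write(lines):
    """Schema on write: coerce every record to a fixed structure at load time."""
    table = []
    for line in lines:
        rec = json.loads(line)
        table.append({
            "user": str(rec["user"]),
            "seconds_watched": int(rec["seconds_watched"]),
        })
    return table


def read_with_schema(lines, field, cast):
    """Schema on read: keep raw text and apply structure only when queried."""
    for line in lines:
        rec = json.loads(line)
        if field in rec:
            yield cast(rec[field])


# Both approaches answer the same question; they differ in when the
# structure is imposed on the data.
total_write = sum(r["seconds_watched"] for r in load_schema_on_write(RAW_LINES))
total_read = sum(read_with_schema(RAW_LINES, "seconds_watched", int))
```

The load-time version pays the parsing cost once and drops fields outside its schema (here, `shared`), while the read-time version keeps everything and re-parses on each query, which is roughly the ease-of-loading versus query-performance tradeoff being discussed.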

So what we’re trying to do and what we've always done a pretty good job of at Vertica is look at the things that would make sense for Vertica to do. We look at expanding the platform in ways that, number one, we have the expertise and the capability to do, not only from the development standpoint, but from the support standpoint. And number two, we have the ability to create something differentiated. If we don't, or it’s not core, then we won’t do it, sticking to the purity of one size doesn’t fit all.

Hadoop-like

We've been doing a lot of work in areas like making it easier to get the data into the platform, doing more with it, making it seem much more like a Hadoop-like environment. You can look at our past releases and see that there's been a lot of work done on that and we continue to make those investments.

One thing has been consistent at Vertica since the beginning. What we focus on is to make it really easy for people to get information onto the platform. Then, we make sure we continue to deliver new capabilities, performance, and functionality within the platform.

We make sure we’re enabling our customers and partners to deploy Vertica anywhere and everywhere, whether it’s cloud, appliances, software, or the like. Those are the three tenets of the company. It’s all around this notion of making data matter and helping people make better decisions that lead to better outcomes with superior information.

There's so much that can be done in this space, but I think the key for us is to focus on the things that we know we do really well. The good news is that it's such a large space with so many demands that we know we can make a huge impact without trying to take on the world. We know we can make a huge impact in what we’re doing.

I think you'll continue to see some interesting developments along the lines of what I'm describing, and it's very much in line with where we've been.

Gardner: While we're at the user conference, there are some great use cases and some examples. It's one of my favorite points of communication that it's always better to show than to tell.

Of the various user organizations and use cases here, are there any that jump out at you personally when you think about what Vertica started out as and what it became? Are there any ways that some users are putting this to work to really capture, "This is what we intended, and this is what we went through those paces to allow, to encourage, and to now see the fruits of?"

So, from all of the happenings here with the conference, what sort of gets your blood flowing?

Mahony: One thing I've certainly noticed over the years with our customers is that the shiny object of why a customer chooses Vertica may look very different across our customers. For some, it's the price. For some, it's the performance and the scale, massive volumes. For some it's a particular analytic function or several pattern matching capabilities. And for others, it's something entirely different.

But what's so exciting, especially about this conference, is that no matter what on-ramp they take, they tend to find a lot of the other capabilities once they get on. Hopefully, here at the conference, we're going to accelerate some of that just by getting our customers and our partners together in an environment where they can share stories.

Partners and customers

In fact, if you look at the agenda for the conference, it's very light on Vertica presentations. It's very heavy on partner and customer presentations, because this is the time that we want our partners and our customers to learn from each other. We want them to talk about how they are using it.

To answer your question directly, what gets me most jazzed up is when a customer is taking advantage of nearly everything that we do. Again, it's a cycle. It's not something that can happen immediately.

There are so many customers here that have been with us for four or five years and have just been great partners for the Vertica organization in terms of the features we are developing and the direction that we are taking the product. They tend to be the ones who are using just about every feature in the product. So it gets me really excited.

I have got a customer that's got massive volumes of information, lot of diversity in the information, many different lines of business constituents who are accessing the information, data scientists, DBAs, programmers, different people who are creating applications and keeping the system up and through all that change in the organization.

Sometimes it's not only change in the organization, but potentially change in the industry and changing the way that people are interacting with data and may be changing healthcare outcomes, or drastically improving the quality of mobile phone service or other types of services.

So there isn't any one customer of whom I'd say, "You have to go see these guys." The reality is that you should see all of our customers and hear what they have to say. For me, that's the most important part of this conference.

It is about the connection between our customers and our partners, so that they can talk to each other. We can just be a fly on the wall and listen to some of the things that they're saying, good, bad, or ugly -- hopefully very good. But we can even hear things that they want us to improve. That's an important part of any company, certainly a software company, and that's what we're hoping to get out of it. For our customers and partners, they're going to get a lot out of this just by talking to each other.

Gardner: Colin, what about the notion of business transformation? We've been hearing about this for 30 years. It's been a big part of the academic work in business schools. Process re-engineering has evolved into balanced scorecards, and the flavor of the day is about how to change the nature of companies.

But it strikes me that this whole greater than the sum of the parts that you alluded to earlier, where data and analytics is made more available across easier applications to morph that, is inside the company that can then access more types of information across the boundaries of the organization into supply chain and ecosystems.

Getting more detailed information in real time about the customers and the marketplace probably has as much or more of an opportunity to transform businesses than just about anything else that's happened, with the possible exception of the Internet itself, over the past 20 years.

More than technology

So without going too much into a hype curve, the incredible amount of attention paid to big data in the past few years is about more than the technology. It's really about an empirical data-driven approach, a cultural shift if you will, within businesses. How have you been seeing that manifest itself here at the conference?

Mahony: It's an enormous opportunity for business transformation and definitely the whole is greater than the sum of the parts. What makes companies really successful with information is not trying to boil the ocean, not trying to do a traditional enterprise data warehouse project that's going to take 24 months if you're lucky, 36 most likely.

They’ll end up with some monolithic inflexible platform that will probably be outdated by the time it gets deployed. What is making a lot of companies successful is they find a particular use, they find a problem area that they want to drill down on, and they mobilize to do it.

For that, they need a solution that is quickly deployed, but also has that capability to become something much larger. Whether it's Vertica, Talend, or any of the other portfolios that we offer, we strive to make sure that somebody can get up and running quickly, whether it's Autonomy and human information analytics, Vertica and machine data or other types of transactional structured data.

The most important thing is that you find that business case, you focus on it, and prove very quickly. There's something we refer to as “Time to Terabyte,” which is less than a month, typically for Vertica. You get a return on investment (ROI) in less than a month for the investments that you made. If you prove that out, then everybody in the organization is happy, the line of business, the technology folks in IT, even the statisticians, data scientists.

From there, you start expanding the project, and that's exactly how we win most of our customers. We very rarely go in and say, "Buy an enterprise license for our product across the company." We certainly do those, but more typically we get into a business unit, we find the acute pain, and we solve that problem.

What they're betting on is the ability for us to expand and for them to expand in this platform. That's why we are, on the one hand, all about the platform and the integration, but on the other hand, not about to lose the flexibility and the modularity of what we do, because that's also a huge differentiator for HP's portfolio.

I think that this is a wonderful time in the world of business transformation, and I think, unlike what has been talked about for the last 30 years, you now have the data that can back it up and prove it in real-time to the organization.

That's the big difference. You gave the balanced scorecard as an example. If you look at the balanced scorecard methodology, you can take that methodology and drill down into a thousand fields of detail and be able to get that information in real time. That's the opportunity here, and that's I think why this market is so huge.

It's not just about faster speeds and feeds. It's about fundamentally stepping back and asking how we're running this business. What assets, especially information assets, do we have that could dramatically boost the productivity to the same extent that computers, when they were first introduced, boosted productivity. That's the goal that everybody is looking for when it comes to information.

Cloud and hybrid

Gardner: For our last item today, I wonder if we could take out our crystal ball apparatus and try to do a little blue-sky thinking. One of the other big trends these days of course is cloud computing and hybrid models for the distribution of workloads for applications, but also for data. I'm wondering, as we go down this journey over the next year or two, how do big data and cloud computing come together?

There's this notion of an analytics platform-as-a-service (PaaS) deployed for developers, but now maybe more for data scientists and for those that are doing BI and other analytic chores. How do you foresee some of this whole greater than the sum of the parts extending beyond the technical capabilities into the deployment models, and what does that portend for additional paybacks or payoffs?

Mahony: As I mentioned in terms of the three things that we are focused on, number one is make it easy to get data into the platform. Number two is do a lot more with the platform, so that there are better analytic capabilities, better pattern matching, and better analytics packs on top of it.

Number three is make sure you can deploy Vertica everywhere, and in the everywhere and anywhere categories, the cloud is certainly the first name that comes to mind. That is absolutely the future of computing. In some ways, I guess, it's the past, but it's interesting how the past repeats itself.

We do run Vertica on hosted environments like Amazon cloud. We're in a private beta on the HP Cloud Service. So there are definitely offerings and developments that have been underway here at Vertica for a while.

We embrace that, and to us, it's not mutually exclusive. What you described is the hybrid environment, where you can run certain things locally and burst up to the cloud to do other workloads, especially if you're looking to pull some quick processing power and storage. That's going to be the future and that's the way, just like any other utilities, that we're going to consume some of these capabilities.

This is one of the strengths of a company the size and scale of HP. We have these offerings, whether it's software only, appliance, or cloud. We have the ability to deliver however the customer wants it, and we can also provide not only the flexible technologies, but the flexible business capabilities to make that happen with a lot of ease.

It's an exciting time. If you look at the pillars of HP, we have cloud, mobility, big data, and security. All four of those pillars tie well into one another, because they're all related. Of course, all these activities that are happening up on the cloud are generating a lot of information, information that will be analyzed, I'm sure, in many different ways.

So it's something that kind of feeds on itself, the same way mobility does. All of that is a good thing for the analytic space, wherever it is. The final thing I would say is that the most important thing about analytics is that you do want it embedded into the various applications, just like when you are driving a car, you just want the GPS system to tell you where you are going.

Analytics is the same. You want it within the context of whatever it is that you are doing. Given that so many things are going to be served off the cloud, it's natural that that's the place that will host some of the analytics as well.

So it's an incredibly exciting time, and we're looking forward to having many more of these User Conferences and are certainly going to enjoy the rest of the show this week.

Gardner: Well, great. I'm afraid we will have to leave it there. We've been learning more about the ongoing evolution of the HP Vertica platform and its capabilities, and we've developed a better understanding of Vertica's growing role in making some of the most challenging big data analytic chores more successful and impactful.

So, join me in extending a huge thank you to our special guest Colin Mahony, General Manager at HP Vertica. Thanks so much.

Mahony: Thank you, Dana. [Follow Colin on Twitter.]

Gardner: And also thank you to our audience for joining us for this special HP Discover Performance podcast, coming to you from the HP Vertica Big Data Conference in Boston.

I'm Dana Gardner, Principal Analyst at Interarbor Solutions, your host for this ongoing series of HP sponsored discussions. Thanks again for listening and come back next time.


Transcript of a BriefingsDirect podcast on how HP Vertica is evolving to meet the needs of enterprises as data continues to grow. Copyright Interarbor Solutions, LLC, 2005-2013. All rights reserved.


Monday, July 22, 2013

HP Vertica Architecture Gives Massive Performance Boost to Toughest BI Queries for Infinity Insurance

Transcript of a BriefingsDirect podcast on how a major insurance company is using improved data architecture to gain a competitive advantage.

Listen to the podcast. Find it on iTunes. Download the transcript. Sponsor: HP.

Dana Gardner: Hello, and welcome to the next edition of the HP Discover Performance Podcast Series. I'm Dana Gardner, Principal Analyst at Interarbor Solutions, your moderator for this ongoing discussion of IT innovation and how it’s making an impact on people’s lives.

Once again, we're focusing on how IT leaders are improving their services' performance to deliver better experiences and payoffs for businesses and end users alike, and this time we're coming to you directly from the HP Discover 2013 Conference in Las Vegas.

Our next innovation case study interview highlights how Infinity Insurance Companies in Birmingham, Alabama has been deploying a new data architecture to improve productivity for their analysis and business intelligence (BI). [Learn more about the upcoming Vertica conference in Boston Aug. 5.]

To learn more about how they are improving their performance and their results for their business activities, please join me in welcoming our guest, Barry Ralston, Assistant Vice President for Data Management at Infinity Insurance Companies. Welcome, Barry. [Disclosure: HP is a sponsor of BriefingsDirect podcasts.]

Barry Ralston: Thanks for having me, Dana.

Gardner: You're welcome. Tell me a bit about the need for change. What was it that you've been doing with your BI and data warehousing that prompted you to seek an alternative?

Ralston: Like many companies in our space, we have constructed an enterprise data warehouse deployed to a row-store technology. In our case, it was initially Oracle RAC and then, eventually, the Oracle Exadata engineered hardware/software appliance.

We were noticing that analysis that typically occurs in our space wasn’t really optimized for execution via that row store. Based on my experience with Vertica, we did a proof of concept with a couple of other alternative and analytic store-type databases. We specifically chose Vertica to achieve higher productivity and to allow us to focus on optimizing queries and extracting value out of the data.

Gardner: Before we learn more about how that’s worked out for you, maybe you could explain for our listeners’ benefit, what Infinity Insurance Companies does. How big are you, and how important is data and analysis to you?

Ralston: We are a billion-dollar property and casualty company, headquartered in Birmingham, Alabama. Like any insurance carrier, data is key to what we do. But one of the things that drew me to Infinity, after years of being in a consulting role, was their determination to use data as a strategic weapon: not just IT as a whole, but data specifically within that larger IT, as a strategic or competitive advantage.

Vertica environment

Gardner: You have quite a bit of internal and structured data. Tell me a bit what happened when you moved into a Vertica environment, first to the proof of concept and then into production?

Ralston: For the proof of concept, we took the most difficult or worst-performing queries from our Exadata implementation and moved that entire enterprise data warehouse set into a Vertica deployment on three Dual Hex Core, DL380 type machines. We're running at the same scale, with the same data, with the same queries.

We took the top 12 worst-performing queries or longest-running queries from the Exadata implementation, and not one of the proof of concept queries ran less than 100 times faster. It was an easy decision to make in terms of the analytic workload, versus trying to use the row-store technology that Oracle has been based on.

Gardner: Let’s dig into that a bit. I'm not a computer scientist and I don’t claim to fully understand the difference between row store, relational, and the column-based approach for Vertica. Give us the quick "Data Architecture 101" explanation of why this improvement is so impressive?

Ralston: The original family of relational databases -- the current big three are Oracle, SQL Server and DB2 -- are based on what we call row-storage technologies. They store information in blocks on disks, writing an entire row at a time.

If you had a record for an insured, you might have the insured's name, the date the policy went into effect, the date the policy next shows a payment, etc. All those attributes are written at the same time, in series, to a row, which is combined into a block.

So storage has to be allocated in a particular fashion, to facilitate things like updates. It’s an optimal way of storing data for transaction processing. For now, it’s probably the state-of-the-art for that. If I am running an accounting system or a quote system, that’s the way to go.

Analytic queries are fundamentally different than transaction-processing queries. Think of the transaction processing as a cash register. You ring up a sale with a series of line items. Those get written to that row store database and that works well.

But when I want to know the top 10 products sold to my most profitable 20 percent of customers in a certain set of regions in the country, those set-based queries don’t perform well without major indexing. Often, that relates back to additional physical storage in a row-storage architecture.
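To make the shape of such a set-based query concrete, here is a minimal Python sketch. The function name, the tuple layout of `sales`, and the sample customers and products are all hypothetical, chosen only for illustration. The point is that the query touches every sale but only a handful of attributes per sale.

```python
from collections import Counter

def top_products(sales, profit_by_customer, regions, top_share=0.2, limit=10):
    """Top products sold to the most profitable share of customers.

    sales: iterable of (customer, region, product, quantity) tuples.
    profit_by_customer: dict mapping customer -> profit.
    All names and shapes here are hypothetical, for illustration only.
    """
    # Rank customers by profit and keep the top share (at least one).
    ranked = sorted(profit_by_customer, key=profit_by_customer.get, reverse=True)
    keep = max(1, int(len(ranked) * top_share))
    best_customers = set(ranked[:keep])

    # Scan every sale, but look at only four attributes of each.
    totals = Counter()
    for customer, region, product, quantity in sales:
        if customer in best_customers and region in regions:
            totals[product] += quantity
    return [product for product, _ in totals.most_common(limit)]

profit = {"c1": 900, "c2": 100, "c3": 50, "c4": 40, "c5": 10}
sales = [
    ("c1", "South", "auto", 3),
    ("c1", "South", "renters", 1),
    ("c1", "West", "umbrella", 5),
    ("c2", "South", "auto", 9),
]
result = top_products(sales, profit, regions={"South"})
```

With five customers and a 20 percent cutoff, only `c1` qualifies, so the sales to `c2` and the sale in the West region are ignored.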

Column store databases -- Vertica is a native column store database -- store data fundamentally differently than those row stores. We might break down a record into a set of columns, each stored distinctly. This allows me to do a couple of different things at an architectural level.
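The difference between the two layouts can be sketched in a few lines of Python. This is only a conceptual model under simplifying assumptions: the three policy records and their field names are invented for illustration, and real databases organize blocks on disk in far more elaborate ways.

```python
# Three hypothetical policy records; not a real schema.
records = [
    {"name": "Avery", "effective": "2013-01-15", "next_payment": "2013-07-15"},
    {"name": "Blake", "effective": "2013-01-15", "next_payment": "2013-08-01"},
    {"name": "Casey", "effective": "2013-02-01", "next_payment": "2013-08-01"},
]

def to_row_store(recs):
    """Row layout: all attributes of one record are stored together."""
    return [tuple(r.values()) for r in recs]

def to_column_store(recs):
    """Column layout: all values of one attribute are stored together."""
    return {key: [r[key] for r in recs] for key in recs[0]}

row_store = to_row_store(records)
column_store = to_column_store(records)
```

In the row layout, reading one name means touching the whole record. In the column layout, `column_store["name"]` can be read without touching the dates at all.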

Sort, compress, organize

First and foremost, I can sort, compress, and organize the data on disk much more efficiently. Compression has recently been added to row-storage architectures, but in a row-storage database, you largely have to compress across the entirety of a row.

I can’t choose an optimal compression algorithm for just a date, because in that row, I will have text, numbers, and dates. In a column store, I can apply a specific compression algorithm to the data that's in that column. So a date gets one algorithm; a monotonically increasing key, like a surrogate key you might have in a dimensional data warehouse, gets a different encoding algorithm; and so on.
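As a rough illustration of per-column encoding, the sketch below applies run-length encoding to a sorted date column and delta encoding to an increasing surrogate key. These are generic textbook encodings picked for the example. They are not a description of Vertica's internal algorithms, and the sample values are made up.

```python
from itertools import groupby

def run_length_encode(values):
    """Collapse runs of repeated values into (value, count) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]

def delta_encode(keys):
    """Keep the first key, then only the difference from the previous key."""
    if not keys:
        return []
    return [keys[0]] + [b - a for a, b in zip(keys, keys[1:])]

def delta_decode(deltas):
    """Rebuild the original keys by accumulating the differences."""
    keys, total = [], 0
    for delta in deltas:
        total += delta
        keys.append(total)
    return keys

# A sorted date column has long runs, so run-length encoding suits it.
dates = ["2013-01-15", "2013-01-15", "2013-01-15", "2013-02-01", "2013-02-01"]
encoded_dates = run_length_encode(dates)

# An increasing surrogate key has small, regular gaps, so deltas suit it.
surrogate_keys = [1000, 1001, 1002, 1003, 1004]
encoded_keys = delta_encode(surrogate_keys)
```

Neither encoding would work well on a whole row that mixes text, numbers, and dates, which is the limitation described above.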

Then there is sorting. How data gets retrieved is also fundamentally different, and that is another big point against row-storage databases at query time. I could say, "Tell me all the customers that bought a product in California, but I only want to know their last name."

If I have 20 different attributes, a row-storage database actually has to read all the attributes off of disk. The query engine eliminates the ones I didn’t ask for in the eventual results, but I've already incurred the penalty of the input-output (I/O). This has a huge impact when you think of things like call detail records (CDRs) in telecom, which have 144-some-odd columns.

If I'm asking a column store database, "Give me the last names of all the people who bought a product in California," I'm essentially asking the database to read two columns off disk, and that’s all that’s happening. My I/O improves by a factor of 10 or, in the case of the CDR, to as little as 1 column in 144.
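The arithmetic behind those figures can be written out as a small sketch. It assumes, purely for simplicity, that every column costs the same to read. The function names are hypothetical.

```python
def columns_read(total_columns, needed_columns, columnar):
    """Columns fetched from disk: a row store reads them all, a column store only what is asked for."""
    return needed_columns if columnar else total_columns

def io_reduction(total_columns, needed_columns):
    """How many times less data a column store reads, assuming equal-width columns."""
    row_cost = columns_read(total_columns, needed_columns, columnar=False)
    column_cost = columns_read(total_columns, needed_columns, columnar=True)
    return row_cost / column_cost

# Last name and state out of 20 attributes.
customer_query = io_reduction(total_columns=20, needed_columns=2)

# A single column out of a 144-column call detail record.
cdr_query = io_reduction(total_columns=144, needed_columns=1)
```

Reading 2 of 20 attributes gives the factor of 10. The 1-in-144 figure corresponds to a query that touches a single CDR column, and a two-column CDR query would read 2 in 144 under the same assumption.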

Gardner: Fundamentally it’s the architecture that’s different. You can’t just go back and increase your I/O improvements in those relational environments by making it in-memory or cutting down on the distance between the data and the processing. That only gets you so far, and you can only throw hardware at it so much. Fundamentally, it’s about the architecture.

Ralston: Absolutely correct. You've seen a lot of these -- I think one of the fun terms around this is "unnatural acts with data," as to how data gets either scattered or put into a cache or other things. Every time you introduce one of these mechanisms, you're putting another bottleneck between near real-time analytics and getting the data from a source system into a user’s hands for analytics. Think of a cache. If you’re going to cache, you’ve got to warm that cache up to get an effect.

If I'm streaming data in from a sensor, real-time location servers, or something like that, I don’t get a whole lot of value out of the cache to start until it gets warmed up. I totally agree with your point there, Dana, that it’s all about the architecture.

Gardner: So you’ve gained on speed and scale, and you're able to do things you couldn’t do before when it comes to certain types of data. That’s all well and good for us folks who are interested in computers. What about the people who are interested in insurance? What were you able to bring back to your company that made a difference for them and their daily business, and that’s now allowed you to move beyond your proof of concept into wider production?

Ralston: The great question is what ends up being the business value. In short, leveraging Vertica, the underlying architecture allows me to create a playing field, if you will, for business analysts. They don’t necessarily have to be data scientists to enjoy it and to be able to relate things that have a business relationship to each other, but not necessarily one that’s reflected in the data model, for whatever reason.

Performance suffers

Obviously, in a row-storage architecture, and specifically within dimensional data warehouses, if there is no index between a pair of columns, your performance begins to suffer. Vertica creates no indexes; it self-indexes the data via sorting and encoding.

So if I have an end user who wants to analyze something that’s never been analyzed before, but has a semantic relationship between those items, I don’t have to re-architect the data storage for them to get information back at the speed of their decision.
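One way to picture how sorted storage can stand in for an index is a binary search over a sorted column. The sketch below is an assumption-laden simplification using Python's standard `bisect` module and made-up sample data. It does not describe how Vertica actually locates rows.

```python
from bisect import bisect_left, bisect_right

def rows_matching(sorted_column, value):
    """Return the (start, stop) row range where the sorted column equals value."""
    return bisect_left(sorted_column, value), bisect_right(sorted_column, value)

# Hypothetical data: rows are stored sorted by state, and the
# last-name column keeps the same row order.
states = ["AL", "AL", "CA", "CA", "CA", "FL", "TX"]
last_names = ["Reed", "Shaw", "Diaz", "Kim", "Lopez", "Ortiz", "Nguyen"]

start, stop = rows_matching(states, "CA")
california_last_names = last_names[start:stop]
```

Because the column is already sorted, the matching rows form one contiguous range, and no separate index structure has to be built or maintained for the lookup.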

Gardner: You've been able to apply the Vertica implementation to some of your existing queries and you’ve gotten some great productivity benefits from that. What about opening this up to some new types of data, and/or giving your users, the folks in the insurance company, the opportunity to look to external types of queries and learn more about markets, where they can apply new insurance products and grow the bottom line rather than just repave cowpaths?

Ralston: That's definitely part of our strategic plan. Right now, 100 percent of the data being leveraged at Infinity is structured. We're leveraging Vertica to manage all that structured data, but we have a plan to leverage Hadoop and the Vertica Hadoop connectors, based on what I'm seeing around HAVEn, the idea of being able to seamlessly access structured and non-structured data from one point.

Insurance is an interesting business in that, as my product and pricing people look for the next great indicator of risk, we essentially get to ride a wave of that competitive advantage for as long a period of time as it takes us to report that new rate to a state. The state shares that with our competitors, and then our competitors have to see if they want to bake into their systems what we’ve just found.

So we can use Vertica as a competitive hammer, Vertica plus Hadoop to do things that our competitors aren’t able to do. Then, I’ve delivered what my CIO is asking me in terms of data as a competitive advantage.

Gardner: Well, great. I'm afraid we will have to leave it there. We've been learning about how Infinity Insurance Companies has been deploying HP Vertica technology and gaining scale and speed benefits. And now also setting themselves up for perhaps doing types of queries that they hadn’t been able to do before.

I’d like to thank our guest for joining us, Barry Ralston, Assistant Vice President for Data Management at Infinity Insurance Companies. Thanks so much, Barry. [Learn more about the upcoming Vertica conference in Boston Aug. 5.]

Ralston: Thank you very much.

Gardner: I’d like to thank our audience as well for joining us for this special HP Discover Performance Podcast, coming to you from the HP Discover 2013 Conference in Las Vegas.

I'm Dana Gardner, Principal Analyst at Interarbor Solutions, your host for this ongoing series of HP-sponsored discussions. Thanks again for listening, and come back next time.

Listen to the podcast. Find it on iTunes. Download the transcript. Sponsor: HP.

Transcript of a BriefingsDirect podcast on how a major insurance company is using improved data architecture to gain a competitive advantage. Copyright Interarbor Solutions, LLC, 2005-2013. All rights reserved.

You may also be interested in: