Wednesday, October 30, 2013

Learn How Visible Measures Tracks an Expanding Universe of Video and Viewer Use with Big Data

Transcript of a BriefingsDirect podcast on how one company is able to track video viewing on the Internet in real time, despite massive amounts of data flowing in continuously.

Listen to the podcast. Find it on iTunes. Download the transcript. Sponsor: HP.

Dana Gardner: Hello, and welcome to the next edition of the HP Discover Performance Podcast Series. I'm Dana Gardner, Principal Analyst at Interarbor Solutions, your moderator for this ongoing discussion of IT innovation and how it’s making an impact on people’s lives.

Once again, we’re focusing on how IT leaders are improving their business performance for better access, use and analysis of their data and information. This time we’re coming to you directly from the recent HP Vertica Big Data Conference in Boston.

Our next innovation case study interview examines how video advertising solutions provider Visible Measures delivers impactful metrics on video use and patterns. To learn more about how Visible Measures measures, please join me now in welcoming our guest, Chris Meisl, Chief Technology Officer at Visible Measures Corp., based in Boston. Welcome. [Disclosure: HP is a sponsor of BriefingsDirect podcasts.]

Chris Meisl: Thanks for having me, Dana.

Gardner: Tell us a little bit about video metrics. It seems that this is pretty straightforward, isn't it? You just measure the number of downloads and you know how many people are watching a video -- or is there more to it?

Meisl: You'd think it would be that straightforward. Video is probably the fastest-growing component of the Internet right now. Video consumption is accelerating unbelievably. When you measure a video, you're looking not only at whether someone viewed it, but at how far into the video they got. Did they rewind it, stop it, or replay certain parts? What happened at the end? Did they share it?

There are all kinds of events that can happen around a video. It's not like in the display advertising business, where you have an impression and you have a click. With video, you have all kinds of interactions that happen.

You can really measure engagement in terms of how much people have actually watched the video, and how they've interacted with a video while it's playing.

Gardner: This is an additional level of insight beyond what happened traditionally with television, where you need a Nielsen box or some other crude, if I could use that term, way of measuring. This is much more granular and precise.

Census based

Meisl: Exactly. The cable industry tried to do this on various occasions with various set-top boxes that would "phone home" with various information. But for the most part, like Nielsen, it's panel-based. On the Internet, you can be more census-based. You can measure every single video, which we do. So we now know about over half a billion videos, and we've measured over three trillion video events.

Because you have this very deep census data of everything that's happened, you can use standard and interesting statistical processes to figure out exactly what's happening in that space, without having to extrapolate from a relatively small panel. You know what everyone is doing.

Gardner: And of course, this extends not only to programming or entertainment level of video, but also to the advertising videos that would be embedded or precede or follow from those. Right?

Meisl: Exactly. Advertising and video are interesting, because it's not just standard television-style advertising. In standard television advertising, there are 30-second spots that are translated into the Internet space as pre-roll, post-roll, mid-roll, or what have you. You're watching the content that you really want to watch, and then you get interrupted by these ads. This is something that we at Visible Measures didn't like very much.

We're promoting this idea of content marketing through video, and content marketing is a very well-established area. We're trying to encourage brands to use those kinds of techniques using the video medium.

That means that brands will tell more extensive stories in maybe three- to five-minute video segments -- which might be episodic -- and we then deliver them across thousands of publishers, measure the engagement, measure the brand lift, and measure how well that kind of video storytelling helps the brand build up the trust it wants with its customers, in order to command the premium pricing the brand has over something much more generic.

Gardner: Of course, the key word there was "measures." In order to measure, you have to capture, store, and analyze. Tell us a little bit about the challenges that you faced in doing that at this scale with this level of requirements. It sounds as if even the real-time elements of being able to feed back that information to the ad servers is important, too.

Meisl: Right. The first part that you have to do is have a really comprehensive understanding of what's going on in the video space.

Visible Measures started with measuring all the video that's out there. Everywhere we can, we work with publishers to instrument their video players so that we get signals while people are watching videos on their site.

For the publishers that don't want to allow us to instrument their players, we can use more traditional, Google-style spidering techniques to capture information on the view count, comment count, and things like that. We do that on a regular basis, a few times a day or at least once a day, and then we can build up metrics on how the video is growing on those sites.

Massive database

So we ended up building this massive database of video -- and we would provide information, or rather insight, based on that data, to advertisers on how well their campaigns were performing.

Eventually, advertisers started to ask us to just deliver the campaign itself, instead of giving them only the insight that they would then have to convince various other ad platforms to use in order to get a more effective campaign. So we started to shift, a couple of years ago, into actual campaign delivery.

Now, we have to do more of a real-time analysis, because as you mentioned, you want to, in real time, figure out the best ways to target the best sites to send that video to, and the best way to tune that campaign in order to get the best performance for the brand.

Gardner: And so faced with these requirements, I assume you did some proofs of concept (POCs). You looked around the marketplace for what’s available and you’ve come up with some infrastructure that is so far meeting your needs.

Meisl: Yes. We started with Hadoop, because we had to build this massive database of video, and we would then aggregate the information in Hadoop and pour that into MySQL.

We quickly got to the point where it took so long to load all that information into MySQL that we were just running out of hours in the day. It took us 11 hours to load MySQL -- a sharded MySQL cluster -- and we couldn't actually use it while it was being loaded. So you'd have to have two banks of it.

You only have a 12-hour window. Otherwise, you've blown your day. That's when we started looking around for alternative solutions for storing this information and making it available to our customers. We elected to use HP Vertica -- this was about four years ago -- because that same 11-hour load took two hours in Vertica. And we're not going to run out of money buying hard drives, because Vertica compresses the data. The compression is impressive.
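
To make the contrast concrete, here is a minimal sketch of what such a bulk load into Vertica can look like, assuming the open-source vertica-python client; the table, file path, and connection details are placeholders rather than Visible Measures' actual setup. The COPY ... DIRECT statement loads a delimited file in a single bulk pass.

```python
# Minimal sketch of a bulk load into Vertica (illustrative only).
# Assumes the vertica-python client; table, path, and credentials are placeholders.
import vertica_python

conn_info = {
    'host': 'vertica.example.com',   # placeholder host
    'port': 5433,
    'user': 'loader',
    'password': '...',
    'database': 'analytics',
}

with vertica_python.connect(**conn_info) as conn:
    cur = conn.cursor()
    # COPY ... FROM LOCAL streams the file from the client in one bulk pass;
    # the DIRECT hint writes straight to disk storage, suited to large loads.
    cur.execute("""
        COPY video_events (event_time, video_id, viewer_id, event_type)
        FROM LOCAL '/data/aggregates/video_events.csv'
        DELIMITER ',' DIRECT
    """)
    conn.commit()
```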

Now, as we move more into campaign delivery for the brands that we represent, we have to do our measurement in real time. We use Storm, which is a real-time stream-processing platform, and it writes to Vertica as the events happen.

So we can ask questions of Vertica as the events happen. That allows our ad service, for example, to have much more intelligence about what's going on with campaigns that are in flight. It allows us to do much more sophisticated fraud detection. There are all kinds of things that become possible only when you have access to the data as soon as it's generated.
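
As a rough illustration of that real-time write path -- events queryable within seconds of being generated -- here is a simplified micro-batching routine. The production pipeline uses Storm bolts; this Python stand-in, and every table and column name in it, is an assumption for illustration only, reusing the same client connection shown in the earlier sketch.

```python
# Rough sketch of a streaming write path into Vertica (illustrative only;
# the real system uses Storm, and all table/column names are assumptions).

BATCH_SIZE = 500  # flush small batches so events become queryable within seconds

def write_events(conn, event_stream):
    """Micro-batch events from a stream (e.g. a queue fed by a Storm spout)."""
    cur = conn.cursor()
    batch = []
    for ev in event_stream:
        batch.append((ev['ts'], ev['campaign_id'], ev['viewer_id'], ev['type']))
        if len(batch) >= BATCH_SIZE:
            _flush(conn, cur, batch)
            batch.clear()
    if batch:
        _flush(conn, cur, batch)

def _flush(conn, cur, rows):
    # executemany issues a multi-row INSERT; frequent small commits keep latency low
    cur.executemany(
        "INSERT INTO ad_events (event_time, campaign_id, viewer_id, event_type) "
        "VALUES (%s, %s, %s, %s)",
        rows,
    )
    conn.commit()
```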

Gardner: Clearly, if a load takes 11 hours, you're well into the definition of big data. But I'm curious -- for you, what constitutes big data? Where does big data begin, as opposed to medium-sized or ordinary data?

Several dimensions

Meisl: There are several dimensions to big data. Obviously, there's the size of it. We process what we receive, maybe half a billion events per day, and we might peak at near a million events a minute. There is quite a bit of lunchtime video viewing in America, but typically in the evening, there is a lot more.

The other aspect of big data is the nature of what's in that data, the unstructured nature, the complexity of it, the unexpectedness of the data. You don't know exactly what you're going to get ahead of time.

For information that's coming from our instrumented players, we know what that's going to be, because we wrote the code that generates it. But we receive feeds from all kinds of social networks. We know about every video that's ever mentioned on Twitter, videos that are mentioned on Facebook, and other social arenas.

All of that's coming in via all kinds of different formats. It would be very expensive for us to have to fully understand those formats, build schemas for them, and structure it just right.

So we have an open-ended system that lands the data in Hadoop and can process it in an open-ended way. To me, big data is really the volume plus the very open-ended, unknown payloads in that data.

Gardner: How do you know you're succeeding here? Clearly, going from 11 hours to two hours is one metric. Are there other metrics of success that you look to -- they could be economic, performance, or concurrent query volumes?

Tell me what you define as a successful analytics platform.

Meisl: At the highest level, it's going to be about revenue and margin. But in order to achieve the revenue and margin goals that we have, obviously we need to have very efficient processes for doing the campaign delivery and the measurement that we do.

As a measurement company, we measure ourselves and watch how long it takes to generate the reports that we need, or for how responsive we are to our customers for any kind of ad-hoc queries that they want or special custom reports that they want.

We're continuously looking at how well we optimize delivery of campaigns and we're continuously improving that. We have corporate goals to improve our optimization quarter-over-quarter.

In order to do that, you have to keep coming up with new things to measure and new ways to interpret the data, so you can figure out exactly which video you want to deliver to the right person, at the right time, in the right context.

Looking down the road

Gardner: Chris, we're here at the Big Data Conference for HP Vertica and its community. Looking down the road a bit, what sort of requirements do you think you're going to have later? Are there milestones, or is there a roadmap that you would like to see Vertica and HP follow in order to make sure that you don't run out of runway again sometime?

Meisl: Obviously, we want HP and Vertica to continue to scale up, so that it is still a cost-effective solution as the volume of data inexorably rises. It's just going to get bigger and bigger and bigger. There's no going back.

In order to be able to do the kind of processing that we need to do without having to spend a fortune on server farms, we want Vertica, in particular, to be very efficient at the kinds of queries it needs to run, at loading the data, and at accommodating the questions we ask of it.

In addition to that, what's particularly interesting about Vertica is its analytic functions. It has a very interesting suite of analytic functions that extends beyond the standard SQL analytic functions, built around time series and pattern matching. This is very important to us, because we do fraud detection, for example, so you want pattern matching for that. We do pacing for campaigns, so you want time-series analysis for that.
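
To make those capabilities concrete, here are two illustrative queries of the kind Meisl alludes to: one uses Vertica's TIMESERIES clause to gap-fill delivery counts for pacing, and one uses the event-series MATCH clause as a simplistic fraud heuristic. The table names, column names, and the pattern itself are invented, and the exact syntax can vary by Vertica version; they would be run through the same client shown earlier.

```python
# Illustrative Vertica analytic queries (all names invented; syntax may vary by version).

# Pacing: gap-fill impression counts into one-minute slices per campaign.
PACING_SQL = """
    SELECT slice_time,
           TS_FIRST_VALUE(impressions, 'CONST') AS impressions
    FROM campaign_deliveries
    TIMESERIES slice_time AS '1 minute'
        OVER (PARTITION BY campaign_id ORDER BY event_time)
"""

# Fraud heuristic: flag sessions where an impression is immediately followed
# by a 'complete' event with no engagement events in between.
FRAUD_SQL = """
    SELECT session_id, event_time, event_type, MATCH_ID()
    FROM ad_events
    MATCH (
        PARTITION BY session_id ORDER BY event_time
        DEFINE
            imp      AS event_type = 'impression',
            complete AS event_type = 'complete'
        PATTERN p AS (imp complete)
    )
"""
```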

We look forward to HP and Vertica really pushing forward on new analytic capabilities that can be applied to real-time data as it flows into the Vertica platform.

Gardner: I'm afraid we'll have to leave it there. We've been learning about how Visible Measures measures and how they put together an analytic capability for video at some of the highest scales I've heard of. We've also learned how they have deployed HP Vertica as their analytics platform to provide better analytics and deliver better insights to their customers.

So, a big thank you to our guest, Chris Meisl, Chief Technology Officer at Visible Measures. Thank you, sir.

Meisl: Thank you, Dana.

Gardner: And thanks also to our audience for joining us for this special HP Discover Performance podcast, coming to you directly from the recent HP Vertica Big Data Conference in Boston.

I'm Dana Gardner, Principal Analyst at Interarbor Solutions, your host for this ongoing series of HP sponsored discussions. Thanks again for joining, and come back next time.

Listen to the podcast. Find it on iTunes. Download the transcript. Sponsor: HP.

Transcript of a BriefingsDirect podcast on how one company is able to track video viewing on the Internet in real time, despite massive amounts of data flowing in continuously. Copyright Interarbor Solutions, LLC, 2005-2013. All rights reserved.


Tuesday, October 22, 2013

Complex Carrier Network Performance Data on Vertica Yields Performance and Customer Metrics Boon for Empirix

Transcript of a BriefingsDirect podcast on how Empirix has leveraged HP Vertica to help customers derive value from ever-expanding data sets.

Listen to the podcast. Find it on iTunes. Download the transcript. Sponsor: HP.

Dana Gardner: Hello, and welcome to the next edition of the HP Discover Performance Podcast Series. I'm Dana Gardner, Principal Analyst at Interarbor Solutions, your moderator for this ongoing discussion of IT innovation and how it’s making an impact on people’s lives.

Once again, we’re focusing on how IT leaders are improving their business performance for better access, use and analysis of their data and information. This time we’re coming to you directly from the recent HP Vertica Big Data Conference in Boston.

Our next innovation case study interview explores how network testing, monitoring, and analytics provider Empirix required and found unique and powerful data processing capabilities. We'll learn how Empirix chose the HP Vertica analytics platform for its analytics engine to continuously and proactively evaluate carrier network performance and customer experience metrics to automatically identify issues as they emerge.

To learn more about how a combination of large-scale, real-time performance and data access make Vertica stand out to support such demands, please join me in welcoming our guest, Navdeep Alam, Director of Engineering, Analytics and Prediction at Empirix, based in Billerica, Mass. Welcome to the show. [Disclosure: HP is a sponsor of BriefingsDirect podcasts.]

Navdeep Alam: Thank you for having me.

Gardner: It strikes me that the amount of data that's being generated on these networks is phenomenal, a rapid creation of events. This is sort of the "New York" of data analysis ... "If you can do it there, you can do it anywhere." Tell us a bit about what Empirix does, and why you have such demanding requirements for data processing and analysis.

Alam: What we do, as you mentioned, is actively and passively monitor networks. When you're in a network as a service provider, you have the opportunity to see the packets within that network, both on the control plane and on the user plane. That just means you're looking at signaling data and also user-plane data -- what's going on with the behavior, what's going on at the data layer. That's a vast amount of data, especially with mobile, where most people are doing things with data on their devices.

When you're in that network and you're tapping that data, there is a tremendous amount of data -- and there's a tremendous amount of insights about not only what's going on in the network, but what's going on with the subscribers and users of that network.

Empirix is able to collect this data from our probes in the network, as well as being able to look at other data points that might help augment the analysis. Through our analytics platform we're able to analyze that data, correlate it, mediate it, and drive metrics out of that data.

That's a service for our customers, increasing the value of that data, so that they can realize a return on investment (ROI) and understand how they can leverage their networks better to improve operations and so forth. They can understand their customers better and begin to analyze, slice and dice, and visualize the data of this complex network.

They can use our platform as well to do proactive and predictive analysis, so that we can create even better ROI for our customers by telling them what might go wrong and what the solution might be to get around it and avoid a catastrophe.

New opportunities

Gardner: It’s interesting that not only is this data being used for understanding the performance on the network itself, but it's giving people business development and marketing information about how people are using it and where the new opportunities might be.

Is that something fairly new? Were you able to do that with data before, or is it the scale and ability to get in there and create analysis in near-real-time that’s allowed for such a broad-based multilevel approach to data and analysis?

Alam: This is something we've gotten into. We definitely tried to do it before with success, but we knew that in order to really tackle mobile and the increasing demands of data, we really had to up the ante.

Our investment in HP Vertica, and how we've introduced it in our new analytics platform, Empirix IntelliSight 1.0, which recently came out, is about leveraging that platform -- not only for scalability and our ability to ingest and process data, but to look at data in its more natural format, both as discrete data and as aggregate data. We allow our customers to view that data ad hoc and analyze it.

It has positioned us very well. Now that we have a central point from which all this data is being processed and analyzed, we run analytics directly against that data, increasing our data locality and decreasing the data latency. That definitely lets us do things much faster, in near real time.

Gardner: Obviously, the sensors, probes, agents, and the ability to pull in the information from the network needs to reside or be at close proximity to the network, but how are you actually deployed? Where does the infrastructure for doing the data analysis reside? Is it in the networks themselves, or is there a remote site? Maybe you could just lay out the architecture of how this is set up.

Alam: We get installed on site. Obviously, the future could change, but right now we're an on-premise solution. We're right where the data is being generated, where it’s flowing, and because of that we're able to gain access to the data in real-time.

One of the things we learned is that this is a tremendous amount of data. It doesn't make sense for us to just hold it and assume that we will do something interesting with it afterward.

The way we've approached our customers is to say, "What kind of value do you see in this data? What kind of metrics or key performance indicators (KPIs), or what do you think is valuable in this data?" We then build a framework that defines the value they can gain from the data -- what the metrics are and what kind of structure they want to apply to the data. We're not just calculating metrics; we're also applying a model that gives the data some structure.

As the data goes through what we call the Empirix Intelligent Data Mediation and Correlation (IDMC) system -- it's really an analytics calculator -- we put it into the Vertica system, so that at that point we have meaningful, actionable data that can be used to trigger alarms, to showcase thresholds, and to give customers great insight into what's going on in their network.
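
As a sketch of the kind of threshold-driven alarm Alam describes, the query below aggregates a hypothetical KPI over a recent window and surfaces only the network cells that breach a limit. The table, columns, and the 2 percent threshold are all assumptions for illustration, not Empirix's actual logic.

```python
# Hypothetical KPI threshold check over metrics stored in Vertica
# (illustrates the idea described above; not Empirix's actual system).

ALARM_SQL = """
    SELECT cell_id,
           AVG(call_drop_rate) AS avg_drop_rate
    FROM network_kpis
    WHERE metric_time > NOW() - INTERVAL '15 minutes'
    GROUP BY cell_id
    HAVING AVG(call_drop_rate) > 0.02   -- assumed alarm threshold: 2% dropped calls
    ORDER BY avg_drop_rate DESC
"""

def raise_alarms(cursor):
    """Run the check via a DB-API cursor and emit one alert per breaching cell."""
    cursor.execute(ALARM_SQL)
    for cell_id, drop_rate in cursor.fetchall():
        print(f"ALERT: cell {cell_id} drop rate {drop_rate:.1%} exceeds threshold")
```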

Growing the business

From that, they can do various things, such as solve problems proactively, reach out to customers to deal with those issues, or make better investments in their technology in order to grow their business.

Gardner: How long have you been using Vertica and how did that come to be the choice that you made? Perhaps you could also tell us a little bit about where you see things going in terms of other capabilities that you might need or a roadmap for you?

Alam: We've been using Vertica for a few years, at least three or four, even before I came on-board. And we're using Vertica primarily for its ability to input and read data very quickly. We knew that, given our solutions, we needed to load a lot of data into the system and then read a lot of data out of it fast and to do it at the same time.

At that time, the database systems we used just couldn't meet the demands for the ever-growing data. So we leveraged Vertica there, and it was used more as an operational data store. When I came on board about a year-and-a-half ago, we wanted to evolve our use of Vertica to be not just for data warehousing, but a hybrid, because we knew that in supporting a lot of different types of data, it was very hard for us to structure all of those types of data.

We wanted to create a framework with which we could define measures, metrics, and KPIs and store them in a flatter system to which we could apply various models to make sense of that data.

That really presented us a lot of challenges, not only in scalability, but our ability to work and play with data in various ways. Ultimately, we wanted to allow customers to play with this data at will and to get response in seconds, not hours or minutes.

It required us to look at how we could leverage Vertica as an intelligent data-storage system from which we could process data, store it, and then get answers out of that data very, very quickly. Again, we were looking for responses in a second or so.

Now that we've put all of our data in the data basket, so to speak, with Vertica, we want to take it to the next level. We have all this data -- the whole data value chain, from discrete data to aggregate data -- in one place, with conforming dimensions, where the one truth of that data exists in one system.

We want to take it to the next step. Can we increase our analytical capabilities with the data? Can we find the signal in the noise now that we have all this data? Can we proactively find the patterns in the data and what's contributing to a problem, surface that to our customers, and reduce the noise that they're presented with?

Solving problems

Instead of showing them that 50 things are wrong, can we show them that 50 things are wrong, but that these one or two issues are actually impacting their network or their subscribers the most? Can we proactively tell them what might be the cause and how to solve it?

The faster we can load this data, the faster we can retrieve the value out of this data and find that needle in the haystack. That’s where the future resides for us.

Gardner: Clearly, you're creating value and selling insight to the network to your customers, but I know other organizations have also looked at data as a source of revenue in itself. The analysis could be something that you could market. Is there an opportunity with the insight you have in various networks -- maybe in some aggregate fashion -- to create analysis of behavior, network use, or patterns that would then become a revenue source for you, something that people would subscribe to perhaps?

Alam: That's a possibility. Right now, our business has been all about empowering our customers and giving them the ability to leverage that data for their end use. You can imagine, as a service provider, having great insight into their customers and the over-the-top applications that are being leveraged on their network.

Could they then use our analytics and the metadata that we're generating about their network to empower their business systems and their operations to make smarter decisions? Can they change their marketing strategy or even their APIs about how they service customers on their network to take advantage of the data that we are providing them?

The opportunity to grow other business opportunities from this data is tremendous, and it's going to be exciting to see what our customers end up doing with their data.

Gardner: Are there any metrics of success that are particularly important for you? You've mentioned, of course, scale and volume, but things like concurrency -- the ability to do queries from different places by different people at the same time -- are important. Help me understand what some of the other important elements of a good, strong data-analysis platform would be for you.

Alam: Concurrency is definitely important. For us, it's about predictability, or linear scalability. We know that when we do reach those types of scenarios -- to support, let's say, 10 concurrent users or 100 concurrent users, or to support a greater segmentation of data because we have gone from 10 terabytes to 30 terabytes -- we don't have to change a line of code. We don't have to change how or what we are doing with our data. Linear scalability, especially on commodity hardware, gives us the ability to take our solution and expand it at will, in order to deal with any type of bottleneck.

Obviously, over time, we'll tune it so that we get better performance out of the hardware or virtual hardware that we use. But we know that when we do hit these bottlenecks, and we will, there is a way around that and it doesn't require us to recompile or rebuild something. We just have to add more nodes, whether it’s virtual or hardware.

Gardner: Well, great. I am afraid we'll have to leave it there. We've been learning about how network testing, monitoring, and analytics provider Empirix found unique and powerful data-processing capabilities. And we've seen how they deployed the HP Vertica Analytics Platform to provide better analytics to their customers in the network provider space.

So a big thank you to our guest, Navdeep Alam, Director of Engineering, Analytics, and Prediction at Empirix. Thank you, Navdeep.

Alam: Thank you.

Gardner: And thanks also to our audience for joining us for this special HP Discover Performance Podcast coming to you from the recent HP Vertica Big Data Conference in Boston.

I'm Dana Gardner, Principal Analyst at Interarbor Solutions, your host for this ongoing series of HP sponsored discussions. Thanks again for listening, and come back next time.

Listen to the podcast. Find it on iTunes. Download the transcript. Sponsor: HP.

Transcript of a BriefingsDirect podcast on how Empirix has leveraged HP Vertica to help customers derive value from ever-expanding data sets. Copyright Interarbor Solutions, LLC, 2005-2013. All rights reserved.


Thursday, October 17, 2013

Democratic National Committee Leverages Big Data to Turn Politics into Political Science

Transcript of a BriefingsDirect podcast on how a political campaign used big data to better understand and predict voter behavior and what was going on on the ground during the 2012 national elections.

Listen to the podcast. Find it on iTunes. Download the transcript. Sponsor: HP.

Dana Gardner: Hello, and welcome to the next edition of the HP Discover Performance Podcast Series. I'm Dana Gardner, Principal Analyst at Interarbor Solutions, your moderator for this ongoing discussion of IT innovation and how it’s making an impact on people’s lives.

Once again, we’re focusing on how IT leaders are improving their business performance for better access, use and analysis of their data and information. This time, we're coming to you directly from the recent HP Vertica Big Data Conference in Boston.

Our next innovation case study interview focuses on the big-data problem in the realm of political science. We'll learn how the Democratic National Committee (DNC) leveraged big data to better understand and predict voter behavior and alliances in the 2012 U.S. national elections.

To learn more about how the DNC pulled vast amounts of data together to predict and understand voter preferences and understanding of the issues, please join me in welcoming Chris Wegrzyn, Director of Data Architecture at the DNC, based in Washington, DC. Welcome, Chris. [Disclosure: HP is a sponsor of BriefingsDirect podcasts.]

Chris Wegrzyn: Hello. Thank you for having me.

Gardner: Like a lot of organizations, you had different silos of data and information, and you weren't able to do the analysis properly because of the distributed nature of the data and information. What did you do that allowed you to bring all that data together, and then also get the data assembled to bring out better analysis?

Wegrzyn: In 2008, we received a lot of recognition for being a data-driven campaign and for making some great leaps in how we improved efficiency by understanding our organization.

Coming out of that, those of us on the inside were saying this was great, but we have only really skimmed the surface of what we can do. We focused on some sets of data, but they're not connected to what people were doing on our website, what people were doing on social media, or what our donors were doing. There were all of these different things, and we weren’t looking at them.

Really, we couldn’t look at them. We didn't have the staff structure, but we also didn't have the technology platform. It’s hard to integrate data and do it in a way that is going to give people reasonable performance. That wasn't available to us in 2008.

So, fast forward to where we were preparing for 2012. We knew that we wanted to be able to look across the organization, rather than at individual isolated things, because we knew that we could be smarter. It's pretty obvious to anybody. It isn’t a competitive secret that, if somebody donates to the campaign, they're probably a good supporter. But unless you have those things brought together, you're not necessarily pushing that information out to people, so that they can understand.

We were looking for a way that we could bring data together quickly and put it directly into the hands of our analysts, and HP Vertica was exactly that kind of solution for us. The speed and the scalability meant that we didn't have to worry about making sure that everything was properly transformed and didn't have to spend all of this time structuring data for performance. We could bring it together and then let our analysts figure it out using SQL, which is very powerful, but pretty simple to learn.
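
As a toy illustration of the kind of cross-silo question an analyst could suddenly answer with plain SQL once everything lived on one platform, consider the query below; every table and column name is invented for the example.

```python
# Toy cross-silo query of the sort an analyst might write (all names invented).

SUPPORTER_SQL = """
    SELECT v.voter_id,
           MAX(d.amount)                AS largest_donation,
           COUNT(DISTINCT e.email_id)   AS emails_opened,
           COUNT(DISTINCT s.shift_id)   AS volunteer_shifts
    FROM voters v
    LEFT JOIN donations        d ON d.voter_id = v.voter_id
    LEFT JOIN email_opens      e ON e.voter_id = v.voter_id
    LEFT JOIN volunteer_shifts s ON s.voter_id = v.voter_id
    GROUP BY v.voter_id
"""

def supporter_profiles(cursor):
    """Return one row per voter, combining previously siloed signals."""
    cursor.execute(SUPPORTER_SQL)
    return cursor.fetchall()
```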

Better analytic platform

Gardner: Until the fairly recent past, it wasn't practical, both from a cost and technology perspective, to try to get at all the data. But it has gotten to that point now. So when you are looking at all of the different data that you can bring to bear on a national election, in a big country of hundreds of millions of people, what were some of the issues you faced?

Wegrzyn: We hadn’t done it before. We had to figure it out as we were going along. The most important realization that we made was that it wasn't going to be a huge technology effort that was going to make this happen. It was going to be about analysts. That’s a really generic term. Maybe it's data scientists or something, but it's about people who were going to understand the political challenges, understand something about the data, and go in and find answers.

We structured our organization around being analyst-centric. We needed to build those tools and platforms, so that they could start working immediately and not wait on us on the technology side to build the best system. It wasn’t about building the best system, but it was about getting something where we could prototype rapidly.

Nothing that we did was worth doing if we couldn't get something into somebody's hands in a week and then start refining it. But we had to be able to move very, very quickly, because we were just under a constant time-crunch.

Gardner: I would imagine that in the final two months and weeks of an election, things are happening very rapidly. To have a better sense of what the true situation on the ground is gives you an opportunity to best react to it.

It seems that in the past, it was a gut instinct. People were very talented and were paid very good money to be able to try to distill this insight from a perspective of knowledge and experience. What changed when you were able to bring the HP Vertica platform, big data, and real-time analysis to the function of an election?

Wegrzyn: Just about everything. There isn't a part of the campaign that was untouched by us, and in a lot of those places where gut ruled, we were able to bring in some numbers. This came down from the top campaign manager, Jim Messina. Out of the gate, he was saying that we have to put analytics in every part of the organization and we want to measure everything. That gave us the mission and the freedom to go in and start thinking how we could change how this operates.

But the campaign was driven. We tested emails relentlessly. A lot of our program was driven by trying to figure out what works and then quantify that and go out and do more. One of our big successes is the most traditional of the areas of campaigns nowadays, media buying.

More valuable

There have been a bunch of articles that have come out recently talking about what the campaign did, so I'm not giving anything away. We were able to take what we understood about the electorate and who we wanted to communicate with, rather than taking the traditional TV-buying approach -- buying a broad demographic band, buying a lot of TV news, and buying a lot of the stuff that's expensive and has high ratings among the big demographics. That's a lot of wasted money.

We were able to know more precisely who the people are that we want to target, which was the biggest insight. Then, we were able to take that and figure out -- not the super creepy "we know exactly what you are watching" level -- but at an aggregate level, what the people we want to target are watching. So we could buy that, rather than buying the traditional stuff. That's like an arbitrage opportunity. It’s cheaper for us, but it's way more valuable.

So we were able to buy the right stuff, because we had this insight into what our electorate was like, and I think it made a big difference in how we bought TV.

Gardner: The results of your big data activities are apparent. As I recall, Governor Romney's campaign, at one point, had a larger budget for media, and spent a lot of that. You had a more effective budget with media, and it showed.

Another indication was that on election night, right up until the exit polls were announced, the Republican side didn't seem to know very clearly or accurately what the outcome was going to be. You seemed to have a better sense. So the stakes here are extremely high. What’s going to be the next chapter for the coming elections, in two, and then four years along the cycle?

Wegrzyn: That’s a really interesting question, and obviously it's one that I have had to spend a lot of time thinking about. The way that I think about the campaign in 2012 was one giant fancy office tower. We call it the Obama Campaign. When you have problems or decisions that have to be made, that goes up to the top and then back down. It’s all a very controlled process.

We are tipping that tower on its side now for 2014. Instead of having one big organization, we have to try to do this to 50, 100, maybe hundreds of smaller organizations that are going to have conflicting priorities. But the one thing that they have in common now is they saw what we did on the last campaign and they know that that's the future.

So what we have to do is take that and figure out how we can take this thing that worked very well for this one big organization, one centralized organization, and spread it out to all of these other organizations so that we can empower them.

They're going to have smaller staffs. They're going to have different programs. How do we empower them to use the tools that we used and the innovations that we created to improve their activity? It’s going to be a challenge.

Gardner: It’s interesting, there are parallels between what you're facing as a political organization, with federation, local districts for Congress, races in the state level, and then of course to the national offices as well. This is a parallel to businesses. Many businesses have a large centralized organization and they also have distributed and federated business units, perhaps in other countries for global companies.

Feedback loop

Is there a feedback loop here, whereby one level of success, like you well demonstrated in 2012, leads to more of the federated, on-the-ground, distributed gathering and utilization of data that also then feeds back to the larger organization, so that there's a virtual adoption pattern that will benefit across the ecosystem? Is that something you are expecting?

Wegrzyn: Absolutely. Even within the campaign, once people knew that this tool was available, that they could go into HP Vertica and just answer any question about the campaign's operation, it transformed the way that people were thinking about it. It increased people's interest in applying that to new areas. They were constantly coming at us with questions like, "Hey, can we do this?" We didn't know. We didn’t have enough staff to do that yet.

One of our big advantages is that we've already had a lot of adoption throughout campaigns of some of the data gathering. They understand that we have to gather this data. We don't know what we are going to do with it, but we have them understanding that we have to gather it. It's really great, because now we can start doing smart things with it.

And then they're going to have that immediate reaction like, "Wow, I can go in there now and I can figure out something smart about all of the stuff that I put in and all of the stuff that I have been collecting. Now I want more." So I think we're expecting that it will grow. Sometimes I lose sleep about how that’s going to just grow and grow and grow.

Gardner: We think about that virtuous adoption cycle, more-and-more types of data, all the data, if possible, being brought to bear. We saw at the Big Data Conference some examples and use cases for the HAVEn approach for HP, which includes Vertica, Hadoop, Autonomy IDOL, Security, and ArcSight types of products and services. Does that strike a chord with you that you need to get at the data, but now that definition of the data is exploding and you need to somehow come to grips with that?

Wegrzyn: That's something that we only started to dabble in -- things like text analysis, like what Autonomy can do with that unstructured data -- stuff that we only started to touch on during the campaign, because it's hard. We make some use of Hadoop in various parts of our setup.

We're looking to a future, where we bring in more of that unstructured intelligence, that information from social media, from how people are interacting with our staff, with the campaign in trying to do something intelligent with that. Our future is bringing all of those systems, all of those ideas together, and exposing them to that fleet of analysts and everybody who wants it.

Gardner: Well, great. I'm afraid we'll have to leave it there. We've been learning about how big data problems were handled in a handy fashion in the realm of political science. In fact, making it more scientific.

We've seen how the Democratic National Committee leveraged big data to better understand and predict voter behavior and what was going on on the ground during the 2012 national elections. We have seen how they deployed the HP Vertica analytics platform to better provide analytics and insights for their various analysts and the participants in the campaign.

So a big thank you to our guest, Chris Wegrzyn, Director of Data Architecture for the DNC in Washington, DC. Thanks so much, Chris.

Wegrzyn: Thank you.

Gardner: And thanks also to our audience for joining this special HP Discover Performance Podcast coming to you from the recent HP Vertica Big Data Conference in Boston. 

I'm Dana Gardner, Principal Analyst at Interarbor Solutions, your host for this ongoing series of HP sponsored discussions. Thanks again for joining, and come back next time.

Listen to the podcast. Find it on iTunes. Download the transcript. Sponsor: HP.

Transcript of a BriefingsDirect podcast on how a political campaign used big data to better understand and predict voter behavior and what was going on on the ground during the 2012 national elections. Copyright Interarbor Solutions, LLC, 2005-2013. All rights reserved.


Wednesday, October 09, 2013

Need for Quality and Speed Powers Sentara's Applications Modernization Journey

Transcript of a BriefingsDirect podcast on how a healthcare provider is deploying and monitoring IT operations and services for better patient care.

Listen to the podcast. Find it on iTunes. Download the transcript. Sponsor: HP.

Dana Gardner: Hello, and welcome to the next edition of the HP Discover Performance Podcast Series. I'm Dana Gardner, Principal Analyst at Interarbor Solutions, your moderator for this ongoing discussion of IT innovation and how it’s making an impact on people’s lives.

Once again, we're focusing on how IT leaders are improving their services' performance to deliver better experiences and payoffs for businesses and end users alike, and this time we're coming to you directly from the recent HP Discover 2013 Conference in Las Vegas.

Our next innovation case study interview highlights how Virginia healthcare provider Sentara Healthcare improved its IT operations and services delivery at higher quality and higher speed.

We'll learn how it's improving its IT service management (ITSM) maturity, making IT an internal business-service provider, and how that's helped it not only deploy better services but also monitor those services to oversee its applications' activities.

To learn more about how Sentara Healthcare excelled at application and data delivery and has progressed towards an automated lifecycle approach for high performance management, please join me in welcoming our guest, Jason Siegrist, Manager of Enterprise Management Technologies at Sentara. Welcome. [Disclosure: HP is a sponsor of BriefingsDirect podcasts.]

Jason Siegrist: Glad to be here.

Gardner: Let's paint the picture. Apps, of course, are always important, but in your business, healthcare, getting those apps to the people seems to be more important than in the past. Is there a shift here, where the emphasis is on speed and access to data? How has the notion of an application been changing for your users?

Siegrist: At Sentara Healthcare, and actually most healthcare organizations, the interest has been trying to get to electronic medical records (EMR) to make it easier and to reduce risks associated with caring for patients.

Patients are looking to get access to that data quicker, be able to see lab results in a timely manner, and be able to schedule appointments with doctors. We're trying to make those systems available to them in a secure way so that they're confident that their personal information is safe and protected.

Gardner: Of course, as end users, they just see the apps, but there's a lot going on behind the scenes to make sure that they are performing properly and that they get to where they are supposed to. Tell us why maturity and progressing toward better application culture and behavior has been important for you.

Better healthcare decisions

Siegrist: In healthcare, the face of healthcare is still our doctors, nurses, and technical staff. However, we're trying to make sure we can enable those doctors and nurses to make better healthcare decisions and allow them to work interactively among each other, even when they're not in the same building.

Our environment has grown so significantly, even with things like X-rays being all digital these days. Now, a doctor can go back and review case studies, without having to wait to request those images and have them shipped. If someone is sitting in their office and they have an X-ray, they can go to priors very quickly.

So all these systems -- in Sentara there are about 17 of them -- have to be integrated in such a way that we guarantee the work being collected is going to the right patient, and at the same time, when they're requesting information, they're getting the right patient data back.

Gardner: Those are the requirements, that's the goal, but what about inside your IT organization? How have you been able to change and adapt so that you can deliver these and improve? What's the underlying shift internally?

Siegrist: Our big secret isn't really a secret anymore. Previously, every organization always looked at IT as being a very expensive cost center. We've been working very hard internally to change that discussion to be that we're enabling the business.

We've done that by doing some creative and unique processes. We bring in the pharmacist, for example. We make him the owner of the pharmacy app. Now, we have direct buy-in from a pharmacist who is a part of the IT process that selects the application and figures out how to integrate it.

Through that process, he's able to act as our champion in the pharmacy space and talk to his fellow pharmacists, saying "We have selected this, and I've been a part of that process." So we're involving them in the process, and at the same time, it's not an IT-focused or IT-forced initiative. We really are enabling business.

Gardner: It's impressive to me that you're doing this at significant scale. Tell us a little bit about Sentara -- how big it is, how many apps you have, and the fact that you're distributed over a fairly large geographic area in Virginia.

Siegrist: In the healthcare space, you measure it by hospitals. I think we're at 11 hospitals these days. We're always looking to expand and grow. We're out on the western edge of Virginia in the Blue Ridge Parkway area, as well as Hampton Roads and up to DC. So, we're in Virginia and a little bit in North Carolina.

Having these maturities in these processes has enabled us to include the business in the IT decisions. As we start building the monitoring, we start building the proactive analysis, in the troubleshooting. Our mean time to repair has gone down. We support larger populations with fewer staff, whether that's with internal systems or internal hardware. We built these automation processes and we built these systems with the idea that we want to be as lean as possible, and at the same time, deliver quality healthcare services.

Maturity roadmap

Gardner: It’s impressive to me too that you have charted out a maturity roadmap for yourselves and you've been in it for several years. Tell me where you evaluate yourself now and where you came from.

Siegrist: Like anybody, this really is an organizational learning process as well as a cultural shift and change. Several years ago, my boss, Betsy Meadows, had started the process about how we want to deploy ITIL. It all started around measuring network performance.

Ultimately, that grew into the idea that in order to do that, we have to do with network monitoring. We have to capture incidents and we have to capture that downtime, and by the way there is downtime that’s legitimate because we are doing maintenance.

Then, we had to think about how to capture maintenance events as downtime. So this process grew and grew. Over the last 8 to 10 years, we went from being very new in the process to where we are today. This is something every company goes through as part of the maturation process.

Today there is a scale out there; it goes from 1 to 5. I'd say we are solidly 4-point something, if you do the math. But we have adopted a lot of processes at level 5 and at level 4. It's allowed us to make smart decisions and make smart financial decisions as well.

Gardner: What have been some of the important tools that you've used to get there and what do you look to in terms of getting to that higher level of maturity? What are some of the ways that technology can come to bear on that?

Siegrist: Well, the reality is the workforce. As more and more young people enter the workforce, they come with a predefined set of skills. I'm still young at 40, but my son can operate an iPad and he is three. He has no problems at all navigating that space.

The reality is that a younger workforce has an expectation of services and delivery. To that end, we're trying to enable our customers to have the ability to go out and do some of these things themselves. It's like an a la carte process, where they can say, "I want this level of monitoring. I want my application monitor this way. I’d like to see this dashboard here."

The application performance management suite that's available as a software-as-a-service (SaaS) solution has given us one more tool in our arsenal, one that allows us to pass that out to the customer and say, "If you want to build your own monitor, run a synthetic transaction, or get diagnostics-level knowledge about your application, here is a delivery channel to do that."

Gardner: You're a big user of HP. Tell us a little bit about the Business Services Management (BSM) suite, your involvement, and also the performance.

Several iterations

Siegrist: Ten years ago, we started out with HP Network Node Manager (NNM), which is the network monitoring solution, and then moved into HP OpenView Operations (OVO), which is now called Operations Manager. So it's been through several iterations, but over the last 10 years, we made lots of decisions about what tools to use.

We've always tried to go with best-of-breed where appropriate, and it happens to be that for us, the best-of-breed for us has been the HP solution set. It’s enabled us to get deeper into the applications and given us multiple ways to solve different problems.

Nothing is free in life. So we always want to try and give our customers options for which path they want to take and what level of the knowledge they want in the application space.

To this end, the APM SaaS solution is an operational expense. They don't have to buy it whole. They don't have to deploy everything. They can just start. So, as I said, it's an a-la-carte model. It lets them choose just a little or a lot, and then bite off the bigger pieces of the pie that they're willing to take on.

Gardner: How do these tools support your drive towards greater mobility and development of applications so that there is a lifecycle where the development, the deployment, and then the operations can relate to each other for a higher efficiency, productivity, and benefit of the users?

Siegrist: Our customer base is interested in trying to have a way to interact with the doctors, and as more and more tablets, PCs, and smartphones hit the market, we're looking for delivery solutions that provide that.

Our partner for our EMR is Epic. We use their solution for contacting and working with the doctors. It's called MyChart, and that tool gives them the ability to do that. As more and more of these devices get out there, the population gets younger. They have an expectation of service delivery through that channel, and Sentara is working to meet that expectation. This gives us the ability to monitor that application to make sure it's working properly.

Gardner: Are the doctors welcoming these technology shifts? Has there been any change because you have been able to do this with delivery, services orientation, and service bureau types of benefits? Do you see a reaction in terms of their acceptance of it?

Siegrist: Well, the value is that the face of customer care in healthcare is still doctors and nurses. Where we often have run into problems is when you start doing things like transcription or prescription order writing.

Today, the doctors are doing those themselves and they are documenting their own notes. There was initially some push-back because it's different than what they were used to. The reality is that they're able to make the notes and to do it very quickly, and they are able to review those.

Perception of savings

In the past, they had to go to a transcriptionist, and the transcriptionist would type it. Then, they'd have to validate what the transcriptionist wrote, so they really didn't save any time through that other process. All they had was the perception of time savings.

The adoption rate has been pretty high. Again, we have younger doctors hitting the market. They're looking for similar types of behaviors, and it allows them to be able to provide better customer service as well.

Gardner: You mentioned earlier that it’s about SaaS and the ability to pick and choose the type of deployment model for your apps, services, and even infrastructure. Do you have any thoughts about where you're heading in terms of more choice in hybrid or cloud models?

Siegrist: For most health organizations, and I'm probably in line here with my peers as well, there's always a concern about HIPAA. We're trying to make sure that, as we move forward with monitoring these things in the data landing in the cloud, we are protecting patient data. We are moving tentatively into that space and doing a little bit at a time to prevent and avoid any risk associated with patient data loss.

Gardner: Well, great. That makes good sense, and I appreciate your spending some time with us. We've been learning about how Virginia healthcare provider Sentara Healthcare has improved its IT operations and services delivery for higher quality and speed, and we have seen how Sentara gained IT service management maturity and deployed monitoring dashboards to better oversee and advance its applications.

Please join me now in thanking our guest, Jason Siegrist, Manager of Enterprise Management Technologies at Sentara. Thanks, Jason.

Siegrist: Thanks, Dana.

Gardner: And thank you too to our audience, for joining us for this special HP Discover Performance podcast, coming to you from the recent HP Discover 2013 Conference in Las Vegas. I'm Dana Gardner, Principal Analyst at Interarbor Solutions, your host for this ongoing series of HP-sponsored discussions.

Thanks again for listening, and come back next time.

Listen to the podcast. Find it on iTunes. Download the transcript. Sponsor: HP.

Transcript of a BriefingsDirect podcast on how a healthcare provider is deploying and monitoring IT operations and services for better patient care. Copyright Interarbor Solutions, LLC, 2005-2013. All rights reserved.

Wednesday, October 02, 2013

Panel of Business Experts Explores Role and Value of Big Data in Customer Analytics

Transcript of a BriefingsDirect podcast on how firms are using HP Vertica to gain more and faster insight from customer actions and interaction.

Listen to the podcast. Find it on iTunes. Download the transcript. Sponsor: HP

Dana Gardner: Hello, and welcome to the next edition of the HP Discover Performance Podcast Series. I'm Dana Gardner, Principal Analyst at Interarbor Solutions, your moderator for this ongoing discussion of IT innovation and how it’s making an impact on people’s lives.

Gardner
Once again, we’re focusing on how IT leaders are improving their business performance for better access, use and analysis of their data and information. This time we’re coming to you directly from the HP Vertica Big Data Conference in Boston.

Our next innovation case study panel discussion highlights how various organizations are developing the means to gain far better analytics about their customers. To learn more about how high-performing and cost-effective big data processing lets them learn rapidly from customers about their wants and preferences, please join me now in welcoming our guests, starting with Rob Winters, the Director of Reporting and Analytics at Spil Games, based in Amsterdam. Welcome, Rob.

Rob Winters: How is it going?

Gardner: It’s going great. We're also here with Davide Conforti, Business Intelligence Director at Jobrapido, based in Milan. Welcome, Davide.

Davide Conforti: Thank you, guys. Welcome.

Gardner: And we are also here with Pete Fishman, Director of Analytics at Yammer, based in San Francisco. Welcome. [Disclosure: HP is a sponsor of BriefingsDirect podcasts.]

Pete Fishman: Thanks, Dana.

Gardner: Businesses have been analyzing customers for a long time. This isn’t something new -- needing to know a lot about your customer. What’s different now about truly getting to know your customer? Let’s start with you, Pete.

Fishman
Fishman: I work in the software industry, and our data on the customers now all lives in a central place. We're a cloud software service, and the data is big. By aggregating across companies that are using your software, you can get really significant sample sizes and real inference -- both in an economic sense, in terms of measuring the lift, and, because the sample sizes are big, in a statistical sense.

That’s the starting point for making analytics valuable and learning about your customers.
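
As a rough illustration of the kind of inference Fishman describes -- the event names and counts below are invented, and the method shown is just a plain chi-square test on a 2x2 table -- large, census-style samples are what make an observed lift distinguishable from noise:

    # Hypothetical example: did exposure to a feature lift engagement?
    # The counts are made up; only the method (a chi-square test on a
    # 2x2 contingency table) is the point.
    from scipy.stats import chi2_contingency

    exposed     = {"engaged": 61_500, "not_engaged": 438_500}   # 500,000 users
    not_exposed = {"engaged": 58_900, "not_engaged": 441_100}   # 500,000 users

    table = [
        [exposed["engaged"], exposed["not_engaged"]],
        [not_exposed["engaged"], not_exposed["not_engaged"]],
    ]

    chi2, p_value, dof, expected = chi2_contingency(table)

    lift = (exposed["engaged"] / 500_000) / (not_exposed["engaged"] / 500_000) - 1
    print(f"observed lift: {lift:.2%}, p-value: {p_value:.4g}")

With samples this large, even a lift of a few percent produces a vanishingly small p-value; with a panel of only a few thousand users, the same lift would be indistinguishable from noise.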

Gardner: Rob, what’s different now, in terms of being able to get information, than 10 years ago?

Different problems

Winters: For me, the problem space is extremely different from what I was dealing with a couple of years back.

I was in telecom before this. There, you're dealing with 25 million people, and if you rescore them once a month, that’s fast enough. On a web scale problem, I'm dealing with 200 million customers and I have to rescore them within 10 or 15 minutes. So you're capturing significantly more data. We're looking at billions of records per day coming into our systems. We have to use it as fast as possible, because with the customer experience online, minutes matter.

Gardner: Is this a familiar story to you, Davide? How are things different for you in terms of getting to know your customers?

Conforti
Conforti: It’s absolutely the same story. We have about 40 million unique visitors per month now. We've grown by double digits since our start as a startup in 2006. Now, everything is about user interaction: how our users behave on-site, how we can engage them more on-site, and how we can provide them with tremendous ad-hoc user experiences.

Gardner: So it's not just getting to know your customers. It's following your customers. It’s their actions that you can capture. I suppose that's pretty interesting and new, but let’s start with Spil Games. Tell us about your organization. How did you get such a big audience?

Winters: We've been around for about nine years. We started out as just a Dutch company and then we've acquired other local domain names in a variety of languages. At this point, we have about 50 different platforms, running in about 20 different languages. So we support customers from all over the world. In a given month, we have over 200 countries with traffic onto our sites.

Winters
For us, growth was initially about just getting that organic traffic. Up until a few years ago, if you had a good domain name, you were competing based off of where you ranked in search. Now, the entire business is changing, and you're competing based off that customer experience that you can deliver.

Gardner: Tell us what kind of games, and who are they targeted at?

Winters: We have a couple of target audiences: young girls, ages 8-14; boys; and women. We're primarily a platform. We do some game development and publishing, but our core business is just being the platform where people can come and find content that’s interesting to them.

Gardner: Let's hear more about Yammer. Tell me, Pete, what Yammer is and does, and how you got to such huge numbers and big data.

Fishman: Yammer is a startup in San Francisco. We were acquired about a year ago by Microsoft and we're part of the larger Office organization. We view ourselves as enterprise social, taking this many-to-many communication model and making communication at your company much more efficient.

It's about surfacing relevant knowledge and experts and making work lives better. I run an analytics team there, and we essentially look at the aggregate customer behaviors and what parts of our tool people are using.

Social networks

Gardner: So, this was interesting for you as a social network within the confines of an enterprise or a business. What goes on in that network is important data. You can learn tribal knowledge, capture it, and apply it to other problems, which perhaps you can't do on some of the more public, free, and open social networks.

Fishman: Exactly. This was a really revolutionary idea that our founders David Sacks and Adam Pisoni had, way back when Facebook wasn't nearly as relevant as it is today. But we've leveraged a lot of the way that people have learned to interact in their social life and bring some of that efficiency of communication.

For example, telling you that I've gotten engaged or I'm having a baby, all these pictures go on Facebook. It's an efficient way of getting many-to-many communication. They saw that these social networks would grow and be relevant in a private, secured context of your business.

Gardner: Let's learn more about Jobrapido. Tell me about your organization and the some of the reasons that there's so much data to analyze.

Conforti: Jobrapido started in 2006 as an entrepreneurial challenge that Vito Lomele, an Italian guy, started in Milan. It's quite a challenge to live in the online market in Italy, because the talent pool isn't as wide as in the U.S. or in other countries in Europe. What we do is provide job-seekers the opportunity to find their new job.

We're an online job-search engine and we currently operate in 58 different countries with more than 20 languages. We're all in this big headquarters in Milan with a lot of different nationalities, because of course, we provide the service in local languages for most of our customers.

Recently, we were purchased by the Daily Mail group, a big media group based in London. For us, everything from job-seeker acquisition to retention and engagement depends on consistent quality and user experience on-site. We use our big data warehouse to understand how to better attract and retain customers on the basis of their preferences. And we also use it to tweak our matching algorithm, which works more or less like a Google algorithm.

We crawl a lot of content from different sources -- job boards, other job sites, or directly from the pages of individual companies. We put it all together in a big database and, using statistical tools, we infer which kinds of rankings our job-seekers want to see.

So it's a pretty heavy data-crunching exercise that we do every day on millions and millions of different sponsored or organic postings.
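
Jobrapido's actual ranking model isn't spelled out in the conversation, so the following is only a toy sketch of the general shape of the problem: score each crawled posting against a query and sort. The Posting fields, weights, and scoring rule are invented; a real system would infer its weights from job-seeker click behavior rather than hard-code them.

    # Toy ranking sketch. Field names, weights, and the scoring rule are
    # invented for illustration only.
    from dataclasses import dataclass
    from typing import List

    @dataclass
    class Posting:
        title: str
        description: str
        sponsored: bool

    def score(posting: Posting, query: str) -> float:
        terms = query.lower().split()
        title_hits = sum(t in posting.title.lower() for t in terms)
        body_hits = sum(t in posting.description.lower() for t in terms)
        # Title matches count more than body matches; sponsored ads get a small boost.
        return 3.0 * title_hits + 1.0 * body_hits + (0.5 if posting.sponsored else 0.0)

    def rank(postings: List[Posting], query: str) -> List[Posting]:
        return sorted(postings, key=lambda p: score(p, query), reverse=True)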

Gardner: And just to be clear, this is a site not only for those who are looking for jobs, but for those who are looking to hire as well.

Moving to B2B

Conforti: True. Most of our business deals with B2C, but we're developing tools and a B2B platform to address players such as job boards, for example. We crawl and get sponsored ads from job boards as well, but we're more and more going towards our end customers.

For example, if the Yammer guys or the Spil Games guys want to hire a software engineer, they can directly promote their sponsored ads on Jobrapido without having to sponsor them on a job board. So we're trying to aggregate and simplify the job-search chain.

Gardner: Now that we know more about you, let's learn about the problems you had to solve in managing big data, and how you get at those all-important customer insights and analyses and make them available to your workers and strategists.

Rob, let's start with you. What was the problem you had to solve when it comes to getting at this data in analysis?

Winters: For me, my problem was that no one had ever tried to do it in my company before. We walked in with effectively a clean slate. But as you start to bring in different data sources, you start with all the stuff that you know you're going to need right away.

You start seeing the need for links to other data sources. At this point, we're pulling data from thousands of databases and merging it with dozens of application programming interfaces (APIs). You're pulling in your web log data, so that you can personalize for those folks who aren’t giving you registration information.

For me the challenge was multi-fold. How do you deal with this data problem, with this variety and volume information? How do you present it in a meaningful fashion for employees who've never looked at data before, so that they can make good decisions on it? And how do you run models against it and feed that back into a production environment as quickly as possible, so that you can give those customers a better experience than they were ever getting before on your platform?

Gardner: How did you solve it?

Winters: We're still trying to solve it, to be honest. If you look at it, we've built a technology stack that is a mixture of open source, commercial, and proprietary software that we've developed to solve these different problems. It's an ongoing journey for us -- how we do these things, and we're moving forward two steps, falling back one, and continuing along this path.

Gardner: What was it about an HP Vertica architecture that helped mitigate some of these issues? Was there a comparison to the way you had done it before, or did you go directly to a Vertica solution when you encountered these issues?

Large data

Winters: When we first started looking for a data warehouse appliance or application, we were running Postgres with no indices, just copies of production data. For data guys, that means a query could take eight hours to execute on a table of a couple of million rows.

We knew that a typical row-based solution was out. So we started looking at some of the other applications out there. The big ones are Teradata, Exadata, and Greenplum, but you're going to have to mortgage the house of every employee in the company to be able to afford a license for those applications, and we're a pretty small company. So those were out.

Then, we started looking at some of the other boutique vendors like Infobright, and basically we saw that with Vertica, we can have relatively low load on our database administrator (DBA), so we can develop quickly without a lot of maintenance.

The pricing model fits what we need to achieve, and the performance is so good that we don't have to spend a ton of time on optimization now. We can basically move very rapidly along this path of becoming a data-driven organization without having to get held up on index optimization or trying to optimize our queries and rewrite paths.

We can just throw a lot of stuff into the system, smash it together, take the results, and get big wins for the company quickly.

Gardner: And how important is it for you to be able to deploy this on appliances only, or do you have other directions that you would like to go with that?

Winters: No, we're doing everything within our own premises. We have a data center, and we do everything on our own private servers. For us, the next step is probably going to be moving more into a private-cloud model, and hopefully, Vertica will work in that environment as well.

Gardner: At Yammer, let's look at your problem set and how you went about solving it.

Fishman: I think more broadly than just data as the problem set. Our problem set was that there were a lot of people trying to get into the enterprise social space. A lot of social networks are popping up, and essentially competing for attention at work is a challenge.

We felt that data was necessary to have a competitive advantage. David Sacks and Adam Pisoni had a vision of developing a consumer software company with rapid iteration. With that rapid iteration you get an extra advantage if you're able to reorient yourself based on what part of the product is working. Our data problems were largely about making data be a competitive advantage in our development methodology.

Gardner: What was it about Vertica that was instrumental to the point where you've adopted it? Is it a concurrency issue, a volume issue, speed, or all the above?

It's about speed

Fishman: It's all of the above, but the real highlight is always going to be speed, especially given the incredible competition for talent, not just in the Bay Area but all over, and especially in the data field.

Anybody who has data in their title is someone who's highly sought after. The ability to minimize cycle times for those folks -- people who are such a challenge to keep and to get excited about the projects they're working on -- and to give them a solution that allows them to maximize their own abilities is really critical. It's the same in our space, and in software development in general.

Since we're in Boston, I feel like I can use baseball analogy. Hall of Fame product managers are like Hall of Fame baseball players, meaning they get it right about a third of the time. When we take on these big risks and challenges, the ability to very quickly identify whether we're going in the right direction, and then reorienting where we are going, has been really critical to Yammer being successful.

Gardner: I guess we could say it's better to give your data scientists a Ferrari than a go-kart?

Fishman: That seems like a good investment these days.

Gardner: Davide, what's the Ferrari in your organization? How did you get to one and what were you using before?

Conforti: When I joined Jobrapido, we were already running tons of A/B tests, which are the lifeblood of our product innovation. We want to test everything, from changing the color or the font of one button to a different layout, because these changes can have a tremendous impact on user engagement.
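
The raw material for each of those tests is typically just an events table in the warehouse. A minimal sketch of the per-variant aggregation might look like the following; the table, column, and test names are all hypothetical, and the query would run through whatever warehouse client the team already uses.

    # Hypothetical per-variant conversion summary for one A/B test.
    # Table, column, and test names are invented for illustration.
    AB_TEST_SUMMARY = """
        SELECT variant,
               COUNT(DISTINCT user_id) AS users,
               COUNT(DISTINCT CASE WHEN event_type = 'apply_click'
                                   THEN user_id END) AS converters
        FROM click_events
        WHERE test_name = 'button_color_test'
        GROUP BY variant;
    """

    def conversion_rates(rows):
        """rows: iterable of (variant, users, converters) tuples from the query."""
        return {variant: converters / users for variant, users, converters in rows}

    # Made-up numbers, just to show the shape of the output:
    print(conversion_rates([("A", 120_000, 4_620), ("B", 119_500, 4_980)]))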

Before, we used the Google Analytics tools, but we didn't like them that much, because they use sampled data, so you can hardly reach statistically meaningful results. We decided to build a data warehouse to assure flexibility, performance, and also a higher level of control and data consistency. That's end-to-end control, from the source through to the visualization, in order to make the data more actionable in terms of product development.

With Vertica, we did exactly this. We poured all the different data sources into one bucket, organized it, and now we have a full control over the data model. With my team, I manage these data models. It's fascinating how fast you can add pieces to the puzzle or remove others that are no longer interesting, because our business model, of course, is a living animal, a living creature.

We really appreciate this flexibility and the high level of control that Vertica allows. This has improved our innovation throughput a lot, and it's going to improve it even more in the future.

Gardner: Do you have any metrics of success for comparison, either in time, concurrency, or volume? Most of our listeners and audience are interested in some hard facts. Do you have any feeds and speeds you can share?

Conforti: Currently, we crunch about 30 GB of data in Vertica every day -- that is, we load about 30 GB per day into Vertica. But we're going to double that in a few months, because we're adding more stuff. We want to know more about the click patterns of our job-seekers on the site, and that is massive data flowing into Vertica. Also, our licensing in terabytes will likely double in the future.
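
Mechanically, a daily load of that size is usually a bulk COPY rather than row-by-row inserts. Here is a minimal sketch, assuming a hypothetical clickstream table and a staging file that already sits on a Vertica node; the connection details, table name, and file path are placeholders, and the vertica_python client is used purely for illustration.

    # Sketch of a daily bulk load into a hypothetical clickstream table.
    # Connection settings, table name, and staging file path are placeholders.
    import vertica_python

    conn_info = {
        "host": "vertica.internal.example",
        "port": 5433,
        "user": "etl_user",
        "password": "********",
        "database": "analytics",
    }

    with vertica_python.connect(**conn_info) as conn:
        cur = conn.cursor()
        # DIRECT tells Vertica to write the batch straight to disk storage,
        # the usual choice for large loads.
        cur.execute("""
            COPY clickstream_events
            FROM '/staging/clickstream_2013_10_02.csv'
            DELIMITER ',' DIRECT
        """)
        conn.commit()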

Increased performance

Another hard fact that I can share with you guys is that anyone using Vertica doesn't have to be satisfied with the first implementation of a query. If you're able to optimize it, you can increase the performance of the query by 100 percent or more. This is my personal experience with consultants and advisers. Vertica is happy to provide the support, and this is really value-adding.

Gardner: Given that you're seeing such a large increase very rapidly in terms of your data volume, do you have a sense of cost prediction, or is there a visibility at least into the relationship between the task and the total cost?

Conforti: What we're trying to understand is whether we have to pour this big amount of data all into Vertica, or whether we should flank it with Hadoop or some sort of cheaper storage solution, in order to better control costs. Currently, I don't have the figures or a model to estimate how the cost moves with the numbers. That's a good point; I will build one and share the results with you in the future.

Gardner: Rob Winters, any metrics of success and/or how do you feel about visibility into controlling costs?

Winters: As far as metrics of success, when we were doing our proof of concept (POC), we looked primarily at query performance. At that point, we weren’t looking at using it for prediction and personalization, but just for analytics and reporting.

What we compared against was an indexed Postgres database. We had done some optimization on the data. Our queries were running more than 1,000 percent faster, and Vertica was scaling pretty linearly, whereas with Postgres, when we put more data into the tables, they just started choking and died completely.
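
For anyone wanting to reproduce that kind of comparison on their own data, the measurement itself can be as simple as timing an identical aggregate query against both systems. A rough sketch follows; the connections, table, and columns are placeholders, not Spil Games' actual schema.

    # Rough benchmark sketch: time the same aggregate query against two systems.
    # pg_conn and vertica_conn stand in for DB-API connections created with
    # your own drivers; the table and columns are hypothetical.
    import time

    QUERY = """
        SELECT game_id,
               COUNT(*) AS plays,
               COUNT(DISTINCT user_id) AS players
        FROM play_events
        WHERE event_date >= '2013-09-01'
        GROUP BY game_id;
    """

    def time_query(conn, query):
        cur = conn.cursor()
        start = time.perf_counter()
        cur.execute(query)
        cur.fetchall()  # force the full result set to be produced
        return time.perf_counter() - start

    # for name, conn in [("postgres", pg_conn), ("vertica", vertica_conn)]:
    #     print(name, round(time_query(conn, QUERY), 2), "seconds")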

For me, it allowed me to actually do my job and have my team do their jobs, which is a pretty big metric of success.

The other thing is that with a relatively small cluster, we can simultaneously support hundreds of people and reports directly accessing the database, a dozen analysts directly querying information out of the database, and all of our personalization activities, with minimal performance hiccups. That’s a big metric of success.

Gardner: Pete, how do you judge this? What are the important metrics? Maybe you could wow us with some of your speeds and feeds, too.

Fishman: I have similar feedback to Rob's, which comes from comparing against a Postgres database. The speeds are at least one -- and probably closer to two or more -- orders of magnitude faster. Certainly on the cost side, it's important with data to consider the whole cost. So this is sort of a theme.

End-to-end costs

There are a variety of costs in managing and teasing out the useful insights that aren't necessarily in the sticker price. When considering a data solution, people should consider the end-to-end costs. What's really the cost per insight, as opposed to the cost per terabyte or the cost per whatever.

We certainly feel that Vertica has been our best solution. We've been customers for over three years. So it's quite a long relationship. I couldn’t imagine going back to a multi-day query, or something like that.

Gardner: So on that important new metric of cost-per-insight, do you see a trend for that?

Fishman: One thing that Davide mentioned is that he's forecasting how much data he will be putting into Vertica. I'm a forecaster myself by trade. Back in 2010, we were doing some estimates of where we would be by the end of 2011 in terms of our data volumes. This is a pretty simple extrapolation, and I got it wrong by at least an order of magnitude.

What we found is that when you start to get real insights from data, you want to get a little bit more, collect it maybe here or there. Also, as our product was growing, we faced some real exponential growth on the data and adopted clever solutions for maximizing that metric that we care about -- cost per insight, or minimizing the cost for insight.

Gardner: But you're not willing to predict if that's going to go up or down based on your efficiency and the use of the technology?

Fishman: There are many things going on simultaneously. So tripping over really valuable insights can happen a lot more easily than when you're more naïve about it. Essentially, you're facing headwinds in that. Finding insights become harder. At the same time, you have larger data volumes and some economies of scale there. So there are a lot of things simultaneously interacting, but clearly one thing to drive down that metric is best-in-breed tools.

Gardner: Of course, best to get the information of the people who can use it than to simply look to cut cost.

Fishman: Of course. If you view analytics as a cost center, that's the wrong view. It should be aimed at optimizing revenue streams. We micro-optimize the product, we micro-optimize sales and marketing, the business. Analytics is about improving everybody at their job, making data available to allow people to be more effective.

Gardner: Well, great. I'm afraid we will have to leave it there. We've been learning about how various organizations are developing the means to far better analyze their customers, and these are some impressive organizations with very large sets of customers and data that go along with that.

We've seen how they deployed the HP Vertica Analytics Platform to provide better analytics to their internal users, and then, in some cases, back out to the very customers they are gathering data from. So a big thank-you to our guests, starting with Rob Winters, Director of Reporting and Analytics at Spil Games, based in Amsterdam. Thanks so much.

Winters: Thank you.

Gardner: And we've also been joined by Davide Conforti, Business Intelligence Director at Jobrapido in Milan. Thank you, Davide.

Conforti: Thank you, guys. It's been a pleasure.

Gardner: And also Pete Fishman, Director of Analytics at Yammer in San Francisco. Thanks, Pete.

Fishman: My pleasure. Thank you very much.

Gardner: And thanks to you all for joining us for this special HP Discover Performance Podcast coming to you from the HP Vertica Big Data Conference in Boston.

I'm Dana Gardner, Principal Analyst at Interarbor Solutions, your host for this ongoing series of HP-sponsored discussions. Thanks again for joining us, and do come back next time.

Listen to the podcast. Find it on iTunes. Download the transcript. Sponsor: HP.

Transcript of a BriefingsDirect podcast on how firms are using HP Vertica to gain more and faster insight from customer actions and interaction. Copyright Interarbor Solutions, LLC, 2005-2013. All rights reserved.
