BriefingsDirect Transcripts: data analytics

Monday, October 05, 2015

How Analytics as a Service Changes the Game and Expands the Market for Big Data Value

Transcript of a BriefingsDirect discussion on how cloud models propel big data as a service benefits.

Listen to the podcast. Find it on iTunes. Get the mobile app. Download the transcript. Sponsor: Hewlett Packard Enterprise.

Dana Gardner: Hello, and welcome to the next edition of the HP Discover Podcast Series. I'm Dana Gardner, Principal Analyst at Interarbor Solutions, your host and moderator for this ongoing discussion on IT innovation and how it’s making an impact on people’s lives.

Our next big-data thought leadership discussion highlights how big-data analytics as a service expands the market for advanced analytics and insights. We'll see how bringing analytics to a cloud services model allows smaller and less data-architecture-experienced firms to benefit from the latest in big-data capabilities. And we'll learn how Dasher Technologies is helping usher in this democratization of big data.

Here to share how big data as a service has evolved, we're joined by Justin Harrigan, Data Architecture Strategist at Dasher Technologies in Campbell, California. Welcome, Justin.

Justin Harrigan: Hi, Dana. Thanks for having me.

Gardner: We're glad you could join us. We are also here with Chris Saso, Senior Vice President of Technology at Dasher Technologies. Welcome, Chris.

Read more on tackling big data analytics
Learn how the future is all about fast data
Find out how big data trends affect your business

Chris Saso: Hi, Dana. Looking forward to our talk.

Gardner: Justin, how have big-data practices changed over the past five years to set the stage for multiple models when it comes to leveraging big data?

Harrigan: Back in 2010, we saw big data become mainstream. Hadoop became a household name in the IT industry, doing scale-out architectures. Linux databases were becoming common practice. Moving away from traditional legacy, smaller, slower databases allowed this whole new world of analytics to open up to previously untapped resources within companies. So data that people had just been sitting on could now be used for actionable insights.

Fast forward to 2015, and we've seen big data become more approachable. Five years ago, only the largest organizations or companies that were specifically designed to leverage big-data architectures could do so. The smaller guys had maybe a couple of hundred or even tens of terabytes, and it required too much expertise or too much time and investment to get a big-data infrastructure up and running.

Today, we have approachable analytics, analytics as a service, hardened architectures that are almost turnkey with back-end hardware, database support, and applications -- all integrating seamlessly. As a result, the user on the front end, who is actually interacting with the data and making insights, is able to do so with very little overhead, very little upkeep, and is able to turn that data into business-impact data, where they can make decisions for the company.

Gardner: Justin, how big of an impact has this had? How many more types of companies or verticals have been enabled to start exploring advanced, cutting-edge, big-data capabilities? Is this a 20 percent increase? Perhaps almost any organization that wants to can start doing this.

Tipping point

Harrigan: The tipping point is when you outgrow your current solutions for data analytics. Data analytics is nothing new. We've been doing it for more than 50 years with databases. It’s just a matter of how big you can get, how much data you can put in one spot, and then run some sort of query against it and get a timely report that doesn’t take a week to come back or that doesn't time out on a traditional database.

Almost every company nowadays is growing so rapidly with the type of data they have. It doesn’t matter if you're an architecture firm, a marketing company, or a large enterprise getting information from all your smaller remote sites; everyone is compiling data to make better business decisions or create a system that makes their products run faster.

For people dipping their toes in the water for their first larger dataset analytics, there's a whole host of avenues available to them. They can go to some online providers, scale up a database in a couple of minutes, and be running.

They can download free trials. HP Vertica has a community edition, for example, and they can load it on a single server, up to terabytes, and start running there. And it’s significantly faster than traditional SQL.

It’s much more approachable. There are many different flavors and formats to start with, and people are realizing that. I wouldn’t even use the term big data anymore; big data is almost the norm.

Gardner: I suppose maybe the better term is any data, anytime.

Harrigan: Any data, anytime, anywhere, for anybody.

Gardner: I suppose another change over the past several years has been an emphasis away from batch processing, where you might do things on an infrequent or occasional basis, to this concept that’s more applicable to a cloud or an as-a-service model, where it’s streaming, continuous, and then you start reducing the latency down to getting close to real time.

Are we starting to see more and more companies being able to compress their feedback, and start to use data more rapidly as a result of this shift over the past five years or so?

Harrigan: It’s important to address the term big data. It’s almost like an umbrella, almost like the way people use cloud. With big data, you think large datasets, but you mentioned speed and agility. Real-time analytics is becoming more prevalent: the ability not just to run a batch process for 18 hours on petabytes of data, but to have a chart or a graph or some sort of report in real time. Interacting with it and making decisions on the spot is becoming mainstream.

We did a blog post on this not long ago, talking about how instead of big data, we should talk about the data pipe. That’s data ingest or fast data, typically OLTP data, that needs to run in memory or on hardware that's extremely fast to create a data stream that can ingest all the different points, sensors, or machine data that’s coming in.

Smarter analysis

Then we've talked about smarter analytic data, which requires some sort of number crunching on data that is relevant: not real-time, but still fairly new, call it seven days or older and up to a year. And then, there's the data lake, which essentially is your data repository for historical data crunching.

Those are three areas you need to address when you talk about big data. The ability to consume that data as a service is now being made available by a whole host of companies in very different niches.

It doesn’t matter if it’s log data or sensor data, there's probably a service you can enable to start having data come in, ingest it, and make real-time decisions without having to stand up your own infrastructure.
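
The three areas described above (data pipe, smarter analytic data, data lake) can be read as a routing rule keyed on the age of a record. The following is a minimal Python sketch under that reading, not anything Dasher describes implementing. The function name, tier labels, and exact boundaries are assumptions; the only figures taken from the discussion are "seven days" and "up to a year".

```python
from datetime import datetime, timedelta

# Assumed boundaries, loosely based on the discussion: records newer than
# seven days count as fast data, seven days to a year as analytic data,
# and anything older belongs in the data lake.
FAST_DATA_WINDOW = timedelta(days=7)
ANALYTIC_WINDOW = timedelta(days=365)


def classify_tier(event_time: datetime, now: datetime) -> str:
    """Return a hypothetical tier name for a record based on its age."""
    age = now - event_time
    if age < FAST_DATA_WINDOW:
        return "data pipe"
    if age <= ANALYTIC_WINDOW:
        return "smarter analytics"
    return "data lake"


now = datetime(2015, 10, 5)
for stamp in (datetime(2015, 10, 4), datetime(2015, 6, 1), datetime(2012, 1, 1)):
    print(stamp.date(), "->", classify_tier(stamp, now))
```

In practice the boundaries would depend on the workload; the point is only that each tier can be served by a different back end.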

Gardner: Of course, when organizations try to do more of these advanced things that can be so beneficial to their business, they have to take into consideration the technology, their skills, their culture -- people, process and technology, right?

Chris, tell us a bit about Dasher Technologies and how you're helping organizations do more with big-data capabilities, how you address this holistically, and this whole approach of people, process and technology.

Saso: Dasher was founded in 1999 by Laurie Dasher. To give you an idea of who we are, we're a little over 65 employees now, and the size of our business is somewhere around $100 million.

We started by specializing in solving major data-center infrastructure challenges that folks had by actually applying the people, process and technology mantra. We started in the data center, addressing people’s scale out, server, storage, and networking types of problems. Over the past five or six years, we've been spending our energy, strategy, and time on the big areas around mobility, security, and of course, big data.

As a matter of fact, Justin and I were recently working on a project with a client around combining both mobility information and big data. It’s a retail client. They want to be able to send information to a customer that might be walking through a store, maybe send a coupon or things like that. So, as Justin was just talking about, you need fast information, and you need to make actionable things happen with that data quickly. You're combining something around mobility with big data.

Dasher has built up our team to be able to have a set of solutions that can help people solve these kinds of problems.

Gardner: Justin, let’s flesh that out a little bit around mobility. When people are using a mobile device, they're creating data that, through apps, can be shared back to a carrier, as well as application hosts and the application writers. So we have streams of data now about user experience and activities.

We also can deliver data and insights out to people in the other direction in that real-time fashion, a closed loop, regardless of where they are. They don’t have to be at their desk, they don’t have to be looking at a specific business-intelligence (BI) application, for example. So how has mobility changed the game in the past five years?

Capturing data

Harrigan: Dana, it’s funny you brought up the two different ways to capture data. Devices can be used both as a sensor point and as a way to interact with data. I remember seeing a podcast you did with HP Vertica and GUESS regarding how they interacted with their database on iPads.

In regards to interacting with data, it has become useful not only to data analysts or data scientists. We can push it down into a format for lower-level folks who aren't so technical. With a fancy application in front of them, they can use the data as well to make decisions and actually benefit the company.

You give that data to someone in a store, at GUESS for example, who can benefit by understanding where in the store to put jeans to impact sales. That’s huge. Rather than giving them a quarterly report and stuff that's outdated for the season, they can do it that same day and see what other sites are doing.

On the flip side, mobile devices are now sensors. A mobile device is constantly pinging access points over Wi-Fi. We can capture that data and, through a MAC address as a unique identifier, follow someone as they move through a store or throughout a city. Then, when they return, that person’s data is captured into a database and it becomes historical. They can track them through their device.
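
To make the sensor idea concrete, here is a minimal Python sketch of grouping Wi-Fi pings by MAC address to reconstruct the order in which a device passed the access points. The record layout, access-point names, and function name are all hypothetical; the discussion does not describe how the data is actually stored or processed.

```python
from collections import defaultdict

# Hypothetical ping records: (mac_address, access_point, unix_timestamp).
pings = [
    ("aa:bb:cc:00:00:01", "entrance", 1000),
    ("aa:bb:cc:00:00:02", "entrance", 1100),
    ("aa:bb:cc:00:00:01", "checkout", 1300),
    ("aa:bb:cc:00:00:01", "denim", 1060),
]


def paths_by_device(records):
    """Group pings by MAC address and order each group by timestamp."""
    grouped = defaultdict(list)
    for mac, access_point, timestamp in records:
        grouped[mac].append((timestamp, access_point))
    return {
        mac: [access_point for _, access_point in sorted(seen)]
        for mac, seen in grouped.items()
    }


paths = paths_by_device(pings)
print(paths["aa:bb:cc:00:00:01"])  # ['entrance', 'denim', 'checkout']
```

Stored over time, the same grouping would show whether a device has been seen before, which is the historical view mentioned above.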

It opens a whole new world of opportunities for retailers: where they place merchandise, how they staff stores to make sure they have the proper number of people at a given time, and what impact weather has on the store.

Lastly, as Chris mentioned, how do we interact with people on devices by pushing them data that's relevant as they move throughout their day?

The next generation of big data is not just capturing data and using it in reports, but taking that data in real time and possibly pushing it back out to the person who needs it most. In the retail scenario, that's the end users, possibly giving them a coupon as they're standing in front of something on a shelf that is relevant and something they will use.

Gardner: So we're not just talking about democratization of analytics in terms of the types of organizations, but now we're even talking about the types of individuals within those organizations.

Do you have any examples of some of Dasher’s clients that have been able to exploit these advances and occurrences with mobile and cloud working in tandem, and how that's produced some sort of a business benefit?

Business impact

Harrigan: A good example of a client who leveraged a large dataset is One Kings Lane. They were having difficulty updating the website their users were interacting with because it’s a flash shopping website, where the information changes daily, and you have to be able to update it very quickly. Traditional technologies were causing a business impact and slowing things down.

They were able to leverage a really fast columnar database to make these changes and actually grow the inventory, grow the site, and have updates happen in almost real time, so that there was no impact or downtime when they needed to make these changes. That's a real-world example of when big data had a direct impact on the business line.

Gardner: Chris, tell us a little bit about how Dasher works with Hewlett Packard Enterprise technologies, and perhaps even some other HP partners like GoodData, when it comes to providing analytics as a service?

Saso: HP has been a longtime partner from the very beginning, actually when we started the company. We were a partner of Vertica before HP purchased them back in 2011.

We started working with Vertica around big data, and Justin was one of our leads in that area at the time. We've grown that business, and worked with other business units within HP, to combine solutions: Vertica, big data, and hardware, as Justin was just talking about. You brought up the applications that are analyzing this big data. So we're partners in the ecosystem that help people analyze the data.

Once HP Vertica, or what have you, has done the analysis, you have to report on that and make it in a nice human-readable form or human-consumable form. We’ve built out our ecosystem at Dasher to have not only the analytics piece, but also the reporting piece.

Gardner: And on the as a service side, do you work with GoodData at all or are you familiar with them?

Saso: Justin, maybe you can talk a little bit about that. You've worked with them more I think on their projects.

Optimizing the environment

Harrigan: GoodData is a large consumer of Vertica and they actually leverage it for their back-end analytics platform for the service that they offer. Dasher has been working with GoodData over the past year to optimize the environment that they run on.

Vertica has different deployment scenarios, and you can actually deploy it in a virtual-machine (VM) environment or on bare-metal. And we did an analysis to see if there was a return on investment (ROI) on moving from a virtualized environment running on OpenStack to a bare-metal environment. Through a six-month proof of concept (POC), we leveraged HP Labs in Houston. We had a four-node system setup with multiple terabytes of data.

We saw a 4:1 increase in performance in moving from a VM with the same resources to a bare-metal machine. That’s going to have a significant impact on the way they move data in their environment in the future and how they adjust to customers with larger datasets.

Gardner: When we think about optimizing the architecture and environment for big data, are there any other surprises or perhaps counter-intuitive things that have come up, maybe even converged infrastructure for smaller organizations that want to get in fast and don’t want to be too concerned with the architecture underlying the analytics applications?

Harrigan: There's a tendency now with so many free solutions out there to pick a free solution, something that gets the job done now, something that grows the business rapidly, but to forget about what businesses will need three years down the road, if it's going to grow, if it’s going to survive.

There are a lot of startups out there that are able to build a big data infrastructure, scale it to 5,000 nodes, and then they reach a limit. There are network limits on how fast the switch can move data between nodes, constantly pushing the limits of 10 Gbit, 40 Gbit, and soon 100 Gbit networks to keep those infrastructures up.

Depending on what architecture you choose, you may be limited in the number of nodes you can go to. So there are solutions out there that can process a million transactions per second with 100 nodes, and then there are solutions that can process a million transactions per second with 20 nodes, but may cost slightly more.
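
The 100-node versus 20-node comparison comes down to per-node throughput. The short Python sketch below works through that arithmetic using the figures quoted above. It assumes throughput scales linearly with node count, which real clusters rarely do, and the function names are illustrative only.

```python
import math


def per_node_throughput(total_tps: float, nodes: int) -> float:
    """Transactions per second handled by each node, assuming an even split."""
    return total_tps / nodes


def nodes_needed(target_tps: float, tps_per_node: float) -> int:
    """Nodes required for a target load, assuming linear scaling (a simplification)."""
    return math.ceil(target_tps / tps_per_node)


wide = per_node_throughput(1_000_000, 100)   # 10,000 tps per node
dense = per_node_throughput(1_000_000, 20)   # 50,000 tps per node

# Hypothetical growth target: five times today's load.
print(nodes_needed(5_000_000, wide))   # 500
print(nodes_needed(5_000_000, dense))  # 100
```

Under these assumptions the denser architecture reaches the same growth target with a fifth of the nodes, which is where a higher up-front price can pay off if the chosen architecture caps the node count.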

If you think long-term, if you start in the cloud, you want to be able to move out of the cloud. If you start with an open ecosystem, you want to make sure that your hardware refresh is not going to cost so much that the company can’t afford it three years down the road. One of the areas we help consult with, when picking different architectures, is thinking long-term. Don't think six weeks down the road, how are we going to get our service up and running? Think, okay, we have a significant client install base, how are we going to grow the business from three to five years and five to 10 years?

Gardner: Given that you have quite a few different types of clients, and the idea of optimizing architecture for the long-term seems to be important, I know with smaller companies there’s that temptation to just run with whatever you get going quickly.

What other lessons can we learn from that long-term view when it comes to skills, security, something more than the speeds and feeds aspects of thinking long term about big data?

Numerous regulations

Harrigan: Think about where your data is going to reside and the requirements and regulations that you may run into. There are a million different regulations we have to deal with now, such as HIPAA, ITAR, and money transaction processes in a company. So if you ever perceive that need, make sure you're in an ecosystem that supports it. The temptation for smaller companies is just to go cloud, but who owns that data if you go under, or who owns that data when you get audited?

Another problem is encryption. If you're going to start gaining larger customers once you have a proven technology or a proven service, they're going to want to make sure that you're compliant with all their regulations, not just the regulations that your company is enforcing.

There's logging that they're required to have, and there is going to be encryption and protocols and the ability to do audits on anyone who is accessing the data.

Gardner: On this topic of optimizing, when you do it right, when you think about the long term, how do you know you have that right? Are there some metrics of success? Are there some key performance indicators (KPIs) or ROIs that one should look to so they know that they're not erring on the side of going too commercial or too open source or thinking short term only? Maybe some examples of what one should be looking for and how to measure that.

Harrigan: That’s going to be largely subjective to each business. Obviously if you're just going to use a rule of thumb, it shouldn't cost you more money than it makes you. If you implement a system and it costs you $10 million to run and your ROI is $5 million, you've made a bad decision.

There are two factors. The first is the value to the business. If you're a large enterprise and you implement big data, and it gives you the ability to make decisions and quantify those decisions, then you can put a number to that and see how much value that big-data system is creating. For example, a new marketing campaign or something you're doing with your remote sites or your retail branches is quantifiable and is having an impact on the business.

The other way to judge it is impact on business. So, for ad serving companies, the way they make money is ad impressions, and the more ad impressions they can view, for the least cost in their environment, the higher return they're going to make. The delta is between the infrastructure costs and the top line that they get to report to all their investors.

If they can do 56 billion ad impressions in a day, and you can double that by switching architectures, that’s probably a good investment. But if you can only improve it by 10 percent by switching architectures, it’s probably too much work for what it’s worth.
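
Both rules of thumb in this answer are simple arithmetic, sketched here in Python with the figures from the discussion. The 50 percent cutoff for "worth switching" is an assumption made for illustration; the only guidance given above is that doubling is probably worth it and 10 percent probably is not.

```python
def net_return(annual_return: float, annual_cost: float) -> float:
    """What the system makes minus what it costs to run."""
    return annual_return - annual_cost


def relative_gain(current: float, projected: float) -> float:
    """Fractional improvement from switching architectures."""
    return (projected - current) / current


def worth_switching(current: float, projected: float, min_gain: float = 0.5) -> bool:
    """min_gain is a hypothetical threshold, not a figure from the discussion."""
    return relative_gain(current, projected) >= min_gain


# A $10 million system that returns $5 million loses $5 million.
print(net_return(5_000_000, 10_000_000))

daily_impressions = 56_000_000_000
print(worth_switching(daily_impressions, 2 * daily_impressions))   # doubling
print(worth_switching(daily_impressions, 61_600_000_000))          # about 10 percent
```

A fuller comparison would also subtract the one-time cost of the migration itself, which is the "too much work for what it's worth" part of the judgment.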

Gardner: One last area on this optimization idea. We've seen, of course, organizations subjectively make decisions about whether to do this on-premises, maybe either virtualized or on bare metal. They will do their cost-benefit analysis. Others are looking at cloud and as a service model.

Over time, we expect to have a hybrid capability. As you mentioned, if you think ahead, whether you start in the cloud and want to move private, or start private and want to move to the cloud, we're seeing the likelihood of more of that being able to move back and forth.

Thinking about that, do you expect that companies will be able to do that? Where does that make the most sense when it comes to data? Is there a type of analysis that you might want to do in a cloud environment primarily, but other types of things you might do private? How do we start to think about breaking out where on the spectrum of hybrid cloud set of options one should be considering for different types of big-data activity?

Either-or decision

Harrigan: In the large data analytics world, it’s almost an either-or decision at this time. I don’t know what it will look like in the future.

Workloads that lend themselves extremely well to the cloud are inconsistent, maybe seasonal, where 90 percent of your business happens in December. Seasonal workloads like that lend themselves extremely well to the cloud.

Or your business is just starting out, and you don't know if you're going to need a full 400-node cluster to run whatever analytics platform you choose, so the hardware might sit idle 50 percent of the time, or you don’t get full utilization. Those companies need a cloud architecture, because they can scale up and scale down based on needs.

Companies that benefit from on-premise are ones that can see significant savings by not using cloud and paying someone else to run their environment. Those companies typically pin the CPU usage meter at 100 percent, as much as they can, and then add nodes to add more capacity.
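
The cloud versus on-premise trade-off described here can be framed as a break-even utilization: on-premise hardware costs the same whether it is busy or idle, while cloud capacity is paid for only when used. The Python sketch below works through that framing. The hourly rate and per-node yearly cost are made-up numbers chosen so the arithmetic is easy to follow; they are not real prices from any provider.

```python
HOURS_PER_YEAR = 8760


def cloud_cost(nodes: int, hourly_rate: float, utilization: float) -> float:
    """Yearly cloud spend when nodes are paid for only while in use."""
    return nodes * hourly_rate * HOURS_PER_YEAR * utilization


def onprem_cost(nodes: int, cost_per_node_year: float) -> float:
    """Yearly on-premise spend, which does not depend on utilization."""
    return nodes * cost_per_node_year


def break_even_utilization(cost_per_node_year: float, hourly_rate: float) -> float:
    """Utilization above which on-premise becomes cheaper than cloud."""
    return cost_per_node_year / (hourly_rate * HOURS_PER_YEAR)


# Hypothetical prices: $1.00 per node-hour in the cloud,
# $4,380 per node per year on-premise.
print(break_even_utilization(4380.0, 1.0))   # 0.5
print(cloud_cost(400, 1.0, 0.5))             # 1752000.0
print(onprem_cost(400, 4380.0))              # 1752000.0
```

With these made-up prices, a 400-node cluster that sits idle half the time costs the same either way; a seasonal workload below that utilization favors cloud, and a cluster pinned near 100 percent favors on-premise.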

The best advice I could give is, if you start in the cloud or you start on bare metal, make sure you have agility and you're able to move workloads around. If you choose one sort of architecture that only works in the cloud and you are scaling up and you have to do a rip-and-replace scenario just to get out of the cloud and move to on-premise, that’s going to be a significant business impact.

One of the reasons I like HP Vertica is that it has a cloud instance that can run on a public cloud. That same instance, that same architecture runs just as well on bare metal, only faster.

Gardner: Chris, last word to you. For those organizations out there struggling with big data, trying to figure out the best path, trying to think long term, and from an architectural and strategic point of view, what should they consider when coming to an organization like Dasher? Where is your sweet spot in terms of working with these organizations? How should they best consider how to take advantage of what you have to offer?

Saso: Every organization is different, and this is one area where that's true. When people are just looking for servers, they're pretty much all the same. But when you're actually trying to figure out your strategy for how you are going to use big-data analytics, every company, big or small, probably does have a slightly different thing they are trying to solve.

That's where we would sit down with that client and really listen and understand, are they trying to solve a speed issue with their data, are they trying to solve massive amounts of data and trying to find the needle in a haystack, the golden egg, golden nugget in there? Each of those approaches certainly has a different answer to it.

So coming with your business problem and also what you would like to see as a result -- we would like to see x-number of increase in our customer satisfaction number or x-number of increase in revenue or something like that -- helps us define the metric that we can then help design toward.

Gardner: Great, I'm afraid we will have to leave it there. We've been discussing how optimizing for a big-data environment really requires a look across many different variables. And we have seen how organizations were able to spread the benefits of big data more generally now, not only the type of organization that can take advantage of it, but the people within those organizations.

We've heard how Dasher Technologies uses advanced technology like HP and HP Vertica to help organizations bring the big-data capabilities to more opportunities for business benefits and across more types of companies and vertical industries.

So a big thank you to our guests, Justin Harrigan, Data Architecture Strategist at Dasher Technologies, and Chris Saso, Senior Vice President of Technology at Dasher Technologies.

And I'd like to thank our audience for joining us as well for this big data thought leadership discussion. I'm Dana Gardner, Principal Analyst at Interarbor Solutions, your host for this ongoing series of HP-sponsored discussions. Thanks again for listening, and come back next time.

Transcript of a BriefingsDirect discussion on how cloud models propel big data as a service benefits. Copyright Interarbor Solutions, LLC, 2005-2015. All rights reserved.

Tuesday, August 18, 2015

The Future of Business Intelligence as a Service with GoodData and HP Vertica

Transcript of a BriefingsDirect discussion on how GoodData helps customers gain new insights into their businesses with on-demand data analytics.

Listen to the podcast. Find it on iTunes. Get the mobile app for iOS or Android. Download the transcript. Sponsor: HP Enterprise.

Dana Gardner: Hello, and welcome to the next edition of the HP Discover Podcast Series. I'm Dana Gardner, Principal Analyst at Interarbor Solutions, your host and moderator for this ongoing discussion on IT innovation and how it’s making an impact on people’s lives.

Our next big data case study interview highlights how GoodData expands the realms and possibilities for delivering business intelligence (BI) and data warehousing as a service. We'll learn how they're exploring new technologies to make that more seamless across more data types for more types of users.

With that, we welcome Jeff Morris, Vice President of Marketing at GoodData in San Francisco. Welcome, Jeff.

Jeff Morris: Thanks very much, Dana.

Become a member of myVertica today
Register now
Gain access to the free HP Vertica Community Edition

Gardner: We are also here with Chris Selland, Vice President for Business Development at HP Vertica. Welcome, Chris.

Chris Selland: Thanks, Dana. Great to be here with you both.

Gardner: First, Jeff, for those who might not be that familiar, tell us about GoodData, what you do and why it's different.

Morris: GoodData is an analytics platform as a service (PaaS). We cover the full spectrum end-to-end use case of creating an analytic infrastructure as a service and delivering that to our customers.

We take on the challenges of collecting the data, whatever it is, structured and unstructured. We use a variety of technologies as appropriate, as we do that. We warehouse it in our multitenant, massively scalable data warehouse that happens to be powered by HP Vertica.

We then combine and integrate it into whatever the customer’s particular key performance indicators (KPIs) are. We present that in aggregate in our extensible analytics engine and then present it to the end users through desired dashboards, reports, or discoverable analytics.

Our business is set up such that about half of it operates on an internal use case, typically a sales and marketing and social analytic kind of use case. The other half of our business we call "Powered by GoodData," and those customers are embedding the GoodData technology in their own products. So we have a number of companies creating these customer-facing data products that ultimately generate new streams of revenue for their business.

40,000 customers

We've been at this since 2007. We're serving about 40,000 customers at this point and enjoying somewhere around 2.4 million data uploads a week. We've built out the service such that it's massively scalable. We deliver incredibly fast time to market. Last quarter, about two thirds of our deployments were delivered within 16 weeks or less.

One of the divisions of HP, in fact, deployed GoodData in less than six weeks. They are giving their first set of KPIs and delivering that value to them. What’s making us different in the marketplace right now is that we're eliminating all of the headaches associated with creating your own big data lake-style BI infrastructure and environment.

What we end up doing is affording you the time to focus on the analytics and the results that you gain from them—without having to manage the back-end operations.

Gardner: What’s interesting to me is that you mentioned PaaS for BI. Instead of developing applications and then having a production environment that’s seamlessly available to you, you're creating analytic applications on datasets that are contributed to your platform. Is that right?

Morris: Yes, indeed. The datasets themselves also tend to be born in the cloud. As I said, the types of applications that we're building typically focus on sales and marketing and social, and e-commerce related data, all of which are very, very popular, cloud-based data sources. And you can imagine they're growing like crazy.

We see a leaning in our customer base of integrating some on-premise information, typically from their legacy systems, and then marrying that up with the Salesforce, or the market data or social information that they want to integrate and build a full view of their customers -- or a full exposure of what their own applications are doing.

Gardner: So, you're really providing an excellent example of how HP Vertica is a cloud-borne analytics platform and implementation. That’s kind of interesting.

But I wonder whether any of your clients, maybe not so much in the media, but some of the more traditional verticals like healthcare, retail, or government, are trying to do this across a hybrid model. For example, they're doing some BI and they have warehouses on-premises or maybe other hosting models, but they also want to start to dabble in moving this to the cloud and taking advantage of what the cloud does best. Are we now on the vanguard of hybrid BI?

Morris: We're getting there, and there are certainly some industries are more cloud friendly than others right now. Interestingly, the healthcare space is starting to, but they're still nascent. The financial services industry is still nascent. They're very protective of their information. But retailers, e-commerce organizations, technology ISVs, and digital media agencies have adopted the cloud-based model very aggressively.

We're seeing terrific growth and expansion there, and we do see use cases right now where we're beginning to park the cloud-based environment alongside your more traditional analytics environments to create that hybrid effect. Often, those customers are recognizing that the speed at which data is growing in the cloud is driving them to look for a solution like ours.

Gardner: Chris, how unique is GoodData in terms of being all cloud moving toward hybrid, and does this really provide a poster child, in a sense, for Vertica as a service?

Special relationship

Selland: GoodData is certainly a very special partner and a very special relationship for us. As you said, Vertica is fundamentally a software platform that was purpose-built for big data that is absolutely cloud-enabled. But GoodData is the best representation of the partner who has taken our platform and then rolled out service offerings that are specifically designed to solve specific problems. It's also very flexible and adaptable.

So, it’s a special partnership and relationship. It's a great proof point for the fact that the HP Vertica platform absolutely was designed to be running in the cloud for those customers who want to do it.

As Jeff said, though, it really varies greatly by industry. A large majority of the customers in our customer advisory board (CAB), which tend to be some of our largest customers and some pretty well-known industries, were saying how they will never put their data in the cloud.

Never is a very long time, but at the same time, there are other industries that are adopting it very rapidly. So there is a rate of change that’s going on in the industry. It varies by size of company, by the type of competitive environment, and by the type of data. And yes, there is a lot of hybridization going on out there. We're seeing more of the hybridization in existing organizations that are migrating to the cloud. There are a lot of new-breed companies that started in the cloud and have every intent of staying there.

But there's a lot of dynamism in this industry, a lot of change, and this is a partnership that is a true win-win. As I said, it's a very special relationship for both companies.

Gardner: Jeff, we have such variability, vertical by vertical and company by company. A green-field company will behave differently from an established company vis-à-vis their architecture and their IT implementation. You need to be ready for any and all of that, and I suppose Vertica does as well.

We're hearing also more than just HP Vertica here. We're talking about Haven, which includes Hadoop, Autonomy, security and applications. Is there a path that you see whereby you can try to be as many things to as many types of customer and vertical industries?

I'm thinking about Hadoop, security, and bringing some of the more enterprise-caliber KPIs and SLAs, so that some of those folks that are hesitant to move at least some of their data in some ways to the cloud would move in that direction. Is that a vision for you? Maybe you could explain where you see this going on a hybrid basis.

Morris: Absolutely. The HP Haven-style architecture is a vision and a direction that we are going in. We do use Hadoop right now for special use cases of creating structure out of unstructured information for a number of our customers, and then moving that into our Vertica-based warehouse.

The beauty of Vertica in the cloud is the way we have set this up, and this also helps address both the security and the reliability issues that might be thought of as issues in the cloud. We're triple clustering each set of instances of our Vertica warehouses, so they are always reliable and redundant.

Daily updates

We, like the biggest enterprises out there, are vigilantly maintaining our network. We update our network on behalf of our customers on a daily basis, as necessary. We roll out and maintain a very standardized operating environment, including an OpenStack-based operating environment, so that customers never need to even care about what versions of the SSL libraries exist or what versions of the VPN exist.

We're taking care of all of that really deep networking and things that the most stalwart enterprise-style IT architects are concerned about. We have to do that, too, and we have to do it at scale for this multi-tenant kind of use-case.

As I said, the architecture itself is very Haven-like, it just happens to be exclusively in the cloud -- which we find interesting and unique for us. As for the Hadoop piece, we don’t use Autonomy yet, but there are some interesting use cases that we are exploring there. We use Vertica in a couple of places in our architecture, not only that central data warehouse, but we also use it as a high-performance storage vehicle for our analytic data marts.

So when our customers are pushing a lot of information through our system, we're tapping into Vertica’s horsepower in two spots. Then, our analytic engine can ingest and deal with those massive amounts of data as we start to present it to customers.

On the Haven architecture side, we're a wonderful example of where Haven ends up in the cloud. The applications themselves, the kinds of things that customers are creating, might be these hybrid styles where they're drawing legacy information in from their existing on-premise systems. Then, they're gathering up, as I said before, their sales and marketing information and their social information.

The one that we see as a wonderful green field for us is capturing social information. We have our own social analytic maturity model that we describe to customers and partners on how to capitalize on your campaigns and how to maximize your exposure through every single social channel you can think of.

We're very proficient at that, and that's what's really driving the immense sizes of data that our customers are asking for right now. Where we used to talk in tens of terabytes for a big system, we're now talking in the world of hundreds, multiple hundreds of terabytes, for a system. Case by case by case, we're seeing this really take off.

Gardner: It's fine to talk about this as an abstraction, but it's really useful to hear some examples. Do you have any companies, either named or unnamed, that provide a great use case example of PaaS for BI apps that take advantage of some of the attributes of HP Haven and Vertica?

Morris: One of our oldest and most dear customers is Zendesk. They have a very successful customer-support application in the cloud. They provide both a freemium model and degrees of for-fee products to their customers.

And the number one reason why their customers upgrade from freemium to general and then general to the gold level of product is the analytics that they're supplying inside of there. They very recently announced a whole series of data products themselves, all powered by GoodData, as the embedded analytic environment within Zendesk.

We have another customer, Service Channel, which is a wonderful example of marrying together two very disparate user communities. Service Channel is a facilities-management enterprise resource planning (ERP) application. They bring together the facility managers of your favorite brick-and-mortar retailers with the suppliers who provide those retail facilities with services: janitorial services, the air-conditioning guy, the plumbers.

Disparate customers

Marrying disparate types of customers, they create their own data products as well, where they are integrating third-party information like weather data. They score their customers, both the retailers as well as the suppliers, and benchmark them against each other. They compare how well one vendor provides service to another vendor and they also compare how much one of the retailers spends on maintaining their space.

Of course, Apple gets incredibly high marks. RadioShack, right now, as they transition their stores, not so much. Service Channel knew this information long before the industry did, because they're watching spend. They, too, are starting to create almost a bidding network.

When they integrated their weather data into the environment, they started tracking and saying, "Apple would like to gain first right of refusal on the services that they need." So if Apple’s air conditioning goes out, the service provider comes in and fixes the air-conditioning sooner than Best Buy and all of their competitors. And they'll bid up for that. So they've created almost a marketplace. As I said before, these data products are really quite an advantage for us.

Gardner: Looking a bit to the future, we've heard the interest in moving from predictive to prescriptive analytics. It seems to me that that’s really a factor of the quality of the data, of getting data from different sources and bringing it together, something you can do in a cloud more easily or more efficiently than server by server, or cluster by cluster.

What kind of services should we envision as the analytics-as-a-business model unfolds in the cloud and you can start to do joins across different types of data for an industry, rather than just an enterprise? Is there an opportunity to get that prescriptive value as a provider with the PaaS capability? It sounds very exciting and interesting. What's coming next?

Morris: Most definitely, we're seeing a number of great opportunities, and many are created and developed by the technologies we've chosen as our platform. We love the idea of creating not only predictive, but prescriptive, types of applications and use cases on top of the GoodData environment. We have customers that are doing that right now and we expect to see them continue to do that.

What I think will become really interesting is when the GoodData community starts to share their analytic experiences or their analytic product with each other. We feel like we're creating a central location where analysts, data scientists, and our regular IT can all come together and build a variety of analytic applications, because the data lives in the same place. The data lives in one central location, and that’s an unusual thing. In most of the industry your data is still siloed. Either you keep it to yourself on-premise or your vendors keep it to themselves in the cloud and on-premise.

But we become this melting pot of information and of data that can be analytically evaluated and processed. We love the fact that Vertica has its own built-in analytic functions right in the database itself. We love the fact that they run our predictive language without any other issue and we see our customers beginning to build off of that capability.

My last point about the power of that central location and the power of GoodData is that our whole goal is to free time for those data scientists and those IT people to actually perform analytics and get out of the business of maintaining the systems that make analytics available, so that you can focus on the real intellectual capital that you want to be creating.

Identifying trends

Gardner: So, Chris, to cap this off, I think we've identified some trends. We have PaaS for BI. We have hybrid BI. We have cloud data joins and ecosystems that create a higher value abstraction from data. Any thoughts about how this comes together, and does this fit into the vision that you have at HP Vertica and that you're seeing in other parts of your business?

Selland: We're very much only at the front end of the big data analytics revolution. I ultimately don’t think we are going to be using the term "big data" in 10 years.

I often compare big data today to eBusiness 10, 12 years ago. Nobody uses that term anymore, but that was when everything was going online, and now everything is online, and the whole world has changed. The same thing is happening with analytics today.

With a hundred times more data we can actually get 10,000 times more insight. And that's true, but it's not just the amount of data; it's the ability to cross-correlate. That's the whole vision of what Jeff was just talking about that GoodData is trying to do.

It's the vision of Haven, to bring in all types of data and to be able to look at it more holistically. One of my favorite examples, just to make that concrete, is that there is an airline we were talking to. They were having a customer service issue. They were having a lot of their passengers tweeting angrily about them, and they were trying to analyze the social media data to figure out how to make this stop and how to respond.

In a totally separate part of the organization, they had a predictive maintenance project, almost an Internet-of-things (IoT) type of project, going on. They were looking at data coming off the fleet, and trying to do a better job of keeping their flights on time.

If you think about this, you say, "Duh." There was a correlation between the service problems and late flights on the one hand and the angry passengers on the other. Suddenly, they realized that by focusing less on the social data in this case, and looking at it as the symptom rather than the cause, they could solve the problem much more effectively. That's a very, very simple example.

I cite that because it makes real for people that it's when you start cross-correlating data you wouldn't normally think belong together -- social data and maintenance data, for example -- that you get true insights. It's almost a silly, simple example, but we're going to see many more examples of that type. The more of this we can do, the more power we are going to get. I think that the front end of the revolution is here.
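The cross-correlation in the airline example can be illustrated with a minimal sketch. It assumes two separately collected daily series, late departures from the operations side and angry tweets from the social side, joins them on date, and computes a Pearson correlation. All names and figures here are hypothetical and are not taken from the airline or from any HP product.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / sqrt(var_x * var_y)


def join_by_day(first, second):
    """Align two {date: count} datasets on the dates they have in common."""
    days = sorted(set(first) & set(second))
    return days, [first[d] for d in days], [second[d] for d in days]


# Hypothetical daily counts from two silos that are normally analyzed apart.
late_flights = {"2015-06-01": 4, "2015-06-02": 12, "2015-06-03": 7, "2015-06-04": 20}
angry_tweets = {"2015-06-01": 35, "2015-06-02": 110, "2015-06-03": 60, "2015-06-04": 190}

days, late, tweets = join_by_day(late_flights, angry_tweets)
r = pearson(late, tweets)
print(f"days compared: {len(days)}, correlation: {r:.3f}")
```

A coefficient near 1 would suggest that the social complaints track the operational delays, which is the "symptom rather than cause" reading described above. Correlation alone does not establish the cause, so a result like this is a prompt for investigation rather than a conclusion.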

Gardner: And then those insights become empirical, and not just intuitive or based on someone's observation. You have hard evidence.

Selland: Correct, exactly.

Gardner: All right. I'm afraid we have to leave it there. We have been learning about how GoodData delivers a platform as a service around business intelligence, built on HP Vertica, in the cloud. I'd like to thank our guests, Jeff Morris, the Vice President of Marketing at GoodData, and Chris Selland, Vice President for Business Development at HP Vertica.

And I'd like to thank our audience as well for joining us for this special new style of IT discussion. I'm Dana Gardner, Principal Analyst at Interarbor Solutions, your host for this ongoing series of HP-sponsored discussions. Thanks again for listening, and do come back next time.

Listen to the podcast. Find it on iTunes. Get the mobile app for iOS or Android. Download the transcript. Sponsor: HP Enterprise.

Transcript of a BriefingsDirect discussion on how GoodData is helping its customers gain new insights into their businesses with data analytics. Copyright Interarbor Solutions, LLC, 2005-2015. All rights reserved.

Monday, August 10, 2015

How ECommerce Sites Harvest Big Data Across Multiple Clouds

Transcript of a BriefingsDirect discussion on how HP Vertica helps a big-data consultancy scale workloads for ecommerce sites.

Listen to the podcast. Find it on iTunes. Get the mobile app for iOS or Android. Download the transcript. Sponsor: HP Enterprise.

Dana Gardner: Hello, and welcome to the next edition of the HP Discover Podcast Series. I'm Dana Gardner, Principal Analyst at Interarbor Solutions, your host and moderator for this ongoing sponsored discussion on IT innovation.

Our big data user interview highlights how a consultant is helping large ecommerce organizations better manage their big data and provide the insights that they need to thrive in a fast-paced environment.

With that, please join me in welcoming our guest, Jimmy Mohsin, Principal Software Architect at Norjimm LLC, a consultancy based in Princeton, New Jersey. Welcome, Jimmy.

Jimmy Mohsin: Thank you, Dana.

Gardner: We've been hearing an awful lot about some extraordinary situations where the fast-paced environment and data volumes that users are dealing with have left them with a need for a much better architecture.

Tell me what you are seeing in the marketplace. How desperate are people to find the right big data architecture?

Mohsin: There's a lot of interest in trying to deal with large data volumes, and not only large data volumes, but also data that changes rapidly. Now, there are many companies that have very large datasets, some in terabytes, some in petabytes, and then they're getting live feeds.

The data is there and it’s changing rapidly. The traditional databases sometimes can’t handle that problem, especially if you're using that database as a warehouse and you're reporting against it.

Basically, we have kind of a moving-target situation. With HP Vertica, what we've seen is the ability to solve that problem in at least some of the cases that I've come across, and I can talk about specific use cases in that regard.

Input/output issues

Gardner: Before we get into a specific use case, I'm interested particularly in some of these input/output issues. People are trying to decide how to move the data around. They're toying with cloud. They're trying to bring in data from more types of traditional repositories. And, as you say, they're facing new types of data problems with streaming and real-time feeds.

How do you see them beginning this process when they have to handle so many variables? Is it something that’s an IT architecture, or enterprise architecture, or data architecture? Who's responsible for this, given that it’s now a rather holistic problem?

Mohsin: In my present project, we ran into that. The problem is that many companies don't even have a well-defined data-architecture team. Some of them do. You'll find a lot of companies with an enterprise-architect role and you'll have some companies with a haphazard definition of an architectural group.

Net-net, at least at this point, unless companies are more structured, it becomes a management issue in the sense that someone at the leadership level needs to know who has what domain knowledge and then form the appropriate team to skin this cat.

I know of a recent situation where we had to build a team of four people, and only one was an architect. But we built a virtual team of four people who were able to assemble and collate all the repositories that spanned 15 years and four different technology flavors, and then come up with an approach that resulted in a single repository in HP Vertica.

So there are no easy answers yet, because organizations just aren't uniformly structured.

Gardner: Well, I imagine they'll be adapting, just like we all are, to the new realities. In the meantime, tell me about a specific use case that demonstrates the intensity of scale and velocity, and how at least one architecture has been deployed to manage that?

Mohsin: One of my present projects deals with one of the world's largest retailers. It's eCommerce, online selling. One of the things they do, in addition to their transactions of buying and selling, is email campaign management. That means staying in touch with the customer on the basis of their purchases, their interests, and their profiles.

One of the things we do is see what a certain customer’s buying preferences have been over the past 90 days. Knowing that and the customer’s profile, we can try to predict what their buying patterns will be. So we send them a very tailored message in that regard. In this project, we're dealing with about 150 to 160 million emails a day. So this is definitely big data.
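The 90-day preference lookup described here can be sketched in a few lines. This is only an illustration under stated assumptions: purchases are taken to be simple `(customer_id, category, purchase_date)` tuples, and "buying preference" is reduced to the most frequently purchased category in the window. The function name, the record layout, and the sample data are hypothetical; the actual project's schema and prediction logic are not described in the discussion.

```python
from collections import Counter, defaultdict
from datetime import date, timedelta


def top_category_per_customer(purchases, as_of, window_days=90):
    """Return {customer_id: most-purchased category} over the trailing window."""
    cutoff = as_of - timedelta(days=window_days)
    counts = defaultdict(Counter)
    for customer_id, category, purchased_on in purchases:
        # Keep only purchases inside the trailing window ending at as_of.
        if cutoff < purchased_on <= as_of:
            counts[customer_id][category] += 1
    return {cid: tally.most_common(1)[0][0] for cid, tally in counts.items()}


# Hypothetical purchase history for two customers.
purchases = [
    ("c1", "shoes", date(2015, 7, 1)),
    ("c1", "shoes", date(2015, 7, 20)),
    ("c1", "books", date(2015, 6, 15)),
    ("c1", "books", date(2015, 1, 5)),   # older than 90 days, ignored
    ("c1", "books", date(2015, 1, 6)),   # older than 90 days, ignored
    ("c2", "garden", date(2015, 8, 1)),
]

segments = top_category_per_customer(purchases, as_of=date(2015, 8, 10))
print(segments)
```

The resulting mapping is the kind of per-customer signal a campaign system could use to pick a tailored message. At 150 to 160 million emails a day, this aggregation would run inside the warehouse rather than in application code, which is the scaling problem the rest of the discussion addresses.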

Here we have online information coming into one warehouse as to what's happening in the world of buying and selling. Then, behind the scenes, while that information is being sent to the warehouse, we're trying to do these email campaigns.

This is where the problem becomes fairly complicated. We tried traditional relational database management systems (RDBMS), and they kind of worked, but we ran into a slew of speed and performance issues. That's really where the big-data world was really beneficial. We were able to address that problem in about a seven-month project that we ran.

Gardner: And this was using Vertica?

Large organization

Mohsin: We did an evaluation. We looked at a few databases, and the corporate choice was Vertica. We saw that there is a whole bunch of big-data vendors. The issue is that many of the vendors don't have any large organizations behind them, and Vertica does. The company management felt that this was a new big database, but HP was behind it, and the fact that they also use HP hardware helped a lot.

They chose Vertica. The team I was managing did a proof of concept (POC) and we were able to demonstrate that Vertica would be able to handle the reporting that is tied to the email campaign management. We ran a 90-day POC, and the results were so positive that there was an interest in going live. We went live about another 90 days after the POC.

Gardner: I understand that Vertica is quite versatile. I've heard of a number of ways in which it's used technically. But this email campaign problem almost sounds like a transactional issue, a complex event processing issue, or a transfer agent scaling issue. How does big data, Vertica, and analytics come to bear on this particular problem?

Mohsin: It's exactly what you say it is. As we are reporting and pushing out the campaigns, new information is coming in every half hour, sometimes even more frequently. There's a live feed that's updating the warehouse. While the warehouse is being updated, we want to report against it in real time and keep our campaigns going.

The key point is that we can't really stop any of these processes. The customers who are managing the campaigns want to see information very frequently. We can’t even predict when they would want their information. At the same time, the transactional systems are sending us live feeds.

The problem we ran into with the traditional RDBMS is that the reporting didn't function when the live feeds were underway. We couldn't run our backend email campaign reports when new data was coming in.

One of the benefits Vertica has, due to its basic architecture and its columnar design, is that it's better positioned to do that. This is what we were able to demonstrate in the live POC, and nobody was going to take our word for it.

The end user said, "Take a few of our largest clients. Take some of our clients that have a lot of transactions. Prove that the reports will work for those clients." That's what we did in 30 days. Then, we extended it, and then in 90 days, we demonstrated the whole thing end to end. Following that was the go-live.

Gardner: You had to solve that problem of the live feeds, the rapidity of information. Rather than going through a stop, batch-process, analyze, repeat cycle, you've gained a solution to your problem.

But at the same time, it seems like you're getting data into an environment where you can analyze it and perhaps extract other forms of analysis, in addition to solving your email and eCommerce trajectory issues. It seems to me that you're now going to have the opportunity to add a new dimension of analysis to what's going on and perhaps refine these transactions more toward a customer-inference benefit.

More than a database

Mohsin: One of the things I like to say internally is that Vertica isn't just a big database; it’s more than just a database. It's really a platform, because you have Distributed R, and you are publishing other tools. When we adopted it and went live with this technology, we first solved the feeds and speeds problem, but now we're very much positioned to use some of the capabilities that exist in Vertica.

We had Distributed R being one of them, Inference Analysis being another one, so that we can build intelligent reports. To date, we've been building those outside the RDBMS. RDBMS has no role in that. With Vertica, I call it more of a data platform. So we definitely will go there, but that would be our second phase.

As the system starts to function and deliver on the key use cases, the next stage would be to build more sophisticated reports. We definitely have the requirements and now we have the ability to deliver.

Gardner: Perhaps you could add visualization capabilities to that. You could make a data pool available to more of the constituents within this organization so that they could innovate and do experiments. That’s very powerful stuff indeed.

Is there anything else you can tell other organizations that might be facing similar issues around real-time feeds and the need to analyze and react, now that you have been through this on this particular project? Are there any lessons learned for others?

If you're facing transactional issues and you haven't thought about a big-data platform as part of that solution, what do you offer to them in terms of maybe lighting a light bulb in their mind about looking for alternatives to traditional middleware?

Mohsin: Like so many people try to do, we tried to see if anyone else had done this. One of the issues in big data, at least today, is that you can’t find a whole slew of clients who have already gone live and who are in production.

There are lots of people in development, and some are live, but in our space, we couldn't find anyone who was live. We solved that issue via a quick-hit POC. The big lesson there was that we scoped the POC right. We didn’t want to do too much and we didn’t want to do too little. So that was a good lesson learned.

The other big thing is the data-migration question. Maybe, to some extent, this problem will never be solved. It's not so easy to pull data out of legacy database systems. Very few of them will give you good tools to migrate away from them. They all want you to stay. So we had to write our own tooling. We scoured the market for it, but we couldn’t find too many options out there.

Understand your data

So a huge lesson learned was, if you really want to do this, if you want to move to big data, get a handle on understanding your data. Make sure you have the domain experts in-house. Make sure you have the tooling in place, however rudimentary it might be, to be able to pull the data out of your existing database. Once you have it in the file system, Vertica can take it in minutes. That’s not the problem. The problem is getting it out.
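The "rudimentary tooling" for pulling data out of an existing database can be as simple as dumping each table to a flat file. The sketch below uses Python's built-in `sqlite3` module purely as a stand-in for the legacy system, which is an assumption made so the example is self-contained; the tooling the team actually wrote is not described. The function name `export_table_to_csv` and the `orders` table are hypothetical.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path


def export_table_to_csv(conn, table, out_dir):
    """Dump one table to <out_dir>/<table>.csv with a header row.

    Returns the file path and the number of data rows written.
    """
    # Table names cannot be bound as SQL parameters, so check the catalog first.
    known = {row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")}
    if table not in known:
        raise ValueError(f"unknown table: {table}")

    cursor = conn.execute(f'SELECT * FROM "{table}"')
    path = Path(out_dir) / f"{table}.csv"
    rows = 0
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow([column[0] for column in cursor.description])
        for record in cursor:  # stream rows instead of loading the whole table
            writer.writerow(record)
            rows += 1
    return path, rows


# Stand-in "legacy" database with one hypothetical table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, customer TEXT, total REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, "c1", 19.99), (2, "c2", 5.0)])

out_dir = tempfile.mkdtemp()
csv_path, exported = export_table_to_csv(conn, "orders", out_dir)
print(f"wrote {exported} rows to {csv_path}")
```

Once the data is sitting in files like these, it can be handed to the target database's bulk loader; Vertica, for instance, provides a `COPY` statement for loading delimited files. The hard part, as noted above, is the extraction step, and a real legacy system would need its own driver, type handling, and incremental export logic.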

We continue to grapple with that and we have made product enhancement recommendations. But in fairness to Vertica, this is really not something that Vertica can do much about, because this is more in the legacy database space.

Gardner: I've heard quite a few people say that, given the velocity with which they are seeing people move to the cloud, that obviously isn't part of their problem, as the data is already in the cloud. If the data is in the standardized architecture that the cloud is built around, and there is a platform-as-a-service (PaaS) capability, then getting at the data isn't so much of a problem. Or am I not reading that correctly?

Mohsin: No, you're reading that correctly. The problem we have is that a lot of companies are still not in the cloud. There is still a lingering fear of the cloud. People will tell you that the cloud is not secure. If you have customer information, if you have personalized data, many organizations don't want to put it in the cloud.

Slowly, they are moving in that direction. If we were all there, I would completely agree with you, but since we still have so many on-premise deployments, we're still in a hybrid mode -- some is on-prem, some is in the cloud.

Gardner: I just bring it up because it gives yet another reason to seriously consider cloud. It’s a benefit that is actually quite powerful -- the data access and ability to do joins and bring datasets together because they're all in the same cloud.

Mohsin: I fundamentally agree with you. I fundamentally believe in the cloud and that it really should be the way to go. Going through our very recent go-live, there is no way we could have the same elasticity in an on-prem deployment that we can have in a cloud. I can pick up the phone, call a cloud provider, and have another machine the next day. I can't do that if it’s on-premise.

Again, a simple question of moving all the assets into the cloud, at least in some organizations, will take several months, if not years.

Gardner: Very good. I'm afraid we will have to leave it there. We have been discussing how a specific enterprise in the eCommerce space has solved some unique problems using big data and, in particular, the HP Vertica platform.

That sets the stage for a wider use of big data for transactional problems and live-feed issues. It's also why moving to the cloud has some potential benefits for speed, velocity, and dexterity when it comes to data across multiple data sources and implementations.

So with that, a big thank you to our guest, Jimmy Mohsin, Principal Software Architect at Norjimm LLC, a consultancy based in Princeton, New Jersey. Thanks, Jimmy.

Mohsin: Thanks, Dana. Have a great day.

Gardner: And a big thank you to our audience as well, for joining us for the special new style of IT discussion.

I'm Dana Gardner, Principal Analyst at Interarbor Solutions, your host for this ongoing series of HP-sponsored discussions. Thanks again for listening, and come back next time.

Transcript of a BriefingsDirect discussion on how HP Vertica helps a big-data consultancy scale workloads for ecommerce sites. Copyright Interarbor Solutions, LLC, 2005-2015. All rights reserved.
