Tuesday, December 16, 2008

MapReduce-scale Analytics Change Business Intelligence Landscape as Enterprises Mine Ever-Expanding Data Sets

Transcript of BriefingsDirect podcast on new computing challenges and solutions in data processing and data management.

Listen to the podcast. Download the podcast. Find it on iTunes/iPod. Learn more. Sponsor: Greenplum.

Dana Gardner: Hi, this is Dana Gardner, principal analyst at Interarbor Solutions, and you're listening to BriefingsDirect. Today, we present a sponsored podcast discussion on the architectural response to a significant and fast-growing class of new computing challenges. We will be discussing how Internet-scale data sets and Web-scale analytics have placed a different set of requirements on software infrastructure and data processing techniques.

Following the lead of such Web-scale innovators as Google, and through the leveraging of powerful performance characteristics of parallel computing on top of industry-standard hardware, we are now focusing on how MapReduce approaches are changing business intelligence (BI) and the data-management game.

More types of companies and organizations are seeking new inferences and insights across a variety of massive datasets -- some into the petabyte scale. How can all this data be sifted and analyzed quickly, and how can we deliver the results to an inclusive class of business-focused users?

We'll answer some of these questions and look deeply at how these new technologies will produce the payback from cloud computing and massive data mining and BI activities. We'll discover how the results can quickly reach the hands of more decision makers and strategists across more types of businesses.

While the challenge is great, managing these largest data sets effectively offers deep and powerful new tools for business and for social and economic progress.

To provide an in-depth look at how parallelism, modern data infrastructure, and MapReduce technologies come together, we welcome Tim O’Reilly, CEO and founder of O’Reilly Media, and a top influencer and thought leader in the blogosphere. Welcome, Tim.

Tim O’Reilly: Hi, thanks for having me.

Gardner: We're also joined by Jim Kobielus, senior analyst at Forrester Research. Thank you, Jim.

Jim Kobielus: Hi, Dana. Hi, everybody.

Gardner: Also, Scott Yara, president and co-founder at Greenplum. Welcome, Scott.

Scott Yara: Thank you.

Gardner: We're still dealing with oceans of data, even though we have harsh economic times. We see reduction in some industries, of course, but the amount of data and need for analytics across the Internet is still growing rapidly. BI has become a killer application over the past few years, and we're now extending that beyond enterprise-class computing into cloud-class computing.

I want to go to Jim Kobielus first. Jim, why is this taking place now? What is happening in the world that is simultaneously creating these huge data sets and making better analytics necessary across more businesses?

Kobielus: Thanks, Dana. A number of things are happening, or have been happening, over the past several years, and the trend continues to grow. The data sets for analytics are becoming ever more massive. There's an equivalent of Moore's Law at work, in the sense that every several years the size of the average data warehouse or data mart grows by an order of magnitude.

In the early 1990s or the mid 1990s, the average data warehouse was in gigabytes. Now, in the mid to late 2000s, it's in the terabytes. Pretty soon, in the next several years, the average data warehouse will be in the petabyte range. That’s at least a thousand times larger than the current middle-of-the-road data warehouse.

Why are data warehouses bulking up so rapidly? One key thing is that organizations, especially in tough times when they're trying to cut costs, continue to consolidate a lot of disparate data sets into fewer data centers, onto fewer servers, and into fewer data warehouses that become ever-more important for their BI and advanced analytics.

What we're seeing is that more data warehouses are becoming enterprise data warehouses and are becoming multi-domain and multi-subject. You used to have tactical data marts -- one for your customer data, one for your product data, one for your finance data, and so forth. Now, the enterprise data warehouse is becoming the be-all and end-all -- one hub for all of those sets.

What that means is that you have a lot of data coming together that never needed to come together before. Also, the data warehouse is becoming more than a data warehouse. It's becoming a full-fledged content warehouse -- not just structured relational data, but unstructured and semi-structured data from XML, from your enterprise content management (ECM) system, from the Web, from various formats, and so forth. It's coming together and converging into your warehouse environment. That's like the bottom of the iceberg coming into view: you're seeing it now, and it's coming into your warehouse.

Also, because of the Web 2.0 world and social networking, a lot of the customer and market intelligence that you need is out there in blogs, RSS feeds, and various formats. Increasingly, that is the data that enterprises are trying to mine to look for customers, marketing opportunities, cross-sell opportunities, and clickstream analysis. That’s a massive amount of data that’s coming together in warehouses, and it's going to continue to grow in the foreseeable future.

Gardner: Let’s go to Tim O’Reilly. Tim, from your perspective, what has changed over the past 10 or 20 years that makes these datasets so important?

Long-term perspective

O'Reilly: If you look at what I would call Web 2.0 in a long-term historical perspective, in one sense it's a story about the evolution of computing.

In the first age of computing, business models were dominated by hardware. In the second age, they were dominated by software. What started to happen in the 1990s, underneath everybody's nose, but not understood and seen, was the commodification of software via open industry standards and open source. That started to create new business models around data and, in particular, around network applications that built huge data sets through user participation. That's the essence of what I call Web 2.0.

Look at Google. It's a BI company, based on massive data sets, where, first of all, they are spidering all the activity off of the Web, and that's one layer. Then, they do this detailed analysis of the link structure of that Web, and that's another layer. Then, they start saying, "Well, what else can we find?" They start looking at clickstream data. They start looking at browsing history, and where people go afterward. Think of all the data. Then, they deliver services against that.

That’s the essence of Web 2.0, building a massive data set, doing real-time analytics against it, and then figuring out what services you can deliver. What’s happening today is that movement is transferring from the consumer Web into business. People are starting to realize, "Oh, the companies that are doing better are better with their data."

A great example of that is Wal-Mart. You can think of Wal-Mart as a Web 2.0 company. They've got end-to-end analytics in the same way that Google does, except they're doing it with stuff. Somebody takes something off the shelf at Wal-Mart and rings it up. Wal-Mart knows, and it sends a signal downstream to the supplier.

We need to understand that this move to real-time understanding of data at massive scale is going to become more and more important as the lever of competitive advantage -- not just in computer businesses, but in all businesses. Data warehousing and analytics aren't just something that you do in the back office and it's a nice-to-have. It's the very essence of competitive advantage moving forward.

When we think about where this is going, we first have to understand that everybody is connected all the time via applications, and this is accelerating, for example, via mobile. The need for real-time analytics against massive data sets is universal.

Look at some of the things that are happening on the phone. Okay, where am I? What data is relevant to me right now, because you know where I am? Speech recognition is starting to come into focus on the phone. Again, it's a massive data problem, integrating not only speech recognition, but also local dialects. Oh, wait -- local again. You start to see some cross-connections between data streams that will help you do better.

I was talking with someone from Nuance about why Google is able to do some interesting things in the particular domain of search and speech recognition. It's because they're able to cross-correlate two different data sets -- the speech data set and the search data set. They say, "Okay, yeah, when somebody says that, they are most likely looking for this, because we know that. When they type, they also are most likely looking for that." So this idea of cross-correlation between data sets is starting to come up more and more.

This is a real frontier of competitive advantage. You look at the way that new technologies are being explored by startups. So many of the advantages are in data.

A great example is the company where I'm on the board. It's called Wesabe. They're a personal finance application. People upload their bank statements or give Wesabe information to upload their bank statements. Wesabe is able to do customer analytics for these guys, and say, "Oh, you spent so much on groceries." But, more than that, they're able to say, "The average person who shops at Safeway, spends this much. The average person who shops at Lucky spends this much in your area." Again, it's a massive data problem. That’s the heart of their application.
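[Editor's illustration: the merchant-level comparison O'Reilly describes boils down to a group-and-average over many users' transactions. Here is a minimal sketch in Python, with hypothetical data and no claim to Wesabe's actual implementation:]

```python
# Average spend per merchant across all users -- the kind of aggregate
# a Wesabe-style analytics pass would compute. The transactions are made up.
from collections import defaultdict

transactions = [
    # (user_id, merchant, amount) -- hypothetical bank-statement rows
    ("u1", "Safeway", 82.50),
    ("u2", "Safeway", 61.20),
    ("u1", "Lucky", 45.00),
    ("u3", "Lucky", 52.75),
]

totals, counts = defaultdict(float), defaultdict(int)
for user_id, merchant, amount in transactions:
    totals[merchant] += amount
    counts[merchant] += 1

for merchant in totals:
    print(merchant, round(totals[merchant] / counts[merchant], 2))
# Safeway 71.85
# Lucky 48.88
```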

Now, you think the banks are going to get clued into this and they are going to start to say, "Well, what services can we offer?" Phone companies: "What services can we offer against our data?"

One thing that’s going to happen is the migration of all the BI competencies from the back office to the front office, from being something that you do and generate reports from, to something that you actually generate real-time services from. In order to do that, you've absolutely got to have high performance at massive scale.

Second, a lot of these data sets are not the old-fashioned data sets, where everything was simply structured data.

Gardner: Let’s go to Scott Yara. Scott, we need this transformation. We need this competitive differentiation and new, innovative business approaches by more real-time analytics across larger sets and more diverse sets of content and inference. What’s the approach on the solution side? What technologies are being brought to bear, and how can we start dealing with this at the time and scale that’s required?

A big shift

Yara: Sure. For Greenplum, one of the more interesting aspects of what's going on is that big technology concepts and ideas that have been around for two or three decades are being brought to bear, because of the big shift that Tim alludes to. We are big believers that we're now entering a new cycle, where companies are going to be defined by their ability to capture and make use of the data and the user contributions that come from their customers and community. What makes that possible is finally being able to make parallel computing a reality.

We look at the other major computing trend today, and it’s a very mainstream thing like virtualization. Well, virtualization itself was born on the mainframe well over 30 years ago. So, why is virtualization today, in 2008, so important?

Well, it took this intersection of major trends. You had, as Tim mentioned, the commoditization of both hardware and software, and x86 multi-core machines became incredibly cheap. At the same time, you had a high-level business trend, an industry trend. The rising cost of data centers and power became so significant that CIOs had to think about the efficiency of their data centers and their infrastructure, and what could lower the cost of computing.

If you look at running applications on a much cheaper and much more efficient set of commodity systems and consolidating applications through virtualization, that would be a really compelling thing, and we've seen a multi-billion dollar industry born of that.

You're seeing the same thing here, because business is now driven by Web 2.0, by the success of Google, and by their own use and actions of the Web realizing how important data is to their own businesses. That’s become a very big driver, because it turns out that parallel computing, combined with commodity hardware, is a very disruptive platform for doing large-scale data analysis.

As Google has shown, you can take very, very cheap, off-the-shelf machines and, with the right software, combine hundreds, thousands, or tens of thousands of them into systems that deliver analytics at a scale people couldn't reach before. It's that confluence and that intersection of market factors that's actually making this whole thing possible.

While parallel computing has been around for 30 years, the timing has become such that it’s now having an opportunity to become really mainstream. Google has become a thought leader in how to do this, and there are a lot of companies creating technologies and models that are emblematic of that.

But, at the end of the day, the focus is on software that is purpose-built to provide parallelism out of the box. This allows companies to sift through huge amounts of data, whether structured or unstructured. All the fault tolerance, all the parallelism, all those things that you need are done in software, so that you can choose off-the-shelf hardware from HP, IBM, Dell, and white-box systems. That's a model that's as disruptive a shift as client-server and symmetric multiprocessing (SMP) computing was to the mainframe.
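[Editor's illustration: the "parallelism out of the box" Yara describes is what the MapReduce pattern packages up. Below is a toy, single-process sketch of the pattern in Python -- real systems add distribution across thousands of nodes, a parallel shuffle, and automatic re-execution of failed tasks, all elided here:]

```python
# Toy MapReduce word count, run serially. In a real framework the map calls
# run in parallel on many machines, intermediate pairs are shuffled by key,
# and failed tasks are retried automatically by the software layer.
from collections import defaultdict

def map_phase(doc):
    # Emit a (word, 1) pair for every word in one document.
    return [(word, 1) for word in doc.split()]

def reduce_phase(word, counts):
    # Combine all counts emitted for a single word.
    return word, sum(counts)

docs = ["big data big analytics", "big warehouse"]

grouped = defaultdict(list)          # the "shuffle": group pairs by key
for doc in docs:
    for word, count in map_phase(doc):
        grouped[word].append(count)

print(dict(reduce_phase(w, c) for w, c in grouped.items()))
# {'big': 3, 'data': 1, 'analytics': 1, 'warehouse': 1}
```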

Gardner: Jim Kobielus, speak to this point of moving the analytic results, the fruits of this impressive engine and architectural shift from the back office to the front office. This requires quite a shift in tools. We're not going to have those front-office folks writing long SQL queries. They're not going to study up on some of the traditional ways that we interact with data.

What's in the offing for development, so developers can create applications that target this data, now that it's in a format we can get at and is cross-pollinated across huge, diverse data sets? What's in store for app dev, and what's in store for the people looking for a graphical way in -- the business-strategist type of user?

Self-service paradigm

Kobielus: One thing we're seeing in the front-end app development is, to take Tim’s point even further, it’s very much becoming more of a Web 2.0 user-centric, self-service development paradigm for analytics.

Look at the ongoing evolution of the online analytical processing (OLAP) market, for example. Users are doing self-service development of data mining and advanced analytic applications within their browser and within their spreadsheet. They can pull data from various warehouses and marts, and online transaction processing (OLTP) systems, but in a visual, intuitive paradigm.

That can cache a lot of that information in the front end -- in other words, on the desktop or in the mobile device -- and allow the user to graphically build ever-richer reports and dashboards, and then share that out to the others on their teams. You can build a growing, collective analytical knowledge base that can be shared. That whole paradigm is coming to the fore.

At Forrester, we published a number of reports on it. Recently, Boris Evelson and I looked at the next generation of OLAP technology. One very important initiative to look at is what Microsoft is doing with Project Gemini. They're still working on that, but they demoed it a couple of months ago at their BI show.

The front office -- the actual end users and power users -- are the ones who are going to do the bulk of the BI and analytics application development in this new paradigm. More and more of this development will be offloaded from the traditional high priesthood of data modelers, developers, and data-mining specialists, so they can focus on more sophisticated statistical analysis, and so forth.

The front office will do the bulk of the development. The back office -- in other words, the traditional IT data-modeling professionals -- will be there. They'll be setting the policies and they'll be providing the tooling that the end users and the power users will use to build applications that are personalized to their needs.

So IT then will define the best practices, and they'll provide the tooling. They'll provide general coaching and governance around all of the user-centric development that will go on. That’s what’s going to happen.

It's not just Microsoft. Look at the more user-centric, in-memory, spreadsheet-centric OLAP tooling that IBM Cognos, Oracle, and others are rolling out or have already rolled out in their product sets. This is where it's all going.

Gardner: Tim O’Reilly, in the past, when we've opened up more technological power to more people, we've often encountered much greater innovation, unpredictably so. Should we expect some sort of a wisdom-of-crowd effect to come into play, when we take more of these data sets and analytic tools and make them available?

O'Reilly: There's a distinction between the wisdom of crowds and collective intelligence. The wisdom-of-crowds thesis, as expounded by Surowiecki, is that if you get a whole bunch of people independently, really independently, to weigh in on some subject, their average guess is better than any individual expert's. That’s really about a certain kind of quantitative stuff.

But, there's also a machine-learning approach in which you're not necessarily looking for the average, but you're finding different kinds of meaning in data. I think it’s important to distinguish those two.

Google realized that there was meaning in links that every other search engine of the day was throwing away. This was a way of harnessing collective intelligence, but it wasn’t just the wisdom of crowds. This was actually an insight into the structure of the data and the meaning that was hidden in it.

The breakthroughs are coming from the ability of people to discern meaning in data. That meaning sometimes is very difficult to extract, but the more data you have, the better you can be at it.

A great example of this recently is from the last election. Nate Silver, who ran 538.com, was uncannily accurate in calling the results of the election. The reason he was able to do that was that he looked at everybody's polls, but didn't just say, "Well, I'm just going to take the average of them." He used all kinds of deep thinking to understand, "Well, what's the bias in this one? What's the bias in that one?" And, he was able to develop an algorithm in which he weighted these things differently.
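[Editor's illustration: at its simplest, the weighting O'Reilly describes is a weighted average in which each poll's influence reflects its estimated reliability. A stripped-down sketch with invented numbers -- not Silver's actual model:]

```python
# Hypothetical polls as (candidate_share_percent, weight); the weights stand
# in for per-pollster adjustments such as house effect, sample size, recency.
polls = [(52.0, 0.9), (48.5, 0.4), (51.0, 0.7)]

meta = sum(share * w for share, w in polls) / sum(w for _, w in polls)
print(round(meta, 2))  # 50.95 -- one number summarizing the meta-poll
```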

Gardner: I suppose it’s important for us to take the ability to influence the algorithms that target these advanced data sets and put them into the hands of the people that are closer to the real business issues.

More tools are critical


O'Reilly: That’s absolutely true. Getting more tools for handling larger and more complex data sets, and in particular, being able to mix data sets, is critical.

One of the things that Nate did that nobody else did was that he took everybody’s polls and then created a meta-poll.

Another example is really interesting. You guys are probably familiar with the Netflix Challenge, where Netflix has put up a healthy sum of money for whoever can improve its recommendation algorithm by 10 percent. What's interesting is that people seem to be stuck at about 8 percent, and they haven't been able to get the last couple of percent.

It occurred to me in a conversation I was having last night that the breakthroughs will come, not by getting a better algorithm against the Netflix data set, but by understanding some other data set that, when mixed with the Netflix data set, will give better predicted results.

Again, that tells us something about the future of data mining and the future of business intelligence. It is larger, more complex, and more diverse data sets in which you are able to extract meaning in new ways.

One other thing. You were talking earlier about the democratization of these tools. One thing I don’t want to pass by is a comment that was made recently by Joe Hellerstein, who is a computer science professor at UC Berkeley. It was one of those real wake-up-and-smell-the-coffee moments. He said that at Berkeley, every freshman student in CS is now being taught Hadoop. SQL is an elective for seniors. You say, "Whoa, that is a fundamental change in our thinking."

That’s why I think what Greenplum is doing is really interesting, trying to marry the old BI world of SQL with the new business intelligence world of these loose, unstructured data sets that are often analyzed with a MapReduce kind of approach. Can we bring the best of these things together?
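[Editor's illustration: one way to picture that marriage is the same question asked declaratively over a structured table and procedurally over loose log lines. A toy Python sketch -- the SQL appears as a comment, and none of this is Greenplum's actual interface:]

```python
# Declarative form, against a structured table:
#   SELECT domain, COUNT(*) FROM weblog GROUP BY domain;
#
# MapReduce-style form, against raw log lines:
from collections import defaultdict
from urllib.parse import urlparse

log_lines = [
    "GET http://example.com/a 200",
    "GET http://example.com/b 404",
    "GET http://other.org/x 200",
]

def map_fn(line):
    yield urlparse(line.split()[1]).netloc, 1  # emit (domain, 1)

counts = defaultdict(int)
for line in log_lines:
    for domain, n in map_fn(line):
        counts[domain] += n                    # reduce: sum per key

print(dict(counts))  # {'example.com': 2, 'other.org': 1}
```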

That fits with this idea of crossing data sets being one of the new competencies that people are going to have to get better at.

Kobielus: If I can butt in here just one moment, I want to tie together something Tim just said and something I said a little bit earlier. One important thing is that when you add more data sets to your analytic environment, it gives you the potential to see more cross-correlations among different entities or domains. That's one of the value props for an all-encompassing or multi-domain enterprise data warehouse.

Before, you had these subject-specific marts -- customer data here, product data there, finance data there -- and you didn't have any easy way to cross-correlate them. When you bring them all together into a common repository, implementing common dimensions and hierarchies, and conforming to common metadata, it makes it a whole lot easier for the data miners, the power users, and the end users to build applications that tie it all together.

There's the "aha" moment: "Aha, I didn't realize all these things hooked up in these various ways." You can extract more meaning by bringing it all together into a unified enterprise data warehouse.
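[Editor's illustration: once the marts share a conformed customer key, the cross-correlation Kobielus describes reduces to a join. A hypothetical Python sketch with dictionaries standing in for warehouse tables:]

```python
# Three formerly separate subject-area marts, now sharing one customer key.
customer_mart = {"c1": {"name": "Acme Corp", "segment": "enterprise"}}
product_mart  = {"c1": {"top_product": "Widget X"}}
finance_mart  = {"c1": {"lifetime_value": 125000}}

# The unified view that separate marts made hard to assemble.
for cust_id, profile in customer_mart.items():
    unified = {**profile,
               **product_mart.get(cust_id, {}),
               **finance_mart.get(cust_id, {})}
    print(cust_id, unified)
```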

Gardner: To you, Scott Yara. There's a great emphasis here on bringing together different data sets from disparate sources, with entirely different technologies underlying them. It's not a trivial problem. It’s not a matter of scale necessarily.

What do you see as the potential? What is Greenplum working on to allow folks to mix and match in such a way that the analytics can be innovative and game-changing in a harsh economic environment?

Price/performance improvement

Yara: A couple of things. One, I definitely agree with the assertion that analysis gets easier the more data you have. Whether those are heterogeneous data sets or just a greater scale of data that people can collect, it's fundamentally easier and cheaper.

In general, these businesses are pretty smart. The executives, analysts, and people driving the business know that their data is valuable, and that insight into improving the customer experience through data is key. It's just been really hard and expensive, and that has made it prohibitive for a long, long time.

Now, we're talking about using parallel computing techniques, open-source software, and commodity hardware. It’s literally a 10- to 100-fold improvement in price performance. When the cost of data analysis comes down 10 to 100 times, that’s when new things become possible.

O'Reilly: Absolutely.

Yara: We see lots of customers now from the New York Stock Exchange. These are all businesses that are across vertical industries, but are all affected by the Web and network computing at some level.

Algorithmic trading is driving financial services in a way that we haven’t seen before. They're processing billions of trades every day. Whether it's security, surveillance, or real-time support that they need to provide to very large trading companies, that ability to mine and sift through billions of transactions on a real-time basis is acute.

We were sitting down with one of our large telecom customers yesterday, and there was this convergence that Tim’s talking about. You've got companies with very large mobile carrier businesses. They're broadband service providers, fixed-line service providers, and Internet companies.

Today, the kind of basic personalization that companies like Amazon, eBay, or Google do, telecom carriers are just at the beginning of trying to do that. They have to aggregate the consumer event stream from all these disparate communication systems, and it’s at massive scale.

Greenplum is solely focused on making that happen and mixing the modalities of data, as Tim suggested. Whether it’s unstructured data, whether those are things that exist in legacy databases, or whether you want to mix and match SQL or MapReduce, fundamentally you need to make it easy for businesses to do those things. That’s starting to happen.

Gardner: I suppose part of the new environment that we are in economically is that incremental change is probably not going to cut it. We need to find new forms of revenue and be able to attain them at a very low cost, upfront if possible, and be transformative in how we can take our businesses out through the public networks to reach more customers and give them more value.

Now that we've established that we have these data sets, we can combine them to a certain degree, and that will improve over time. What are the ways in which companies can start actually making money in new ways using these technologies?

Apple’s Genius comes to mind for me as a way of saying, "Okay, you pick a song in your iTunes library, and we're going to use our data and our analytics, and come back with some suggestions on what you might like as a result of that." Again, this is sort of a first go at this, but it opens my eyes to a lot of other types of business development opportunities. Any thoughts on this, Tim O’Reilly?

O'Reilly: In general, as I said earlier, this is the frontier of competitive advantage. Sure, iTunes has Genius, but it's the same thing with Netflix recommendations. Amazon has been doing this for years. It's part of their competitive advantage. I mentioned earlier how this is starting to be a force in areas like banking. Think about phone companies and all of the opportunities for new local services.

Not only that, one of my pet hobbyhorses is that phone companies have this call-history database, but they're not building new services for users against it. Your phone still only remembers the last few people you called. Why can't I do a search against somebody I talked to three months ago? "Who the heck was that? Was it the guy from this company?" You should be able to search that. They've got the data.

So, as I said earlier, the frontier is turning the back office into new user-facing services, and having the analytics in place to be able to do that meaningfully at scale in real-time. This applies to supply chains. It applies to any business that has data that gets better through user interaction.

This is the lesson of the Web. We saw it first in Web applications. I gave you the example earlier of Wal-Mart. They realized, "Oh, wait a minute. Every time somebody buys something, it’s a vote." That’s the same point that Wesabe is trying to exploit. A credit card statement is a voting list.

I went to this restaurant once. That doesn't necessarily mean anything. If I go back every week, that may mean something. I spend on average this much, and it's going up; that means something. I spend on average this much, and it's going down; that means something. So, finding meaning in the data I already have: how could it be useful not just to me, but to my users and my customers, and what services could I build against it?

This is the frontier, particularly in the world that we are entering, in which computing is going mobile, because so many of the mobile services are fundamentally going to be driven by BI. You need to be able to say in real-time or close to real-time, "This is the relevant data set for this person based on where they are right now."

Needed: future view


Kobielus: I want to underline what Tim just said. Traditionally, data warehouses existed to provide you with perfect hindsight on the customer -- historical data, massive historical data, hopefully on the customer, and that 360 degree view of everything about the customer and everything they have ever done in the past, back to the dawn of recorded time.

Now, it’s coming down to managing that customer relationship and evolving and growing with that relationship. You have to have not so much a past or historical view, but a future view on that customer. You need to know that customer and where they are going better than they know themselves.

In other words, that’s where the killer app of the online recommendation engine becomes critical. Then, the data warehouse, as the platform for recommendation engines, can take both the historical data that persists, but also can take the continuing streams of real-time event data on pricing, on customer interaction in various channels -- be it on the Web or over the phone or whatever -- customer transactions that are going on now, and things and events that are going on in the customer social network.

Then, you feed that all into a recommendation engine, which is a predictive-analytics model running inside the data warehouse. That can optimize that customer's interaction at every touch point. Let's say they're dealing with a call-center person live. The call-center person knows exactly how the world looks to that customer right now and has a really good sense of what that customer might need now, or might need in three months, six months, or a year, in terms of new services or products, because other customers like them are doing similar things.

It can have recommendations being generated and scripted for the call-center agent in real-time, saying, "You know what we think? We recommend that you upgrade to the following service plan, because it provides you with these features that you will find useful in your lifestyle, blah, blah, blah."
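[Editor's illustration: stripped to its core, the engine Kobielus describes scores candidate offers against both persisted history and real-time signals, then scripts the best one for the agent. A hypothetical sketch -- the features and weights are invented:]

```python
# Score candidate offers for one caller, mixing warehouse history with
# the real-time event stream, and hand the winner to the call-center agent.
customer = {
    "tenure_years": 4,        # historical, persisted in the warehouse
    "support_calls_30d": 3,   # real-time event feed
    "data_overage_gb": 2.5,   # real-time event feed
}

offers = {
    "unlimited_data_plan": 1.2 * customer["data_overage_gb"],
    "loyalty_discount": 0.5 * customer["tenure_years"]
                        - 0.3 * customer["support_calls_30d"],
}

best = max(offers, key=offers.get)
print(f"Script for agent: suggest {best} (score {offers[best]:.2f})")
```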

In other words, it's understanding the customer in their future, in their possible future, and suggesting things to the customers that they themselves didn’t realize until you suggested them. That’s the future of analytics, and competitive advantage.

O'Reilly: I couldn’t agree more.

Gardner: Scott Yara, we've been discussing this with a little bit of a business-to-consumer (B2C) flavor. In the business-to-business (B2B) world many things are equal in a commoditized market, with traditional types of products and services.

An advantage might be that, as a supplier, I'm going to give you analytics that I can derive from data sets that you might not have access to. I might provide analytical results to you as a business partner free of charge, but as an enticement for you to continue to do business with me, when I don’t have any other way to differentiate. What do you see are some of the scenarios possible on the B2B side?

Yara: You don’t have to look much further than what Salesforce.com is doing. In a lot of ways, they're pioneering what it means to be an enterprise technology company that sells services, and ultimately data, back to their customers. By creating a common platform, where applications can be built, they are very much thinking about how the data is being aggregated on the platforms in use, not by their individual customers, but in aggregate.

You're going to see lots of cases where for traditional businesses that are selling services and products to other businesses, the aggregation of data is going to be interesting and relevant. At the same time, you have companies where even the internal analysis of their data is something they haven’t been able to do before.

We were talking about Google, which is an amazing company. They have this big vision to organize the world's information. What the rest of the business world is finding out is that, while it's a great vision and Google has a lot of data, they only have a small fraction of the overall data in the world. Telecommunications companies, financial exchanges, and retail companies have all of this real-world data that's not being indexed or organized by Google. These companies actually have access to amazing amounts of information about customers and businesses.

They are saying, "Why can't we, at the point of interaction -- like eBay, Amazon, or some of these recommendation engines -- start to take some of this aggregate information and turn it into improving our businesses in the way that the Web companies have done so successfully?" That's going to be true for B2C businesses, as well as for B2B companies.

We're just at the beginning of that. That’s fundamentally what’s so exciting about Greenplum and where we're headed.

Gardner: Jim Kobielus, who does this make sense for right away? Some companies might be a little skeptical. They're going to have to think about this. But where is the low-hanging fruit? Where are the no-brainer applications for this approach to data and analytics?

Kobielus: No-brainers -- I always hate that term. It sounds like I'm being condescending, but low-hanging fruit should be one of those "aha!" opportunities that everybody realizes intuitively. You don't have to explain it to them, so in a sense it's a no-brainer. It's the call center -- the customer-contact center.

The customer-contact center is where you touch the customer, and where you hopefully initiate, cultivate, nurture, maintain, and grow the customer relationship. It's one of the many places where you do that. There are people in your organization who are in that front-line capacity.

It doesn’t have to be just people. It could be automated programs through your Website that need to be empowered continuously with the full customer context -- the history of that customer's interactions, the customer’s current state, current sentiment and feelings, and with a full context on the customer’s likely future evolution. So, really it's the call center.

In fact, I cover data warehousing for Forrester. I talk to the data warehousing vendors and their customers about in-database analytics, where they are selling this capability right now into real-world deployments. The customer call center is, far and away -- with a bullet -- the number one place for inline analytics to drive the customer interaction in a multi-channel fashion.

Gardner: How about you, Tim O'Reilly? Where are some of the hot verticals and early adopters likely to be on this?

O'Reilly: I've already said several times, mobile apps of various kinds are probably highest on the list. But, I'm a big fan of supply chain. There's a lot to be done there, and there's a huge amount of data. There already is a BI infrastructure, but it hasn’t really been tuned to think about it as a customer-facing application. It's really more a back-office or planning tool.

There are enormous opportunities in media, if you want to put it that way. If you think about the amount of money that’s spent on polling and the power of integrating actual data, rather than stated preference, I think it's huge.

How do we actually figure out what people are going to do? There's a great marketing study -- I forget who told this story -- about a consumer product, a boom box or something like that. They showed examples of different colors.

They said, "How many of you think white is the cool color, how many of you think black, how many, blah, blah, blah?" All the people voted, and then they had piles of the boom boxes by the door that the people took as their thank you gift. What they said and what they did were completely at variance.

One of the things that’s possible today is that, increasingly, we are able to see what people actually do, rather than what they say they will do or think they will do.

Gardner: We're just about out of time. Scott Yara, what’s your advice for those folks who are just getting their heads wrapped around this on how to get started? It’s not a trivial activity. It does require a great deal of concerted effort across multiple aspects of IT, perhaps more so than in the past. How do you get started, what should you be doing to get ready?

Yara: That's one of the real advantages. In a sort of orthogonal way, creating new businesses online in the age of Web 2.0 has become fundamentally cheaper and faster, and doing something disruptive inside a business with its data has to be fundamentally cheaper and easier too. So, not starting with the big vision of where they need to go, but with something tactical -- whether it lives in the call center or in some departmental application -- is the best way to get going.

There are technologies, services, and people now such that you can peel off a real project and deliver real value right away.

I agree with Tim. We're going to see a lot of activity in the mobility and telecommunications space. These companies are just realizing this. Think about the kind of personalization you get with almost every major Internet site today. What level of personalization do you get from your carrier, relative to how much data they have? You're going to see lots of telecom companies do things with data that will have real value.

One of our customers was saying that in the traditional, old data warehousing world, where it was back office, the service level agreement (SLA) was that when a call got placed and logged, it just needed to make its way into the warehouse seven days later. Seven days from the point of origination of a call, it would make its way into a back-office warehouse.

Those are the kinds of things that are going to change, if we're going to really provide mobility, locality, and recommendation services to customers.

It's having a clear idea of the first application that can benefit from data. Call centers are going to be a good area -- providing the service representative with a profile of the customer and being able to change the experience. I think we're going to see those things.

So, they're tractable problems. Not being able to start small is what held back enterprise data warehousing before, when companies were looking at huge investments of people, capital, and infrastructure. I think that's really changing.

Gardner: I am afraid we have to leave it there. We've been discussing new approaches to managing data, processing data, mixing data types and sets, and extracting real-time business results from that. We've looked at tools and we've looked at some of the verticals in business advantages.

I want to thank our panel. We've been joined today by Tim O'Reilly, the CEO and founder of O'Reilly Media. Thank you, Tim.

O'Reilly: Glad to do it.

Gardner: Jim Kobielus, Forrester senior analyst. Thank you, Jim.

Kobielus: Dana, always a pleasure.

Gardner: Scott Yara, president and co-founder of Greenplum. Appreciate it, Scott.

Yara: Great. Thanks everybody.

Gardner: This is Dana Gardner, principal analyst at Interarbor Solutions. You've been listening to a sponsored BriefingsDirect podcast. Thanks, and come back next time.

Listen to the podcast. Download the podcast. Find it on iTunes/iPod. Learn more. Sponsor: Greenplum.

Transcript of BriefingsDirect podcast on new computing challenges and solutions in data processing and data management. Copyright Interarbor Solutions, LLC, 2005-2008. All rights reserved.

Thursday, October 02, 2008

Interview: HP's John Santaferraro on Latest BI Modernization and Data Warehousing Strategies

Transcript of BriefingsDirect podcast recorded at the Oracle OpenWorld Conference in San Francisco the week of Sept. 22, 2008.

Listen to the podcast. Download the podcast. Find it on iTunes/iPod. Learn more. Sponsor: Hewlett-Packard.

Dana Gardner: Hi, this is Dana Gardner, principal analyst at Interarbor Solutions, and you're listening to a special BriefingsDirect podcast recorded at the Oracle OpenWorld conference in San Francisco. We are here the week of Sept. 22, 2008. This HP Live! Podcast is sponsored by Hewlett-Packard (HP) and distributed through the BriefingsDirect Network.

We welcome John Santaferraro, director of marketing for HP's Business Intelligence (BI) portfolio. We're going to be talking about the intersection of BI, in the context not just of business value and outcomes, but of Oracle, a major data, applications, middleware, and BI provider, and HP, a prominent systems provider as well as a prominent BI services provider.

We're going to try to figure out how this plays together. Then, we'll look toward the future of BI in the context of some major trends, such as service-oriented architecture (SOA), master data management (MDM), and bringing more automation to the delivery of intelligence from systems and data to those users who need it at the front lines of business. So I want to welcome John Santaferraro to the show.

John Santaferraro: Glad to be here, Dana. Thanks.

Gardner: First, let's set the stage and get a level-set about the Oracle-HP relationship vis-à-vis BI, because we're here at Oracle OpenWorld. Oracle is in the software side of things predominantly. You’ve got both systems and services. Perhaps you could paint a picture of how this fits together.

Santaferraro: It's been a great and long relationship that we've had with Oracle, since they were first building and releasing a database. We had folks in our labs who understood this idea of databases and data warehousing, and they were building and architecting our systems in a special way, with things like massive I/O and massive memory -- the kinds of things you need in a data warehouse and query environment.

Back in those days, we were actually building our systems to handle data warehouse workloads, when everybody else was still focused only on the regular online transaction processing (OLTP) kinds of transactions in the enterprise resource planning (ERP) systems.

Because of that natural connection that we had with what was going in our labs, and what Oracle was doing, we have from the very start built a tight relationship with them from an engineering perspective and a good market perspective. Oracle is very clearly a leader in data warehousing and BI, and we augment that with the systems that we have developed to run in an optimized way with Oracle, as well as some other services that we bring to bear.

We recently bought a company called Knightsbridge, which was known as the go-to company for anybody who was doing data warehousing or BI and who ran into problems that nobody else could solve. Everybody knew that if you went to Knightsbridge, there were people there who could solve those problems. So it’s great to have them at the center of our global BI services organization. This company has taken their methodology and their expertise and has transferred it to folks around the world.

The other great thing about the acquisition of Knightsbridge is that they have real, deep expertise in various vertical markets -- health and life sciences, communications, financial services, retail, and manufacturing. Because of that, the Oracle-HP relationship is strengthening.

We are more than a systems provider and more than a services provider. We are delivering real solutions to our customers. We can come alongside of anybody, talk to them at the level of the business, and be able to build data warehousing and BI solutions that are mapped to the business, not just technology.

Gardner: I just got back from listening to Thomas Kurian at Oracle describe their full portfolio, and they've really put together quite a full lifecycle approach around gathering, cleansing, and organizing data, integrating it from disparate sources, managing the scale of huge loads, and bringing it closer to real-time value. They're also offering middleware for application integration, creating the BI analytics, and then delivering that back out to the business applications.

It's quite an impressive portfolio. They've been putting it together for quite some time, and they're also quite proud of the metrics around performance and getting closer to that real-time nirvana. Tell us a little bit about what Oracle has done from the lifecycle perspective, and what you think are the important aspects on the services side of making organizations readily able to exploit those technologies.

Santaferraro: What you described is very much the product lifecycle in the data warehouse and BI space. Alongside that product lifecycle, there is actually a system lifecycle as well. Anytime anybody tells me they can make data warehousing simple, I react, because the truth is that it's very complex.

The processes you just described are extremely difficult for any company to work with and navigate through. Add to that the whole infrastructure piece of it. The more you move toward "operationalizing" BI, the more important the infrastructure becomes.

A lot of the time, we get calls from customers who are trying to deploy data warehousing solutions. They'll be in test and development, where the system is supposed to perform, and they've got users out there who expect to click on a button and get all of the information back within a matter of seconds, and they can't figure out how to make it work.

So they call the HP storage folks and they say, "Hey, we’ve got a storage problem. What’s going on here?" And, the storage folks say, "Well, wait a minute, it's not storage. That sounds like the database." So, they call Oracle, and Oracle says, “Well, that’s not us. It’s not the database. It must be a server problem.” So the customer has to go back to the server guy. We have people that will lose weeks of time in deploying their systems, because the entire lifecycle is extremely complex.

What we really do is look at how we can come alongside Oracle in our labs and figure out how to build those systems with Oracle pre-installed, pre-configured, and pre-tested, so that what the customer is getting is ready to go out of the box. It takes the guesswork out of all of this implementation and development that they've got to do.

I had one customer who lost a week in test and dev, then went into production and hit the same exact little thing. They forgot to turn on asynchronous I/O on their storage system. It's just a basic little problem, but it cost them another week of production time before they were up and running.

So, we’ve got solutions like HP BladeSystem for Oracle Optimized Warehouse. We have about 50 reference configurations that help take the guesswork out of deploying these.

Gardner: This is really more than just one hand washing the other. This is three hands washing each other. We have the systems integration and specialized software, which is created through products, integration, and technology innovation, and then the opportunity for that third hand of services to come in with methodologies and best practices, for preventing those gotchas.

Santaferraro: Exactly. And then, on the services side, there are people who have walked this path before. They've done it before. My recommendation to companies who are out there trying to do BI and data warehousing and are hitting difficulties is, "Why not go find somebody who has done it before?"

You really don’t have to do it alone. There are people out there who have walked this path. They’ve done it. They know the gotchas. They have accelerators. They have ways of making it all come together faster. And all of that translates into more business value. If I don’t have to spend as much time in deployment, as much time in all of the testing and trying to figure out what is wrong, then I can be investing my time and my effort in developing real business innovation and real business value.

Gardner: And, of course, in the field there are many different companies at different places on the path toward some of these goals. For those that are deeply into BI and recognize the value of getting this lifecycle right -- elevating the data, getting good, quality data out, and then being able to work with it -- what's the next step?

I'm hearing some buzzwords nowadays, like operational BI and even BI modernization. Tell me a little bit about what these mean. Are these, in fact, the next chapters in where companies will take this capability?

Santaferraro: Yes, these are definitely the next chapters. Right now, probably about five percent of companies out there -- the ones on the leading or bleeding edge -- are already doing operational BI and BI modernization.

Operational BI has to do with this idea that I have all of this data in a single place, it's accessible, and it's fairly well cleaned. I don't think anybody has perfectly clean data -- that doesn't exist -- but once it's there, what do I do with it?

We're finding that customers want to do two things. One, they want to get that information to everyone across the organization, as well as customers and partners, and they want it to be actionable. So how do I get actionable information in the hands of everyone across my organization who needs it?

The second thing I see people wanting to do with operational BI is take the analytics that drive their systems and embed them in business processes or business applications. When a loan comes in to be underwritten, you want to have the right rules in place, so that as a bank you don't end up with a bunch of loans that you can't sell in the secondary market, or that go into default. Everybody is aware of that problem, right?

How do you take the analytics and discovery that you've made and put them right in the applications, so the decision is automatically made by the application, or so somebody has it right there? As they're using the business application, they have the information to make the decision right there at their disposal.
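[Editor's illustration: a bare-bones example of embedding an analytics-derived rule in the application itself, in the spirit of the loan-underwriting case above. The thresholds are invented, not any bank's actual policy:]

```python
# The decision rule runs inside the loan application at the point of
# interaction, rather than in a back-office report after the fact.
def underwrite(credit_score, debt_to_income):
    if credit_score >= 680 and debt_to_income <= 0.43:
        return "approve"
    if credit_score >= 620:
        return "refer to underwriter"
    return "decline"

print(underwrite(credit_score=700, debt_to_income=0.35))  # approve
```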

Gardner: And is that what you call operational BI?

Santaferraro: Yes.

Gardner: Now, this also raises in my mind a question about the capabilities that a service-oriented architecture (SOA) offers -- governance, bringing services like BI as a service into play with applications, but at the right point in time. So it's exercising governance policy, learning from your mistakes, and building on them. How does what you're describing as operational BI fit together with SOA?

Santaferraro: It's a great question, because when I hear people talking about SOA, I primarily hear them talking about business services. How do I take these mammoth applications that I've built, reduce them into reusable business services, and use them effectively across the organization, instead of replicating them all over? The real opportunity comes when you have these business services in operation and you begin to bring in information services as well. Take customer profitability, for example. That's not really a business service. It's an information service.

A lot of analysis has to go into the mix for companies to figure out and answer the question, "Who are my most profitable customers?" If you can figure that out and give every customer a rating, then that information service becomes a service within an SOA that you can actually use and distribute in a very useful way all across the organization. You can send it to the call center, send it to the sales force, send it to the Web, and send it to the ATM transactions that are happening. There's a whole opportunity around information services as a part of SOA that hasn't even begun to be tapped.
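[Editor's illustration: a customer-profitability "information service" might be little more than an analytic rating wrapped behind one reusable call, so the call center, the Web tier, and the ATM channel all ask the same question the same way. A hypothetical Python sketch -- the formula and tiers are invented:]

```python
# An information service within an SOA: one profitability rating, computed
# from warehouse analytics, consumed identically by every channel.
def customer_profitability(revenue_12m, cost_to_serve_12m):
    margin = revenue_12m - cost_to_serve_12m
    if margin > 10000:
        return "A"            # most profitable tier
    if margin > 1000:
        return "B"
    return "C"

# Call center, Web, or ATM -- any consumer makes the same call:
print(customer_profitability(revenue_12m=25000, cost_to_serve_12m=9000))  # A
```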

Gardner: It’s sort of the intelligent implementation of BI as a service?

Santaferraro: Absolutely.

Gardner: How does that differ from BI modernization?

Santaferraro: Modernization is built around the fact that folks started doing data warehousing 15 to 20 years ago. It's a fairly old technology, yet it's still very useful. It's still something that companies need to do, but a lot of new technology has come in, along with new kinds of data. We discovered that data warehousing had great value. It put all the information in a single place. It made information accessible. You could now do analysis.

Gardner: But it was largely structured data.

Santaferraro: Exactly. Now we have other kinds of data coming. What about email? What about document management systems, and all the documents that are being digitized? What about new types of data like RFID? What about GPS data? There are all these new types of data, and we're now rethinking what actually belongs in the data warehouse.

It's a great value for BI, but not everything has to go into the data warehouse. In fact, we’ve discovered with a lot of our customers that as soon as the data warehouse gets to a terabyte, about 70 percent of the data in that data warehouse never even gets touched or used.

So companies are spending enormous amounts of money to build these massive data warehouses, and a lot of that is not being used. Modernization is about figuring out what data needs to go into the data warehouse and what needs to be delivered through the enterprise service bus (ESB). Are there certain things where you can just embed analytics out at the application layer and do the analytics out there? Are there other types of data that should be just cataloged at the user level?
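[Editor's illustration: the modernization decision being described -- what lands in the warehouse, what flows over the ESB, what is analyzed at the application layer or merely cataloged -- can be pictured as a routing rule over incoming data. Purely illustrative; the categories are hypothetical:]

```python
# Hypothetical routing of incoming data in a modernized BI architecture.
def route(record_type, queried_often):
    if record_type == "event":
        return "ESB"              # deliver through the enterprise service bus
    if record_type in ("document", "email"):
        return "catalog"          # index with metadata; don't load the warehouse
    if queried_often:
        return "data warehouse"   # structured data that analysts actually hit
    return "archive"              # the untouched ~70 percent stays out

print(route("event", False))      # ESB
print(route("sales_fact", True))  # data warehouse
```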

Gardner: Metadata, for example?

Santaferraro: Yes, and metadata becomes the rich set of definitions around that content that actually brings it all together for the sake of the user.

Gardner: Regardless of where it resides?

Santaferraro: Exactly, and that becomes active metadata by the way. It’s no longer just this metadata that sits below for the data folks to understand what’s there. It’s active metadata that the users are using to understand the information that they're looking at.

Gardner: I suppose that, over time, that’s going to also include events?

Santaferraro: Absolutely, events and then tie right into the new complex event processing (CEP) systems. One of the opportunities that I’ve not seen tapped into by any software companies is this whole new world of information delivery.

So, if you're operationalizing BI, and you've got a modernized BI infrastructure with data provisioning in place -- not just the data warehouse -- you're basically trying to get information out to all these users across the enterprise and embed it in business processes. There needs to be the design of a brand-new information-delivery system that can handle all of these kinds of data, to the desktop, to the application, to the handheld device, or wherever it's needed.

Gardner: Without belaboring this point, what sort of technologies are you looking at? Is this syndication, publish-and-subscribe, terminal services? What do you use to get that out there?

Santaferraro: I would say, yes. Because, as I said, I haven’t seen anybody that’s done it yet.

Gardner: Good, a big opportunity there. Okay. We've talked about this modernization of BI. It's happening in the context of other trends, of course, like virtualizing our data centers, and a lot has been done to virtualize storage and data over time.

We're going to be bringing in more kinds of content. We might even be getting content and services off of clouds -- other people's public services, or perhaps a cooperative private federation among business partners -- all of which have to be managed and accurately projected back into the application services and processes that people use. It sounds very interesting, and it's a much easier sell to the C-class, the corner office in the organization, because this really helps them in the way they do business.

What can companies do in terms of exploiting these technologies, getting those business outcomes, and, I suppose most importantly, how do they get started? As you say, this is not trivial. It’s complex and needs to be done properly.

Santaferraro: Most companies have already started down this road with BI and data warehousing. What I hear a lot of customers say is that they either aren't getting the value out of the investment they're putting into it, or they don't know whether they are. So it really makes sense to pause where you're at and bring in some experts to do an assessment.

We do a lot of work with customers. We look at the vision, the strategy, and the planning behind data warehousing and BI, and because of our depth of experience, we can come alongside our customers, help them figure out what's working and what's not, put a value on where to invest moving forward, and help drive that forward in an intelligent way. Why not do BI with some intelligence behind it?

That's one thing. The second thing is that, with operational BI on the horizon, we've got a lot of folks within our organization who understand the potential of what could be done with BI in, say, a bank. What if you could have customer profitability, customer segmentation services, and offer optimization at every point of sale? For the teller, for the ATM, for the call center -- wherever somebody is interacting with the bank -- all of that information is right there with them.

What we find is that people have been caught up in the world of reporting and the basic analytics and online analytical processing that take place in the back room. We think it also makes sense to move to this next level. Bring in some folks who understand operational BI, and let's dream together: if you could actually have these capabilities, what could you do with your company? How could you transform your relationships with your customers and your suppliers?

It's basic vision, strategy, and planning, too. Let's get together, dream about operational BI, and figure out what your company could become. We actually believe that in the next five to seven years there is going to be a major restructuring of leaders in every single industry. The ones who come out on top are going to be the companies that figure out how to use BI to transform themselves into competitive leaders.

We want to be there with our customers to make that happen for them.

Gardner: And this is not just for them to find new markets, but to uncover risks that they wouldn't otherwise have seen until it was too late -- and we've seen examples of that -- and perhaps to focus on which businesses are the right ones to be in, and which are not? So it's not just how to make things better; it's also risk mitigation on what to avoid?

Santaferraro: Absolutely.

Gardner: Very good. We’ve been talking about BI and some of the next chapters in BI, particularly in a context of a longstanding partnership between Oracle and HP. We’ve been joined by John Santaferraro, director of marketing for HP’s BI portfolio. Thanks very much, John.

Santaferraro: Thanks a lot, Dana.

Gardner: Our conversation comes to you today through a sponsored HP Live! Podcast from the Oracle OpenWorld conference in San Francisco. Look for other podcasts from this HP Live! event series at hp.com, as well as via the BriefingsDirect Network.

I'd like to thank our producers on today's show, Fred Bals and Kate Whalen. I'm Dana Gardner, principal analyst at Interarbor Solutions. Thanks for listening, and come back next time for more in-depth podcasts on enterprise IT topics and solutions. Bye for now.

Listen to the podcast. Download the podcast. Find it on iTunes/iPod. Learn more. Sponsor: Hewlett-Packard.

Transcript of BriefingsDirect podcast recorded at the Oracle OpenWorld Conference in San Francisco. Copyright Interarbor Solutions, LLC, 2005-2008. All rights reserved.

Tuesday, September 30, 2008

Oracle and HP Explain History, Role and Future for New Exadata Server and Database Machine

Transcript of BriefingsDirect podcast recorded at the Oracle OpenWorld Conference in San Francisco the week of Sept. 22, 2008.

Listen to the podcast. Download the podcast. Find it on iTunes/iPod. Learn more. Sponsor: Hewlett-Packard.

Dana Gardner: Hi, this is Dana Gardner, principal analyst at Interarbor Solutions, and you're listening to a special BriefingsDirect Podcast recorded at the Oracle OpenWorld Conference in San Francisco. We are here the week of Sept. 22, 2008. This HP Live! Podcast is sponsored by Hewlett-Packard, and distributed through the BriefingsDirect Network.

Today we are going to discuss a large and impactful product announcement at Oracle OpenWorld that took place on Sept. 24. It was the introduction of appliances in a cooperative relationship between HP and Oracle to create some of the highest-performing databases and data warehouses in history. We are going to be talking about the Oracle Exadata Storage Server and -- when put together in a very impressive configuration -- what becomes the HP Oracle Database Machine.

Here to help us understand how these impressive server configurations and high-speed, extreme-performance databases came together, we are joined by Rich Palmer, the director of technology and strategy for industry standard servers at HP. We are also joined by Willie Hardie, vice president of Oracle database product marketing. Welcome to the show, Willie.

Willie Hardie: Good to be here, Dana.

Gardner: Tell me a little bit about this very momentous announcement. This has been several years in the making, but it’s not just a product announcement. It seems like an architectural shift, and also an alliance and partnership shift in terms of the cooperation between a hardware provider, in this case HP, and Oracle, until now purely a software company.

Hardie: That's an excellent question. What we actually announced this week is the Oracle Exadata Storage Server. Now, the Oracle Exadata Storage Server is an intelligent storage device. We've basically taken industry standard hardware and storage components from HP, and we've combined that with intelligent software from Oracle that allows us to offload query processing from the database servers to the storage servers.

So now the storage servers can do a lot of the work for us -- stripping out the rows and columns that we don't require, and pushing much less data back up through much wider networks.
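
Conceptually, this offload resembles predicate and projection pushdown: the row filter and the column selection run where the data lives, so only a small result subset crosses the network. A rough illustration in Python follows -- an analogy for the idea, not Oracle's actual implementation.

```python
# Conceptual sketch of query offload (predicate and projection pushdown).
# An analogy for the idea only -- not Oracle's actual Exadata code.

def scan_on_storage_cell(rows, predicate, columns):
    """Filter rows and strip columns on the storage server itself, so only
    matching rows and requested columns travel back over the network."""
    return [{col: row[col] for col in columns} for row in rows if predicate(row)]

# Without offload, every row and every column would cross the fabric.
# With offload, only this small subset does:
call_records = [
    {"caller": "555-0100", "minutes": 12, "region": "west", "dropped": False},
    {"caller": "555-0101", "minutes": 45, "region": "east", "dropped": True},
]
subset = scan_on_storage_cell(call_records,
                              predicate=lambda r: r["dropped"],
                              columns=["caller", "minutes"])
print(subset)  # [{'caller': '555-0101', 'minutes': 45}]
```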

Gardner: For those of us who are not computer scientists, but are nonetheless interested in the outcomes, architecturally we are putting the intelligence that we usually have in a database server in very close proximity to the data storage itself, connecting that through a very fat pipe in the form of InfiniBand. And, in essence, parallel processing comes to bear, because of the proximity. Is that correct?

Hardie: Absolutely. What we are able to do for the first time ever is use these storage devices to do the query processing itself. The more storage servers and compute we put into our configuration, the more of the workload they can take on -- work that traditionally was done at the database server.

Gardner: Let’s go to Rich Palmer at HP. Tell us a little bit about the history. How did this come about, and what is it that HP has been doing to improve upon the performance of this long-term database lineage?

Rich Palmer: If you look at HP and Oracle as partners in this industry, we have a long-standing history together. We have more than 50 reference configurations built on industry standard hardware and Oracle solutions, which we've been delivering for many years now.

Going back all the way to the introduction of Oracle Real Application Clusters (RAC), and even before RAC, the history of the two companies really stems from two leadership positions. HP does more servers on Oracle than any other company. Oracle does more data warehouses than any other company. You bring those two forces together, and you get a formidable entry into this data warehouse appliance market.

The discussion between HP and Oracle stems back a couple of years, to a trend in the market of bringing data and server processing power closer together. That trend has escalated over the last couple of years, especially as data has been growing at exponential rates every single year. What we found is that you cannot push that much data over a traditional storage fabric. This new technology allows us to do that.

Gardner: And we are talking about very large data sets, of terabytes and larger, right?

Palmer: Enormous data sets. Let me give you an example I think we are all very familiar with. We all use cell phones today. Every one of those cell phone calls is a database record somewhere, be it in AT&T's database or T-Mobile's database or whoever's database -- they store that data. Now, when they are storing that data, sometimes they are going to want to move it. If you have a narrow pipe to push that data down, you're bringing back enormous amounts of extraneous data that you don't need; all you need is what you're looking for in the query.

So this process allows us to push just the query results across that pipe. Less data over the pipe, a wider pipe, and your performance goes up dramatically.
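
Some back-of-the-envelope arithmetic shows why this matters. The table size, selectivity, and projection figures below are illustrative assumptions, not measurements from either company:

```python
# Illustrative back-of-the-envelope figures -- assumptions, not vendor numbers.
table_bytes = 10 * 2**40   # a 10 TB call-record table
selectivity = 0.001        # the query matches 0.1 percent of the rows
projection  = 0.10         # and needs only 10 percent of the columns

full_scan_bytes = table_bytes                             # ship everything
offload_bytes   = table_bytes * selectivity * projection  # ship only results

print(f"shipped without offload: {full_scan_bytes / 2**30:,.0f} GiB")
print(f"shipped with offload:    {offload_bytes / 2**30:,.2f} GiB")
# Under these assumptions, roughly 10,000x less data crosses the fabric.
```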

Gardner: Okay, so let's unpack this a little bit. We've established that the marketplace is demanding better performance, particularly with large data sets -- 1 terabyte and larger, up to 10 terabytes and often beyond. That requires the movement of very large sets of data, and the inhibitor here was the storage fabric's physical capacity and its ability to deliver the data.

So you've re-architected, and two companies have come together to do the work. This raises the question: Why haven't the hardware and the software gotten closer before this? Why now?

Palmer: This market is constantly evolving to a state where you have to bring software tools to the table, and you have to bring high-performance hardware to the table. The evolution of both has hit at the perfect time in the last year.

Oracle has been developing the software code for several years now, and HP has been working on the hardware side of this equation to bring together the two forces at this time. We are using industry standard technology -- it's not as if we are the only hardware vendor out there with InfiniBand, and InfiniBand is an evolving technology. But the performance of InfiniBand is at a point now where we can actually leverage it, using Oracle software to offload the storage processing from the database server. Those are the two key components -- it's not just the hardware, and it's not just the software. You have to marry the two together.

So why hasn't it been done in the past? Well, it has, to some degree. Others have tried to do this, but they haven't done both. They haven't been able to achieve both facets, and that's really why this is the right product at the right time.

Gardner: Okay, Willie, let's get into the actual product itself. Explain to me what the Oracle Exadata Storage Server actually is. What are we talking about?

Hardie: The Oracle Exadata Storage Server is basically built on an industry standard HP DL180 storage server. Inside this storage server we have twelve 3.5-inch SATA drives. We have two Intel quad-core processors. We have 8 gigabytes of memory, two InfiniBand network connections, and dual power supplies.

So in this storage server we have a lot of storage capacity, we have a lot of processing power, and we have a lot of network bandwidth. Then the real secret sauce here is this intelligence software from Oracle that’s installed into each and every one of those devices. It’s this intelligent software that enables us to offload this query processing, which makes the Oracle Exadata Storage Server really unique.

Gardner: Okay, let's put this in simpler terms. Instead of large data sets moving from storage to the database and back, what happens differently now?

Hardie: What happens differently now is that, because we are offloading the query processing to the storage server, the storage server can strip out the columns we don't need, strip out the rows we don't need, and return a subset of data back up through this wide InfiniBand network. That's what makes the difference. We are passing a much smaller data set up through this network, and the database server can finish off that query processing much faster than it ever could previously.

Palmer: One of the other values we achieve here is certainly in the data passing back and forth -- less data over a wider pipe -- so you're going to get dramatically better performance. Now, at the storage servers, you've put the processing power for the query right at the disks. Every one of these storage servers has eight cores -- two Intel quad-core processors in each server -- so you have eight cores on the input/output (I/O) path directly to the disks.

So there is no external I/O going to your disks. Traditionally, you've had to go outside the server to disks across the fabric -- and everyone else is sharing that fabric.

So instead of many people sharing a fabric, you now have a dedicated fabric inside the server -- a copper-to-copper connection, with the disks right on top of the processors. That is really the essence of it: you can pull the data off rapidly because everything is so close together. As Willie indicated, you can strip out all the unnecessary data and pass a much smaller data set over a much wider pipe back to your database servers. There are so many levels of performance improvement here.

Gardner: And to your point on the secret sauce -- you are also taking advantage of all those cores via multiple threads, and the software has been deeply tuned to exploit those threads concurrently.

Hardie: Oh, absolutely, and Rich touched on that as well.

Palmer: When we add more Exadata Storage Servers into our configuration, we take advantage not just of the additional storage capacity, but of the additional processing capability at the storage layer, which is a big, big difference.
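
That scaling claim can be sketched in a few lines: each cell scans its own slice of the data with its own cores, so capacity and scan throughput grow together. The sketch below uses threads to stand in for separate storage cells; it illustrates the division of work, not vendor code.

```python
# Hypothetical sketch: scan work divides across storage cells, so adding
# cells adds processing power as well as capacity. Threads stand in for
# separate machines here; this is not vendor code.
from concurrent.futures import ThreadPoolExecutor

def cell_scan(shard, predicate):
    """Each storage cell filters its own shard of rows."""
    return [row for row in shard if predicate(row)]

def offloaded_scan(shards, predicate):
    # One worker per cell; filtered subsets are merged at the database server.
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = pool.map(cell_scan, shards, [predicate] * len(shards))
    return [row for part in partials for row in part]

# 14 shards, matching the 14 cells in the Database Machine described below.
shards = [[{"id": i, "v": i % 7} for i in range(c * 1000, (c + 1) * 1000)]
          for c in range(14)]
hits = offloaded_scan(shards, lambda r: r["v"] == 0)
print(len(hits))  # 2000 matching rows, found without shipping all 14,000
```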

Gardner: And at the announcement here, Oracle CEO Larry Ellison described use cases where improvements typically ranged from 10x up to 72x over what has been the industry benchmark.

Hardie: Absolutely. When you cut away the technology and look at this from a business perspective, what it means for a business user is that when you're accessing those data warehouses Rich was talking about earlier -- call data record warehouses, for example, which can hold billions of rows -- your queries are going to run much, much faster than they ever did previously. Not only will they run faster, you can run many more queries, and more long-running queries, concurrently. That's what is going to make the big difference.

So when we hear of customers talking about getting 20x performance, 30x performance in one particular instance, and in one particular query 72x performance -- those are extreme performance improvements, by anybody's measure.

Gardner: Okay, so we have this engine, the Oracle Exadata Storage Server. We also have a new announcement, the HP Oracle Database Machine. Tell me how one relates to the other.

Palmer: The HP Oracle Database Machine is a single rack that contains everything you need to run a large data warehouse. It contains eight ProLiant servers running Oracle Database 11g and RAC. It has four InfiniBand network switches and it has 14 of these Oracle Exadata Storage Servers that we talked about earlier. So in a single unit you have everything you need, ready to load up your data and start running your business queries right away.
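
Tallying up what one rack holds -- the component counts come straight from Palmer's description, and the per-cell figures from Hardie's earlier one; the totals are simple arithmetic:

```python
# One HP Oracle Database Machine rack, per the description above.
db_servers      = 8    # ProLiant servers running Oracle Database 11g and RAC
ib_switches     = 4    # InfiniBand network switches
storage_cells   = 14   # Oracle Exadata Storage Servers
cores_per_cell  = 8    # two Intel quad-core processors per cell
drives_per_cell = 12   # 3.5-inch SATA drives per cell

print(f"storage-layer cores:  {storage_cells * cores_per_cell}")   # 112
print(f"storage-layer drives: {storage_cells * drives_per_cell}")  # 168
```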

Gardner: Tell us a little bit, Rich, about this 42-slot rack configuration, and why it's right for the market now.

Palmer: Well, if you look at the data warehousing market, appliance-style delivery offers a much simpler deployment of hardware and software configurations, and it's emerging as a high-growth area in data warehousing. With this market trend, HP and Oracle have been able to come together and put everything customers need in one box. We put it at the customer's site, and we do that on a global basis.

If you look at HP, one of the strengths that HP brings to this relationship is our ability to distribute and deliver globally. We build these database machines in regions around the globe. They are not just built here in the United States; they are built in the United States, in Singapore, and in Scotland, and then delivered to those regions on a worldwide basis.

So this ability of HP to build the product from the ground up to an exact specification, deliver it to the customer, install it at the customer's site, and then have Oracle come in and tune the software to make sure it's optimally configured -- that is a no-lose environment. We have the ability here to deliver an appliance-like stack of hardware, put the right software set on that hardware, and target a customer's need for simplicity, high performance, and data reliability -- all in one box.

Gardner: Okay, we've described the marketplace need, the size of data pushing the envelope. Now we are re-architecting to adjust to that. We've described the subset, which is the Exadata Server, and then the configuration, which is the racked Machine. Now, what kinds of organizations are going to be interested in a forklift upgrade to this -- bring it right in, drop it in, pre-configured and optimized -- and what are they going to do with it? Is this for business intelligence (BI)? Is this simply for managing scale? What will the speed this now provides do for how companies improve, or change, the way they do business?

Hardie: The organizations that are going to be interested in the Oracle Exadata Storage Server and the HP Oracle Database Machine are those primarily interested in large data warehouses. And by large data warehouses we're talking terabytes, petabytes, and beyond. If you look at the organizations that are typically dependent on very large data warehouses, it's the ones Rich mentioned earlier: the telcos are an obvious one, with their call data records, and retail organizations, very much dependent on analyzing point-of-sale (POS) transactions. Then you look at organizations running trading systems, where massive amounts of transactions flow through on a daily basis.

Gardner: Especially these days.

Hardie: Absolutely. It is really important to understand what's going on with these transactions, and to make informed business decisions. The beauty of this is that you have completely scalable infrastructure from a storage point of view. But more importantly, you've got completely scalable infrastructure from a query performance point of view. As you store more call data records, more POS transactions, more stock transactions into these systems, your query performance is not going to degrade at all. The more hardware -- the more storage servers -- you put into these systems, the better your performance is going to be.

Gardner: Now that I have the capability to bang on this thing, so to speak, in more ways without degrading performance, in what ways do you expect these companies to actually "bang" on it? Is this going to provide a new and higher level of business intelligence querying? Is this going to provide higher-order analytics? Are there going to be more business applications that can derive near real-time data and analytics from this? All of the above? What's the qualitative payback?

Hardie: There is definitely an element of "all of the above." Let me give you some examples of the queries that customers have actually been running on the Oracle Exadata Storage Server; this probably fits the context pretty well. You have organizations out there -- retail organizations, telcos, for example -- where some of the queries are literally running for over half an hour. In some cases it is hours.

Moving to this new architecture is bringing down these execution times. In one particular example, a query that was running for over 30 minutes is now running in under 30 seconds. It's that scale of improvement. When you can sit at your terminal, your laptop, or your mobile device, kick off a query, and get an answer within seconds, you're going to do more of these. If you know that a query will return in 30 seconds, you no longer have to pick your moment to kick it off. You don't have to worry about timing anymore. You can just run queries when you like, and expect to get a quick answer.

Palmer: Willie, I think you are absolutely right. The ability to capture business information has accelerated so much because of this technology. There are customers that cannot access data records beyond a certain time period simply because of the massive size of those data records, or because of how long a query would take to access a historical group of data. That all goes away now.

Historically, you might have been able to look at the last week's worth of retail records, or medical records. Now you have the ability to go and look at years and years of data in the same timeframe in which you were looking at weeks of data, and query a much bigger dataset, because of this architecture. That's big business value, because now I can trend my business much more effectively. I'm putting more productivity tools in the hands of users, so they can turn data queries and business intelligence back into a fundamental element of growing their business and being more competitive in their markets.

Gardner: I imagine this will also compel companies to put even more data and information into these warehouses, because doing so won't degrade the performance of these essential queries. They are also going to be able to do more types of queries. And, again, we're improving the quality and breadth of the data types, while still getting even better performance. So it's a qualitative improvement on many different dimensions.

Hardie: It's a qualitative improvement, and a quantitative one. You're absolutely right. Organizations today are more and more dependent on faster access to better information. It's just as simple as that.

Gardner: We've talked about the types of organizations that will use this now, in its current configuration. I expect this re-architecting of the database and the storage will also move down market a bit. What other use-case scenarios do you envision for leveraging this technology beyond the high end of the market?

Palmer: If you look at some of the growing and emerging markets today, just think of cloud computing: all of the data we're storing in other locations on the Internet, or through paid services, and the massive amounts of storage being deployed for those types of applications. That's not going to slow down at all. The Database Machine allows us to go in and drop in a pre-configured environment specifically dedicated to that workload.

You can scale this product by connecting multiple racks together, or you can scale just the storage component if the processing side of the database environment is sufficient. You can add storage nodes on their own, so it is a scalable grid architecture that can grow on the fly. Cloud computing is a very good example, where we really don't know what the upper limit of the storage is going to be. Deploy a configuration on, say, an HP Oracle Database Machine, and then grow it as your needs grow. This is one application where we know this is going to succeed.
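
As a rough sketch of that grow-on-the-fly idea -- the cells-per-rack figure comes from the rack description above, while the per-cell capacity is a placeholder assumption:

```python
# Illustrative sketch of growing the grid: add whole racks for compute plus
# storage, or add storage-only cells when database processing is sufficient.
def grid_capacity(racks=1, extra_storage_cells=0,
                  cells_per_rack=14, tb_per_cell=3.0):  # tb_per_cell is a guess
    cells = racks * cells_per_rack + extra_storage_cells
    return {"storage_cells": cells, "raw_capacity_tb": cells * tb_per_cell}

print(grid_capacity(racks=1))                         # the starting point
print(grid_capacity(racks=2, extra_storage_cells=4))  # grown on the fly
```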

Gardner: Willie, we're also aware that organizations will just want the Oracle Exadata Storage Server. They might have their own environments, their own preference for configuring what's available to them, and what would become available to them in the future.

Hardie: Any organization that wants to run its data warehouse on the Oracle Exadata Storage Server just has to buy the Oracle Exadata Storage Server. It's as simple as that. Oracle and HP have long given customers a choice of configurable options. So if a customer feels that something like the HP Oracle Database Machine is not the right fit for their organization -- that it does not fit their standard needs -- they have the option of buying the individual components: the Oracle Exadata Storage Server and the InfiniBand connectors, and connecting them to their own database servers.

Gardner: Looking at this again in terms of how to get started, where do organizations go? Now that both of these configurations are available immediately, are sales happening through both HP and Oracle?

Palmer: It's a cooperative effort, but Oracle is leading the sales process. Oracle sales representatives on a global basis are leading this process, and HP, as their partner, will certainly join with them and make sure that the customer receives the best from both companies.

Gardner: HP is going to service the hardware, but the support comes through Oracle, is that correct?

Hardie: If you want to buy an Oracle Exadata Storage Server, Oracle is your first point of contact. So talk to your local Oracle sales representative. If you do buy one and need to resolve a support issue, you call Oracle, and Oracle will bring in HP as and when required to resolve any issues.

Gardner: To sum up a little bit, for those folks who perhaps are a few steps removed from the IT department, who are doing queries or using business applications, what's the big takeaway for them? What about this announcement is going to change their world?

Hardie: For the types of users you just mentioned, a couple of steps removed from the IT department ... to be quite honest, they don't really care what their systems run on. What they are interested in is getting fast answers to their business queries. It's just as simple as that. When these business users know that they can get instantaneous response times -- real extreme performance from their data warehouse, or from their business intelligence applications -- that's what's going to make a big difference for them.

Gardner: Rich, at HP, let me flip the question to you. For those people inside the IT department, who want to come in Monday morning without big headaches, what does this new configuration and architectural approach mean for them?

Palmer: Simplicity, higher performance, and the ability to improve their service level agreements (SLAs) with their customers in the warehousing world. This is a solution built on industry standard hardware, with software from Oracle, a well-accepted enterprise software leader. IT departments are very comfortable with both of those facts. They're very comfortable with HP; they're very comfortable with Oracle. Putting the two together is a natural event for any IT manager.

Gardner: We've been talking about a large and impactful announcement here at Oracle OpenWorld, the introduction of the Oracle Exadata Storage Server -- the first hardware product from Oracle. Isn't that right?

Hardie: Absolutely.

Gardner: We've also looked at the configuration of those Exadata servers into the HP Oracle Database Machine, which is in effect a data warehouse appliance. Joining us to help explain this, we have been happy to have Rich Palmer, director of technology and strategy in the industry standard servers group at HP. And also Willie Hardie, vice president of Oracle database product marketing. Thanks to you both.

Hardie: Thank you, Dana.

Palmer: Thank you very much, Dana.

Gardner: Our conversation comes to you today through a sponsored HP Live! Podcast from the Oracle OpenWorld Conference in San Francisco. Look for other podcasts from this HP Live! event series at hp.com, as well as via the BriefingsDirect Network. I'd like to thank our producers on today's show, Fred Bals and Kate Whalen.

I am Dana Gardner, principal analyst at Interarbor Solutions. Thanks for listening, and come back next time for more in-depth podcasts on enterprise IT topics and strategies. Bye for now.

Listen to the podcast. Download the podcast. Find it on iTunes/iPod. Learn more. Sponsor: Hewlett-Packard.

Transcript of BriefingsDirect podcast recorded at the Oracle OpenWorld Conference in San Francisco. Copyright Interarbor Solutions, LLC, 2005-2008. All rights reserved.