Tuesday, December 16, 2008

MapReduce-scale Analytics Change Business Intelligence Landscape as Enterprises Mine Ever-Expanding Data Sets

Transcript of BriefingsDirect podcast on new computing challenges and solutions in data processing and data management.

Listen to the podcast. Download the podcast. Find it on iTunes/iPod. Learn more. Sponsor: Greenplum.

Dana Gardner: Hi, this is Dana Gardner, principal analyst at Interarbor Solutions, and you're listening to BriefingsDirect. Today, we present a sponsored podcast discussion on the architectural response to a significant and fast-growing class of new computing challenges. We will be discussing how Internet-scale data sets and Web-scale analytics have placed a different set of requirements on software infrastructure and data processing techniques.

Following the lead of such Web-scale innovators as Google, and through the leveraging of powerful performance characteristics of parallel computing on top of industry-standard hardware, we are now focusing on how MapReduce approaches are changing business intelligence (BI) and the data-management game.

More types of companies and organizations are seeking new inferences and insights across a variety of massive datasets -- some into the petabyte scale. How can all this data be sifted and analyzed quickly, and how can we deliver the results to an inclusive class of business-focused users?

We'll answer some of these questions and look deeply at how these new technologies will produce the payback from cloud computing and massive data mining and BI activities. We'll discover how the results can quickly reach the hands of more decision makers and strategists across more types of businesses.

While the challenge is great, the new value for managing these largest data sets effectively offers deep and powerful new tools for business and for social and economic progress.

To provide an in-depth look at how parallelism, modern data infrastructure, and MapReduce technologies come together, we welcome Tim O’Reilly, CEO and founder of O’Reilly Media, and a top influencer and thought leader in the blogosphere. Welcome, Tim.

Tim O’Reilly: Hi, thanks for having me.

Gardner: We're also joined by Jim Kobielus, senior analyst at Forrester Research. Thank you, Jim.

Jim Kobielus: Hi, Dana. Hi, everybody.

Gardner: Also, Scott Yara, president and co-founder at Greenplum. Welcome, Scott.

Scott Yara: Thank you.

Gardner: We're still dealing with oceans of data, even though we have harsh economic times. We see reductions in some industries, of course, but the amount of data and the need for analytics across the Internet are still growing rapidly. BI has become a killer application over the past few years, and we're now extending that beyond enterprise-class computing into cloud-class computing.

I want to go to Jim Kobielus first. Jim, why has this taken place now? What is happening in the world that is simultaneously creating these huge data sets, but also making necessary even better analytics across more businesses?

Kobielus: Thanks, Dana. A number of things are happening or have been happening over the past several years, and the trend continues to grow. The data sets used for analytics are becoming ever more massive. It's equivalent to Moore's Law, in the sense that every several years, the size of the average data warehouse or data mart grows by an order of magnitude.

In the early 1990s or the mid 1990s, the average data warehouse was in gigabytes. Now, in the mid to late 2000s, it's in the terabytes. Pretty soon, in the next several years, the average data warehouse will be in the petabyte range. That’s at least a thousand times larger than the current middle-of-the-road data warehouse.

Why are data warehouses bulking up so rapidly? One key thing is that organizations, especially in tough times when they're trying to cut costs, continue to consolidate a lot of disparate data sets into fewer data centers, onto fewer servers, and into fewer data warehouses that become ever-more important for their BI and advanced analytics.

What we're seeing is that more data warehouses are becoming enterprise data warehouses and are becoming multi-domain and multi-subject. You used to have tactical data marts, one for your customer data, one for your product data, one for your finance data, and so forth. Now, the enterprise data warehouse is becoming the be all and end all -- one hub for all of those sets.

What that means is that you have a lot of data coming together that never needed to come together before. Also, the data warehouse is becoming more than a data warehouse. It's becoming a full-fledged content warehouse, not just structured relational data, but unstructured and semi-structured data -- from XML, from your enterprise content management (ECM) system, from the Web, from various formats, and so forth. It's coming together and converging into your warehouse environment. That's like the bottom of the iceberg coming up: you're seeing it now, and it's coming into your warehouse.

Also, because of the Web 2.0 world and social networking, a lot of the customer and market intelligence that you need is out there in blogs, RSS feeds, and various formats. Increasingly, that is the data that enterprises are trying to mine to look for customers, marketing opportunities, cross-sell opportunities, and clickstream analysis. That’s a massive amount of data that’s coming together in warehouses, and it's going to continue to grow in the foreseeable future.

Gardner: Let’s go to Tim O’Reilly. Tim, from your perspective, what has changed over the past 10 or 20 years that makes these datasets so important?

Long-term perspective

O'Reilly: If you look at what I would call Web 2.0 in a long-term historical perspective, in one sense it's a story about the evolution of computing.

In the first age of computing, business models were dominated by hardware. In the second age, they were dominated by software. What started to happen in the 1990s, underneath everybody's nose but not understood or seen, was the commoditization of software via open industry standards and open source. That started to create new business models around data and, in particular, around network applications that built huge data sets through user participation. That's the essence of what I call Web 2.0.

Look at Google. It's a BI company, based on massive data sets, where, first of all, they are spidering all the activity off of the Web, and that's one layer. Then, they do this detailed analysis of the link structure of that Web, and that's another layer. Then, they start saying, "Well, what else can we find?" They start looking at clickstream data. They start looking at browsing history, and where people go afterward. Think of all the data. Then, they deliver services against that.

That’s the essence of Web 2.0, building a massive data set, doing real-time analytics against it, and then figuring out what services you can deliver. What’s happening today is that movement is transferring from the consumer Web into business. People are starting to realize, "Oh, the companies that are doing better are better with their data."

A great example of that is Wal-Mart. You can think of Wal-Mart as a Web 2.0 company. They've got end-to-end analytics in the same way that Google does, except they're doing it with stuff. Somebody takes something off the shelf at Wal-Mart and rings it up. Wal-Mart knows, and it sends a signal downstream to the supplier.

We need to understand that this move to real-time understanding of data at massive scale is going to become more and more important as the lever of competitive advantage -- not just in computer businesses, but in all businesses. Data warehousing and analytics aren't just something that you do in the back office as a nice-to-have. They're the very essence of competitive advantage moving forward.

When we think about where this is going, we first have to understand that everybody is connected all the time via applications, and this is accelerating, for example, via mobile. The need for real-time analytics against massive data sets is universal.

Look at some of the things that are happening on the phone. Okay, where am I? What data is relevant to me right now, because you know where I am? Speech recognition is starting to come into focus on the phone. Again, it's a massive data problem, integrating not only speech recognition, but also local dialects. Oh, wait -- local again. You start to see some cross-connections between data streams that will help you do better.

In a conversation with someone from Nuance about why Google is able to do some interesting things in the particular domain of search and speech recognition, it came down to the fact that they're able to cross-correlate two different data sets -- the speech data set and the search data set. They say, "Okay, yeah, when somebody says that, they are most likely looking for this, because we know that. When they type, they also are most likely looking for that." So this idea of cross-correlation between data sets is starting to come up more and more.

This is a real frontier of competitive advantage. You look at the way that new technologies are being explored by startups. So many of the advantages are in data.

A great example is the company where I'm on the board. It's called Wesabe. They're a personal finance application. People upload their bank statements or give Wesabe information to upload their bank statements. Wesabe is able to do customer analytics for these guys, and say, "Oh, you spent so much on groceries." But, more than that, they're able to say, "The average person who shops at Safeway, spends this much. The average person who shops at Lucky spends this much in your area." Again, it's a massive data problem. That’s the heart of their application.

Now, you think the banks are going to get clued into this and they are going to start to say, "Well, what services can we offer?" Phone companies: "What services can we offer against our data?"

One thing that’s going to happen is the migration of all the BI competencies from the back office to the front office, from being something that you do and generate reports from, to something that you actually generate real-time services from. In order to do that, you've absolutely got to have high performance at massive scale.

Second, a lot of these data sets are not the old-fashioned data sets, where it was simply structured data.

Gardner: Let's go to Scott Yara. Scott, we need this transformation. We need this competitive differentiation and new, innovative business approaches through more real-time analytics across larger and more diverse sets of content and inference. What's the approach on the solution side? What technologies are being brought to bear, and how can we start dealing with this at the time and scale that's required?

A big shift

Yara: Sure. For Greenplum, one of the more interesting aspects of what's going on is that big technology concepts and ideas that have been around for two or three decades are finally being brought to bear, because of the big shift that Tim alludes to. We're big believers that we're now entering a new cycle, where companies are going to be defined by their ability to capture and make use of the data and the user contributions coming from their customers and community. What makes that possible is being able to make parallel computing a reality.

We look at the other major computing trend today, and it’s a very mainstream thing like virtualization. Well, virtualization itself was born on the mainframe well over 30 years ago. So, why is virtualization today, in 2008, so important?

Well, it took this intersection of major trends. You had, as Tim mentioned, the commoditization of both hardware and software, and x86 and multi-core machines became incredibly cheap. At the same time, you had a high-level business trend, an industry trend. The rising cost of data centers and power became so significant that CIOs had to think about the efficiency of their data centers and infrastructure and what could lower the cost of computing.

If you look at running applications on a much cheaper and much more efficient set of commodity systems and consolidating applications through virtualization, that would be a really compelling thing, and we've seen a multi-billion dollar industry born of that.

You're seeing the same thing here, because businesses are now driven by Web 2.0 and by the success of Google, and through their own use of the Web they're realizing how important data is to their own businesses. That's become a very big driver, because it turns out that parallel computing, combined with commodity hardware, is a very disruptive platform for doing large-scale data analysis.

As Google has shown, you can take very, very cheap machines -- off-the-shelf PCs -- and, with the right software, combine them into hundreds, thousands, and tens of thousands of systems to deliver analytics at a scale that people couldn't reach before. It's that confluence and that intersection of market factors that's actually making this whole thing possible.

While parallel computing has been around for 30 years, the timing has become such that it’s now having an opportunity to become really mainstream. Google has become a thought leader in how to do this, and there are a lot of companies creating technologies and models that are emblematic of that.

But, at the end of the day, the focus is on software that is purpose-built to provide parallelism out of the box. This allows companies to sift through huge amounts of data, whether structured or unstructured. All the fault tolerance, all the parallelism, all those things that you need are done in software, so that you can choose off-the-shelf hardware from HP, IBM, or Dell, or white-box systems. That's a model that's as disruptive a shift as client-server and symmetric multiprocessing (SMP) computing was to the mainframe.
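
To make the parallelism-in-software point concrete, here is a minimal MapReduce-style sketch in Python. It is illustrative only, not Greenplum's or Hadoop's actual engine: a map function emits key/value pairs from raw records, a shuffle groups them by key, and a reduce function aggregates each group. The map phase is the part a real engine spreads across many commodity machines, along with the fault tolerance. The clickstream records and function names below are hypothetical.

    # Minimal MapReduce-style sketch: count page views per URL from raw log lines.
    # Illustrative only -- real engines add distribution across machines,
    # fault tolerance, and disk-based shuffles.
    from collections import defaultdict
    from multiprocessing import Pool

    log_lines = [
        "2008-12-16T10:01 user1 /products/42",
        "2008-12-16T10:02 user2 /products/42",
        "2008-12-16T10:03 user1 /checkout",
    ]  # hypothetical clickstream records

    def map_phase(line):
        """Emit (key, value) pairs; here, one hit per URL."""
        _, _, url = line.split()
        return [(url, 1)]

    def reduce_phase(key, values):
        """Aggregate all values for one key."""
        return key, sum(values)

    if __name__ == "__main__":
        with Pool() as pool:                      # the map step runs in parallel workers
            mapped = pool.map(map_phase, log_lines)

        shuffled = defaultdict(list)              # group intermediate pairs by key
        for pairs in mapped:
            for key, value in pairs:
                shuffled[key].append(value)

        results = [reduce_phase(k, v) for k, v in shuffled.items()]
        print(results)                            # [('/products/42', 2), ('/checkout', 1)]

The design point is that the distribution and recovery logic live in the framework, so the map and reduce functions themselves stay simple.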

Gardner: Jim Kobielus, speak to this point of moving the analytic results, the fruits of this impressive engine and architectural shift from the back office to the front office. This requires quite a shift in tools. We're not going to have those front-office folks writing long SQL queries. They're not going to study up on some of the traditional ways that we interact with data.

What's in the offing for development, so developers can create applications that target this data, now that it's in a format we can get at and cross-pollinated across huge data sets that are themselves diverse? What's in store for app dev, and what's in store for the people looking for a graphical way in -- the business-strategist type of user?

Self-service paradigm

Kobielus: One thing we're seeing in the front-end app development is, to take Tim’s point even further, it’s very much becoming more of a Web 2.0 user-centric, self-service development paradigm for analytics.

Look at the ongoing evolution of the online analytical processing (OLAP) market, for example. Things are going on in terms of user self-service: development of data mining and advanced analytic applications within the browser and within the spreadsheet. Users can pull data from various warehouses and marts, and from online transaction processing (OLTP) systems, but in a visual, intuitive paradigm.

That can cache a lot of that information on the front end -- in other words, on the desktop or in the mobile device -- and allow the user to graphically build ever-richer reports and dashboards, and then share them out to others on their teams. You can build a growing, collective analytical knowledge base that can be shared. That whole paradigm is coming to the fore.

At Forrester, we published a number of reports on it. Recently, Boris Evelson and I looked at the next generation of OLAP technology. One very important initiative to look at is what Microsoft is doing with Project Gemini. They're still working on that, but they demoed it a couple of months ago at their BI show.

The front office -- the actual end users and power users -- are the ones who are going to do the bulk of the BI and analytics application development in this new paradigm. That means more and more of this development will be offloaded from the traditional high priesthood of data modelers, developers, and data-mining specialists, so they can focus on more sophisticated statistical analysis, and so forth.

The front office will do the bulk of the development. The back office -- in other words, the traditional IT data-modeling professionals -- will be there. They'll be setting the policies and they'll be providing the tooling that the end users and the power users will use to build applications that are personalized to their needs.

So IT then will define the best practices, and they'll provide the tooling. They'll provide general coaching and governance around all of the user-centric development that will go on. That’s what’s going to happen.

It's not just Microsoft. You can look at the OLAP tooling and the more user-centric, in-memory, spreadsheet-centric approaches that IBM Cognos, Oracle, and others are rolling out or have already rolled out in their product sets. This is where it's all going.

Gardner: Tim O’Reilly, in the past, when we've opened up more technological power to more people, we've often encountered much greater innovation, unpredictably so. Should we expect some sort of a wisdom-of-crowd effect to come into play, when we take more of these data sets and analytic tools and make them available?

O'Reilly: There's a distinction between the wisdom of crowds and collective intelligence. The wisdom-of-crowds thesis, as expounded by Surowiecki, is that if you get a whole bunch of people independently, really independently, to weigh in on some subject, their average guess is better than any individual expert's. That’s really about a certain kind of quantitative stuff.

But, there's also a machine-learning approach in which you're not necessarily looking for the average, but you're finding different kinds of meaning in data. I think it’s important to distinguish those two.

Google realized that there was meaning in links that every other search engine of the day was throwing away. This was a way of harnessing collective intelligence, but it wasn’t just the wisdom of crowds. This was actually an insight into the structure of the data and the meaning that was hidden in it.

The breakthroughs are coming from the ability of people to discern meaning in data. That meaning sometimes is very difficult to extract, but the more data you have, the better you can be at it.

A great example of this recently is from the last election. Nate Silver, who ran 538.com, was uncannily accurate in calling the results of the election. The reason he was able to do that was that he looked at everybody's polls, but didn't just say, "Well, I'm just going to take the average of them." He used all kinds of deep thinking to ask, "Well, what's the bias in this one? What's the bias in that one?" And he was able to develop an algorithm in which he weighted these things differently.
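
As a rough illustration of the weighting O'Reilly describes -- and emphatically not Silver's actual model -- the sketch below adjusts each poll for an estimated house bias and weights it by sample size before blending. All of the figures are invented.

    # Illustrative weighted poll aggregation -- not Nate Silver's actual algorithm.
    # Each poll: (share_for_candidate, sample_size, estimated_house_bias), all invented.
    polls = [
        (52.0, 1200, +1.5),   # pollster historically leans +1.5 toward the candidate
        (49.0,  800, -0.5),
        (51.0, 2000,  0.0),
    ]

    def aggregate(polls):
        weighted_sum, total_weight = 0.0, 0.0
        for share, sample_size, bias in polls:
            adjusted = share - bias          # remove the estimated house effect
            weight = sample_size ** 0.5      # bigger samples count more, sub-linearly
            weighted_sum += adjusted * weight
            total_weight += weight
        return weighted_sum / total_weight

    print(round(aggregate(polls), 2))        # one blended estimate, not a raw average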

Gardner: I suppose it’s important for us to take the ability to influence the algorithms that target these advanced data sets and put them into the hands of the people that are closer to the real business issues.

More tools are critical


O'Reilly: That’s absolutely true. Getting more tools for handling larger and more complex data sets, and in particular, being able to mix data sets, is critical.

One of the things that Nate did that nobody else did was that he took everybody’s polls and then created a meta-poll.

Another example is really interesting. You guys probably are familiar with the Netflix Challenge, where Netflix has put up a healthy sum of money for whoever can improve their recommendation algorithm by 10 percent. What's interesting is that people seem to be stuck at about 8 percent, and they haven't been able to get the last couple of percent.

It occurred to me in a conversation I was having last night that the breakthroughs will come, not by getting a better algorithm against the Netflix data set, but by understanding some other data set that, when mixed with the Netflix data set, will give better predicted results.

Again, that tells us something about the future of data mining and the future of business intelligence. It's larger, more complex, and more diverse data sets from which you're able to extract meaning in new ways.

One other thing. You were talking earlier about the democratization of these tools. One thing I don’t want to pass by is a comment that was made recently by Joe Hellerstein, who is a computer science professor at UC Berkeley. It was one of those real wake-up-and-smell-the-coffee moments. He said that at Berkeley, every freshman student in CS is now being taught Hadoop. SQL is an elective for seniors. You say, "Whoa, that is a fundamental change in our thinking."

That’s why I think what Greenplum is doing is really interesting, trying to marry the old BI world of SQL with the new business intelligence world of these loose, unstructured data sets that are often analyzed with a MapReduce kind of approach. Can we bring the best of these things together?

That fits with this idea of crossing data sets being one of the new competencies that people are going to have to get better at.
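
The general pattern of marrying SQL over structured tables with a MapReduce-style pass over loose text can be sketched in plain Python with SQLite. This only illustrates the idea; it is not Greenplum's actual API, and the table, reviews, and figures are made up.

    # Illustrative mix: SQL over structured data, a map/reduce-style pass over
    # unstructured text, then a join of the two. Not any vendor's actual API.
    import sqlite3
    from collections import defaultdict

    # Structured side: a tiny orders table queried with SQL.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE orders (product TEXT, revenue REAL)")
    db.executemany("INSERT INTO orders VALUES (?, ?)",
                   [("widget", 120.0), ("gadget", 75.0), ("widget", 60.0)])
    revenue = dict(db.execute(
        "SELECT product, SUM(revenue) FROM orders GROUP BY product"))

    # Unstructured side: count product mentions in free-text reviews.
    reviews = ["love the widget", "gadget broke", "widget works great"]
    mentions = defaultdict(int)
    for text in reviews:                       # "map": emit a hit per known product
        for word in text.split():
            if word in revenue:
                mentions[word] += 1            # "reduce": sum the hits per product

    # Join the two views of the business.
    for product in revenue:
        print(product, revenue[product], mentions.get(product, 0))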

Kobielus: If I can butt in here just a moment, I want to tie something that Tim just said into something I said a little bit earlier. One important thing is that when you add more data sets to, say, your analytic environment, it gives you the potential to see more cross-correlations among different entities or domains. So, that's one of the value props for an all-encompassing or multi-domain enterprise data warehouse.

Before, you had these subject-specific marts -- customer data here, product data there, finance data there -- and you didn't have any easy way to cross-correlate them. When you bring them all together into a common repository, implementing common dimensions and hierarchies, and conforming to common metadata, it becomes a whole lot easier for the data miners, the power users, and the end users to build applications that tie it all together.

There is the "aha" moment: "Aha, I didn't realize all these things hooked up in these various ways." You can extract more meaning by bringing it all together into a unified enterprise data warehouse.
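
A tiny sketch of that point: once formerly siloed marts share a conformed customer key, cross-correlating them is a simple join. The dictionaries below are hypothetical stand-ins for customer, orders, and support marts.

    # Sketch: a conformed customer_id ties three formerly separate marts together.
    # All records are invented for illustration.
    customers = {101: {"segment": "enterprise"}, 102: {"segment": "consumer"}}
    orders    = [{"customer_id": 101, "product": "router", "amount": 5000},
                 {"customer_id": 102, "product": "modem",  "amount": 80},
                 {"customer_id": 101, "product": "switch", "amount": 3000}]
    support   = [{"customer_id": 101, "open_tickets": 4},
                 {"customer_id": 102, "open_tickets": 0}]

    tickets = {row["customer_id"]: row["open_tickets"] for row in support}

    spend = {}
    for order in orders:                       # aggregate the orders mart
        spend[order["customer_id"]] = spend.get(order["customer_id"], 0) + order["amount"]

    for cid, profile in customers.items():     # one row now ties all three together
        print(cid, profile["segment"], spend.get(cid, 0), tickets.get(cid, 0))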

Gardner: To you, Scott Yara. There's a great emphasis here on bringing together different data sets from disparate sources, with entirely different technologies underlying them. It's not a trivial problem. It’s not a matter of scale necessarily.

What do you see as the potential? What is Greenplum working on to allow folks to mix and match in such a way that the analytics can be innovative and game-changing in a harsh economic environment?

Price/performance improvement

Yara: A couple of things. One, I definitely agree with the assertion that analysis gets easier the more data you have. Whether it's heterogeneous data sets or just the sheer scale of data that people can collect, analysis becomes fundamentally easier and cheaper.

In general, these businesses are pretty smart. The executives, analysts, and people driving the business know that their data is valuable and that insight into improving the customer experience through data is key. It's just been really hard and expensive, and that has made it prohibitive for a long, long time.

Now, we're talking about using parallel computing techniques, open-source software, and commodity hardware. It’s literally a 10- to 100-fold improvement in price performance. When the cost of data analysis comes down 10 to 100 times, that’s when new things become possible.

O'Reilly: Absolutely.

Yara: We see lots of customers now from the New York Stock Exchange. These are all businesses that are across vertical industries, but are all affected by the Web and network computing at some level.

Algorithmic trading is driving financial services in a way that we haven’t seen before. They're processing billions of trades every day. Whether it's security, surveillance, or real-time support that they need to provide to very large trading companies, that ability to mine and sift through billions of transactions on a real-time basis is acute.

We were sitting down with one of our large telecom customers yesterday, and there was this convergence that Tim’s talking about. You've got companies with very large mobile carrier businesses. They're broadband service providers, fixed-line service providers, and Internet companies.

Today, the kind of basic personalization that companies like Amazon, eBay, or Google do, telecom carriers are just at the beginning of trying to do that. They have to aggregate the consumer event stream from all these disparate communication systems, and it’s at massive scale.

Greenplum is solely focused on making that happen and mixing the modalities of data, as Tim suggested. Whether it’s unstructured data, whether those are things that exist in legacy databases, or whether you want to mix and match SQL or MapReduce, fundamentally you need to make it easy for businesses to do those things. That’s starting to happen.

Gardner: I suppose part of the new environment that we are in economically is that incremental change is probably not going to cut it. We need to find new forms of revenue and be able to attain them at a very low cost, upfront if possible, and be transformative in how we can take our businesses out through the public networks to reach more customers and give them more value.

Now that we've established that we have these data sets, we can combine them to a certain degree, and that will improve over time. What are the ways in which companies can start actually making money in new ways using these technologies?

Apple’s Genius comes to mind for me as a way of saying, "Okay, you pick a song in your iTunes library, and we're going to use our data and our analytics, and come back with some suggestions on what you might like as a result of that." Again, this is sort of a first go at this, but it opens my eyes to a lot of other types of business development opportunities. Any thoughts on this, Tim O’Reilly?

O'Reilly: In general, as I said earlier, this is the frontier of competitive advantage. Sure, iTunes has Genius, but it's the same thing with Netflix recommendations. Amazon has been doing this for years. It's part of their competitive advantage. I mentioned earlier how this is starting to be a force in areas like banking. Think about phone companies and all of the opportunities for new local services.

Not only that, one of my pet hobbyhorses is that phone companies have this call-history database, but they're not building new services for users against it. Your phone still only remembers the last few people that you called. Why can't I do a search against somebody I talked to three months ago? "Who the heck was that? Was it a guy from this company?" You should be able to search that. They've got the data.

So, as I said earlier, the frontier is turning the back office into new user-facing services, and having the analytics in place to be able to do that meaningfully at scale in real-time. This applies to supply chains. It applies to any business that has data that gets better through user interaction.

This is the lesson of the Web. We saw it first in Web applications. I gave you the example earlier of Wal-Mart. They realized, "Oh, wait a minute. Every time somebody buys something, it’s a vote." That’s the same point that Wesabe is trying to exploit. A credit card statement is a voting list.

I went to this restaurant once. That doesn't necessarily mean anything. If I go back every week, that may mean something. I spend on average this much, and it's going up. That means something. I spend on average this much, and it's going down. That means something too. So, it's about finding meaning in the data you already have: how could this be useful not just to me but to my users and my customers, and what services could I build?

This is the frontier, particularly in the world that we are entering, in which computing is going mobile, because so many of the mobile services are fundamentally going to be driven by BI. You need to be able to say in real-time or close to real-time, "This is the relevant data set for this person based on where they are right now."
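
O'Reilly's restaurant and average-spend example boils down to simple aggregations over a transaction stream. A minimal sketch, with invented transactions, might look like this:

    # Sketch of finding meaning in a card statement: visit frequency and spend
    # trend per merchant. The transactions are invented for illustration.
    from collections import defaultdict

    transactions = [  # (month, merchant, amount)
        (1, "Safeway", 220), (2, "Safeway", 260), (3, "Safeway", 310),
        (1, "Cafe Roma", 12), (3, "Cafe Roma", 14),
    ]

    by_merchant = defaultdict(list)
    for month, merchant, amount in transactions:
        by_merchant[merchant].append((month, amount))

    for merchant, rows in by_merchant.items():
        rows.sort()                               # order by month
        visits = len(rows)
        average = sum(amount for _, amount in rows) / visits
        trend = rows[-1][1] - rows[0][1]          # crude: last spend minus first
        print(f"{merchant}: {visits} visits, avg ${average:.0f}, trend {trend:+d}")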

Needed: future view


Kobielus: I want to underline what Tim just said. Traditionally, data warehouses existed to provide you with perfect hindsight on the customer -- massive historical data on the customer and that 360-degree view of everything about the customer and everything they have ever done in the past, back to the dawn of recorded time.

Now, it’s coming down to managing that customer relationship and evolving and growing with that relationship. You have to have not so much a past or historical view, but a future view on that customer. You need to know that customer and where they are going better than they know themselves.

In other words, that's where the killer app of the online recommendation engine becomes critical. Then, the data warehouse, as the platform for recommendation engines, can take not only the historical data that persists, but also the continuing streams of real-time event data: on pricing, on customer interactions in various channels -- be it on the Web, over the phone, or whatever -- on customer transactions that are going on now, and on things and events going on in the customer's social network.

Then, you feed that all into a recommendation engine, which is a predictive-analytics model running inside the data warehouse. That can optimize that customer's interaction at every touch point. Let's say they're dealing with a call-center person live. The call-center person knows exactly how the world looks to that customer right now and has a really good sense for what that customer might need now or might need in three months, six months, or a year, in terms of new services or products, because other customers like them are doing similar things.

It can have recommendations generated and scripted for the call-center agent in real time, saying, "You know what we think? We recommend that you upgrade to the following service plan, because it provides you with these features that you will find useful in your lifestyle, blah, blah, blah."

In other words, it's understanding the customer in their future, in their possible future, and suggesting things to the customers that they themselves didn’t realize until you suggested them. That’s the future of analytics, and competitive advantage.
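
The recommendation engine described here is, at its simplest, a scoring function that blends the customer's historical profile with a real-time event. The toy scorer below uses made-up weights, features, and offer names; a production engine would be a trained predictive model running inside the warehouse.

    # Toy next-best-offer scorer for a call-center screen -- a stand-in for a real
    # predictive model, with invented weights and features.
    def score_offer(offer, customer, live_event):
        score = 0.0
        score += 2.0 if offer["target_segment"] == customer["segment"] else 0.0
        score += 1.5 if live_event == offer["trigger_event"] else 0.0
        score -= 1.0 if offer["name"] in customer["declined_before"] else 0.0
        return score

    customer = {"segment": "heavy_data_user", "declined_before": {"basic_plan"}}
    offers = [
        {"name": "unlimited_data", "target_segment": "heavy_data_user",
         "trigger_event": "data_cap_reached"},
        {"name": "basic_plan", "target_segment": "light_user",
         "trigger_event": "bill_complaint"},
    ]

    live_event = "data_cap_reached"              # the event streaming in right now
    best = max(offers, key=lambda o: score_offer(o, customer, live_event))
    print("Recommend:", best["name"])            # surfaced to the agent in real time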

O'Reilly: I couldn’t agree more.

Gardner: Scott Yara, we've been discussing this with a little bit of a business-to-consumer (B2C) flavor. In the business-to-business (B2B) world many things are equal in a commoditized market, with traditional types of products and services.

An advantage might be that, as a supplier, I'm going to give you analytics that I can derive from data sets that you might not have access to. I might provide analytical results to you as a business partner free of charge, but as an enticement for you to continue to do business with me, when I don’t have any other way to differentiate. What do you see are some of the scenarios possible on the B2B side?

Yara: You don't have to look much further than what Salesforce.com is doing. In a lot of ways, they're pioneering what it means to be an enterprise technology company that sells services, and ultimately data, back to their customers. By creating a common platform where applications can be built, they are very much thinking about how the data on the platform is being aggregated and used, not by their individual customers, but in aggregate.

You're going to see lots of cases where for traditional businesses that are selling services and products to other businesses, the aggregation of data is going to be interesting and relevant. At the same time, you have companies where even the internal analysis of their data is something they haven’t been able to do before.

We were talking about Google, which is an amazing company. They have this big vision to organize the world's information. What the rest of the business world is finding out is that, while it's a great vision and they have a lot of data, they only have a small fraction of the overall data in the world. Telecommunications companies, financial exchanges, and retail companies have all of this real-world data that's not being indexed or organized by Google. These companies actually have access to amazing amounts of information about their customers and businesses.

They are saying, "Why can't we, at the point of interaction -- like eBay, Amazon, or some of these recommendation engines -- start to take some of this aggregate information and turn it into improving our businesses in the way that the Web companies have done so successfully?" That's going to be true for B2C businesses, as well as for B2B companies.

We're just at the beginning of that. That’s fundamentally what’s so exciting about Greenplum and where we're headed.

Gardner: Jim Kobielus, who does this make sense for right away? Some companies might be a little skeptical. They're going to have to think about this. But where is the low-hanging fruit? Where are the no-brainer applications for this approach to data and analytics?

Kobielus: No-brainers -- I always hate that term. It sounds like I am condescending, but low-hanging fruit should be one of those "aha!" opportunities that everybody realizes intuitively. You don’t have to explain to them, so in a sense it's a no-brainer. It’s call center -- customer-contact center.

The customer-contact center is where you touch the customer, and where you hopefully initiate, cultivate, nurture, maintain, and grow the customer relationship. It's one of the many places where you do that. There are people in your organization who are in that front-line capacity.

It doesn’t have to be just people. It could be automated programs through your Website that need to be empowered continuously with the full customer context -- the history of that customer's interactions, the customer’s current state, current sentiment and feelings, and with a full context on the customer’s likely future evolution. So, really it's the call center.

In fact, I cover data warehousing for Forrester. I talk to the data warehousing vendors and their customers about in-database analytics, where they are selling this capability right now into real-world deployments. The customer call center is, far and away -- with a bullet -- the number one place for inline analytics to drive the customer interaction in a multi-channel fashion.

Gardner: How about you, Tim O’Reilly. Where are some of the hot verticals and early adopters likely to be on this?

O'Reilly: I've already said several times, mobile apps of various kinds are probably highest on the list. But, I'm a big fan of supply chain. There's a lot to be done there, and there's a huge amount of data. There already is a BI infrastructure, but it hasn’t really been tuned to think about it as a customer-facing application. It's really more a back-office or planning tool.

There are enormous opportunities in media, if you want to put it that way. If you think about the amount of money that’s spent on polling and the power of integrating actual data, rather than stated preference, I think it's huge.

How do we actually figure out what people are going to do? There is a great marketing study -- I forget who told this story -- about a consumer product. They showed examples of different colors. It was a boom box or something like that.

They said, "How many of you think white is the cool color, how many of you think black, how many, blah, blah, blah?" All the people voted, and then they had piles of the boom boxes by the door that the people took as their thank you gift. What they said and what they did were completely at variance.

One of the things that’s possible today is that, increasingly, we are able to see what people actually do, rather than what they say they will do or think they will do.

Gardner: We're just about out of time. Scott Yara, what’s your advice for those folks who are just getting their heads wrapped around this on how to get started? It’s not a trivial activity. It does require a great deal of concerted effort across multiple aspects of IT, perhaps more so than in the past. How do you get started, what should you be doing to get ready?

Yara: That's one of the real advantages. In sort of an orthogonal way, creating new businesses online in the age of Web 2.0 has become fundamentally cheaper and faster. Doing something disruptive inside a business with its data has to become fundamentally cheaper and easier too. So, not starting with the big vision of where you need to go, but starting with something tactical -- whether it lives in the call center or in some departmental application -- is the best way to get going.

There are technologies, services, and people now that you can actually peel off a real project, and you can deliver real value right away.

I agree with Tim. We're going to see a lot of activity in the mobility and telecommunication space. These companies are just realizing this. If you think about the kind of personalization you get with almost every major Internet site today, what's the level of personalization you get from your carrier, relative to how much data they have? You're going to see lots of telecom companies do things with data that will have real value.

One of our customers was saying that in the traditional, old data warehousing world, where it was back office, the service level agreement (SLA) was that when a call got placed and logged, it just needed to make its way into the warehouse seven days later. Seven days from the point of origination of a call, it would make its way into a back-office warehouse.

Those are the kinds of things that are going to change, if we are really going to provide mobility, locality, and recommendation services to customers.

It's having a clear idea of the first application that can benefit from data. Call centers are going to be a good area, providing the service representative with a profile of the customer and being able to change the experience. I think we are going to see those things.

So, they're tractable problems. Not being able to start small is what held back enterprise data warehousing before, when companies were looking at huge investments of people, capital, and infrastructure. I think that's really changing.

Gardner: I am afraid we have to leave it there. We've been discussing new approaches to managing data, processing data, mixing data types and sets, and extracting real-time business results from that. We've looked at tools, and we've looked at some of the verticals and business advantages.

I want to thank our panel. We've been joined today by Tim O’Reilly, the CEO and founder of O’Reilly Media. Thank you Tim.

O'Reilly: Glad to do it.

Gardner: Jim Kobielus, Forrester senior analyst. Thank you Jim.

Kobielus: Dana, always a pleasure.

Gardner: Scott Yara, president and co-founder of Greenplum. Appreciate it, Scott.

Yara: Great. Thanks everybody.

Gardner: This is Dana Gardner, principal analyst at Interarbor Solutions. You've been listening to a sponsored BriefingsDirect podcast. Thanks, and come back next time.

Listen to the podcast. Download the podcast. Find it on iTunes/iPod. Learn more. Sponsor: Greenplum.

Transcript of BriefingsDirect podcast on new computing challenges and solutions in data processing and data management. Copyright Interarbor Solutions, LLC, 2005-2008. All rights reserved.

Monday, December 15, 2008

IT Systems Analytics Become Crucial as Move to Cloud and SaaS Raises Complexity Bar

Transcript of a BriefingsDirect podcast on the role of log management and analytics as enterprises move to cloud computing and software as a service.

Listen to the podcast. Download the podcast. Find it on iTunes/iPod. Learn more. More related podcasts. Sponsor: LogLogic.

Dana Gardner: Hi, this is Dana Gardner, principal analyst at Interarbor Solutions, and you're listening to BriefingsDirect. Today, we present a sponsored podcast discussion on the changing nature of IT systems' performance and the heightening expectations for application delivery from those accessing applications as services.

The requirements and expectations on software-as-a-service (SaaS) providers are often higher than for applications traditionally delivered by enterprises for their employees and customers. Always knowing what's going on under the IT hood, being proactive in detection, security, and remediation, and keeping absolute adherence to service level agreements (SLAs) are the tougher standards a SaaS provider deals with.

Increasingly, this expected level of visibility, management, and performance will apply to those serving up applications as services regardless of their hosting origins or models.

Here to provide the full story on how SaaS is raising performance expectations for all applications, and how to meet or exceed those expectations, is Jian Zhen, senior director of product management at LogLogic. Welcome to the show, Jian.

Jian Zhen: Thank you for having me.

Gardner: We're also joined by Phil Wainewright, an independent analyst, director of Procullux Ventures, and SaaS blogger at ZDNet and ebizQ. Welcome back to the show, Phil.

Phil Wainewright: Glad to be here, Dana.

Gardner: Phil, let’s start with you. The state of affairs in IT is shifting. Services are becoming available from a variety of different models and hosts. We're certainly hearing a lot about cloud and private cloud. I suppose the first part of this that caught the public's attention was this whole SaaS notion and some successes in the field for that.

Maybe you could help us understand how the world has changed around SaaS infrastructure, and what implications that has for the IT department?

Wainewright: One thing that's happening is that the SaaS infrastructure is getting more complicated, because more choice is emerging. In the past people might have gone to one or two SaaS vendors in very isolated environments or isolated use cases. What we're now finding is that people are aggregating different SaaS services.

They're maybe using cloud resources alongside of SaaS. We're actually looking at different layers of not just SaaS, but also platform as a service (PaaS), which are customizable applications, rather than the more packaged applications that we saw in the first generation of SaaS. We're seeing more utility and cloud platforms and a whole range of options in between.

That means people are really using different resources and having to keep tabs on all of them. Where in the past all of an IT organization's resources were under its own control, they now have to operate in this more open environment, where trust and visibility into what's going on are major factors.

Gardner: Do you think that the type of application delivery that folks are getting from the Web will start to become more the norm in terms of what delivery mechanisms they encounter inside the firewall from their own data center or architecture?

Wainewright: If you're going to take advantage of SaaS properly, then you need to move to more of a service-oriented architecture (SOA) internally. That makes it easier to start to aggregate or integrate these different mashups, these different services. At the end of the day, the end users aren't going to be bothered whether the application is delivered from the enhanced data center or from a third-party provider outside the firewall, as long as it works and gives them the business results they're looking for.

Gardner: Let's go to Jian Zhen at LogLogic. How does this changing landscape in IT and in services delivery affect those who are responsible for keeping the servers running, both from the host as well as the receiving end in the network, and those who are renting or leasing those applications as services?

Zhen: Phil hit the nail on the head earlier when he mentioned that IT not only has to keep track of resources within their own environment, but now has to worry about all these resources and applications outside of their environment that they may or may not have control over.

That really is one of the fundamental changes and key issues for current IT organizations. You have to worry not only about who is accessing the information within your company firewall, but now you have all this data that's sitting outside of the firewall in another environment. That could be a PaaS, as Phil said, or it could be a SaaS application that's sitting out there. How do you control that access? How do you monitor that access? That's one of the key issues that IT has to worry about.

Obviously, there are data governance issues and activity monitoring issues. Now, from a performance and operational perspective, you have to ask: Are my systems performing? Are the applications, platforms, or utilities I'm renting performing to my spec? How do I ensure that the service providers can give me the SLAs that I need?

Those are some of the key issues that IT has to face when they are going outside of this corporate firewall.

Gardner: I suppose if it were just one application that you knew you were getting as a service, if something would go wrong, you might have a pretty good sense of who is responsible and where, but we are very rapidly advancing toward mixtures, hybrids, multiple SaaS providers, different services that come together to form processes. Some of these might be on premises, and some of them might not be.

It strikes me that we're entering a time when finger pointing might become rampant if something goes wrong. Who is ultimately responsible, and under whose SLA does it fall?

Phil, from your perspective, how important will it be to gain risk, compliance, and security comfort, by being able to quickly identify who is the source of any issue?

Wainewright: That's vitally important, and this is a new responsibility for IT. To be honest, Dana, you're a little bit generous to the SaaS providers when you say that if you only dealt with one or two, and if something went down, you had a fair idea of what was going on. What SaaS providers have been learning is that they need to get better at giving more information to their customers about what is going wrong when the service is not up or the service is not performing as expected. The SaaS industry is still learning about that. So, there is that element on that side.

On the IT side, the IT people have spent too much time worrying about reasons why they didn't want to deal with SaaS or cloud providers. They've been dealing with issues like, what if it does go down, or how can I trust the security? Yes, it does go down sometimes, but it's up 99.7 percent or 99.9 percent of the time, which is better than most organizations can afford to do with their own services.

Let's shift the emphasis from, "It's broken, so I won't use it," to a more mature attitude, which says, "It will be up most of the time, but when it does break, how do I make sure that I remain accountable, as the IT manager, the IT Director, or the CIO. How do I remain accountable for those services to my organization, and how do I make sure that I can pinpoint the cause of the problem, and get it rectified as quickly as possible?"

Gardner: Jian, this offers a pretty significant opportunity, if you, as a vendor and a provider of services and solutions, can bring visibility and help quickly decide where the blame lies, but I suppose more importantly, where the remediation lies. How do you view that opportunity, and what specifically is LogLogic doing?

Zhen: We talked to a lot of customers who were either considering or actually going into the cloud or using SaaS applications. One of the great quotes that we recently got from a customer is, "You can outsource responsibility, but not accountability." So, it fits right into what Phil was saying about being accountable and about your own environment.

The requirement to comply with government regulations and industry mandates really doesn't change all that much, just because of SaaS or because a company is going into the cloud. What it means is that the end users are still responsible for complying with Sarbanes-Oxley (SOX), payment card industry (PCI) standards, the Health Insurance Portability and Accountability Act (HIPAA), and other regulations. It also means that these customers will expect the same type of reports that they get out of their own systems.

IT organizations are used to transparency in their own environment. If they want to know what's happening in their own environment, they can get access to it. They can at least figure out what's going on. As you go into the cloud and use some of the SaaS applications, you start to lose some of that transparency, as you move up the stack. Phil mentioned earlier, there's infrastructure as a service, PaaS, SaaS. As you go up the stack, you're going to lose more and more of that transparency.

From a service-provider perspective, we need these providers to provide more transparency and more information as to what's happening in their environment and who has access. Who did access the information? LogLogic can help these service providers get that kind of information and potentially even provide the reports for their end users.

From a user's perspective, there is that expectation. They want to know what's going on and who is accessing the data. So, the service providers need to have the proper controls and processes in place, and need to continuously monitor their own infrastructure, and then provide some of these additional reports and information to their end customers as needed.

Gardner: LogLogic is in the business of collating and standardizing information from a vast array of different systems, through log files and other information, and then offering reports and audit capabilities from that data. It strikes me that you are now getting closer to what some people call business intelligence (BI) for IT, in that you need to deal almost in real time with vast amounts of data, and that you might need to reach across boundaries in order to gain the insights and inferences.

Do you at LogLogic cotton to this notion of BI for IT, and if so, what might we expect in the future from that?

Zhen: BI for IT or IT intelligence, as I have used the term before, is really about getting more information out of the IT infrastructure; whether it's internal IT infrastructure or external IT infrastructure, such as the cloud.

Traditionally, administrators have always used logs as one of the tools to help them analyze and understand the infrastructure, both from a security and operational perspective. For example, one of the recent reports from Price Waterhouse, I believe, says that the number one method for identifying security incidents and operational problems is through logs.

LogLogic can provide the infrastructure and the tools to help customers gather the information and correlate different log sources. We can provide them that information from both an internal and an external perspective. We work with a lot of service providers, as you know -- companies like SAVVIS, VeriSign, and Verizon Business Services -- to provide the tools for them to analyze service-provider infrastructures as well.

A lot of that information can be gathered into a central location, correlated, and presented as business intelligence or business activity monitoring for the IT infrastructure.
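
The centralized correlation Zhen describes can be sketched as normalizing heterogeneous log lines into one event stream and then grouping them by user. This is an illustrative outline, not LogLogic's actual pipeline, and the log formats are invented.

    # Sketch: normalize two made-up log formats into one stream, then build a
    # cross-source activity trail per user.
    import re
    from collections import defaultdict

    firewall_logs = ["2008-12-15 09:12:01 ALLOW user=alice dst=crm.example.com"]
    app_logs      = ['2008-12-15 09:12:05 INFO login ok user="alice" ip=10.0.0.8']

    events = []
    for line in firewall_logs:
        m = re.search(r"(\S+ \S+) (ALLOW|DENY) user=(\S+)", line)
        if m:
            events.append({"time": m.group(1), "source": "firewall",
                           "action": m.group(2), "user": m.group(3)})
    for line in app_logs:
        m = re.search(r'(\S+ \S+) INFO (\w+ \w+) user="([^"]+)"', line)
        if m:
            events.append({"time": m.group(1), "source": "app",
                           "action": m.group(2), "user": m.group(3)})

    by_user = defaultdict(list)
    for event in sorted(events, key=lambda e: e["time"]):
        by_user[event["user"]].append((event["time"], event["source"], event["action"]))

    for user, trail in by_user.items():          # one correlated trail per user
        print(user, trail)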

Gardner: Phil, the amount of data that we can extract from these systems inside the service providers is vast. I suppose what people are looking for is the needle in the haystack. Also, as you mentioned, it probably behooves these providers to offer more insights into how well they did or didn't do.

What's your take on this notion of BI for IT, and does it offer the SaaS providers an opportunity to get a higher level of insight and detail about what is going on within their systems for the assurance and risk mediation for their customers?

Wainewright: Yes, it does. This is an area where we are going to see best practices emerge. We're in a very early stage. Talking about keeping logs reminds me of what happened in the early days of Web sites and Web analytics. When people started having Web sites, they used to create these log files, in which they accumulated all this data about the traffic coming to the site. Increasingly, it became more difficult to analyze that traffic and to get the pertinent information out.

Eventually, we saw the rise of specialist Web-traffic analytics vendors, most of them, incidentally, providing their services as SaaS focused on helping the Web-site managers understand what was going on with their traffic.

IT is going to have to do the same thing. Anyone can create a log file, dump all the data into a log, and say that they've got a record of what's been going on. But, that's the technically easy challenge. The difficult thing, as Jian said, is actually doing the business analytics and the BI to see what was going on, and to see what the information means.

Increasingly, it comes back to IT accountability. If your service provider does go down, and if the logs show that the performance was degrading gradually over a period of time, then you should have known that. You should have been doing the analysis over time, so that you were ahead of that curve and were able to challenge the provider before the system went down.

If it's a good provider, which comes back to the question you asked, then the provider should be on top of that before the customer finds out. Increasingly, we'll see the quality of reporting that providers are doing to customers go up dramatically. The best providers will understand that the more visibility and transparency they provide the customers about the quality of service they are delivering, the more confidence and trust their customers will have in that service.
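
The "degrading gradually over a period of time" scenario is exactly what a simple trend check over logged response times can catch before an SLA is breached. A minimal sketch, with invented numbers and thresholds:

    # Fit a linear trend to daily average response times pulled from logs and
    # flag it before the SLA is breached. Data and threshold are invented.
    daily_avg_ms = [180, 185, 195, 210, 230, 260, 300]   # last 7 days

    n = len(daily_avg_ms)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(daily_avg_ms) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, daily_avg_ms))
             / sum((x - mean_x) ** 2 for x in xs))        # least-squares slope, ms/day

    SLA_MS = 400
    days_to_breach = (SLA_MS - daily_avg_ms[-1]) / slope if slope > 0 else None
    if days_to_breach is not None and days_to_breach < 14:
        print(f"Warning: +{slope:.1f} ms/day; SLA breach in ~{days_to_breach:.0f} days")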

Gardner: As we mentioned, the expectations are increasing. The folks who rent an application for a few dollars a month actually have higher expectations for performance than they do for perhaps far more expensive applications inside the firewall, delivered through traditional mechanisms.

Wainewright: That's right, Dana. People get annoyed when Gmail goes down, and that's free. People do have these high expectations.

Gardner: Perhaps we can meet those expectations, even as they increase, but even more important for these providers is the cost at which they deliver their services. Utilization rates, the amount of energy required per task, or some metric like that -- these log files and this BI will decide their margins and how competitive they are in what we expect to be a fairly competitive field. In fact, we are starting to see signs of marketplace and auctioning types of activities around who can put up a service for the least amount of money, which, of course, will put more downward pressure on margins.

I've got to go back to Jian on this one. We can certainly provide for user expectations and SLAs, but ultimately how well you run your data center as a service provider dictates your survival ability or viability as a business.

Zhen: You're absolutely right. One of the things that service providers, SaaS providers, or cloud providers have always talked about is the economy of scale. Essentially, that's doing more with less, and to do that you have to understand your IT infrastructure and your customer base. This is what BI is all about, right? You're analyzing your business, your user base, the user access, and all that information, trying to come up with some competitive advantage to either reduce cost or increase efficiency.

All that information is in logs, whether they're logs spewed out by your IT infrastructure or logs instrumented using agents or application-performance-monitoring types of tools. That information is there, and you need to be able to automate and enhance the way things are done. So, you need to understand and see what's going on in the environment.

Analyzing all those logs gives you a critical capability: not only managing hundreds or thousands of systems and making them more efficient, but bringing that BI throughout. Seeing how your users are accessing, reacting to, or changing your system lets you make it more efficient and faster for the user and, at the same time, reduce the cost of managing the infrastructure, as well as of doing business.

So, the need to understand and see what's going on is really driving the need to have better tools to do system analysis.
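To make that concrete, here is a minimal, hypothetical sketch in Python of the kind of trend analysis Jian is describing -- scanning web-server-style logs for gradually degrading response times, the early-warning scenario raised earlier. The log format, file name, and threshold are all assumptions for illustration, not a LogLogic feature.

```python
import re
from collections import defaultdict

# Assumed log line format (hypothetical):
# 2008-12-01T14:03:22 GET /api/orders 200 412ms
LINE = re.compile(r"^(\d{4}-\d{2}-\d{2})T\S+ \S+ \S+ \d{3} (\d+)ms$")

def daily_latency(log_lines):
    """Group response times (ms) by day and average them."""
    by_day = defaultdict(list)
    for line in log_lines:
        m = LINE.match(line.strip())
        if m:
            day, ms = m.group(1), int(m.group(2))
            by_day[day].append(ms)
    return {day: sum(v) / len(v) for day, v in sorted(by_day.items())}

def degradation_alert(averages, pct=0.20):
    """Flag a gradual slowdown: the latest daily average exceeds the
    first day's average by more than pct (20% by default)."""
    days = list(averages)
    if len(days) < 2:
        return None
    first, last = averages[days[0]], averages[days[-1]]
    if last > first * (1 + pct):
        return (f"Latency drifted from {first:.0f}ms to {last:.0f}ms "
                "-- investigate before it becomes an outage.")
    return None

if __name__ == "__main__":
    with open("access.log") as f:          # hypothetical file name
        averages = daily_latency(f)
    print(degradation_alert(averages) or "No significant degradation detected.")
```

The point is simply that the raw material -- time-stamped log lines -- is already there; the value comes from trending it over time rather than reading it after an outage.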

Gardner: Well, how about that, Phil? With apologies to Monty Python, every electron is important, right?

Wainewright: Well, it certainly can be. I think the other benefit of providers monitoring this information is that, if they can build out a track record and demonstrate that they are providing better service, then maybe that's a way of defending themselves -- of being able to justify asking higher prices than they might otherwise have done.

If the pricing is going to go down because of competitive pressures, there will be differential pricing according to the quality that providers can show they have a track record for delivering.

Zhen: I definitely agree with that. Being able to provide better SLAs, being able to provide more transparency, audit transparency, are things that enterprises care about. As many reports have mentioned, it's one of the biggest issues that's preventing enterprises from adopting the cloud or some of these SaaS applications. Not that the enterprises are not adopting, but the movement is still very slow.

The main reasons are security and transparency. As SaaS providers or service providers start providing a lot more information based on the data they analyze, they can offer better SLAs, from both an uptime and a performance perspective -- not just uptime. A lot of SLAs today just talk about uptime. If providers can deliver that information by analyzing the data they already have -- the log data, access data, and whatnot -- that's a competitive advantage, and often enterprises are willing to pay for it.
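Along the same lines, here is a small, hedged Python sketch of how the SLA reporting Jian mentions -- uptime plus a performance dimension such as 95th-percentile latency -- might be derived from log-distilled probe data. The sampling interval and targets are assumed values, not anything a particular provider publishes.

```python
def sla_report(samples, latency_target_ms=500, interval_minutes=5):
    """samples: list of (ok, latency_ms) tuples, one per probe interval,
    as might be distilled from access and health-check logs."""
    total = len(samples)
    up = sum(1 for ok, _ in samples if ok)
    uptime_pct = 100.0 * up / total

    # Performance SLA: 95th-percentile latency of successful probes.
    latencies = sorted(ms for ok, ms in samples if ok)
    p95 = latencies[int(0.95 * (len(latencies) - 1))] if latencies else None

    return {
        "uptime_pct": round(uptime_pct, 3),
        "p95_latency_ms": p95,
        "meets_latency_target": p95 is not None and p95 <= latency_target_ms,
        "window_hours": total * interval_minutes / 60,
    }

# Example: roughly a month of 5-minute probes, mostly healthy.
probes = [(True, 180)] * 8600 + [(False, 0)] * 40
print(sla_report(probes))   # uptime ~99.54%, p95 latency 180ms
```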

Wainewright: I've been speaking to enterprise customers, and they are looking for better information from the providers about those performance metrics, because they want to know what the quality of service is. They want to know that they're getting value for money.

Gardner: Well, we seem to have quite a set of pressures. On one side, it's to uphold performance, provide visibility, reduce risk, and offer compliance and auditing benefits. On the other side, it's pure economics. The more insight and utilization you have, and the more efficiently you can run your data centers, the more you can increase your margin and scale out to offer yet more services to more types of customers. It seems pretty clear that there's a problem set and a solution set.

Jian, you mentioned that you have several large service providers as customers. I don't suppose they want all the details about what happens inside their organizations to come out, but perhaps you have some use-case scenarios. Do you have examples of how analytics on a system's performance, via log data, helps them on either score -- qualitatively, in terms of performance and trust, or, more importantly over time, in their ability to reap the most efficiency out of their systems?

Zhen: These are actually partners of LogLogic. We've worked with these service-provider partners to provide managed services or cloud services for log management to end customers. They're using it both in working with their customers and internally.

Often, the use cases are really around compliance and security. That’s where the budget is coming from. Compliance is the biggest driver for some of these tools today.

However, according to some of the reports I mentioned, especially from Enterprise Strategy Group (ESG), one of the fastest-growing use cases for log management is operational use. This means troubleshooting, forensic analysis, and being able to analyze what's going on in the environment. But, the biggest driver today for purchasing that type of log-management solution is still compliance -- being able to comply with SOX, PCI, HIPAA, and other regulations.

Gardner: Let's wrap up with some crystal-ball gazing. First, from Phil. How do you see this market shaking out? I know we're under more economic pressure these days, given the pending or imminent global recession, but it seems to me that it could be a transformative pressure, a catalyst, toward more adoption of services and toward keeping application performance up at the lowest possible cost. What's your sense of where the market is going?

Wainewright: It’s a terrible cliché, but it’s about doing more with less. It may be a cliché, but it’s what people are trying to do. They've got to cut costs as organizations, and, at the same time, they have to actually be more agile, more flexible, and more competitive.

That means a lot of IT organizations are looking to SaaS and they're looking to cloud computing, because this is the way of getting resources without a massive outlay and starting to do things with a relatively low risk of failure.

They're finding that budgets are tight. They need to get things done quickly. Cloud or SaaS allows them to do that, and therefore there's a rosy future, even in bleak economic conditions, for this type of offering.

There are still a lot of worries among IT people as to the reliability, security, privacy, compliance, and all the other factors around SaaS. Therefore, the SaaS providers have to make sure that they're monitoring that and that they're reporting on it. Likewise, the IT people, for their own peace of mind, need to make their own arrangements, so that they can also keep an eye on things from their side. I think everyone is going to be tracking and monitoring each other.

The upside is that we're going to get more enterprise-class performance and enterprise-class infrastructure being built around the cloud services and the SaaS providers, so that enterprises will be able to have more confidence. So, at the end of the economic cycle, once people start investing again, I think we'll see people continue to invest in cloud services and SaaS, not because it's the low-cost option, but because it's the proven option that they have confidence in.

Gardner: Jian Zhen, how do you and LogLogic see the market unfolding? Where do you think the opportunities lie?

Zhen: I definitely agree with Phil. With the current economic environment, a lot of enterprises will start looking at SaaS and cloud services seriously and consider them.

However, enterprises are still required to be compliant with government regulations and industry mandates, and that's not going to go away. What the service providers and the SaaS providers can do to attract these customers is to make themselves compliant with some of these regulations and provide more transparency, giving people a view into who is accessing the data and how they protect it.

Amazon did a great thing, which was to release a white paper on some of their security practices. It's very high level, but it's a good start. Service providers need to start thinking more along the lines of how to attract these enterprise customers, because those customers are willing to adopt and are seriously considering SaaS services.

Phil had an article a while back calling for a SaaS code of conduct. Phil, one of the things that you should definitely add there is a provision that has service providers deliver all that transparency. That's something service providers can use to offer, essentially, a competitive advantage to their enterprise customers.

Gardner: Now, you sit at a fairly advantageous point, or a catbird's seat, if you will, on this regulatory issue. As enterprises seek more SaaS and cloud services for economic and perhaps longer-term strategic reasons, do we need to rethink some of our compliance and regulatory approaches?

We have a transition in the United States in terms of the government. So, now is a good time, I suppose, to look at those sorts of things. What, from your perspective, should change in order to allow companies to more freely embrace and use cloud and SaaS services, when it comes to regulation and compliance?

Zhen: As far as changing the regulations, I'm not sure there's a lot to change. We've seen SOX become a very high-level and very costly regulation to be compliant with. However, we've also seen PCI. That's much more specific, and companies and even service providers can adopt and use some of its requirements.

Gardner: That's the payment card issue, right?

Zhen: Correct. The PCI data-security standard is a lot more specific as to what a company has to do in order to be compliant with it. Actually, one of the appendixes is specifically for service providers. A lot of service providers have used, for example, a Statement on Auditing Standards (SAS) 70 Type II report as one of the things they show customers to demonstrate compliance. However, I don't think SAS 70 Type II is sufficient, mainly because the controls are described by the service providers themselves.

Essentially, they set their own requirements and then say, "Hey, we meet these requirements." I don't think that's sufficient. It needs to be something that's more of an industry standard, like PCI, but maybe a little bit different -- definitely more specific as to what the service providers need to do.

On top of that, we need some kind of information on when security incidents happen with service providers. One of the things that 44 states have today is data-breach notification laws. Those laws obviously don't apply to SaaS providers, but, in order to provide more transparency, there may need to be some standard or some process for how breaches are reported and handled.

Some of these things certainly will help enterprises be more comfortable in adopting the services.

Gardner: Well, Phil, there are topics there for about 150 blog entries -- this whole notion of how to shift regulation and compliance in order to suit a cloud economy.

Wainewright: Yeah, it's going to be a difficult issue for the cloud providers to adapt to, but a very important one. This whole issue of SAS 70 Type II compliance, for example. If you're relying on a service provider for part of the services that you provide, then your SAS 70 Type II needs to dovetail with their SAS 70 Type II processes.

That’s the kind of issue that Jian was alluding to. It's no good just having SAS 70 Type II, if the processes that you've got are somehow in conflict with or don't work in collaboration with the service providers that you are depending on. We have to get a lot smarter within the industry about how we coordinate services and provide accountability and audit visibility and trackability between the different service providers.

Gardner: Very good. We've been discussing requirements and expectations around SaaS providers, looking at expected increases and demands for visibility, and management and performance metrics. Helping us to better understand these topics -- and I'm very happy that they joined us -- are Jian Zhen, senior director of product management at LogLogic. Thanks for your input, Jian.

Zhen: Thank you, Dana.

Gardner: Also Phil Wainewright, independent analyst, director of Procullux Ventures, and SaaS blogger at ZDNet and ebizQ. Always good to have you here Phil, thank you.

Wainewright: Thanks, Dana.

Gardner: This is Dana Gardner, principal analyst at Interarbor Solutions. You've been listening to a sponsored BriefingsDirect podcast. Thanks, and come back next time.

Listen to the podcast. Download the podcast. Find it on iTunes/iPod. Learn more. More related podcasts. Sponsor: LogLogic.

Transcript of a BriefingsDirect podcast on the role of log management and analytics as enterprises move to cloud computing and SaaS. Copyright Interarbor Solutions, LLC, 2005-2008. All rights reserved.

Sunday, December 14, 2008

BriefingsDirect Analysts Handicap Large IT Vendors on How Cloud Trend Impacts Them

Edited transcript of BriefingsDirect Analyst Insights Edition podcast, Vol. 34, on cloud computing and its impact on IT vendors, recorded Nov. 21, 2008.

Listen to the podcast. Download the podcast. Find it on iTunes/iPod. Charter Sponsor: Active Endpoints.

Special offer: Download a free, supported 30-day trial of Active Endpoint's ActiveVOS at www.activevos.com/insight.

Dana Gardner: Hello, and welcome to the latest BriefingsDirect Analyst Insights Edition, Vol. 34. This periodic discussion and dissection of IT infrastructure-related news and events, with a panel of industry analysts and guests, comes to you with the help of our charter sponsor, Active Endpoints, maker of the ActiveVOS visual orchestration system. I'm your host and moderator, Dana Gardner, principal analyst at Interarbor Solutions.

Our topic this week, the week of Nov. 17, 2008 -- and Happy Holidays to you all -- is the gathering cloud-computing conundrum. From the hype and intrigue that's been a top story of 2008, do you think that cloud computing is past its infancy and well into adolescence?

We really are only beginning to understand how the IT services delivery, data management, and economic models of cloud computing will impact the market. Today, we're going to focus on the impact that this coalescing cloud phenomenon and the myriad cloud-computing definitions will have on the large, established IT vendors.

If this shift is as large and inevitable as many of us think, the impact on the current IT business landscape will also be large. Some will do well, and some will not. All, I expect, will need to adapt, and the shifts are certainly exacerbated by the deepening global recession.

To help us dig into how cloud computing will impact the IT industry we're joined by this week's panel. I'd like to welcome Jim Kobielus, senior analyst at Forrester Research. Hey, Jim.

Jim Kobielus: Hi, Dana. Hi, everybody.

Gardner: Tony Baer, senior analyst at Ovum. Hello, Tony.

Tony Baer: Hey, Dana, good to speak to you again.

Gardner: We're also joined by Brad Shimmin, principal analyst at Current Analysis. Welcome, Brad.

Brad Shimmin: Thanks for having me.

Gardner: Joe McKendrick, independent analyst and prolific blogger on ZDNet and ebizQ. Hi Joe.

Joe McKendrick: Hey, Dana, great to be here.

Gardner: This cloud chatter just doesn't seem to want to go away. In some respects, it's being heightened by the recession and the added emphasis on doing more with less, cutting down on IT spend and operations costs -- the whole pressure to reduce the burden, if you will, that IT places on companies, even as those companies might be contracting.

We're also mixing in virtualization, services-oriented architecture (SOA), governance, platform as a service (PaaS), software as a service (SaaS), and social networking into this cloud-services definition mix.

Many see this cloud as joining, or at least somewhat mashing up or overlapping, the business-to-business (B2B) world with the business-to-consumer (B2C) world. In other words, IT provides services inside of organizations but now the Internet can provide some, and maybe many, of those services, regardless of whether they are a company or individual. So, there's a bit of competition between internal and external IT services.

Let me go to you first, Tony Baer. Of all the large IT vendors -- and you've been covering these folks for a long time -- what's your knee-jerk reaction? Which one has the best outlook from these cloud trends, and which one has the worst?

Baer: That's an interesting question. I hadn't really thought about it in that manner, but I'll just think out loud. Obviously, if we are going to talk about who has consistently positioned themselves as being the poster child, it has been Marc Benioff over at Salesforce.com, where they have evolved from a customer relationship management (CRM) application that you access on demand to expand towards PaaS.

Now, there certainly are a lot of questions about Salesforce's business model, per se, in terms of its very high cost of customer acquisition. The fact is that when you have a subscription-based business, there is going to be customer churn. There's no question about that. On the other hand, they certainly have put this issue on the front burner.

In terms of who is best positioned for all this, I think it's a little too early to tell, because most of the large vendors are only just starting to put their feet in the water. Obviously, IBM, HP, and Microsoft are making moves. SAP has actually had a couple of stumbles on the way there. Oracle has sort of a sitting-on-the-fence strategy.

But, some of the strongest vertical areas would be areas where you are doing planning or testing. Those are clear winners, when it comes to deployment in the cloud, because there is actually relatively little risk factor. When I say vertical, I don't mean vertical industry, but vertical from a software-application standpoint.

Gardner: Who can get kicked in the teeth by this thing?

Baer: Well, Microsoft clearly could get kicked in the teeth, and that's obviously why they've come out with their resource strategy and with their various live-office strategies. Microsoft clearly has the most to lose, because they've been very identified with the rich client.


The Internet Effect

Gardner: Let's go to Brad Shimmin. Brad, we've seen over the past 12 years or so, that the "Internet effect" often seems to move in a winner-take-all fashion, at least for a certain disruptive phase of adoption. Do you think we are going to see the same with cloud computing? Is this a phenomenon that is going to put more power, more impact, and more dominance in the hand of a small number of companies, or is this going to democratize IT?

Shimmin: I'm firmly on the democratic side of this argument, because when you look at the strategies that vendors like IBM, Sun, and Cisco have for rolling out anything that's in the cloud -- whether it's PaaS, infrastructure as a service, or SaaS -- they all seem to be doing two things.

One is that they're taking some point solutions that they're going direct with, like IBM with Bluehouse, for example. Secondly, they're going after an independent software vendor (ISV) market. They want to empower folks like amazon.com, Panorama, Pervasive, Peer1, Mosso, Akamai, Boomi, and all those guys. They're really looking to empower them to go out and deliver services at any of those three layers I mentioned.

They want to deliver their software and their services, whether it's taking IBM's Lotus software and delivering it directly via some sort of white-label solution on their own, or just mashing up the services that Cisco provides with their WebEx Connect -- it doesn't matter. What these companies are doing is allowing this broader field, this channel of service providers, to exist, using their software and their services and, in some cases, their actual data-center resources.

Gardner: Let me understand. I think you're saying that the organization that can provide the best ecology of partners and provide the best environment to thrive for many other players will do best, whereas, in the past, it seemed that, as an IT vendor, having the most installed base and the most lock-in offered the path to who did best.

Shimmin: Exactly. A good example of that is SAP with their Business ByDesign software, which is direct to consumer. They launched that a little over a year ago and basically pulled back from it. They're getting ready to launch phase two this coming year. That's a testament to the idea that, sometimes, you shouldn't do it on your own when it comes to cloud computing.

Gardner: Joe McKendrick, we've seen a couple of impactful bloggers out there saying that the cloud is going to be all or nothing, and someone is going to come in and dominate the whole thing. Richard Stallman, I think, was perhaps the tip of the arrow for that mentality recently. Now, we've got Brad Shimmin and others saying, "Follow the salesforce.com model. It's one big happy sandbox. Let's get more players in." How do you see it, one or the other?

McKendrick: It's probably going to be quite a diversity of offerings. Cloud computing, or SaaS, is essentially a delivery model, and it's going to be one of several delivery models. Don't throw out that data center just yet. Most companies aren't ready to commit their mission-critical core applications to cloud computing, especially the larger enterprises. They're staying with the license model at this point.

You see that cloud computing is very much at the periphery of things though. Amazon Web Services is very much a play for small startup companies. WebEx is an example, as Jim mentioned, of a cloud function that's probably used within large enterprises, but, again, it’s not a mission-critical core application.

SaaS cloud computing is one intriguing delivery mechanism, and it will be offered alongside more traditional delivery mechanisms for a long time to come. Probably 20 years from now, we're still going to see data centers on site.

Gardner: This is more of a peripheral activity and you can't dominate from that position. So, I guess you are more on the democratic side.

McKendrick: The democratic side. It's giving a lot of opportunities to new players. Up against Microsoft, you see something like Zoho coming up with its own compelling office suite. Google also has offerings in that space. I don't think they're going to unseat Microsoft, but it's a nice alternative, and it makes the market more interesting.


Shooting for the Moon

Gardner: Jim Kobielus, I don't know if you're a card player, but there is this game out there called Hearts that we used to play a lot when I was younger. Most of the players would just try to win, but once in a while somebody would look at their hand and say, "It's so good, I'm going to shoot for the moon. I'm going to try to take all the tricks and put everybody out." Do you think there is an opportunity for somebody to try to do that with cloud computing?

Kobielus: I don't think so. I think you hit the nail on the head, Dana, a few minutes ago, when you pointed out that success in the emerging cloud arena depends on having a very broad and deep ecology of partners. I see the partner ecosystem as the new platform for cloud computing, being able to put together a group of partners that provide various differentiated features and services within an overall cloud-computing environment.

Then, the hub partner, as it were, provides some core, enabling infrastructure that binds them all together -- for example, a core analytic environment or a distributed data-warehousing environment that manages all of the structured, unstructured, and semi-structured data, and handles all of the very compute-intensive analytical workloads, CPUs, and other resources that many or all of the partner solutions can tap into. A basic utility-computing environment, in other words.

So, if you look at this, then the salesforce.com AppExchange model is very much the template for success in cloud computing. Will others, or can others, provide a similarly rich partner-ecosystem environment for their cloud-computing efforts? Clearly, well-established platform vendors, like SAP, Oracle, IBM, and Microsoft, stand to do very well in the cloud-computing paradigm, because they have substantial, global, differentiated partner ecosystems.

New players who come in to do just collaboration in the cloud, storage in the cloud, and so forth, will not necessarily be the dominating vendors in this new environment. They'll probably be members or participants in several partner ecosystems, providing some core functionality, but they won't dominate to the extent that the established brands will.

Gardner: Brad Shimmin, there is a certain traditional bifurcation in IT vendors. Some focus on platform values, services, and products, monetize around the platform, and are either agnostic or neutral around applications. There are others who are mostly focused on application-level values, monetize around that, and are either picking and choosing platforms or offering wide portability.

Can this bifurcation continue into the cloud era? For those who have an installed base of either platform and infrastructure or applications, can they transfer that to some sort of a cloud-based service offering and leverage off of their installed base?

Shimmin: I think so, Dana, and it's already happening with most of the major players we're talking about -- Oracle aside, because Oracle still seems to be on the fence with this. But, when you look at a company like Microsoft, they seem to be slow to market, and then, once they enter the market, they go really, really fast. They seem to be going really, really fast at the moment with two things, because they have both. They have the infrastructure and they also have the apps. They're going to have both paths.

They have the Azure platform, which is truly a PaaS offering that you use to build your own applications. So it's a layer above the Amazon EC2 infrastructure as a service.

Then they have the full-on SaaS-type products with Microsoft Online Services, which has in it almost the entirety of their collaboration software. So, they have actually sort of leapfrogged IBM Bluehouse a little bit with that.

The point is that these vendors are really looking at their portfolios and seeing which ones fit either of those two models. They're not committing to one or the other, Dana. They're really trying to tackle both ends at once.

Gardner: It seems to me, though, that, if you are focused on the applications, alternative cloud-based offerings for those applications could develop and undercut your business. If you're an infrastructure provider, you could start to offer similar cloud-based infrastructure services. That is to say, you could run your apps, you could perform services, do architectural development of business processes, and cross this barrier between internal and external services, if you stick with a common platform and a de-facto set of standards.

So, do you think, Brad, that there is a disadvantage for being an applications vendor who doesn't move rapidly to the cloud?


Risking the Customer Base

Shimmin: I think you're at a disadvantage. You're at risk of losing your customer base. We're going back to talking about the difference between Microsoft Online Services and Office Live, compared to Zoho and Google Apps, for example. Microsoft can be undercut in a heartbeat, if they don't do this right.

Back to what you were just talking about there, I see it as a distinction between PaaS and infrastructure as a service. There are a number of rudimentary, basement-layer infrastructure-as-a-service vendors out there who are going to take the Amazon model and talk to the vendors we've been discussing, to act as part of that ecosystem and to build out the infrastructure that will allow you to build and deploy any PaaS or SaaS on top of it.

What that says to me is that the vendors who have that strong infrastructure piece -- the infrastructure software and in some cases the hardware and data centers themselves -- are going to be well positioned to play in that ecosystem, regardless of what sort of SaaS applications they have.

Gardner: Salesforce is an example of this. They were very application specific, when they first came out with their SaaS offering, and now they say, "No, no, we don't want to just be application specific. We really want to be cloud platform services specific." Right?

Shimmin: Yeah, they may be following that tail a little bit. If you look at EMC as well, they have sort of a similar approach. They built out a true platform. I think it's called the Fortress. It's their own data center. They started with Mozy, which is like a consumer backup, and they have Recovery Point. They just announced a more generic data-center service for that. They'll just keep building out this broader infrastructure as they go.

Gardner: Tony Baer, two points. One, it's almost ironic now that being in the hardware business might actually be a good thing. For years, there was this notion that the hardware is commodity, and there will be no margin. It's the software where all the margin is. If we follow the logic of cloud computing, being a provider of the data center infrastructure, including hardware and storage, could be a safe bet, whereas the margins are going to continually be under assault on the applications. Does that make sense?

Baer: I'd still go along with what Brad was saying. I don't think that just a strict platform alone -- in other words, storage as a service -- is going to ultimately be a predominant play. It's a very niche buy. However, storage, along with what Jim was saying about analytics as a service, atop which you can then build an ecosystem of solution vendors, that's much more a winning combination.

When you look at the cloud, you're looking at a platform play, as opposed to a niche play, with one exception. The exception is that one of the advantages that's been stated about cloud computing is that it gives you a chance to kick the tires before you actually commit to implementing a solution or application. That's the one exception I would make there. Otherwise, I agree with the others here that, in the long run, probably the strongest play is more of a platform-based ecosystem, which will include infrastructure, because you can't have an ecosystem without having underlying infrastructure services.

Gardner: So, we have these cloud providers that are spending, in some cases, billions of dollars on a quarterly basis to build out their data centers. As more and more applications and services move to the cloud, they're not going to be able to get utilization beyond 100 percent. That's all you can get. So, they need to add more servers, storage, and data centers.

As this trend unfolds around the world, they need to be regional. They need to take advantage of lower costs of labor and energy. It seems to me that being a supplier to the cloud data-center providers is a pretty good business right now.

Baer: That is a good business. It's not the end business, but it's a good business. In other words, your successful cloud platform is not going to be just, "I'm going to provide storage as a service or backup as a service." However, to players who are trying to deliver PaaS it certainly is a winning strategy. You won't become a dominant cloud player, but you can certainly play a major part in that ecosystem.

Gardner: Joe McKendrick, let's look at this through the eyes of application development. It seems to me that developers and ISVs are once again a very important component of how these newer trends unfold. It comes down to the hearts and minds of developers -- ISVs will invest their development effort where they see their businesses doing well, in order to get a return.

These green-field application developers, are they going to continue to be concerned about installed-base platforms, or are they going to see more of an opportunity in this PaaS and build their apps of, for, and by the cloud?


The New Heavy Industries

McKendrick: Dana, I just want to go back one point there. It's interesting when we talk about the larger infrastructure providers and cloud providers: the Amazons, and, to some degree, Microsoft. They've become the new heavy industry of the 21st Century. It's no longer the thing to do to pursue smokestack factories, and so forth.

Gardner: These are the new River Rouge build-outs, right?

McKendrick: Exactly. Microsoft, Amazon, and a couple of others are building huge sites, or have built huge sites, along the Columbia river basin, where there are vast amounts of hydroelectric power and cheap energy. We're seeing other communities around the country pursuing these large data center providers. That's become the heavy industry du jour.

Gardner: At a time when they will take any business they can get.

McKendrick: Exactly.

Gardner: So, what about developers, green field versus building to an on-premises platform framework lock in?

McKendrick: Just about every small ISV coming on the market now is offering a SaaS model. This is the way to go with the emerging smaller software-development companies.

For the larger developers, ISVs that are already well-established, it's now another delivery mechanism, another channel to reach their customer base. There are a lot of efficiencies. When you have a cloud model or are working with a cloud model, you don't have to worry about making sure all your customers receive the latest upgrade or deal with problems customers may be having with conflicting software. It's all done once. You do the upgrade once, test it, ensure the quality, deliver it, and it's all done in one location. It makes their job a lot easier.

Gardner: It certainly does. It's a very attractive model, if you're a developer. You don't have to put up a lot of upfront cost. You might not need to go out to a venture capitalist and get $30 million to start your company. You might be able to do it with an angel or two, or maybe even bootstrap it, right?

McKendrick: Exactly. You don't have to worry about buying and mailing out those CDs.

Gardner: Jim Kobielus, you talk with a lot of startups. I talk with a lot of startups. When it comes to money -- whether it's VC money, angel money, or credit card debt money -- there's a lot of ferment in developing services of, for, and by clouds. I'm not being approached by many startups saying, "We're going to build a business around this Unix platform, this particular flavor of Java, or this particular instance of a Windows runtime environment." Does your view of the market jibe with mine?

Kobielus: Oh yes, very much so. What the whole trend towards SOA started was the gradual dissolution or deconstruction of the underlying platforms, as you mentioned -- OSs, development environments, and the declarative programming languages. This is all buggy-whip territory now in terms of what large and small software vendors are developing to. Pretty much everyone is now developing to a virtualized SOA, cloud environment.

When I say "cloud" in this context, most of the large and small vendors that I talk to -- although they may not use the word cloud -- are really looking at more of a flex-sourcing approach to delivering solutions to market. They might come in with a subscription service, but they often say, "We want to put it out in the market, see if anybody signs up, and then see if from there we want to turn it into some sort of packaged, licensed software offering, or, conceivably, we might turn some aspect of it into a hardware appliance that's optimized for one function."

Most of the vendors that I talk to now have three broad go-to-market delivery approaches for flexible delivery of applications or solutions: the everything-as-a-service approach, the appliance approach, and the packaged, licensed-software approach.

If you look at cloud computing as a Venn diagram, with many smaller bubbles within it, one of the hugest bubbles is this notion of flexible packaging and sourcing of solution functionality.

The "Chinese Wall" between internal hosting and external hosting is dissolving, as more and more organizations say, "You know what. We want to do data warehousing. We'll license a software from vendor X. We might also use their hosted offerings for these particular data marts. We also might go with an appliance from them, for either our data warehousing hub, a particular operational data store, or another deployment wall where the appliance form factor makes most sense."

Gardner: Brad Shimmin, do you see this breakdown too, where the green-field applications, these newer services, can take advantage of ecologies and bring other services into play and create a solution level or a business-process level value? They're going to be all formed by the cloud. They're going to be virtualized. And, they really don't have much concern about what the underlying infrastructure is. Perhaps what they would like to see more of is the ability to run that application in some sort of a coordinated fashion, both on premises and in some cloud.


Pressures on the Market


Shimmin: I do. There are a couple of pressures that are making that so, and I'm very much with Jim on this. When you look at companies like IBM and Microsoft -- Microsoft with their Software plus Services, and IBM with their Foundation Start Appliance, coupled with their Bluehouse software as a service, coupled with their on-premises collaboration software -- you're talking about a solution that spans those three delivery mechanisms.

The pressures I'm talking about, which make that so for the enterprise buyer, are that you don't want to have a full SaaS deployment, and you don't want to have a full appliance deployment. When you consider issues like ownership of data, privacy of that data, SLAs, and even transaction volumes, there are facets of your enterprise application that are best suited to running in an appliance, in your data center, or in the cloud.

So, these vendors we are talking about here clearly recognize that need, and are trying to re-architect their software so it can run across those three channels in different ways.

For example, let's say you have an email-messaging repository that sits inside your data center. Then, you have the actual transactions for sending out those messages that live inside Microsoft Online Services. That's what they're building toward. As I said, they're building toward it. Neither of those vendors -- and they are the two leading the way with that approach -- has achieved it yet. This isn't something you can do today. It's something that you're going to be able to do tomorrow.

Gardner: I think I hear you saying that we shouldn't think of this as an application-by-application decision. That is to say, "I'm going to pick and choose among my 800 applications, and 500 of them will stay on premises, and 300 of them I'll push out into a cloud provider. Let them do it as a SaaS provider." What I hear you saying is not application by application but underlying infrastructure service by infrastructure service, and that's what I will use to decide what I keep on premises and when I go to a cloud.

For example, I might want to keep data services on premises. I might want to keep directory and access management and overall governance on premises, but I might want to pick and choose from a variety of services off the cloud that then create these application processes.

McKendrick: The underlying architecture that a lot of vendors are moving toward to enable that degree of flexible deployment of different form factor -- hosted service, appliance, and packaged license software -- is the notion of shared nothing, massively parallel processing for extreme scale-out capabilities and extreme scale up as well.

In a federated model, where you have different clusters that can be internal, external, or in combinations specialized to particular roles within the application environment, some might be optimized for data warehousing, some might be optimized for business-process management and workflow, and others might be optimized for the upfront delivery, Web 2.0, REST, and all that. But, having shared nothing, massively parallel processing, with a federated middleware fabric in an SOA context, is where everybody is moving their platform and strategy.
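A minimal sketch of the shared-nothing, scatter-gather idea being described: each partition owns its own slice of the data and aggregates it independently, and only the small partial results are federated back together. This is illustrative Python using the standard multiprocessing module, with invented data -- not any vendor's actual architecture.

```python
from multiprocessing import Pool
from collections import Counter

# Each "node" owns its own partition of clickstream events; nothing is shared.
PARTITIONS = [
    [("/home", 1), ("/pricing", 1), ("/home", 1)],
    [("/signup", 1), ("/home", 1)],
    [("/pricing", 1), ("/pricing", 1), ("/signup", 1)],
]

def local_aggregate(partition):
    """Runs independently on each node: count page hits in its own slice."""
    counts = Counter()
    for page, hits in partition:
        counts[page] += hits
    return counts

def federate(partials):
    """The 'hub' merges the small partial results -- the only data that moves."""
    total = Counter()
    for c in partials:
        total.update(c)
    return total

if __name__ == "__main__":
    with Pool(processes=len(PARTITIONS)) as pool:
        partials = pool.map(local_aggregate, PARTITIONS)
    print(federate(partials))   # e.g. Counter({'/home': 3, '/pricing': 3, '/signup': 2})
```

The same split -- local work on local data, cheap merge of partial results -- is what lets such a platform scale out across internal clusters, external clusters, or both.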

Gardner: What's essential for any of that to happen is that you need to be doing SOA. You're not going to be able to take advantage of this hybrid model -- not just at an application level, but at an infrastructure-services level -- unless you have already adopted, or are well on your way toward, SOA. Does anyone disagree with that?

Baer: No disagreeing from me. SOA is the way. Ultimately the vision of SOA -- and this is something I try to bring up -- is that companies are going to be both consumers and providers of services, and the vision of SOA is that it runs both ways. You publish services and you consume services.

We're going to see companies, not necessarily in the IT business, take on more of a role as a provider of service, and they are going to have the infrastructure. They have the infrastructure in place now. SOA connected with grid computing can play a role here, where you have companies providing these services. They're looking for scalability. They're looking to maybe extend their service offerings beyond their corporate walls to their ecosystem of partners, maybe to customers, customers exclusively of that service, and without a prior relationship with the company.

Through SOA, perhaps companies can look at increasing capacity or tapping into capacity as needed in a grid like fashion, either with each other, or with a provider out there such as Amazon or IBM.


SOA an Essential Starting Point

Gardner: I'm definitely telling people that if you like the idea of cloud, if you see yourself moving towards wanting to take advantage of cloud-like services and values over time, you need to get your act together on the SOA front. It's absolutely essential. I think we all agree on that.

We talked about ecologies. We talked about how groupings of companies working together with mutual interdependence, or the fact that they are going to benefit mutually from a certain market orientation, makes sense.

It seems to me that this is not just going to be among the little guys however. There is a certain alignment that might be in the offing among some of these big companies. So, I'll offer up one potential bedfellows opportunity. That would be IBM and Google, perhaps working closely with someone else like Cisco. If you put together what Cisco brings to the table in terms of increasing services as a function of the network and what Google offers, which is more SaaS and PaaS, as well as a tremendous amount of metadata on what consumers are doing, with what IBM does in terms of its touch in the enterprise and in infrastructure, those three together strike me as a powerhouse.

Does anyone disagree, or do we have any other sort of mega-clusters of vendors out there that would align well in a cloud model?

Shimmin: I think, just like you have at the high school prom, some strange pairings and some power pairings. The power pairing you mentioned there is like the prom king and queen, and they will probably be one of those.

There are a number of others that are kind of bizarre, but I think really beneficial, like TIBCO and Cisco, for example. You wouldn't think it, but TIBCO, with their work on ActiveMatrix as the abstraction and virtualization of software, coupled with Cisco's work in the cloud for their own data center that they're rolling out -- and in terms of the equipment they ship into private clouds inside enterprises -- it's a great combination.

Gardner: How about a combination of HP with EDS, and Microsoft plus something else? Does that make a cluster sense?

Shimmin: If you're talking from the SI perspective, absolutely, because Microsoft is channel bound. That's their legacy and their current status. Being able to work with somebody like an EDS would benefit them tenfold in this situation.

Gardner: How about Apple and SAP? How would they line up with a sort of mega-cluster, anybody?

Baer: It would have to be on Steve Jobs' own terms, and I would have a hard time seeing the Germans at SAP go along with that.

Gardner: Oracle, SAP and Apple, is that what you are seeing? One or the other. Apple teaming up with SAP or Oracle?

Baer: When you take a look at Apple's business model, they like to definitely be like the queen bee in a hive of lots of drones.

Gardner: Well, we talked about playing Hearts and shooting for the moon. Isn't Apple a shoot-for-the-moon kind of company?

Baer: Certainly. Take a look at Apple's traditional approach to partnership, and take a look at how they are handling the application space with the iPhone. They will definitely insist on control, and with a powerful player that also insists on control, like SAP, I have a hard time imagining those two coming together.

Gardner: Remember, we're not talking about mergers and acquisitions. We are talking about partnerships.

Baer: I agree with you, but, essentially, when you are talking about partnering in a cloud, it is a form of virtual merger and acquisition.


All For One -- One For All

Gardner: Maybe it is. Interdependency -- we live or die together, all for one, one for all.

How about Amazon? That would be in my thinking a pretty good candidate for prom queen right now. Perhaps there will be some polygamy at the prom, because Amazon could team up potentially with say an Oracle and a Salesforce. Can you imagine such a pairing?

Kobielus: Yeah, because Oracle, a couple of months ago, announced that you can now take your existing Oracle database licenses and move them to the Amazon EC2 cloud and the Amazon storage service. So, to a degree, that partnership foreshadows possibly a larger relationship between those two companies going forward.

I think it's a really interesting pairing, Oracle plus Amazon. Once again, I always have to hit the analytics thing on the head, because I think database analytics, or cloud-scalable analytics, is going to be a key differentiator for most application vendors.

So, out of the blue, I can imagine that Oracle and Amazon partner with one of the leading data-mining vendors, such as SAS Institute. SAS Institute, based in North Carolina, is a privately held firm. They continue to emphasize that they've been doing SaaS as an alternate delivery channel for their very verticalized and content-rich analytic applications for several years now.

I don't think it's inconceivable that Jim Goodnight, the founder of SAS, might say, "Yeah, Oracle and Amazon have got a cloud thing going on. It makes great sense for us to take our existing SaaS strategy and bring it into an Oracle/Amazon cloud, so we can continue to penetrate a broader and deeper range of verticals with very flexible options."

Gardner: Clearly, in the cloud, the best analytics will have a significant advantage. Now, what about Red Hat? How about a Red Hat-Amazon pairing? Does that make sense?

Shimmin: That's already done. They have both their app server and their operating system running on it now, but interestingly enough, Dana, they're not going to go up the stack any further at this point.

Gardner: They seem to be sort of pulling back. I think you're right.

Shimmin: I talked to them about their enterprise service bus (ESB) in particular a couple of weeks ago, and I was sort of surprised to hear them tell me that they really didn't feel that, given the way their customers purchased software, it was a model that would work for them.

Gardner: I think they've recognized that the virtualized runtime instance with a stripped-down Linux kernel is a really good business and they should stick with their knitting.

Shimmin: Right, but if you look at their portfolio, you have Drools, for example, and their business process management (BPM) products, for which I can't remember the name. Those two, as PaaS offerings, would be phenomenal. Don't you think?

Gardner: Yeah. Let's go back to Microsoft. Microsoft has an opportunity to shoot for the moon. I'm going to be a little bit of a contrarian on that. They have all the essential pieces. They have a very difficult transformation to make in terms of their business. They have a lot of cash in the bank, and we're in a transformational period.

If you were going to make a big move, now it would be an excellent time to do it. It really comes down to execution -- whether they can get the various feuding parties inside the company to line up well. But, Microsoft also has to make a choice as to whether they want to be everything to everybody or strive for a better ecology.

Does anyone have any thoughts about what Microsoft should do? Should they try to do it all, or should they become more of an infrastructure-focused provider -- not trying to buy Yahoo and become search, consumer, and applications, but leaving that to the other players in the ecology? Let's look at Microsoft's situation and go to Tony Baer.

Baer: For this, I think there's a difference between what they strategically should do as a company, versus what Wall Street would prefer them to do. Wall Street is always looking for quarterly numbers and a show of growth. Obviously, buying something like a Yahoo, even though Yahoo at this point is pretty much damaged goods, provides that obvious growth into an area that Microsoft and Wall Street have been obsessed with.

However, what's smarter in the long run is the whole Software-plus-Services strategy, which is a great idea, but the devil is going to be in the detail. The idea of providing a seamless, or relatively seamless, experience of whether you're working with -- let's say, Word online versus Word at your desk or SharePoint, or whatever -- is a great idea. I think Microsoft is right now puzzling out the technical details, which is that Word online is not going to be the same exact creature as Word on your desktop.

Gardner: Doesn't Microsoft's Software-plus-Services strategy put them at odds with the ecology mentality? Doesn't it, in a sense, push away these green-field applications that only want to be in tune with a virtualized environment? Doesn't it turn them off?

Baer: Well, it might, but do you want to follow your customers in terms of how they want to work, or do you want to follow a blind ideal? I think what Jim and Brad were saying before is that, in terms of what customers are going to prefer in the long run, it's going to be a mixed bag.

There are going to be certain services that you will want to consume as a service versus some assets, processes, or functions that your corporate policies and matters of governance are going to require that you keep in-house.

Gardner: I think Microsoft has an opportunity to make an offer that developers can't resist -- and probably no one else is in a position to do it -- which is to say, "We will have at least one of the top three clouds. We're going to give you the tools and give you simplicity that Joe the plumber can develop, and we're going to make sure that you have a huge audience of both consumers and businesses that we're going to line up for you." Isn't that a formidable position, Joe McKendrick?

McKendrick: Very much a formidable position. They've already made a lot of moves in this direction: Software plus Services, the Live offerings. They're already positioning a lot of their product line. They work with Amazon and have offerings through the Amazon service as well.

Microsoft gets into everything. Wherever you look, in the enterprise or in computing, they have some kind of offering there. Sometimes, the things don't take off for a while. They sit and bide their time, and eventually it takes hold.

Gardner: It was Tony Baer who said Apple was like a queen bee with drones. We could apply that to Microsoft as well. It might not be an ecology so much as the queen bee in the hive dictating all the rules, with the drones just clicking along and making a pretty good living off of it, but Microsoft taking the lion's share of the dough.

McKendrick: I think that's a good model. In fact, thinking about Microsoft plus Yahoo, it makes really good sense for them both to be a real powerhouse together in cloud computing. Earlier, I stressed that the providers who dominate the cloud world will be those that focus on extreme scalability -- scale-out, shared-nothing, massively parallel processing -- being able to sift and analyze petabyte upon petabyte of data from all over, especially from the Web 2.0 world and especially clickstream information, and so forth.

Microsoft is already very much focused from the highest level on cloud computing with Azure, Live, and so forth. Clearly, they've got all of their online assets from years back. So, they are very much focused on that.

In terms of scalability, Microsoft has made one recent announcement that's pivotal to the development of their platform. They acquired a company called DATAllegro a few months ago, a data-warehousing appliance vendor whose primary differentiator was a very strong shared-nothing, massively parallel-processing architecture that Microsoft is making the core of a near-future SQL Server-based data-warehousing environment.

The thing I'm fascinated about with DATAllegro's technology is that it can be used to build the underlying scale-out substrate for cloud computing as well -- combining DATAllegro's strengths on the technical side with Yahoo's strengths in parallelization through the MapReduce and Hadoop frameworks. Yahoo has been one of the leaders in pushing really massively parallel clickstream analysis of Web 2.0 and social-networking information. Bringing that into the Microsoft cloud, I think, would be a killer combination to dominate this world.
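For readers who haven't seen the MapReduce pattern being referenced, here is a toy, single-process Python rendition of clickstream counting -- map, shuffle, reduce -- with an invented record format. Real Hadoop jobs distribute these same three phases across many machines; this sketch is only meant to show why the model scales out so naturally.

```python
import sys
from itertools import groupby

# Assumed input record (hypothetical): "timestamp<TAB>user_id<TAB>url"

def mapper(lines):
    """Map phase: emit (url, 1) for every click record."""
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) == 3:
            yield parts[2], 1

def reducer(pairs):
    """Reduce phase: pairs arrive grouped (sorted) by key, as the
    framework's shuffle guarantees; sum the counts per URL."""
    for url, group in groupby(pairs, key=lambda kv: kv[0]):
        yield url, sum(count for _, count in group)

if __name__ == "__main__":
    # Local simulation of map -> shuffle (sort) -> reduce over stdin.
    mapped = sorted(mapper(sys.stdin), key=lambda kv: kv[0])
    for url, total in reducer(mapped):
        print(f"{url}\t{total}")
```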

Gardner: Why is that important? Microsoft would have Yahoo, or at least the search part of Yahoo, and apparently they could just dump the rest, if they wanted. That gives them all that metadata about what consumers are doing. Then, they've got all that information about each and every enterprise and small-to-medium-sized business that they deal with. They've got the PaaS cloud and their own channel, if not an ecology of channels.

They can go back to those developers and say, "If you want to remain in business, we're the best bet. We can give you the metadata of how to reach the consumers. We can give you the metadata how to reach the businesses. We might even be able to join them together in a transactional relationship. We take a cut. You take a cut." Any thoughts out there?

Shimmin: With regards to Microsoft's channel, as you and Jim were saying, Microsoft is definitely going to be the queen bee and they are definitely going to make it beneficial to this channel to work with them in their cloud initiatives. At the same time, it's also Microsoft's greatest risk.

When you look at their PaaS with Azure, that makes sense for the channel, because how the channel differentiates is by the services they provide their customers directly, and that comes from developing code. But, when you talk about Microsoft's online services, Office Live, and those things, they are in a very precarious predicament of undercutting the values that their channel partners provide.

They're literally saying, "Hey, why do you need a channel partner for the SMB market, just come right to us and give us your credit card, which you can do for a certain number of dollars a month, and you are running."

Gardner: Right, so perhaps Microsoft has the golden opportunity but the transition is perilous, and execution has to be perfect. Just as we had back in the "anti" days, when all of the Unix vendors got together and created what they called the "anti-Microsoft coalition," all these other cloud providers, ISVs, developers, and all the PaaS people are going to get together and try to provide more of a marketplace, in order to if not staunch Microsoft, at least create that democratic approach to cloud. Does that make sense?

Shimmin: Agreed, and interestingly enough -- I can't believe I'm saying this -- Microsoft has really done something spectacular here, because it all comes back to the developer. What the developer does drives what software you run on the server, in many cases. With what Microsoft has done in the Software-plus-Services initiative, right now, today, using the .NET Framework 3.5 on Windows Server 2008, you can write code that can be dropped in the cloud or on the desktop automatically. You can just write a rule that says, "If I reach a certain service-level agreement (SLA) threshold, just kick this piece of code to the cloud."
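Brad's "kick this code to the cloud when an SLA threshold is hit" rule, sketched in rough terms. This is not the actual .NET mechanism he refers to -- just a hypothetical Python illustration of SLA-driven placement: measure recent local latency and route the workload to a cloud endpoint once the rule is breached.

```python
import time

SLA_MS = 250          # hypothetical response-time target
WINDOW = 20           # number of recent calls considered

class SlaRouter:
    """Route work locally until the recent average latency breaches the SLA,
    then fail over to the cloud endpoint."""
    def __init__(self, run_local, run_in_cloud):
        self.run_local = run_local        # callable handling the work locally
        self.run_in_cloud = run_in_cloud  # callable submitting to a cloud service
        self.samples = []

    def __call__(self, request):
        if self._breaching():
            return self.run_in_cloud(request)
        start = time.monotonic()
        result = self.run_local(request)
        elapsed_ms = (time.monotonic() - start) * 1000
        self.samples = (self.samples + [elapsed_ms])[-WINDOW:]
        return result

    def _breaching(self):
        return (len(self.samples) == WINDOW and
                sum(self.samples) / WINDOW > SLA_MS)

# Usage with hypothetical handlers:
# router = SlaRouter(run_local=local_handler, run_in_cloud=cloud_client.submit)
# response = router(request)
```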

Gardner: So Microsoft and not the business becomes the arbiter.

Shimmin: Exactly.

Gardner: OK. I'm afraid we have to wrap this up. We've had an engaging discussion about cloud, but in the context of large vendors, and how the business side of IT will react to this. It's clearly a subject we'll be dealing with for a long time.

I'd like to thank this week's panel. We've been joined by Jim Kobielus, senior analyst at Forrester Research.

Kobielus: Thanks, Dana and thanks everybody. Have a happy Thanksgiving.

Gardner: Tony Baer, senior analyst at Ovum. Thanks, Tony.

Baer: Have a great holiday everybody.

Gardner: Brad Shimmin, principal analyst at Current Analysis.

Shimmin: You're welcome, and thanks for having me. Happy holidays everyone.

Gardner: And last, but not least, Joe McKendrick, independent analyst and prolific blogger on ZDNet and ebizQ.

McKendrick: Thanks Dana, and since everybody will be listening to this in December, have a happy holiday and a happy new year.

Gardner: And I'd also like to thank our charter sponsor for the BriefingsDirect Analyst Insights Edition podcast series, Active Endpoints, maker of the ActiveVOS visual orchestration system. This is Dana Gardner, principal analyst at Interarbor Solutions. Thanks for listening, and come back next time.

Listen to the podcast. Download the podcast. Find it on iTunes/iPod. Charter Sponsor: Active Endpoints.

Special offer: Download a free, supported 30-day trial of Active Endpoint's ActiveVOS at www.activevos.com/insight.

Edited transcript of BriefingsDirect Analyst Insights Edition podcast, Vol. 34, on cloud computing and its impact on IT vendors, recorded Nov. 21, 2008. Copyright Interarbor Solutions, LLC, 2005-2008. All rights reserved.