Wednesday, February 03, 2010

BriefingsDirect Analysts Discuss Ramifications of Google-China Dust-Up over Corporate Cyber Attacks

Edited transcript of a BriefingsDirect Analyst Insights Edition podcast, Volume 50, on what the fallout is likely to be after Google's threat to leave China in the wake of security breaches.

Listen to the podcast. Find it on iTunes/iPod and Podcast.com. Download the transcript. Charter Sponsor: Active Endpoints.

Special offer: Download a free, supported 30-day trial of Active Endpoint's ActiveVOS at www.activevos.com/insight.

Dana Gardner: Hello, and welcome to the latest BriefingsDirect Analyst Insights Edition, Volume 50. I'm your host and moderator Dana Gardner, principal analyst at Interarbor Solutions.

This periodic discussion and dissection of IT infrastructure related news and events with a panel of industry analysts and guests, comes to you with the help of our charter sponsor Active Endpoints, maker of the ActiveVOS business process management system.

Our topic this week on BriefingsDirect Analyst Insights Edition focuses on the fallout from Google's threat to pull out of China, due to a series of sophisticated hacks and attacks on Google, as well as on a dozen more IT companies. In response to the attacks late last year, Google on January 12th vowed to stop censoring Internet content for China's web users and possibly to leave the country altogether.

This ongoing tiff between Google and the Internet control authorities in China's Communist Party-dominated government has uncorked a Pandora's box of security, free speech, and corporate espionage issues. There are human rights issues and free speech issues, questions about China's actual role, trade and fairness issues, and the point about Google's policy of initially enabling Internet censorship and now apparently backtracking.

But, there are also larger issues around security and Internet governance in general. Those are the issues we’ll be focusing on today. So, even as the US State Department and others in the US federal government seek answers on China’s purported role or complicity in the attacks, the repercussions on cloud computing and enterprise security are profound and may be long-term.

We're going to look at some of the answers to what this donnybrook means for how enterprises should best protect their intellectual property from such sophisticated hackers as government, military, or quasi-government corporate entities, and whether cloud services providers like Google are better than your average enterprise or even medium-sized business at thwarting such risks.

We'll look at how users of cloud computing should trust or not trust providers of such mission-critical cloud services as email, calendar, word processing, document storage, databases, and applications hosting. And, we'll look at how enterprise architecture, governance, security best practices, standards, and skills still need to adapt to meet these new requirements from insidious, world-class threats.

So, join me now in welcoming our panel for today’s discussion. Welcome to Jim Kobielus, senior analyst at Forrester Research. Hello, Jim.

Jim Kobielus: Hi Dana. How are you, buddy?

Gardner: Jason Bloomberg, managing partner at ZapThink.

Jason Bloomberg: Hi. Glad to be here.

Gardner: Jim Hietala, Vice President for Security at The Open Group.

Jim Hietala: Hello, Dana. [Disclosure: The Open Group is a sponsor of BriefingsDirect podcasts.]

Gardner: Elinor Mills, senior writer at CNET. Hello, Elinor.

Elinor Mills: Hi.

Gardner: And Michael Dortch, Director of Research at Focus.

Michael Dortch: Hi, Dana, and greetings, everyone.

Gardner: Thanks. Great having you with us, Michael.

Elinor, let me start with you. You've been covering Internet security, and even Google specifically, for several years now. When we think of security, we often think of teenage hackers or lowbrow malware and pesky pop-ups, but do you think that this Google-China finger-pointing business has, in a sense, changed the way security is viewed?

Pointing fingers

Mills: Oh, absolutely. We've got a huge first public example of a company coming out and saying, not only that they've been attacked -- companies never want to admit that, and it's usually kept under the radar -- but also that they're pointing fingers. They're not specifically saying, "We think it's the Chinese state," but they think enough of it that they're willing to threaten to pull out of the country.

It’s huge and it’s going to have every company reevaluating what their response is going to be -- not just how they’re going to do business in other countries, but what is their response going to be to a major attack.

Gardner: Does this mean that companies, enterprises specifically, need to rethink security not just for what you'd call criminal activity, but now at a higher level -- that higher level being government versus government?

Mills: Yes, if they’re big companies -- mid-size companies maybe not so much. Bigger companies have been targeted with espionage for a while, especially if they have any kind of technology that China or any other country might want. I think there's going to be more emphasis on it. They’re going to have to think about it. For smaller companies, it’s not going to be as much of a problem.

Gardner: Jim Kobielus, do you view this as a big issue or is this more of the same? Have the folks that you deal with, who are protecting their data and information, been aware of these threats? Is this more of a public relations problem than a real one?

Kobielus: I won’t say it’s just a public relations problem. It is a real one. If you’re going to be a multinational firm -- I've heard the term "supernational" used as well -- you’re not above the laws and governmental structures of the nations within which you operate. It's always been this way. This is a sovereign nation, and you're subject to their laws.

If you’ve been a multinational firm before, or if you wish to be one, you’ve got to play by whatever rules are imposed upon you to operate in these spheres. One of the key issues for Google is whether they want to continue to be a business that’s growing in this particular market, subject to whatever rules are laid down, whether they want to be a crusader for civil rights, human rights, whatever, in the Western context, or if they’re trying to be both. It means they’re going to have to contend with the government of the People’s Republic of China on their own turf -- and good luck there.

Gardner: Don't you think, Jim, that these issues transcend national boundaries or even the laws that govern a particular sovereign nation? If your servers are in one country, why should they be bound by the laws of another?

Kobielus: Well, your servers are physically hosted somewhere. Your access is from people, end users, in many nations that are trying to access whatever services you provide from those physically hosted servers.

So, your users and your servers are subject to the laws and the firewalls and security constraints and so forth in the various nations within which you will physically operate, as well as where your supply chain and your customer base will physically operate. None of these segments, these nodes, in this broader value chain are free floating in space like they're elevated platforms in the Jetsons.

Wakeup call?

Gardner: I think Google is going to perhaps challenge the way you’re looking at this. It should be interesting to see how it pans out. Jason Bloomberg, does this provide some sort of a wakeup call for enterprises and service providers as well about how they architect? Do they need to start architecting for a larger class of threats?

Bloomberg: It's not as big of a wakeup call as it should be. You can ask yourself, "Is this an attack by some small cadre of renegade hackers, or is this an attack by the government of the People's Republic of China?" That's an open question at this point.

Who is the victim? Is it Google, a corporation, or the United States? Is it the western world that is the victim here? Is this a harbinger of the way that international wars are going to be fought down the road?

We’ve all been worried about cyber warfare coming, but we maybe don’t recognize it when we see it as a new battlefield. It's the same as terrorism. It’s not necessarily clear who the participants are. We have this 18th Century view of warfare, where two armies meet on the battlefield and slug it out with the weapons of the day. But, terrorism has introduced new types of weapons and new types of battlefields.

Now we have cyber warfare, where it’s not even necessarily clear who the perpetrator is, who the victim is, or who the offended party is. This is a whole new context for conflict in the world.

When you place the enterprise into this context, it's not just that you have a business operating within the context of a government and subject to that government's laws. You have the supernational, as Jim was talking about, where large corporations have to play in multiple jurisdictions. That's already a governance challenge for these large enterprises.

Now, we have the introduction of cyber warfare, where we have concerted professional attacks from unknown parties attacking unknown targets and where it’s not clear who the players are. Anybody, whether it’s a private company, a public company, or a government organization is potentially involved.

They may not even fully know how involved they are or whether or not they are being targeted. That basically raises the bar for security throughout the entire organization. We’ve seen this already, where perimeter-based security has fallen by the wayside as being insufficient.

Sure, we need firewalls, but even though we have systems inside our firewalls, it doesn’t mean they are secure. A single virus can slip through the firewall with no problem at all. We already have this awareness that every single system on our network has to look out for itself and, even then, has levels of vulnerability. This just takes it to the national level.

Kobielus: But, there has always been corporate espionage, and there's always been vandalism perpetrated by companies against each other through subterfuge, and also by companies or fronts operating as the agents of an unseen foreign power. This is what the Germans did in this country before World War II to infiltrate, and what the Soviet Union did after World War II.

This is international realpolitik as usual, but in a different technological realm. Don't just focus on China. Let's say that Google had a data center in Venezuela. They could just as easily have that expropriated by Hugo Chavez and his government. In China, that's a possibility too.

Nothing radically new

What I'm saying is that I don't see anything radically or fundamentally new going on here. This is just a big, powerful, and growing world power, China, and a big and growing world power on the tech front, Google, colliding.

Mills: They have so much data. They're becoming a service provider for the world. It's not just their data that's being targeted. You've got the City of Los Angeles, Washington, D.C., and other government entities moving onto Google Apps. So, the end target in the cloud is different than just the employees of one company.

Dortch: That challenge puts Google in the very interesting position of having to decide. Is it a politically neutral corporation, or is it a protector of, and an advocate of protection for, the data that its clients around the world -- not just here, and not just governments but corporations -- have entrusted to it? Or, is it going to use the fact that it is a broker of all that data to throw its muscle around and take on governments like China's in debates like this?

The implications here are bigger than even what we’ve been discussing so far, because they get at the very nature of what a corporation is in this brave new network world of ours.

And, this is taking place against the backdrop of the Supreme Court just deciding that corporations in the United States have the same free speech rights in political campaigns as individuals. We're not at all clear on what this is going to mean for how the entity called a corporation is perceived, especially in the cloud.

Gardner: Thank you, Michael. Jim Hietala, help me understand, from your perspective, is this a game-changing event or is this more business as usual when it comes to corporate security?

Hietala: In terms of the visibility it’s gotten and the kinds of companies that were attacked, it’s a little bit game-changing. From the information security community perspective, these sorts of attacks have been going on for quite a while, aimed at defense contractors, and are now aimed at commercial enterprises and providers of cloud services.

I don't think that the attacks per se are game-changing. There's not a lot new here. It's an attack against a browser that was a couple of revs old and had a vulnerability. The way in which the company was attacked isn't necessarily game-changing, but the political ramifications around it and the other things we've just been talking about are what make it a little game-changing.

Gardner: I'd like to understand more about Michael Dortch's point about the cloud providers, and Elinor's as well. Should people think about a cloud provider as the best defense against these things, because they are current and they've got the power of scale they need to make this secure -- or else their business itself is undermined?

Or, is this something that’s best done at the individual level, company by company, firewall by firewall? Does anyone have some thoughts about that?

Dortch: I’m reminded of what Ronald Reagan famously said, “Trust, but verify.” It’s one of those things where the cloud becomes a part of a good defense, but you can’t place all of your eggs in any one basket.

Combining resources

Companies that are doing business internationally and that worry about this sort of thing -- and they all should -- are going to have to combine cloud-based resources from reputable companies with documented protections in place with other protections, in case the first line of defense fails or is challenged in some major way.

Kobielus: In some ways, we all perceive that a cloud provider like Google needs to be regarded differently in international law. It's almost like a cyber Switzerland. Basically, in another metaphor, it's almost like an offshore bank for your data and your other assets, playing the same neutral role that Switzerland has played through the years, including during World War II for secreted Nazi assets.

In other words, it's somehow a sovereign state in its own right, with the full rights and privileges accruing thereto. I don't think anybody is willing to take it that far in international law, but I think there is this perception that, for cloud providers like Google to really realize their intended mission, there needs to be some change in the international governance of these sorts of assets that transcend nation-states.

Bloomberg: You could actually think of that as a reductio argument, because there isn’t going to be such a change. Cloud environments do not have that sort of power or capability and, if anything, cloud environments reduce the level of security.

They don’t increase it for the very reason that we don’t have a way of making them sovereign in their own right. They’re always not only subject to the laws of the local jurisdiction, but they’re subject to any number of different attacks that could be coming from any different location, where now the customers aren’t aware of this sort of vulnerability.

So, “Trust, but verify,” is a good point, but how can you verify, if you’re relying on a third party to protect your data for you? It becomes much more difficult to do the verification. I'd say that organizations are going to be backing away from cloud, once they realize just how risky cloud environments are.

Mills: Microsoft's general counsel Brad Smith this week gave a keynote at a Brookings Institution forum, and he talked about modernizing and updating the laws to adapt specifically to the cloud. That included privacy rights under the Electronic Communications Privacy Act being more clearly defined, updating the Computer Fraud and Abuse Act, and setting up a framework so that differences in the regulations and practices in various countries can be worked out and reconciled.

Gardner: What happens if you are a small to medium-sized business and you might not have the resources to put into place all the security you need to deal with something like a China or Venezuela, or perhaps some large company that’s in another country that wants to take your intellectual property? Are you better going to a cloud provider and, in a sense, outsourcing security? Jim Hietala, does that make sense for a small to medium-sized business?

Hietala: I don’t think you can make that case yet today. I don’t think there is a silver-bullet cloud provider out there that has superior security to have that position. All enterprises still are going to have to be at the top of their game, in terms of protecting their assets, and that extends to small or medium businesses.

At some point, you could see a cloud provider stake out that part of the market to say, "We’re going to put in a superior set of controls and manage security to a higher degree than a typical small-to-medium business could," but I don’t see that out there today.

Waiting for disaster

Dortch: All of us who've been doing this for a while, I think, will agree that where security is concerned, especially where cyber security is concerned -- at least in North America, where I'm most familiar -- companies tend not to talk about it or do anything until there is some major catastrophe.

Nobody buys insurance until the house next door to theirs burns down. So, from that perspective, this event could be useful. In terms of protecting their data, one of the issues that incidents like this raise is exactly how much corporate data is already in the cloud.

Many small businesses outsource payroll processing, customer relationship management (CRM), and a whole bunch of other things. A lot of that stuff is outsourced to cloud service providers, and companies haven't yet asked enough questions about exactly how cloud providers are protecting data and exactly how they can offer reassurance that nothing bad is going to happen to it.

For example, if their servers come under attack, can they demonstrate credibly how data is going to be protected? These are the types of questions that incidents like this can and should raise in the minds of decision-makers at small and mid-sized businesses, just as they're starting to raise these issues, and have been raising them for a while, among decision-makers at larger enterprises.

Kobielus: I think what will happen is that some cloud providers will increasingly be seen as safe havens for your data and for your applications, because (A) they have the strong security, and (B) they are hosted within, and governed by, the laws of nation states that rigorously and faithfully try to protect this information, and assure that the information can then be removed -- transferred out of that country fluidly by the owners, without loss.

In other words, it's like the Cayman Islands of the cloud -- that offshore banking safe haven you can turn to for all this. Clearly, it's not going to be China.

Gardner: We've seen in the history of the United States -- and, of course, the business world at large -- that whenever threats elevate to a certain level, the government steps in. We've seen it with piracy, border controls, taxation, trade mandates, freedom pacts, and so forth. Whenever a threat arises, businesses get up and say, "Hey, we pay taxes. Uncle Sam, please come in and save us," whether it's through the navy or some technology.

Should we expect that, if we come to understand that this was an attack against American business interests from a foreign government of some kind, that it's up to the government to solve the problem? How about governments in general, maybe it's the United Nations who steps in? Who is the ultimate governor of what happens in cyber space?

Special offer: Download a free, supported 30-day trial of Active Endpoint's ActiveVOS at www.activevos.com/insight.

Dortch: Dana, in 2007, the National Academies of Science issued a cyber security report, and it included ten provisions that, at that time at least, were looked at as potentially the foundation for a cyber security bill of rights. Maybe it's time to reawaken discussions like that. Maybe what's needed is the cyberspace equivalent of the United Nations.

This is a lot of heavy lifting that we're talking about, and businesses have problems to solve and threats to address today. So your question begs another one: how do we get to the stage we need to be at, where there can be trusted offshore-equivalent databanks and all of that? And, what do we do in the meantime? I'm not smart enough to have answers to those questions, but they're really interesting.

We know the game

Kobielus: At a governmental level, obviously there will always be approaches and tools available to any sovereign nation -- treaties, negotiations, war, and so forth. We all know that. Clearly, we all know the game there.

In terms of who has responsibility and how will governance best practices be spread uniformly across the world in such areas of IT protection, it's going to be some combination of multilateral, bilateral, and unilateral action. For multilateral, the UN points to that, but there are also regional organizations. In Southeast Asia there is ASEAN, and in the Atlantic there is NATO, and so forth.

So, there is going to be a combination of all that. For this administration and subsequent administrations in the U.S., it’s just a matter of their putting together a clear agenda for trying to influence the policies, practices, and enforcement within China and other nations that may prove unreliable in terms of protecting the interest of our businesses.

Dortch: And, Secretary of State Clinton’s director of innovation -- I believe that's his title -- has already said publicly that it's a linchpin of our negotiating strategy with China and other countries.

Just as we, as a country, are an advocate for human rights, we're increasingly and more overtly advocating that other countries' citizens have free access to the Internet and basically have the cyber equivalent of human rights. That's going to play out in some very interesting ways as it becomes a larger part of our global diplomatic effort.

Kobielus: Keep in mind that the UN had a human rights declaration in 1948. China signed up, the Soviet Union signed up, and it didn't make a whole lot of difference in terms of how they treated their own people over time. Keep in mind that such declarations are fine and dandy, but often don't have much impact on the ground.

Gardner: So, enforcement is important. What we’ve seen so far is the enforcement of the marketplace, and I think that's what Google is up to in many respects. They’re saying, "Listen, we are a big enough company. We have such sophisticated technology and our price points for our services are so low that you would be at a disadvantage as a competitive nation not to have us working inside of your market, China."

Then, China says back to Google, "We are potentially, if not already, the biggest Internet market in the world, so don't you think you have to adhere to our dictates in order to play ball in our court?" So, there is a tussle between market powers. Is that going to be the best way for these issues to be resolved?

Kobielus: It's going to have to be resolved in the China context. They are the Middle Kingdom. They've seen themselves as the center of the universe, and it's not just me saying that. It's all manner of China scholars. This is not fundamentally any different from the way in which China has centralized bureaucracy and governance for over 2,000 years.

Gardner: Jason Bloomberg, do you think that the traditional free market -- the powerful interests and the money -- are enough to balance the risks associated with security in this newest age?

Who decides "enough?"

Bloomberg: When you say "enough," the question is who decides what is enough. We have these opposing forces. One is that information should be free, and the Internet should be available to everybody. That basically pushes for removing barriers to information flow.

Then you have the security concerns that are driving putting up barriers to information flow, and there is always going to be conflict between those two forces. As increasingly sophisticated attacks develop, that pushes the public consensus toward increasing security.

That will impact our ability to have freedom, and that's going to continue to be a battle that I don't see anybody winning. It's really just going to be an ongoing battle as technology improves and as the bad guys' attacks improve. It's going to be an ongoing battle between security and freedom and between the good guys and the bad guys, as it were, and that's never going to change.

Gardner: Now, picking up on your point, Jason Bloomberg, about this being a spy-versus-spy kind of world -- it's been that way so far. We've thought about how governments might come in. Large corporations can play their role. Cloud providers might have to step in and offer some sort of SLA-based protection or outsourced security opportunity of some kind.

What about going in the other direction? What if we go down to the individual who says, "If I'm going to play in the cloud or in this world-class cyber warfare environment, I want to have high encryption. I want to be able to authenticate myself in the best way possible. Therefore, I’ll give up some convenience. I might even pay a price, but I want to have the best security around my identity and I want to be able to play with the big boys, when it comes to encryption and authentication?"

We don’t really have an opportunity for those people to say, "I want to exercise security at an individual level." Jim Hietala, is there anything like that out there to get them to move towards the individual level of self-help, when it comes to high levels of security?

Hietala: Large enterprises are going to have to be responsible for the security of their information. I think there are a lot of takeaways for enterprises from this attack. If you're talking about specific individuals, it’s almost hopeless, because your average individual consumer doesn’t have the level of knowledge to go out and find the right solutions to protect themselves today.

So, I'll focus on the large enterprises. They have to do a good job of asset inventory, know where, within their identity infrastructure, they're vulnerable to this specific attack, and then be pretty agile about implementing countermeasures to prevent it. They have to have patch management that's adequate to the task of getting patches out quickly.

They need to do things like looking at the traffic leaving their network to see if people are already in their infrastructure. These Trojans leave traces of themselves, when they ship information out of an organization. When people really understand what happened in this attack, they can take something away, go back, look at what they are doing from a security standpoint, and tighten things up.

If you're talking about individuals putting things in the cloud, that’s a different discussion that doesn’t seem real feasible to me to get them to the point where they can secure their information today.
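
To make the egress-monitoring idea above concrete, here is a minimal, hypothetical Python sketch that scans an exported firewall log for "beacon-like" outbound flows -- connections to the same destination at suspiciously regular intervals, one of the traces a Trojan can leave when it ships information out of an organization. The log format, field names, and thresholds are illustrative assumptions, not any particular product's interface.

```python
import csv
from collections import defaultdict
from statistics import pstdev

# Hypothetical firewall log exported as CSV with columns: timestamp (epoch
# seconds), src_host, dst_ip, bytes_sent. Field names are illustrative.
LOG_FILE = "egress_log.csv"

def load_connections(path):
    """Group outbound connection timestamps by (source, destination)."""
    flows = defaultdict(list)
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            flows[(row["src_host"], row["dst_ip"])].append(float(row["timestamp"]))
    return flows

def beacon_candidates(flows, min_events=10, max_jitter=5.0):
    """Flag flows whose connection intervals are suspiciously regular --
    a rough heuristic for malware 'phoning home' on a fixed schedule."""
    suspects = []
    for key, times in flows.items():
        if len(times) < min_events:
            continue
        times.sort()
        intervals = [b - a for a, b in zip(times, times[1:])]
        if pstdev(intervals) <= max_jitter:  # near-constant spacing
            suspects.append((key, sum(intervals) / len(intervals)))
    return suspects

if __name__ == "__main__":
    for (src, dst), period in beacon_candidates(load_connections(LOG_FILE)):
        print(f"Review {src} -> {dst}: beacon-like traffic every ~{period:.0f}s")
```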

Centralized directory

Gardner: Jim, this gets back to what I used to hear almost 20 years ago in the messaging space, when we first started talking about directories -- that a directory is only as good as its authentication, information, and verification.

Don’t we need a centralized directory that we can bounce off these credentials and make sure that they are valid and authenticated? But, there was no central place to do that. Is it time for the government or some other agency or organization to come in and create that über directory for that large-scale global authentication capability?

Kobielus: You're talking about identity systems, with a web of trust, PKI and so forth. We've been talking about that for years. About five years ago, I was with a company that was trying to build federated cross-industry identity management for aerospace and defense, one North Atlantic industry, and even that was frightfully complicated. It probably still hasn’t gotten off the ground.

Imagine creating a similar federated directory, with all the stronger authentication and encryption and so forth, for all industries within the US -- let alone worldwide. It's not going to happen. It's just a huge engineering nightmare, putting together the trust relationships and working out all the interchange and interoperability issues. It's overkill. It's just much more trouble than it's worth.

Gardner: Too much federation. But what if there are only a handful of major cloud providers? Maybe it’s Google, Yahoo, Amazon, and Microsoft -- and I've just thrown those out. It could be a number of others. They might have the market heft or the technological wherewithal to enforce and deliver such an authentication and federated directory into existence.

Is anybody thinking like I am, that maybe cloud computing is different, that we can start to actually use the scale of these cloud providers to accomplish these large security requirements?

Dortch: You know, Dana, people change a lot more slowly than technology does. Just a few short months ago, a lot of us were outraged when it turned out that a handful of major telephone service providers had apparently been giving information to the government without the knowledge or consent of the subscribers whose information was involved. At least, that's what the published reports seemed to indicate.

I don’t see the people running cloud-computing companies being radically different from the people that run phone companies, and I don’t see them being, a priori, any less subject to influence by their own governments, bribes, threats, or anything else than the people who run the phone companies. I think that’s a good idea but I think it’s fraught with the same level of peril.

Kobielus: In fact, look at the last nine years since 9/11, and you can see in all the articles and stories how telcos have just bent over backwards to allow the Feds to come in and surveil their users and subscribers, and to abscond with call detail records to monitor terrorists' and other people's calling patterns, quite often without even using a search warrant. In other words, it's exactly what Michael said. How can you trust the carriers to safeguard our privacy, when they so easily succumb to such government pressure?

Gardner: So, these are very big issues that will impact us all as individuals and citizens within our national interests, as well as our companies. Yet, no one seems to have a good sense -- and there are some very bright people on the line today -- of how to even go about defining the problem, never mind solving it.

Identity registrars

Kobielus: Dana, there is another point you raised about why we don't just let the providers become sort of the über identity management registrars and then federate among themselves.

Remember, about 10 years ago -- I'm getting old, I can remember back 10 or more years -- Microsoft with its MSN Passport fiasco? Microsoft was saying, "We want to be everybody's identity management hub." Then, the huge objection that was raised was, "Microsoft wants to control our identities." Then, things like the Liberty Alliance and all the others sprung up to say, "No, no, there must be a less centralized and better way, so no one company can control all of our online identities."

That whole passport idea was kind of cool in some ways, but was just shot down completely and definitively, because the culture just said, "No, we cannot allow one group to have that much power."

Gardner: People simply didn't trust Microsoft at that point, when it was at perhaps the apex of its power, right?

Kobielus: Exactly. Now, Google is at the apex of their power. Would we trust Google in the same capacity? Look at China. They will become probably the largest economy in the world, in the next 25 years. Can we trust them? No, of course not.

When you have too much power concentrated in one place, people naturally sort of revolt. "No, wait, wait. I don't want to give them any more powers than they already have. Let's rethink this whole 'give them control of my identity' thing."

Dortch: It was the desire to get away from too much centralized control that led to the invention of the PC in the first place. It's important to keep that in mind in this context.

Gardner: So, if you truly want to be safe, you should just turn off your PC and start sending out mail at 44 cents a pop.

Kobielus: And, then you're not safe from anthrax, you know.

Gardner: Let's go around our panel. We’re almost out of time. I’d be interested now in hearing some predictions about what you think is going to happen next. We've done a great job at defining the scope, depth, and complexity of this problem set, a very complex undertaking. But, it seems like it's not something that's going to go away. What do you think is going to happen next, Jim Kobielus?

Kobielus: I don't think Google is going to leave China. I even saw a headline today. I think it said that they were going to stay in China and somehow try to work it out with the PRC. I don't know where that's going, but fundamentally Google is a business that has a "don't be evil" philosophy. They're going to continue to qualify evil down to those things that don't actually align with their business interest.

In other words, they're going to stay. There's going to be a lot of wariness now to entrust Google's China operation with a whole lot of your IT -- "you" as a corporation -- and your data. There will be that wariness.

Preferred platforms

Other cloud providers will be setting up shop or hosting in other nations that are more respectful of IP, other nations that may not be launching corporate or governmental espionage at US headquartered properties in China. Those nations will become the preferred supernational cloud hosting platforms for the world.

I can't really say who those nations might be, but you know what, Switzerland always sort of stands out. They're still neutral after all these years. You've got to hand that to them. I trust them.

Gardner: Jason Bloomberg, what do you think is going to happen next?

Bloomberg: In the short-term, the noise is going to die down, and it's going to go back to business as usual. Security is going to need to improve, but so are the hacks from the bad guys. It's going to continue until there is the next big attack. And the question is, "What's it going to be and how big is it going to be?"

We're still waiting for that game changer. I don't think this is a game changer. It's just a skirmish. But, if a hacker is able to bring down the Internet, for example, targeting the DNS infrastructure to the point that the entire thing collapses, that's something that could wake people up to say, "We really have to get a handle on this and come up with a better approach."

Gardner: That's mass vandalism. That doesn't really suit the purposes of some of the types of folks we are talking about. They don't want to bring the Internet down. They simply want to get an advantage over their competitors.

Bloomberg: Well, it really depends. We don't know who the bad guys are and what they’re trying to do. There's no single perspective. There's no single bad guy out there with a single agenda. We just don't know. We don't know what the agendas are.

Gardner: We don't know whether we have a level playing field or not?

Bloomberg: We can count on it not being level.

Gardner: Right. Jim Hietala, what do you see as some of the short- or medium-term next steps?

Hietala: From our perspective, we're starting to see more awareness at higher levels in governments that the threats and issues here are real. They’re here today. They seem to be state sponsored, and they're something that needs to be paid attention to.

Secretary of State Clinton gave a speech just today, where she talked specifically about this attack, but also talked about the need for nations to band together to address the problem. I don't know what that looks like at this point, but I think that the fact that people at that level are talking about the problem is good for the industry and good for the outlook for solutions that are important in the future.

Gardner: So, perhaps a free world versus an unfree world, at least in cyber terms, and perhaps the free world would have an advantage, or maybe the unfree world would have an advantage. It's hard to say.

Hietala: I'd agree it's hard to say, but the fact that those discussions are going on is positive.

Gardner: Elinor Mills, any sense of where things are going?

Leading the way

Mills: I'm horrible at predictions, but I'll just throw this out. I think Google is going to get out of China and try and lead some kind of US corporate effort or be a role model to try to do business in a more ethical way, without having to compromise and censor.

There will be a divergence that you'll see. China and other countries may be pushed more towards limiting and creating their own sort of channel that's government filtered. I think the battle is just going to get bigger. We're going to have more fights on this front, but I think that Google may lead the way.

Gardner: Very good. Michael Dortch, where do you see it going?

Dortch: Elinor is at least partly right. Especially if Google leaves China, Baidu is going to rise up as the government-approved version of Google for China and its localities. The very next thing Google will do is forge as strong a working relationship as it possibly can with Baidu. You might see that model replicated across multiple countries in the world.

In the meantime, though, something that -- if I remember correctly -- Astrodienst said almost 30 years ago is important to remember. Privacy is fungible. It's like currency. You're going to see individuals, small businesses, and individual corporate entities forging negotiations, deals, relationships, and accommodations that treat privacy and security as currency.

If it costs me a little bit more to do business here, I'm going to think seriously about it. Every once in a while, I'm going to swallow hard and pay the piper.

Gardner: Great. I'm going to throw in my two cents as well. This boils down to almost two giant systems or schools of thought that are now colliding at a new point. They've collided at different points in the past on physical sovereignty, military sovereignty, and economic sovereignty. The competition is between what we might call free enterprise-based systems and state sponsorship through centralized control systems.

Free enterprise won, when it came to the cold war, but it's hard to say what's going to happen in the economic environment where China is a little different beast. It's state sponsored and it's also taking advantage of free enterprise, but it's very choosy about what it allows for either one of those systems to do or to dominate.

When you look at Google, it has made itself into a figurehead representing what a free enterprise approach can do. It's not state sponsored or nationalistic. It's corporate sponsored. So, it will be interesting to see who has the better technology, who has the better financial resources, and ultimately who has the organizational wherewithal to manifest their goals online and win out in the marketplace.

If a state-organized effort is better at doing this than a corporate one, well, then it might dominate. But, so far, we've seen that the marketplace -- with choice, and with light and transparency shed on activities -- ultimately allows for free enterprise to predominate. It can do things better, faster, and cheaper, and that will ultimately win.

I think we're really on the cusp of a new level of competition -- not between countries or even alliances, but between systems: the free enterprise system versus the state-sponsored, centralized, or controlled system. It should be very interesting.

I want to thank our guests for today’s discussion. Jim Kobielus, senior analyst at Forrester Research. Thanks, Jim.

Kobielus: Sure.

Gardner: Jason Bloomberg, managing partner at ZapThink. Great to have you.

Bloomberg: My pleasure.

Gardner: Jim Hietala, Vice President for Security at The Open Group. Thank you, Jim.

Hietala: Thank you, Dana.

Gardner: And thank you for joining us, Elinor Mills, senior writer at CNET.

Mills: My pleasure.

Gardner: Lastly, I appreciate your debut here today, Michael Dortch, Director of Research at Focus.

Dortch: It was great fun, and I hope I passed the audition.

Gardner: You did.

Gardner: I also want to thank our charter sponsor for supporting today's BriefingsDirect Analyst Insights Edition podcast, Active Endpoints. This is Dana Gardner, principal analyst at Interarbor Solutions. Thanks for listening, and come back next time.

Listen to the podcast. Find it on iTunes/iPod and Podcast.com. Download the transcript. Charter Sponsor: Active Endpoints.

Special offer: Download a free, supported 30-day trial of Active Endpoint's ActiveVOS at www.activevos.com/insight.

Edited transcript of a BriefingsDirect Analyst Insights Edition podcast, Volume 50, on what the fallout is likely to be after Google's threat to leave China in the wake of security breaches. Copyright Interarbor Solutions, LLC, 2005-2010. All rights reserved.

Tuesday, January 05, 2010

Game-Changing Architectural Advances Take Data Analytics to New Performance Heights

Transcript of a BriefingsDirect podcast on how new architectural advances in collocating applications with data provide analytics performance breakthroughs.

Listen to the podcast. Find it on iTunes/iPod and Podcast.com. Download the transcript. Learn more. Sponsor: Aster Data Systems.

Dana Gardner: Hi, this is Dana Gardner, principal analyst at Interarbor Solutions, and you're listening to BriefingsDirect.

Today, we present a sponsored podcast discussion on how new architectures for data and logic processing are ushering in a game-changing era of advanced analytics.

These new approaches support massive data sets to produce powerful insights and analysis, yet with unprecedented price-performance. As we enter 2010, enterprises are including more forms of diverse data into their business intelligence (BI) activities. They're also diversifying the types of analysis that they expect from these investments.

We're also seeing more kinds and sizes of companies and government agencies seeking to deliver ever more data-driven analysis for their employees, partners, users, and citizens. It boils down to giving more communities of participants what they need to excel at whatever they're doing. By putting analytics into the hands of more decision makers, huge productivity wins across entire economies become far more likely.

But such improvements won’t happen if the data can't effectively reach the application's logic, if the systems can't handle the massive processing scale involved, or the total costs and complexity are too high.

In this discussion we examine how convergence of data and logic, of parallelism and MapReduce -- and of a hunger for precise analysis with a flood of raw new data -- all are setting the stage for powerful advanced analytics outcomes.

Here to help us learn how to attain advanced analytics and to uncover the benefits from these new architectural activities for ubiquitous BI is Jim Kobielus, senior analyst at Forrester Research. Welcome, Jim.

Jim Kobielus: Hi, Dana. Hi, everybody.

Gardner: We're also joined by Sharmila Mulligan, executive vice president of marketing at Aster Data. Welcome, Sharmila.

Sharmila Mulligan: Thank you. Hello, everyone.

Gardner: Jim, let me start with you. We're looking at a shift now, as I have mentioned, in response to oceans of data and the need for analysis across different types of applications and activities. What needs to change? The demands are there, but what needs to change in terms of how we provide the solution around these advanced analytical undertakings?

Rethinking platforms

Kobielus: First, Dana, we need to rethink the platforms with which we're doing analytical processing. Data mining is traditionally thought of as being the core of advanced analytics. Generally, you pull data from various sources into an analytical data mart.

That analytical data mart is usually on a database that's specific to a given predictive modeling project, let's say a customer analytics project. It may be a very fast server with a lot of compute power for a single server, but quite often what we call the analytical data mart is not the highest performance database you have in your company. Usually, that high performance database is your data warehouse.

As you build larger and more complex predictive models -- and you have a broad range of models and a broad range of statisticians and others building, scoring, and preparing data for these models -- you quickly run into resource constraints on your existing data-mining platform, really. So, you have to look for where you can find the CPU power, the data storage, and the I/O bandwidth to scale up your predictive modeling efforts. That's the number one thing. The data warehouse is the likely suspect.

Also, you need to think about the fact that these oceans of data need to be prepared, transformed, cleansed, meshed, merged, and so forth before they can be brought into your analytical data mart for data mining and the like.

Quite frankly, the people who do predictive modeling are not specialists at data preparation. They have to learn it and they sometimes get very good at it, but they have to spend a lot of time on data mining projects, involved in the grunt work of getting data in the right format just to begin to develop the models.

As you start to rethink your whole advanced analytics environment, you have to think through how you can automate, to a greater degree, all these data preparation and data loading chores, so that the advanced analytics specialists can do what they're supposed to do, which is build and tune models of various problem spaces. Those are key challenges that we face.

But, there is a third challenge, which is that advanced analytics produces predictive models. Those predictive models increasingly are deployed in-line to transactional applications, like your call center, to provide some basic logic and rules that will drive such important functions as the "next best offer" being made to customers, based on a broad variety of historical and current information.

How do you inject predictive logic into your transactional applications in a fairly seamless way? You have to think through that, because, right now, quite often analytical data models, predictive models, in many ways are not built for optimal embedding within your transactional applications. You have to think through how to converge all these analytical models with the transactional logic that drives your business.
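
As a rough illustration of what injecting predictive logic into a transactional application can look like, the sketch below scores a "next best offer" inline in a call-center flow, using coefficients exported from an offline-trained logistic model. The offers, feature names, and coefficients are invented for illustration; a production system would load them from the modeling environment rather than hard-coding them.

```python
import math

# Coefficients exported from an offline-trained logistic model (invented
# values for illustration; in practice these come from the modeling team).
OFFER_MODELS = {
    "upgrade_plan": {"intercept": -2.1, "tenure_years": 0.35,
                     "recent_complaints": -0.8, "monthly_spend": 0.02},
    "retention_discount": {"intercept": -1.4, "tenure_years": -0.10,
                           "recent_complaints": 0.9, "monthly_spend": 0.01},
}

def score(model, features):
    """Logistic score: estimated probability the customer accepts the offer."""
    z = model["intercept"] + sum(
        coef * features.get(name, 0.0)
        for name, coef in model.items() if name != "intercept"
    )
    return 1.0 / (1.0 + math.exp(-z))

def next_best_offer(customer):
    """Called inline by the call-center app while the agent is on the line."""
    ranked = sorted(
        ((score(m, customer), offer) for offer, m in OFFER_MODELS.items()),
        reverse=True,
    )
    return ranked[0]

if __name__ == "__main__":
    caller = {"tenure_years": 4, "recent_complaints": 2, "monthly_spend": 85.0}
    prob, offer = next_best_offer(caller)
    print(f"Suggest '{offer}' (estimated acceptance {prob:.0%})")
```

The point of the sketch is simply that, once a model is reduced to scorable form, embedding it in the transaction path is a small amount of code; the hard part Kobielus describes is getting models into that form and keeping them current.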

Gardner: Okay. Sharmila, are your users or the people that you talk to in the market aware that this shift is under way? Do they recognize that the same old way of doing things is not going to sustain them going forward?

New data platform

Mulligan: What we see with customers is that advanced analytics needs and the new generation of analytics that they're trying to do are driving the need for a new data platform.

Previously, the choice of a data management platform was based primarily on price-performance, being able to effectively store lots of data, and get very good performance out of those systems. What we're seeing right now is that, although price performance continues to be a critical factor, it's not necessarily the only factor or the primary thing driving their need for a new platform.

What's driving the need now, and one of the most important criteria in the selection process, is the ability of this new platform to be able to support very advanced analytics.

Customers are very precise in terms of the type of analytics that they want to do. So, it's not that a vendor needs to tell them what they are missing. They are very clear on the type of data analysis they want to do, the granularity of data analysis, the volume of data that they want to be able to analyze, and the speed that they expect when they analyze that data.

They are very clear on what their requirements are, and those requirements are coming from the top. Those new requirements, as it relates to data analysis and advanced analytics, are driving the selection process for a new data management platform.

There is a big shift in the market, where customers have realized that their preexisting platforms are not necessarily suitable for the new generation of analytics that they're trying to do.

Gardner: Let's take a pause and see if we can't define these advanced analytics a little better. Jim, what do we mean nowadays when we say "advanced analytics?"

Kobielus: Different people have their definitions, but I'll give you Forrester's definition, because I'm with Forrester. And, it makes sense to break it down into basic analytics versus advanced analytics.

What is basic analytics? Well, that's BI. It's the core of BI that you build your decision support environment on. That's reporting, query, online analytical processing, dashboarding, and so forth. It's fairly clear what's in the core scope of BI.

Traditional basic analytics is all about analytics against deep historical datasets and being able to answer questions about the past, including the past up to the last five seconds. It's the past that's the core focus of basic analytics.

What's likely to happen

Advanced analytics is focused on how to answer questions about the future. It's what's likely to happen -- forecast, trend, what-if analysis -- as well as what I like to call the deep present, really current streams for complex event processing. What's streaming in now? And how can you analyze the great gushing streams of information that are emanating from all your applications, your workflows, and from social networks?

Advanced analytics is all about answering future-oriented, proactive, or predictive questions, as well as current streaming, real-time questions about what's going on now. Advanced analytics leverages the same core features that you find in basic analytics -- all the reports, visualizations, and dashboarding -- but then takes it several steps further.

First and foremost, it's all about amassing a data warehouse or a data mart full of structured and unstructured information and being able to do both data mining against the structured information, and text analytics or content analytics against the unstructured content.

Then, in the unstructured content, it's being able to do some important things, like natural language processing to look for entities and relationships and sentiments and the voice of the customer, so you can then extrapolate or predict what might happen in the future. What might happen if you make a given offer to a given customer at a given time? How are they likely to respond? Are they likely to jump to the competition? Are they likely to purchase whatever you're offering? All those kinds of questions.
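
As a deliberately tiny illustration of the unstructured side, the sketch below reduces voice-of-the-customer text to a crude keyword-based sentiment score and uses it to flag at-risk customers. Real content analytics relies on full natural language processing for entities, relationships, and sentiment; the word lists and threshold here are made-up stand-ins.

```python
# Toy voice-of-the-customer scoring. Real text analytics uses full NLP
# (entity, relationship, and sentiment models); this keyword tally is only
# a stand-in to show unstructured text becoming a predictive feature.
POSITIVE = {"love", "great", "helpful", "fast", "recommend"}
NEGATIVE = {"cancel", "slow", "broken", "frustrated", "refund"}

def sentiment_score(text):
    """Crude sentiment in [-1, 1] from keyword counts."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total

def likely_to_defect(recent_messages, threshold=-0.3):
    """Flag a customer whose recent messages trend strongly negative."""
    scores = [sentiment_score(m) for m in recent_messages]
    return sum(scores) / len(scores) < threshold

if __name__ == "__main__":
    msgs = ["Support was slow and I am frustrated",
            "If this stays broken I will cancel"]
    print("At-risk customer:", likely_to_defect(msgs))
```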

Gardner: Sharmila, do you have anything to offer further on defining advanced analytics in this market?

Mulligan: Before I go into advanced analytics, I'd like to add to what Jim just talked about on basic analytics. The query and reporting aspect continues to be very important, but the difference now is that the size of the data set is far larger than what the customer has been running with before.

What you've got is a situation where they want to be able to do more scalable reporting on massive data sets with very, very fast response times. On the reporting side, in terms of the end result to the customer, it is similar to the type of report they've been trying to achieve, but the difference is that the quantity of data they're trying to get at, and the amount of data that these reports draw on, is far greater than what they had before.

That's what's driving a need for a new platform underneath some of the preexisting BI tools that are, in themselves, good at reporting, but what the BI tools need is a data platform beneath them that allows them to do more scalable reporting than you could do before.

Kobielus: I just want to underline that, Sharmila. What Forrester is seeing is that, although the average data warehouse today is in the 1-10 terabyte range for most companies, we foresee the average warehouse size going, in the middle of the coming decade, into the hundreds of terabytes.

In 10 years or so, we think it's possible, and increasingly likely, that petabyte-scale data warehouses or content warehouses will become common. It's all about unstructured information, deep history, and historical information. A lot of trends are pushing enterprises in the direction of big data.

Managing big data

Mulligan: Absolutely. That is obviously the big topic here, which is, how do you manage big data? And, big data could be structured or it could be unstructured. How do you assimilate all this in one platform and then be able to run advanced analytics on this very big data set?

Going back to what Jim discussed on advanced analytics, we see two big themes. One is the real-time nature of what our customers want to do. There are particular use cases where what they need is to be able to analyze this data in near real-time, because that's critical to being able to get the insights that they're looking for.

Fraud analytics is a good example of that. Customers have been able to do fraud analytics, but they're running fraud checks after the fact and discovering where fraud took place after the event has happened. Then, they have to go back and recover from that situation. Now, what customers want, is to be able to run fraud analytics in near real-time, so they can catch fraud while it's happening.

What you see is everything from cases in financial services companies related to product fraud to, for example, online gaming sites, where users of the system are collaborating on the site and trying to commit fraud. Those types of scenarios demand a system that can return the fraud analysis data in near real-time, so it can block these users from conducting fraud while it's happening.

The other big thing we see is the predictive nature of what customers are trying to do. Jim talked about predictive analytics and modeling. Again, that's a big area where we see massive new opportunity and a lot of new demand. What customers are trying to do there is analyze data about their own customer base, so that they can predict trends in the future.

For example, what are the buying trends going to be, let's say at Christmas, for consumers who live in a certain area? There is a lot around behavior analysis. In the telco space, we see a lot of deep analysis around trying to model behavior of customers on voice usage of their mobile devices versus data usage.

By understanding some of these patterns and the behavior of their users in more depth, these organizations are now able to better serve their customers and offer them new product offerings, new packages, and a higher level of personalization.
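A minimal sketch of that kind of telco-style behavioral segmentation might look like the following, clustering subscribers by voice versus data usage so that offers can be tailored per group. The usage figures and the choice of three segments are assumptions made purely for illustration.

```python
# Minimal sketch of behavioral segmentation: cluster subscribers by their
# voice minutes and data megabytes so that offers can be tailored per group.
# The usage figures and the choice of three segments are hypothetical.
from sklearn.cluster import KMeans

# Each row: [voice_minutes_per_month, data_mb_per_month]
usage = [
    [900, 200], [850, 150], [50, 9000], [80, 12000],
    [400, 3000], [450, 2800], [30, 15000], [950, 100],
]

segments = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(usage)

for subscriber, segment in zip(usage, segments):
    print(f"usage={subscriber} -> segment {segment}")
```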

Predictive analytics is a term that's existed for a while, and is something that customers have been doing, but it's really reaching new levels in terms of the amount of data that they're trying to analyze for predictive analytics, and in the granularity of the analytics itself in being able to deliver deeper predictive insight and models.

As I said, the other big theme we see is the push toward analysis that's much more near real-time than what they were able to do before. This is not a trivial thing to do when it comes to very large data sets, because what you are asking for is the ability to get very, very quick response times and incredibly high performance on terabytes and terabytes of data, in order to get these kinds of results in real-time.

Gardner: Jim, these examples that Sharmila has shared aren't just rounding errors. This isn't a movement toward higher efficiency. These are game changers. These are going to make or break your business. This is going to allow you to adjust to a changing economy and to shifting preferences by your customers. We're talking about business fundamentals here.

Social network analysis

Kobielus: We certainly are. Sharmila was discussing behavioral analysis, for example, and talking about carrier services. Let's look at what's going to be a true game changer, not just for business, but for the global society. It's a thing called social network analysis.

It's predictive models, fundamentally, but it's predictive models that are applied to analyzing the behaviors of networks of people on the web, on the Internet, Facebook, and Twitter, in your company, and in various social network groupings, to determine classification and clustering of people around common affinities, buying patterns, interests, and so forth.

As social networks weave their way into not just our consumer lives, but our work lives and the rest of our lives, social network analysis -- leveraging all the core advanced analytics of data mining and text analytics -- will take the place of the focus group. In an online world, everything is virtual. As a company, you're not going to be able, in any meaningful way, to bring your users together into a single room and ask them what they want you to do or provide for them.

What you're going to do, though, is listen to them. You're going to listen to all their tweets and their Facebook updates and you're going to look at their interactions online through your portal and your call center. Then, you're going to take all that huge stream of event information -- we're talking about complex event processing (CEP) -- you're going to bring it into your data warehousing grid or cloud.

You're also going to bring historical information on those customers and their needs. You're going to apply various social network behavioral analytics models to it to cluster people into the categories that make us all kind of squirm when we hear them, things like yuppie and Generation X and so forth. Professionals in the behavioral or marketing world are very good at creating segmentation of customers, based on a broad range of patterns.

Social network analysis becomes more powerful as you bring more history into it -- last year, two years, five years, 10 years worth of interactions -- to get a sense for how people are likely to respond to new offers, bundles, packages, campaigns, and programs that are thrown at them through social networks.

It comes down to things like Sharmila was getting at, simple things in marketing and sales, such as a Hollywood studio determining how a movie is being perceived by the marketplace, by people who go out to the theater and then come out and start tweeting, or even tweeting while they are in the theater -- "Oh, this movie is terrible" or "This movie rocks."

They can get a sense of how a product or service is being perceived in real-time, so that the provider of that product or service can then turn around and tweak that marketing campaign, the pricing, and incentives in real-time to maximize the yield, the revenue, or profit of that event or product. That is seriously powerful, and that's what big data architectures allow you to do.
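A toy version of that real-time perception tracking could look like the sketch below, which keeps a running average sentiment over a stream of short posts. The keyword lists and sample posts are hypothetical stand-ins for a real text-analytics model.

```python
# Minimal sketch of tracking how a product is being perceived in near real
# time from a stream of short posts. The keyword lists and sample posts are
# hypothetical stand-ins for a real text-analytics model.
POSITIVE = {"rocks", "great", "love", "amazing"}
NEGATIVE = {"terrible", "awful", "boring", "hate"}

def score(post):
    words = set(post.lower().split())
    return len(words & POSITIVE) - len(words & NEGATIVE)

running_total, count = 0, 0
stream = [
    "this movie rocks",
    "terrible plot, boring ending",
    "love the soundtrack, amazing cast",
]

for post in stream:
    running_total += score(post)
    count += 1
    print(f"after {count} posts, average sentiment = {running_total / count:+.2f}")
```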

If you can push not just the analytic models into this environment but also, to some degree, transactional applications such as workflow, to be triggered by all of the data being developed or sifted by these models, that is very powerful.

Gardner: We know that things are shifting and changing. We know that we want to get access to the data and analytics. And, we know what powerful things those analytics can do for us. Now, we need to look at how we get there and what's in place that prevents us.

Let's look at this architecture. I'm looking into MapReduce more and more. I am even hearing that people are starting to write MapReduce into their requests for proposals (RFPs), as they're looking to expand and improve their situation. Sharmila, what's wrong with the current environment and why do we need to move into something a bit different?

Moving the data

Mulligan: One of the biggest issues that the preexisting data pipeline faces is that the data lives in a repository that's removed from where the analytics take place. Today, with the existing solutions, you need to move terabytes and terabytes of data through the data pipeline to the analytics application, before you can do your analysis.

There's a fundamental issue here. You can't move boulders and boulders of data to an application. It's too slow, it's too cumbersome, and you're not factoring in all your fresh data in your analysis, because of the latency involved.

One of the biggest shifts is that we need to bring the analytics logic close to the data itself. Having it live in a completely different tier, separate from where the data lives, is problematic. This is not just a price-performance issue. It's a massive architectural shift that requires bringing the analytics logic to the data, so that the data and the analytics are collocated.

MapReduce, which you brought up earlier, plays a critical role in this. It is a very powerful technology for advanced analytics and it brings capabilities like parallelization to an application, which then allows for very high-performance scalability.

What we see in the market these days are terms like "in-database analytics," "applications inside data," and all this is really talking about the same thing. It's the notion of bringing analytics logic to the data itself.
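The architectural point can be seen in a small Python sketch that uses SQLite purely as a stand-in for a data platform: the old pattern pulls every row into the application before aggregating, while the in-database pattern ships one query to where the data lives and moves only the answer back. The table and figures are invented.

```python
# Sketch of "bring the analytics to the data" versus "move the data to the
# analytics", using SQLite purely as a stand-in. The table is hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("east", 100.0), ("west", 250.0), ("east", 75.0)])

# Old pattern: pull every row across the wire, then aggregate in the app.
rows = conn.execute("SELECT region, amount FROM sales").fetchall()
totals = {}
for region, amount in rows:          # all the data movement happens here
    totals[region] = totals.get(region, 0.0) + amount

# In-database pattern: ship the logic to the data, move only the answer back.
in_db = dict(conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region").fetchall())

print(totals, in_db)                 # same result, very different data movement
```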

I'll let Jim add a lot more to that since he has developed a lot of expertise in this area.

Gardner: Jim, are we in a perfect world here, where we can take the existing BI applications and apply them to this new architecture of joining logic and data in proximity, or do we have to come up with whole new applications in order to enjoy this architectural benefit?

Kobielus: Let me articulate in a little bit more detail what MapReduce is and is not. MapReduce is, among other things, a set of extensions to SQL -- SQL/MapReduce (SQL/MR). So, you can build advanced analytic logic using SQL/MR that can essentially do the data prep, the data transformations, the regression analyses, the scoring, and so forth, against both structured data in your relational databases and unstructured data, such as content that you may source from RSS feeds and the like.

To the extent that we always, or for a very long time, have been programming database applications and accessing the data through standard SQL, SQL/MR isn't radically different from how BI applications have traditionally been written.
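For readers who have not seen it, the snippet below suggests the general shape of a SQL/MR-style call, an ordinary SQL query that invokes a parallelizable analytic function over a table, as application code might submit it. The function name (score_churn), the columns, and the exact clause syntax are illustrative assumptions rather than a copy of any vendor's documented grammar.

```python
# Illustrative only: the general shape of a SQL/MR-style call, in which a
# parallelizable analytic function is invoked from ordinary SQL. The function
# name (score_churn), columns, and clause syntax are hypothetical; consult
# the actual SQL/MR documentation for the real grammar.
SQL_MR_STYLE_QUERY = """
SELECT customer_id, churn_score
FROM score_churn(
    ON customer_events
    PARTITION BY customer_id
    ORDER BY event_time
)
WHERE churn_score > 0.8;
"""

def submit(query):
    # Stand-in for a real database driver call (e.g., cursor.execute(query)).
    print("submitting query:\n", query)

submit(SQL_MR_STYLE_QUERY)
```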

Maximum parallelization

But, these are extensions and they are extensions that are geared towards enabling maximum parallelization of these analytic processes, so that these processes can then be pushed out and be executed, not just in-databases, but in file systems, such as the Hadoop Distributed File System, or in cloud data warehouses.

MapReduce, as a programming model and as a language, is in many ways agnostic as to the underlying analytic database, file system, or cloud environment where the information as a whole lives and how it's processed.

But no, you can't take your existing BI applications, in terms of the reporting, query, dashboarding, and the like, transparently move them, and use MapReduce without a whole lot of rewriting of these applications.

You can't just port your existing BI applications to MapReduce and database analytics. You're going to have to do some conversions, and you're going to have to rewrite your applications to take advantage of the parallelism that SQL/MR enables.

MapReduce, in many ways, is geared not so much for basic analytics. It's geared for advanced analytics. It's data mining and text mining. In many ways, MapReduce is the first open framework that the industry has ever had for programming the logic for both data mining and text mining in a seamless way, so that those two types of advanced analytic applications can live and breathe and access a common pool of complex data.
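The programming model itself is easiest to see in the canonical word-count example. The sketch below mimics the map, shuffle, and reduce phases in a single Python process, so it illustrates the model rather than an actual distributed Hadoop job.

```python
# The canonical word-count example, written to mirror the map and reduce
# phases of the MapReduce model. This runs in one process; a real engine
# would distribute the map calls and the grouped reduce calls across nodes.
from itertools import groupby

def map_phase(document):
    # Emit (key, value) pairs, one (word, 1) per word.
    for word in document.lower().split():
        yield word, 1

def reduce_phase(word, counts):
    # Combine all values that share a key.
    return word, sum(counts)

documents = ["big data at rest", "big data in motion", "data mining at scale"]

# Map, then shuffle (group by key), then reduce.
pairs = sorted(kv for doc in documents for kv in map_phase(doc))
results = [reduce_phase(word, (c for _, c in group))
           for word, group in groupby(pairs, key=lambda kv: kv[0])]

print(dict(results))
```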

MapReduce is an open standard that Aster clearly supports, as do a number of other database and data warehousing vendors. In the coming year and the coming decade, MapReduce and Hadoop -- and I won't go to town on what Hadoop is -- will become fairly ubiquitous within the analytics arena. And, that’s a good thing.

So, any advanced analytic logic that you build in one tool, in theory, you can deploy and have it optimized for execution in any MapReduce-enabled platform. That’s the promise. It’s not there yet. There are a lot of glitches, but that’s the strong promise.

Mulligan: I'd like to add a little bit to that Dana. In the marriage of SQL with MapReduce, the real intent is to bring the power of MapReduce to the enterprise, so that SQL programmers can now use that technology. MapReduce alone does require some sophistication in terms of programming skills to be able to utilize it. You may typically find that skill set in Web 2.0 companies, but often you don’t find developers who can work with that in the enterprise.

What you do find in enterprise organizations is that there are people who are very proficient at SQL. By bringing SQL together with MapReduce what enterprise organizations have is the familiarity of SQL and the ease of using SQL, but with the power of MapReduce analytics underneath that. So, it’s really letting SQL programmers leverage skills they already have, but to be able to use MapReduce for analytics.

Important marriage

Over time, of course, it’s possible that there will be more expertise developed within enterprise organizations to use MapReduce natively, but at this time and, we think, in the next couple of years, the SQL/MapReduce marriage is going to be very important to help bring MapReduce across the enterprise.

Hadoop itself is obviously an interesting platform, too, for being able to store lots of data cost-effectively. However, customers often also want some of the other characteristics of a data warehouse, like workload management, failover, and backup and recovery, that the technology may not necessarily provide.

MapReduce, as it's available right now with massively parallel processing (MPP) in the new generation of MPP data warehouses, such as Aster Data's solution, does bring the best of both worlds. It brings what companies need in terms of enterprise data warehouse capabilities. It lets you put application logic near the data, as we talked about earlier. And, it brings MapReduce through the SQL/MapReduce framework, which is primarily designed to ease adoption and use of MapReduce within the enterprise.

Gardner: Jim, we are on a journey. It’s going to be several years before we are getting to where we want to go, but there is more maturity in some areas than others. And, there is an opportunity to take technologies that are available now and do some real strong business outcomes and produce those outcomes.

Give me a sense of where you see the maturity of the architecture, of the SQL, and the tools and making these technologies converge? Who is mature? How is this shaking out a little bit?

Kobielus: One measure of maturity is whether something has become a best practice, and in-database analytics has. As I said, it's widely supported, through proprietary approaches, by many vendors.

If maturity is judged by adoption of an open industry framework with cross-vendor interoperability, then it's not mature yet in terms of MapReduce and Hadoop. There are pioneering vendors like Aster, but there are also a significant number of established big data warehousing vendors that have varying degrees of support, now or in the near future, for these frameworks. We're seeing strong indications. In fact, Teradata is already rolling out MapReduce and Hadoop support in its data warehousing offerings.

We're not yet seeing a big push from Oracle, or from Microsoft for that matter, in the direction of support for MapReduce or Hadoop, but we at Forrester believe that both of those vendors, in particular, will come around in 2010 with greater support.

IBM has made significant progress with its support for Hadoop and MapReduce, but it hasn’t yet been fully integrated into that particular vendor's platform.

Looking to 2010, 2011

If we look at a broad range of other data warehousing vendors, like Sybase, Greenplum, and others, most have it on their roadmap. To some degree, various vendors have these frameworks in development right now. I think 2010 and 2011 are the years when most of the data warehousing and data mining vendors will begin to provide mature, interoperable implementations of these standards.

There is a growing realization in the industry that advanced analytics is more than just being able to mine information at rest, which is what MapReduce and Hadoop are geared to doing. You also need to be able to mine and do predictive analytics against data in motion. That’s CEP. MapReduce and Hadoop are not really geared to CEP applications of predictive modeling.

There needs to be, and there will be over the next five years or so, a push in the industry to embed MapReduce and Hadoop in those environments. A few vendors are showing some progress toward CEP predictive modeling, but it's not widely supported yet, and what support exists is in proprietary approaches.

In this coming decade, we're going to see predictive logic deployed into all application environments, be they databases, clouds, distributed file systems, CEP environments, business process management (BPM) systems, and the like. Open frameworks will be used and developed under more of a service-oriented architecture (SOA) umbrella, to enable predictive logic that’s built in any tool to be deployed eventually into any production, transaction, or analytic environment.

It will take at least 3 to 10 years for a really mature interoperability framework to be developed, for industry to adopt it, and for the interoperability issues to be worked out. It's critically important that everybody recognizes that big data, at rest and in motion, needs to be processed by powerful predictive models that can be deployed into the full range of transactional applications, which is where the convergence of big data, analytics, and transactions comes in.

Data warehouses, as the core of your analytics environment, need to evolve to become application servers in their own right, able to handle the analytic applications of traditional data warehousing, BI, and data mining, as well as the transactional logic, and to handle it all seamlessly, with full security, workload isolation, failover, and so forth.

I'm really excited, for example, by what Aster has rolled out with their latest generation, version 4.0 of the Data-Application Server. I see a little bit of progress by Oracle with Exadata V2. I'm looking forward to seeing whether other vendors follow suit and provide a cloud-based platform for a broad range of transactional analytics.

Gardner: Sharmila, Jim has painted a very nice picture of where he expects things to go. He mentioned Aster Data 4.0. Tell us a little bit about that, and where you see the stepping stones lining up.

Mulligan: As I mentioned earlier, one of the biggest requirements in order to be able to do very advanced analytics on terabyte- and petabyte-level data sets, is to bring the application logic to the data itself. Earlier, I described why you need to do this. You want to eliminate as much data movement as possible, and you want to be able to do this analysis in as near real-time as possible.

What we did in Aster Data 4.0 is just that. We're allowing companies to push their analytics applications inside of Aster’s MPP database, where now you can run your application logic next to the data itself, so they are both collocated in the same system. By doing so, you've eliminated all the data movement. What that gives you is very, very quick and efficient access to data, which is what's required in some of these advanced analytics application examples we talked about.

Pushing the code

What kind of applications can you push down into the system? It can be any app written in Java, C, C++, Perl, Python, or .NET. It could be an existing custom application that an organization has written and that it needs to scale to work on much larger data sets. That code can be pushed down into the Aster database.

It could be a new application that a customer is looking to write to do a level of analysis that they could not do before, like real-time fraud analytics, or very deep customer behavior analysis. If you're trying to deliver these new generations of advanced analytics apps, you would write that application in the programming language of your choice.

You would push that application down into the Aster system, all your data would live inside of the Aster MPP database, and the application would run inside of the same system collocated with the data.

In addition to that, it could be a packaged application. So, it could be an application like software as a service (SaaS) that you want to scale to be able to analyze very large data sets. So, you could push a packaged application inside the system as well.

One of the fundamental things that we leverage to allow you to do more powerful analytics with these applications is MapReduce. You don't have to MapReduce-enable an application when you push it down into the Aster system, but you can choose to and, by doing so, you automatically parallelize the application, which gives you very high performance and scalability when it comes to accessing large data sets. You also then leverage some of the analytics capabilities of MapReduce that are not necessarily inherent in something like SQL.
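As a conceptual stand-in for this push-down idea, the sketch below registers a user-defined function inside a database engine and invokes it from SQL, so the logic runs where the rows live. SQLite's UDF hook is used only for illustration; it is not Aster's mechanism, and the scoring function is hypothetical.

```python
# Conceptual stand-in for "pushing application logic down to the data": a
# user-defined function registered inside a database engine and invoked from
# SQL, so the logic runs where the rows live. SQLite's UDF hook is used here
# purely for illustration; it is not Aster's push-down mechanism.
import sqlite3

def risk_score(amount, prior_flags):
    # Hypothetical scoring logic that would otherwise live in a separate app tier.
    return min(1.0, amount / 10000.0 + 0.2 * prior_flags)

conn = sqlite3.connect(":memory:")
conn.create_function("risk_score", 2, risk_score)   # logic now lives in the engine
conn.execute("CREATE TABLE txns (id INTEGER, amount REAL, prior_flags INTEGER)")
conn.executemany("INSERT INTO txns VALUES (?, ?, ?)",
                 [(1, 9500.0, 1), (2, 120.0, 0), (3, 4000.0, 3)])

for row in conn.execute("SELECT id, risk_score(amount, prior_flags) FROM txns"):
    print(row)
```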

The key components of 4.0 drive toward providing a platform that can efficiently and cost-effectively store massive amounts of data, plus a platform that allows you to do very advanced and sophisticated analytics. To run through the key things we've done in 4.0: first is the ability to push applications inside the system, so apps are collocated with the data.

We also offer SQL/MapReduce as the interface. Business analysts who are working with this application on a regular basis don’t have to learn MapReduce. They can use SQL/MR and leverage their existing SQL skills to work with that app. So, it makes it very easy for any number of business analysts in the organization to leverage their preexisting SQL skills and work with this app that's pushed down into the system.

Finally, in order to support the ability to run applications inside the database, which as I said earlier is nontrivial, we added fundamental new capabilities like Dynamic Mixed Workload Management. Workload management in the Aster system works not just on data queries, but on application processes as well, so you can balance workloads when you have a system that's managing both data and applications.

Kobielus: Sharmila, I think the greatest feature of 4.0 is simply the ability to run predictive models developed in SAS or other tools in their native code, without necessarily converting them to SQL/MR. That means that your customers can leverage that huge installed pool of intellectual property, all those models, bring it in, and execute it natively within your distributed grid or cloud, as a way of avoiding that rewrite. Or, if they wish, they can migrate or convert them over to SQL/MR. It's up to them.

That's a very attractive feature, because fundamentally the data warehousing cloud is an analytic application server. Essentially, you want that ability to be able to run disparate legacy models in parallel. That's just a feature that needs to be adopted by the industry as a whole.

The customer decides

Mulligan: Absolutely. I do want to clarify that the Aster 4.0 solution can be deployed in the cloud, or it can be installed in a standard implementation on-premise, or it could be adopted in an appliance mode. We support all three. It's up to the customer which of those deployment models they need or prefer.

To talk in a little bit more detail about what Jim is referring to, the ability to take an existing app, have to do absolutely no rewrite, and push that application down is, of course, very powerful to customers. It means that they can immediately take an analytics app they already have and have it operate on much larger data sets by simply taking that code and pushing it down.

That can be done literally within a day or two. You get the Aster system, you install it, and then, by the second day, you could be pushing your application down.

If you choose to leverage the MapReduce analytics capabilities, then, as I said earlier, you would MapReduce-enable an app. This simply means you take your existing application and, again, you don't have to do any rewrite of that logic. You just add MapReduce functions to it and, by doing so, you have MapReduce-enabled it. Then, you push it down and you have SQL/MR as an interface to that app.

The process of MapReduce-enabling an app is also very simple. It's a couple of days' process. This is not something that takes weeks and weeks to do. It literally can be done in a couple of days.

We had a retailer recently who took an existing app that they had already written, a new type of analytics application that they wanted to deploy. They simply added MapReduce capabilities to it and pushed it down into the Aster system, and it's now operating on very, very large data sets and performing analytics that they weren't originally able to do.

The ease of application push down and the ease of MapReduce enabling is definitely key to what we have done in 4.0, and it allows companies to realize the value of this new type of platform right away.

Gardner: I know it's fairly early in the rollout. Do you have any sense of metrics from some of these users? What do they get back? We talked earlier in the examples about what could and should be done nowadays with analysis. Do you have any sense of what they have been able to do with 4.0?

Reducing processing times

Mulligan: For example, we have talked about customers like comScore who are processing 1.6 billion rows of data on a regular basis, and their data volumes continue to grow. They have many business analysts who operate the system and run reports on a daily basis, and they are able to get results very quickly on a large data set.

We have customers who have gone from 5-10 minute processing times on their data set, to 5 seconds, as a result of putting the application inside of the system.

We have had fraud applications that would take 60-90 minutes to run in the traditional approach, where the app was running outside the database, and now those applications run in 60-90 seconds.

Literally, by collocating your application logic next to the data itself, you can see that you are immediately able to go from many minutes of processing time, down to seconds, because you have eliminated all the data movement altogether. You don’t have to move terabytes of data.

Add to that the fact that you can now access terabyte-sized data sets, versus what customers have traditionally been left with, which is only the ability to process data sets in the order of several tens of gigabytes or hundreds of gigabytes. Now, we have telcos, for example, processing four- or five-terabyte data sets with very fast response time.

It's the volume of data, the speed, the acceleration, and response time that really provide the fundamental value here. MapReduce, over and above that, allows you to bring in more analytics power.

Gardner: A final word to you, Jim Kobielus. This really is a good example of how convergence is taking place at a number of different levels. Maybe you could give us an insight into where you see convergence happening, and then we'll have to leave it there.

Kobielus: First of all, the flip side of convergence is collision. I just want to point out a few issues that enterprises and users will have to deal with, as they move toward this best practice called in-database analytics and the convergence of transactions and analytics.

We're talking about a collision of two cultures, or more than two cultures. Data warehousing professionals and data mining professionals live in different worlds, as it were. They quite often have an arm's length relationship to each other. The data warehouse traditionally is a source of data for advanced analytics.

This new approach will require a convergence, rapprochement, or a dialog to be developed between these two groups, because ultimately the data warehouse is where the data mining must live. That's going to have to take place, that coming together of the tribes. That's one of the best emerging practices that we're recommending to Forrester clients in that area.

Common framework

Also, transaction systems -- enterprise resource planning (ERP) and customer relationship management (CRM) -- and analytic systems -- BI and data warehousing -- are again two separate tribes within your company. You need to bring together these groups to work out a common framework for convergence to be able to take advantage of this powerful new architecture that Sharmila has sketched out here.

Much of your transactional logic will continue to live on source systems, the ERP, CRM, supply chain management, and the like. But, it will behoove you, as an organization, as a user to move some transactional logic, such as workflow, in particular, into the data warehousing cloud to be driven by real-time analytics and KPIs, metrics, and messages that are generated by inline models built with MapReduce, and so forth, and pushed down into the warehousing grid or cloud.

Workflow, and especially rules engines, will increasingly be tightly integrated with, or brought into, a warehousing or analytics cloud that has inline logic.

Another key trend for convergence is that data mining and text mining are coming together as a single discipline. When you have structured and unstructured sources of information, or unstructured information from new sources like social networks, Twitter, Facebook, and blogs, it's critically important to bring it together into your data mining environment. A key convergence also is that data at rest and data in motion are converging, so a lot of this will be real-time event processing.

Those are the key convergence and collision avenues that we are looking at going forward.

Gardner: Very good. We've been discussing how new architectures for data and logic processing are ushering in this game-changing era of advanced analytics. We've been joined by Jim Kobielus, senior analyst at Forrester Research. Thanks so much, Jim.

Kobielus: No problem. I enjoyed it.

Gardner: Also, we have been talking with Sharmila Mulligan, executive vice president of marketing at Aster Data. Thank you Sharmila.

Mulligan: Thanks so much, Dana.

Gardner: This is Dana Gardner, principal analyst at Interarbor Solutions. You've been listening to a sponsored BriefingsDirect podcast. Thanks for listening, and come back next time.

Listen to the podcast. Find it on iTunes/iPod and Podcast.com. Download the transcript. Learn more. Sponsor: Aster Data Systems.

Transcript of a BriefingsDirect podcast on how new advances in collocating applications with data architecturally provides analytics performance breakthroughs. Copyright Interarbor Solutions, LLC, 2005-2010. All rights reserved.
