Monday, November 09, 2009

Part 3 of 4: Web Data Services--Here's Why Text-Based Content Access and Management Plays a Crucial Role in Real-Time BI

Transcript of a sponsored BriefingsDirect podcast on information management for business intelligence, one of a series on web data services with Kapow Technologies.

Listen to the podcast. Find it on iTunes/iPod and Podcast.com. Download the transcript. Learn more. Sponsor: Kapow Technologies.

Dana Gardner: Hi, this is Dana Gardner, principal analyst at Interarbor Solutions, and you’re listening to BriefingsDirect.

Today we present a sponsored podcast discussion on how text-based content and information from across web properties and activities are growing in importance to businesses. The need to analyze web-based text in real-time is rising to where structured data was in importance just several years ago.

Indeed, for businesses looking to do even more commerce and community building across the Web, text access and analytics forms a new mother lode of valuable insights to mine.

In Part 1 of our series on web data services with Kapow Technologies, we discussed how external data has grown in both volume and importance across the Internet, social networks, portals, and applications.

As the recession forces the need to identify and evaluate new revenue sources, businesses need to capture such web data services for their business intelligence (BI) to work better, deeper, and faster.

In Part 2, we dug even deeper into how to make the most of web data services for BI, along with the need to share those web data services inferences quickly and easily.

Now, in this podcast, Part 3 of the series, we discuss how an ecology of providers and a variety of content and data types come together in several use-case scenarios. We look specifically at how near real-time text analytics fills out a framework of web data services that can form a whole greater than the sum of the parts, and this brings about a whole new generation of BI benefits and payoffs.

Here to help explain the benefits of text analytics and their context in web data services is Seth Grimes, principal consultant at Alta Plana Corp. Thanks for joining us, Seth.

Seth Grimes: Thank you, Dana.

Gardner: We're also joined by Stefan Andreasen, co-founder and chief technology officer at Kapow Technologies. Welcome, Stefan.

Stefan Andreasen: Thank you, Dana.

Gardner: We have heard about text analytics for some time, but for many people it's been a bit complex and unwieldy -- difficult to manage in terms of volume and of getting to a "noise-free," text-based analytic form. Something you can actually work with is now emerging, and it has become quite important.

Let's go to you first, Seth. Tell us about this concept of noise free. What do we need to do to make text that's coming across the Web in sort of a fire hose something we can actually work with?

Difficult concept

Grimes: Dana, noise free is an interesting concept and a difficult concept, when you're dealing with text, because text is just a form of human communication. Whether it's written materials or spoken materials that have been transcribed into text, human communications are incredibly chaotic.

We have all kinds of irregularities in the way that we speak -- grammar, spelling, syntax. Even putting those aside, we have slang, sarcasm, abbreviations, and misspellings. Human communications are chaotic and they are full of "noise." So really getting to something that's noise-free is very ambitious.

I'm going to tell you straightforwardly, it's not possible with text analytics, if you are dealing with anything resembling the normal kinds of communications that you have with people. That's not to say that you can't aspire to a very high level of accuracy to getting the most out of the textual information that's available to you in your enterprise.

It's become an imperative to try to deal with the great volume of text -- the fire hose, as you said -- of information that's coming out. And, it's coming out in many, many different languages, not just in English, but in other languages. It's coming out 24 hours a day, 7 days a week -- not only when your business analysts are working during your business day. People are posting stuff on the web at all hours. They are sending email at all hours.

Then, the volume of information that's coming out is huge. There are hundreds of millions of people worldwide who are on the Internet, using email, and so on. There are probably even more people who are using cell phones, text messaging, and other forms of communication.

If you want to keep up, if you want to do what business analysts have been referring to as a 360-degree analysis of information, you've got to have automated technologies to do it. You simply can't cope with the flood of information without them.

That's an experience that we went through in the last decades with transactional information from businesses. In order to apply BI, or to get BI out of it, you have to apply automated methods with specialized software.

Fortunately, the software is now up to the job in the text analytics world. It's up to the job of making sense of the huge flood of information from all kinds of diverse sources, high volume, 24 hours a day. We're in a good place nowadays to try to make something of it with these technologies.

Gardner: Of course, we're seeing the mainstream media start behaving more like bloggers and social media producers. We're starting to see that when events happen around the world, the first real credible information about them isn't necessarily from news organizations, but from witnesses. They might be texting. They might be using Twitter. It seems that if you want to get real-time information about what's going on, you need to be able to access those sorts of channels.

Text analytics

Grimes: That's a great point Dana, and it helps introduce the idea of the many different use-cases for text analytics. This is not only on the Web, but within the enterprise as well, and crossing the boundary between the Web and the inside of the enterprise.

Those use-cases can be the early warning of a swine flu epidemic or other medical issues. You can be sure that there is text analytics going on with Twitter and other instant messaging streams and forums to try to detect what's going on.

You even have Google applying this kind of technology to look at the pattern of the searches that people are putting in. If people are searching on a particular medical issue centered in a particular geographic location, that's a good indicator that there's something unusual going on there.

It's not just medical cases. You also have brand and reputation management. If someone has started posting something very negative about your company or your products, then you want to detect that really quickly. You want early warning, so that you can react to it really quickly.

We have a great use case in the intelligence world. That's one of the earliest adopters of text analytics technology. The idea is that if you are going to do something to prevent a terrorist attack, you need to detect and respond to the signals that are out there, that something is pending really quickly, and you have to have a high degree of certainty that you're looking at the right thing and that you're going to react appropriately.

We have some great challenges out there, but, as I said, we have some great technologies to respond to those challenges in a whole variety of business, government, and other applications.

Gardner: Stefan, I think there are very few people who argue with the fact that there is great information out there on the Web, across these different new channels that have become so prominent, but making that something that you can use is a far different proposition. Seth has been telling us about automated tools. Tell us what you see in terms of web data services and how we can make this information available to automated systems.

Deep data

Andreasen: Thank you Dana. Let's just look at something like Google. You go there and do a search, and you think that you're searching the entire Internet. But, you're not, because you're probably not going to access data that's hidden behind logins, behind search forms, and so on.

There is a huge amount of what I call "deep web," very valuable information that you have to get to in some other way. That's where we come in and allow you to build robots that can go to the deep web and extract information.

I'd also like to talk a little bit more about the noise-free thing and go to the Google example. Let's say you go to Google and you search for "IBM software." You think that you will be getting an article that has something to do with IBM software.

You often actually find an article that has nothing to do with IBM software but, because there are some advertisements from IBM on the page, IBM was a hit. Some other place on the page links to software, so software was a hit as well. Basically, you end up with something completely irrelevant.

Eliminating noise is getting rid of all this stuff around the article that is really irrelevant, so you get better results.

The other thing around noise-free is the structure. It would be great if you could say, "I want to search an article about IBM software which was dated after Oct. 7," or whatever, but that means you also need to have that additional structured information in it.

The key here is to get noise-free data and to get full data. It's not only to go to the deep web, but also get access to the data in a noise-free way, and in at least a semi-structured way, so that you can do better text analysis, because text analysis is extremely dependent on the quality of data.

Grimes: I have to agree with you there, Stefan. It's very important to have tools that can strip away not only the ads, but understand where the content is within a page and what's the navigation on that page.

We might not be interested in navigation elements, the fluff that's on a page. We want to focus on the content. In addition, nowadays on the Web, there's a big problem of duplication of material that's been hosted on multiple sites. If you're dealing with email or forums, then people typically quote previous items in their replies, and you want to detect and strip that kind of stuff away and focus on the really relevant content. That is definitely part of the noise-free equation, getting to the authentic content.
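
[Editor's note: As a rough illustration of the noise-stripping and de-duplication Grimes describes, the hypothetical Python sketch below removes markup and common boilerplate tags from a page, then hashes the remaining text to drop duplicate copies of the same article hosted on multiple sites. Real content-extraction engines are far more sophisticated about locating the article body; this only shows the shape of the idea.]

```python
import hashlib
import re
from html.parser import HTMLParser

BOILERPLATE_TAGS = {"script", "style", "nav", "footer", "aside"}  # treated as noise

class TextExtractor(HTMLParser):
    """Collect visible text, skipping tags commonly used for ads and navigation."""
    def __init__(self):
        super().__init__()
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in BOILERPLATE_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in BOILERPLATE_TAGS and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth and data.strip():
            self.chunks.append(data.strip())

def clean_text(html):
    parser = TextExtractor()
    parser.feed(html)
    return re.sub(r"\s+", " ", " ".join(parser.chunks))

def dedupe(pages):
    """Keep one copy of each distinct article, however many sites host it."""
    seen, unique = set(), []
    for html in pages:
        text = clean_text(html)
        digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(text)
    return unique

pages = [  # two hypothetical copies of the same story on different sites
    "<html><nav>Home | About</nav><p>IBM releases new software.</p><script>ads()</script></html>",
    "<html><p>IBM releases new software.</p><footer>Copyright</footer></html>",
]
print(dedupe(pages))  # -> ['IBM releases new software.']
```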

Gardner: Stefan, you refer to the deep web. I imagine this also has a role, when it comes to organizations trying to uncover information inside of their firewalls, perhaps among their many employees and all the different tools that they're using. We used to call it the intranet, but is there an intranet effect here for this ability to gather noise-free text information that we can then start processing?

Extended intranet

Andreasen: Absolutely. I'd even say the extended intranet. If we're looking at a web browser, which is the way that most business analysts or other persons today are accessing business applications, we're accessing three different kinds of applications.

One involves applications inside the firewall. It could be the corporate intranet, etc. Then there are applications where you have to use a login, and this can be your partners. You're logging in to your supplier to see if some item is in stock. Or, it can be some federal reporting site or something.

The sites behind the login are like the extended enterprise. Then, of course, there is everything out on the World Wide Web -- more than 150 million web pages out there -- which have all kinds of data, and a lot of that is behind search forms, and so on.

Gardner: Seth, as a consultant and analyst, you've been focused on text analytics for some time, but perhaps a number of our listeners aren't that familiar with it. Could you maybe give us a brief primer on what it is that happens when you identify some information -- be it Internet, extended web, deep web? How do you go through some basic steps to analyze, cleanse, and then put data into a form that you can then start working with?

Grimes: Dana, I'm going to first give you an extremely short history lesson, a little factoid for you. Text analytics actually predates BI. The basic approaches to analyzing textual sources were defined in the late '50s. Actually, there is a paper from an IBM researcher from 1958 that defines BI as the analysis of textual sources.

What happened is that enterprises computerized their operations, their accounting, their sales, all of that in the 1960s. That numerical data from transactional systems is readily analyzable, where text is much more difficult to analyze. But, now we have come to the point, as I said earlier, where there is software and great methods for analyzing text.

What do they do? The front-end of any text analysis system is going to be information retrieval. Information retrieval is a fancy, academic type of term, meaning essentially the same thing as search. We want to take a subset of all of the information that's out there in the so-called digital universe and bring in only what's relevant to our business problems at hand. Having the infrastructure in place to do that is a very important aspect here.

Once we have that information in hand, we want to analyze it. We want to do what's called information extraction, entity extraction. We want to identify the names of people, geographical locations, companies, products, and so on. We want to look for pattern-based entities like dates, telephone numbers, and addresses. And, we want to be able to extract that information from the textual sources.

In order to do that, people usually apply a combination of statistical and linguistic methods. They look for language patterns in the text. They look at statistics like the co-occurrence of words across multiple texts. When two words appear next to each other, or close to each other, in many different documents -- those can be web pages or other documents -- that indicates a degree of relationship. People apply so-called machine-learning technologies in order to improve the accuracy of what they are doing.
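
[Editor's note: To make the extraction steps Grimes outlines a bit more concrete, here is a minimal, hypothetical Python sketch: it pulls pattern-based entities (dates and phone numbers) out of free text with regular expressions, takes a naive guess at names, and counts which names co-occur in the same document as a crude stand-in for the statistical methods he mentions. Commercial text-analytics tools use far richer linguistic and machine-learning models.]

```python
import re
from collections import Counter
from itertools import combinations

# Toy corpus standing in for crawled web pages or emails (hypothetical data).
documents = [
    "IBM and Kapow Technologies announced a partnership on 10/07/2009.",
    "Call Intuit support at 800-555-0199 about TurboTax before 11/15/2009.",
    "Kapow Technologies and Clarabridge both work with Intuit on TurboTax analysis.",
]

DATE_RE = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")                 # pattern-based entity: dates
PHONE_RE = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")                     # pattern-based entity: phone numbers
NAME_RE = re.compile(r"\b[A-Z][A-Za-z]+(?:\s[A-Z][A-Za-z]+)*\b")    # naive named-entity guess

cooccurrence = Counter()
for doc in documents:
    names = sorted(set(NAME_RE.findall(doc)))
    print("dates:", DATE_RE.findall(doc), "phones:", PHONE_RE.findall(doc), "names:", names)
    # Names appearing together in many documents suggest a degree of relationship.
    for pair in combinations(names, 2):
        cooccurrence[pair] += 1

print("strongest co-occurrences:", cooccurrence.most_common(3))
```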

Suitable technologies

All of this sounds very scientific and perhaps abstruse -- and it is. But, the good message here is one that I have said already. There are now very good technologies that are suitable for use by business analysts, by people who aren't wearing those white lab coats and all of that kind of stuff. The technologies that are available now focus on usability by people who have business problems to solve and who are not going to spend the time learning the complexities of the algorithms that underlie them.

So, we're at the point now where you can even treat some of these technologies as black boxes. They just work. They produce the results that you need in the form that you need them. That can be in a form that extracts the information into databases, where you can do the same kind of BI that you have been used to for the last 20 years or so with BI tools.

It can be visualizations that allow you to see the interrelationships among the people, the companies, and the products that are identified in the text. If you're working in law enforcement or intelligence, that could be interrelationships among individuals, organizations, and incidents of various types. We have visualization technologies and BI technologies that work on top of this.

Then, we have one other really nice thing that's coming on the horizon, which is semantic web technology -- the ability to use text analytics to support building a web of data that can be queried and navigated by automated software tools. That makes it even easier for individuals to carry out everyday business and personal problems for that matter.

Gardner: I'd like to dig into some use-cases and understand a little bit better how this is being used productively in the field. Before we do that, Stefan, maybe you could explain from Kapow Technologies' perspective, how you relate to this text analytics field that Seth so nicely just described. Where does Kapow begin and end, and how do you play perhaps within an ecosystem of providers that help with text analytics?

Andreasen: Text analytics, exactly as Seth was saying, is really a form of BI. In BI, you are examining some data and drawing some conclusions, maybe even making some automated actions on it.

Obviously, any BI or any text analysis is no better than the data source behind it. There are four extremely important parameters for the data sources. One is that you have the right data sources.

There are so many examples of people making these kinds of BI applications and text-analytics applications while settling for second-tier data sources, because those are the only ones they have. This is one area where Kapow Technologies comes in. We help you get exactly the right data sources you want.

The other thing that's very important is that you have a full picture of the data. So, if there are data sources that are relevant from all kinds of verticals, all kinds of media, and so on, you really have to be sure you have full coverage of them. Getting full coverage of data sources is another thing that we help with.

Noise-free data

We already talked about the importance of noise-free data to ensure that when you extract data from your data source, you get rid of the advertisements and you try to get the major information in there, because it's very valuable in your text analysis.

Of course, the last thing is the timeliness of the data. We all know that people who do stock research get real-time quotes. They get them for a reason: the newer the quotes are, the better they can look into the crystal ball and make predictions about what happens in the next few seconds.

The world is really changing around us. Companies need to look into the crystal ball in the nearer and nearer future. If you are predicting what happens in two years, that doesn't really matter. You need to know what's happening tomorrow. So, the timeliness of the data is important.

Let me get to the approach that we're taking. Business analysts work with business applications through their web browsers. They often cut and paste data out of a business application into some spreadsheet.

You can see our product as a web browser, where you can teach it how to interact with the website, how to only extract the data that's relevant, and how you can structure that data, and then repeat it. Our product can give you automated, real-time, and noise-free access to any data you see in a web browser.
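
[Editor's note: Kapow's product does this through a visual, browser-based robot designer; the generic, hypothetical Python sketch below is only meant to give a feel for the underlying idea of reaching data behind a login and a search form and returning it in structured form. The URL, form fields, and CSS selectors are placeholders, not Kapow's API.]

```python
import requests
from bs4 import BeautifulSoup  # third-party: pip install requests beautifulsoup4

BASE = "https://portal.example.com"  # placeholder "deep web" site behind a login

def fetch_structured_records(username, password, query):
    session = requests.Session()
    # Step 1: log in through the site's form (field names are assumptions).
    session.post(f"{BASE}/login", data={"user": username, "pass": password}, timeout=30)
    # Step 2: submit the search form that hides the data from ordinary crawlers.
    result = session.get(f"{BASE}/search", params={"q": query}, timeout=30)
    soup = BeautifulSoup(result.text, "html.parser")
    # Step 3: keep only the fields we care about -- semi-structured, noise-free output.
    records = []
    for row in soup.select("div.result"):  # selector is an assumption
        records.append({
            "title": row.select_one("h2").get_text(strip=True),
            "date": row.select_one("span.date").get_text(strip=True),
            "summary": row.select_one("p.summary").get_text(strip=True),
        })
    return records

if __name__ == "__main__":
    for record in fetch_structured_records("analyst", "secret", "IBM software"):
        print(record)
```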

How does that apply to text analytics? Well, it gives you a 100-percent-coverage, real-time data source, with all of the qualities that I just explained.

Gardner: I really was intrigued by this notion of the crystal ball, and not two years from now, but tomorrow. It seems to me that so many people are putting up so much information about their lives, their preferences. People in business are doing the same around their occupation. We have this virtual focus group going on around us all the time. If we could just suck out the right information based on our products, we could get that crystal ball polished up.

Let me go back to you, Stefan. Can you give us an example of where a market research, customer satisfaction, or virtual focus group benefit is being derived from these text analytics capabilities?

Knowing the customer

Andreasen: Absolutely. For any company selling services or products, the most important thing for them to know is what the customers think about their product. Are we giving our customers the right customer service? Are we packaging our products the right way? How do we understand the customer's buying behavior, the customer communications, and so on?

Intuit is a customer we share with a text-analysis company called Clarabridge. They use a text-analysis solution to understand their TurboTax customers.

Before they had a text-analysis system, they had people doing one-percent-coverage sampling of forums on the web, their own customer-support system, and emails into their contact center to get some rudimentary overview of what customers thought.

With Kapow Technologies, they can now get to all these data sources -- online forums, their own customer-support center, and wherever there are networks of TurboTax users -- and extract all the information in near real-time. Then, they use the text-analysis engine to make much, much better predictions of what the customers think, and they actually have their finger on the pulse.

If a set of customers suddenly talks about a feature that doesn't work, or one that is much better in a competitor's product -- thereby looking into the near future of the crystal ball -- they can react early and try to deal with it in the best possible way.

Gardner: Seth Grimes, is this an area where you have seen a lot of the text analytics work focused on these sort of virtual focus groups?

Grimes: Definitely. That's an interesting concept. The idea behind a focus group is that it's a traditional qualitative research tool for market research firms. They get a bunch of people into a room and they have the facilitator lead those people through a conversation to talk about brand names, marketing, positioning, and then get their reactions to it.

With the web, you don't have to get those people together, because they come together on their own and participate in social media forums of various types. There are a whole slew of them. Together they constitute a virtual focus group, as you say.

The important point here is to get at the so-called voice of the customer. In other words, what is the customer saying in his own voice, not in some form where you're forcing that person to tick off number one, two, three, four, or five, in order to rate your product. They can bring up the issues that are of interest to them, whether they are good or bad issues, and they can speak about those issues however they naturally do. That's very important.

I've actually been privileged to share a stage with the analytics manager from Intuit, Chris Jones, a number of times to talk about what he is doing, the technologies, and so on. It's really interesting stuff that amplifies what Stefan had to say.

Broad picture

The idea is that you can use these technologies, both to get a broad picture of the issues, and no longer have to bend those issues into categories that your business analysts have predefined. Now, you can generate the topics of most interest, using automated, statistical methods from what the people are actually saying. In other words, you let them have their own voice.

You also get the effect of not only looking at the aggregate picture, at the mass of the market, but also at the individual cases. If someone posts about a problem with one of the products to an online forum, you can detect that there's an issue there.
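
[Editor's note: As a small illustration of letting topics emerge from what customers actually say, and of catching individual posts that need a response, here is a hypothetical Python sketch. It simply counts frequent terms across forum posts and flags posts containing negative wording; production voice-of-the-customer tools use proper statistical topic models and sentiment classifiers.]

```python
import re
from collections import Counter

posts = [  # hypothetical forum posts about a product
    "The import feature crashed again while loading my bank statements.",
    "Love the new interface, import from my bank was painless this year.",
    "Import keeps crashing -- terrible experience, I want a refund.",
]

STOP_WORDS = {"the", "a", "an", "my", "was", "is", "i", "again", "while", "from",
              "this", "year", "to", "and", "new", "keeps", "want"}
NEGATIVE_WORDS = {"crashed", "crashing", "terrible", "refund", "broken"}

def tokens(text):
    return re.findall(r"[a-z']+", text.lower())

# Topics surface from the customers' own words, not from predefined categories.
topic_counts = Counter(t for p in posts for t in tokens(p) if t not in STOP_WORDS)
print("top topics:", topic_counts.most_common(5))

# Individual posts with negative wording get escalated to a person.
for i, post in enumerate(posts):
    if set(tokens(post)) & NEGATIVE_WORDS:
        print(f"escalate post {i}: {post}")
```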

You can make sure that the issue gets to the right person, and the company can personally address each issue in order to keep it from escalating and getting a lot of attention that you really don't want it to get. You get the reputation of being a very responsive company. That's a very important thing.

The goal here is not necessarily to make more money. The goal is to boost your customer satisfaction rating, Net Promoter Score, or however you choose to measure it. These text technologies are a very important part of the overall package of responding to customer issues and boosting customer satisfaction.

While you're doing it, those people are going to buy more, you're going to reduce your support costs, and all of that kind of stuff, and you are going to make more money. So, by doing the right thing, you're also doing something good for your own company.

Gardner: In business, you want to reduce the guesswork to do better by your customers. Stefan, as I understand it, Kapow Technologies has been quite successful in working with a variety of military, government, and intelligence agencies around the world on getting this real-time information as to what's going on, but perhaps with the stakes being a bit higher, things like terrorism, and even insurrections and uprising.

Tell us a little bit about a second use case scenario, where text analytics are being used by government agencies and intelligence agencies.

Andreasen: As Seth said, the voice of the customer is a very interesting and very valuable use case for text analysis. I'll add one thing to what Seth said. He was talking about product input, and of course, we all know that developing products -- maybe not so much a product like TurboTax, but developing a car -- is extremely expensive. So, understanding what kind of product your customers want in the future is an important part of the voice of the customer.

With a lot of the customers in the military intelligence, it's similar. Of course, they would like to know what people are writing from a sentiment point of view, an opinion point of view, but another thing that's actually even more important in the intelligence community is what I will call relationships.

Seth mentioned relationships earlier, and also understanding the real influencers and who are the ones that have the most connections in these relationships. Let's say somebody writes an article about how you mix some chemicals together to make an efficient bomb. What you really want to know is who this person knows in all kinds of social networks on the 'Net, and to try to make a network of who are the real influencers and who are the network centers.

Finding relationships

We see a lot of uses of our product, going out to blogs, forums, etc., in all kinds of languages, translating it often into English, and doing this relationship analysis. A very popular product for that, which is a partner of ours, is Palantir Technologies. It has a very cool interactive way of finding relationships. I think this is also very relevant for normal enterprises.
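
[Editor's note: A hypothetical sketch of the relationship analysis Andreasen describes, assuming "who mentions or replies to whom" pairs have already been extracted from blogs and forums: build a graph and rank people by how connected they are. The open-source networkx library is used here purely as a stand-in; Palantir and similar tools go far beyond simple degree counts.]

```python
import networkx as nx  # third-party: pip install networkx

# Hypothetical "who replies to / links to whom" pairs extracted from forums and blogs.
mentions = [
    ("author_a", "author_b"), ("author_a", "author_c"),
    ("author_d", "author_a"), ("author_e", "author_a"),
    ("author_b", "author_c"), ("author_f", "author_b"),
]

graph = nx.Graph()
graph.add_edges_from(mentions)

# Degree centrality is a crude first cut at "who are the real influencers
# and who are the network centers."
ranked = sorted(nx.degree_centrality(graph).items(), key=lambda kv: kv[1], reverse=True)
for author, score in ranked:
    print(f"{author}: {score:.2f}")
```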

Yesterday I met with one of the big record companies, which is also a customer of ours. As soon as I explained this relationship stuff, they said, "We can really use this for anti-piracy, because it is really just very few people who do the major work when it comes to getting copies of new films out on the 'Net." So, understanding these relationships can be very relevant for that kind of scenario as well.

Grimes: Dana, when you introduced our podcast today, you used the term ecology or ecosystem, and that's a real great concept that we can apply here in a number of dimensions. We do have an ecosystem in at least two dimensions.

Stefan mentioned one of the Kapow partners, Palantir. We earlier mentioned the text analytics partner, Clarabridge. We have the ability now through integration technologies like Kapow to bring together different information sources, very disparate, different information sources with different characteristics, to provide an ecosystem of information that can be analyzed and brought to bear to solve particular business or government problems.

We have a set of software technologies that can similarly be integrated into an ecosystem to help you solve those problems. That might be text analysis technologies. It might be traditional BI or data warehousing technologies. It might be visualization technologies, whatever it takes to handle your particular business problem.

As we've been discussing, we do see applications in a whole variety of business and government issues, whether it's customer or intelligence or many other things that we haven’t even discussed today. So, I find that ecosystem concept to be very useful here in framing the discussions about how the text technologies fit into something that's a much larger picture.

Gardner: So, we are looking at the ecologies. We are looking at some of these use-cases. It seems to me that we also want to be able to gather information from a variety of different players, perhaps in some sort of a supply chain, ecosystem, business process, channel partners, or value added partners. The ecology and ecosystem concept works not only in terms of what we do with this information, but how we can apply that information back out to activities that are multi-player, beyond the borders or boundaries of any one organization.

I'm thinking about product recall, health, and public-health types of issues. Seth, have you worked with any clients or do you have any insights into how text analytics is benefiting an extended supply chain of some sort, and how the ecosystem of insight into the text analytics solves some unique problems there?

Product recall

Grimes: Product recall is an interesting one. Let me give you an example there. This is, like most examples that we are going to discuss, a multifaceted one.

People are all familiar with the problems with Firestone tires back a number of years ago, early in this decade, where the tread was coming off tires. Well, there are a number of parties that are going to be interested in this problem.

Put aside the consumers, who are obviously affected by it, very badly affected by it. We also have the manufacturers, not only of the tires, but also of the vehicles -- the Ford Explorer in this case.

We have the regulatory bodies in the government, parts of the U.S. Department of Transportation. We have the insurance industry. All of these are stakeholders who have an interest in early detection, early addressing, and early correction of the problem.

You don't want to wait until there are just so many cases here that it's just obvious to everyone, the issues really spill out into the press, and there are questions of negligence, and so on. So, how can you address something like a problem with tires where the tread is coming off?

Well, one way is warranty claims. For example, someone might file a claim through the vehicle manufacturer, Ford in this case, or through the tire manufacturer, claiming a defective product. Sometimes, just an individual tire is defective, but sometimes that's an indication of manufacturing or design issues. So you have warranty claims.

You also have accident reports that are filed by police departments or other government agencies and find their way into databases in the Department of Transportation and other places. Then, you have news reports about particular incidents.

There are multiple sources of information. There are multiple stakeholders here. And, there are multiple ways of getting at this. But, like so many problems, you're going to get at the issue much faster, if you combine information from all of these different sources, rather than relying on a single source.

Again, that's where the importance of building up an ecosystem of different data sources that come to bear on your problem is really important, and that's just a typical use case. I know of other organizations, manufacturing organizations, that are using this technology in conjunction with data-mining technologies for warranty claims, for example. Consumer appliances is another area that I have heard a lot about, but really there is no limitation in where you can apply this.

Gardner: Stefan, from your perspective, for these extended supply chains, public health issues, etc., again we get down to this critical time element -- for example, the swine flu outbreak last spring. If folks could identify through text analytics where it was starting to crop up, they didn't have to wait for the hospital reports. Is that an instance where some of these technologies can really play an important role?

Big pitfall

Andreasen: Absolutely. Before I get into some more real examples, I want to emphasize some of the things that Seth was saying. He's talking about getting to multiple data sources. I cannot stress enough that what I have seen out there as one of the biggest pitfalls when people are making a text analysis solution or actually any BI solution is that they look at what data sources they have and they settle for that.

They should have said, "What are the optimal data sources to get the best prediction and get the best outcome out of this text analysis?" They should settle for no less than that.

The example here will actually explain that. I also have a tire example. We actually have two different kinds of customers using our products looking at tires, tire explosions, and tire recalls.

One is a tire company itself. They go to automotive forums and try to monitor whether people are doing exactly what Seth is saying -- filing claims or writing on an automotive blog: "I got this tire, and it exploded." "It's just really bad." "Don't buy it." All those kinds of information from different sources.

If you get enough of those data sources and you get the data in real-time, you can actually go in and contain the situation of a potential tire recall before it happens, which, of course, could be very valuable for your company.

The other use case is stock research. We have a lot of customers doing financial and market research with our technology. One of them is using our product, for example, to go out and check the same forums, but their objective is to predict whether there will be a tire recall. Then, they can predict that the stock is going to take a hit when that happens, and see it coming beforehand.

Many different players here can use the same kind of information for different purposes, and that makes this really interesting as well.

Gardner: Well, it really seems the age-old part of this is that getting information first has many, many advantages, but the new element is that more and more of that information is out on the web in forms that analytics can now reach.

I wonder if we could cap this discussion -- we are about out of time -- by looking at the future. Seth, you mentioned earlier the semantic web. How automated can this get, and what needs to take place in order for that vision of a semantic web to take place?

Grimes: Well, the semantic web right now is a dream. It's a dream that was first articulated over a decade ago by Tim Berners-Lee, the person who created the World Wide Web, but it is one that is on the fast track to being realized. Being realized in this case means creating meaning.

It means what Stefan was referring to earlier, when he talked about the date of a published article, the title, and perhaps other metadata fields such as the author: creating information that describes what's out there on the web and in databases.

Machine processable

Rendering that information into a form that's machine processable, not only in the sense of analysis, but also in the sense of making interconnections among different pieces of information, is what the semantic web is really about. It's about structuring information that's out there on the Web. That can include what Stefan referred to as the deep web, and creating tools that allow people to search and issue other types of queries against that web data.
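
[Editor's note: One way to picture the machine-processable layer Grimes describes is as subject-predicate-object statements about web content. The tiny hypothetical sketch below builds such triples in plain Python and runs the kind of structured query Andreasen wished for earlier; real semantic web work would use RDF stores and SPARQL.]

```python
from datetime import date

# Subject-predicate-object statements describing web content (hypothetical data).
triples = [
    ("article:123", "title", "IBM announces new analytics software"),
    ("article:123", "published", date(2009, 10, 8)),
    ("article:123", "mentions", "IBM"),
    ("article:456", "title", "Tire recall rumors spread on forums"),
    ("article:456", "published", date(2009, 11, 2)),
    ("article:456", "mentions", "Firestone"),
]

def query(subject=None, predicate=None, obj=None):
    """Return every triple matching the bound positions (None acts as a wildcard)."""
    return [t for t in triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)]

# "Articles about IBM dated after Oct. 7" -- structured search over text-derived metadata.
ibm_articles = {s for s, _, _ in query(predicate="mentions", obj="IBM")}
recent = [s for s, _, d in query(predicate="published")
          if s in ibm_articles and d > date(2009, 10, 7)]
print(recent)  # -> ['article:123']
```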

It's something that people are working hard on now, but I don't think it will be really realized in terms of any broadly usable business applications for a fair number of years. Not next year or the year after, but maybe three to five years out, we will really start to see very broadly useful business applications. There are going to be niche applications in the near term, but later something much broader.

It's a direction that really hits on the themes that we have been talking about today, integrating applications and data from multiple sources and of multiple types in order to create a whole that is much greater than each of the parts.

We need software technologies that can do that nowadays, and fortunately we have them, as we have been discussing. We need a path that will evolve us towards something that really creates much greater value for much larger massive applications in the future, and fortunately the technologies that we have now are evolving in that direction.

Gardner: Very good. I think we have to leave it there. I want to thank both of our guests. We have been discussing the role of text analytics and how companies can take advantage of that and bring that into play with their BI and marketing and other activities, and how the mining of this information is now being done by tools and is increasingly being automated.

I want to thank Seth Grimes, principal consultant at Alta Plana Corp., for joining us. Thanks so much, Seth.

Grimes: Again, thank you Dana, and thanks to Kapow for making this possible.

Gardner: Also, Stefan Andreasen, co-founder and CTO at Kapow Technologies. Thanks again for sponsoring and joining us, Stefan.

Andreasen: Well, thank you. That was a great discussion. Thank you.

Gardner: This is Dana Gardner, principal analyst at Interarbor Solutions. This is Part Three of a series from Kapow Technologies on using BI and web data services in unique forms to increase business benefits.

You have been listening to a sponsored BriefingsDirect podcast. Thanks and come back next time.

Listen to the podcast. Find it on iTunes/iPod and Podcast.com. Download the transcript. Learn more. Sponsor: Kapow Technologies.

Transcript of a sponsored BriefingsDirect podcast on information management for business intelligence, one of a series on web data services with Kapow Technologies. Copyright Interarbor Solutions, LLC, 2005-2009. All rights reserved.

Friday, October 30, 2009

Business and Technical Cases Build for Data Center Consolidation and Modernization

Transcript of a sponsored BriefingsDirect podcast on how data center consolidation and modernization helps enterprises reduce cost, cut labor, slash energy use, and become more agile.

Listen to the podcast. Find it on iTunes/iPod and Podcast.com. Download the transcript. Learn more. Sponsor: Akamai Technologies.

Dana Gardner: Hi, this is Dana Gardner, principal analyst at Interarbor Solutions, and you’re listening to BriefingsDirect.

Today, we present a sponsored podcast discussion on how data-center consolidation and modernization of IT systems helps enterprises reduce cost, cut labor, slash energy use, and become more agile.

We'll look at the business and technical cases for reducing the numbers of enterprise data centers. Infrastructure advancements, standardization, performance density, and network services efficiencies are all allowing for bigger and fewer data centers that can carry more of the total IT requirements load.

These strategically architected and located facilities offer the ability to seek out best long-term outcomes for both performance and cost -- a very attractive combination nowadays. But, to gain the big payoffs from fewer, bigger, better data centers, the essential list of user expectations for performance and IT requirements for reliability need to be maintained and even improved.

Network services and Internet performance management need to be brought to bear, along with the latest data-center advancements to produce the full desired effect of topnotch applications and data delivery to enterprises, consumers, partners, and employees.

Here to help us better understand how to get the best of all worlds -- that is high performance and lower total cost from data center consolidation -- we're joined by our panel. Please join me in welcoming James Staten, Principal Analyst at Forrester Research. Welcome, James.

James Staten: Thanks for having me.

Gardner: We're also joined by Andy Rubinson, Senior Product Marketing Manager at Akamai Technologies. Welcome, Andy.

Andy Rubinson: Thank you, Dana. I'm looking forward to it.

Gardner: And, Tom Winston, Vice President of Global Technical Operations at Phase Forward, a provider of integrated data management solutions for clinical trials and drug safety, based in Waltham, Mass. Welcome, Tom.

Tom Winston: Hi, Dana. Thanks very much.

Gardner: Let me start off with James. Let's look at the general rationale for data-center modernization and consolidation. What are the business, technical, and productivity rationales for doing this?

Data-center sprawl

Staten: There is a variety of them, and they typically come down to cost. Oftentimes, the biggest reason to do this is because you've got sprawl in the data center. You're running out of power, you're running out of the ability to cool any more equipment, and you are running out of the ability to add new servers, as your business demands them.

If there are new applications the business wants to roll out, and you can't bring them to market, that's a significant problem. This is something the organizations have been facing for quite some time.

As a result, if they can start consolidating, they can start moving some of these workloads onto fewer systems. This allows them to reduce the amount of equipment they have to manage and the number of software licenses they have to maintain and lower their support costs. In the data center overall, they can lower their energy costs, while reducing some of the cooling required and getting rid of some of those power drops.

Gardner: James, isn't this sort of the equivalent of Moore's Law, but instead of at silicon clock-speed level, it's at a higher infrastructure abstraction? Are we virtualizing our way into a new Moore's Law era?

Staten: Potentially. We've always had this gap between how much performance a new CPU or a new server could provide and how much performance an application could take advantage of. It's partly a factor of how we have designed applications. More importantly, it's a factor of the fact that we, as human beings, can only consume so much at so fast a rate.

Most applications actually end up consuming on average only 15-20 percent of the server. If that's the case, you've got an awful lot of headroom to put other applications on there.

We were isolating applications on their own physical systems, so that they would be protected from any faults or problems with other applications that might be on the same system and take them down. Virtualization is the primary isolating technology that allows us to do that.

Gardner: I suppose there are some other IT industry types of effects here. In the past, we would have had entirely different platforms and technologies to support different types of applications, networks, storage, or telecommunications. It seems as if more of what we consider to be technical services can be supported by a common infrastructure. Is that also at work here?

Unique opportunity

Staten: That's mostly happening as well. The exception to that rule is definitely applications that just can't possibly get enough compute power or enough contiguous compute power. That creates the opportunity for unique products in the market.

More and more applications are being broken down into modules, and, much like the web services and web applications that we see today, they're broken into tiers. Individual logic runs on its own engine, and all of that can be spread across more commoditized, consistent infrastructure. We are learning these lessons from the dot-coms of the world and now the cloud-computing providers of the world, and applying them to the enterprise.

Gardner: I've heard quite a few numbers across a very wide spectrum about the types of payoffs that you can get from consolidating and modernizing your infrastructure and your data centers. Are there any rules of thumb that are typical types of paybacks, either in some sort of a technical or economic metric?

Staten: There's a wide range, because the benefits depend on how bad off you are when you begin and how dramatically you consolidate. On average, across all the enterprises we have spoken to, you can realistically expect to see about a 20 percent cost reduction from doing this. But, as you said, if you've got 5,000 servers and they're all running at 5 percent utilization, there are big gains to be had.
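
[Editor's note: The arithmetic behind that worst-case example is simple enough to sketch. Assuming, hypothetically, 5,000 servers averaging 5 percent utilization and a conservative post-consolidation target of 60 percent utilization per host:]

```python
import math

servers = 5_000            # hypothetical existing server count
avg_utilization = 0.05     # each running at about 5 percent
target_utilization = 0.60  # assumed post-consolidation target per host

total_work = servers * avg_utilization  # 250 "fully busy server" equivalents of real work
servers_needed = math.ceil(total_work / target_utilization)

print(f"workload equals about {total_work:.0f} fully busy servers")
print(f"roughly {servers_needed} consolidated hosts could carry it, "
      f"retiring {servers - servers_needed} machines")
```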

Gardner: The economic payoff today, of course, is most important. I suppose there is a twofold effect as well. If you're facing a capacity issue and you're thinking about spending $40 or $50 million for an additional data center, and if you can reduce the need to do that or postpone it, you're saving on capital costs. At the same time, you could, perhaps through better utilization, reduce your operating costs as well.

Staten: Absolutely. One of the biggest benefits you get from virtualization is flexibility. It's so much easier to patch a workload and simply keep it running while you're doing that: move it to another system, apply the patch, make sure the patch worked, deploy the clone, and then turn off the old version.

That's much more powerful, and it gives a lot more flexibility to the IT shop to maintain higher service-level agreements (SLAs), to keep the business up and running, to roll out new things faster, and be able to roll them back more easily.

Gardner: Andy Rubinson, this certainly sounds like a no-brainer: Get better performance for less money and postpone large capital expenditures. What are some of the risks that could come into play while we are starting to look at this whole picture? I'm interested in what's holding people back.

Rubinson: I focus mainly on delivery over the Internet. There are definitely some challenges, if you're talking about using the Internet with your data center infrastructure -- things like performance latency, availability challenges from cable cuts, and things of that nature, as well as security threats on the Internet.

It's thinking about how you can do this -- how you can deliver to a global user base with your data center, without necessarily having to build out data centers internationally -- and be able to do that from a consolidated standpoint.

Gardner: So, for those organizations that are not just going to be focused on employees, or, if they are, that they are a global organization, they need to be thinking the most wide area network (WAN) possible. Right?

Rubinson: Absolutely.

Gardner: Let's go to our practitioner, Tom Winston. Tom, what sort of effects were you dealing with at Phase Forward, when you were looking at planning and strategy around data center location, capacity, and utilization?

Early adopter

Winston: Well, we were in a somewhat different position, in that we were actually an early adopter of virtualization technology, and certainly had seen the benefits of using that to help contain our data-center sprawl. But, we were also growing extremely rapidly.

When I joined the organization, it had two different data centers -- one on the East Coast and one on the West Coast. We were facing the challenge of potentially having to expand into a European data center, and even potentially a Pacific Rim data center.

By continuing to expand our virtualization efforts, as well as leveraging some of the technologies that Andy just mentioned, such as Internet acceleration via some of the Akamai technologies, we were able to forego that data-center expansion. In fact, we were able to consolidate to one East Coast data center, which is now our primary hosting center for all of our applications.

So, it had a very significant impact for us by being able to leverage both that WAN acceleration, as well as virtualization, within our own four walls of the data center. [Editor's note: WAN here and in subsequent uses refers to public wide area networks and not private.]

Gardner: Tom, just for the edification of our listeners, tell us a little bit about Phase Forward. Where are your users, and where do your applications need to go?

Winston: We run electronic data capture (EDC) software and pharmacovigilance software for the largest pharmaceutical and clinical-device makers in the world. They are truly global organizations. So, we have users throughout the world, with a heavier and heavier user population coming out of the Asia-Pacific area.

We have a very large, diverse user base that is accessing our applications 24x7x365, and, as a result, we have performance needs all the time for all of our users.

In an age where, as James mentioned, people are expecting things to be moving extremely quickly and always available, it's very important for us to be able to provide that application all the time, and to perform at a very high level.

One of the things James mentioned from an IT perspective is being able to manage that virtual stack. Another thing that virtualization allows us to do is to provide that stack and to improve performance very quickly. We can add additional compute resources into that virtual environment very quickly to scale to the needs that our users may have.

Gardner: James Staten, back to you. Based on Tom's perspective of the combination of that virtualization and the elasticity that he gets from his data center, and the ability to locate it flexibly, thanks to some network optimization and reliability issues, how important is it for companies now, when they think about data center consolidation, to be flexible in terms of where they can locate?

All over the place

Staten: It's important that they recognize that their users are no longer all in the same headquarters. Their users are all over the place. Whether they are an internal employee, a customer, or a business partner, they need to get access to those applications, and they have a performance expectation that's been set by the Internet. They expect whatever applications they are interacting with will have that sort of local feel.

That's what you have to be careful about in your planning of consolidation. You can consolidate branch offices. You can consolidate down to fewer data centers. In doing so, you gain a lot of operational efficiencies, but you can potentially sacrifice performance.

You have to take the lessons that have been learned by the people who set the performance bar, the providers of Internet-based services, and ask, "How can I optimize the WAN? How can I push out content? How can I leverage solutions and networks that have this kind of intelligence to allow me to deliver that same performance level?" That's really the key thing that you have to keep in mind. Consolidation is great, but it can't be at the sacrifice of the user experience.

Gardner: When you find the means to deliver that user experience, that frees you up to then place your data centers strategically based on things like skills or energy availability or tax breaks, and so forth. Isn't that yet another economic incentive here?

Staten: You want to have fewer data centers, but they have to be in the right location, and the right location has to be optimized for a variety of factors. It has to be optimized for where the appropriate skill sets are, just as you described. It has to be optimized for the geographic constraints that you may be under.

You may be doing business in a country in which all of the citizen information of the people who live in that country must reside in that country. If that's the case, you don't necessarily have to own a data center there, but you absolutely have to have a presence there.

Gardner: Andy, back to you. What are some of the pros and cons for this Internet delivery of these applications? I suppose you have to rearchitect, in order to take advantage of this as well.

Rubinson: There are two main areas of benefit: the cost efficiency of delivering over the Internet, and the responsiveness. From the cost perspective, we're able to eliminate unnecessary hardware. We're able to take some of that load off of the servers and do the work in the cloud, which also helps reduce their number.

A lot of cost efficiencies

There are a lot of cost efficiencies that we get, even as you look to Tom's statement about being able to actually eliminate a data center and avoid having to build out a new data center. Those are all huge areas, where it can help to use the Internet, rather than having to build out your own infrastructure.

Also, in terms of responsiveness, by using the Internet, you can deploy a lot more quickly. As Tom explained, it's being able to reach the users across the globe, while still consolidating those infrastructures and be able to do that effectively.

This is really important, as we have seen more and more users that are going outside of the corporate WANs. People are connecting to suppliers, to partners, to customers, and to all sorts of things now. So, the private WANs that many people are delivering their apps over are now really not effective in reaching those people.

Gardner: As James said earlier, we've got different workloads and different types of applications. Help me understand what Akamai can do. Do you just accelerate a web app, or is there a bit more in your quiver in terms of dealing with different types of loads -- media, content, application types?

Rubinson: There are a variety of things that we are able to deliver over the Internet. It includes both web- and IP-based applications. Whether it's HTTP, HTTPS, or anything that's over TCP/IP, we're able to accelerate.

We also do streaming. One of the things to consider here is that we actually have a global network of servers that makes up the cloud, or is an overlay on the cloud. That helps not only deliver the content more quickly, but also uses caching technology and other things that make delivery more efficient. It allows us to give that same type of performance, availability, and security that you would get from having a private WAN, but doing it over the much less expensive Internet.

Gardner: You're looking at specifics of an application in terms of what's going to be delivered at frequent levels versus more infrequent levels, and you can cache the data and gain the efficiency with that local data store. Is that how it works?

Rubinson: A lot of folks think about Akamai as being a content delivery network (CDN), and that's true. There is caching that we are doing. But, the other key area where we have benefit is through the delivery of dynamic data. By optimizing the cloud, we're able to speed the delivery of information from the origin as well. That's where it's benefiting folks like Tom, where he is able to not only cache information, but the information that is dynamic, that needs to get back from the data center, goes more quickly.
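
[Editor's note: A toy Python model of the distinction Rubinson draws: cacheable objects can be served from an edge copy until a time-to-live expires, while dynamic responses always travel back to the origin, where route and protocol optimization rather than caching provides the speedup. This is generic illustration code, not Akamai's technology.]

```python
import time

class EdgeNode:
    """Toy edge server: cache static objects, pass dynamic requests through to the origin."""

    def __init__(self, origin_fetch, ttl_seconds=60):
        self.origin_fetch = origin_fetch  # function that retrieves content from the origin
        self.ttl = ttl_seconds
        self.cache = {}                   # url -> (expiry_time, body)

    def get(self, url, dynamic=False):
        now = time.time()
        if not dynamic:
            hit = self.cache.get(url)
            if hit and hit[0] > now:
                return hit[1]             # served from the edge, no trip to the origin
        body = self.origin_fetch(url)     # dynamic content (or a cache miss) hits the origin
        if not dynamic:
            self.cache[url] = (now + self.ttl, body)
        return body

def origin(url):
    return f"content for {url} fetched at {time.time():.0f}"

edge = EdgeNode(origin, ttl_seconds=300)
print(edge.get("/logo.png"))                      # first request: fetched, then cached
print(edge.get("/logo.png"))                      # second request: served from the cache
print(edge.get("/api/trial-data", dynamic=True))  # dynamic: always goes to the origin
```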

Gardner: Let's check in with Tom. How has that worked out for you? What sort of applications do you use with wide area optimization, and what's been your experience?

Flagship application

Winston: Our primary application, our flagship application, is a product called InForm, which is the main EDC product that our customers use across the Internet. It's accelerated using Akamai technology, and almost 100 percent of our content is dynamic. It has worked extremely well.

Prior to our deployment of Akamai, we had a number of concerns from a performance standpoint. As James mentioned, as you begin to virtualize, you also have to be very conscious of the potential performance hits. Certainly, one of the areas that we were constrained with was performance around the globe.

We had users in China who, due to the amount of traffic that had to traverse the globe, were not happy with the performance of the application. So we brought in Akamai, starting with a very targeted group of users, to accelerate the application for them in that region.

It literally cut the problem right out. It solved it almost immediately. At that point, we then began to spread the rest of that application acceleration product across the rest of our domains, and to continue to use that throughout the product set.

Having an application perform to the level of a Google is something that our end users expect, even though obviously it's a much different application in what it's attempting to solve and what it's attempting to do.



It was extremely successful for us and helped solve performance issues that our end users were having. I think some of the comments that James made are very important. We do live in a world where everybody expects every application across the Internet to perform like Google. You want to search and you expect it to be back in seconds. If it's not, people tend to be unhappy with the performance of the application.

Ours is a much more complex application. A lot more is going on behind the scenes -- database calls, whatever it may be. Having an application perform to the level of a Google is something that our end users expect, even though obviously it's a much different application in what it's attempting to solve and do. So, the benefits that we were able to get from the acceleration servers were very critical for us.

Rubinson: Just to add to that, we recently commissioned a study with Forrester, looking at what is that tolerance threshold [for a page to load]. In the past it had been that people had tolerance for about four seconds. As of this latest study, it's down to two seconds. That's for business to consumer (B2C) users. What we have seen is that the business-to-business (B2B) users are even more intolerant of waiting for things.

It really has gotten to a point where you need that immediate delivery in order to drive the usage of the tools that are out there.

Gardner: I suppose that's just human nature. Our expectations keep going up. They usually don't go down.

Rubinson: True.

Gardner: Back to you, Tom. Tell me a little bit more about this application. Is this a rich Internet application (RIA)? Is this strictly a web interface? Tell us a little bit more about what the technical challenge was in terms of making folks in China get the same experience as those on the East Coast, who were a mile away from your data center.

Everything is dynamic

Winston: The application is one that has a web front-end, but all the information is being sent back to an Oracle database on the back-end. Literally, every button click that you make is making some type of database query or some type of database call, as I mentioned, with almost zero static content. Everything is dynamic.

There is a heavy amount of data that has to go back and forth between the end user and the application. As a result, prior to acceleration, that was very challenging when you were trying to go halfway around the globe. It was almost immediate for us to see the benefits by being able to hop onto the Akamai Global Network and to cut out a number of the steps across the Internet that we had to traverse from one point to our data center.
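A rough back-of-the-envelope calculation, with hypothetical numbers rather than Phase Forward's actual measurements, shows why such a chatty, dynamic application suffers over long-haul links: perceived wait grows with the number of round trips multiplied by the network round-trip time, which is exactly what a shorter, optimized path reduces.

    # Hypothetical numbers: latency cost of a chatty dynamic application.
    def network_wait_seconds(round_trips: int, rtt_ms: float) -> float:
        """Seconds spent purely on network round trips for one user action."""
        return round_trips * rtt_ms / 1000.0

    # A single screen that needs 25 request/response exchanges with the origin:
    print(network_wait_seconds(25, 350))  # 8.75 s -- China to a distant East Coast data center
    print(network_wait_seconds(25, 60))   # 1.5 s  -- same screen over a shorter, optimized path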

Gardner: So, it was clearly an important business metric, getting your far-flung customers happy with their response times. How did that translate back, though, when you reverse engineered from that experience to your requirements within the data center? Was there a meeting of the minds between what you now understand the network is capable of and what you then had to deliver through your actual servers and infrastructure?

I guess I'm looking for an efficiency metric or response in terms of what the consolidation benefit was.

Winston: As I mentioned, we had already consolidated from a virtualization standpoint within the four walls of the data center. So, we were continuing to expand in that footprint. But, what it allowed us to do was forego having to put a data center in the Pacific Rim or put a data center in Europe to put the application closer to the end user.

Operating like a cloud is really operating in this more homogeneous, virtualized, abstracted world that we call server virtualization in most enterprises.



Gardner: Let's look to the future a little bit. James, when people think nowadays about cloud computing, that's a very nebulous discussion and topic set. It seems as if what we're talking about here is that more enterprises are going to have to themselves start behaving like what people think of as a cloud.

Staten: Yes, to a degree. There is obviously a positive aspect of cloud and one that can potentially be a negative.

Operating like a cloud is really operating in this more homogeneous, virtualized, abstracted world that we call server virtualization in most enterprises. You want to operate in this mode, so that you can be flexible and you can put applications where they need to be and so forth.

But, one of the things that cloud computing does not deliver is global reach. If you run in the cloud, you are not suddenly in all geographies. You are just in a shared data center somewhere in the United States or somewhere in your geography. If you want to be global, you still have to be global in the same sense that you were previously.

Cloud not a magic pill

Rubinson: Absolutely. Just putting yourself in the cloud doesn't mean that you're not going to have the same type of latency issues, delivering over the Internet. It's the same thing with availability in trying to reach folks who are far away from that hosted data center. So, the cloud isn't necessarily the answer. It's not a pill that you can take to fix that issue.

Gardner: Andy, I don't think you can mention names, but you are not only accelerating the experience for end users of enterprise applications like a Phase Forward. You're also providing similar services for at least several of the major cloud providers.

Rubinson: It really is anybody who is using the cloud for delivery. Whether it's a high-tech, a pharma company, or even a hosting provider in the cloud, they've all seen the value of ensuring that their end users are having a positive experience, especially folks like software-as-a-service (SaaS) providers.

We've had a lot of interest from SaaS companies that want to ensure that they are not only able to give a positive user experience, but even from a sales perspective, being able to demonstrate their software in other locations and other regions is very valuable.

Obviously, by using the best practices that we've adopted to have blazing fast websites and applying them to make sure that all of your applications, consumed by everyone, are still blazing fast means that you don't have to reinvent the wheel.



Gardner: Now, James, when a commercial cloud provider provides an SLA to their customers, they need to meet it, but they also need to keep their costs as low as possible. More and more enterprises are trying to behave like service providers themselves, whether it's through ITIL adoption, IT shared services or service-oriented architecture (SOA). Over time, we're certainly seeing movement toward a provider-supplier, consumer-subscription relationship of some kind.

If we can use this acceleration and the ability to use the network for that requirement of performance to a certain degree, doesn't this then free up the folks who have to meet those SLAs in terms of what they need to provide? I'm getting back to this whole consolidation issue.

Staten: To some degree. Obviously, by using the best practices that we've adopted to have blazing fast websites and applying them to make sure that all of your applications, consumed by everyone, are still blazing fast means that you don't have to reinvent the wheel. Those practices work for your website. You just apply them to more areas.

If you're applying practices you already know, then you can free up your staff to do other things to modernize the infrastructure, such as deploying ITIL more widely than you have so far. You can make sure that you apply virtualization to a larger percentage of your infrastructure and then deal with the next big issue that we see in consolidation, which is virtual machine (VM) sprawl.

Can get out of control

This is where you are allowing your enterprise customers, whether they are enterprise architects, developers, or business units, to deploy new VMs much more quickly. Virtualization allows you to do that, but you can quickly get out of control with too many VMs to manage.

Dealing with that issue is what is front and center for a lot of enterprise IT professionals right now. If they haven't applied the best practices or performance to their application sets and to their consolidation practices, that's one more thing on their plate that they need to deal with.

Gardner: So, this also relates to something that many of us are forecasting. Not much of it is happening yet, but it's the notion of a hybrid approach to cloud and sourcing, where you might use your own data center up to a certain utilization, and under certain conditions, where there is a spike in demand, you could just offload that to a third-party cloud provider.

If enterprises are assured by the WAN services that the experience is going to be the same, regardless of the sourcing, they are perhaps more likely to pursue such a hybrid approach. Is that fair to say, James?

Staten: This is a really good point that you're bringing up. We wrote about this in a report we called "Hollow Out The MOOSE." MOOSE is Forrester's term for the Maintenance and Ongoing Operation of Systems and Equipment, which is basically everything already running in your data center that has been deployed up to this point.

The real answer is that you need to choose the right type of solution for the right problem. We call this Strategic Rightsourcing . . .



The challenge most enterprises have is that MOOSE consumes 70 or 80 percent of their entire budget, leaving very little for new innovation and other things. They see things like cloud and they say, "This is great. I'll just move this stuff to the cloud, and suddenly it will save me money."

No. The real answer is that you need to choose the right type of solution for the right problem. We call this Strategic Rightsourcing, which says to take the things that others do better than you and have others do them, but know economically whether that's a positive tradeoff for you or not. It doesn't necessarily have to be cash positive, but it has to be an opportunity to be cost positive.

In the case of cloud computing, if I have something that I have to run myself, that is very unique to how I've designed it, and that is really best run in my own data center, I'm not saving money by putting it in the cloud.

If it's an application that has a lot of elasticity, and you want it to have the ability to be on two virtual machines during the evening, and scale up to as many as 50 during the day, and then shrink back down to 2, that's an ideal use of cloud, because cloud is all about temporary capacity being turned on.

A lot of people think that it's about performance, and it's not. Sure, load balancing and the ability to spawn new VMs increases the performance of your application, but performance is experienced by the person at the end of the wire, and that's what has to be optimized. That's why those types of networks are still very valuable.
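The elasticity pattern James describes -- a floor of two virtual machines overnight and a ceiling of 50 during the day -- amounts to a simple sizing rule. Here is a minimal sketch of that rule; the thresholds, per-VM capacity, and request rates are hypothetical, and a real deployment would drive this from a cloud provider's autoscaling service rather than hand-rolled code.

    # Hypothetical sizing rule for the 2-to-50 VM elasticity example.
    MIN_VMS, MAX_VMS = 2, 50

    def desired_vm_count(requests_per_second: float, capacity_per_vm: float = 100.0) -> int:
        """Size the VM pool to current load, clamped to the floor and ceiling."""
        needed = int(requests_per_second // capacity_per_vm) + 1
        return max(MIN_VMS, min(MAX_VMS, needed))

    print(desired_vm_count(50))     # 2  -- quiet overnight baseline
    print(desired_vm_count(4200))   # 43 -- daytime spike absorbed by temporary capacity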

Gardner: Tom Winston, is this vision of the hybrid model, and the use of cloud for ameliorating spikes and therefore reducing your total cost, appealing to you?

Has to be right

Winston: It is, but I couldn't agree more with what James just said. It has to be for the right situation. Certainly, we've started to look at some of our applications, potentially using them in a cloud environment, but right now our critical application, the one that I mentioned earlier, is something that we have to manage. It's a very complex environment. We manage it and we need to hold it very close to the vest.

People have the idea that, "Gee, if I put it in the cloud, my life just got a lot easier." I actually think the reverse might be true, because if you put it into the cloud, you lose some control that you have when it's inside your four walls.

You then lose the ability to provide the level of service you want for your customers. Cloud needs to be for the right application and for the right situation, as James mentioned. I really couldn't agree more with that.

For Akamai, it's really about how we're able to accelerate that.



Gardner: So, the cloud is not the right hammer for all nails, but when that nail is right, the hybrid model can perhaps deliver quite an economic benefit. Andy, at Akamai, are you looking at that hybrid model, and is there something there that your services might foster?

Rubinson: This is really something that we are agnostic about. Whether it's in a data center owned by the customer or whether it's in a hosted facility, we are all about the means of delivery. It's delivering applications, websites, and so forth over the public Internet.

It's something we're able to do, if there are facilities being used for, say, disaster recovery, where it's the hybrid scenario that you're describing. For Akamai, it's really about how we're able to accelerate that -- how we're able to optimize the routing and the other protocols on the Internet to get content from wherever it's hosted to a global set of end users.

We don't care about where they are. They don't have to be on the corporate, private WANs. It's really about that global reach and giving the levels of performance to actually provide an SLA. Tell me who else out there provides an SLA for delivery over the Internet? Akamai does.

Gardner: Well, we'll have to leave it there. We've been discussing how data center consolidation and modernization can help enterprises cut costs, reduce labor, slash their energy use, and become more agile, but also keeping in mind the requirements about the performance across wide area networks.

We've been joined by James Staten, he is a Principal Analyst at Forrester Research. Thank you, James.

Staten: Thank you.

Gardner: We were also joined by Andy Rubinson, Senior Product Marketing Manager at Akamai Technologies. Thank you, Andy.

Rubinson: Thank you very much.

Gardner: Also, I really appreciate your input, Tom Winston, Vice President of Global Technical Operations at Phase Forward.

Winston: Dana, thanks very much. Thanks for having me.

Gardner: This is Dana Gardner, principal analyst at Interarbor Solutions. You've been listening to a sponsored BriefingsDirect podcast. Thanks for listening, and come back next time.

Listen to the podcast. Find it on iTunes/iPod and Podcast.com. Download the transcript. Learn more. Sponsor: Akamai Technologies.

Transcript of a sponsored BriefingsDirect podcast on how data center consolidation and modernization helps enterprises reduce cost, cut labor, slash energy use, and become more agile. Copyright Interarbor Solutions, LLC, 2005-2009. All rights reserved.

Thursday, October 29, 2009

Separating Core from Context Brings High Returns in Legacy Application Transformation

Transcript of the second in a series of sponsored BriefingsDirect podcasts on the rationale and strategies for application transformation.

Listen to the podcast. Find it on iTunes/iPod and Podcast.com. Download the transcript. Learn more. Sponsor: Hewlett-Packard.


Gain more insights into "Application Transformation: Getting to the Bottom Line" via a series of HP virtual conferences Nov. 3-5. For more on Application Transformation, and to get real time answers to your questions, register to the virtual conferences for your region:
Register here to attend the Asia Pacific event on Nov. 3.
Register here to attend the EMEA event on Nov. 4.
Register here to attend the Americas event on Nov. 5.


Dana Gardner: Hi, this is Dana Gardner, principal analyst at Interarbor Solutions, and you’re listening to BriefingsDirect.

Today, we present a sponsored podcast discussion on separating core from context, when it comes to legacy enterprise applications and their modernization processes. As enterprises seek to cut their total IT costs, they need to identify what legacy assets are working for them and carrying their own weight, and which ones are merely hitching a high cost -- but largely unnecessary -- ride.

A widening cost-and-productivity divide exists between older, hand-coded software assets, supported by aging systems, and replacement technologies on newer, more efficient, standards-based systems. Somewhere in the mix, there are core legacy assets distinct from so-called contextual assets. There are peripheral legacy processes and tools that are costly vestiges of bygone architectures. There is legacy wheat and legacy chaff.

Today we need to identify productivity-enhancing resources and learn how to preserve and modernize them -- while also identifying and replacing the baggage or chaff. The goal is to find the most efficient and low-cost means to support them both, through up-to-date data-center architecture and off-the-shelf components and services.

This podcast is the second in a series of three to examine Application Transformation: Getting to the Bottom Line. We will discuss the rationale and likely returns from assessing the true role and character of legacy applications and their actual costs. The podcast, incidentally, runs in conjunction with some Hewlett-Packard (HP) webinars and virtual conferences on the same subject.

Register here to attend the Asia Pacific event on Nov. 3. Register here to attend the EMEA event on Nov. 4. Register here to attend the Americas event on Nov. 5.

With us to delve deeper into the low cost, high reward transformation of legacy enterprise applications is Steve Woods, distinguished software engineer at HP. Hello, Steve.

Steve Woods: Hello. How are you doing?

Gardner: Good. We are also joined by Paul Evans, worldwide marketing lead on Applications Transformation at HP. Hello, Paul.

Paul Evans: Hello, Dana. Thank you.

Gardner: In the earlier podcast in our series, a case study, we talked about transformation and why it's important, through the example of a very large education organization in Italy and what they found. We looked at how this can work very strategically and with great economic benefit, but now we're trying to get into a bit more of the how.

Tell us a little bit, Paul, about what the stakes are. Why is it so important to do this now?

Evans: In a way, this podcast is about two types of IT assets. You talked before about core and context. That whole approach to classifying business processes and their associated applications was invented by Geoffrey Moore, who wrote Crossing the Chasm, Inside the Tornado, etc.

He came up with this notion of core and context applications. Core being those that provide the true innovation and differentiation for an organization. Those are the ones that keep your customers. Those are the ones that improve the service levels. Those are the ones that generate your money. They are really important, which is why they're called "core."

Lower cost

The "context" applications were not less important, but they are more for productivity. You should be looking to understand how that could be done in terms of lower cost provisioning. When these applications were invented to provide the core capabilities, it was 5, 10, 15, or 20 years ago. What we have to understand is that what was core 10 years ago may not be core anymore. There are ways of effectively doing it at a much different price point.

As Moore points out, organizations should be looking to build "core," because that is the unique intellectual property of the organization, and to then buy "context." They need to understand, how do I get the lowest-cost provision of something that doesn't make a huge difference to my product or service, but I need it anyway.

A human resources system may not be something that you are going to build your business model on, but you need one. You need to be able to service your employees and all the things they need. But, you need to do that at the lowest-cost provision. As time has gone on, this demarcation between core and context has gotten really confused.

As you said, we're putting together a series of events, and Moore will be the keynote speaker on these events. So, we will elucidate more around core and context.

The other speaker at the event is also an inventor, this time from inside HP, Steve Woods. Steve has taken this notion of core and context and has teamed it with some extremely exciting technology and very innovative thinking to develop some unique tools that we use inside the services from HP, which allow us then really to dive into this. That's going to be one of the sessions that we're also going to be delivering on this series of events.

Gardner: Okay, Steve Woods, we can use a lot of different terms here, "core and context," "wheat and chaff." I thought another metaphor would be "baby and bathwater." What happens is that it's difficult to separate the good from the potentially wasteful in the legacy inventory.

I think this has caused people to resist modernizing. They have resisted tinkering with legacy installations in the past. Why are they willing to do it now? Why the heightened interest at this time?

Woods: A good deal of it has to do with the pain that they're going through. We have had customers who had assessments with us before, as much as a year ago, and now they're coming back and saying they want to get started and actually do something. So, a good deal of the interest is caused by the need to drive down costs.

Also, there's the realization that a lot of these tools -- extract, transform, and load (ETL) tools, enterprise application integration (EAI) tools, reporting, and business process management (BPM) -- are proving themselves now. We can no longer say that there is a risk in going to these tools. They realize that the strength of these tools is that they bring a lot of agility, solve skill-set issues, and make you much more responsive to the business needs of the organization.

Gardner: This definition of core, as Paul said, is changing over time and also varies greatly from organization to organization. Is there no one size fits all approach to this?

Context not code

Woods: I don't think there really is a one-size-fits-all approach, but as we use our tools to analyze code, we sometimes find that as much as 65 percent or more of an application is really not core. It could just be context.

As we make these discoveries, we find that in the organization there are political battles to be fought. When you identify these elements that are not core and that could be moved out of handwritten code, you're transferring power from the developers -- say, of COBOL -- to the users of the more modern tools, like the BPM tools.

So there is always an issue. What we try to do, when we present our findings, is to be very objective. You can't argue with the finding that 65 percent of the application is not doing core work. You can then focus the conversation on something more productive: what do we do with this? The worst thing you could possibly do is take a million lines of COBOL that's generating reports and rewrite it as hand-written Java or C# code.

We take the concept of core versus context not just to the level of a possible off-the-shelf application, but to the architectural component level. In many cases, we find that this helps them identify legacy code that could be moved very incrementally to these new architectures.

Gardner: What's been the holdup? What's difficult? You did mention politics, and we'll get into that later, but what's been the roadblock from the perspective of these tools? Why has that roadblock been shrinking as the ability to automate and manage these large projects has improved?

Woods: A typical COBOL application -- this is true of all legacy code, but particularly mainframe legacy code -- can be as much as 5, 10, or 15 million lines of code. I think the sheer idea of the size of the application is an impediment. There is some sort of inertia there. An object at rest tends to stay at rest, and it's been at rest for years, sometimes 30 years.

So, the biggest impediment is the belief that it's just too big and complex to move, and even too big and complex to understand. Our approach is a very lightweight process, where we go in and answer a lot of questions, remove a lot of uncertainty, and give them some very powerful visualizations and understanding of the source code and what their options are.

Gardner: So, as we've progressed in terms of the tools, the automation, and the ability to handle large sets of code, the inertia also involves the nontechnical aspects. What do we mean by politics? Are there fiefdoms? Are there territories? Is this strictly a traditional kind of human nature thing? Perhaps you could help us understanding that a bit better.

Doing things efficiently

Woods: The organizations we go into have not been living in a vacuum. Many of them have been doing greenfield development, starting out by saying they need a system that does primarily reporting, or a system that does primarily data integration. In most organizations those fiefdoms, if you will, have grown pretty robust, and they will continue to grow. The realization is that they actually can do those things quite efficiently.

When you go to the legacy side of the house, you start finding that 65 percent of this application is just doing ETL. It's just parsing files and putting them into databases. Why don't you replace that with a tool? The big resistance there is that, if we replace it with a tool, then the people who are maintaining the application right now are either going to have to learn that tool or they're not going to have a job.

So, there's a lot of resistance in the sense of, "We don't want to lose any more ground to the target-architecture fiefdom, so we're not going to identify this application as having so many elements of context functionality." Our process, in a very objective way, just says that these are the percentages we're finding. We'll show you the code, you can agree or disagree that that's what it's doing, and then let's make decisions based upon those facts.

If we get the facts on the table, particularly visually, then we find that we get a lot of consensus. It may be partial consensus, but it's consensus nonetheless, and we open up the possibilities and different options, rather than just continuing to move through with hand-written code.

If you look at this whole core-context thing, at the moment, organizations are still in survival mode.



Gardner: Paul, you've mentioned in the past that we've moved from the nice-to-have to the must-have, when it comes to legacy applications transformation and modernization. The economy has changed things in many respects, of course, but it seems as if the lean IT goal is no longer something that's a vision. It's really moved up the pecking order or the hierarchy of priorities.

Is this perhaps something that's going to break this political logjam? Are the business and financial outcome folks in these organizations simply going to steamroll these political issues?

Evans: Well, I totally think so, and it's happening already. If you look at this whole core-context thing, at the moment, organizations are still in survival mode. Money is still tight in terms of consumer spending. Money is still tight in terms of company spending. Therefore, you're in this position where keeping your customers or trying to get new customers is absolutely fundamental for staying alive. And, you do that by improving service levels, improving your services, and improving your product.

If you stay still and say, "Well, we'll just glide for the next 6 to 12 months and keep our fingers crossed," you're going to be in deep trouble. A lot of people are trying to understand how to use the newer technologies, whether it's things like Web 2.0 or social networking tools, to maintain that customer outreach.

Those of us who went to business school or marketing school remember -- it takes $10 to get a customer into your store, but it only takes $1 to keep them coming back. People are now worrying about those dollars. How much do we have to spend to keep our customer base?

Therefore, the line-of-business people are now pushing on technology and saying, "You can't back off. You can't not give us what we want. We have to have this ability to innovate and differentiate, because that way we will keep our customers and we will keep this organization alive."

Public and private sectors

That applies equally to the public and private sectors. The public sector organizations have this mandate of improving service, whether it's in healthcare, insurance, tax, or whatever. So all of these commitments are being made and people have to deliver on them, albeit that the money, the IT budget behind it, is shrinking or has shrunk.

So, the challenge here is, "Last year I ran my IT department on my theoretical $100. I spent $80 on keeping things going, and $20 on improving things." That was never enough for the line-of-business manager. They will say, "I want to make a change. I want it now, or I want it next week. I don't want it in six months time. So explain to me how you are going to do that."

That was tough a year ago, but the problem is that your $100 IT budget is now $80. It's a bit of a challenge, because all the money you've got is going to be spent on keeping the old stuff alive. I don't think the line-of-business managers, or whoever they are, are going to sit back and say, "That's okay. We don't mind." They're going to come and say that they expect you to innovate more.

This goes back to what Steve was talking about, what we talked about, and what Moore will raise in the event, which is to understand what drives your company. Understand the values, the differentiation, and the innovations that you want and put your money on those and then find a way of dramatically reducing the amount of money you spend on the contextual stuff, which is pure productivity.

The point of the tools is that they allow us to see the code. They allow us to understand what's good and bad and to make very clear, rational, and logical decisions.



Steve's tools are probably the best thing out there today for showing an organization, "You don't need this in handwritten code. You could move this to a low-cost package, running in a low-cost environment, as opposed to running it in COBOL on a mainframe." That's how people save money, and that's how we've seen people get, as we talked about earlier, a return on investment (ROI) of 18 months or less.

So it is possible, it can be done, and it's definitely not as difficult as people think. The point of the tools is that they allow us to see the code. They allow us to understand what's good and bad and to make very clear, rational, and logical decisions.

Gardner: Steve Woods, we spoke earlier about how the core assets are going to be variable from organization to organization, but are there some common themes with the contextual services? We certainly see a lot of very low-cost alternatives now creeping up through software as a service (SaaS), cloud-based, outsourced, mix-sourced, co-located, and lots of different options. Is there some common theme now among what is not core that organizations need to consider?

Woods: Absolutely. One of the things that we do find, when we're brought in to look at legacy applications, is that, by virtue of the fact that they are still around, the applications have resisted all the waves of innovation that came before. They often tend to be of a very definite nature.

A number of them tend to be big data hubs. One of the first things we ask for is the architectural topology diagram, if they have it, or we just draw it on a whiteboard. They tend to be big spiders: there is a central hub database, and you see them start drawing all these different lines to other systems within the organization.

The things that have been left behind -- this is the good news -- tend to be the very things that are very amenable for moving to modern architecture in a very incremental way. It's not unusual to find 50-65 percent of an application is just doing ETL functionality.

A good thing

The real benefit to that -- and this is particularly true in a tough economy -- is that if I can identify the 65 percent of the application that's just doing data integration, and I have created or already established a data integration center of excellence within the organization, and already have or can implement those technologies, then I can incrementally start moving that functionality over to the new architecture. And incremental is a good thing, because it's beneficial in two ways.

It reduces my risk, because I'm doing it a step at a time. It also produces a much better ROI, because the return on each incremental improvement trickles in over time, rather than waiting 18 months or two years for some big-bang improvement. Identifying this context code can give you a lot of incremental ROI opportunities and a much more solid picture for IT investment decisions.
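For a sense of what that "context" code typically looks like, here is a minimal sketch of handwritten ETL -- parsing a delimited file and loading it into a database -- of the kind Woods describes. The file layout, table, and column names are hypothetical; the point is that this is exactly the sort of plumbing a packaged ETL tool replaces.

    # Hypothetical example of handwritten "context" ETL code.
    import csv
    import sqlite3

    def load_transactions(csv_path: str, db_path: str = "warehouse.db") -> int:
        """Parse a delimited file and load its rows into a database table."""
        conn = sqlite3.connect(db_path)
        conn.execute(
            "CREATE TABLE IF NOT EXISTS transactions (id TEXT, amount REAL, posted TEXT)"
        )
        with open(csv_path, newline="") as f:
            rows = [(r["id"], float(r["amount"]), r["posted"]) for r in csv.DictReader(f)]
        conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)", rows)
        conn.commit()
        conn.close()
        return len(rows)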

Gardner: So, one of these innovations that's taken place for the past several years is the move towards more distributed data, hosting that data on lower-cost storage architectures, and virtualizing behind the database or the storage itself. That can reduce cost dramatically.

Woods: Absolutely. One of the things that we feel is that decentralizing the architecture improves your efficiency and your redundancy. There is much more opportunity for building a solid, maintainable architecture than there would be if you kept a sort of monolithic approach that's typical on the mainframe.

Gardner: Once we've done this exercise, variable as it may be from organization to organization, separating the core from the non-core, what comes next? What's the next step that typically happens as this transformation and modernization of legacy assets unfolds?

So, if you accept the premise of moving context code to componentized architecture, then the next thing you should be looking for is where is the clone code and how is it arranged?



Woods: That's a very good question. It's really important to understand this leap in logic here. If I accept the notion that a majority of the code in a legacy application can be moved to these model driven architectures, such as BPM and ETL tools, the next premise is, "If I go out and buy these tools, a lot of functionality is provided with these tools right out of the box. It's going to give me my monitoring code, my management code, and in many cases, even some of the testing capabilities are sort of baked into the product."

If that's true, then the next leap of logic is that in my 1.5 million lines of COBOL or my five million lines of COBOL there is a lot of code that's irrelevant, because it's performing management, monitoring, logging, tracing, and testing. If that's true, I need to know where it's at.

The way you find where it's at is identifying the duplicate source code, what we call clone code. Because when you find the clone code, in most cases, it's a superset of that code that's no longer relevant, if you are making this transformation from handwritten code to a model-driven architecture.

What I created at HP is a tool, an algorithm, that can go into legacy code in any language and find the duplicate code -- and not only find it, but visualize it in very compelling ways. That helps us drill down to identify what I call the unintended design. When we find these unintended designs, they lead us to ask very critical questions that are paramount to understanding how to design the transformation strategy.

So, if you accept the premise of moving context code to componentized architecture, then the next thing you should be looking for is where is the clone code and how is it arranged?
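As a rough illustration of the idea -- a simplified sketch, not HP's actual algorithm -- clone detection can be approximated by hashing normalized, fixed-size windows of source lines and reporting any window that appears more than once. The window size and normalization here are arbitrary choices.

    # Simplified, hypothetical clone detection: not HP's Visual Intelligence Tools.
    import hashlib
    from collections import defaultdict

    def find_clones(source: str, window: int = 6) -> dict:
        """Map a block fingerprint to the starting line numbers where it recurs."""
        lines = [ln.strip().lower() for ln in source.splitlines()]
        seen = defaultdict(list)
        for i in range(len(lines) - window + 1):
            block = "\n".join(lines[i:i + window])
            if not block.strip():
                continue  # ignore windows that are entirely blank
            fingerprint = hashlib.sha1(block.encode()).hexdigest()
            seen[fingerprint].append(i + 1)
        # Blocks seen more than once are candidate clone code.
        return {fp: starts for fp, starts in seen.items() if len(starts) > 1}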

Gardner: Do we have any examples of how this has worked in practice? Are there use cases or an actual organization that you are familiar with? What have been some of the results of going through this process? How long did it take? What did they save? What were the business outcomes?

Viewing the application

Woods: We've often worked with financial services companies and insurance companies, and we have just recently worked with one that gave us an application that was around 1.2 or 1.5 million lines of code. They said, "Here is our application," and they gave us the source code. When we looked into the source code, we found that there were actually four applications, if you looked at just the way the code was structured, which was good news, because it gives us a way of breaking down the functionality.

In this one organization, we found that a high percentage of that code was really just taking files, as I said before, unbundling those files, parsing them, and putting them into databases. So they have kind of let that be the tip of the spear. They said, "That's our start point," because they're often asking themselves where to start.

When you take handwritten code and move it to an ETL tool, there's ample industry evidence that a typical ROI over the course of four years can be between 150 percent and 450 percent improvement in efficiencies. That's just the magic of taking all this difficult-to-maintain spaghetti code and moving it to a very visually oriented tool that gives you much more agility and allows you to respond to changes in the business and the business' needs much more quickly and with skill sets that are readily available.

Gardner: You know, Paul, I've heard a little different story from some of the actual suppliers of legacy systems. A lot of times they say that the last thing you want to do is start monkeying around with the code. What you really want to do is pull it off of an old piece of hardware and put it on a new piece of hardware, perhaps with a virtualization layer involved as well. Why is that not the right way to go?

Evans: Now you've put me in an interesting position. I suppose our view is that there are different strategies. We don't profess one strategy to help people transform or modernize their apps. The first thing they have to do is understand them, and that's what Steve's tools do.

The point is that we don't have a preconceived view of what this thing should run on. That's one thing. We're not wedded to one architectural style.



It is possible to take an approach that says that all we need to do is provide more horsepower. Somebody comes along and says, "Hey, transaction rates are dropping. Users are getting upset because an ATM transaction is taking a minute, when it should take 15 seconds. Surely all we need to do is just give the thing more horsepower and the problem goes away."

I would say the problem goes away -- for 12 months, maybe, or if you're lucky 18 -- but you haven't actually fixed the problem. You've just treated the symptoms.

At HP, we're not wedded to one style of computer architecture as the hub of what we do. We look at the customer requirement. Do we have systems that are equal in performance, if not greater, than a mainframe? Yeah, you bet we do. Our Superdome systems are like that. Are they the same price? No, they are considerably less. Do we have blades, PCs, and normal distributed servers? Yeah.

The point is that we don't have a preconceived view of what this thing should run on. That's one thing. We're not wedded to one architectural style. We look at the customer's requirements and then we understand what's necessary in terms of the throughput TP rates or whatever it may be.

So, there is obviously an approach that people can say, "Don't jig around." It's very easy to inject fear into this and just say to put more power underneath it, don't touch the code, and life will be wonderful. We're totally against that approach, but it doesn't mean that one of our strategies is not re-hosting. There are organizations whose applications would benefit from that.

We still believe that can be done on relatively inexpensive hardware. We can re-host an application by keeping the business logic the same, keeping the language the same, but moving it from an expensive system to a less expensive system.

Freeing up cash

People use that strategy to free up cash very quickly. It's one of the fastest ROIs we have, and they are beginning to save instantly. They make the decision that says, "We need to put that money back in the bank, because we need to do that to keep our shareholders happy." Or, they can reinvest that into their next modernization project, and then they're on an upward spiral.

There are approaches to everything, which is why we have seven different strategies for modernization to suit the customer's requirement, but I think the view of just putting more horsepower underneath, closing your eyes, and hoping is not the way forward.

Gardner: Steve, do you have anything more to add to that, treating the symptom rather than the real issues?

Woods: As Paul said, if you treat this as a symptom, we refer to that as a short-term strategy, just to save money to reinvest into the business.

The only thing I would really add to that is that the problem is sometimes not nearly as big as it seems. When you look at the clone code that we find, and at all the other areas where we can look at the code and say it may not be as relevant to the transformation as you think it is, the job starts to shrink.

The subject matter experts and the stakeholders very slowly start to understand that this is actually possible. It's not as big as we thought.



I do a presentation called "Honey, I Shrunk the Mainframe." If you start looking at these different aspects -- the clone code and what I call the asymmetrical transformation from handwritten code to model-driven architecture -- you really start to see it.

We see this, when we go in to do the workshops. The subject matter experts and the stakeholders very slowly start to understand that this is actually possible. It's not as big as we thought. There are ways to transform it that we didn't realize, and we can do this incrementally. We don't have to do it all at once.

Once we start having those conversations, those who might have been arguing for a re-host suddenly realize that rearchitecting is not as difficult as they think, particularly if you do it asymmetrically. Maybe they should reconsider the re-host, go back to that core-context concept, and start moving the context to these well-proven platforms, such as the ETL tools, the reporting tools, and service-oriented architecture (SOA).

Gardner: Steve, tell us a little bit about how other folks can learn more about this, and then give us a sneak peek or preview into what you are going to be discussing at the upcoming virtual event.

Woods: That's one of the things we've been talking about -- our tools, called the Visual Intelligence Tools. It's a shame you can't see me, because I'm gesturing with my hands as I talk, and if I had the visuals in front of me, I would be pointing to them. This is something you really have to appreciate -- the images that we give to our customers when we do the analysis. You really have to see it with your own eyes.

We are going to be doing a virtual event on November 3, 4, and 5, and during this you will hear some of the same things I've been talking about, but you will hear them as I'm actually using the tools and showing you what's going to happen with those tools, what those images look like, and why they are meaningful to designing a transformation strategy.

Gardner: Very good. We've been learning more about Application Transformation: Getting to the Bottom Line, and we have been able to separate core from context, and appreciate better how that's an intriguing strategy for approaching this legacy modernization problem and begin to enjoy much greater economic and business benefits as a result.

Helping us weave through this has been Steve Woods, distinguished software engineer at HP. Thanks for your input, Steve.

Woods: Thank you.

Gardner: We've also been joined by Paul Evans, worldwide marketing lead on Applications Transformation at HP. Paul, you are becoming a regular on our show.

Evans: Oh, I'm sorry. I hope I am not getting too repetitive.

Gardner: Not at all. Thanks again for your input.

This is Dana Gardner, principal analyst at Interarbor Solutions. You've been listening to a sponsored BriefingsDirect podcast. Thanks for listening, and come back next time.

Listen to the podcast. Find it on iTunes/iPod and Podcast.com. Download the transcript. Learn more. Sponsor: Hewlett-Packard.


Gain more insights into "Application Transformation: Getting to the Bottom Line" via a series of HP virtual conferences Nov. 3-5. For more on Application Transformation, and to get real time answers to your questions, register to the virtual conferences for your region:
Register here to attend the Asia Pacific event on Nov. 3.
Register here to attend the EMEA event on Nov. 4.
Register here to attend the Americas event on Nov. 5.


Transcript of the second in a series of sponsored BriefingsDirect podcasts on the rationale and strategies for application transformation. Copyright Interarbor Solutions, LLC, 2005-2009. All rights reserved.