Tuesday, January 05, 2010

Game-Changing Architectural Advances Take Data Analytics to New Performance Heights

Transcript of a BriefingsDirect podcast on how new advances in collocating applications with data architecturally provide analytics performance breakthroughs.

Listen to the podcast. Find it on iTunes/iPod and Podcast.com. Download the transcript. Learn more. Sponsor: Aster Data Systems.

Dana Gardner: Hi, this is Dana Gardner, principal analyst at Interarbor Solutions, and you're listening to BriefingsDirect.

Today, we present a sponsored podcast discussion on how new architectures for data and logic processing are ushering in a game-changing era of advanced analytics.

These new approaches support massive data sets to produce powerful insights and analysis, yet with unprecedented price-performance. As we enter 2010, enterprises are including more forms of diverse data into their business intelligence (BI) activities. They're also diversifying the types of analysis that they expect from these investments.

We're also seeing more kinds and sizes of companies and government agencies seeking to deliver ever more data-driven analysis for their employees, partners, users, and citizens. It boils down to giving more communities of participants what they need to excel at whatever they're doing. By putting analytics into the hands of more decision makers, huge productivity wins across entire economies become far more likely.

But such improvements won’t happen if the data can't effectively reach the application's logic, if the systems can't handle the massive processing scale involved, or the total costs and complexity are too high.

In this discussion we examine how convergence of data and logic, of parallelism and MapReduce -- and of a hunger for precise analysis with a flood of raw new data -- all are setting the stage for powerful advanced analytics outcomes.

Here to help us learn how to attain advanced analytics and to uncover the benefits from these new architectural activities for ubiquitous BI is Jim Kobielus, senior analyst at Forrester Research. Welcome, Jim.

Jim Kobielus: Hi, Dana. Hi, everybody.

Gardner: We're also joined by Sharmila Mulligan, executive vice president of marketing at Aster Data. Welcome, Sharmila.

Sharmila Mulligan: Thank you. Hello, everyone.

Gardner: Jim, let me start with you. We're looking at a shift now, as I have mentioned, in response to oceans of data and the need for analysis across different types of applications and activities. What needs to change? The demands are there, but what needs to change in terms of how we provide the solution around these advanced analytical undertakings?

Rethinking platforms

Kobielus: First, Dana, we need to rethink the platforms with which we're doing analytical processing. Data mining is traditionally thought of as being the core of advanced analytics. Generally, you pull data from various sources into an analytical data mart.

That analytical data mart is usually on a database that's specific to a given predictive modeling project, let's say a customer analytics project. It may be a very fast server with a lot of compute power for a single server, but quite often what we call the analytical data mart is not the highest performance database you have in your company. Usually, that high performance database is your data warehouse.

As you build larger and more complex predictive models -- and you have a broad range of models and a broad range of statisticians and others building, scoring, and preparing data for these models -- you quickly run into resource constraints on your existing data-mining platform, really. So, you have to look for where you can find the CPU power, the data storage, and the I/O bandwidth to scale up your predictive modeling efforts. That's the number one thing. The data warehouse is the likely suspect.

Also, you need to think about the fact that these oceans of data need to be prepared, transformed, cleansed, meshed, merged, and so forth before they can be brought into your analytical data mart for data mining and the like.

Quite frankly, the people who do predictive modeling are not specialists at data preparation. They have to learn it and they sometimes get very good at it, but they have to spend a lot of time on data mining projects, involved in the grunt work of getting data in the right format just to begin to develop the models.

As you start to rethink your whole advanced analytics environment, you have to think through how you can automate to a greater degree all these data preparation, data loading chores, so that the advanced analytics specialists can do what they're supposed to do, which is build and tune models of various problem spaces. Those are key challenges that we face.

But there is a third challenge, which is that advanced analytics produces predictive models. Those predictive models increasingly are deployed in-line in transactional applications, like your call center, to provide some basic logic and rules that will drive such important functions as the "next best offer" made to customers based on a broad variety of historical and current information.

How do you inject predictive logic into your transactional applications in a fairly seamless way? You have to think through that, because, right now, quite often analytical data models, predictive models, in many ways are not built for optimal embedding within your transactional applications. You have to think through how to converge all these analytical models with the transactional logic that drives your business.

Gardner: Okay. Sharmila, are your users or the people that you talk to in the market aware that this shift is under way? Do they recognize that the same old way of doing things is not going to sustain them going forward?

New data platform

Mulligan: What we see with customers is that the advanced analytics needs and the new generation of analytics that they are trying to do is driving the need for a new data platform.

Previously, the choice of a data management platform was based primarily on price-performance: being able to effectively store lots of data and get very good performance out of those systems. What we're seeing right now is that, although price-performance continues to be a critical factor, it's not necessarily the only factor or the primary thing driving the need for a new platform.

What's driving the need now, and one of the most important criteria in the selection process, is the ability of this new platform to be able to support very advanced analytics.

Customers are very precise in terms of the type of analytics that they want to do. So, it's not that a vendor needs to tell them what they are missing. They are very clear on the type of data analysis they want to do, the granularity of data analysis, the volume of data that they want to be able to analyze, and the speed that they expect when they analyze that data.

They are very clear on what their requirements are, and those requirements are coming from the top. Those new requirements, as it relates to data analysis and advanced analytics, are driving the selection process for a new data management platform.

There is a big shift in the market, where customers have realized that their preexisting platforms are not necessarily suitable for the new generation of analytics that they're trying to do.

Gardner: Let's take a pause and see if we can't define these advanced analytics a little better. Jim, what do we mean nowadays when we say "advanced analytics?"

Kobielus: Different people have their definitions, but I'll give you Forrester's definition, because I'm with Forrester. And, it makes sense to break it down into basic analytics versus advanced analytics.

What is basic analytics? Well, that's BI. It's the core of BI that you build your decision support environment on. That's reporting, query, online analytical processing, dashboarding, and so forth. It's fairly clear what's in the core scope of BI.

Traditional basic analytics is all about analytics against deep historical datasets and being able to answer questions about the past, including the past up to the last five seconds. It's the past that's the core focus of basic analytics.

What's likely to happen

Advanced analytics is focused on how to answer questions about the future. It's what's likely to happen -- forecast, trend, what-if analysis -- as well as what I like to call the deep present, really current streams for complex event processing. What's streaming in now? And how can you analyze the great gushing streams of information that are emanating from all your applications, your workflows, and from social networks?

Advanced analytics is all about answering future-oriented, proactive, or predictive questions, as well as current streaming, real-time questions about what's going on now. Advanced analytics leverages the same core features that you find in basic analytics -- all the reports, visualizations, and dashboarding -- but then takes it several steps further.

First and foremost, it's all about amassing a data warehouse or a data mart full of structured and unstructured information and being able to do both data mining against the structured information, and text analytics or content analytics against the unstructured content.

Then, in the unstructured content, it's being able to do some important things, like natural language processing to look for entities and relationships and sentiments and the voice of the customer, so you can then extrapolate or predict what might happen in the future. What might happen if you make a given offer to a given customer at a given time? How are they likely to respond? Are they likely to jump to the competition? Are they likely to purchase whatever you're offering? All those kinds of questions.

Gardner: Sharmila, do you have anything to offer further on defining advanced analytics in this market?

Mulligan: Before I go into advanced analytics, I'd like to add to what Jim just talked about on basic analytics. The query and reporting aspect continues to be very important, but the difference now is that the size of the data set is far larger than what the customer has been running with before.

What you've got is a situation where they want to be able to do more scalable reporting on massive data sets with very, very fast response times. On the reporting side, the end result to the customer is similar to the type of report they were already producing, but the difference is that the quantity of data they're trying to get at, and the amount of data feeding these reports, is far greater than what they had before.

That's what's driving a need for a new platform underneath some of the preexisting BI tools that are, in themselves, good at reporting, but what the BI tools need is a data platform beneath them that allows them to do more scalable reporting than you could do before.

Kobielus: I just want to underline that, Sharmila. What Forrester is seeing is that, although the average data warehouse today is in the 1-10 terabyte range for most companies, we foresee the average warehouse size going, in the middle of the coming decade, into the hundreds of terabytes.

In 10 years or so, we think it's possible, and increasingly likely, that petabyte-scale data warehouses or content warehouses will become common. It's all about unstructured information, deep history, and historical information. A lot of trends are pushing enterprises in the direction of big data.

Managing big data

Mulligan: Absolutely. That is obviously the big topic here, which is, how do you manage big data? And, big data could be structured or it could be unstructured. How do you assimilate all this in one platform and then be able to run advanced analytics on this very big data set?

Going back to what Jim discussed on advanced analytics, we see two big themes. One is the real-time nature of what our customers want to do. There are particular use cases, where what they need is to be able to analyze this data in near real-time, because that's critical to being able to get the insights that they're looking for.

Fraud analytics is a good example of that. Customers have been able to do fraud analytics, but they're running fraud checks after the fact and discovering where fraud took place after the event has happened. Then, they have to go back and recover from that situation. Now, what customers want is to be able to run fraud analytics in near real-time, so they can catch fraud while it's happening.

What you see is everything from cases in financial services companies related to product fraud, as well as, for example, online gaming sites, where users of the system are collaborating on the site and trying to commit fraud. Those types of scenarios demand a system that can return the fraud analysis data in near real-time, so it can block these users from conducting fraud while it's happening.
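
To make the near-real-time idea concrete, here is a minimal sketch of the kind of check such a system might run as events arrive. The window size, threshold, and field names are illustrative assumptions, not Aster's implementation.

```python
from collections import defaultdict, deque

# Illustrative thresholds -- real fraud rules would be far richer.
WINDOW_SECONDS = 60
MAX_EVENTS_PER_WINDOW = 5

recent_events = defaultdict(deque)  # account_id -> timestamps of recent events

def check_event(account_id, timestamp):
    """Return True if this event should be flagged for review.

    Keeps a per-account sliding window of recent event times and flags
    accounts that exceed a simple rate threshold while the activity is
    happening, rather than in an after-the-fact batch job.
    """
    window = recent_events[account_id]
    window.append(timestamp)
    while window and timestamp - window[0] > WINDOW_SECONDS:
        window.popleft()  # drop events that have aged out of the window
    return len(window) > MAX_EVENTS_PER_WINDOW

# Example: the sixth event inside one minute gets flagged.
flags = [check_event("acct-42", t) for t in (0, 5, 10, 15, 20, 25)]
print(flags)  # [False, False, False, False, False, True]
```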

The other big thing we see is the predictive nature of what customers are trying to do. Jim talked about predictive analytics and modeling analytics. Again, that's a big area where we see massive new opportunity and a lot of new demand. What customers are trying to do there is look at their own customer base to be able to analyze data, so that they can predict trends in the future.

For example, what are the buying trends going to be, let's say at Christmas, for consumers who live in a certain area? There is a lot around behavior analysis. In the telco space, we see a lot of deep analysis around trying to model behavior of customers on voice usage of their mobile devices versus data usage.

By understanding some of these patterns and the behavior of their users in more depth, these organizations are now able to better serve their customers and offer them new product offerings, new packages, and a higher level of personalization.

Predictive analytics is a term that's existed for a while, and is something that customers have been doing, but it's really reaching new levels in terms of the amount of data that they're trying to analyze for predictive analytics, and in the granularity of the analytics itself in being able to deliver deeper predictive insight and models.

As I said, the other big theme we see is the push toward analysis that's really more near real-time than what they were able to do before. This is not a trivial thing to do when it comes to very large data sets, because what you are asking for is the ability to get very, very quick response times and incredibly high performance on terabytes and terabytes of data to be able to get these kinds of results in real-time.

Gardner: Jim, these examples that Sharmila has shared aren't just rounding errors. This isn't a movement toward higher efficiency. These are game changers. These are going to make or break your business. This is going to allow you to adjust to a changing economy and to shifting preferences by your customers. We're talking about business fundamentals here.

Social network analysis

Kobielus: We certainly are. Sharmila was discussing behavioral analysis, for example, and talking about carrier services. Let's look at what's going to be a true game changer, not just for business, but for the global society. It's a thing called social network analysis.

It's predictive models, fundamentally, but it's predictive models that are applied to analyzing the behaviors of networks of people on the web, on the Internet, Facebook, and Twitter, in your company, and in various social network groupings, to determine classification and clustering of people around common affinities, buying patterns, interests, and so forth.

As social networks weave their way into not just our consumer lives, but our work lives, our life lives, social network analysis -- leveraging all the core advanced analytics of data mining and text analytics -- will take the place of the focus group. In an online world, everything is virtual. As a company, you're not going to be able, in any meaningful way, to bring together your users into a single room and ask them what they want you to do or provide for them.

What you're going to do, though, is listen to them. You're going to listen to all their tweets and their Facebook updates and you're going to look at their interactions online through your portal and your call center. Then, you're going to take all that huge stream of event information -- we're talking about complex event processing (CEP) -- you're going to bring it into your data warehousing grid or cloud.

You're also going to bring historical information on those customers and their needs. You're going to apply various social network behavioral analytics models to it to cluster people into the categories that make us all kind of squirm when we hear them, things like yuppie and Generation X and so forth. Professionals in the behavioral or marketing world are very good at creating segmentation of customers, based on a broad range of patterns.
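
As a toy illustration of the clustering idea -- grouping people by overlapping affinities -- the sketch below buckets users whose declared interests overlap beyond a threshold. The sample data and the similarity rule are invented for illustration and are not any vendor's segmentation model.

```python
def jaccard(a, b):
    """Overlap between two interest sets (0 = disjoint, 1 = identical)."""
    return len(a & b) / len(a | b)

# Invented sample data: user -> declared affinities.
users = {
    "alice": {"running", "organic food", "yoga"},
    "bob":   {"gaming", "anime", "comics"},
    "cara":  {"yoga", "organic food", "cycling"},
    "dave":  {"comics", "gaming", "board games"},
}

def cluster(users, threshold=0.25):
    """Greedily group users whose affinity overlap exceeds the threshold."""
    clusters = []
    for name, interests in users.items():
        for group in clusters:
            representative = users[group[0]]  # compare against the first member
            if jaccard(interests, representative) >= threshold:
                group.append(name)
                break
        else:
            clusters.append([name])
    return clusters

print(cluster(users))  # [['alice', 'cara'], ['bob', 'dave']]
```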

Social network analysis becomes more powerful as you bring more history into it -- last year, two years, five years, 10 years worth of interactions -- to get a sense for how people will likely respond to new offers, bundles, packages, campaigns, and programs that are thrown at them through social networks.

It comes down to things like Sharmila was getting at, simple things in marketing and sales, such as a Hollywood studio determining how a movie is being perceived by the marketplace, by people who go out to the theater and then come out and start tweeting, or even tweeting while they are in the theater -- "Oh, this movie is terrible" or "This movie rocks."

They can get a sense of how a product or service is being perceived in real-time, so that the provider of that product or service can then turn around and tweak that marketing campaign, the pricing, and incentives in real-time to maximize the yield, the revenue, or profit of that event or product. That is seriously powerful, and that's what big data architectures allow you to do.

If you can push not just the analytic models, but to some degree bring transactional applications, such as workflow, into this environment to be triggered by all of the data being developed or being sifted by these models, that is very powerful.

Gardner: We know that things are shifting and changing. We know that we want to get access to the data and analytics. And, we know what powerful things those analytics can do for us. Now, we need to look at how we get there and what's in place that prevents us.

Let's look at this architecture. I'm looking into MapReduce more and more. I am even hearing that people are starting to write MapReduce into their requests for proposals (RFPs), as they're looking to expand and improve their situation. Sharmila, what's wrong with the current environment and why do we need to move into something a bit different?

Moving the data

Mulligan: One of the biggest issues that the preexisting data pipeline faces is that the data lives in a repository that's removed from where the analytics take place. Today, with the existing solutions, you need to move terabytes and terabytes of data through the data pipeline to the analytics application, before you can do your analysis.

There's a fundamental issue here. You can't move boulders and boulders of data to an application. It's too slow, it's too cumbersome, and you're not factoring in all your fresh data in your analysis, because of the latency involved.

One of the biggest shifts is that we need to bring the analytics logic close to the data itself. Having it live in a completely different tier, separate from where the data lives, is problematic. This is not a price-performance issue in itself. It is a massive architectural shift that requires bringing analytics logic to the data itself, so that data is collocated with the analytics itself.

MapReduce, which you brought up earlier, plays a critical role in this. It is a very powerful technology for advanced analytics and it brings capabilities like parallelization to an application, which then allows for very high-performance scalability.

What we see in the market these days are terms like "in-database analytics," "applications inside data," and all this is really talking about the same thing. It's the notion of bringing analytics logic to the data itself.

I'll let Jim add a lot more to that since he has developed a lot of expertise in this area.

Gardner: Jim, are we in a perfect world here, where we can take the existing BI applications and apply them to this new architecture of joining logic and data in proximity, or do we have to come up with whole new applications in order to enjoy this architectural benefit?

Kobielus: Let me articulate in a little bit more detail what MapReduce is and is not. MapReduce is, among other things, a set of extensions to SQL -- SQL/MapReduce (SQL/MR). So, you can build advanced analytic logic using SQL/MR that can essentially do the data prep, the data transformations, the regression analyses, the scoring, and so forth, against both structured data in your relational databases and unstructured data, such as content that you may source from RSS feeds and the like.

To the extent that we always, or for a very long time, have been programming database applications and accessing the data through standard SQL, SQL/MR isn't radically different from how BI applications have traditionally been written.

Maximum parallelization

But, these are extensions, and they are extensions that are geared toward enabling maximum parallelization of these analytic processes, so that these processes can then be pushed out and executed, not just in databases, but in file systems, such as the Hadoop Distributed File System, or in cloud data warehouses.

MapReduce, as a programming model and as a language, in many ways is agnostic as to the underlying analytic database, file system, or cloud environment where the information as a whole lives and how it's processed.

But no, you can't take your existing BI applications, in terms of the reporting, query, dashboarding, and the like, transparently move them, and use MapReduce without a whole lot of rewriting of these applications.

You can't just port your existing BI applications to MapReduce and database analytics. You're going to have to do some conversions, and you're going to have to rewrite your applications to take advantage of the parallelism that SQL/MR enables.

MapReduce, in many ways, is geared not so much for basic analytics. It's geared for advanced analytics. It's data mining and text mining. In many ways, MapReduce is the first open framework that the industry has ever had for programming the logic for both data mining and text mining in a seamless way, so that those two types of advanced analytic applications can live and breathe and access a common pool of complex data.
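
Purely as a sketch of that programming model -- not Aster's SQL/MR syntax -- here is what the map and reduce halves of a simple analysis look like in Python. The point is that the per-row logic runs independently against each data partition where the data lives, and only the small per-key summaries need to be combined at the end.

```python
from collections import defaultdict
from multiprocessing import Pool

# Each "partition" stands in for a shard of rows living on one worker node.
partitions = [
    [("cust-1", 120.0), ("cust-2", 35.5), ("cust-1", 60.0)],
    [("cust-2", 10.0), ("cust-3", 99.9)],
    [("cust-1", 5.0), ("cust-3", 0.1), ("cust-2", 44.5)],
]

def map_partition(rows):
    """Runs next to the data: total spend per customer within one shard."""
    local = defaultdict(float)
    for customer, amount in rows:
        local[customer] += amount
    return dict(local)

def reduce_results(partials):
    """Combine the small per-shard summaries into the global answer."""
    totals = defaultdict(float)
    for partial in partials:
        for customer, amount in partial.items():
            totals[customer] += amount
    return dict(totals)

if __name__ == "__main__":
    with Pool() as pool:  # stand-in for the parallel workers of an MPP grid
        partials = pool.map(map_partition, partitions)
    print(reduce_results(partials))
    # {'cust-1': 185.0, 'cust-2': 90.0, 'cust-3': 100.0}
```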

MapReduce is an open standard that Aster clearly supports, as do a number of other database and data warehousing vendors. In the coming year and the coming decade, MapReduce and Hadoop -- and I won't go to town on what Hadoop is -- will become fairly ubiquitous within the analytics arena. And, that’s a good thing.

So, any advanced analytic logic that you build in one tool, in theory, you can deploy and have it optimized for execution in any MapReduce-enabled platform. That’s the promise. It’s not there yet. There are a lot of glitches, but that’s the strong promise.

Mulligan: I'd like to add a little bit to that, Dana. In the marriage of SQL with MapReduce, the real intent is to bring the power of MapReduce to the enterprise, so that SQL programmers can now use that technology. MapReduce alone does require some sophistication in terms of programming skills to be able to utilize it. You may typically find that skill set in Web 2.0 companies, but often you don't find developers who can work with it in the enterprise.

What you do find in enterprise organizations is that there are people who are very proficient at SQL. By bringing SQL together with MapReduce, what enterprise organizations get is the familiarity and ease of SQL, with the power of MapReduce analytics underneath it. So, it's really letting SQL programmers leverage skills they already have, while being able to use MapReduce for analytics.

Important marriage

Over time, of course, it’s possible that there will be more expertise developed within enterprise organizations to use MapReduce natively, but at this time and, we think, in the next couple of years, the SQL/MapReduce marriage is going to be very important to help bring MapReduce across the enterprise.

Hadoop itself obviously is an interesting platform, too, in being able to store lots of data cost-effectively. However, customers will often also want some of the other characteristics of a data warehouse, like workload management, failover, backup and recovery, etc., that the technology may not necessarily provide.

MapReduce, available right now with the new generation of massively parallel processing (MPP) data warehouses, does bring the best of both worlds. It brings what companies need in terms of enterprise data warehouse capabilities. It lets you put application logic near the data, as we talked about earlier. And, it brings MapReduce through the SQL/MapReduce framework, which is primarily designed to ease adoption and use of MapReduce within the enterprise.

Gardner: Jim, we are on a journey. It's going to be several years before we get to where we want to go, but there is more maturity in some areas than others. And, there is an opportunity to take technologies that are available now and produce some real, strong business outcomes.

Give me a sense of where you see the maturity of the architecture, the SQL, and the tools in making these technologies converge. Who is mature? How is this shaking out?

Kobielus: One measure of maturity is adoption as a best practice -- in this case, in-database analytics. As I said, it's widely supported through proprietary approaches by many vendors.

In terms of the maturity, it's judged by adoption of an open industry framework with cross-vendor interoperability.



Judged by adoption of an open industry framework with cross-vendor interoperability, it's not yet mature in terms of MapReduce and Hadoop. There are pioneering vendors like Aster, but there are a significant number of established big data warehousing vendors that have varying degrees of support now or in the near future for these frameworks. We're seeing strong indications. In fact, Teradata already is rolling out MapReduce and Hadoop support in their data warehousing offerings.

We're not yet seeing a big push from Oracle, or from Microsoft for that matter, in the direction of support for MapReduce or Hadoop, but we at Forrester believe that both of those vendors, in particular, will come around in 2010 with greater support.

IBM has made significant progress with its support for Hadoop and MapReduce, but it hasn’t yet been fully integrated into that particular vendor's platform.

Looking to 2010, 2011

If we look at a broad range of other data warehousing vendors like Sybase, Greenplum, and others, most vendors have it on their roadmap. To some degree, various vendors have these frameworks in development right now. I think 2010 and 2011 are the years when most of the data warehousing and also data mining vendors will begin to provide mature, interoperable implementations of these standards.

There is a growing realization in the industry that advanced analytics is more than just being able to mine information at rest, which is what MapReduce and Hadoop are geared to doing. You also need to be able to mine and do predictive analytics against data in motion. That’s CEP. MapReduce and Hadoop are not really geared to CEP applications of predictive modeling.

There needs to be, and there will be over the next five years or so, a push in the industry to embed MapReduce and Hadoop into those environments. A few vendors are showing some progress toward CEP predictive modeling, but it's not widely supported yet, and what exists is in proprietary approaches.

In this coming decade, we're going to see predictive logic deployed into all application environments, be they databases, clouds, distributed file systems, CEP environments, business process management (BPM) systems, and the like. Open frameworks will be used and developed under more of a service-oriented architecture (SOA) umbrella, to enable predictive logic that’s built in any tool to be deployed eventually into any production, transaction, or analytic environment.

It will take at least 3 to 10 years for a really mature interoperability framework to be developed, for industry to adopt it, and for the interoperability issues to be worked out. It’s critically important that everybody recognizes that big data, at rest and in motion, needs to be processed by powerful predictive models that can be deployed into the full range of transactional applications, which is where the convergence of big data, analytics, and transactions come in.

Data warehouses, as the core of your analytics environment, need to evolve to become, in their own right, application servers that can handle both the analytic applications -- traditional data warehousing, BI, and data mining -- and the transactional logic, and really handle it all with full security, workload isolation, failover, and so forth in a way that's seamless.

I'm really excited, for example, by what Aster has rolled out with their latest generation, 4.0 of the Data-Application Server. I see a little bit of progress by Oracle on the Exadata V2. I'm looking forward to seeing if other vendors follow suit and provide a cloud-based platform for a broad range of transactional analytics.

Gardner: Sharmila, Jim has painted a very nice picture of where he expects things to go. He mentioned Aster Data 4.0. Tell us a little bit about that, and where you see the stepping stones lining up.

Mulligan: As I mentioned earlier, one of the biggest requirements in order to be able to do very advanced analytics on terabyte- and petabyte-level data sets, is to bring the application logic to the data itself. Earlier, I described why you need to do this. You want to eliminate as much data movement as possible, and you want to be able to do this analysis in as near real-time as possible.

What we did in Aster Data 4.0 is just that. We're allowing companies to push their analytics applications inside of Aster’s MPP database, where now you can run your application logic next to the data itself, so they are both collocated in the same system. By doing so, you've eliminated all the data movement. What that gives you is very, very quick and efficient access to data, which is what's required in some of these advanced analytics application examples we talked about.

Pushing the code

What kind of applications can you push down into the system? It can be any app written in Java, C, C++, Perl, Python, or .NET. It could be an existing custom application that an organization has written and that they need to scale to work on much larger data sets. That code can be pushed down into the Aster database.

It could be a new application that a customer is looking to write to do a level of analysis that they could not do before, like real-time fraud analytics, or very deep customer behavior analysis. If you're trying to deliver these new generations of advanced analytics apps, you would write that application in the programming language of your choice.

You would push that application down into the Aster system, all your data would live inside of the Aster MPP database, and the application would run inside of the same system collocated with the data.

In addition to that, it could be a packaged application. So, it could be an application like software as a service (SaaS) that you want to scale to be able to analyze very large data sets. So, you could push a packaged application inside the system as well.

One of the fundamental things that we leverage to allow you to do more powerful analytics with these applications is MapReduce. You don't have to MapReduce-enable an application when you push it down into the Aster system, but you could choose to and, by doing so, you automatically parallelize the application, which gives you very high performance and scalability when it comes to accessing large datasets. You also then leverage some of the analytics capabilities of MapReduce that are not necessarily inherent in something like SQL.

The key components of 4.0 add up to a platform that can efficiently and cost-effectively store massive amounts of data, and that also allows you to do very advanced and sophisticated analytics. To run through the key things that we've done in 4.0, first is the ability to push applications inside the system, so apps are collocated with the data.

We also offer SQL/MapReduce as the interface. Business analysts who are working with this application on a regular basis don’t have to learn MapReduce. They can use SQL/MR and leverage their existing SQL skills to work with that app. So, it makes it very easy for any number of business analysts in the organization to leverage their preexisting SQL skills and work with this app that's pushed down into the system.

Finally, in order to support the ability to run applications inside the data, which as I said earlier is nontrivial, we added fundamental new capabilities like Dynamic Mixed Workload Management. Workload management in the Aster system works not just on data queries, but on the application processes as well, so you can balance workloads when you have a system that's managing both data and applications.

Kobielus: Sharmila, I think the greatest feature of 4.0 is simply the ability to run predictive models developed in SAS or other tools in their native code, without necessarily converting them to SQL/MR. That means that your customers can leverage that huge installed pool of intellectual property -- all those models -- bring it in, and execute it natively within your distributed grid or cloud, as a way of avoiding that rewrite. Or, if they wish, they can migrate or convert them over to SQL/MR. It's up to them.

That's a very attractive feature, because fundamentally the data warehousing cloud is an analytic application server. Essentially, you want that ability to be able to run disparate legacy models in parallel. That's just a feature that needs to be adopted by the industry as a whole.

The customer decides

Mulligan: Absolutely. I do want to clarify that the Aster 4.0 solution can be deployed in the cloud, or it can be installed in a standard implementation on-premise, or it could be adopted in an appliance mode. We support all three. It's up to the customer which of those deployment models they need or prefer.

To talk in a little bit more detail about what Jim is referring to, the ability to take an existing app, have to do absolutely no rewrite, and push that application down is, of course, very powerful to customers. It means that they can immediately take an analytics app they already have and have it operate on much larger data sets by simply taking that code and pushing it down.

That can be done literally within a day or two. You get the Aster system, you install it, and then, by the second day, you could be pushing your application down.

If you choose to leverage the MapReduce analytics capabilities, then as I said earlier, you would MapReduce enable an app. This simply means you take your existing application and, again, you don’t have to do any rewrite of that logic. You just add MapReduce functions to it and, by doing so, you have now MapReduce-enabled it. Then, you push it down and you have SQL/MR as an interface to that app.

The process of MapReduce-enabling an app is also very simple. It's a process of a couple of days. This is not something that takes weeks and weeks to do. It literally can be done in a couple of days.
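
The tooling details above are Aster's, but the general shape of "MapReduce-enabling" existing logic can be sketched generically: the original per-record function stays untouched and is wrapped in map and reduce steps so it can be applied partition by partition. The function names and the scoring formula below are invented for illustration, not the Aster procedure.

```python
# Pre-existing analytic logic, unchanged: score one customer record.
def score_record(record):
    return record["id"], 0.7 * record["spend"] + 0.3 * record["visits"]

# The "MapReduce-enabling" wrapper: apply the existing logic per partition...
def map_partition(records):
    return [score_record(r) for r in records]  # runs where the data lives

# ...then combine the small per-partition results into a final answer.
def reduce_top(partials, k=2):
    scored = [pair for partial in partials for pair in partial]
    return sorted(scored, key=lambda p: p[1], reverse=True)[:k]

partitions = [
    [{"id": "a", "visits": 10, "spend": 200.0},
     {"id": "b", "visits": 2, "spend": 50.0}],
    [{"id": "c", "visits": 30, "spend": 120.0}],
]
partials = [map_partition(p) for p in partitions]  # could run in parallel
print(reduce_top(partials))  # [('a', 143.0), ('c', 93.0)]
```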

We had a retailer recently who took an existing app that they had already written, a new type of analytics application that they wanted to deploy. They simply added MapReduce capabilities to it and pushed it down into the Aster system, and it's now operating on very, very large data sets, and performing analytics that they weren't able to originally do.

The ease of application push down and the ease of MapReduce enabling is definitely key to what we have done in 4.0, and it allows companies to realize the value of this new type of platform right away.

Gardner: I know it's fairly early in the rollout. Do you have any sense of metrics from some of these users? What do they get back? We talked earlier in the examples about what could be done and what should be done nowadays with analysis. Do you have any sense of what they have been able to do with 4.0?

Reducing processing times

Mulligan: For example, we have talked about customers like comScore who are processing 1.6 billion rows of data on a regular basis, and their data volumes continue to grow. They have many business analysts who operate the system and run reports on a daily basis, and they are able to get results very quickly on a large data set.

We have customers who have gone from 5-10 minute processing times on their data set, to 5 seconds, as a result of putting the application inside of the system.

We have had fraud applications that would take 60-90 minutes to run in the traditional approach, where the app was running outside the database, and now those applications run in 60-90 seconds.

Literally, by collocating your application logic next to the data itself, you can see that you are immediately able to go from many minutes of processing time, down to seconds, because you have eliminated all the data movement altogether. You don’t have to move terabytes of data.

Add to that the fact that you can now access terabyte-sized data sets, versus what customers have traditionally been left with, which is only the ability to process data sets in the order of several tens of gigabytes or hundreds of gigabytes. Now, we have telcos, for example, processing four- or five-terabyte data sets with very fast response time.

It's the volume of data, the speed, the acceleration, and response time that really provide the fundamental value here. MapReduce, over and above that, allows you to bring in more analytics power.

Gardner: A final word to you, Jim Kobielus. This really is a good example of how convergence is taking place at a number of different levels. Maybe you could give us an insight into where you see convergence happening, and then we'll have to leave it there.

Kobielus: First of all, with convergence the flip side is collision. I just want to point out a few issues that enterprises and users will have to deal with, as they move toward this best practice called in-database analytics and convergence of the transactions and analytics.

We're talking about a collision of two cultures, or more than two cultures. Data warehousing professionals and data mining professionals live in different worlds, as it were. They quite often have an arm's length relationship to each other. The data warehouse traditionally is a source of data for advanced analytics.

This new approach will require a convergence, rapprochement, or a dialog to be developed between these two groups, because ultimately the data warehouse is where the data mining must live. That's going to have to take place, that coming together of the tribes. That's one of the best emerging practices that we're recommending to Forrester clients in that area.

Common framework

Also, transaction systems -- enterprise resource planning (ERP) and customer relationship management (CRM) -- and analytic systems -- BI and data warehousing -- are again two separate tribes within your company. You need to bring together these groups to work out a common framework for convergence to be able to take advantage of this powerful new architecture that Sharmila has sketched out here.

Much of your transactional logic will continue to live on source systems, the ERP, CRM, supply chain management, and the like. But, it will behoove you, as an organization, as a user to move some transactional logic, such as workflow, in particular, into the data warehousing cloud to be driven by real-time analytics and KPIs, metrics, and messages that are generated by inline models built with MapReduce, and so forth, and pushed down into the warehousing grid or cloud.

Increasingly, we will find workflow, and especially rules engines, tightly integrated with or brought into a warehousing or analytics cloud that has inline logic.

Another key trend for convergence is that data mining and text mining are coming together as a single discipline. When you have structured and unstructured sources of information or you have unstructured information from new sources like social networks and Twitter, Facebook, and blogs, it's critically important to bring it together into your data mining environment. A key convergence also is that data at rest and data in motion are converging, and so a lot of this will be real-time event processing.

Those are the key convergence and collision avenues that we are looking at going forward.

Gardner: Very good. We've been discussing how new architectures for data and logic processing are ushering in this game-changing era of advanced analytics. We've been joined by Jim Kobielus, senior analyst at Forrester Research. Thanks so much, Jim.

Kobielus: No problem. I enjoyed it.

Gardner: Also, we have been talking with Sharmila Mulligan, executive vice president of marketing at Aster Data. Thank you, Sharmila.

Mulligan: Thanks so much, Dana.

Gardner: This is Dana Gardner, principal analyst at Interarbor Solutions. You've been listening to a sponsored BriefingsDirect podcast. Thanks for listening, and come back next time.

Listen to the podcast. Find it on iTunes/iPod and Podcast.com. Download the transcript. Learn more. Sponsor: Aster Data Systems.

Transcript of a BriefingsDirect podcast on how new advances in collocating applications with data architecturally provide analytics performance breakthroughs. Copyright Interarbor Solutions, LLC, 2005-2010. All rights reserved.


Monday, December 21, 2009

HP's Cloud Assure for Cost Control Takes Elastic Capacity Planning to Next Level

Transcript of a BriefingsDirect podcast on the need to right-size and fine-tune applications for maximum benefits of cloud computing.

Listen to the podcast. Find it on iTunes/iPod and Podcast.com. Learn more. Download the transcript. Sponsor: Hewlett-Packard.

Dana Gardner: Hi, this is Dana Gardner, principal analyst at Interarbor Solutions, and you’re listening to BriefingsDirect.

Today, we present a sponsored podcast discussion on the economic benefits of cloud computing -- of how to use cloud-computing models and methods to control IT cost by better supporting application workloads.

Traditional capacity planning is not enough in cloud-computing environments. Elasticity planning is what’s needed. It’s a natural evolution of capacity planning, but it’s in the cloud.

We'll look at how to best right-size applications, while matching service delivery resources and demands intelligently, repeatedly, and dynamically. The movement to a pay-per-use model also goes a long way toward promoting such matching of resources and demand, and reduces wasteful application practices.

We'll also examine how quality control for these applications in development reduces the total cost of supporting applications, while allowing for tuning and an appropriate way of managing applications in the operational cloud scenario.

To unpack how Cloud Assure services can take the mystique out of cloud computing economics and to lay the foundation for cost control through proper cloud methods, we're joined by Neil Ashizawa, manager of HP's Software-as-a-Service (SaaS) Products and Cloud Solutions. Welcome to BriefingsDirect, Neil.

Neil Ashizawa: Thanks very much, Dana.

Gardner: As we've been looking at cloud computing over the past several years, there is a long transition taking place of moving from traditional IT and architectural methods to this notion of cloud -- be it private cloud, at a third-party location, or through some combination of the above.

Traditional capacity planning therefore needs to be refactored and reexamined. Tell me, if you could, Neil, why capacity planning, as people currently understand it, isn’t going to work in a cloud environment?

Ashizawa: Old-fashioned capacity planning would focus on the peak usage of the application, and it had to, because when you were deploying applications in house, you had to take into consideration that peak usage case. At the end of the day, you had to be provisioned correctly with respect to compute power. Oftentimes, with long procurement cycles, you'd have to plan for that.

In the cloud, because you have this idea of elasticity, where you can scale up your compute resources when you need them, and scale them back down, obviously that adds another dimension to old-school capacity planning.

Elasticity planning

The new way to look at it within the cloud is elasticity planning. You have to factor in not only your peak usage case, but your moderate usage case and your low-level usage case as well. At the end of the day, if you are going to get the biggest benefit of the cloud, you need to understand how you're going to be provisioned during the various demands of your application.

Gardner: So, this isn’t just a matter of spinning up an application and making sure that it could reach a peak load of some sort. We have a new kind of a problem, which is how to be efficient across any number of different load requirements?

Ashizawa: That’s exactly right. If you were to take, for instance, the old-school capacity-planning ideology to the cloud, what you would do is provision for your peak use case. You would scale up your elasticity in the cloud and just keep it there. If you do it that way, then you're negating one of the big benefits of the cloud. That's this idea of elasticity and paying for only what you need at that moment.

If I'm at a slow period of my application's usage, then I don't want to be over-provisioned for my peak usage. One of the main factors why people consider sourcing to the cloud is because you have this elastic capability to spin up compute resources when usage is high and scale them back down when usage is low. You don't want to negate that benefit of the cloud by keeping your resource footprint at its highest level.
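
A back-of-the-envelope comparison shows why this matters: paying for peak capacity around the clock versus paying only for what each period of a usage profile actually needs. The hourly rate and instance counts below are made-up numbers for illustration.

```python
RATE_PER_INSTANCE_HOUR = 0.50  # made-up price

# (hours in the month, instances needed) for low, moderate, and peak usage.
usage_profile = [(400, 2), (250, 6), (80, 20)]

hours = sum(h for h, _ in usage_profile)
peak = max(n for _, n in usage_profile)

fixed_peak_cost = hours * peak * RATE_PER_INSTANCE_HOUR
elastic_cost = sum(h * n * RATE_PER_INSTANCE_HOUR for h, n in usage_profile)

print(f"Provisioned at peak the whole time: ${fixed_peak_cost:,.2f}")  # $7,300.00
print(f"Scaled to match the profile:        ${elastic_cost:,.2f}")     # $1,950.00
```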

Gardner: I suppose also the holy grail of this cloud-computing vision that we've all been working on lately is the idea of being able to spin up those required instances of an application, not necessarily in your private cloud, but in any number of third-party clouds, when the requirements dictate that.

Ashizawa: That’s correct.

Gardner: Now, we call that hybrid computing. Is what you are working on now something that’s ready for hybrid or are you mostly focused on private-cloud implementation at this point?

Ashizawa: What we're bringing to the market works in all three cases. Whether you're a private internal cloud, doing a hybrid model between private and public, or sourcing completely to a public cloud, it will work in all three situations.

Gardner: HP announced, back in the spring of 2009, a Cloud Assure package that focused on things like security, availability, and performance. I suppose now, because of the economy and the need for people to reduce cost, look at the big picture about their architectures, workloads, and resources, and think about energy and carbon footprints, we've now taken this a step further.

Perhaps you could explain the December 2009 announcement that HP has for the next generation or next movement in this Cloud Assure solution set.

Making the road smoother

Ashizawa: The idea behind Cloud Assure, in general, is that we want to assist enterprises in their migration to the cloud and we want to make the road smoother for them.

Just as you said, when we first launched Cloud Assure earlier this year, we focused on the top three inhibitors, which were security of applications in the cloud, performance of applications in the cloud, and availability of applications in the cloud. We wanted to provide assurance to enterprises that their applications will be secure, they will perform, and they will be available when they are running in the cloud.

The new enhancement that we're announcing now is assurance for cost control in the cloud. Oftentimes enterprises do make that step to the cloud, and a big reason is that they want to reap the benefits of the cost promise of the cloud, which is to lower cost. The thing here, though, is that you might fall into a situation where you negate that benefit.

If you deploy an application in the cloud and you find that it’s underperforming, the natural reaction is to spin up more compute resources. It’s a very good reaction, because one of the benefits of the cloud is this ability to spin up or spin down resources very fast. So no more procurement cycles, just do it and in minutes you have more compute resources.

The situation, though, that you may find yourself in is that you may have spun up more resources to try to improve performance, but it might not improve performance. I'll give you a couple of examples.

If your application is experiencing performance problems because of inefficient Java methods, for example, or slow SQL statements, then more compute resources aren't going to make your application run faster. But, because the cloud allows you to do so very easily, your natural instinct may be to spin up more compute resources to make your application run faster.

When you do that, you find yourself in a situation where your application is no longer right-sized in the cloud, because you have over-provisioned your compute resources. You're paying for more compute resources and you're not getting any return on your investment. When you start paying for more resources without return on your investment, you start to disrupt the whole cost benefit of the cloud.

Gardner: I think we need to have more insight into the nature of the application, rather than simply throwing additional instances of the application. Is that it at a very simple level?

Ashizawa: That’s it at a very simple level. Just to make it even simpler, applications need to be tuned so that they are right-sized. Once they are tuned and right-sized, then, when you spin up resources, you know you're getting return on your investment, and it’s the right thing to do.

Gardner: Can we do this tuning with existing applications -- you mentioned Java apps, for example -- or is this something for greenfield applications that we are creating newly for these cloud scenarios?

Java and .NET

Ashizawa: Our enhancement to Cloud Assure, which is Cloud Assure for cost control, focuses more on the Java and the .NET type applications.

Gardner: And those would be existing applications or newer ones?

Ashizawa: Either. Whether you have existing applications that you are migrating to the cloud, or new applications that you are deploying in the cloud, Cloud Assure for cost control will work in both instances.

Gardner: Is this new set software, services, or both? Maybe you could describe exactly what it is that you are coming to market with.

Ashizawa: The Cloud Assure for cost control solution comprises both HP software and HP services provided by HP SaaS. The software itself consists of three products that make up the overall solution.

The first one is our industry-leading Performance Center software, which allows you to drive load in an elastic manner. You can scale up the load to very high demands and scale back load to very low demand, and this is where you get your elasticity planning framework.

The second solution from a software’s perspective is HP SiteScope, which allows you to monitor the resource consumption of your application in the cloud. Therefore, you understand when compute resources are spiking or when you have more capacity to drive even more load.

The third software portion is HP Diagnostics, which allows you to measure the performance of your code. You can measure how your methods are performing, how your SQL statements are performing, and if you have memory leakage.

When you have this visibility of end user measurement at various load levels with Performance Center, resource consumption with SiteScope, and code level performance with HP Diagnostics, and you integrate them all into one console, you allow yourself to do true elasticity planning. You can tune your application and right-size it. Once you've right-sized it, you know that when you scale up your resources you're getting return on your investment.
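
Bringing those three views together supports a simple triage: if response time misses its target while CPU is far from saturated, the bottleneck is likely in the code (slow methods, slow SQL) and adding instances won't help; if CPU is pegged, scaling out is the right lever. The thresholds and metric names below are illustrative assumptions, not output from the HP tools.

```python
def scaling_advice(p95_response_ms, avg_cpu_percent,
                   target_ms=500, cpu_saturated=80):
    """Rough triage of a slow application under load."""
    if p95_response_ms <= target_ms:
        return "Meets target -- the current footprint looks right-sized."
    if avg_cpu_percent >= cpu_saturated:
        return "Capacity-bound -- scaling out should restore response times."
    return ("Code-bound -- tune slow methods and SQL first; more instances "
            "would add cost without improving performance.")

print(scaling_advice(p95_response_ms=1800, avg_cpu_percent=35))
print(scaling_advice(p95_response_ms=1200, avg_cpu_percent=92))
```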

All of this is backed by services that HP SaaS provides. We can perform load testing. We can set up the monitoring. We can do the code level performance diagnostics, integrate that all into one console, and help customers right-size the applications in the cloud.

Gardner: That sounds interesting, and, of course, harkens back to the days of distributed computing. We're just adding another level of complexity, that is to say, a sourcing continuum of some sort that needs to be managed as well. It seems to me that you need to start thinking about managing that complexity fairly early in this movement to cloud.

Ashizawa: Definitely. If you're thinking about sourcing to the cloud and adopting it, from a very strategic standpoint, it would do you good to do your elasticity planning before you go into production or you go live.

Tuning the application

The nice thing about Cloud Assure for cost control is that, if you run into performance issues after you have gone live, you can still use the service. You could come in and we could help you right-size your application and help you tune it. Then, you can start getting the global scale you wish at the right cost.

Gardner: One of the other interesting aspects of cloud is that it affects both design time and runtime. Where does something like the Cloud Assure for cost control kick in? Is it something that developers should be doing? Is it something you would do before you go into production, or if you are moving from traditional production into cloud production, or maybe all the above?

Ashizawa: All of the above. HP definitely recommends our best practice, which is to do all your elasticity planning before you go into production, whether it’s a net new application that you are rolling out in the cloud or a legacy application that you are transferring to the cloud.

Given the elastic nature of the cloud, we recommend that you get out ahead of it, do your proper elasticity planning, tune your system, and right-size it. Then, you'll get the most optimized cost and predictable cost, so that you can budget for it.

Gardner: It also strikes me, Neil, that we're looking at producing a very interesting and efficient feedback loop here. When we go into cloud instances, where we are firing up dynamic instances of support and workloads for application, we can use something like Cloud Assure to identify any shortcomings in the application.

We can take that back and use that as we do a refresh in that application, as we do more code work, or even go into a new version or some sort. Are we creating a virtual feedback loop by going into something like Cloud Assure?

Ashizawa: I can definitely see that being the case. I'm sure there are many situations where we might find something inefficient at the code level or in the database SQL-statement layer. When you go to the cloud, do your elasticity planning, and right-size, we can point out problems that may not have surfaced in an on-premises deployment and were never addressed earlier, and then you can create this feedback loop.

One of the side benefits obviously to right-sizing applications and controlling cost is to mitigate risk. Once you have elasticity planned correctly and once you have right-sized correctly, you can deploy with a lot more confidence that your application will scale to handle global class and support your business.

Gardner: Very interesting. Because this is focused on economics and cost control, do we have any examples of where this has been put into practice, where we can examine the types of returns? If you do this properly, if you have elasticity controls, if you are doing planning, and you get across this life cycle, and perhaps even some feedback loops, what sort of efficiencies are we talking about? What sort of cost reductions are possible?

Ashizawa: We've been working with one of our SaaS customers, who is doing more of a private-cloud type implementation. What makes this what I consider a private cloud is that they are testing various resource footprints, depending on the load level.

They're benchmarking their application at various resource footprints. For moderate levels, they have a certain footprint in mind, and then for their peak usage, during the holiday season, they have an expanded footprint in mind. The idea here is that they want to make sure they are provisioned correctly, so that they are optimizing their cost, even in their private cloud.

Moderate and peak usage

We have used our elastic testing framework, driven by Performance Center, to do both moderate levels and peak usage. When I say peak usage, I mean thousands and thousands of virtual users. What we allow them to do is that true elasticity planning.

They've been able to accomplish a couple of things. One, they understand which benchmarks and resource footprints they should be using in their private cloud. They know that they are provisioned properly at various load levels and that, because of that, they're getting all of the cost benefits of their private cloud. At the end of the day, they're mitigating their business risk by ensuring that their application is going to scale to global class to support their holiday season.

Gardner: And, they're going to be able to scale, if they use cloud computing, without necessarily having to roll out more servers with a forklift. They could find the fabric either internally or with partners, which, of course, has a great deal of interest from the bean counter side of things.

Ashizawa: Exactly. Now, we're starting to relay this message and target customers that have deployed applications in the public cloud, because we feel that the public cloud is where you may fall into that trap of spinning up more resources when performance problems occur, where you might not get the return on your investment.

So as more enterprises migrate to the cloud and start sourcing there, we feel that this elasticity planning with Cloud Assure for cost control is the right way to go.

Gardner: Also, if we're billing people either internally or through these third-parties on a per-use basis, we probably want to encourage them to have a robust application, because to spin up more instances of that application is going to cost us directly. So, there is also a built-in incentive in the pay-per-use model toward these more tuned, optimized, and planned-for cloud types of application.

Ashizawa: You said it better than I could have ever said it. You used the term pay-per-use, and it’s all about the utility-based pricing that the cloud offers. That’s exactly why this is so important, because whenever it’s utility based or pay-per-use, then that introduces this whole notion of variable cost. It’s obviously going to be variable, because what you are using is going to differ between different workloads.

So, you want to get a grasp of the variable-cost nature of the cloud, and you want to make this variable cost very predictable. Once it’s predictable, then there will be no surprises. You can budget for it and you could also ensure that you are getting the right performance at the right price.
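
A minimal sketch of that budgeting exercise, assuming hypothetical footprints, hours, and an illustrative per-instance hourly rate (not any provider's actual pricing): once elasticity planning has mapped each load profile to a right-sized footprint, the variable pay-per-use cost collapses into a predictable monthly number.

HOURLY_RATE_PER_INSTANCE = 0.40  # assumed cost of one compute instance per hour

# Load profiles discovered during elasticity planning: instances needed and
# hours per month expected in that profile (hypothetical figures).
load_profiles = {
    "off-peak": {"instances": 2, "hours_per_month": 500},
    "moderate": {"instances": 4, "hours_per_month": 200},
    "peak":     {"instances": 8, "hours_per_month": 30},
}

def monthly_budget(profiles, rate):
    """Sum instance-hours across profiles and convert them to a cost."""
    total = 0.0
    for name, p in profiles.items():
        cost = p["instances"] * p["hours_per_month"] * rate
        print(f"{name:>9}: {p['instances']} instances x {p['hours_per_month']} h = ${cost:,.2f}")
        total += cost
    return total

print(f"Predicted monthly spend: ${monthly_budget(load_profiles, HOURLY_RATE_PER_INSTANCE):,.2f}")

The cost stays variable hour to hour, but because each load profile has a known, right-sized footprint, the monthly total becomes something you can budget for in advance.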

Gardner: Neil, is this something that’s going to be generally available in some future time, or is this available right now at the end of 2009?

Ashizawa: It is available right now.

Gardner: If people were interested in pursuing this concept of elasticity planning, of pursuing Cloud Assure for cost benefits, is this something that you can steer them to, even if they are not quite ready to jump into the cloud?

Ashizawa: Yes. If you would like more information for Cloud Assure for cost control, there is a URL that you can go to. Not only can you get more information on the overall solution, but you can speak to someone who can help you answer any questions you may have.

Gardner: Let's look to the future a bit before we close up. We've looked at cloud assurance issues around security, performance, and availability. Now, we're looking at cost control and elasticity planning, getting the best bang for the buck, not just by converting an old app -- repaving an old cow path, if you will -- but by thinking about this differently, architecturally, in the cloud context.

What comes next? Is there another shoe to drop in terms of how people can expect to have HP guide them into this cloud vision?

Ashizawa: It’s a great question. Our whole idea here at HP and HP Software-as-a-Service is that we're trying to pave the way to the cloud and make it a smoother ride for enterprises that are trying to go to the cloud.

So, we're always tackling the main inhibitors and the main obstacles that make it more difficult to adopt the cloud. And, yes, where once we were tackling security, performance, and availability, we definitely saw that this idea for cost control was needed. We'll continue to go out there and do research, speak to customers, understand what their other challenges are, and build solutions to address all of those obstacles and challenges.

Gardner: Great. We've been talking about moving from traditional capacity planning towards elasticity planning, and a series of announcements from HP around quality and cost controls for cloud assurance and moving to cloud models.

To better understand these benefits, we've been talking with Neil Ashizawa, manager of HP's SaaS Products and Cloud Solutions. Thanks so much, Neil.

Ashizawa: Thank you very much.

Gardner: This is Dana Gardner, principal analyst at Interarbor Solutions. You've been listening to a sponsored BriefingsDirect podcast. Thanks for listening, and come back next time.

Listen to the podcast. Find it on iTunes/iPod and Podcast.com. Learn more. Download the transcript. Sponsor: Hewlett-Packard.

Transcript of a BriefingsDirect podcast on the need to right-size and fine-tune applications for maximum benefits of cloud computing. Copyright Interarbor Solutions, LLC, 2005-2009. All rights reserved.

Friday, December 18, 2009

Careful Advance Planning Averts Costly Snafus in Data Center Migration Projects

Transcript of a sponsored BriefingsDirect podcast on proper planning for data-center transformation and migration.

Listen to the podcast. Find it on iTunes/iPod and Podcast.com. Download the transcript. Learn more. Sponsor: Hewlett-Packard.

Dana Gardner: Hi, this is Dana Gardner, principal analyst at Interarbor Solutions, and you’re listening to BriefingsDirect.

Today, we present a sponsored podcast discussion on the crucial migration phase when moving or modernizing data centers. So much planning and expensive effort goes into building new data centers or conducting major improvements to existing ones, but too often the actual "throwing of the switch" -- the moving and migrating of existing applications and data -- gets short shrift.

But, as new data center transformations pick up -- due to the financial pressures to boost overall IT efficiency -- so too should the early-and-often planning and thoughtful execution of the migration itself get proper attention. Therefore, our podcast at hand examines the best practices, risk mitigation tools, and requirements for conducting data center migrations properly, in ways that ensure successful overall data center improvement.

To help pave the way to making data center migrations come off nearly without a hitch, we're joined by three thought leaders from Hewlett-Packard (HP). Please join me in welcoming Peter Gilis, data center transformation architect for HP Technology Services. Welcome to the show, Peter.

Peter Gilis: Thank you. Hello, everyone.

Gardner: We're also joined by John Bennett, worldwide director, Data Center Transformation Solutions at HP. Welcome back, John.

John Bennett: Thank you very much, Dana. It's a delight to be here.

Gardner: Arnie McKinnis, worldwide product marketing manager for Data Center Modernization at HP Enterprise Services. Thanks for joining us, Arnie.

Arnie McKinnis: Thank you for including me, Dana. I appreciate it.

Gardner: John, tell me why migration, the process around the actual throwing of the switch -- and the planning that leads up to that -- are so essential nowadays?

New data centers

Bennett: Let's start by taking a look at why this has arisen as an issue; the reasons are almost self-evident. We see a great deal of activity in the marketplace right now of people designing and building new data centers. Of course, everyone who has successfully built a new data center ends up with a wonderful new showcase site -- and then has to move into it.

The reasons for this growth, the reasons for moving to other data centers, are fueled by a lot of different activities. Oftentimes, multiple factors come into play at the same organization.

In many cases it's related to growth. The organization and the business have been growing. The current facilities were inadequate for purpose, because of space or energy capacity reasons or because they were built 30 years ago, and so the organization decides that it has to either build a new data center or perhaps make use of a hosted data center. As a result, they are going to have to move into it.

It might be that they're engaged in a data-center strategy project as part of a data-center transformation, where they might have had too many data centers -- that was the case at Hewlett-Packard -- and consciously decided that they wanted to have fewer data centers built for the purposes of the organization. Once that strategy is put into place and executed, then, of course, they have to move into it.

We see in many cases that customers are looking at new data centers -- either ones they've built or are hosted and managed by others -- because of green strategy and green initiatives. They see that as a more cost-effective way for them to meet their green initiatives than to build their own data centers.

There are, of course, cost reductions. In many cases, people are investing in these types of activities on the premise that they will save substantial CAPEX and OPEX cost over time by having invested in new data centers or in data center moves.

Whether they're moving to a data center they own, moving to a data center owned and managed by someone else, or outsourcing their data center to a vendor like HP, in all cases you have to physically move the assets of the data center from one location to another.

The impact of doing that well is awfully high. If you don't do it well, you're going to impact the services provided by IT to the business. You're very likely, if you don't do it well, to impact your service level agreements (SLAs). And, should you have something really terrible happen, you may very well put your own job at risk.

So, the objective here is not only to take advantage of the new facilities or the new hosted site, but also to do so in a way that ensures the right continuity of business services. That ensures that service levels continue to be met, so that the business, the government, or the organization continues to operate without disruption, while this takes place. You might think of it, as our colleagues in Enterprise Services have put it, as changing the engine in the aircraft while it's flying.

Gardner: Peter, tell me, when is the right time to begin planning for this migration?

Migration is the last phase

Gilis: The planning starts, when you do a data-center transformation, and migration is actually the last phase of that data center transformation. The first thing that you do is a discovery, making sure that you know all about the current environment, not only the servers, the storage, and the network, but the applications and how they interact. Based on that, you decide how the new data center should look.

John, here is something where I do not completely agree with you. Most of the migrations today are not migration of the servers, the assets, but actually migration of the data. You start building a next-generation data center, most of the time with completely new assets that better fit what your company wants to achieve. This is not always possible, when your current environment is something like four or five years old, or sometimes even much older than that.

Gardner: Peter, how do you actually pull this off? How do you get that engine changed on the plane while keeping it flying? Obviously, most companies can't afford to go down for a week while this takes place.

Gilis: You should look at it in different ways. If you have a disaster-recovery strategy, then you have multiple days to recover. Actually, if you plan for disaster in a good fashion, then it will be easy to migrate.

On the other side, if you build your new engine, your new data center, with all the new equipment inside, the only thing that you need to do is migrate the data. There are a lot of techniques to migrate data online, or at least to keep the data in the current data centers synchronized with the new data center.

So, the moment you switch off the computer in the first data center, you can immediately switch it on in the new data center. It may not be changing the engines online, but at least near-online.
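
A conceptual sketch of that synchronize-then-cut-over pattern, assuming hypothetical file paths and setting aside the storage-replication or vendor tooling a real migration would use: keep copying only what has changed while the old data center stays live, then finish the last small delta inside a brief freeze window.

import hashlib
import shutil
from pathlib import Path

def file_digest(path: Path) -> str:
    """Checksum a file so unchanged data is never copied twice."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def sync_pass(source: Path, target: Path) -> int:
    """Copy new or changed files from source to target; return files copied."""
    copied = 0
    for src in source.rglob("*"):
        if not src.is_file():
            continue
        dst = target / src.relative_to(source)
        if not dst.exists() or file_digest(src) != file_digest(dst):
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)
            copied += 1
    return copied

def migrate(source: Path, target: Path, small_enough: int = 10) -> None:
    # Repeat online passes while the application keeps running in the old
    # data center; each pass shrinks the remaining delta.
    while (delta := sync_pass(source, target)) > small_enough:
        print(f"Online pass copied {delta} files; running another pass...")
    # The final pass happens inside the brief cutover window: stop writes at
    # the old site, copy the last delta, then switch on at the new site.
    print(f"Delta down to {delta} files -- freeze writes, run final pass, cut over.")

if __name__ == "__main__":
    migrate(Path("/data/old-datacenter"), Path("/data/new-datacenter"))  # hypothetical paths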

Gardner: Arnie, tell me about some past disasters that have given us insights into how this should go properly? Are there any stories that come to mind about how not to do this properly?

McKinnis: There are all sorts of stories about not doing it properly. In most cases, you start by decomposing what went wrong during a project. Usually, what you find is that you did not do a good enough job of assessing the current situation, whether that was the assessment of a hardware platform, a server platform, or a facility.

It may even be as simple as looking at the changeover process that is currently in place and seeing how that affects what is going to be the new changeover process. Potentially, there is some confusion there. But it usually all goes back to not doing a proper assessment of the current mode of operations, or of that operating platform as it exists today.

Gardner: Now, Arnie, this must provide to you a unique opportunity -- as organizations are going to be moving from one data center to another -- to take a hard look at what they have got. I'm going to assume that not everything is going to go to the new data center.

Perhaps you're going to take an opportunity to sunset some apps, replace some with commodity services, or outsource others. So, this isn't just a one-directional migration. We're probably talking about a multi-headed dragon going in multiple directions. Is that the case?

Thinking it through

McKinnis: It's always the case. That's why, from an Enterprise Services standpoint, we look at who is going to manage it if the client hasn't completely thought that out. In other words, potentially they haven't thought out the full future mode of what they want their operating environment to look like.

We're not necessarily talking about starting from a complete greenfield, but people have come to us in the past and said, "We want to outsource our data centers." Our next logical question is, "What do you mean by that?"

So, you start the dialog that goes down that path. And, on that path you may find out that what they really want to do is outsource to you, maybe not only their mission-critical applications, but also the backup and the disaster recovery of those applications.

When they first thought about it, maybe they didn't think through all of that. From an outsourcing perspective, companies don't always do 100 percent outsourcing of that data-center environment or that shared computing environment. It may be part of it. Part of it they keep in-house. Part of it they host with another service provider.

What becomes important is how to manage all the multiple moving parts and the multiple service providers that are going to be involved in that future mode of operation. It's accessing what we currently have, but it's also designing what that future mode needs to look like.

Gardner: Back to you, Peter. You mentioned the importance of data, and I imagine that when we go from traditional storage to new modes of storage, storage area networks (SANs) for example, we've got a lot of configuration and connection issues with how storage and data are used in conjunction with applications and processes. How do you manage that sort of connection and transformation of configuration issues?

Gilis: Well, there's not that much difference between local storage, SAN storage, and network attached storage (NAS) in what you design. The only thing that you design or architect today is that basically every server, every single machine, virtual or physical, gets connected to shared storage, and that shared storage should be replicated to a disaster recovery site.

That's basically the way you transfer the data from the current data centers to the new data centers, where you make sure that you build in disaster recovery capabilities from the moment you do the architecture of the new data center.

Gardner: Again, this must come back to a function of proper planning to do that well?

Know where you're going

Gilis: That's correct. If you don't do the planning, if you don't know where you're starting from and where you're going to, then it's like being on the ocean. Going in any direction will lead you anywhere, but it's probably not giving you the path to where you want to go. If you don't know where to go to, then don't start the journey.

Gardner: John Bennett, another tricky issue here is that when you transition from one organizational facility to another, or one sourcing set to another larger set, we're also dealing here with ownership trust. I guess that boils down to politics -- who controls what. We're not just managing technology, but we're managing people. How do we get a handle on that to make that move smoothly?

Bennett: Politics, in this case, is just the interaction and the interrelationship between the organizations and the enterprise. They're a fact of life. Of course, they would have already come into play, because getting approval to execute a project of this nature would almost of necessity involve senior executive reviews, if not board of director approval, especially if you're building your own data center.

But, the elements of trust come in, whether you're building a new data center or outsourcing, because people want to know that, after the event takes place, things will be better. "Better" can be defined as: a lot cheaper, better quality of service, and better meeting the needs of the organization.

This has to be addressed in the same way any other substantial effort is addressed -- in the personal relationships of the CIO and his or her senior staff with the other executives in the organization, and with a business case. You need measurement before and afterward in order to demonstrate success. Of course, good, if not flawless, execution of the data center strategy and transformation are in play here.

The ownership issue may be affected in other ways. In many organizations it's not unusual for individual business units to have ownership of individual assets in the data center. If modernization is at play in the data center strategy, there may be some hand-holding necessary to work with the business units in making that happen. This happens whether you are doing modernization and virtualization in the context of existing data centers or in a migration. By the way, it's not different.

Be aware of where people view their ownership rights and make sure you are working hand-in-hand with them instead of stepping over them. It's not rocket science, but it can be very painful sometimes.

Gardner: Again, it makes sense to be doing that early rather than later in the process.

Bennett: Oh, you have to do a lot of this before you even get approval to execute the project. By the time you get to the migration, if you don't have that in hand, people have to pray for it to go flawlessly.

Gardner: People don't like these sorts of surprises when it comes to their near and dear responsibilities?

Bennett: We can ask both Peter and Arnie to talk to this. Organizational engagement is very much a key part of our planning process in these activities.

Gardner: Arnie, tell us a little bit more about that process. The planning has to be inclusive, as we have discussed. We're talking about physical assets. We're talking about data, applications, organizational issues, people, and process. We haven’t talked about virtualization, but moving from physical to virtualized instances is also there. Give us a bit of a rundown of what HP brings to the table in trying to manage such a complex process.

It's an element of time

McKinnis: First of all, we have to realize that one of the biggest factors in this whole process is time. A client, at least when they start working with us from an outsourcing perspective, has come to the conclusion that a service provider can probably do it more efficiently and effectively, and at a better price point, than they can internally.

There are all sorts of decisions that go around that from a client perspective to get to that decision. In many cases, if you look at it from a technology standpoint, the point of decision is something around getting to an end of life on a platform or an application. Or, there is a new licensing cycle, either from a support standpoint or an operating system standpoint.

There is usually something that happens from a technology standpoint that says, "Hey look, we've got to make a big decision anyway. Do we want to invest going this way, that we have gone previously, or do we want to try a new direction?"

Once they make that decision, we look at outside providers. It can take anywhere from 12 to 18 months to go through the full cycle of working through all the proposals and all the due diligence to build that trust between the service provider and the client. Then, you get to the point, where you can actually make the decision of, "Yes, this is what we are going to do. This is the contract we are going to put in place." At that point, we start all the plans to get it done.

As you can see, it's not a trivial deal. We've seen some of these deals get half way through the process, and then the client decides, perhaps through personnel changes on the client side, or the service providers may decide that this isn't going quite the way that they feel it can be most successful. So, there are times when deals just fall apart, sometimes in the middle, and they never even get to the contracting phase.

There are lots of moving parts, and these things are usually very large. That's why, even though outsourcing contracts have changed, they are still large, are still multi-year, and there are still lots of moving parts.

When we look at the data-center world, it's just one of those things where all of us take steps to make sure that we're always looking at the best case and at the real case. We're always building toward what can happen and trying not to get too far ahead of ourselves.

This is a little bit different from when you're just doing consulting and pure transformation and building toward that future environment. You can be a little bit more greenfield in your environment and the way you do things.

Gardner: I suppose the tendency is to get caught up in planning all about where you're ending up, your destination, and not focusing as much as you should on that all-important interim journey of getting there?

Keeping it together

McKinnis: From an outsourcing perspective, our organization takes it mostly from that state, probably more so than you could do in that future mode. For us, it's all about making sure that things do not fall apart while we are moving you forward. There are a lot of dual systems that get put in place. There are a lot of things that have to be kept running, while we are actually building that next environment.

Gilis: But, Arnie, that's exactly the same case when you don't do outsourcing. When you work with your client, and that's what it all comes down to, it should be a real partnership. If you don't work together, you will never do a good migration, whether it's outsourcing or non-outsourcing. At the end, the new data center must receive all of the assets or all of the data -- and it must work.

Most of the time, the people that know best how it used to work are the customers. If you don't work with and don't partner directly with the customer, then migration will be very, very difficult. Then, you'll hit the difficult parts that people know will fail, and if they don't inform you, you will have to solve the problem.

Gardner: Peter, as an architect, you must see that these customers you're dealing with are not all equal. There are going to be some in a position to do this better than others. I wonder whether there's something that they've done or put in place. Is it governance, change management, portfolio management, or configuration databases with a common repository of record? Are there certain things that help this naturally?

Gilis: As you said, there are different customers. You have small migrations and huge migrations. The best thing is to cut things into small projects that you can handle easily. As we say, "Cut the elephant into pieces, because otherwise you can't swallow it."

Gardner: But, even the elephant itself might differ. How about you, John Bennett? Do you see some issues where there is some tendency toward some customers to have adopted certain practices, maybe ITIL, maybe service-oriented architecture (SOA), that make migration a bit smoother?

Bennett: There are many ways to approach this. Cutting up the elephant so you can eat it is a more interesting way of advising customers to build out their own roadmap of projects and activities and, in the end, implement their own transformation.

In an ideal data center project, because it's such a significant effort, it's always very useful to take into consideration other modernization and technology initiatives, before and during, in order to make the migration effective.

For example, if you're going to do modernization of the infrastructure, have the new infrastructure housed in the new data center, and now you are just migrating data and applications instead of physical devices, then you have much better odds of it happening successfully.

Cleaning up internally

If you can do work with your applications or your business processes before you initiate the move, what you are doing is cleaning up the operations internally. Along the way, it's a discovery process, which Peter articulated as the very first step in the migration project. But, you're making the discovery process easier, because there are other activities you have to do.

Gardner: A lot of attention is being given to cloud computing at almost abstract level, but not too far-fetched. Taking advantage of cloud computing means being able to migrate a data center; large chunks of that elephant moving around. Is this something people are going to be doing more often?

Bennett: It's certainly a possibility. Adopting a cloud strategy for specific business services would let you take advantage of that, but in many of these environments today cloud isn't a practical solution yet for the broad diversity of business services they're providing.

We see that for many customers it's the move from dedicated islands of infrastructure, to a shared infrastructure model, a converged infrastructure, or an adaptive infrastructure. Those are significant steps forward with a great deal of value for them, even without getting all the way to cloud, but cloud is definitely on the horizon.

Gardner: Can we safely say, though, that we're seeing more frequent migrations and perhaps larger migrations?

McKinnis: In general, what we've seen is the hockey stick that's getting ready to happen with shared compute. I'll just throw it out there that what this stuff in the data centers amounts to is a shared-compute environment. What we're moving toward, if done properly, is a breaking off, especially in the enterprise, of the security and compliance issues around data.

There is this breaking off of what can be done, what should be done at the desktop or user level, what should be kept locally, and then what should be kept at a shared compute or a shared-services level.

Gardner: Perhaps we're moving toward an inflection point, where we're going to see a dramatic uptake in the need for doing migration activities?

McKinnis: I think we will. Cloud has put things back in people's heads around what can be put out there in that shared environment. I don't know that we've quite gotten through the process of whether it should be at a service provider location, my location, or within a very secure location at an outsourced environment.

Where to hold data

I don't think they've gotten to that at the enterprise level. But, they're not quite so convinced about giving users the ability to retain data, do that processing, and have that application held right there, within the confines of a laptop or whatever it happens to be that they're interacting with. They're starting to see that it potentially should be held someplace else, so that the risk of that data isn't carried at the local level. Do you understand where I'm going with that?

Gardner: I do. I think we are seeing greater responsibility now being driven toward the data center, which is going to then force the re-architecting and the capacity issues, which will ultimately then require choices about sourcing, which will then of course require a variety of different migration activities.

McKinnis: Right. It's not just about a new server or a new application. Sometimes it's as much about, "How do I stay within compliance? Am I a public company or am I am a large government entity? How do I stay within my compliance and my regulations? How do I hold data? How do I have to process it?"

Even in the world of global service delivery, there are a lot of rules and regulations around where data can be stored. In the leveraged environment that a service provider provides, storage is potentially somewhere in Eastern Europe, India, or South America. There are plenty of compliance issues around where data can actually be held under certain governmental regulations, depending on where you are -- in country or out of country.

Gardner: Let's move to Peter. Tell me a bit about some examples. Moving back to the migration itself, can you give us a sense of how this is done well, and if there are some metrics of success, when it is done well?

Gilis: As we already said in the beginning, it all depends on planning. Planning is key -- not only planning the migration itself, but also having a "plan B" in case it doesn't work, because then you have to go back to the old environment as soon as possible and within the time frame given.

First, you need to plan: "Is my application suitable for migration?" Sometimes, if you migrate your data centers from place A to place B -- as we've done in EMEA, from the Czech Republic to Austria -- the distance of 350 kilometers adds extra latency. If your programs -- and we have tested them for the customer -- already have performance problems, that little extra latency can just kill your application when you migrate.

One of the things we have done in that case is test it using a network simulator against a real-life machine. We found that the application, or the server, was not suitable for migration as it stood. If you know this beforehand, then you remove a risk by just migrating it on its own.
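
The arithmetic behind that latency risk is simple to sketch, assuming hypothetical round-trip counts per transaction (a real assessment would measure them with a network simulator, as Gilis describes): light in fibre covers roughly 200 kilometers per millisecond, so 350 kilometers adds about 3.5 milliseconds per round trip, and a chatty application pays that once for every sequential round trip.

FIBRE_KM_PER_MS = 200.0  # light in fibre travels ~200,000 km/s, i.e. ~200 km per ms

def added_latency_ms(distance_km: float, round_trips: int) -> float:
    """Extra time a transaction spends on the wire after the move."""
    one_way_ms = distance_km / FIBRE_KM_PER_MS
    return 2 * one_way_ms * round_trips  # there and back, once per round trip

distance_km = 350  # e.g., Czech Republic to Austria
for name, trips in [("lean transaction", 20), ("chatty transaction", 5000)]:
    extra = added_latency_ms(distance_km, trips)
    print(f"{name}: ~{extra:,.0f} ms extra per transaction over {distance_km} km")

A lean transaction barely notices the move, while a chatty one that makes thousands of sequential round trips picks up many seconds per transaction -- which is why an application that already has performance problems may not survive the extra distance.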

At another customer, I saw that people had divided the whole migration process into multiple streams, but there was a lack of coordination between the streams. If you have a shared application related to more than one stream, and the planning of one stream is totally in conflict with the planning of another, the application and its data get moved without the other streams being informed. That caused huge delays in real life, because the other applications were no longer synchronized in the way they used to be -- assuming they were synchronized before.

So, if you don't plan and work together, you will definitely have failures.
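
A minimal sketch of the coordination check those streams were missing, with hypothetical stream names, applications, and dates: flag any application that two streams have scheduled on different days, and any pair of dependent applications whose move dates have drifted apart.

from collections import defaultdict
from datetime import date

# Each stream plans its own move dates (hypothetical schedules).
streams = {
    "stream-finance": {"ERP": date(2010, 3, 6), "Billing": date(2010, 3, 6)},
    "stream-web":     {"Webshop": date(2010, 2, 20), "ERP": date(2010, 2, 20)},
}

# Applications that must move together, or at least in a coordinated window.
dependencies = [("Webshop", "Billing"), ("Billing", "ERP")]

# 1. A shared application planned by two streams on different dates.
planned = defaultdict(set)
for stream, schedule in streams.items():
    for app, when in schedule.items():
        planned[app].add(when)
for app, dates in planned.items():
    if len(dates) > 1:
        print(f"CONFLICT: {app} is scheduled by multiple streams on {sorted(dates)}")

# 2. Dependent applications moving at different times.
move_date = {app: min(dates) for app, dates in planned.items()}
for a, b in dependencies:
    if a in move_date and b in move_date and move_date[a] != move_date[b]:
        print(f"WARNING: {a} moves {move_date[a]} but its dependency {b} moves {move_date[b]}")

Running a check like this against every stream's plan is exactly the kind of cross-stream coordination that prevents a shared application from being moved out from under the others.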

Gardner: You mentioned something that was interesting about trying to do this on a test basis. I suppose that for that application development process, you'd want to have a test and dev and use some sort of a testbed, something that's up before you go into full production. Perhaps we also want to put some of these servers, data sets, and applications through some sort of a test to see if they are migration ready. Is that an important and essential part of this overall process?

Directly to the site

Gilis: If you can do it, it's excellent, but sometimes we still see in real life that not all customers have a complete test and dev environment, or even an acceptance environment. Then, the only way to do it is to move the real-life machine directly to the new site.

I've actually seen it. It wasn't really a migration, but an upgrade of an SAP machine. Because of performance problems, the customer needed to migrate to a new, larger server. And, because of the pressure of the business, they didn't have time to move from test and dev, to acceptance, and to production. They started immediately with production.

At two o'clock in the morning we found that there was a bug in the new version and we had to roll back the whole migration and the whole upgrade. That's not the best time in the middle of the weekend.

Gardner: John Bennett, we've heard again and again today about how important it is to do this planning, to get it done upfront, and to get that cooperation as early as possible. So the big question for me now is how do you get started?

Bennett: How you get started depends on what your own capabilities and expertise are. If these are projects that you've undertaken before, there's no reason not to implement them in a similar manner. If they are not, it starts with the identification of the business services and the sequencing of how you want them to be moved into the new data center and provisioned over there.

In order to plan at that level of detail, you need to have, as Peter highlighted earlier, a really good understanding of everything you have. You need to fully build out a model of the assets you have, what they are doing, and what they are connected to, in order to figure out the right way to move them. You can do this manually, or you can make use of software like HP's Discovery and Dependency Mapping software.
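
A minimal sketch of how such a dependency model can be turned into move groups, using a hypothetical inventory rather than output from any discovery tool: assets that are connected, directly or through shared dependencies, fall into the same candidate migration wave.

from collections import defaultdict

# Discovered dependencies: application -> the things it talks to (hypothetical).
dependencies = {
    "webshop":   ["orders-db", "payment-gw"],
    "erp":       ["orders-db", "warehouse-db"],
    "intranet":  ["directory"],
    "reporting": ["warehouse-db"],
}

# Build an undirected graph out of the dependency model.
graph = defaultdict(set)
for app, deps in dependencies.items():
    for dep in deps:
        graph[app].add(dep)
        graph[dep].add(app)

def move_groups(graph):
    """Connected components: each component is one candidate migration wave."""
    seen, groups = set(), []
    for node in graph:
        if node in seen:
            continue
        group, stack = set(), [node]
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            group.add(current)
            stack.extend(graph[current] - seen)
        groups.append(sorted(group))
    return groups

for i, group in enumerate(move_groups(graph), start=1):
    print(f"Wave {i}: {', '.join(group)}")

Grouping connected assets this way is one simple way to "cut the elephant into pieces": each wave can be planned, moved, and verified on its own without breaking a dependency that straddles the cut.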

If the size of this project is a little daunting to you, then of course the next step is to take advantage of someone like HP. We have Discovery Services, and, of course, we have a full suite of migration services available, with people trained and experienced in doing this to help customers move and migrate data centers, whether it's to their own or to an outsourced data center.

Peter talked about planning this with a disaster in mind to understand what downtime you can plan for. We have successfully undertaken customer data center migration projects, which had minimal or zero operational disruption, by making clever use of short-term leases to ensure that business services continue to run, while they are transitioned to a new data center. So, you can realize that too.

But, I'd also ask both Peter and Arnie here, who are much more experienced in this, to highlight the next level of detail. Just what goes into that effective planning, and how do you get started?

Gardner: I'd also like to hear that, Peter. In the future, I expect that, as always, new technologies will be developed to help on these complex issues. Looking forward, are there some hopeful signs that there is going to be a more automated way to undertake this?

Migration factory

Gilis: If you do a lot of migrations, and that's actually what most of the service companies like HP are doing, we know how to do migrations and how to treat some of the applications migrated as part of a "migration factory."

We actually built something like a migration factory, where teams are doing the same over and over all the time. So, if we have to move Oracle, we know exactly how to do this. If we have to move SAP, we know exactly how to do this.

That's like building a car in a factory. It's the same thing day in and day out, every day. That's why customers come to service providers. Whether you go outsourcing or not, you should use a service provider that builds new data centers, transforms data centers, and migrates data centers nearly every day.

Gardner: I'm afraid we're just about out of time and we're going to have to leave it there. I want to thank our guests for an insightful set of discussion points around data center migration.

As we said earlier, major setups and changes with data-center facilities often involve a lot of planning and expense, but sometimes not quite enough planning goes into the migration itself. Here to help us better understand and look towards better solutions around data center migration, we have been joined by Peter Gilis, data center transformation architect for HP Technology Services. Thanks so much, Peter.

Gilis: Thank you.

Gardner: Also John Bennett, worldwide director, Data Center Transformation Solutions at HP. Thanks, John.

Bennett: You're most welcome, Dana.

Gardner: And lastly, Arnie McKinnis, worldwide product marketing manager for Data Center Modernization in HP Enterprise Services. Thanks for your input, Arnie.

McKinnis: Thank you, Dana. I've enjoyed being included here.

Gardner: This is Dana Gardner, principal analyst at Interarbor Solutions. You've been listening to a sponsored BriefingsDirect podcast. Thanks for listening and come back next time.

Listen to the podcast. Find it on iTunes/iPod and Podcast.com. Download the transcript. Learn more. Sponsor: Hewlett-Packard.

Transcript of a sponsored BriefingsDirect podcast on proper planning for data-center transformation and migration. Copyright Interarbor Solutions, LLC, 2005-2009. All rights reserved.