Tuesday, August 21, 2012

New Levels of Automation and Precision Needed to Optimize Backup and Recovery in Virtualized Environments

Transcript of a BriefingsDirect podcast on the relationship between increased virtualization and the need for data backup and recovery.

Listen to the podcast. Find it on iTunes/iPod. Download the transcript. Sponsor: Quest Software.

Dana Gardner: Hi, this is Dana Gardner, Principal Analyst at Interarbor Solutions, and you're listening to BriefingsDirect.

Today, we present a sponsored podcast discussion on the relationship between increasingly higher levels of virtualization and the need for new data backup and recovery strategies.

We'll examine how the era of major portions of servers now being virtualized has provided an on-ramp to attaining data lifecycle benefits and efficiencies. And at the same time, these advances are helping to manage complex data environments that consist of both physical and virtual systems.

What's more, the elevation of data to the lifecycle efficiency level is also forcing a rethinking of the culture of data, of who owns data, and when, and who is responsible for managing it in a total lifecycle across all applications and uses.

This is different from the previous and current system where it’s often a fragmented approach, with different oversight for data across far-flung instances and uses.

Lastly, our discussion focuses on bringing new levels of automation and precision to the task of solving data complexity, and of making always-attainable data the most powerful asset that IT can deliver to the business.

Here to share insights on where the data availability market is going and how new techniques are being adopted to make the value of data ever greater, we're joined by John Maxwell, Vice President of Product Management for Data Protection, at Quest Software. Welcome back, John. [Disclosure: Quest Software is a sponsor of BriefingsDirect podcasts.]

John Maxwell: Hi, Dana. Thanks. It’s great to be here to talk on a subject that's near and dear to my heart.

Gardner: Let’s start at a high level. Why have virtualization and server virtualization become a catalyst to data modernization? Is this an unintended development or is this something that’s a natural evolution?

Maxwell: I think it’s a natural evolution, and I don’t think it was even intended on the part of the two major hypervisor vendors, VMware and Microsoft with their Hyper-V. As we know, 5 or 10 years ago, virtualization was touted as a means to control IT costs and make better use of servers.

Utilization was in single digits, and with virtualization you could get it much higher. But the rampant success of virtualization impacted storage and the I/O where you store the data.

Upped the ante

If you look at the announcements that VMware did around vSphere 5, around storage, and the recent launch of Windows Server 2012, Hyper-V, where Microsoft even upped the ante and added support for Fibre Channel with their hypervisor, storage is at the center of the virtualization topic right now.

It brings a lot of opportunities to IT. Now, you can separate some of the choices you make, whether it has to do with the vendors that you choose or the types of storage, network-attached storage (NAS), shared storage and so forth. You can also make the storage a lot more economical with thin disk provisioning, for example.

There are a lot of opportunities out there that are going to allow companies to make better utilization of their storage just as they've done with their servers. It’s going to allow them to implement new technologies without necessarily having to go out and buy expensive proprietary hardware.

From our perspective, the richness of what the hypervisor vendors are providing in the form of APIs, new utilities, and things that we can call on and utilize, means there are a lot of really neat things we can do to protect data. Those didn't exist in a physical environment.

It’s really good news overall. Again, the hypervisor vendors are focusing on storage and so are companies like Quest, when it comes to protecting that data.

Gardner: As we move towards that mixed environment, what is it about data that, at a high level, people need to think differently about? Is there a shift in the concept of data, when we move to virtualization at this level?

Maxwell: First of all, people shouldn’t get too complacent. We've seen people load up virtual disks, and one of the areas of focus at Quest, separate from data protection, is in the area of performance monitoring. That's why we have tools that allow you to drill down and optimize your virtual environment from the virtual disks and how they're laid out on the physical disks.

And even hypervisor vendors -- I'm going to point back to Microsoft with Windows Server 2012 -- are doing things to alleviate some of the performance problems people are going to have. At face value, your virtual disk environment looks very simple, but sometimes you don’t set it up or it’s not allocated for optimal performance or even recoverability.

There's a lot of education going on. The hypervisor vendors, and certainly vendors like Quest, are stepping up to help IT understand how these logical virtual disks are laid out and how to best utilize them.

Gardner: It’s coming around to the notion that when you set up your data and storage, you need to think not just for the moment for the application demands, but how that data is going to be utilized, backed up, recovered, and made available. Do you think that there's a larger mentality that needs to go into data earlier on and by individuals who hadn’t been tasked with that sort of thought before?

See it both ways

Maxwell: I can see it both ways. At face value, virtualization makes it really easy to go out and allocate as many disks as you want. Vendors like Quest have put in place solutions that make it so that within a couple of mouse clicks, you can expose your environment, all your virtual machines (VMs) that are out there, and protect them pretty much instantaneously.

From that aspect, I don't think there needs to be a lot of thought, as there was back in the physical days, of how you had to allocate storage for availability. A lot of it can be taken care of automatically, if you have the right software in place.

That said, a lot of people may have set themselves up, if they haven’t thought of disaster recovery (DR), for example. When I say DR, I also mean failover of VMs and the like, as far as how they could set up an environment where they could ensure availability of mission-critical applications.

For example, you wouldn’t want to put everything, all of your logical volumes, all your virtual volumes, on the same physical disk array. You might want to spread them out, or you might want to have the capabilities of replicating between different hypervisor, physical servers, or arrays.
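[Editor's sketch: the placement advice above can be checked mechanically. The following is a minimal Python illustration, not a Quest tool. The function name, the tuple layout, and the sample names are all assumptions made for this example.]

```python
from collections import defaultdict


def single_array_risks(placements):
    """Return applications whose volumes all sit on one physical array.

    `placements` is a list of (application, volume, array) tuples.
    This layout is hypothetical and chosen only for illustration.
    """
    arrays_by_app = defaultdict(set)
    for app, _volume, array in placements:
        arrays_by_app[app].add(array)
    # An application backed by exactly one array has no hardware redundancy.
    return sorted(app for app, arrays in arrays_by_app.items() if len(arrays) == 1)


# Hypothetical inventory: "erp" keeps everything on one array, "mail" is spread out.
inventory = [
    ("erp", "vol1", "arrayA"),
    ("erp", "vol2", "arrayA"),
    ("mail", "vol3", "arrayA"),
    ("mail", "vol4", "arrayB"),
]
at_risk = single_array_risks(inventory)
```

Note that an application with only one volume is flagged as well, which seems reasonable for a redundancy check but is a design choice of this sketch.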

Gardner: I understand that you've conducted a survey to try to find out more about where the market is going and what the perceptions are in the market. Perhaps you could tell us a bit about the survey and some of the major findings.

Maxwell: One of the findings that I find most striking, since I have been following this for the past decade, is that our survey showed that 70 percent of organizations now consider at least 50 percent of their data mission critical.

That may sound ambiguous at first, because what is mission critical? But from the context of recoverability, that generally means data that has to be recovered in less than an hour and/or has to be recovered within an hour from a recovery-point perspective.

This means that if I have a database, I can’t go back 24 hours. The least amount of time that I can go back is within an hour of losing data, and in some cases, you can’t go back even a second. But it really gets into that window.
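[Editor's sketch: the two one-hour limits described here correspond to what are usually called the recovery time objective (RTO) and the recovery point objective (RPO). The Python below is a minimal illustration of that rule of thumb. The function names and the inclusive one-hour threshold are assumptions for this example, not survey definitions.]

```python
from datetime import datetime, timedelta

# Assumption: one hour is the cutoff, following the rule of thumb in the discussion.
MISSION_CRITICAL_LIMIT = timedelta(hours=1)


def is_mission_critical(rto: timedelta, rpo: timedelta) -> bool:
    """Data counts as mission critical if either objective is an hour or less."""
    return rto <= MISSION_CRITICAL_LIMIT or rpo <= MISSION_CRITICAL_LIMIT


def meets_rpo(last_recovery_point: datetime, failure_time: datetime, rpo: timedelta) -> bool:
    """True if the newest recovery point is recent enough to satisfy the RPO."""
    return failure_time - last_recovery_point <= rpo
```

With a one-hour RPO, a recovery point taken 30 minutes before a failure passes the check, while last night's backup does not.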

I remember in the days of the mainframe, you'd say, "Well, it will take all day to restore this data, because you have tens or hundreds of tapes to do it." Today, people expect everything to be back in minutes or seconds.

The other thing that was interesting from the survey is that one-third of IT departments were approached by their management in the past 12 months to increase the speed of the recovery time. That really dovetails with the 50 percent of data being mission critical. So there's pressure on the IT staff now to deliver better service-level agreements (SLAs) within their company with respect to recovering data.

Terms are synonymous

The other thing that's interesting is that data protection and the term backup are synonymous. It's funny. We always talk about backup, but we don't necessarily talk about recovery. Something that really stands out now from the survey is that recovery or recoverability has become a concern.

Case in point: 73 percent of respondents, or roughly three quarters, now consider recovering lost or corrupted data and restoring those mission critical applications their top data-protection concern. Only 4 percent consider the backup window the top concern. Ten years ago, all we talked about was backup windows and speed of backup. Now, only 4 percent considered backup itself, or the backup window, their top concern.

So 73 percent are concerned about the recovery window, only 4 percent about the backup window, and only 23 percent consider the ability to recover data independent of the application their top concern.

Those trends really show that there is a need. The beauty is that, in my opinion, we can get those service levels tighter in virtualized environments easier than we can in physical environments.

Gardner: We seem to have these large shifts in the market, one around virtualization of servers and storage and the implications of first mixed, and then perhaps a majority, or vast majority, of virtualized environments.

The second shift is the heightened requirements around higher levels of mission-critical allocation or designation for the data and then the need for much greater speed in recovering it.

Let's unpack that a little bit. How do these fit together? What's the relationship between moving towards higher levels of virtualization and being able to perhaps deliver on these requirements, and maybe even doing it with some economic benefit?

Maxwell: You have to look at a concept that we call tiered recovery. That's driven by the importance now of replication in addition to traditional backup, and new technology such as continuous data protection and snapshots.

That gets to what I was mentioning earlier. Data protection and backup are synonymous, but it's a generic term. A company has to look at which policies or which solutions to put in place to address the criticality of data, but then there is a cost associated with it.

For example, it's really easy to say, "I'm going to mirror 100 percent of my data," or "I'm going to do synchronous replication of my data," but that would be very expensive from a cost perspective. In fact, it would probably be just about unattainable for most IT organizations.

Categorize your data

What you have to do is understand and categorize your data, and that's one of the focuses of Quest. We're introducing something this year called NetVault Extended Architecture (NetVault XA), which will allow you to protect your data based on policies, based on the importance of that data, and apply the correct solution, whether it's replication, continuous data protection, traditional backup, snapshots, or a combination.

You can't just do this blindly. You have got to understand what your data is. IT has to understand the business, and what's critical, and choose the right solution for it.
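[Editor's sketch: one way to picture tiered recovery is as a lookup from how much data loss an application can tolerate to the least expensive protection tier that still meets it. The Python below is an illustration only. The tier names, the thresholds, and the pairing of methods to tiers are assumptions and do not describe NetVault XA.]

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class Tier:
    name: str
    max_rpo: timedelta  # worst-case data loss this tier is assumed to deliver
    methods: tuple


# Ordered from most expensive to least expensive (hypothetical values).
TIERS = (
    Tier("critical", timedelta(minutes=1), ("replication", "continuous data protection")),
    Tier("important", timedelta(hours=1), ("snapshots", "traditional backup")),
    Tier("standard", timedelta(hours=24), ("traditional backup",)),
)


def choose_tier(required_rpo: timedelta) -> Tier:
    """Pick the cheapest tier whose worst-case data loss fits the requirement."""
    for tier in reversed(TIERS):  # try the cheapest tier first
        if tier.max_rpo <= required_rpo:
            return tier
    # Nothing is strict enough, so fall back to the strictest tier available.
    return TIERS[0]
```

Trying the cheapest tier first reflects the cost point made above: mirroring everything would meet every requirement, but at a price most organizations could not justify.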

Gardner: It's interesting to me that if we're looking at data and trying to present policies on it, based on its importance, these policies are going to be probably dynamic and perhaps the requirements for the data will be shifting as well. This gets to that area I mentioned earlier about the culture around data, thinking about it differently, perhaps changing who is responsible and how.

So when we move to this level of meeting our requirements that are increasing, dealing in the virtualization arena, when we need to now think of data in perhaps that dynamic fluid sense of importance and then applying fit-for-purpose levels of support, backup, recoverability, and so forth, whose job is that? How does that impact how the culture of data has been and maybe give us some hints of what it should be?

Maxwell: You've pointed out something very interesting, especially in the area of virtualization, just as we have noticed over the seven years of our vRanger product, which invented the backup market for virtualized environments.

It used to be, and it still is in some cases, that the virtual environment was protected by the person, usually the sys admin, who was responsible for, in the case of VMware, the ESXi hypervisors. They may not necessarily have been aligned with the storage management team within IT that was responsible for all storage and more traditional backups.

What we see now are the traditional people who were responsible for physical storage taking over the responsibility of virtual storage. So it's not this thing that’s sitting over on the side and someone else does it. As I said earlier, virtualization is now such a large part of all the data, that now it's moving from being a niche to something that’s mainstream. Those people now are going to put more discipline on the virtual data, just as they did the physical.

Because of the mission criticality of data, they're going from being people who looked at data as just a bunch of volumes or arrays, logical unit numbers (LUNs), to "these are the applications and this is the service level associated with the applications."

When they go to set up policies, they are not just thinking of, "I'm backing up a server" or "I'm backing up disk arrays," but rather, "I'm backing up Oracle Financials," "I'm backing up SAP," or "I'm backing up some in-house human resources application."

Adjust the policy

And the beauty of where Quest is going is, what if those rules change? Instead of having to remember all the different disk arrays and servers that are associated with that, say the Oracle Financials, I can go in and adjust the policy that's associated with all of that data that makes up Oracle Financials. I can fine-tune how I am going to protect that and the recoverability of the data.
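[Editor's sketch: the idea of attaching one policy to an application, rather than to each server or array, can be shown in a few lines of Python. Every class, field, and resource name here is hypothetical and chosen for illustration. This is not the product's data model.]

```python
from dataclasses import dataclass


@dataclass
class Policy:
    method: str
    frequency_minutes: int


@dataclass
class Application:
    name: str
    resources: list  # servers, volumes, or arrays that make up the application
    policy: Policy


def protection_plan(app: Application):
    """Expand the application's single policy to every resource behind it."""
    return [(res, app.policy.method, app.policy.frequency_minutes) for res in app.resources]


# Hypothetical application made up of three resources.
financials = Application(
    name="Oracle Financials",
    resources=["db-server-01", "app-server-01", "array-07/lun-3"],
    policy=Policy(method="traditional backup", frequency_minutes=1440),
)

# When the rules change, only the policy is edited; the resource list is untouched.
financials.policy = Policy(method="snapshots", frequency_minutes=60)
plan = protection_plan(financials)
```

The administrator changes one object, and the plan for every underlying resource follows from it.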

Gardner: That to me brings up the issue about ease of use, administration, interfaces, making these tools something that can be used by more people or a different type of person. How do we look at this shift and think about extending that policy-driven and dynamic environment at the practical level of use?

Maxwell: It's interesting that you bring that up too, because we've had many discussions about that here at Quest. I don't want to use the term consumerization of IT, because it has been used almost too much, but what we're looking at is, with the increased amount of virtual data out there, which just adds to the whole pot of heterogeneous environments, whether you have Windows and Linux, MySQL, Oracle, or Exchange, it's impossible for these people who are responsible for the protection and the recoverability of data to have the skills needed to know each one of those apps.

We want to make it as easy to back up and recover a database as it is a flat file. The fine line that we walk is that we don't want to dumb the product down. We want to provide intuitive GUIs, a user experience that is a couple of clicks away to say, "Here is a database associated with the application. What point do I want to recover to?" and recover it.

If there needs to be some more hands-on or more complicated things that need to be done, we can expose features to maybe the database administrator (DBA), who can then use the product to do more complex recovery or something to that effect.

We've got to make it easy for this generalist, no matter what hypervisor -- Hyper-V or VMware, a combination of both, or even KVM or Xen -- which database, which operating system, or which platform.

Again, they're responsible for everything. They're setting the policies, and they shouldn't have to be qualified. They shouldn't have to be an Exchange administrator, an Oracle DBA, or a Linux systems administrator to be able to recover this data.

We're going to do that in a nice pretty package. Today, there are many people here at Quest who walk around with a tablet PC as much as they do with their laptop. So our next-generation user interface (UI) around NetVault XA is being designed with a tablet computing scenario, where you can swipe data, and your toolbar is on the left and right, as if you are holding it using your thumb -- that type of thing.

Gardner: So, it's more access when it comes to the endpoint, and as we move towards supporting more of these point applications and data types with automation and a policy-driven approach or an architecture, that also says to me that we are elevating this to the strategic level. We're looking at data protection as a concept holistically, not point by point, not source by source and so forth.

Again, it seems that we have these forces in the market, virtualization, the need for faster recovery times, dealing with larger sets of data. That’s pushing us, whether we want to or even are aware of it, towards this level of a holistic or strategic approach to data.

Let me just see if you have any examples, at this point, of companies that are doing this and what it's doing for them. How are they enjoying the benefits of elevating this to that strategic or architecture level?

Exabyte of data

Maxwell: We have one customer, and I won't mention their name, but they are one of the top five web properties in the world, and they have an exabyte of data. Their incremental backups are almost 500 petabytes, and they have an SLA with management that says 96 percent of backups will run well, because they have so much data that changes in a week’s time.

You can't miss a backup, because that gets to the recoverability of the application. They're using our NetVault product to back up that data, using both traditional methods and integrated snapshots. Snapshots are one of the technology tiers in a tiered recovery scenario. They used NetVault in conjunction with hardware snapshots, where there is no backup window. The backup to the application is, for all practical purposes, instantaneous.

Then, they use NetVault to manage and even take that data that’s on disk and eventually move it to tape. The snapshots allow them to do that very quickly for massive amounts of data. And by massive amounts of data, I'm talking 100 million files associated with one application. To put that back in place at any point in time very quickly with NetVault orchestrating that hardware snapshot technology, that’s pretty mind blowing.

Gardner: That does give us a sense of the scale and complexity and how it's being managed and delivered.

You mentioned how Quest is moving towards policy-driven approaches, improving UIs, and extending those UIs to the mobile tier. Are there any other technology approaches that Quest is involved with that further explain how some of these challenges can be met? I'm very interested in agentless, and I'm also looking at how that automation gets extended across more of these environments.

Maxwell: There are two things I want to mention. Today, Quest protects VMware and Microsoft Hyper-V environments, and we'll be expanding the hypervisors that we're supporting over the next 12 months. Certainly, there are going to be a lot of changes around Windows Server 2012 or Hyper-V, where Microsoft has certainly made it a lot more robust.

There are a lot more things for us to exploit, because we're envisioning customer environments where they're going to have multiple hypervisors, just as today people have multiple operating systems and databases.

We want to take care of that, mask some complexity and allow people to possibly have cross-hypervisor recoverability. So, in other words, we want to enable safe failover of a VMware ESXi system to Microsoft Hyper-V, or vice versa.

There's another thing that’s interesting and is a challenge for us and it's something that has challenged engineers here at Quest. This gets into the concepts of how you back up or protect data differently in virtual environments. Our vRanger product is the market leader with more than 40,000 customers, and it’s completely agentless.

As we have evolved the product over the past seven years, we've had three generations of the product and have exploited various APIs. But with vRanger, we've now gone to what is called a virtual appliance architecture. We have a vRanger service that performs backup and replication for one or hundreds of VMs that exist either on that one physical server or in a virtual cluster. So this VM can even protect VMs that exist on other hardware.

Scalability

The beauty of this is first the scalability. I have one software app that’s running that’s highly controllable. You can control what resources are replicating, protecting, and recovering all of my VMs. So that’s easy to manage, versus having to have an agent installed in every one of those VMs.

Two, there's no overhead. The VMs don’t even know, in most cases, that a backup is occurring. We use the services, in the case of VMware, of ESXi, that allow us to go out there, snapshot the virtual volumes called VMDKs, and back up or replicate the data.

Now, there is one thing that we do that’s different than some others. Some vendors do this and some don’t, and I think one of those things you have to look at when you choose a virtual backup or virtual data protection vendor is their technical prowess in this area. If you're backing up a VM that has an application such as Exchange or SharePoint, that’s a live application, and you want to be able to synchronize the hypervisor snapshot with the application that’s running.

There’s a service in Windows called Volume Shadow Copy Service, or VSS for short, and one of the unique things that Quest does with our backup software is synchronize the virtual snapshot of the virtual disks with the application of VSS, so we have a consistent point-in-time backup.

To communicate, we dynamically inject binaries into the VM that do the process and then remove themselves. So, for a very short time, there's something running in that VM, but then it's gone, and that allows us to have consistent backup.

That way, from that one image backup that we've done, I can restore an entire VM, individual files, or in the case of Microsoft Exchange or Microsoft SharePoint, I can recover a mailbox, an item, or a document out of SharePoint.
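[Editor's sketch: the ordering Maxwell describes for an application-consistent backup (briefly place a helper in the VM, quiesce the application, take the hypervisor snapshot, then clean up) can be modeled in Python with context managers. Everything here is a stand-in. `FakeVM` and the step names are hypothetical, and no real VMware, VSS, or vRanger call is shown. The point at which the helper is removed relative to the data copy is also an assumption.]

```python
from contextlib import contextmanager


class FakeVM:
    """Stand-in VM that only records the order of operations."""

    def __init__(self, name):
        self.name = name
        self.log = []


@contextmanager
def temporary_helper(vm):
    # The helper exists inside the VM only for the duration of the block.
    vm.log.append("inject helper")
    try:
        yield
    finally:
        vm.log.append("remove helper")


@contextmanager
def quiesced(vm):
    # Writes are always thawed again, even if the snapshot step fails.
    vm.log.append("freeze application writes")
    try:
        yield
    finally:
        vm.log.append("thaw application writes")


def consistent_backup(vm):
    """Snapshot while the application is quiesced, then copy data afterwards."""
    with temporary_helper(vm):
        with quiesced(vm):
            vm.log.append("take hypervisor snapshot")
    vm.log.append("copy snapshot data")
    return vm.log
```

Keeping the freeze window limited to the snapshot itself, with the bulk copy done afterwards, is what keeps the impact on the running application short in this model.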

Gardner: So the more application-aware the solution is, it seems the more ease there is in having this granular level of restore choices. So that's fit for purpose, when it comes to deciding what level of backup and recovery and support for the data lifecycle is required.

This also will be able to fit into some larger trends around moving a data center to a software level or capability. Any thoughts on how what you're doing at Quest fits into this larger data-center trend? It seems to me that it’s at the leading or cutting edge.

Maxwell: One of the beauties of virtualization is that I can move data without the application being conscious of it happening. There's a utility, for example, within VMware called Storage vMotion that allows them to move data from A to B. It's a very easy way to migrate off of an older disk array to a new one, and you never have to bring the app down. It's all software driven within the hypervisor, and it's a lot of control. Basically it’s a seamless process.

What this opens up, though, is the ability for what we're looking at doing at Quest. If there's a means to move data around, why can't I then create an environment where I could do DR, whether it's within the data center for hardware redundancy or whether it's like what we do here at Quest?

Replicate data

We replicate data amongst various Quest facilities. Then, we can bring up an application that was running in location A in point B, on unlike hardware. It can be completely different storage, completely different servers, but since they're VMs, it doesn’t matter.

That kind of flexibility that virtualization brings is going to give every IT organization in the world the type of failover capabilities that used to only exist for the Global 1000, where they used to have to set up a hot site or had to have a data center. They would use very expensive proprietary hardware-based replication and things like that. So you had to have like arrays, like servers, and all that, just to have availability.

Now, with virtualization, it doesn’t matter, and of course, we have plenty of bandwidth, especially here in the United States. So it’s very economical, and this gets back to our survey that showed that for IT organizations, 73 percent were concerned about recovering data, and that’s not just recovering a file or a database.

Here in California, we're always talking about the big one. Well, when the big one happens, whole bunches of server racks may fall over. In the case of Quest, we want to be able to bring those applications up in an environment that's in a different part of the country, with no fault zones and that type of thing, so we can continue our business.

Gardner: We just saw a recent example of unintended or unexpected circumstances with the Mid-Atlantic states and some severe thunderstorms, which caused some significant disruption. So we always need to be thoughtful about the unexpected.

Another thing that occurred to me while you were discussing these sort of futuristic scenarios, which I am imagining aren’t that far off, is the impact that cloud computing, another big trend in the market, is bringing to the table.

It seems to me that bringing some of the cloud models, cloud providers, service models into play with what you have described also expands what can be done across larger sets of organizations and maybe even subsets of groups within companies. Any thoughts briefly on where some of the cloud provider scenarios might take this?

Maxwell: It’s funny. Two years ago, when people talked about cloud and data protection, it was just considering the cloud as a target. I would back up to the cloud or replicate to the cloud. Now, we are talking about actually putting data protection products in the cloud, so you can back up the data locally within the cloud and then maybe even replicate it or back it up back to on-prem, which is kind of a novel concept if you think about it.

If you host something up in the cloud, you can back it up locally up there and then actually keep a copy on-prem. Also, the cloud is where we're certainly looking at having generic support for being able to do failover into the cloud and working with various service providers where you can pre-provision, for example, VMs out there.

You're replicating data. You sense that you have had a failure, and all you have to do is, via software, bring up those VMs, pointing them at the disk replicas you put up there.

Different cloud providers

Then, there's the concept of what you do if a certain percentage of all your IT apps are hosted in the cloud by different cloud providers. Do you want to be able to replicate the data between cloud vendors? Maybe you have data that's hosted at Amazon Web Services. You might want to replicate it to Microsoft Azure or vice versa, or you might want to replicate it on-premise (on-prem).

So there's going to be a lot of neat hybrid options. The hybrid cloud is going to be a topic that we're going to talk about a lot now, where you have that mixture of on-prem, off-prem, hosted applications, etc., and we are preparing for that.

Gardner: I'm afraid we're about out of time. You've been listening to a sponsored BriefingsDirect podcast discussion on the relationship between increasingly higher levels of virtualization and the need for new backup and recovery strategies.

We've seen how solving data complexity and availability in the age of high virtualization is making always attainable data the most powerful asset that an IT organization can deliver to its users.

I'd like to thank our guest. We've been joined by John Maxwell, Vice President of Product Management for Data Protection at Quest Software.

John, would you like to add anything else, maybe in terms of how organizations typically get started? This does seem like a complex undertaking. It has many different entry points. Are there some best practices you've seen in the market about how to go about this, or at least to get going?

Maxwell: The number one thing is to find a partner. At Quest, we have hundreds of technology partners that can help companies architect a strategy utilizing the Quest data protection solutions.

Again, choose a solution that hits all the key points. In the case of VMware, you can go to VMware’s site and look for VMware Ready-Certified Solutions. Same thing with Microsoft, whether it’s Windows Server 2008 or 2012 certified. Make sure that you are getting a solution that’s truly certified. A lot of products say they support virtual environments, but then they don’t have that real certification, and as a result, they can’t do a lot of the innovative things that I’ve been talking about.

So find a partner who can help, or, we at Quest can certainly help you find someone who can help you architect your environment and even implement the software for you, if you so choose. Then, choose a solution that is blessed by the appropriate vendor and has passed their certification process.

Gardner: I should also point out that VMworld is coming up next week. I expect that you'll probably have a big presence there, and a lot of the information that we have been talking about will be available in more detail through the VMworld venue or event.

Maxwell: Absolutely, Dana. Quest will have a massive presence at VMworld, both in San Francisco and Barcelona. We'll be demonstrating technologies we have today and also we will be making some major announcements and previewing some real exciting software at the show.

Gardner: Well, great. This is Dana Gardner, Principal Analyst at Interarbor Solutions. I'd like to thank our audience for listening, and invite them to come back next time.

Transcript of a BriefingsDirect podcast on the relationship between increased virtualization and the need for data backup and recovery. Copyright Interarbor Solutions, LLC, 2005-2012. All rights reserved.

Thursday, August 16, 2012

Columbia Sportswear Extends Deep Server Virtualization to Improved ERP Operations, Disaster Recovery Efficiencies

Transcript of a sponsored BriefingsDirect podcast on how Columbia Sportswear has harnessed virtualization to provide a host of benefits for its business units.

Listen to the podcast. Find it on iTunes/iPod. Download the transcript. Sponsor: VMware.

Dana Gardner: Hi, this is Dana Gardner, Principal Analyst at Interarbor Solutions, and you're listening to BriefingsDirect.

Today, we present a sponsored podcast discussion on how outerwear and sportswear maker and distributor Columbia Sportswear has used virtualization techniques and benefits to improve their business operations.

We’ll see how Columbia Sportswear’s use of deep virtualization assisted in rationalizing its platforms and data center, as well as led to benefits in their enterprise resource planning (ERP) implementation. We’ll also see how it formed a foundation for improved disaster recovery (DR) best practices.

Stay with us now to learn more about how better systems make for better applications that deliver better business results. Here to share their virtualization journey is Michael Leeper, Senior Manager of IT Engineering at Columbia Sportswear in Portland, Oregon. Welcome, Michael. [Disclosure: VMware is a sponsor of BriefingsDirect podcasts.]

Michael Leeper: Good morning, Dana.

Gardner: We’re also here with Suzan Frye, Manager of Systems Engineering at Columbia Sportswear. Welcome to BriefingsDirect, Suzan.

Suzan Frye: Good morning, Dana.

Gardner: Let’s start with you, Michael. Tell me a little bit about how you got into virtualization. What were some of the requirements that you needed to fulfill at the data center level? Then we’ll dig down into where that went and how it paid off.

Leeper: Pre-2009, we'd experimented with virtualization. It'd been one of those things that I had my teams working on, mostly so we could tell my boss that we were doing it, but there wasn’t a significant focus on it. It was a nice toy to play with in the corner and it helped us in some small areas, but there were no big wins there.

In mid-2009, the board of directors at Columbia decided that we, as a company, needed a much stronger DR plan. That included the construction of a new data center for us to house our production environments offsite.

As we were working through the requirements of that project with my teams, it became pretty clear for us that virtualization was the way we were going to make that happen. For various reasons, we set off on this path of virtualization for our primary data center, as we were working through issues surrounding multiple data centers and DR processes.

Our technologies weren't based on the physical world any more. We were finding more issues in physical than we were in virtual. So we started down this path to virtualize our entire production world. By that point, mid-2010 had come around, and we were ready to go. We had built our DR stack that virtualized our primary data centers taking us to the 80 percent to 90 percent virtual machine (VM) rate.

Extremely successful


We were extremely successful in that process. We were able to move our primary data center over a couple of weekends with very little downtime to the end users, and that was all built on VMware technology.

About a week after we had finished that project, I got a call from our CIO, who said he had purchased a new ERP system, and Columbia was going to start down the path of a fully new ERP implementation.

I was being asked at that time what platform we should run it on, and we had a clean slate to look everywhere we could to find our favorite, what we felt was the safest and most stable platform to run the crown jewels of the company, which is ERP. For us that was going to be the SAP stack.

So it wasn't a hard decision to virtualize ERP for us. We were 90 percent virtual anyway. That’s what we were good at, and that’s where teams were staffed and skilled at. What we did was design the platform that we felt was going to meet our corporate standards and really meet our goals. For us that was running ERP on VMware.

Gardner: It sounds as if you had a good rationale for moving into a highly virtualized environment, but that it made it easier for you to do other things. Am I reading too much into it, or would you really say that your migration for ERP was much easier as a result of being highly virtualized?

Leeper: There are a couple of things there. Specifically in the migration to virtualization, we knew we were going to have to go through the effort of moving operating systems from one site to another. We determined that we could do that once on the physical side, relatively easily, and probably the same amount of effort as doing it once by converting physical to virtual.

The problem was that the next time we wanted to move services back from one facility to another in the physical world, we're going to have to do that work again. In the virtual space, we never had to do it again.

To make the teams go through the effort of virtualizing a server to then move it to another data center, all we need to do is do the work once. For my engineers, any time we get them to do the mundane stuff once, it's better than doing it multiple times. So we got that effort taken care of in that early phase of the project to virtualize our environments.

For the ERP platform specifically, this was a net new implementation. We were converting from a JD Edwards environment running on IBM big iron to a brand-new SAP stack. We didn’t have anything to migrate. This was really built from scratch.

So we didn’t have to worry about a lot of the legacy configurations or legacy environments that may have been there for us. We got to build it new. And by that point in our journey, virtualized was the only way for us to do it. That’s what we do, it’s how we do it, and that's what we’re good at.

Across the board


Gardner: Just for the benefit of our audience, let’s hear a bit more about Columbia Sportswear. You’re manufacturing, distributing, and retailing. I assume you’re doing an awful lot online. Give us a sense of the business requirements behind your story around virtualization, DR, and ERP.

Leeper: Columbia Sportswear is based in Portland, Oregon. We're the worldwide leader in apparel and accessories. We sell primarily outerwear and sportswear products, and a little bit of footwear, globally. We have about 4,000 employees, 50 some-odd physical locations, not counting retail, around the world. The products are primarily manufactured in Asia with sales distribution happening in both Europe and the United States.

My teams out of the U.S. manage our global footprint, and we are the sole source of IT support globally from here.

Gardner: Let’s go to Suzan. Suzan, tell me a little bit about the pace at which you were able to embark on this virtualization journey. I saw some statistics that you went from 25 percent to 75 percent in about eight months which was really impressive, and as Michael pointed out, now over 90 percent. How did you get the pace and what was important in keeping that pace going?

Frye: The only way we could do it was with virtualization and using the efficiencies we gained with that. We centrally manage all of IT and engineering globally out of our headquarters in Portland. When we were given the initial project to move our data center and not only move our data center but provide DR services as well, it was a really easy sell to the business.

We could go to the business and explain to them the benefits of virtualization and what it would mean for their application. They wouldn’t have to rebuild and they wouldn’t have to bring in the vendor or any consultants. We can just take their systems, virtualize them, move them to our new data center, and then provide that automatic DR with Site Recovery Manager (SRM).

We had nine months to move our data center and we basically were all hands on deck, everybody on the server engineering team, storage, and networking teams as well. And we had executive support and sponsorship. It was very easy for us to go to the business, market virtualization to them, and start down that path where we were socializing the idea. A lot of people, of course, were dragging their feet a little bit. We all know that story.

But once they realized that we could move their application, bring it back up, and then move it between data centers almost seamlessly, it was an instant win for us. We went from that 20 percent to 30 percent virtualization. We had about 75 percent when we were in the middle of our DR project, and today we’re actually at around 93 percent.

Gardner: One of the things I hear a lot from people that are doing multiple things with virtualization, like you did, is where to start, how to do this in the right order? Is there anything that you could come back with from your experience on how to do it in the order that incentivizes people to adopt, as you pointed out, but then also allows you to move into these other benefits in a way that compounds the return on investment (ROI)?

Frye: I think it surprises people that we have a "virtualize first" strategy today. Now it’s assumed that your system will be virtual and then all the benefits, the flexibility, the portability, the optimization, and the efficiencies that come with it.

But like most companies, we had to start with some of our lower tier or lower service-level agreement (SLA) systems, our development systems, and start working with the business on getting them to understand some of the benefits that they could gain by working with virtual systems.

Performance is there

Again people are always surprised. Will you have SQL virtualized? Do you have SAP virtualized? And the answer is yes, today we do, and the performance is there, the optimization is there, and that flexibility is there.

If you’re just starting out today, my advice would be to go ahead and start small. Give the business what they want, do it right, and give it the resources it needs to have. Don’t under-promise, over-deliver, and let the business start seeing the efficiencies that they can realize, and some of those hidden efficiencies as well.

We can support DR testing. We can support almost instant data refreshes, cloning, and snapping, so their upgrades are more seamless, and they have an easier back-out plan.

From an engineering and development perspective, we're giving them technologies that they could only dream of four or five years ago. And it’s really benefited the business in that we’re auto-provisioning. We’re provisioning in minutes versus days. We’re granting resources when needed.

It’s a more dynamic process for the business, and we’re really seeing that people are saying, "You’re not just a cost center anymore. You’re enabling us, you’re helping us to do what we need to do and basically doing it on-demand." So our team has really started shining these last few years, especially because of our high virtualization percentage.

Leeper: For a company that's looking to move to this virtualization space, they’ve got to get some wins. You’ve got to tackle some environments or some projects that you can be successful at, and hopefully by partnering with some business users and business owners who are willing to take a little bit of a chance.

If you set off trying to truly attack an entire data center virtualization project, you’re probably not going to be really successful at it. There are a lot of ways that the business, application vendors, and various things can throw some roadblocks in this.

Once you start chipping away at a couple of them and get above the easy stuff, go find one that maybe on paper is a little difficult, but go get that one done. Then you can very quickly point back to success on that piece and start working your way through the rest of them.

Gardner: Yeah, one of those roadblocks that you mentioned I've heard people refer to is issues around licensing and tracking and audits. How did you deal with that? Was that an issue for you when you got into moving onto a virtualized environment?

Leeper: Sure. It’s one of the first things that always comes up. I'm going to separate VMware and the VMware licensing from app and application licensing. On the application side of the house, it’s getting better today than it was two or three years ago when we started this process.

Be confident

You have to be confident in your ability to deal with vendors and demand support on virtualization layers, work with them to help them understand their virtual licensing packages, and be very confident in your ability to get there.

Early on, we had to just look some vendors straight in the eye and tell them we were going to do this, because this was the best thing for our business, and they needed to figure out how to support us. In some cases, that's just having your team, when you call them for support, not have to open with "We’re running this on a VM."

We know we can replicate and then duplicate things in the background when we need to, but sometimes you just have to be smart about how you engage application partners that may not be quite as advanced as we are and work through that.

On the VMware side, it came down to their understanding where our needs were and how to properly license some of the stuff and work through some of those complexities. But it wasn't anything we spent a significant amount of time on.

Gardner: You both mentioned this importance of getting the buy-in on the business side and showing wins early, that sort of thing. Because it’s hard many times to put a concrete connection between something that happens in IT and then a business benefit, was there anything that you can think of specifically that benefited your business that you could then turn around and bring back and say, "Well that’s because we did X, Y, and Z with virtualization?"

Leeper: One of the cool ones we’ve talked about and used for one of our key wins involves our entire architecture obviously with virtualization being key to that.

We had a business unit acquire an SAP module, specifically the BPC for BW module. That was independent of our overall SAP project and they were being run out of a separate business group.

They came to IT in the very late stages of this purchase and said, "These are our needs and requirements," and it was a fairly intense set of equipment. It was multiple servers, multiple environments, kind of up and down the stack, and they were bringing in outside consultants to help them with their implementation.

The interesting thing was, they had spec'd their statement of work (SOW) with these consultants to not start for 4 to 6 weeks, because they really believed that's how long it was going to take IT to get them their environments and their hardware, using some of their old understanding of IT’s capabilities.

And the reality was that we could provide them the test and development environments that they needed to start with these consultants within a matter of hours, not weeks, and we were able to do so. I had the pleasure of calling the finance VP and informing him that his environments were ready and they were just probably going to sit idle for the next 4 to 6 weeks until the consultants actually showed up, which surprised all sorts of people.

Add things later


We didn't have all their production capacities, but those are things we could add later. They didn’t need production capacity in the first month of the project anyway. So our ability to have that virtualized infrastructure and be able to rapidly deploy to meet business requirements is one of the really kind of cool things we can do these days.

Gardner: Suzan, you’ve mentioned that as an enabler, not a roadblock. So being able to keep up with the speed of business, I suppose, is the best way to characterize this?

Frye: Absolutely. Going back to SRM, another big win for us was, as we were rolling out on some of our Tier 1 mission-critical applications, it was decided by the business that they wanted to test DR. They were going down the path of doing that the old-fashioned way by backing up databases, restoring databases, and taking weeks to do that, days and weeks.

We said, "We think we have a better way with SRM and our replication technologies. We have that data here. Why don't you let us clone that data and stand it up for you?" Literally, within 10 seconds, they had a replica of their data.

So we were enabling them to do their DR testing with SRM, on demand, when they wanted to do that, as well as giving them the benefit of doing the faster cloning and data refreshes. That was just a day-to-day, operational activity that they had no idea we could do for them.

It goes back to working with business and letting them know what you can do. From a day-to-day, practical perspective that was one of our biggest wins. It's going to specific business units and application owners and saying, "We think we have a better way. What do you think about this?" Once they got their hands on it, just looking at their faces was really a good moment for us.

Gardner: Sure, and of course, as an online retailer, having that dependability that DR provides has to be something that lets you sleep a little better at night.

Frye: Just a little bit.

Gardner: Let's talk a little bit about where you go now. Another thing that I often hear in the market is that the benefits of virtualization are ongoing. It's a journey that keeps providing milestones. It doesn't really end.

Do you have any plans around private cloud perhaps, getting more elasticity and fit-for-purpose benefits out of your implementations? Perhaps you're looking to bring other applications into the fold, or maybe you’ve got some other plans around delivering on business applications at lower cost.

So where do you go next with your virtualization payoff?

Private cloud

Leeper: We consider ourselves as having a private cloud on-site. My team will probably start laughing at me for using that term, but we do believe we have a very flexible and dynamic environment to deploy, based on business request on premises, and we're pretty proud of that. It works pretty well for us.

Where we go next is all over the place. One of the things we're pretty happy about is the fact that we can think about things a little differently now than probably a lot of our peers, because of how migratory our workloads can be, given the virtualization.

We started looking into things like hybrid cloud approaches and the idea of maybe moving some of our workloads out of our premises, our own data facilities, to a cloud provider somewhere else.

For us, that's not necessarily the discussion around the classic public cloud strategies for scalability and some of those things. For us, it's a temporary space at times, if we are, say, moving an office, we want to be able to provide zero downtime, and we have physical equipment on-premises.

It would be nice to be able to shut down their physical equipment, move their data, move their workloads up to a temporary spot for four or five weeks, and then bring it back at some point, and let users never see an outage while they are working from home or on the road.

There are some interesting scenarios around significant DR for us and locations where we don't have real-time DR set up. For instance, we were looking into some issues in Japan a year or so ago, when the country was unfortunately dealing with the earthquake, the tsunami, and the resulting fallout in power.

We were looking at how we can possibly move our data out of the country for a period of time, while the infrastructure was stabilizing, specifically power, and then maybe bring it back when things settle down again.

Unfortunately we weren't quite virtual on the edge yet there, but today we think that's something we could do. Thinking about how and where we move data to be at the right place at the right time is where we think the next big win is for us.

Then, we get into the application profiles that users are asking for and their ability to spin up environments very quickly to just test something. It lets us get out of having IT as being the roadblock to innovation. A lot of times the business or part of our innovation teams come up with some idea on a concept, an application, or whatever it is. They don't have to wait for IT to fulfill their needs. The environments are right there for them.

So I challenge the teams routinely to think a little bit differently about how we've done things in the past, because our architecture is dramatically different than it was even two years ago.

Gardner: Well, great. We have to leave it there. We've been talking about how outerwear and sportswear maker Columbia Sportswear has used virtualization technologies and models to improve their business operations. We’ve also seen how better systems make for better applications that can deliver better business results.

So I’d like to thank our guests for joining this BriefingsDirect podcast. We have been here with Michael Leeper, Senior Manager of IT Engineering at Columbia Sportswear in Portland, Oregon. Thank you so much, Michael.

Leeper: Thank you.

Gardner: And we have been joined by Suzan Frye, Manager of Systems Engineering, also there at Columbia Sportswear. Thanks to you, Suzan.

Frye: Thanks, Dana.

Gardner: This is Dana Gardner, Principal Analyst at Interarbor Solutions. Thanks to you, our audience, for listening, and come back next time.

Transcript of a sponsored BriefingsDirect podcast on how Columbia Sportswear has harnessed virtualization to provide a host of benefits for its business units. Copyright Interarbor Solutions, LLC, 2005-2012. All rights reserved.

Monday, August 13, 2012

Ocean Observatories Initiative: Cloud and Big Data Come Together to Give Scientists Unprecedented Access to Essential Climate Information

Transcript of a BriefingsDirect podcast on how cloud and big data come together to offer climate researchers a treasure trove of ongoing, real-time information.

Listen to the podcast. Find it on iTunes/iPod. Download the transcript. Sponsor: VMware.

Dana Gardner: Hi, this is Dana Gardner, Principal Analyst at Interarbor Solutions, and you're listening to BriefingsDirect.

Today, we present a sponsored podcast discussion on a fascinating global ocean studies initiative that defines some of the superlatives around big data, cloud, and middleware integration capabilities.

We'll be exploring the Ocean Observatories Initiative (OOI) and its accompanying Cyberinfrastructure Program. This undertaking by the National Science Foundation aims to provide an unprecedented ability to study the Earth's oceans and climate using myriad distributed data centers and literally oceans' worth of data.

The scale and impact of the science's importance is closely followed by the magnitude of the computer science needed to make that data accessible and actionable by scientists. In a sense, the OOI and its infrastructure program are constructing a big data-scale programmable and integratable cloud fabric.

We’ve gathered three leaders to explain the OOI and how the Cyberinfrastructure Program may not only solve this set of data and compute problems, but perhaps establish a path to how future massive data and analysis problems are solved.

Here to share their story on OOI are our guests:
  • Matthew Arrott, Project Manager at the OOI Cyberinfrastructure. Matthew's career spans more than 20 years in design leadership and engineering management for software and network systems. He’s held leadership positions at Currenex, DreamWorks SKG, Autodesk, and the National Center for Supercomputing Applications. His most recent work has been with the University of California as e-Science Program Manager while focusing on delivering the OOI Cyberinfrastructure capabilities.
  • Michael Meisinger, Managing Systems Architect for the Ocean Observatories Initiative Cyberinfrastructure. Since 2007, Michael has been employed by the University of California, San Diego. He leads a team of systems architects on the OOI Project. Prior to UC San Diego, Michael was a lead developer in an Internet startup, developing a platform for automated customer interactions and data analysis. Michael holds a master's degree in computer science from the Technical University of Munich and will soon complete a PhD in formal services-oriented computing and distributed systems architecture.
Michael Meisinger, could you sum up the OOI for our audience? Let us know a little bit about how it came about.

Ocean Observatories Initiative


Michael Meisinger: Thanks, Dana. The Ocean Observatories Initiative is a large project. It's a US National Science Foundation project that is intended to build a platform for ocean sciences end users and communities interested in this form of data for an operational life span of 30 years.

It comprises a construction period of five years and will integrate a large number of resources and assets. These range from typical oceanographic assets, like instruments that are mounted on buoys deployed in the ocean, to networking infrastructure on the cyberinfrastructure side. It also includes a large number of sophisticated software systems.

I'm the managing architect for the cyberinfrastructure, so I'm primarily concerned with the interfaces to the oceanographic infrastructure, including data interfaces and networking interfaces, and then primarily, the design of the network hardware and software system that comprises the cyberinfrastructure.

As I said, OOI’s goals include serving the science and education communities with their needs for receiving, analyzing, and manipulating ocean sciences and environmental data. This will have a large impact on the science community and the overall public, as a whole, because ocean sciences data is very important in understanding the changes and processes of the earth, the environment, and the climate as a whole.

Ocean sciences, as a discipline, hasn't yet received as much infrastructure and central attention as other communities. So the OOI initiative is very important in bringing this to the community. It has an almost $400 million construction budget, and an annual operations budget of $70 million for a planned lifetime of 25 to 30 years.

Gardner: Matthew Arrott, what is the big hurdle here in terms of a compute issue that you've faced. Obviously, it's a tremendously important project with a tremendous amount of data, but from a purely compute requirements perspective, what makes this so challenging?

Matthew Arrott: It has a number of key aspects that we had to address. It's best to start at the top of the functional requirements, which is to provide interactive mission planning and control of the overall instrumentation on the 65 independent platforms that are deployed throughout the ocean.

The issue there is how to provide a standard command-and-control infrastructure over a core set of 800 instruments, about 50 different classes of instrumentation, as well as be able to deploy -- over the 30-year lifecycle -- new instrumentation brought to us by different scientific communities for experimentation.

The next is that the mission planning and control is meant to be interactive and respond to emergent changes. So we needed an event-response infrastructure that allowed us to operate on scales from microseconds to hours in being able to detect and respond to the changes. We needed an ability to move computing throughout the network to deal with the different latency requirements that were needed for the event-response analysis.

Finally, we have computational nodes all the way down in the ocean, as well as on the shore stations, that are accepting or acquiring the data coming off the network. And we're distributing that data in real time to anyone who wants to listen to the signals to develop their own sense-and-response mechanisms, whether they're in the cloud, in their local institutions, or on their laptop.

Domain of control

The fundamental challenge was the ability to create a domain of control over instrumentation that is deployed by operators and for processing and data distribution to be agile in its deployment anywhere in the global network.

Gardner: Alexis Richardson, it sounds like a very interesting problem to solve. Why is this a good time to try to solve this? Of course, big data, cloud, doing tremendous amounts of services orientation across middleware and a variety of different formats and transports, is all very prominent in the enterprise now. Given that, what makes this such an interesting pursuit for you in thinking about this from a software distribution and data distribution perspective?

Alexis Richardson: It really comes down to the scale of the system and the ability of technologies to meet the scale need today. If we had been talking about this 12 years ago, in the year 2000, we would have been talking about companies like Google and Yahoo, which we would now consider to be of moderate scale.

Since then, many companies have appeared. For example, Facebook, which has many hundreds of millions of users connecting throughout the world, shares vast amounts of data all the time.

It's that scale that's changed the architecture and deployment patterns that people have been using for these applications. In addition to that, many of these companies have brought out essentially a platform capability, whereby others, such as Zynga, in the case of Facebook, can create applications that run inside these networks -- social networks in the case of Facebook.

We can see the OOI project is essentially bringing the science needed to collaborate between vast numbers of sensors and signals and a comparatively smaller number of scientists, research institutions, and scientific applications to do analytics in a similar way as to how Facebook combines what people say, what pictures they post, what music they listen to with everybody’s friends, and then allow an application to be attached to that.

So it’s a huge technology challenge that would have been simply infeasible 12 years ago in the year 2000, when we thought things were big, but they were not. Now, when we talk about big data being masses of terabytes and petabytes that need to be analyzed all the time, then we’re starting to glimpse what's possible with the technology that’s been created in the last 10 years.

Arrott: I’d like to actually go one step further than that. The challenge goes beyond just the big data challenge. It also now introduces, as Alexis talked about, the human putting in what they say and their pictures. It introduces the concept of the instrument as an equal partner with the human in the participation in the network.

So you now have to think about what it means to have a device that’s acting like a human in the network, and the notion that the instrument is, in fact, owned by someone and must be governed by someone, which is not the case with the human, because the human governs themselves. So it represents the notion of an autonomous agent in the network, as well as that agent having a notion of control that has to stay on the network.

Gardner: I’d like to try to explain for our audience a bit more about what is going on here. We understand that we have a tremendous diversity of sensors gathering in real-time a tremendous scale of data. But we’re also talking about automating the gathering and distribution of that data to a variety of applications.

Numerical framework

We’re talking about having applications within this fabric, so that the output is not necessarily data, but is a computational numerical framework that’s then distributed. So there's computation being done at the data level, and then it has to be regulated. Certain data goes to certain people for certain reasons, under certain circumstances.

So there's a lot of data, a lot of logic, and a lot of scale. Can one of you help step me through it all a bit more to understand the architecture of what’s being conducted here?

Meisinger: The challenge, as you mentioned, is very heterogeneous. We deal with various classes of sensors, classes of data, classes of users, or even communities of users, and with classes of technological problems and solution spaces.

So the architecture is based on a tiered model or in a layered model of most invariant things at the bottom, things that shouldn’t change over the lifetime of 30 years to serve the highest level of attention.

Then, we go into our more specialized layered architecture where we try to find optimal solutions using today’s technologies for high-speed messaging, big data, and so on. Then, we go into specialized solutions for specific groups of users and specific sensors that are there as last-mile technologies to integrate them into the system.

So you basically see an onion layer model of the architecture, externalization outside. Then as you go toward the core, you approach the invariants of the system.

What are the invariants? We recognized that a system of this scale and a system of this heterogeneity cannot be reinvented every five years as part of the typical maintenance. So as a strongly scalable and extensible system, it's distributed in its nature, and as part of the distribution, the most invariant parts are the protocols and the interactions between the distributed entities on the system.

We found that it's essential to define a common language, a common format, for the various applications and participants of the network, including sensor and sensor agents, but also higher-level software services to communicate in a common format.

This architecture is based on defining a common interaction format. It’s based on defining a common data format. You mentioned the complex numerical model. A lot of things in this architecture are defined so that you have an easier model of reaching many heterogeneous communities by ingesting and getting specific solutions into the system, representing them consistently and then presenting them again in the specific format for the audience.

Our architecture is strongly communication-oriented, service-oriented, message-oriented, and federated.

As Matthew mentioned, it’s an important means to have the individual resources, agents, provide their own policies, not having a central bottleneck in the system or central governing entity in the system that defines policies.

Strongly federated


So it’s a strongly federated system. It’s a system that’s strongly technology-independent. The communication product can be implemented by various technologies, and we’re choosing a couple of programming languages and technologies for our initial reference implementation, but it’s strongly extensible for future communities to use.
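Editor's sketch: the federated idea described above, where each resource or agent carries its own policy and every participant speaks one common message format, can be illustrated in a few lines of Python. This is only an illustration under stated assumptions, not the OOI implementation; the names `Request`, `ResourceAgent`, `glider-7`, `lab-a`, and `lab-b` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Request:
    """Common message format shared by all participants (hypothetical)."""
    sender: str
    operation: str


# A policy is supplied by the resource owner, not by a central authority.
Policy = Callable[[Request], bool]


class ResourceAgent:
    """An instrument or service that enforces its owner's policy locally."""

    def __init__(self, name: str, owner: str, policy: Policy) -> None:
        self.name = name
        self.owner = owner
        self.policy = policy

    def handle(self, request: Request) -> str:
        # The decision is made at the resource, so there is no central bottleneck.
        return "accepted" if self.policy(request) else "denied"


# The owner allows anyone to read, but only itself to retask the instrument.
glider = ResourceAgent(
    name="glider-7",
    owner="lab-a",
    policy=lambda r: r.sender == "lab-a" or r.operation == "read",
)

read_by_other = glider.handle(Request(sender="lab-b", operation="read"))
retask_by_other = glider.handle(Request(sender="lab-b", operation="retask"))
retask_by_owner = glider.handle(Request(sender="lab-a", operation="retask"))
```

Because the policy travels with the agent, adding a new instrument does not require changing any shared rule set.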

Gardner: One of the aspects of this that was particularly interesting to me is that this is very much a two-way street. The scientists who are gathering their analysis can very rapidly go back to these sensors, go back to this compute fabric, this fusion of data, and ask it to do other things in real-time; or to bring in data from outside sources to compare and contrast, to find the commonalities and to find what it is that they’re looking for in terms of trends.

Could one of you help me understand why this is a two-way street, and how that's possible given the scale and complexity?

Arrott: The way to think about it, first and foremost, is to think of it as its four core layers. There is the underlying network resource management layer. We talk about agents. They supply that capability to any process in the system, and we create devices that process.

The next layer up is the data layer, and the data layer consists of two core parts. One is the distribution system that allows for data to be moved in real-time from the source to the interested parties. It’s fundamentally a publish-subscribe (pub-sub) model. We're currently using point-to-point as well as topic-based subscriptions, but we're quickly moving toward content-based routing, which is more based on the selector that is provided by the consumer to direct traffic toward them.

The other part of the data layer is the traditional harvesting or retrieval of data from historical repositories.

The next layer up is the analytic layer. It looks a lot like the device layer, but is focused on the management of processes that are using the big data and responding to new arrival of data in the network or change in data in the network. Finally, there is the fourth layer, which is the mission planning and control layer, which we’ll talk about later.
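Editor's sketch: the distribution half of the data layer described above contrasts topic-based subscriptions with content-based routing, where the consumer supplies a selector. A minimal in-memory Python illustration follows. It assumes a toy `Broker` class and made-up topic and field names (`buoy.temperature`, `celsius`); it is not the OOI or RabbitMQ API.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List, Tuple

Message = Dict[str, Any]
Handler = Callable[[Message], None]
Selector = Callable[[Message], bool]


class Broker:
    """Toy in-memory broker; all names here are hypothetical."""

    def __init__(self) -> None:
        self._topic_subs: Dict[str, List[Handler]] = defaultdict(list)
        self._content_subs: List[Tuple[Selector, Handler]] = []

    def subscribe_topic(self, topic: str, handler: Handler) -> None:
        # Topic-based: the consumer names the stream it wants.
        self._topic_subs[topic].append(handler)

    def subscribe_content(self, selector: Selector, handler: Handler) -> None:
        # Content-based: the consumer supplies a selector over message content.
        self._content_subs.append((selector, handler))

    def publish(self, topic: str, message: Message) -> int:
        """Deliver to every matching subscriber and return the delivery count."""
        delivered = 0
        for handler in self._topic_subs.get(topic, []):
            handler(message)
            delivered += 1
        for selector, handler in self._content_subs:
            if selector(message):
                handler(message)
                delivered += 1
        return delivered


broker = Broker()
all_temps: List[Message] = []
warm_only: List[Message] = []

broker.subscribe_topic("buoy.temperature", all_temps.append)
broker.subscribe_content(lambda m: m.get("celsius", 0) > 20, warm_only.append)

broker.publish("buoy.temperature", {"sensor": "b1", "celsius": 12.5})
broker.publish("buoy.temperature", {"sensor": "b2", "celsius": 23.0})
```

The topic subscriber receives both readings, while the content subscriber receives only the reading that satisfies its selector, regardless of which topic carried it.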

Gardner: Alexis, when you saw the problem that needed to be solved here, you had a lot of experience with advanced message queuing protocol (AMQP), which I'd like you to explain to us, and you also understand the requirements of a messaging system that can accomplish what Matthew just described.

So tell me about AMQP, why this problem seems to be the right fit for that particular technology, RabbitMQ, and a messaging infrastructure in general.

Richardson: What Matthew and Michael have described can be broken down into three fundamental pieces of technology.

Lot of chatter

Number one, you have a lot of chatter coming from these devices -- machines, people, and other kinds of processes -- and that needs to get to the right place. It's being chattered or twittered away and possibly at high rates and high frequencies. It needs to get to just the set of receivers following that stream, very similar to how we understand distribution to our computers. So you need what’s called pub-sub, which is a fundamental technology.

In addition, that data needs to be stored somewhere. People need to go back and audit it, to pull it out of the archive and replay it, or view it again. So you need some form of storage and reliability built into your messaging network.

Finally, you need the ability to attach applications that will be written by autonomous groups, scientists, and other people who don’t necessarily talk to one another, may choose these different programming languages, and may be deploying our applications, as Matthew said, on their own servers, on multiple different clouds that they are choosing through what you would like to be a common platform. So you need this to be done in a standard way.

AMQP is unique in bringing together pub-sub with reliable messaging with standards, so that this can happen. That is precisely why AMQP is important. It's like HTTP and email SMTP, but it’s aimed at messaging the publish-subscribe reliable message delivery in a standard way. And RabbitMQ is one of the first implementations, and that’s how we ended up working with the OOI team -- because RabbitMQ provides these and does it well.
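Editor's sketch: two of the pieces described above, pub-sub routing and stored messages that can be replayed later, can be illustrated in plain Python. The wildcard rules follow the AMQP 0-9-1 topic exchange convention (`*` matches exactly one word, `#` matches zero or more words). The `ArchivingExchange` class, its `replay` method, and the `ooi.*` routing keys are hypothetical and exist only for illustration; a real deployment would use a broker such as RabbitMQ rather than this in-memory stand-in.

```python
from typing import List, Tuple


def topic_matches(pattern: str, routing_key: str) -> bool:
    """AMQP-style topic match: '*' is exactly one word, '#' is zero or more."""

    def match(p: List[str], k: List[str]) -> bool:
        if not p:
            return not k
        if p[0] == "#":
            # '#' may absorb zero or more words of the routing key.
            return any(match(p[1:], k[i:]) for i in range(len(k) + 1))
        if not k:
            return False
        return (p[0] == "*" or p[0] == k[0]) and match(p[1:], k[1:])

    return match(pattern.split("."), routing_key.split("."))


class ArchivingExchange:
    """Toy exchange that routes by topic and keeps every message for replay."""

    def __init__(self) -> None:
        self._bindings: List[Tuple[str, List[str]]] = []
        self._archive: List[Tuple[str, str]] = []

    def bind(self, pattern: str) -> List[str]:
        # Each binding gets its own queue, represented here as a list.
        queue: List[str] = []
        self._bindings.append((pattern, queue))
        return queue

    def publish(self, routing_key: str, body: str) -> None:
        self._archive.append((routing_key, body))
        for pattern, queue in self._bindings:
            if topic_matches(pattern, routing_key):
                queue.append(body)

    def replay(self, pattern: str) -> List[str]:
        # A late subscriber can pull matching history out of the archive.
        return [b for key, b in self._archive if topic_matches(pattern, key)]


exchange = ArchivingExchange()
coastal = exchange.bind("ooi.coastal.*")
everything = exchange.bind("ooi.#")

exchange.publish("ooi.coastal.salinity", "35.1")
exchange.publish("ooi.global.temperature", "4.2")

late_reader = exchange.replay("ooi.global.#")
```

Here the coastal queue sees only the salinity reading, the catch-all queue sees both, and a reader that arrives afterward can still recover the global temperature reading from the archive.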

Gardner: Now we’ve talked a lot about computer science and some of the thorny issues that have been created as a result of this project, but, I’d also like to go back to the project itself, and give our listeners a sense of what this can accomplish. I’ve heard it described as "the Hubble Telescope of oceans."

Let’s go back to the oceanography and the climate science. What can we accomplish with this, when this data is delivered in the fashion we’ve been discussing, where the programmability is there, where certain scientists can interact with these sensors and data, ask it to do things, and then get that information back in a format that’s not raw, but is in fact actionable intelligence?

Matthew, what could possibly happen in terms of the change in our understanding of the oceans from this type of undertaking?

Arrott: The way to think about this is not so much from the fact that we know exactly what will happen. It's the notion that we're providing capabilities that do not currently exist for oceanographers. It can be summed up as continual presence in the oceans at multiple scales through multiple perspectives, also known as the different classes of instrumentation that are observed in the ocean.

Another class of instrumentation is deployed specifically for refocusing. The scope of the OOI is such that it is considered to be observing the ocean at multiple scales -- coastal, regional, and global. It is an expandable model such that other observatories, as well as additions to the OOI network, can be considered and deployed in subsequent years.

This allows us now, as Alexis talked about, to attach many different classes of applications to the network. One of the largest classes of applications that we’ll attach to the network are the modeling, in particular the nowcast and forecast modeling.

Happening at scale

Through those observations about the ocean now, about what the ocean will be, and to be able to ground-truth those models going forward, based on data arriving in the same time as the forecasts, provides for a broad range of modeling that has been done for a fair amount of time, but it now allows it to happen at scale.

Once you have that ability to actually model the oceans and predict where it’s going, you can use that to refocus the instrumentation on emergent events. It's this ability to have long-term presence in the ocean, and the ability to refocus the instrumentation on emergent events, that really represents the revolutionary change in the formation of this infrastructure.

Meisinger: Let me add, I'm very fascinated by The Hubble Space Telescope as something that produces fantastic imagery and fantastic insights into the universe. For me as a computer scientist, it’s often very difficult to imagine what users of the system would do with the system.

I’d like to see the OOI as a platform that’s developed by the experts in their fields to deploy the platforms, the buoys, the cables, the sensors into the ocean that then enables the users of the system over 25 years to produce unprecedented knowledge and results out of that system.

The primary mission of our project is to provide this platform, the space telescope in the ocean. And it’s not a single telescope. In our case, it's a set of 65 buoys, locations in the ocean, and even a cable that runs 1,000 miles along the seafloor of the Pacific Northwest and provides 10-gigabit Ethernet connectivity to the instruments, and high power.

It’s a model where scientists have to compete. They have to compete for a slot on that infrastructure. They'll have to apply for grants and they'll have to reserve the spot, so that they can accomplish the best scientific discoveries out of that system.

It’s kind of the analogy of the space telescope that will bring ocean scientists to the next level. This is our large platform, our large infrastructure that have the best scientists develop and research to best results. That’s the fascination that I see as part of this project.

Gardner: For the average listener to understand, is this comparable to tracking weather and the climate on the surface? Many of us, of course, get our weather forecasts and they seem to be getting better. We have satellites, radar, measurements, and historical data to compare, and we have models of what weather should do. Is this in some ways taking the weather of the oceans? Is it comparable?

Arrott: Quite comparable. There's a movement to instrument the Earth, so that we can understand from observation, as opposed to speculation, what the Earth is actually doing, and from a notion of climate and climate change, what we might be doing to the Earth as participants on it.

The weather community, because of the demand for commercial need for that weather data, has been well in advance of the other environmental sciences in this regard. What you'll find is that OOI is just one of several ongoing initiatives to do exactly what weather has done.

In the work that I did at NCSA with the atmospheric sciences community, the question was very clear at the time: What could they do if they had the kind of resources that we now have here in the 21st century? We've worked with them and modeled much of our system based on the systems that they built, both in the research area, and in the operational area in programs such as NOAA.

Science more mature


Gardner: So, in a sense, we're following the path of what we’ve done with the weather, and understanding the climate on land. We’re now moving into the oceans, but at a time when the computer science is more mature, and in fact, perhaps even much more productive.

Back to you Alexis Richardson. This is being sponsored by the US National Science Foundation, so being cost efficient is very important, of course. How is it that cloud computing is being brought to bear, making this productive, and perhaps even ahead of where the whole weather and predicting weather has been, because we can now avail ourselves of some of the newer tools and models around data and cloud infrastructure?

Richardson: Happily, that’s an easy one. Imagine if a person or scientist wanted to process very quickly a large amount of data that’s come from the oceans to build a picture of the climate, the ocean, or anything to do with the coastal properties of the North American coast. They might need to borrow 10,000 or 20,000 machines for an hour, and they might need to have a vast amount of data readily accessible to those machines.

In the cloud, you can do that, and with big data technologies today, that is a realistic proposition. It was not five to 10 years ago. It’s that simple.

Obviously, you need to have the technologies, like this messaging that we talked about, to get that data to those machines so it can be processed. But, the cloud is really there to bring it all together and to make it seem to the application owner like something that’s just ready for them to acquire, and when they don’t need it anymore, they can put it back and someone else can use it.

Gardner: Back to you Michael. How do you view the advent of cloud computing as a benefit to this sort of initiative? We have a piece of it from Alexis, but I’d like to hear your perspective on why cloud models are enabling this perhaps at an unprecedented scale, but also at a most efficient cost?

Meisinger: Absolutely. It does enable computing at unprecedented scale for exactly the reasons that Alexis mentioned. A lot of the earth's environment is changing. Assume that you’re interested in tracking the effect of a hurricane somewhere in the ocean and you’re interested in computing a very complex numerical model that provides certain predictions about currents and other variables of the ocean. You want to do that when the hurricane occurs and you want to do it quickly. Part of the strategy is to enable quick computation on demand.

The OOI architecture, in particular, its common execution infrastructure subsystem, is built in order to enable this access to computation and big data very quickly. You want to be able to make use of execution provider’s infrastructure as a service very quickly to run your own models with the infrastructure that the OOI provides.

Then, there are other users that want to do things more regularly, and they might have their own hardware. They might run their own clusters, but in order to be interoperable, and in order to have excess overflow capabilities, it’s very important to have cloud infrastructure as a means of making the system more homogenous.

So the cloud is a way of abstracting compute resources of the various participants of the system, be they commercial or academic cloud computing providers or institutions that provide their own clusters as cloud systems, and they all form a large compute network, a compute fabric, so that they can run the computation in a predictable way, but also then in a very episodic way.

Cloud as enabler


I really see that the cloud paradigm is one of the enablers of doing this very efficiently, and it enables us as a software infrastructure project to develop the systems, the architecture, to actually manage this computation from a system’s point of view in a central way.

Gardner: Alexis, because of AMQP and the VMware cloud application platform, it seems to me that you’ve been able to shop around for cloud resources, using the marketplace, because you’ve allowed for interoperability among and between platforms, applications, tools, and frameworks.

Is it the case that leveraging AMQP has given you the opportunity to go to where the compute resources are available at the lowest cost when that’s in your best interest?

Richardson: The dividend of interoperability for the end user and the end customer in this platform environment is ultimately portability -- portability through being able to choose where your application will run.

Michael described it very well. A hurricane is coming. Do you want to use the machines provided by the cloud provider here for this price? Do you want to use your own servers? Maybe your neighboring data center has servers available to you, provided those are visible and provided there is this fundamental interoperability through cloud platforms of the type that we are investing in. Then, you will be able to have that choice. And that lets you make these decisions in a way that you could not do before.
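Editor's sketch: the choice described above, between your own servers, a neighboring data center, and a cloud provider at a given price, amounts to a small selection problem once every option is visible through a common platform. The Python below is an illustration under assumptions; the `Provider` fields, the provider names, and every capacity and price figure are invented for the example and do not come from the discussion.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Provider:
    """One place where a job could run (all values hypothetical)."""
    name: str
    free_machines: int
    price_per_machine_hour: float


def cheapest_provider(providers: List[Provider], machines: int) -> Optional[Provider]:
    """Return the lowest-priced provider with enough free capacity, if any."""
    candidates = [p for p in providers if p.free_machines >= machines]
    return min(candidates, key=lambda p: p.price_per_machine_hour, default=None)


providers = [
    Provider("own-cluster", free_machines=200, price_per_machine_hour=0.00),
    Provider("neighbor-datacenter", free_machines=5_000, price_per_machine_hour=0.04),
    Provider("public-cloud", free_machines=50_000, price_per_machine_hour=0.09),
]

# A routine job fits on local hardware; a hurricane-scale burst overflows to the cloud.
small_job = cheapest_provider(providers, machines=100)
hurricane_job = cheapest_provider(providers, machines=10_000)
too_big = cheapest_provider(providers, machines=1_000_000)
```

The same function serves both the regular user with a private cluster and the episodic user who needs a large burst, which is the portability dividend the speakers describe.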

Gardner: I’m afraid we’re almost out of time, but I want to try to compare this to what this will allow in other areas. It’s been mentioned by Alexis and others that this has got some common features to Twitter, Facebook, or Zynga.

We think of the social environment because of the scale, complexity, and the use of cloud models. But we’re doing far more advanced computational activities here. This is simply not a display of 140 characters, based on a very rudimentary search, for example. These are at the high performance computing (HPC) level, supercomputer-level types of requests and analysis.

So are we combining the best of a social fabric approach and the architecture behind that to what we’ve been traditionally exposed to in high-performance computing and supercomputing? If so, what does that mean for how we could bring this to other types of uses in the future? I’ll throw this out to any of you. How are we doing the best of the old and the new computing, and what does that mean for the future?

Meisinger: This is the direction in which the future will evolve, and it’s the combination of proven patterns of interaction that are emerging out of how humans interact applied to high-performance computing. Providing a strong platform or a strong technological footprint that’s not specific to any technology is a great benefit to the community out there.

Providing a reference architecture and a reference implementation that can solve these problems, that social network for sensor networks and for device computation will be a pattern that can be leveraged by other interested participants, either by participating in the system directly or indirectly, where it’s just taking that pattern and the technologies that come with it and basically bringing it to the next level in the future. Developing it as one large project in a coherent set really yields a technology stack and architecture that will carry us far into the future.

Arrott: The incremental change that we're introducing takes the concepts of Facebook and of Twitter and the notions of Dropbox, which is the ability to move a file to a shared place so someone else can pick it up later, which was really not possible until recently. I had to set up an FTP server or put up an HTTP server to accomplish that.

Sharing processes

What we are now adding to the mix is not sharing just artifacts, but we’re actually sharing processes with one another, and then specifically sharing instrumentation. I can say to you, "Here, have a look through my telescope." You can move it around and focus it.

Basically, we introduced the concept of artifacts or information resources, as well as the concept of a taskable resource, and the thing that we’re adding to that which can be shared are taskable resources.

Gardner: I’m just going to throw out a few blue-sky ideas that it seems this could be applicable to ... things like genetics and the human genome, but on an individual basis; or crime statistics, in order to have better insight into human behavior at a massive scale; or perhaps even healthcare, where you’re diagnosing specific types of symptoms and then correlating them across entire regions or genetic patterns that would be brought to bear on those symptoms.

Am I off-base? Is this science fiction? Or am I perhaps pointing to where this sort of capability might go next?

Richardson: The answer to your question is, "Yes," if you add one little phrase into that: in real-time. If you’re talking about crime statistics, as events happen on the streets, information is gathered and shared and processed. As people go on jobs, if information is gathered, shared, and processed on how people are doing, then you will be able to have the kind of crime or healthcare benefits that you described. I’m sure we could think of lots of use cases. Transport is another one.

Arrott: At the institution in which the OOI Cyberinfrastructure is housed, the California Institute for Telecommunications and Information Technology (Calit2), all of the concerns that you’ve mentioned are, in fact, active development research programs, all of which have yielded significant improvements in the computational environment for that scientific community.

Gardner: Michael, last word to you. Where do you see this potentially going in terms of the capability? Obviously, it's a very important activity, with the oceans. But the methods that you’re defining, the implementations that you’re perfecting, where do you see them being applied in the not-too-distant future?

Meisinger: You’re absolutely right. This pattern is very applicable and it’s not that frequent that a research and construction project of that size has an ability to provide an end-to-end technology solution to this challenge of big data combined with real-time analysis and real-time command and control of the infrastructure.

What I see that evolving into is, first of all, you can take the solutions built in this project and apply them to other communities that are in need of such a solution. But then it could go further. Why not combine these communities into a larger system? Why not federate or connect all these communities into a larger infrastructure that all is based on common ideas, common standards, and that still enables open participation?

It’s a platform where you can plug in your own system or subsystem that you can then make available to whoever is connected to that platform, whoever you trust. So it can evolve into a large ecosystem, and that does not have to happen under the umbrella of one organization such as OOI.

Larger ecosystem

It can happen to a larger ecosystem of connected computing based on your own policies, your own technologies, your own standards, but where everyone shares a common piece of the same idea and can take whatever they want and not consume what they’re not interested in.

Gardner: And as I said earlier, at that very interesting intersection of where you can find the most efficient compute resources available and avail yourself of them with that portability, it sounds like a really powerful combination.

We’ve been talking about how the Ocean Observatories Initiative and its accompanying Cyberinfrastructure Program have been not only feeding the means for the ocean to be better understood and climate interaction to be better appreciated, but we’re also seeing how the architecture behind that is leading to the potential for many other big data, cloud fabric, real-time, compute-intensive applications.

I’d like to thank our guests, Matthew Arrott, Project Manager at the OOI and the initiative for the Cyberinfrastructure. Thank you so much, Matthew.

Arrott: Thank you.

Gardner: We’ve also been joined by Michael Meisinger, Managing Systems Architect for the OOI Cyberinfrastructure. Thank you, Michael.

Meisinger: Thanks, Dana.

Gardner: And Alexis Richardson, the Senior Director for VMware Cloud Application Platform. Thank you, Alexis.

Richardson: Thank you, very much.

Gardner: And this is Dana Gardner, Principal Analyst at Interarbor Solutions. Thanks to you, our audience, for listening, and come back next time.

Listen to the podcast. Find it on iTunes/iPod. Download the transcript. Sponsor: VMware.

Transcript of a BriefingsDirect podcast on how cloud and big data come together to offer climate researchers a treasure trove of ongoing, real-time information. Copyright Interarbor Solutions, LLC, 2005-2012. All rights reserved.

You may also be interested in: