Friday, June 17, 2011

Discover Case Study: Holistic ALM Helps Blue Cross and Blue Shield of Florida Break Down Application Inefficiencies, Redundancy

Transcript of a BriefingsDirect podcast from HP Discover 2011 on how Blue Cross and Blue Shield of Florida gains better visibility into application lifecycles for improved operational efficiency and reliability.

Listen to the podcast. Find it on iTunes/iPod and Download the transcript. Sponsor: HP.

Dana Gardner: Hello, and welcome to a special BriefingsDirect podcast series coming to you from the HP Discover 2011 conference in Las Vegas. We're here on the Discover show floor the week of June 6 to explore some major enterprise IT solutions, trends, and innovations making news across HP’s ecosystem of customers, partners, and developers.

I'm Dana Gardner, Principal Analyst at Interarbor Solutions, and I'll be your host throughout this series of HP-sponsored Discover live discussions.

We're now going to focus on Blue Cross and Blue Shield of Florida and a case study about how they’ve been able to improve their applications' performance -- and even change the culture of how they test, provide, and operate their applications.

We're here today with Victor Miller, Senior Manager of Systems Management at Blue Cross and Blue Shield of Florida in Jacksonville. Welcome. [Disclosure: HP is a sponsor of BriefingsDirect podcasts.]

Victor Miller: Thank you.

Gardner: Tell me a little bit about this cultural dynamic. When you shift from one way of doing applications, you do employ technology, you do employ products. There are methodologies and processes, but I am interested in how you changed your vision of how applications should be done.

Miller: The way we looked at applications was by their silos. It was a bunch of technology silos monitoring and managing their individual ecosystems. There was no real way of pulling information together. We didn’t represent what the customer is actually feeling inside the applications.

One of the things we started looking at was that we have to focus on the customers, seeing exactly what they were doing in the application to bring the information back. We were looking at the performance of the end-user transactions or what the end-users were doing inside the app, versus what Oracle database is doing, for example.

When you start pulling that information together, it allows you to get full traceability of the performance of the entire application from a development, test, staging, performance testing, and then also production side. You can actually compare that information to understand exactly where you're at. Also, you're breaking down those technology silos, when you're doing that. You move more toward a proactive transactional monitoring perspective.

Gardner: It sounds as if you started looking at the experience of the application, rather than the metrics or the parts. Is that fair?

Miller: That’s correct. We're looking at how the users are using it and what they're doing inside the applications, like you said, instead of the technology around it. The technology can change. You can add more resources or remove resources, but really it's all up to the end-user, what they are doing in their performance of the apps.

Overcome hurdles

Gardner: In order to make this shift and to enjoy better performance and experience with your applications, you had to overcome some hurdles. Maybe you could explain what Blue Cross and Blue Shield of Florida is. I think I have a pretty good idea, but you can probably do a better job than I. After we learn a bit about your organization, what were some of the hurdles you had to overcome to get toward this improved culture?

Miller: Blue Cross and Blue Shield is one of the 39 independent Blue Crosses throughout the United States. We're based out of Florida. We've been around since about 1944. We're an independent licensee of the Blue Cross Blue Shield Association. One of our main focuses is healthcare.

We do sell insurance, but we also have our retail environment, where we're bringing in more healthcare services. It’s really about the well-being of our Florida population. We do things to help Florida as a whole, to make everyone more healthy where possible.

Gardner: Let’s look at that problem set. In order to have a better experience for the health and welfare of your clients and constituents, what was the problem? What did you need to change?

Miller: Well, when we started looking at things, we thought we were doing fine, until we actually started bringing the data together to understand exactly what was really going on. Our customers weren’t happy with the performance of their applications or with the availability of their applications.

We started looking at the technology silos and bringing them together in one holistic perspective. We started seeing that, from an availability perspective, we weren’t looking very good. So, we had to figure out what we could do to resolve that. In doing that, we had to break down the technology silos, and really focus on the whole picture of the application, and not just the individual components of the applications.

Gardner: So this sounds like you had to go deeper into the network, looking at the ecosystem of the applications. What did you have to do to start to get that full picture?

Miller: Our previous directors reordered our environment and brought in a systems management team. Its responsibility is to monitor and help manage the infrastructure from that perspective, centralize the tool suites, and understand exactly what we're going to use for the capabilities. We created a vision of what we wanted to do and we've been driving that vision for several years to try to make sure that it stays on target and focused to solve this problem.

Gardner: And how did you go about choosing the products and the management capabilities you're going to employ?

Miller: We were such early adopters that we actually chose best-of-breed. We were an agent-based monitoring environment, and we moved to agent-less. At the time, we adopted Mercury SiteScope. Then, we also brought in Mercury’s BAC and a lot of Topaz technologies with diagnostics and things like that. We had other capabilities like Bristol Technology’s TransactionVision.

Umbrella of products

HP purchased all the companies and brought them into one umbrella of product suites. It allowed us to bind the best-of-breed. We bought technologies that didn’t overlap, could solve a problem, and integrated well with each other. It allowed us to be able to get more traceability inside of these spaces, so we can get really good information about what the performance availability is of those applications that we're focusing on.

Gardner: In addition to adopting these products, I imagine you also had to change some of your processes and methodologies like ITIL. Tell me about the combination of the products and the processes that led you to some pretty impressive results?

Miller: One of the major things was that it was people, process, and technology that we were focused on in making this happen. On the people side, we moved our command center from our downtown office to our corporate headquarters where all the admins are, so they can be closer to the command center. If there is a problem, the command center can contact them directly and they go down there.

We instituted what I guess I’d like to refer to as "butts in the seat." I can't come up with a better name for it, but it means that when a person was on call, they were in the command center working down there. They were doing their regular operational work, but they were in the command center. So if there was an incident, they would be there to resolve it.

In the agent-based technologies we were monitoring thousands of measurement points. But, you have to be very reactive, because you have to come after the fact trying to figure out which one triggered. Moving to the agent-less technology is a different perspective on getting the data, but you’re focusing on the key areas inside those systems that you want to pay attention to versus the everything model.

In doing that, our admins were challenged to be a little bit more specific as to what they wanted us to pay attention to from a monitoring perspective to give them visibility into the health of their systems and applications.

Gardner: I imagine that this is translated back into your development earlier into the requirements. Is there a feedback loop of sorts now that you can look to that perhaps you didn’t have in the past?

Miller: Yeah, there is a feedback loop and the big thing around that is actually moving monitoring further back into the process.

What we’ve found is that if we fix something in development, it may cost a dollar. If we fix it in testing, it might cost $10. In production staging it may cost $1,000. It could be $10,000 or $100,000 when it’s in production, because that goes back to the entire lifecycle again, and more people are involved. So moving things further back in the lifecycle has been a very big benefit.

Also, it involved working with the development and testing staffs to understand that you can’t throw an application over the wall and say, "Monitor my app, because it’s production." We have no idea which is your application, or we might say that it’s monitored, because we're monitoring infrastructure around your application, but we may not be monitoring a specific component of the application.

Educating people

The challenge there is reeducating people and making sure that they understand that they have to develop their app with monitoring in mind. Then, we can make sure that we can actually give them visibility back into the application if there is a problem, so they can get to the root cause faster, if there's an incident.

Gardner: This is all well and good, and it sounds fabulous for a handful of apps. But I imagine you have to scale this. How do you take what you’ve been describing in terms of this journey, but make it for dozens or hundreds of applications? What is it that you rely on to automate this?

Miller: We’ve created several different processes around this and we focused on monitoring every single technology. We still monitor those from a siloed perspective, but then we also added a few transactional monitors on top of that inside those silos, for example, transaction scripts that run the same database query over and over again to get information out of there.
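The transaction scripts Miller mentions can be pictured with a minimal Python sketch. This is an illustration only, not the HP tooling discussed in the interview: the function name `run_synthetic_transaction`, the `claims` table, and the `slow_after_s` threshold are all hypothetical, and an in-memory SQLite database stands in for the real one.

```python
import sqlite3
import time


def run_synthetic_transaction(conn, query, slow_after_s=0.5):
    """Run one fixed query and report its outcome and elapsed time.

    Hypothetical helper: a real monitor would run this on a schedule
    and forward each result to an alerting or dashboard system.
    """
    started = time.perf_counter()
    try:
        rows = conn.execute(query).fetchall()
    except sqlite3.Error as exc:
        # The query itself failed: report it as a failed transaction.
        elapsed = time.perf_counter() - started
        return {"status": "failed", "elapsed_s": elapsed, "rows": 0, "error": str(exc)}
    elapsed = time.perf_counter() - started
    # The query worked, but a slow answer is still worth flagging.
    status = "slow" if elapsed > slow_after_s else "ok"
    return {"status": status, "elapsed_s": elapsed, "rows": len(rows), "error": None}


if __name__ == "__main__":
    # Stand-in database; the table and data are made up for the example.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE claims (id INTEGER PRIMARY KEY, state TEXT)")
    conn.execute("INSERT INTO claims (state) VALUES ('open'), ('closed')")
    # Run the same query repeatedly, as a scheduled monitor would.
    for _ in range(3):
        print(run_synthetic_transaction(conn, "SELECT id FROM claims WHERE state = 'open'"))
```

Because the query never changes, its timings can be compared from one run to the next, which is what makes a script like this useful for watching one slice of an application from the user's side.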

At the same time, we had to make some changes, where we started leveraging the Universal Configuration Management Database (UCMDB) or Run-time Service Model to bring it up and build business services out of this data to show how all these things relate to each other. The UCMDB behind the scenes is one of the cornerstones of the technology. It brings all that silo-based information together to create a much better picture of the apps.

Gardner: Some people call that a system of record.

Miller: That’s correct. We don’t necessarily call it the system of record. We have multiple systems of record. It’s more like the federation adapter for all these records to pull the information together. It guides us into those systems of record to pull that information out.

Gardner: What does this get for you? Are there any metrics or examples you can point to that validate how effective this can be?

Miller: About eight years ago when we first started this, we had incident meetings where we had between 15 and 20 people going over 20-30 incidents per week. We had those every day of the week. On Friday, we would review all the ones for the first four days of the week. So, we were spending a lot of time doing that.

Out of those meetings, we came up with what I call "the monitor of the day." If we found something that was an incident that occurred in the infrastructure that was not caught by some type of monitoring technology, we would then have it monitored. We’d bring that back, and close that loop to make sure that it would never happen again.

Another thing we did was improve our availability. We were taking something like five and six hours to resolve some of these major incidents. We looked at the 80:20 rule. We solved 80 percent of the problems in a very short amount of time. Now, we have six or seven people resolving incidents. Our command center staff is in the command center 24 hours a day to do this type of work.

Additional resources

When they need additional resources, they just pick up the phone and call the resources down. So, it’s a level 1 or level 2 type person working with one admin to solve a problem, versus having all hands on deck, where you have 50 admins in a room resolving incidents.

I'm not saying that we don’t have those now. We do, but when we do, it’s a major problem. It’s not something very small. It could be a firmware on a blade enclosure going down, which takes an entire group of applications down. It's not something you can plan for, because you're not making changes to your systems. It's just old hardware or stuff like that that can cause an outage.

Another thing this has done for us is that those 20 or 30 incidents we had per week are down to one or two. Knock on wood on that one, but it is really a testament to a lot of the things that our IT department has done as a whole. They're putting a lot of effort into reducing the number of incidents that are occurring in the infrastructure. And, we're partnering with them to get the monitoring in place to allow for them to get the visibility in the applications to actually throw alerts on trends or symptoms, versus throwing the alert on the actual error that occurs in the infrastructure.
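The difference between alerting on a trend and alerting on the error itself can be sketched in a few lines of Python. This is a hedged illustration, not the method used at Blue Cross and Blue Shield of Florida: the function `classify`, the `error_limit`, the `window` size, and the `drift_ratio` are all assumptions chosen for the example.

```python
from statistics import mean


def classify(samples, error_limit, window=5, drift_ratio=1.5):
    """Return 'error', 'trend', or 'ok' for a series of response times.

    'error' fires only once the latest sample crosses the hard limit.
    'trend' fires earlier, when the recent average has drifted well
    above the earlier baseline even though no hard limit was hit.
    All thresholds here are illustrative assumptions.
    """
    if not samples:
        return "ok"
    if samples[-1] >= error_limit:
        return "error"
    if len(samples) < 2 * window:
        # Not enough history to compare recent values with a baseline.
        return "ok"
    baseline = mean(samples[:-window])
    recent = mean(samples[-window:])
    if baseline > 0 and recent >= drift_ratio * baseline:
        return "trend"
    return "ok"


if __name__ == "__main__":
    steady = [1.0] * 15
    creeping = [1.0] * 10 + [1.6, 1.7, 1.8, 1.9, 2.0]
    broken = [1.0] * 14 + [6.0]
    for name, series in [("steady", steady), ("creeping", creeping), ("broken", broken)]:
        print(name, classify(series, error_limit=5.0))
```

In the `creeping` series no single sample reaches the hard limit, so an error-only monitor would stay silent; the trend check flags it while there is still time to act.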

Gardner: Now, we started talking earlier about your philosophy and the experience of the user. Are there any metrics or anecdotes from the welfare and benefit of your end-customers that have developed from the way that you’ve been able to improve your applications?

Miller: Customer satisfaction for IT is a lot higher now than it used to be. IT is being called in to support and partner with the business, versus business saying, "I want this," and then IT does it in a vacuum. It’s more of a partnership between the two entities to be able to bring stuff together. Operations is creating dashboards and visibility into business applications for the business, so they can see exactly what they're doing in the performance of their one department, versus just from an IT perspective. We can get the data down to specific people now.

Gardner: Because these activities are a journey, you never perhaps get to an end destination. What are you looking forward to next? What’s the roadmap for improving even beyond where you are now?

Miller: Some of the big things I am looking at are closed-loop processes. I have actually started working with our change management team to change the way that we make changes in our environment, so that everything is configuration item (CI) based. Doing that allows for complete traceability of an asset or a CI through its entire lifecycle.

You understand every incident, request, problem request that ever occurred on that asset, but also you can actually see financial information. You can also see inventory information and location information and start bringing the information together to make smart decisions based on the data that you have in your environment.

Gardner: That sounds like it could lead to some significant cost savings in the long run?

Miller: That’s my hope. The really big thing is really to help reduce the cost of IT in our business and be able to do whatever we can to help cut our cost and keep a lean ship going.

Gardner: Well, great. We’ve been hearing about a user case study, Blue Cross and Blue Shield of Florida, and how they’ve been improving their application performance and the user experience, and then ultimately providing better visibility for IT and a better perception of IT, along with an overall reduction in total cost. We’ve been hearing this story from Victor Miller, Senior Manager of Systems Management at Blue Cross and Blue Shield of Florida in Jacksonville. Thank you.

Miller: Thank you.

Gardner: And thanks to our audience for joining this special BriefingsDirect podcast coming to you from the HP Discover 2011 Conference in Las Vegas. I'm Dana Gardner, Principal Analyst at Interarbor Solutions, your host for this series of user experience discussions. Thanks again for listening, and come back next time.

Copyright Interarbor Solutions, LLC, 2005-2011. All rights reserved.
