
Friday, November 20, 2020

How the Journey to Modern Data Management is Paved with an Inclusive Edge-to-Cloud Data Fabric


Transcript of a discussion on the best ways widely inclusive data can be managed for today’s data-rich but too often insights-poor organizations.

Listen to the podcast. Find it on iTunes. Download the transcript. Sponsor: Hewlett Packard Enterprise.

 

Dana Gardner: Hello, and welcome to the next BriefingsDirect Voice of Analytics Innovation discussion. I’m Dana Gardner, Principal Analyst at Interarbor Solutions, your host and moderator for this ongoing discussion on the latest insights into end-to-end data management strategies.


As businesses seek to gain insights from more elements of their physical edge -- from factory sensors, myriad machinery, and across field operations -- data remains fragmented. But a Data Fabric approach allows information and analytics to reside locally at the edge yet contribute to the global improvement in optimizing large-scale operations.

Stay with us now as we explore how edge-to-core-to-cloud dispersed data can be harmonized with a common fabric to make it accessible for use by more apps and across more analytics.

To learn more about the ways all data can be managed for today’s data-rich but too often insights-poor organizations, we’re joined by Chad Smykay, Field Chief Technology Officer for Data Fabric at Hewlett Packard Enterprise (HPE). Welcome, Chad.

 


Chad Smykay: Thank you.

 

Gardner: Chad, why are companies still flooded with data? It seems like they have the data, but they’re still thirsty for actionable insights. If you have the data, why shouldn’t you also have the insights readily available?

 

Smykay: There are a couple of reasons for that. Our customers still face challenges today. One is just having a common data governance methodology. That’s not just to govern the security and audits, and the techniques around that -- but determining just what your data is.

 

I’ve gone into so many projects where they don’t even know where their data lives. Even a simple matrix of what the data is, where it lives, and why it’s important to the business is a first step that most companies just don’t take.

 

Gardner: What’s happening with managing data access when they do decide they want to find it? What’s been happening with managing the explosive growth of unstructured data from all corners of the enterprise?

 

Tame your data

 

Smykay: Five years ago, it was still the Wild West of data access. But we’re finally seeing some great standards and application programming interfaces (APIs) being deployed for that data access. Companies are now realizing there’s power in having one API to rule them all. In this case, that’s mostly the Amazon S3 API.

 

There are some other great APIs for data access out there, but just having more standardized API access into multiple datatypes has been great for our customers. It allows the same API to serve many different use cases. For example, business intelligence (BI) tools can come in via an API, and an application developer can access that same API. That approach really cuts down on the number of access methodologies and security domains I need, and simplifies how I manage that data for API access.
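To make the single-API idea concrete, here is a minimal Python sketch under stated assumptions: a hypothetical S3-style interface (`ObjectStore`, with `put_object` and `get_object` methods loosely named after S3’s PutObject and GetObject operations) backed by an in-memory stand-in. It is not boto3 and not the HPE Data Fabric API; it only shows a BI-style consumer and an application consumer reading the same data through the same calls.

```python
from typing import Protocol


class ObjectStore(Protocol):
    """Hypothetical S3-style interface: buckets holding keyed byte objects."""

    def put_object(self, bucket: str, key: str, body: bytes) -> None: ...

    def get_object(self, bucket: str, key: str) -> bytes: ...


class InMemoryStore:
    """Stand-in backend; any other backend would expose the same two calls."""

    def __init__(self) -> None:
        self._data = {}  # (bucket, key) -> bytes

    def put_object(self, bucket: str, key: str, body: bytes) -> None:
        self._data[(bucket, key)] = body

    def get_object(self, bucket: str, key: str) -> bytes:
        return self._data[(bucket, key)]


def bi_report(store: ObjectStore) -> int:
    """BI-style consumer: count data rows (excluding the header) in a CSV object."""
    lines = store.get_object("sales", "2020.csv").decode().strip().splitlines()
    return len(lines) - 1


def app_lookup(store: ObjectStore, region: str) -> str:
    """Application consumer: read the same object through the same API."""
    for line in store.get_object("sales", "2020.csv").decode().splitlines()[1:]:
        name, amount = line.split(",")
        if name == region:
            return amount
    return "not found"


if __name__ == "__main__":
    store = InMemoryStore()
    store.put_object("sales", "2020.csv", b"region,amount\nnorth,10\nsouth,20\n")
    print(bi_report(store))            # 2
    print(app_lookup(store, "south"))  # 20
```

Because both consumers depend only on the interface, swapping the in-memory backend for another implementation would not change either of them, which is the reduction in access methodologies Smykay describes.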

 

Gardner: And when we look to get buy-in from the very top levels of businesses, why are leaders now rethinking data management and exploitation of analytics? What are the business drivers that are helping technologists get the resources they need to improve data access and management?

 

Smykay: The business drivers are strongest when data access methods are as reusable as possible across the different use cases. It used to be that you’d need different point solutions, or different open source tools, to solve each business use case. That was fine in the short term, perhaps for a quarterly project or for the year you did it in.


 

But then, down the road, say three years out, they would say, “My gosh, we have 10 different tools across the many different use cases we’re using.” It makes it really hard to standardize for the next set of use cases.

 

So that’s been a big business driver: gaining a common, secure access layer that can access different types of data. That’s been the biggest driver for our HPE Data Fabric. That, along with common API access, definitely reduces the management-layer cost as well as the security cost.

 

Gardner: It seems to me that such data access commonality, when you attain it, becomes a gift that keeps giving. The many different types of data often need to go from the edge to dispersed data centers and sometimes dispersed in the cloud. Doesn’t data access commonality also help solve issues about managing access across disparate architectures and deployment models?

 

Smykay: You just hit the nail on the head. Having commonality for that API layer really gives you the ability to deploy anywhere. When I have the same API set, it makes it very easy to go from one cloud provider, or one solution, to another. But that can also create issues in terms of where my data lives. You still have data gravity issues, for example. And if you don’t have portability of the APIs and the data, you start to see some lock-in with either the point solution you went with or the cloud provider that’s providing that data access for you.

 

Gardner: Following through on the gift that keeps giving idea, what is it about the Data Fabric approach that also makes analytics easier? Does it help attain a common method for applying analytics?

 

Data Fabric deployment options

 

Smykay: There are a couple of things there. One, it allows you to keep the data where it may need to stay. That could be for regulatory reasons, or simply because of where you build and deploy the analytics models. A Data Fabric helps you start separating out your computing and storage capabilities, while keeping them coupled wherever the deployment location is.

 


For example, a lot of our customers today have the flexibility to deploy IT resources out at the edge. That could be a small cluster or system that pre-processes data. They typically trickle the data back slowly to one location, a core data center or a cloud location. Having these systems at the edge gives them the benefit of both pushing information out and continuing to process at the edge. They can deploy as they choose, and make the data analytics solutions deployed at the core even better for reporting or modeling.
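A minimal sketch of that edge pattern, assuming a hypothetical edge node that reduces raw sensor readings to per-sensor summaries locally and forwards only the summaries to the core. The names `Reading`, `summarize_at_edge`, and `merge_at_core` are illustrative, not part of any HPE product API.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class Reading:
    sensor_id: str
    value: float


def summarize_at_edge(readings):
    """Pre-process locally: reduce raw readings to one small summary per sensor."""
    grouped = defaultdict(list)
    for r in readings:
        grouped[r.sensor_id].append(r.value)
    return {
        sensor: {"count": len(vals), "mean": mean(vals), "max": max(vals)}
        for sensor, vals in grouped.items()
    }


def merge_at_core(site_summaries):
    """Combine the summaries sent back from many edge sites into one global view."""
    merged = {}
    for site, summary in site_summaries.items():
        for sensor, stats in summary.items():
            merged[f"{site}/{sensor}"] = stats
    return merged


if __name__ == "__main__":
    raw = [Reading("temp-1", 20.0), Reading("temp-1", 22.0), Reading("temp-2", 30.0)]
    edge_summary = summarize_at_edge(raw)
    print(edge_summary["temp-1"])  # {'count': 2, 'mean': 21.0, 'max': 22.0}
    print(sorted(merge_at_core({"plant-a": edge_summary})))
```

Only the summaries cross the network, so the edge keeps processing at full rate while the core still gets enough information for global reporting and modeling.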

 

Gardner: It gets to the idea of act locally and learn globally. How is that important, and why are organizations interested in doing that?

 

Smykay: It’s just-in-time, right? We want everything to be faster, and that’s what this Data Fabric approach gets for you.

 

In the past, we’ve seen edge solutions deployed, but you weren’t processing a whole lot at the edge. You were pushing all the data back to a central, core location -- and then doing something with that data. But we don’t have the time to do that anymore.

 

Unless you can change the laws of physics -- last time I checked, nobody has done that yet -- we’re bound by the speed of light on these networks. And so we need to keep as much data and as many systems as we can locally at the edge. Yet we still need to take some of that information back to one central location so we can understand what’s happening across all the different locations. We still want to make the rearview reporting better globally for our business, as well as allow for more global model management.

 

Gardner: Let’s look at some of the hurdles organizations have to overcome to make use of such a Data Fabric. What is it about the way that data and information exist today that makes it hard to get the most out of it? Why is it hard to put advanced data access and management in place quickly and easily?

 

Track the data journey

 

Smykay: It’s tough for most organizations because they can’t take the wings off the airplane while flying. We get that. You have to begin by creating some new standards within your organization, whether that’s standardizing on an API set for a single datatype or for multiple datatypes.

 

Then you need to standardize the deployment mechanisms within your organization for that data. With the HPE Data Fabric, we give the ability to just say, “Hey, it doesn’t matter where you deploy. We just need some x86 servers and we can help you standardize either on one API or multiple APIs.”

 

We now support more than 10 APIs, as well as the many different data types that these organizations may have.


 

Typically, we see a lot of data silos still out there today with customers -- and they’re getting worse. By worse, I mean they’re now all over the place between multiple cloud providers. I may use some of these cloud storage bucket systems from cloud vendor A, but I may use somebody else’s SQL databases from cloud vendor B, and each may end up having its own access methodologies and its own software development kits (SDKs).

 

Next you have to consider all the networking in the middle. And let’s not even bring up security and authorization to all of them. So we find that the silos still exist, but they’ve just gotten worse and they’ve just sprawled out more. I call it the silo sprawl.

 

Gardner: Wow. So, if we have that silo sprawl now, that complexity is becoming a hurdle, and the estimates are that we’re going to keep getting more and more data from more and more devices. If you don’t get a handle on this now, you’re never going to be able to scale, right?

 

Smykay: Yes, absolutely. If you’re going to have diversity of your data, the right way to manage it is to make it use-case-driven. Don’t boil the ocean. That’s where we’ve seen all of our successes. Focus on a couple of different use cases to start, especially if you’re getting into newer predictive model management and using machine learning (ML) techniques.

But, you also have to look a little further out to say, “Okay, what’s next?” Right? “What’s coming?” When you go down that data engineering and data science journey, you must understand that, “Oh, I’m going to complete use case A, that’s going to lead to use case B, which means I’m going to have to go grab from other data sources to either enrich the model or create a whole other project or application for the business.”

You should create a data journey and understand where you’re going so you don’t just end up with silo sprawl.

Gardner: Another challenge for organizations is their legacy installations. When we talk about zettabytes of data coming, what is it about the legacy solutions -- and even the cloud storage legacy -- that organizations need to rethink to be able to scale?

Zettabytes of data coming

Smykay: It’s a very important point. Can we just have a moment of silence? Because now we’re talking about zettabytes of data. Okay, I’m in.

Some 20 years ago, we were talking about petabytes of data. We thought that was a lot of data, but if you look out to the future, some studies show connected Internet of Things (IoT) devices generating zettabytes of data.


If you don’t get a handle on where your data points are going to be generated, how they’re going to be stored, and how they’re going to be accessed now, this problem is just going to get worse and worse for organizations.

Look, Data Fabric is a great solution. We have it, and it can solve a ton of these problems. But speaking as a consultant, if you don’t get ahead of these issues right now, you’re probably going to be under the umbrella of 20 different cloud solutions for the next 10 years. So, really, we need to look at the datatypes that you’re going to have to support, the access methodologies, and where those need to be located and supported for your organization.

Gardner: Chad, it wasn’t that long ago that we were talking about how to manage big data, and Hadoop was a big part of that. NoSQL and other open source databases in particular became popular. What is it about the legacy of the big data approach that also needs to be rethought?

Smykay: One common issue we often see is the tendency to go either/or. By that I mean saying, “Okay, we can do real-time analytics, but that’s a separate data deployment. Or we can do batch, rearview reporting analytics, and that’s a separate data deployment.” But one thing that our HPE Data Fabric has always been able to support is both -- at the same time -- and that’s still true.

So if you’re going down a big data or data lake journey -- I think the term now is data lakehouse, that’s a new one -- basically I need to be able to do my real-time analytics as well as my traditional BI reporting, or rearview mirror reporting, and that’s what we’ve been doing for over 10 years. That either/or split is probably one of the biggest limitations we have seen.

But it’s a heavy lift to get that data from one location to another, just because of the metadata layer of Hadoop. And some of these NoSQL databases had dependencies on Hadoop, which caused performance issues. You can only get so much performance out of those databases, which is why we have NoSQL databases built into our Data Fabric out of the box -- and we’ve never run into any of those issues.

Gardner: Of course, we can’t talk about end-to-end data without thinking about end-to-end security. So, how do we think about the HPE Data Fabric approach helping when it comes to security from the edge to the core?

Secure data from edge to core

 

Smykay: This is near-and-dear to my heart because everyone always talks about these great solutions out there to do edge computing. But I always ask, “Well, how do you secure it? How do you authorize it? How does my application authorization happen all the way back from the edge application to the data store in the core or in the cloud somewhere?”

That’s what I call auth-sprawl, where those issues just add up. If we don’t have one way to secure and manage all of our different data types, then what happens is, “Okay, well, I have this object-based system out there, and it has its own authorization techniques.” It has its own authentication techniques. By the way, it has its own way of enforcing security in terms of who has access to what. And I haven’t even talked about monitoring yet, right? How do we monitor this solution?

So, now imagine doing that for each type of data that you have in your organization -- whether it’s a SQL database, because that application is just a driving requirement for that, or a file-based workload, or a block-based workload. You can see where this starts to steamroll and build up to be a huge problem within an organization, and we see that all the time.


 

And, by the way, when it comes to your application developers, that becomes the biggest annoyance for them. Why? Because when they want to go and create an application, they have to go and say, “Okay, wait. How do I access this data? Oh, it’s different. Okay. I’ll use a different key.” And then, “Oh, that’s a different authorization system. It’s a completely different way to authenticate with my app.”

I honestly think that’s why we’re seeing a ton of issues today in the security space. It’s why we’re seeing people get hacked. It happens all the way down to the application layer, as you often have this security sprawl that makes it very hard to manage all of these different systems.

Gardner: We’ve come back to this word sprawl several times now. We’re sprawling with this, we’re sprawling with that; there’s complexity, and then there’s going to be even more scale demanded.


The bad news is there is quite a bit to consider when you want end-to-end data management that takes the edge into consideration and has all these other anti-sprawl requirements. The good news is a platform and standards approach with a Data Fabric forms the best, single way to satisfy these many requirements.

So let’s talk about the solutions. How does HPE Ezmeral generally -- and the Ezmeral Data Fabric specifically -- provide a common means to solve many of these thorny problems?

Smykay: We were just talking about security. We provide the same security domain across all deployments. That means having one web-based user interface (UI), or one REST API call, to manage all of those different datatypes.

We can be deployed across any x86 system. And having that multi-API access -- we have more than 10 -- allows for multi-data access. It includes everything from storing data in files to storing data in blocks; we’re soon going to be able to support blocks in our solution. And then we’ll be storing data in streams such as Kafka, and in a NoSQL database as well.

Gardner: It’s important for people to understand that HPE Ezmeral is a larger family and that the Data Fabric is a subset. But the whole seems to be greater than the sum of the parts. Why is that the case? How has what HPE is doing in architecting Ezmeral been a lot more than data management?

Smykay: Whenever you have this “whole is greater than the sum of the parts,” you start reducing so many things across the chain. When we talk about deploying a solution, that includes, “How do I manage it? How do I update it? How do I monitor it?” And then back to securing it.

Honestly, there is a great report from IDC that says it best. We show a 567-percent, five-year return on investment (ROI). That’s not from us, that’s IDC talking to our customers. I don’t know of a better business value from a solution than that. The report speaks for itself, but it comes down to these paper cuts of managing a solution. When you start to have multiple paper cuts, across multiple arms, it starts to add up in an organization.

Gardner: Chad, what is it about the HPE Ezmeral portfolio and the way the Data Fabric fits in that provides a catalyst to more improvement?

 

All data put to future use

 

Smykay: One, the HPE Data Fabric can be deployed anywhere. It can be deployed independently. We have hundreds and hundreds of customers. We have to continue supporting them on their journey of compute and storage, but today we are already shipping a solution where we can containerize the Data Fabric as a part of our HPE Ezmeral Container Platform and also provide persistent storage for your containers.

 

The HPE Ezmeral Container Platform comes with the Data Fabric; it’s a part of the persistent storage. That gives you full end-to-end management of the containers, not only the application APIs. That means management as well as data portability.

 

So, now imagine being able to ship the data by containers from your location, as it makes sense for your use case. That’s the powerful message. We have already been on the compute and storage journey; been down that road. That road is not going away. We have many customers for that, and it makes sense for many use cases. We’ve already been on the journey of separating out compute and storage. And we’re in general availability today. There are some other solutions out there that are still on a road map as far as we know, but at HPE we’re there today. Customers have this deployed. They’re going down their compute and storage separation journey with us.

 

Gardner: One of the things that gets me excited about the potential for Ezmeral is when you do this right, it puts you in a position to be able to do advanced analytics in ways that hadn’t been done before. Where do you see the HPE Ezmeral Data Fabric helping when it comes to broader use of analytics across global operations?

 

Smykay: As one of our CMOs used to say, and as Jack Morris has said: “If it’s going to be about the data, it better be all about the data.”

 


When you improve and automate data management across multiple deployments -- managing it, monitoring it, keeping it secure -- you can then focus on those actual use cases. You can focus on the data itself, right? The data that’s living in the HPE Data Fabric. That is the higher-level takeaway. Our users are not spending all their time and money worrying about the data lifecycle. Instead, they can now go use that data for their organizations and for future use cases.

 

HPE Ezmeral sets your organization up to use your data instead of worrying about your data. We are set up to start using the Data Fabric for newer use cases and separating out compute and storage, and having it run in containers. We’ve been doing that for years. The high-level takeaway is you can go focus on using your data and not worrying about your data.

 

Gardner: How about some of the technical ways that you’re doing this? Things like global namespaces, analytics-ready fabrics, and native multi-temperature management. Why are they important specifically for getting to where we can capitalize on those new use cases?

 

Smykay: Global namespace is probably the top feature we hear back from our customers on. It allows them to gain one view of the data with the same common security model. Imagine you’re a lawyer sitting at your computer and you double-click on a Data Fabric drive; you can literally then see all of your deployments globally. That helps with discovery. That helps with bringing onboard your data engineers and data scientists. Over the years, that’s been one of the biggest challenges: organizations spend a lot of time building up their data science and data engineering groups, and those groups then spend a lot of time just discovering the data.
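The global-namespace idea can be sketched as a single path space that resolves to whichever deployment holds the data, with one access check applied everywhere. This is a hypothetical Python illustration: the `/fabric/<cluster>/...` path layout, the `GlobalNamespace` class, and its prefix-based permission check are assumptions for the example, not the Data Fabric’s actual path convention or security model.

```python
from dataclasses import dataclass, field


@dataclass
class GlobalNamespace:
    """Hypothetical single view over data held in several deployments."""

    clusters: dict = field(default_factory=dict)  # cluster name -> location label
    readers: dict = field(default_factory=dict)   # path prefix -> set of allowed users

    def resolve(self, path: str):
        """Map '/fabric/<cluster>/<local path>' to (cluster, location, local path)."""
        parts = path.strip("/").split("/", 2)
        if len(parts) < 3 or parts[0] != "fabric":
            raise ValueError(f"not a global path: {path}")
        cluster, local = parts[1], "/" + parts[2]
        if cluster not in self.clusters:
            raise KeyError(f"unknown cluster: {cluster}")
        return cluster, self.clusters[cluster], local

    def can_read(self, user: str, path: str) -> bool:
        """Apply one policy no matter which cluster holds the data.

        Uses a simple string-prefix match, which is enough for a sketch.
        """
        return any(
            path.startswith(prefix) and user in users
            for prefix, users in self.readers.items()
        )


if __name__ == "__main__":
    ns = GlobalNamespace(
        clusters={"edge-houston": "edge", "core-dc1": "core"},
        readers={"/fabric/core-dc1/sales": {"analyst"}},
    )
    print(ns.resolve("/fabric/edge-houston/sensors/today.csv"))
    # ('edge-houston', 'edge', '/sensors/today.csv')
    print(ns.can_read("analyst", "/fabric/core-dc1/sales/q3.parquet"))  # True
```

A data scientist browsing one tree like this does not need to know in advance which site a dataset lives on, which is where the reduction in discovery time comes from.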

 

Global namespace means I’m reducing my discovery time to figure out where the data is. A lot of this analytics-ready value we’ve been supporting in the open source community for more than 10 years. There’s a ton of open source projects out there, like Presto, Apache Hive, and Apache Drill. Of course, we’re also Spark-ready, and we have been supporting Spark for many years. That’s pretty much the de facto standard we’re seeing when it comes to doing any kind of real-time processing or analytics on data.

 

As for multi-temperature, that feature allows you to decrease the cost of your deployment while still managing all your data in one location. There are a lot of different ways we do that. We use erasure coding. We can tier off to Amazon S3-compliant devices to reduce the overall cost of deployment.
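The storage saving from erasure coding is easy to work out. As an illustrative comparison (the 3-way replication and 4+2 layouts are assumed parameters, not figures from the discussion), both schemes survive the loss of any two copies or fragments, but a k+m erasure code needs (k+m)/k bytes of raw capacity per logical byte instead of one full byte per copy:

```python
def raw_capacity_replication(data_tb: float, copies: int) -> float:
    """Raw capacity needed when every block is stored `copies` times."""
    return data_tb * copies


def raw_capacity_erasure(data_tb: float, k: int, m: int) -> float:
    """Raw capacity for a k+m erasure code: k data fragments plus m parity fragments."""
    return data_tb * (k + m) / k


if __name__ == "__main__":
    data = 100.0  # TB of logical data (illustrative)
    print(raw_capacity_replication(data, 3))  # 300.0
    print(raw_capacity_erasure(data, 4, 2))   # 150.0
```

So under these assumptions, 100 TB of data needs 300 TB of raw capacity with three replicas but 150 TB with a 4+2 code; tiering colder data off to S3-compliant storage is a separate lever on top of that.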

 

These features contribute to making it still easier. You gain a common Data Fabric, common security layer, and common API layer.

 

Gardner: Chad, we talked about much more data at the edge, how that’s created a number of requirements, and the benefits of a comprehensive approach to data management. We talked about the HPE Data Fabric solution, what it brings, and how it works. But we’ve been talking in the abstract.

 

What about on the ground? Do you have any examples of organizations that have bitten off and made Data Fabric core for them? As an adopter, what do they get? What are the business outcomes?

 

Central view benefits businesses

 

Smykay: We’ve been talking a lot about edge-to-core-to-cloud, and the one example that’s just top-of-mind is a big, tier-1 telecoms provider. This provider makes the equipment for your AT&Ts and your Vodafones. That equipment sits out on the cell towers. And they have many Data Fabric use cases, more than 30 with us.

 

But the one I love most is real-time antenna tuning. They’re able to improve customer satisfaction in real time and reduce the need to physically return to hotspots on an antenna. They do it via real-time data collection on the antennas and then aggregating that across all of the different layers that they have in their deployments.


 

They gain a central view of all of the data using a modern API for the DevOps needs. They still centrally process data, but they also process it at the edge today. We replicate all of that data for them. We manage that for them and take a lot of the traditional data management tasks off the table for them, so they can focus on the use case of the best way to tune antennas.

 

Gardner: They have the local benefit of tuning the antenna. But what’s the global payback? Do we have quantitative or qualitative business returns for them in doing that?

 

Smykay: Yes, but they’re pretty secretive. We’ve heard that they’ve gotten a payback in the millions of dollars, but an immediate, direct payback for them is in reducing the application development spend everywhere across the layer. That reduction is because they can use the same type of API to publish that data as a stream, and then use the same API semantics to secure and manage it all. They can then take that same application, which is deployed in a container today, and easily deploy it to any remote location around the world.

 

Gardner: There’s that key aspect of the application portability that we’ve danced around a bit. Any other examples that demonstrate the adoption of the HPE Data Fabric and the business pay-offs?

 

Smykay: Another one off the top of my head is a midstream oil and gas customer in the Houston area. This one’s not so much about edge-to-core-to-cloud. This is more about consolidation of use cases.

 

We discussed earlier that we can support both rearview reporting analytics as well as real-time reporting use cases. And in this case, they actually have multiple use cases, up to about five or six right now. Among them, they are able to do predictive failure reports for heat exchangers. These heat exchangers are deployed regionally and they are really temperamental. You have to monitor them all the time.

 

But now they have a proactive model where they can do a predictive failure monitor on those heat exchangers just by checking the temperatures on the floor cameras. They bring in all the real-time camera data and they can predict, “Oh, we think we’re having an issue with this heat exchanger at this time on this day.” So that decreases management cost for them.

 

They also gain a dynamic parts management capability for all of their inventory in their warehouses. They can deliver faster, not only on parts, but reduce their capital expenditure (CapEx) costs, too. They have gained material measurement balances. When you push oil across a pipeline, they can detect where that balance is off across the pipeline and detect where they’re losing money, because if they are not pushing oil across the pipe at x amount of psi, they’re losing money.

 

So they’re able to dynamically detect that and fix it along the pipe. They also have a pipeline leak detection that they have been working on, which is modeled to detect corrosion and decay.

 

The point is there are multiple use cases. But because they’re able to start putting those data types together and continue to build off of it, every use case gets stronger and stronger.

 

Gardner: It becomes a virtuous adoption cycle; the more you can use the data generally, then the more value, then the more you invest in getting a standard fabric approach, and then the more use cases pop up. It can become very powerful.

 

This last example also shows the intersection of operational technology (OT) and IT. Together they can start to discover high-level, end-to-end business operational efficiencies. Is that what you’re seeing?

 

Data science teams work together

 

Smykay: Yes, absolutely. A Data Fabric is kind of the Kumbaya set among these different groups. If they’re able to standardize on the IT and developer side, it makes it easier for them to talk the same language. I’ve seen this with the oil and gas customer. Now those data science and data engineering teams work hand in hand, which is where you want to get in your organization. You want those IT teams working with the teams managing your solutions today. That’s what I’m seeing. As you get a better, more common data model or fabric, you get faster and you get better management savings by having your people working better together.

 

Gardner: And, of course, when you’re able to do data-driven operations, procurement, logistics, and transportation, you get to what we’re referring to generally as digital business transformation.

 

Chad, how does a Data Fabric approach then contribute to the larger goal of business transformation?

 

Smykay: It allows organizations to work together through a common data framework. That’s been one of the biggest issues I’ve seen, when I come in and say, “Okay, we’re going to start on this use case. Where is the data?”

 

Depending on the size of the organization, you’re talking to three to five different groups, and sometimes 10 different people, just to put a use case together. But as you create a common data access method, it becomes easier and easier not only for your use cases, but for your businesses to work together on the goal of whatever you’re trying to do and use your data for.

 

Gardner: I’m afraid we’ll have to leave it there. We’ve been exploring how a Data Fabric approach allows information and analytics to reside locally at the edge, yet contribute to a global improvement in optimizing large-scale operations.

 

And we’ve learned how HPE Ezmeral Data Fabric makes modern data management more attainable so businesses can dramatically improve their operational efficiency and innovate from edge to core to clouds.

 


So please join me in thanking our guest, Chad Smykay, Field Chief Technology Officer for Data Fabric at HPE. Thanks so much, Chad.

 

Smykay: Thank you, I appreciate it.

 

Gardner: And a big thank you as well to our audience for joining this sponsored BriefingsDirect Voice of Analytics Innovation discussion. I’m Dana Gardner, Principal Analyst at Interarbor Solutions, your host for this ongoing series of Hewlett Packard Enterprise-supported discussions.

Thanks again for listening. Please pass this along to your IT community, and do come back next time.


Transcript of a discussion on the best ways widely inclusive data can be managed for today’s data-rich but too often insights-poor organizations. Copyright Interarbor Solutions, LLC, 2005-2020. All rights reserved.


Thursday, October 08, 2020

The IT Intelligence Foundation for Digital Business Transformation Builds from HPE InfoSight AIOps


A discussion on how HPE InfoSight has emerged as a broad and inclusive capability for AIOps across an expanding array of HPE products and services.

Listen to the podcast. Find it on iTunes. Download the transcript. Sponsor: Hewlett Packard Enterprise.

Dana Gardner: Hello, and welcome to the next edition of the BriefingsDirect AIOps innovation podcast series. I’m Dana Gardner, Principal Analyst at Interarbor Solutions, your host and moderator for this ongoing discussion on how artificial intelligence (AI) increasingly supports IT operations.


One of the most successful uses of machine learning (ML) and AI for IT efficiency has been the InfoSight technology developed at Nimble Storage, now part of Hewlett Packard Enterprise (HPE). Initially targeting storage optimization, HPE InfoSight has emerged as a broad and inclusive capability for AIOps across an expanding array of HPE products and services.

Stay with us now as we welcome a Nimble Storage founder, along with a cutting-edge machine learning architect, to examine the expanding role and impact of HPE InfoSight in making IT resiliency better than ever.

To learn more about the latest IT operations solutions that help companies deliver agility and edge-to-cloud business continuity, we’re joined by Varun Mehta, Vice President and General Manager for InfoSight at HPE and founder of Nimble Storage. Welcome, Varun.

Varun Mehta: Nice to be here, Dana.


Gardner: We’re also here with David Adamson, Machine Learning Architect at HPE InfoSight. Welcome, David.

David Adamson: Thank you very much.

Gardner: Varun, what was the primary motivation for creating HPE InfoSight? What did you have in mind when you built this technology?

Data delivers more than a quick fix

Mehta: Various forms of call home were already in place when we started Nimble, and that’s what we had set up to do. But then we realized that the call home data was used for very simple actions. It was basically to look at the data one time and try to find problems that the machine was having right then. These were very obvious issues, like a crash. If you had any kind of software crash, that’s what call home data would identify.

We found that if, instead of just scanning the data one time, we could store it in a database and actually look for problems over time in areas wider than just a single use, we could come up with something very interesting. Part of the problem until then was that a database that could store this amount of data cheaply was just not available, which is why people would just do the one-time scan.

The enabler was that a new database became available. We found that rather than just scan once, we could put everyone’s data into one place, look at it, and discover issues across the entire population. That was very powerful. And then we could do other interesting things using data science such as workload planning from all of that data. So the realization was that if the databases became available, we could do a lot more with that data.

Gardner: And by taking advantage of that large data capability and the distribution of analytics through a cloud model, did the scope and relevancy of what HPE InfoSight did exceed your expectations? How far has this now come?

Mehta: It turned out that this model was really successful. They say that “imitation is the sincerest form of flattery,” and that was proven true, too. Our customers loved it, our competitors found out that our customers loved it, and it basically spawned an entire set of features across all of our competitors.

The reason our customers loved it -- followed by our competitors -- was that it gave people a much broader idea of the issues they were facing. We then found that people wanted to expand this envelope of understanding that we had created beyond just storage.

And that led to people wanting to understand how their hypervisor was doing, for example. And so, we expanded the capability to look into that. People loved the solution and wanted us to expand the scope into far more than just storage optimization.

Gardner: David, you hear Varun describing what this was originally intended for. As a machine learning architect, how has HPE InfoSight provided you with a foundation to do increasingly more when it comes to AIOps, dependability, and reliability of platforms and systems?

Adamson: As Varun was describing, the database is full of data that not only tracks everything longitudinally across the installed base, but also over time. The richness of that data set gives us an opportunity to come up with features that we otherwise wouldn’t have conceived of if we hadn’t been looking through the data. Also very powerful from InfoSight’s early days was the proactive nature of the IT support because so many simple issues had now been automated away.
 

That allowed us to spend time investigating more interesting and advanced problems, which demanded ML solutions. Once you’ve cleaned up the Pareto curve of all the simple tasks that can be automated with simple rules or SQL statements, you uncover problems that take longer to solve and require a look at time series and telemetry that’s quantitative in nature and multidimensional. That data opens up the requirement to use more sophisticated techniques in order to make actionable recommendations.

Gardner: Speaking of actionable, something that really impressed me when I first learned about HPE InfoSight, Varun, was how quickly you can take the analytics and apply them. Why has that rapid capability to dynamically impact what’s going on from the data proved so successful? 

Support to succeed

Mehta: It turned out to be one of the key points of our success. I really have to compliment the deep partnership that our support organization has had with the HPE InfoSight team.

The support team right from the beginning prided themselves on providing outstanding service. Part of the proof of that was incredible Net Promoter Scores (NPS), an independent measure of how satisfied customers are with our products. Nimble’s NPS was 86, which is even higher than Apple’s. We prided ourselves on providing a really strong support experience to the customer.

Whenever a problem would surface, we would work with the support team. Our goal was for a customer to see a problem only once. And then we would rapidly fix that problem for every other customer. In fact, we would fix it preemptively so customers would never have to see it. So, we evolved this culture of identifying problems, creating signatures for these problems, and then running everybody’s data through the signatures so that customers would be preemptively inoculated from these problems. That’s why it became very successful.
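To make the idea of "running everybody’s data through the signatures" concrete, here is a minimal Python sketch. It assumes a signature is simply a named predicate over one system’s telemetry; the `Telemetry` fields, the signature name, and the version strings are all hypothetical illustrations, not InfoSight’s actual data model or rule format.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Telemetry:
    """Hypothetical snapshot of one system in the installed base."""
    system_id: str
    firmware: str
    hypervisor: str
    metrics: Dict[str, float]


# Assumption: a "signature" is a named predicate that returns True when a
# system's telemetry matches a known problem pattern.
Signature = Callable[[Telemetry], bool]


def scan_installed_base(systems: List[Telemetry],
                        signatures: Dict[str, Signature]) -> Dict[str, List[str]]:
    """Run every system through every signature and collect the matches."""
    hits: Dict[str, List[str]] = {name: [] for name in signatures}
    for system in systems:
        for name, matches in signatures.items():
            if matches(system):
                hits[name].append(system.system_id)
    return hits


# Hypothetical signature: a known issue tied to one firmware/hypervisor combination.
signatures = {
    "fw-5.0-with-hv-6.5": lambda t: t.firmware == "5.0" and t.hypervisor == "6.5",
}
fleet = [
    Telemetry("a1", "5.0", "6.5", {"latency_ms": 2.0}),
    Telemetry("a2", "5.1", "6.5", {"latency_ms": 1.0}),
]
print(scan_installed_base(fleet, signatures))  # {'fw-5.0-with-hv-6.5': ['a1']}
```

Once a signature exists, every new batch of incoming data can be rescanned, which is what lets a problem seen by one customer be flagged for every other affected system before it surfaces there.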

Gardner: It hasn’t been that long since we were dealing with red light-green light types of IT support scenarios, but we’ve come a long way. We’re not all the way to fully automated, lights-out, machines running machines operations.

David, where do you think we are on that automated support spectrum? How has HPE InfoSight helped change the nature of systems’ dependability, getting closer to that point where they are more automated and more intelligent?

Adamson: The challenge with fully automated infrastructure stems from the variety of different components in the environments -- and all of the interoperability among those components. If you look at just a simple IT stack, there are typically applications on top of virtual machines (VMs), on top of hosts -- which may or may not have independent storage attached -- and then the networking of all these components. That’s not even counting all the different applications and various software components required to run them.

There are just so many opportunities for things to break down. In that context, you need a holistic perspective to begin to realize a world in which that entire unit is managed in a comprehensive way. And so we strive for observability models and services that collect all the data from all of those sources. If we can get that data in one place to look at the interoperability issues, we can follow the dependency chains.

But then you need to add intelligence on top of that, and that intelligence needs to not only understand all of the components and their dependencies, but also what kinds of exceptions can arise and what is important to the end users.

So far, with HPE InfoSight, we pull all of our subject matter expertise into the models and exception-handling automation. We may not necessarily have upfront information about what the most important parts of your environment are. Instead, we can stop and let the user provide some judgment. It’s truly about messaging to the user the different alternative approaches that they can take. As we see exceptions happening, we can provide those recommendations in a clean and interpretable way, so [the end user] can bring context to bear that we don’t necessarily have ourselves.

Gardner: And the timing for these advanced IT operations services is very auspicious. Just as we’re now able to extend intelligence, we’re also at the point where we have end-to-end requirements – from the edge, to the cloud, and back to the data center.

And under such a hybrid IT approach, we are also facing a great need for general digital transformation in businesses, especially as they seek to be agile and best react to the COVID-19 pandemic. Are we able yet to apply HPE InfoSight across such a horizontal architecture problem? How far can it go?

Seeing the future: End-to-end visibility

Mehta: Just to continue from where David started, part of our limitation so far has been from where we began. We started out in storage, and then as Nimble became part of HPE, we expanded it to compute resources. We targeted hypervisors; we are expanding it now to applications. To really fix problems, you need to have end-to-end visibility. And so that is our goal, to analyze, identify, and fix problems end-to-end.

That is one of the axes of development we’re pursuing. The other axis of development is that things are just becoming more-and-more complex. As businesses require their IT infrastructure to become highly adaptable, they also need scalability, self-healing, and enhanced performance. To achieve this, there is greater-and-greater complexity. And part of that complexity has been driven by really poor utilization of resources.

Go back 20 years and we had standalone compute and storage machines that were not individually very well-utilized. Then you had virtualization come along, and virtualization gave you much higher utilization -- but it added a whole layer of complexity. You had one machine, but now you could have 10 VMs in that one place.

Now, we have containers coming out, and that’s going to further increase complexity by a factor of 10. And right on the horizon, we have serverless computing, which will increase the complexity another order of magnitude.

So, the complexity is increasing, the interconnectedness is increasing, and yet the demands on businesses to stay agile and competitive and scalable are also increasing. It’s really hard for IT administrators to stay on top of this. And that’s why you need end-to-end automation and to collect all of the data to actually figure out what is going on. We have a lot of work cut out for us.
 
There is another area of research, and David spends a lot of time working on this, which is you really want to avoid false positives. That is a big problem with lots of tools. They provide so many false positives that people just turn them off. Instead, we need to work through all of your data to actually say, “Hey, this is a recommendation that you really should pay attention to.” That requires a lot of technology, a lot of ML, and a lot of data science experience to separate the wheat from the chaff.

One of the things that’s happened with the COVID-19 pandemic response is the need to respond very quickly. For example, people have had to quickly set up websites for contact tracing, for reporting on the disease, and for vaccine use. That shows how much faster people now need digital solutions -- and it’s just not possible without serious automation.

Gardner: Varun just laid out the complexity and the demands for both the business and the technology. It sounds like a problem that mere mortals cannot solve. So how are we helping those mere mortals to bring AI to bear in a way that allows them to benefit – but, as Varun also pointed out, allows them to trust that technology and use it to its full potential?

Complexity requires automated assistance

Adamson: The point Varun is making is key. If you are talking about complexity, we’re well beyond the point where people could realistically expect to log in to each machine to find, analyze, or manage exceptions that happen across this ever-growing, complex regime.

Even if you’re at a place where you have the observability solved, and you’re monitoring all of these moving parts together in one place -- even then, it easily becomes overwhelming, with pages and pages of dashboards. You couldn’t employ enough people to monitor and act to spot everything that you need to be spotting.

You need to be able to trust automated exception [finding] methods to handle the scope and complexity of what people are dealing with now. So that means doing a few things.

People will often start with naïve thresholds. They create manual thresholds to give alerts to handle really critical issues, such as all the servers went down.

But there are often more subtle issues that show up that you wouldn’t necessarily have anticipated setting a threshold for. Or maybe your threshold isn’t right. It depends on context. Maybe the metrics that you’re looking at are just the raw metrics you’re pulling out of the system and aren’t even the metrics that give a reliable signal.


What we see from the data science side is that a lot of these problems are multi-dimensional. There isn’t just one metric that you could set a threshold on to get a good, reliable alert. So how do you do that right?
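A small Python sketch shows why a single-metric threshold can misfire on a multidimensional problem. It assumes two hypothetical per-drive features scaled to 0..1 (utilization and queue depth) and a made-up labeled set in which only the combination of both being high indicates the issue. A plain logistic regression, trained here with batch gradient descent, stands in for whatever models InfoSight actually uses.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X: List[Sequence[float]], y: List[int],
                   lr: float = 0.5, epochs: int = 5000) -> Tuple[List[float], float]:
    """Batch gradient descent on the average log-loss; returns (weights, bias)."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def predict_proba(w: List[float], b: float, x: Sequence[float]) -> float:
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def naive_alert(x: Sequence[float]) -> bool:
    """Single-metric threshold on utilization alone (hypothetical 0.8 cutoff)."""
    return x[0] > 0.8


# Hypothetical labeled examples: (utilization, queue depth), 1 = the issue.
X = [(0.9, 0.9), (0.8, 0.95), (0.95, 0.85),
     (0.9, 0.1), (0.85, 0.2), (0.1, 0.9), (0.15, 0.8), (0.2, 0.2)]
y = [1, 1, 1, 0, 0, 0, 0, 0]
w, b = train_logistic(X, y)

for x in [(0.9, 0.9), (0.9, 0.1)]:
    print(x, "naive:", naive_alert(x), "model:", predict_proba(w, b, x) > 0.5)
```

The utilization threshold fires on a busy drive with no queueing, while the model, which weighs both metrics together, only flags the combination that the labels associate with the problem.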

For the problems that IT support provides to us, we apply automation and we move down the Pareto chart to solve things in priority of importance. We also turn to ML models. In some of these cases, we can train a model from the installed base and use a peer-learning approach, where we understand the correlations between problem states and indicator variables well enough so that we can identify a root cause for different customers and different issues.

Sometimes, though, if the issue is rare enough, scanning the installed base isn’t going to give us a high enough signal-to-noise ratio. Then we can take some of these curated examples from support and do a semi-supervised loop. We basically say, “We have three examples that are known. We’re going to train a model on them.” Maybe it’s a few tens of thousands of data points, but it’s still only three examples, so there’s co-correlation that we are worried about. 


In that case we say: “Let me go fishing in that installed base with these examples and pull back what else gets flagged.” Then we can turn those back over to our support subject matter experts and say, “Which of these really look right?” And in that way, you can move past the fact that your starting data set of examples is very small and you can use semi-supervised training to develop a more robust model to identify the issues.
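The "go fishing, then ask the experts" loop described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions: each system is a hypothetical feature vector, a nearest-centroid distance stands in for the trained model, and `expert_confirms` is a placeholder for support engineers reviewing each candidate.

```python
import math
from typing import Callable, Dict, List, Sequence, Set

Vector = Sequence[float]


def centroid(vectors: List[Vector]) -> List[float]:
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def distance(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fishing_round(labeled: Dict[str, Vector], pool: Dict[str, Vector], k: int) -> List[str]:
    """Rank unlabeled systems by closeness to the known examples; return the top k."""
    center = centroid(list(labeled.values()))
    candidates = [sid for sid in pool if sid not in labeled]
    candidates.sort(key=lambda sid: distance(pool[sid], center))
    return candidates[:k]


def semi_supervised_loop(seed_ids: List[str], installed_base: Dict[str, Vector],
                         expert_confirms: Callable[[str], bool],
                         rounds: int = 3, k: int = 2) -> Set[str]:
    """Grow a small labeled set by proposing candidates and keeping confirmed ones."""
    labeled = {sid: installed_base[sid] for sid in seed_ids}
    rejected: Set[str] = set()
    for _ in range(rounds):
        pool = {sid: v for sid, v in installed_base.items() if sid not in rejected}
        for sid in fishing_round(labeled, pool, k):
            if expert_confirms(sid):
                labeled[sid] = installed_base[sid]  # confirmed: becomes a training example
            else:
                rejected.add(sid)  # rejected: never proposed again
    return set(labeled)


# Hypothetical installed base: three known examples plus unlabeled systems.
base = {
    "bad1": (1.0, 1.0), "bad2": (0.9, 1.1), "bad3": (1.1, 0.9),
    "bad4": (0.8, 0.9), "bad5": (1.2, 1.0),
    "ok1": (0.1, 0.0), "ok2": (0.0, 0.2), "ok3": (0.5, 0.4), "ok4": (0.2, 0.1),
}
found = semi_supervised_loop(["bad1", "bad2", "bad3"], base,
                             expert_confirms=lambda sid: sid.startswith("bad"))
print(sorted(found))
```

Each confirmed candidate enlarges the labeled set for the next round, which is how a starting set of only a few examples can grow into something a more robust model can be trained on.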

Gardner: As you are refining and improving these models, one of the benefits in being a part of HPE is to access growing data sets across entire industries, regions, and in fact the globe. So, Varun, what is the advantage of being part of HPE and extending those datasets to allow for the budding models to become even more accurate and powerful over time?

Gain a global point of view

Mehta: Being part of HPE has enabled us to leapfrog our competition. As I said, our roots are in storage, but really storage is just the foundation of where things are located in an organization. There is compute, networking, hypervisors, operating systems, and applications. With HPE, we certainly now cover the base infrastructure, which is storage followed by compute. At some point we will bring in networking. We already have hypervisor monitoring, and we are actively working on application monitoring.

HPE has allowed us to radically increase the scope of what we can look at, which also means we can radically improve the quality of the solutions we offer to our customers. And so it’s been a win-win solution, both for HPE where we can offer a lot of different insights into our products, and for our customers where we can offer them faster solutions to more kinds of problems.

Gardner: David, anything more to offer on the depth, breadth, and scope of data as it’s helping you improve the models?

Adamson: I certainly agree with everything that Varun said. The one thing I might add is in the feedback we’ve received over time. And that is, one of the key things in making the notifications possible is getting us as close as possible to the customer experience of the applications and services running on the infrastructure.

We’ve done a lot of work to make sure we identify what look like meaningful problems. But we’re fundamentally limited if the scope of what we measure is only at the storage or hypervisor layer. So gaining additional measurements from the applications themselves is going to give us the ability to differentiate ourselves, to find the important exceptions to the end user, what they really want to take action on. That’s critical for us -- not sending people alerts they are not interested in but making sure we find the events that are truly business-critical.
 

Gardner: And as we think about the extensibility of the solution -- extending past storage into compute, ultimately networking, and applications -- there is the need to deal with the heterogeneity of architecture. So multicloud, hybrid cloud, edge-to-cloud, and many edges to cloud. Has HPE InfoSight been designed in a way to extend it across different IT topologies?

Across all architecture

Mehta: At heart, we are building a big data warehouse. You know, part of the challenge is that we’ve had this explosion in the amount of data that we can bring home. For the last 10 years, since InfoSight was first developed, the tools have gotten a lot more powerful. What we now want to do is take advantage of those tools so we can bring in more data and provide even better analytics.

The first step is to deal with all of these use cases. Beyond that, there will probably be custom solutions. For example, you talked about edge-to-cloud. There will be locations where you have good bandwidth, such as a colocation center, and you can send back large amounts of data. But if you’re sitting as the only compute in a large retail store like a Home Depot, for example, or a McDonald’s, then the bandwidth back is going to be limited. You have to live within that and still provide effective monitoring. So I’m sure we will have to make some adjustments as we widen our scope, but the key is having a really strong foundation and that’s what we’re working on right now.

Gardner: David, anything more to offer on the extensibility across different types of architecture, of analyzing the different sources of analytics?

Adamson: Yes, originally, when we were storage-focused and grew to the hypervisor level, we discovered some things about the way we keep our data organized. If we made it more modular, we could make it easier to write simple rules and build complex models to keep turnaround time fast. We developed some experience and so we’ve taken that and applied it in the most recent release of recommendations into our customer portal.


We’ve modularized our data model even further to help us support more use cases from environments that may or may not have specific components. Historically, we’ve relied on having Nimble Storage as a hub where everything is collected. But we can’t rely on that anymore. We want to be able to monitor environments that don’t necessarily have that particular storage device, and we may have to support various combinations of HPE products and other non-HPE applications.

Modularizing our data model to truly accommodate that is a path we’ve started down, and I think we’re making good strides.

The other piece is in terms of the data science. We’re trying to leverage longitudinal data as much as possible, but we want to make sure we have a sufficient set of meaningful ML offerings. So we’re looking at unsupervised learning capabilities that we can apply to environments for which we don’t have a critical mass of data yet, especially as we onboard monitoring for new applications. That’s been quite exciting to work on.

Gardner: We’ve been talking a lot about the HPE InfoSight technology, but there also have to be considerations for culture. A big part of digital transformation is getting silos between people broken down.

Is there a cultural silo between the data scientists and the IT operations people? Are we able to get the IT operations people to better understand what data science can do for them and their jobs? And perhaps, also allow the data scientists to understand the requirements of a modern, complex IT operations organization? How is it going between these two groups, and how well are they melding?

IT support and data science team up

Adamson: One of the things that Nimble did well from the get-go was have tight coupling between the IT support engineers and the data science team. The support engineers were fielding the calls from the IT operations guys. They had their fingers on the pulse of what was most important. That meant not only building features that would help our support engineers solve their escalations more quickly, but also things that we can productize for our customers to get value from directly.

Gardner: One of the great ways for people to better understand a solution approach like HPE InfoSight is through examples. Do we have any instances that help people understand what it can do, but also the paybacks? Do we have metrics of success when it comes to employing HPE InfoSight in a complex IT operations environment?

Mehta: One of the examples I like to refer to was fairly early in our history but had a big impact. It was at the University Hospital of Basel in Switzerland. They had installed a new version of VMware, and a few weeks afterward things started going horribly wrong with their implementation that included a Nimble Storage device. They called VMware and VMware couldn’t figure it out. Eventually they called our support team and using InfoSight, our support team was able to figure it out really quickly. The problem turned out to be a result of a new version of VMware. If there was a hold up in the networking, some sort of bottleneck in their networking infrastructure, this VMware version would try really hard to get the data through.

So instead of submitting each write to the storage array once, it would try 64 times. Suddenly, their traffic went up by 64 times. There was a lot of pounding on the network, pounding on the storage system, and we were able to tell with our analytics that, “Hey, this traffic is going up by a huge amount.” As we tracked it back, it pointed to the new version of VMware that had been loaded. We then connected with the VMware support team and worked very closely with all of our partners to identify this bug, which VMware very promptly fixed. But, as you know, it takes time for these fixes to roll out to the field.
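The arithmetic behind "traffic went up by 64 times" is easy to sketch: compare average traffic after a change, such as a hypervisor upgrade, with the baseline before it. The Python below is only an illustration of that comparison; the sample values, the change index, and the 10x alert threshold are hypothetical, and this is not a description of how InfoSight actually detected the problem.

```python
from statistics import mean
from typing import List


def amplification_ratio(series: List[float], change_index: int) -> float:
    """Mean traffic after a change divided by mean traffic before it."""
    before = series[:change_index]
    after = series[change_index:]
    if not before or not after:
        raise ValueError("need samples on both sides of the change")
    baseline = mean(before)
    if baseline == 0:
        raise ValueError("baseline traffic is zero")
    return mean(after) / baseline


def flag_amplification(series: List[float], change_index: int,
                       threshold: float = 10.0) -> bool:
    """Flag the change if traffic grew by at least `threshold` times (hypothetical cutoff)."""
    return amplification_ratio(series, change_index) >= threshold


# Hypothetical write operations per second; the upgrade lands at index 4 and
# every write is now retried 64 times.
writes = [100, 102, 98, 100, 6400, 6400, 6400]
print(amplification_ratio(writes, 4))  # 64.0
print(flag_amplification(writes, 4))   # True
```

Tying the jump to the timestamp of a software change is what lets the analysis point back at the upgrade rather than at the storage array itself.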

We were able to preemptively alert other people who had the same combination of VMware on Nimble Storage and say, “Guys, you should either upgrade to this new patch that VMware has made or just be aware that you are susceptible to this problem.”

So that’s a great example of how our analytics was able to find a problem, get it fixed very quickly -- quicker than any other means possible -- and then prevent others from seeing the same problem.

Gardner: David, what are some of your favorite examples of demonstrating the power and versatility of HPE InfoSight?

Adamson: One that comes to mind was the first time we turned to an exception-based model that we had to train. We had been building infrastructure designed to learn across our installed base to find common resource bottlenecks and identify and rank those very well. We had that in place, but we came across a problem that support was trying to write a signature for. It was basically a drive bandwidth issue.

But we were having trouble writing a signature that would identify the issue reliably. We had to turn to an ML approach because it was fundamentally a multidimensional problem. Looking across the data, we had probably 10 to 20 different metrics that we tracked per drive per minute on each system. From those metrics, we needed to come up with a good understanding of the probability that this was the biggest bottleneck on the system. This was not a problem we could solve by just setting a threshold.

So we had to really go in and say, “We’re going to label known examples of these situations. We’re going to build the sort of tooling to allow us to do that, and we’re going to put ourselves in a regime where we can train on these examples and initiate that semi-supervised loop.”

We actually had two to three customers that hit that specific issue. By the time we wanted to put that in place, we were able to find a few more just through modeling. But that set us up to start identifying other exceptions in the same way.

We’ve been able to redeploy that pattern now several times to several different problems and solve those issues in an automated way, so we don’t have to keep diagnosing the same known flavors of problems repeatedly in the future.

Gardner: What comes next? How will AI impact IT operations over time? Varun, why are you optimistic about the future?

Software eats the world 

Mehta: I think having a machine in the loop is going to be required. As I pointed out earlier, complexity is increasing by leaps and bounds. We are going from virtualization to containers to serverless. The number of applications keeps increasing and demand on every industry keeps increasing. 

Marc Andreessen, co-founder of the famous venture capital firm Andreessen Horowitz, once said, “Software is eating the world,” and really, it is true. Everything is becoming tied to a piece of software. The complexity of that is just huge. The only way to manage this and make sure everything keeps working is to use machines.

That’s where the challenge and opportunity is. Because there is so much to keep track of, one of the fundamental challenges is to make sure you don’t have too many false positives. You want to make sure you alert only when there is a need to alert. It is an ongoing area of research.

There’s a big future in terms of the need for our solutions. There’s plenty of work to keep us busy to make sure we provide the appropriate solutions. So I’m really looking forward to it.


There’s also another axis to this. So far, people have stayed in the monitoring and analytics loop and it’s like self-driving cars. We’re not yet ready for machines to take over control of our cars. We get plenty of analytics from the machines. We have backup cameras. We have radars in front that alert us if the car in front is braking too quickly, but the cars aren’t yet driving themselves.

It’s all about analytics, yet we haven’t graduated from analytics to control. I think that too is something that you can expect to see in the future of AIOps once the analytics get really good, and once the false positives go away. You will see things moving from analytics to control. So lots of really cool stuff ahead of us in this space.

Gardner: David, where do you see HPE InfoSight becoming more of a game changer and even transforming the end-to-end customer experience where people will see a dramatic improvement in how they interact with businesses?

Adamson: Our guiding light in terms of exception handling is making sure that not only are we providing ML models that have good precision and recall, but we’re making recommendations and statements in a timely manner that come only when they’re needed -- regardless of the complexity.
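Precision and recall are the standard way to quantify the false-positive concern raised earlier: precision is the share of alerts that point at real issues, and recall is the share of real issues that produced an alert. A minimal Python sketch, using hypothetical system IDs:

```python
from typing import Set, Tuple


def precision_recall(alerted: Set[str], real_issues: Set[str]) -> Tuple[float, float]:
    """Precision = true alerts / all alerts; recall = true alerts / all real issues."""
    true_positives = len(alerted & real_issues)
    precision = true_positives / len(alerted) if alerted else 0.0
    recall = true_positives / len(real_issues) if real_issues else 0.0
    return precision, recall


# Hypothetical week: four alerts sent, three systems actually had problems.
p, r = precision_recall({"s1", "s2", "s3", "s4"}, {"s1", "s2", "s5"})
print(p, r)
```

Here half of the alerts were noise (precision 0.5) and one real issue went unreported (recall of two out of three). Low precision is what leads people to turn alerting tools off, while low recall leaves problems unseen.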

A lot of hard work is being put into making sure we make those recommendation statements as actionable and standalone as possible. We’re building a differentiator through the fact that we maintain a focus on delivering a clean narrative, a very clear-cut, “human readable text” set of recommendations. 

And that has the potential to save a lot of people a lot of time in terms of hunting, pecking, and worrying about what’s unseen and going on in their environments.

Gardner: Varun, how should enterprise IT organizations prepare now for what’s coming with AIOps and automation? What might they do to be in a better position to leverage and exploit these technologies even as they evolve?

Pick up new tools

Mehta: My advice to organizations is to buy into this. Automation is coming. Too often we see people stuck in the old ways of doing things. They could potentially save themselves a lot of time and effort by moving to more modern tools. I recommend that IT organizations make use of the new tools that are available.

HPE InfoSight is generally available for free when you buy an HPE product, sometimes with only the support contract. So make use of the resources. Look at the literature on HPE InfoSight. It is one of those fire-and-forget tools: you turn it on and then you don’t have to worry about it anymore.

It’s the best kind of tool because we will come back to you and tell you if there’s anything you need to be aware of. So that would be the primary advice I would have, which is to get familiar with these automation tools and analytics tools and start using them.

Gardner: I’m afraid we’ll have to leave it there. We have been exploring how HPE InfoSight has emerged as a broad and inclusive capability for AIOps across an expanding array of edge-to-cloud solutions. And we’ve learned how these expanding AIOps capabilities are helping companies deliver increased agility -- and even accelerated digital transformation.


So please join me in thanking our guests, Varun Mehta, Vice President and General Manager for InfoSight at HPE and a founder of Nimble Storage. Thanks so much, Varun.

Mehta: Thank you, Dana.

Gardner: And we’ve also been here with David Adamson, Machine Learning Architect at HPE. Thanks so much, David.

Adamson: Thank you. It’s been a pleasure.

Gardner: And a big thank you as well to our audience for joining this sponsored BriefingsDirect AIOps innovation discussion. I’m Dana Gardner, Principal Analyst at Interarbor Solutions, your host for this ongoing series of Hewlett Packard Enterprise-supported discussions.

Thanks again for listening. Please pass this along to your IT community, and do come back next time.


A discussion on how HPE InfoSight has emerged as a broad and inclusive capability for AIOps across an expanding array of HPE products and services. Copyright Interarbor Solutions, LLC, 2005-2020. All rights reserved.
