
Wednesday, February 22, 2017

How Development and Management of Modern Applications Benefits from Data-Driven Continuous Intelligence

Transcript of a discussion on how modern applications are different, and what data and insight are needed to make them more robust, agile and responsive.

Listen to the podcast. Find it on iTunes. Get the mobile app. Download the transcript. Sponsor: Sumo Logic.

Dana Gardner: Welcome to the next edition of BriefingsDirect. I’m Dana Gardner, Principal Analyst at Interarbor Solutions, your host and moderator.

Today, more than ever, how a company's applications perform equates with how the company itself performs and is perceived. From airlines to retail, from finding cabs to gaming, how the applications work deeply impacts how the business processes and business outcomes work.

We’ll now explore how new levels of insight and intelligence into what really goes on underneath the covers of modern applications ensure that apps are built, deployed, and operated properly.

A new breed of continuous intelligence emerges by gaining data from systems infrastructure logs -- either on-premises or in the cloud -- and then cross-referencing that with intrinsic business metrics information.
We’re here with an executive from Sumo Logic to learn how modern applications are different, what's needed to make them robust and agile, and how the right mix of data, metrics and machine learning provides the means to make and keep apps operating better than ever.

With that, please join me in welcoming our guest, Ramin Sayar, President and CEO of Sumo Logic. Welcome to BriefingsDirect, Ramin.

Ramin Sayar: Thank you very much, Dana. I appreciate it.

Gardner: There’s no doubt that the apps make the company, but what is it about modern applications that makes them so difficult to really know? How is that different from the applications we were using 10 years ago?

Sayar: You hit it on the head a little bit earlier. This notion of always-on, always-available, always-accessible applications, whether delivered through rich web and mobile interfaces or through traditional mechanisms served up on laptops, other access points, and point-of-sale systems, is driving the next wave of technology architecture supporting these apps.

These modern apps are around a modern stack, and so they’re using new platform services that are created by public-cloud providers, they’re using new development processes such as agile or continuous delivery, and they’re expected to constantly be learning and iterating so they can improve not only the user experience -- but the business outcomes.

Gardner: Of course, developers and business leaders are under pressure, more than ever before, to put new apps out more quickly, and to then update and refine them on a continuous basis. So this is a never-ending process.

User experience

Sayar: You’re spot on. The obvious benefit of always-on is the rich user interaction and user experience. So, while a lot of the conversation around modern apps tends to focus on the technology and the components, there are actually fundamental challenges in the process of how these new apps are built and managed on an ongoing basis, and in what implications that has for security. A lot of times, those two aspects are left out when people discuss modern apps.

Gardner: That's right. We’re talking so much about DevOps these days, but in the same breath, we’re talking about SecOps -- security and operations. They’re really joined at the hip.

Sayar: Yes, they’re starting to blend. You’re seeing the technology decisions around public cloud, Docker and containers, and microservices and APIs being led not only by developers or DevOps teams. They’re heavily influenced by, and made in partnership with, the SecOps and security teams and CISOs, because the data is distributed. Now there needs to be better visibility and instrumentation, not just for the access logs, but for the business process and a holistic view of the service and service-level agreements (SLAs).

Gardner: What’s different from say 10 years ago? Distributed used to mean that I had, under my own data-center roof, an application that would be drawing from a database, using an application server, perhaps a couple of services, but mostly all under my control. Now, it’s much more complex, with many more moving parts.

Sayar: We like to look at the evolution of these modern apps. For example, a lot of our customers have traditional monolithic apps that follow the more traditional waterfall approach for iterating and release. Often, those are run on bare-metal physical servers, or possibly virtual machines (VMs). They are simple, three-tier web apps.

We see one of two things happening. The first is that there is a need to replace the front end of those apps, and we refer to those as brownfield. They start to change from waterfall to agile and they start to have more of an N-tier feel, but it's really more around the front end. Web properties are a good example of that. And they start to componentize pieces of their apps, either on VMs or in private clouds, and that's often good for existing types of workloads.

The other big trend is this new way of building apps, what we call greenfield workloads, versus the brownfield workloads, and those take a fundamentally different approach.

Often it's centered on new technology -- a stack built entirely on microservices, an API-first development methodology, modern container technologies like Docker, Mesosphere, and CoreOS, and public-cloud infrastructure and services from Amazon Web Services (AWS) or Microsoft Azure. As a result, the technology decisions made there require different skill sets and teams to come together to be able to deliver on the DevOps and SecOps processes that we just mentioned.

Gardner: Ramin, it’s important to point out that we’re not just talking about public-facing business-to-consumer (B2C) apps, not that those aren't important, but we’re also talking about all those very important business-to-business (B2B) and business-to-employee (B2E) apps. I can't tell you how frustrating it is when you get on the phone with somebody and they say, “Well, I’ll help you, but my app is down,” or the data isn’t available. So this is not just for the public facing apps, it's all apps, right?

It's a data problem

Sayar: Absolutely. Regardless of whether it's enterprise or consumer, and whether it's mid-market, small and medium business (SMB), or enterprise that you're building these apps for, what we see from our customers is that they all have a similar challenge: they're trying to deal with the volume, the velocity, and the variety of the data around these new architectures, and with how to grapple with it and get their hands around it. At the end of the day, it becomes a data problem, not just a process or technology problem.

Gardner: Let's talk about the challenges then. If we have many moving parts, if we need to do things faster, if we need to consider the development lifecycle and processes as well as ongoing security, if we’re dealing with outside third-party cloud providers, where do we go to find the common thread of insight, even though we have more complexity across more organizational boundaries?

Sayar: From a Sumo Logic perspective, we’re trying to provide full-stack visibility, not only from code and your repositories like GitHub or Jenkins, but all the way through the components of your code, to API calls, to what your deployment tools are used for in terms of provisioning and performance.

We spend a lot of effort integrating with the various DevOps tool-chain vendors, as well as providing a holistic view of what users are doing in terms of access to those applications and services. We know who has checked in which code on which branch, and which build created potential issues for performance, latency, or an outage. So we give you that 360-degree view by providing that full-stack set of capabilities.
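
As a rough illustration of that kind of correlation -- only an illustration, since the event shapes, names, and threshold below are assumptions rather than anything from Sumo Logic -- you could imagine matching recent build events against a later spike in error counts like this:

```python
from datetime import datetime, timedelta

# Hypothetical inputs: build/deploy events from a CI tool and per-minute error counts from logs
deploys = [
    {"build": "web-frontend#482", "branch": "release/2.4", "at": datetime(2017, 2, 20, 14, 5)},
    {"build": "api-gateway#91", "branch": "master", "at": datetime(2017, 2, 20, 15, 40)},
]
error_counts = {
    datetime(2017, 2, 20, 14, 10): 12,
    datetime(2017, 2, 20, 15, 42): 310,
}

def suspect_deploys(deploys, error_counts, threshold=100, window_minutes=30):
    """Return deploys that were followed by an error spike within the given window."""
    suspects = []
    for when, count in error_counts.items():
        if count < threshold:
            continue  # not a spike
        for deploy in deploys:
            if timedelta(0) <= when - deploy["at"] <= timedelta(minutes=window_minutes):
                suspects.append((deploy["build"], deploy["branch"], when, count))
    return suspects

print(suspect_deploys(deploys, error_counts))
```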

Gardner: So, the more information the better, no matter where in the process, no matter where in the lifecycle. But then, that adds its own level of complexity. I wonder is this a fire-hose approach or boiling-the-ocean approach? How do you make that manageable and then actionable?

Sayar: We’ve invested quite a bit of our intellectual property (IP) not only in providing integration with these various sources of data, but also in the machine learning and algorithms, so that we can take advantage of the architecture of being a true cloud-native, multitenant, fast, and simple solution.

So, unlike others that are out there and available for you, Sumo Logic's architecture is truly cloud native and multitenant, but it's centered on the principle of near real-time data streaming.

As the data is coming in, our data-streaming engine allows developers, IT ops administrators, sys admins, and security professionals to have their own view, coarse-grained or fine-grained, through the access controls we have in the system, so they can leverage the same data for different purposes, versus having to wait for someone to create a dashboard, create a view, or grant access to a system when something breaks.

Gardner: That’s interesting. Having been in the industry long enough, I remember when logs basically meant batch. You'd get a log dump, and then you would do something with it. That would generate a report, many times with manual steps involved. So what's the big step to going to streaming? Why is that an essential part of making this so actionable?

Sayar: It’s driven by the architectures and the applications. No longer is it acceptable to look at samples of data that span 5 or 15 minutes. You need the real-time data, sub-second, millisecond latency, to be able to understand causality, and to know when you’re facing a potential threat, risk, or security concern versus code-quality issues that are causing potential performance outages and therefore business impact.

The old way was to hope and pray that, when you deployed code, you would find out about problems only when a user complained. That is no longer acceptable. You lose business and credibility, and at the end of the day, there’s no real way to hold developers, operations folks, or security folks accountable, because of the legacy tools and process approach.

Center of the business

Those expectations have changed because of the consumerization of IT and the fact that apps are the center of the business, as we’ve talked about. What we really do is provide a simple way to analyze the metadata coming in, and very simple access through APIs or through our user interfaces, based on your role, so that issues can be addressed proactively.

Conceptually, there’s this notion of wartime and peacetime as we’re building and delivering our service. We look at the problems that users -- customers of Sumo Logic, and teams internally here at Sumo Logic -- run into, and then we break that down into a lifecycle centered on this concept of peacetime and wartime.

Peacetime is when nothing is wrong, but you want to stay ahead of issues and you want to be able to proactively assess the health of your service, your application, your operational level agreements, your SLAs, and be notified when something is trending the wrong way.

Then, there's this notion of wartime, and wartime is all hands on deck. Instead of being alerted 15 minutes or an hour after an outage has happened or a security risk or threat has been discovered, the real-time data-streaming engine notifies people instantly: you're getting PagerDuty alerts, you're getting Slack notifications. It's no longer the traditional helpdesk notification process where people get on bridge lines.

Because the teams are often distributed, and responsibility and ownership for identifying an issue in wartime are shared, we're enabling collaboration and new ways of collaborating by leveraging integrations to things like Slack and PagerDuty notification systems through the real-time platform we've built.

So, the always-on expectations that customers and consumers have for applications have now been extended to always-available development and security resources that can address problems proactively.
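
As a rough sketch of how that kind of wartime fan-out could be wired up -- the webhook URL and routing key below are placeholders, and this is a generic illustration rather than Sumo Logic's own integration code -- an alert on a streaming error-rate metric might be pushed to Slack and PagerDuty like this:

```python
import requests

SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/XXX/YYY/ZZZ"  # placeholder
PAGERDUTY_ROUTING_KEY = "your-pagerduty-integration-key"            # placeholder

def notify_wartime(service, error_rate, threshold=0.05):
    """Push an alert to Slack and PagerDuty when a service's error rate crosses a threshold."""
    if error_rate < threshold:
        return  # peacetime: nothing to do
    message = f"{service}: error rate {error_rate:.1%} exceeded {threshold:.0%}"
    # Slack incoming webhook: a simple JSON POST
    requests.post(SLACK_WEBHOOK_URL, json={"text": message}, timeout=5)
    # PagerDuty Events API v2: trigger an incident
    requests.post(
        "https://events.pagerduty.com/v2/enqueue",
        json={
            "routing_key": PAGERDUTY_ROUTING_KEY,
            "event_action": "trigger",
            "payload": {"summary": message, "source": service, "severity": "critical"},
        },
        timeout=5,
    )

notify_wartime("checkout-api", error_rate=0.12)
```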

Gardner: It sounds like we're able to not only take the data and information in real time from the applications to understand what’s going on with the applications, but we can take that same information and start applying it to other business metrics, other business environmental impacts that then give us an even greater insight into how to manage the business and the processes. Am I overstating that or is that where we are heading here?

Sayar: That’s exactly right. The essence of what we provide is a single platform and service that leverages machine logs and time-series data and eliminates a lot of the complexity that exists in traditional processes and tools. No longer do you need to do "swivel-chair" correlation, looking across multiple UIs, tools, and products. No longer do you have to wait for the helpdesk person to notify you. We're trying to provide that instant knowledge and collaboration through the real-time data-streaming platform we've built, to bring teams together rather than divide them.

Gardner: That sounds terrific if I'm the IT guy or gal, but why should this be of interest to somebody higher up in the organization, at a business process, even at a C-table level? What is it about continuous intelligence that cannot only help apps run on time and well, but help my business run on time and well?

Need for agility

Sayar: We talked a little bit about the whole need for agility. From a business point of view, the line-of-business folks who are associated with any of these greenfield projects or apps want to be able to increase the cycle times of application delivery. They want measurable results for application or web changes, so they can see whether their web properties have increased or decreased user satisfaction or, at the end of the day, business revenue.

So, we're able to help the developers, the DevOps teams, and ultimately, line of business deliver on the speed and agility needs for these new modes. We do that through a single comprehensive platform, as I mentioned.

At the same time, what’s interesting here is that no longer is security an afterthought. No longer is security in the back room trying to figure out when a threat or an attack has happened. Security has a seat at the table in a lot of boardrooms, and more importantly, in a lot of strategic initiatives for enterprise companies today.

At the same time that we're helping with agility, we're also helping with prevention. And so a lot of our customers often start with the security teams that are looking for a new way to inspect this volume of data that’s coming in -- not only at the infrastructure level or the end-user level, but at the application and code level. What we're really able to do, as I mentioned earlier, is provide a unifying approach to bring these disparate teams together.
Gardner: And yet individuals can extract the intelligence view that best suits what their needs are in that moment.

Sayar: Yes. And ultimately what we're able to do is improve customer experience, increase revenue-generating services, increase the efficiency and agility of delivering quality code and therefore quality applications, and lastly, improve collaboration and communication.

Gardner: I’d really like to hear some real world examples of how this works, but before we go there, I’m still interested in the how. As to this idea of machine learning, we're hearing an awful lot today about bots, artificial intelligence (AI), and machine learning. Parse this out a bit for me. What is it that you're using machine learning  for when it comes to this volume and variety in understanding apps and making that useable in the context of a business metric of some kind?

Sayar: This is an interesting topic, because there's a lot of noise in the market around big data, machine learning, and advanced analytics. Since Sumo Logic was started six years ago, we've built this platform to ensure not only that we have best-in-class security and encryption capabilities, but that it's centered on the fundamental purpose of democratizing analytics -- making it simpler to allow more than just a subset of folks to get access to information for their roles and responsibilities, whether they're on security, ops, or development teams.

To answer your question a little more succinctly, our platform is predicated on multiple levels of machine-learning and analytics capabilities. Starting at the lowest level, something that we refer to as LogReduce is meant to separate the signal from the noise. Ultimately, it helps a lot of our users and customers reduce mean time to identification by upwards of 90 percent, because they're not searching the irrelevant data. They're searching the relevant data -- the events that are infrequent or not really known -- versus what's constantly occurring in their environment.

In doing so, it’s not just about mean time to identification, but it’s also how quickly we're able to respond and repair. We've seen customers using LogReduce reduce the mean time to resolution by upwards of 50 percent.
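
LogReduce itself is proprietary, but the general idea -- collapsing many raw log lines into a few recurring signatures so that the rare ones stand out -- can be sketched naively like this (a toy illustration under our own assumptions, not Sumo Logic's algorithm):

```python
import re
from collections import Counter

def signature(line):
    """Collapse variable tokens (hex ids, IPs, numbers) so similar lines share one signature."""
    line = re.sub(r"\b0x[0-9a-fA-F]+\b", "<HEX>", line)
    line = re.sub(r"\b\d{1,3}(?:\.\d{1,3}){3}\b", "<IP>", line)
    line = re.sub(r"\b\d+\b", "<NUM>", line)
    return line

logs = [
    "GET /checkout 200 35ms from 10.0.0.12",
    "GET /checkout 200 41ms from 10.0.0.97",
    "GET /checkout 200 38ms from 10.0.0.44",
    "payment handler crashed: null pointer at 0x7f3a",
]

counts = Counter(signature(line) for line in logs)
# Rare signatures are the interesting signal; frequent ones are background noise
for sig, n in sorted(counts.items(), key=lambda kv: kv[1]):
    print(n, sig)
```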

Predictive capabilities

Our core analytics, at the lowest level, helps deliver operational metrics and value. Then, we start to become less reactive. When you've had an outage or a security threat, you start to leverage some of the other predictive capabilities in our stack.

For example, I mentioned this concept of peacetime and wartime. In peacetime, you're looking at changes over time as you deploy code and/or applications to various geographies and locations. A lot of times, developers and ops folks who use Sumo want to use our log-compare or outlier-predictor operators, which are part of our machine-learning capabilities, to compare differences between branches of code, and the quality of that code, with respect to the performance and availability of the service and app.

We allow them, with the click of a button, to compare a window of events and metrics for the last hour, day, week, or month against other time slices of data and show how much better or worse it is. This is before deploying to production. When they look at production, we allow them to use predictive analytics to look at anomalies and abnormal behavior and get more proactive.
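
For a feel of what that window-over-window comparison looks like -- a simplified stand-in for the operators just mentioned, with made-up numbers -- you could compare the current hour of error counts against the same hour from the previous day and flag the minutes that deviate:

```python
from statistics import mean, stdev

# Hypothetical per-minute error counts: baseline (same hour yesterday) vs. the current hour
baseline = [4, 5, 3, 6, 4, 5, 4, 3, 5, 4]
current = [5, 4, 6, 5, 21, 24, 26, 23, 25, 22]

def compare_windows(current, baseline, z_threshold=3.0):
    """Flag minutes in the current window that deviate strongly from the baseline window."""
    mu, sigma = mean(baseline), stdev(baseline)
    return [(i, x) for i, x in enumerate(current) if sigma and (x - mu) / sigma > z_threshold]

print(compare_windows(current, baseline))  # the later minutes stand out as anomalous
```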

So, reactive, to proactive, all the way to predictive is the philosophy that we've been trying to build in terms of our analytics stack and capabilities.

Gardner: How are some actual customers using this and what are they getting back for their investment?

Sayar: We have customers that span retail and e-commerce, high-tech, media, entertainment, travel, and insurance. We're well north of 1,200 unique paying customers, and they span anyone from Airbnb, Anheuser-Busch, Adobe, Metadata, Marriott, Twitter, Telstra, Xora -- modern companies as well as traditional companies.

What do they all have in common? Often, what we see is a digital transformation project or initiative. They either have to build greenfield or brownfield apps and they need a new approach and a new service, and that's where they start leveraging Sumo Logic.

Second, what we see is that it’s not always a digital transformation; it's often a cost-reduction and/or consolidation project. Consolidation could be of tools, infrastructure, or data centers, or it could be a migration to co-los or public-cloud infrastructures.

The nice thing about Sumo Logic is that we can connect anything from your top-of-rack switch, to your discrete storage arrays, to network devices, operating systems, and middleware, through to your content-delivery network (CDN) providers and your public-cloud infrastructures.

Whether it's a migration or a consolidation project, we’re able to help them compare performance and availability, the SLAs they have associated with those, as well as differences in the delivery of infrastructure services to developers or users.

So whether it's agility-driven or cost-driven, Sumo Logic is very relevant for all these customers that are spanning the data-center infrastructure consolidation to new workload projects that they may be building in private-cloud or public-cloud endpoints.

Gardner: Ramin, how about a couple of concrete examples of what you were just referring to.

Cloud migration

Sayar: One good example is in the media space or media and entertainment space, for example, Hearst Media. They, like a lot of our other customers, were undergoing a digital-transformation project and a cloud-migration project. They were moving about 36 apps to AWS and they needed a single platform that provided machine-learning analytics to be able to recognize and quickly identify performance issues prior to making the migration and updates to any of the apps rolling over to AWS. They were able to really improve cycle times, as well as efficiency, with respect to identifying and resolving issues fast.

Another example would be JetBlue. We do a lot in the travel space. JetBlue is another AWS and cloud customer. They provide a lot of in-flight entertainment to their customers. They wanted to be able to look at the service quality for the revenue model of the in-flight entertainment system and be able to ascertain what movies are being watched, what the quality of service is, whether it's being degraded, and whether they're having to charge customers more than once for any type of service outage. That’s how they're using Sumo Logic to better assess and manage customer experience. It's not too dissimilar from Alaska Airlines or others that are also providing in-flight notification and wireless types of services.

The last one is someone that we're all pretty familiar with and that’s Airbnb. We're seeing a fundamental disruption in the travel space and how we reserve hotels or apartments or homes, and Airbnb has led the charge, like Uber in the transportation space. In their case, they're taking a lot of credit-card and payment-processing information. They're using Sumo Logic for payment-card industry (PCI) audit and security, as well as operational visibility in terms of their websites and presence.

Gardner: It’s interesting. Not only are you giving them benefits along insight lines, but it sounds to me like you're giving them a green light to go ahead and experiment and then learn very quickly whether that experiment worked or not, so that they can refine. That’s so important in our digital business and agility drive these days.

Sayar: Absolutely. And if I were to think of another interesting example, Anheuser-Busch is another one of our customers. In this case, the CISO wanted to have a new approach to security and not one that was centered on guarding the data and access to the data, but providing a single platform for all constituents within Anheuser-Busch, whether security teams, operations teams, developers, or support teams.

We did a pilot for them, and as they're modernizing a lot of their apps, as they start to look at the next generation of security analytics, the adoption of Sumo started to become instant inside AB InBev. Now, they're looking at not just their existing real estate of infrastructure and apps for all these teams, but they're going to connect it to future projects such as the Connected Path, so they can understand what the yield is from each pour in a particular keg in a location and figure out whether that’s optimized or when they can replace the keg.

So, you're going from a reactive approach for security and processes around deployment and operations to next-gen connected Internet of Things (IoT) and devices to understand business performance and yield. That's a great example of an innovative company doing something unique and different with Sumo Logic.

Gardner: So, what happens as these companies modernize and they start to avail themselves of more public-cloud infrastructure services, ultimately more-and-more of their apps are going to be of, by, and for somebody else’s public cloud? Where do you fit in that scenario?

Data source and location

Sayar: Whether you’re running on-premises, in co-los, through CDN providers like Akamai, on AWS, Azure, or Heroku, or on SaaS platforms, we provide a single platform that can manage and ingest all that data for you. Interestingly enough, about half of our customers’ workloads run on-premises and half run in the cloud.

We’re agnostic to where the data is or where their applications or workloads reside. The benefit we provide is the single ubiquitous platform for managing the data streams that are coming in from devices, from applications, from infrastructure, from mobile to you, in a simple, real-time way through a multitenant cloud service.

Gardner: This reminds me of what I heard, 10 or 15 years ago about business intelligence (BI), drawing data, analyzing it, making it close to being proactive in its ability to help the organization. How is continuous intelligence different, or even better, and something that would replace what we refer to as BI?

Sayar: The issue that we faced with the first generation of BI was that it was very rear-view-mirror-centric, meaning that it was looking at data and events in the past. Where we're at today, with this need for speed and the necessity to be always on and always available, the expectation is sub-millisecond latency to understand what's going on, from a security, operational, or user-experience point of view.

I'd say that we're on V2, or the next generation, of what was traditionally called BI, and we refer to that as continuous intelligence, because you're continuously adapting and learning. It's not only based on what humans know and the rules and correlations that they try to presuppose, creating alarms and filters around them. It's also what machines and machine intelligence need to supplement that with to provide the best-in-class type of capability, which is what we refer to as continuous intelligence.

Gardner: We’re almost out of time, but I wanted to look to the future a little bit. Obviously, there's a lot of investing going on now around big data and analytics as it pertains to many different elements of many different businesses, depending on their verticals. Then, we're talking about some of the logic benefit and continuous intelligence as it applies to applications and their lifecycle.

Where do we start to see crossover between those? How do I leverage what I’m doing in big data generally in my organization and more specifically, what I can do with continuous intelligence from my systems, from my applications?

Business Insights

Sayar: We touched a little bit on that in terms of the types of data that we integrate and ingest. At the end of the day, when we talk about full-stack visibility, it's from everything with respect to providing business insights to operational insights, to security insights.

We have some customers that are in credit-card payment processing, and they actually use us to understand activations for credit cards, so they're extracting value from the data coming into Sumo Logic to understand and predict business impact and relevant revenue associated with these services that they're managing; in this case, a set of apps that run on a CDN.

At the same time, the fraud and risk team are using us for threat and prevention. The operations team is using us for understanding identification of issues proactively to be able to address any application or infrastructure issues, and that’s what we refer to as full stack.

Full stack isn’t just the technology; it's providing business visibility and insights to line-of-business users, or users looking at metrics around user experience and service quality, through operational-level insights that help you become more proactive, or in some cases reactive, to wartime issues, as we've talked about. And lastly, it helps the security team take a different security posture, both reactive and proactive, around threat detection and risk.

In a nutshell, where we see these things starting to converge is what we refer to as full stack visibility around our strategy for continuous intelligence, and that is technology to business to users.
Gardner: I’m afraid we will have to leave it here. You've been listening to a sponsored BriefingsDirect discussion on how modern applications are different and what's needed to make them more robust, agile, and responsive. We've heard how new levels of insight and intelligence into what really goes on underneath the covers of modern apps across their lifecycle can ensure that those apps are built, deployed, and operated properly.

So, please join me in thanking our guest, Ramin Sayar, President and CEO of Sumo Logic. Thank you so much.

Sayar: Thank you very much.

Gardner: I’m Dana Gardner, Principal Analyst at Interarbor Solutions, your host and moderator for this ongoing series of BriefingsDirect discussions. A big thank you to our sponsor today, Sumo Logic, and a big thank you as well to our audience. Please come back for our next edition.

Listen to the podcast. Find it on iTunes. Get the mobile app. Download the transcript. Sponsor: Sumo Logic.

Transcript of a discussion on how modern applications are different, and what data and insight are needed to make them more robust, agile and responsive. Copyright Interarbor Solutions, LLC, 2005-2017. All rights reserved.


Tuesday, January 17, 2017

Fast Acquisition of Diverse Unstructured Data Sources Makes IDOL API Tools a Star at LogitBot

Transcript of a discussion on how high-performing big-data analysis powers an innovative artificial intelligence-based investment tool.

Listen to the podcast. Find it on iTunes. Get the mobile app. Download the transcript. Sponsor: Hewlett Packard Enterprise.

Dana Gardner: Hello, and welcome to the next edition of the Hewlett Packard Enterprise (HPE) Voice of the Customer podcast series. I’m Dana Gardner, Principal Analyst at Interarbor Solutions, your host and moderator for this ongoing discussion on digital transformation. Stay with us now to learn how agile businesses are fending off disruption -- in favor of innovation.

Our next case study highlights how high-performing big-data analysis powers an innovative artificial intelligence (AI)-based investment opportunity and evaluation tool. We'll learn how LogitBot in New York identifies, manages, and contextually categorizes truly massive and diverse data sources.

By leveraging entity recognition APIs, LogitBot not only provides investment evaluations from across these data sets, it delivers the analysis as natural-language information directly into spreadsheets as the delivery endpoint. This is a prime example of how complex cloud-to-core-to-edge processes and benefits can be managed and exploited using the most responsive big-data APIs and services.

To describe how a virtual assistant for targeting investment opportunities is being supported by cloud-based big-data services, we're joined by Mutisya Ndunda, Founder and CEO of LogitBot in New York. Welcome.

Mutisya Ndunda: Thank you so much for having us.

Gardner: We're also here with Michael Bishop, CTO of LogitBot. Welcome, Michael.

Michael Bishop: Thank you for having us. It’s good to be here.
Gardner: Let’s look at some of the trends driving your need to do what you're doing with AI and bots, bringing together data, and then delivering it in the format that people want most. What’s the driver in the market for doing this?

Ndunda: LogitBot is all about trying to eliminate friction between people who have very high-value jobs and some of the more mundane things that could be automated by AI.

Today, in finance, the industry, in general, searches for investment opportunities using techniques that have been around for over 30 years. What tends to happen is that the people who are doing this should be spending more time on strategic thinking, ideation, and managing risk. But without AI tools, they tend to get bogged down in the data and in the day-to-day. So, we've decided to help them tackle that problem.

Gardner: Let the machines do what the machines do best. But how do we decide where the demarcation is between what the machines do well and what the people do well, Michael?

Bishop: We believe in empowering the user and not replacing the user. So, the machine is able to go in-depth and do what a high-performing analyst or researcher would do at scale, and it does that every day, instead of once a quarter, for instance, when research analysts would revisit an equity or a sector. We can do that constantly, react to events as they happen, and replicate what a high-performing analyst is able to do.

Gardner: It’s interesting to me that you're not only taking a vast amount of data and putting it into a useful format and qualitative type, but you're delivering it in a way that’s demanded in the market, that people want and use. Tell me about this core value and then the edge value and how you came to decide on doing it the way you do?

Evolutionary process

Ndunda: It’s an evolutionary process that we've embarked on, or are going through. The industry is very used to doing things in a very specific way, and AI isn't something that a lot of people are necessarily familiar with in financial services. We decided to wrap it around things that are extremely intuitive to an end user who doesn't have the time to learn technology.

So, we said that we'll try to leverage as many things as possible in the back via APIs and all kinds of other things, but the delivery mechanism in the front needs to be as simple or as friction-less as possible to the end-user. That’s our core principle.

Bishop: Finance professionals generally don't like black boxes and mystery, and obviously, when you're dealing with money, you don’t want to get an answer out of a machine you can’t understand. Even though we're crunching a lot of information and  making a lot of inferences, at the end of the day, they could unwind it themselves if they wanted to verify the inferences that we have made.

We're wrapping up an incredibly complicated amount of information, but it still makes sense at the end of the day. It’s still intuitive to someone. There's not a sense that this is voodoo under the covers.

Gardner: Well, let’s pause there. We'll go back to the data issues and the user-experience issues, but tell us about LogitBot. You're a startup, you're in New York, and you're focused on Wall Street. Tell us how you came to be and what you do, in a more general sense.

Ndunda: Our professional background has always been in financial services. Personally, I've spent over 15 years in financial services, and my career led me to what I'm doing today.

In the 2006-2007 timeframe, I left Merrill Lynch to join a large proprietary market-making business called Susquehanna International Group. They're one of the largest providers of liquidity around the world. Chances are whenever you buy or sell a stock, you're buying from or selling to Susquehanna or one of its competitors.

What had happened in that industry was that people were embracing technology, but it was algorithmic trading, what has become known today as high-frequency trading. At Susquehanna, we resisted that notion, because we said machines don't necessarily make decisions well, and this was before AI had been born.

Internally, we went through this period where we had a lot of discussions around whether we were losing out to the competition and whether we should really go pure bot, more or less. Then 2008 hit, and our intuition of allowing our traders to focus on the risky things while setting up machines to trade riskless or small orders paid off a lot for the firm; it was the best year the firm ever had, while everyone else was falling apart.

That was the first piece that got me to understand or to start thinking about how you can empower people and financial professionals to do what they really do well and then not get bogged down in the details.

Then, I joined Bloomberg and I spent five years there as the head of strategy and business development. The company has an amazing business, but it's built around the notion of static data. What had happened in that business was that, over a period of time, we began to see the marketplace valuing analytics more and more.

Make a distinction

Part of the role that I was brought in to do was to help them unwind that and decouple the two things -- to make a distinction within the company between static information and analytical or valuable information. The trend that we saw was that hedge funds, especially the ones employing systematic investment strategies, were beginning to do two things: embrace AI and technology to empower their traders, and look deeper into analytics versus static data.

That was what brought me to LogitBot. I thought we could do it really well, because the players themselves don't have the time to do it and some of the vendors are very stuck in their traditional business models.

Bishop: We're seeing a kind of renaissance here, or we're at a pivotal moment, where we're moving away from analytics in the sense of business reporting tools or understanding yesterday. We're now able to mine data, get insightful, actionable information out of it, and then move into predictive analytics. And it's not just statistical correlations. I don’t want to offend any quants, but a lot of technology [to further analyze information] has come online recently, and more is coming online every day.

For us, Google had released TensorFlow, and that made a substantial difference in our ability to reason about natural language. Had it not been for that, it would have been very difficult one year ago.

At the moment, technology is really taking off in a lot of areas at once. That enabled us to move from static analysis of what's happened in the past and move to insightful and actionable information.

Ndunda: What Michael touched on there is really important. A lot of the traditional way of looking at financial investment opportunities is to say that, historically, this has happened, so history should repeat itself. We're in markets where nothing that's happening today has really happened in the past. So, relying on a backward-looking mechanism to try to interpret the future is really dangerous, versus having a more grounded approach that can actually incorporate things that are nontraditional in many different ways.

So, unstructured data, what investors are thinking, what central bankers are saying -- all of those are really important inputs that weren't part of any model 10 or 20 years ago. Without machine learning and some of the things that we are doing today, it’s very difficult to incorporate any of that and make sense of it in a structured way.

Gardner: So, if the goal is to make outlier events your friend and not your enemy, what data do you go to to close the gap between what's happened and what the reaction should be, and how do you best get that data and make it manageable for your AI and machine-learning capabilities to exploit?

Ndunda: Michael can probably add to this as well. We do not discriminate as far as data goes. What we like to do is have no opinion on data ahead of time. We want to get as much information as possible and then let a scientific process lead us to decide what data is actually useful for the task that we want to deploy it on.

As an example, we're very opportunistic about acquiring information about who the most important people at companies are and how they're connected to each other. Does this guy work on a board with this or how do they know each other? It may not have any application at that very moment, but over the course of time, you end up building models that are actually really interesting.

We scan over 70,000 financial news sources. We capture news information across the world. We don't necessarily use all of that information on a day-to-day basis, but at least we have it and we can decide how to use it in the future.

We also monitor anything that companies file and what management teams talk about at investor conferences or on phone conversations with investors.

Bishop: Conference calls, videos, interviews.

Audio to text

Ndunda: HPE has a really interesting technology that they've recently put out. You can transcribe audio to text, and then we can apply our text processing on top of that to understand what management is saying in a structured, machine-based way. Instead of 50 people listening to 50 conference calls, you could just have a machine do it for you.

Gardner: Something we can do there that we couldn't have done before is that you can also apply something like sentiment analysis, which you couldn’t have done if it was a document, and that can be very valuable.

Bishop: Yes, even tonal analysis. There are a few theories on that, that may or may not pan out, but there are studies around tone and cadence. We're looking at it and we will see if it actually pans out.
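
Here is a minimal sketch of that transcribe-then-score pipeline. The two helpers below are stand-ins for whichever speech-to-text and sentiment services are actually used (the HPE API mentioned above is not shown), and the keyword scorer is a deliberately crude placeholder:

```python
def transcribe_audio(audio_path):
    """Stand-in for a speech-to-text service call; returns a canned transcript here."""
    return "Revenue grew nicely. Margins were disappointing. We remain cautious on guidance."

def score_sentiment(sentence):
    """Stand-in for a sentiment API; a crude keyword score in [-1, 1], for illustration only."""
    negative = {"disappointing", "cautious", "weak", "decline"}
    positive = {"grew", "strong", "record", "beat"}
    words = set(sentence.lower().replace(",", "").split())
    return (len(words & positive) - len(words & negative)) / max(len(words), 1)

def analyze_earnings_call(audio_path):
    """Transcribe a call, then rank sentences from most negative to most positive."""
    sentences = [s.strip() for s in transcribe_audio(audio_path).split(".") if s.strip()]
    return sorted((score_sentiment(s), s) for s in sentences)

print(analyze_earnings_call("q3_earnings_call.wav"))
```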

Gardner: And so do you put this all into your own on-premises data-center warehouse or do you take advantage of cloud in a variety of different means by which to corral and then analyze this data? How do you take this fire hose and make it manageable?

Bishop: We do take advantage of the cloud quite aggressively. We're split between SoftLayer and Google. At SoftLayer we have bare-metal hardware machines and some power machines with high-power GPUs.
On the Google side, we take advantage of Bigtable and BigQuery and some of their infrastructure tools. And we have good, old PostgreSQL in there, as well as DataStax, Cassandra, and their Graph as the graph engine. We make liberal use of HPE Haven APIs as well and TensorFlow, as I mentioned before. So, it’s a smorgasbord of things you need to corral in order to get the job done. We found it very hard to find all of that wrapped in a bow with one provider.

We're big proponents of Kubernetes and Docker as well, and we leverage that to avoid lock-in where we can. Our workload can migrate between Google and the SoftLayer Kubernetes cluster. So, we can migrate between hardware or virtual machines (VMs), depending on the horsepower that’s needed at the moment. That's how we handle it.
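
As a rough sketch of what that portability can look like with the official Kubernetes Python client -- the context names, image, and namespace here are hypothetical, not LogitBot's actual configuration -- the same deployment spec can be pointed at either cluster:

```python
from kubernetes import client, config

def deploy_worker(kube_context, image="registry.example.com/roboto-worker:1.4", replicas=3):
    """Create the same worker Deployment on whichever cluster the context points to."""
    config.load_kube_config(context=kube_context)
    labels = {"app": "roboto-worker"}
    deployment = client.V1Deployment(
        metadata=client.V1ObjectMeta(name="roboto-worker"),
        spec=client.V1DeploymentSpec(
            replicas=replicas,
            selector=client.V1LabelSelector(match_labels=labels),
            template=client.V1PodTemplateSpec(
                metadata=client.V1ObjectMeta(labels=labels),
                spec=client.V1PodSpec(
                    containers=[client.V1Container(name="worker", image=image)]
                ),
            ),
        ),
    )
    client.AppsV1Api().create_namespaced_deployment(namespace="default", body=deployment)

# Burst to whichever cluster has spare horsepower at the moment
deploy_worker("softlayer-baremetal")  # hypothetical kubeconfig context names
deploy_worker("google-cloud")
```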

Gardner: So, maybe 10 years ago you would have been in a systems-integration capacity, but now you're in a services-integration capacity. You're doing some very powerful things at a clip and probably at a cost that would have been impossible before.

Bishop: I certainly remember placing an order for a server, waiting six months, and then setting up the RAID drives. It's amazing that you can just flick a switch and you get a very high-powered machine that would have taken six months to order previously. In Google, you spin up a VM in seconds. Again, that's of a horsepower that would have taken six months to get.

Gardner: So, unprecedented innovation is now at our fingertips when it comes to the IT side of things, unprecedented machine intelligence, now that the algorithms and APIs are driving the opportunity to take advantage of that data.

Let's go back to thinking about what you're outputting and who uses that. Is the investment result that you're generating something that goes to a retail type of investor? Is this something you're selling to investment houses or a still undetermined market? How do you bring this to market?

Natural language interface

Ndunda: Roboto, which is the natural-language interface into our analytical tools, can be custom tailored to respond, based on the user's level of financial sophistication.

At present, we're trying it out on a semiprofessional investment platform, where people are professional traders but not part of a major brokerage house. They obviously want to get trade ideas, they want to do analytics, and they're a little bit more sophisticated than people who are looking at investments for their retirement account. Rob can be tailored for that specific use case.

He can also respond to somebody who is managing a portfolio at a hedge fund. The level of depth that he needs to consider is the only differential between those two things.

In the back, he may do an extra five steps if the person asking the question worked at a hedge fund, versus if the person was just asking about why is Apple up today. If you're a retail investor, you don’t want to do a lot of in-depth analysis.

Bishop: You couldn’t take the app and do anything with it or understand it.

Ndunda: Rob is an interface, but the analytics are available via multiple venues. So, you can access the same analytics via an API, a chat interface, the web, or a feed that streams into you. It just depends on how your systems are set up within your organization. But, the data always will be available to you.

Gardner: Going out to that edge equation, that user experience, we've talked about how you deliver this to the endpoints, customary spreadsheets, cells, pivots, whatever. But it also sounds like you are going toward more natural language, so that you could query, rather than a deep SQL environment, like what we get with a Siri or the Amazon Echo. Is that where we're heading?

Bishop: When we started this, we tried to parameterize everything that you could ask into enough checkboxes and forms, and it pollutes the screen. The system has access to an enormous amount of data that you can't create a parameterized screen for. We found it was a bit of a breakthrough when we were able to start using natural language.

TensorFlow made a huge difference here in natural language understanding, understanding the intent of the questioner, and being able to parameterize a query from that. If our initial findings here pan out or continue to pan out, it's going to be a very powerful interface.

I can't imagine having to go back to a SQL query if you're able to do it in natural language, and it really pans out this time, because we’ve had a few turns of the handle of alleged natural-language querying.
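
To give a flavor of what that intent parameterization can look like with TensorFlow -- a toy sketch with invented intents and a handful of examples, nothing like LogitBot's Roboto models -- a tiny classifier might map a question to one of a few query intents:

```python
import tensorflow as tf

# Toy training data: user questions mapped to invented query intents
questions = tf.constant([["why is apple up today"],
                         ["compare apple and microsoft returns"],
                         ["what is the likely return of apple over six months"],
                         ["show me the latest tech sector news"]])
intents = tf.constant([0, 1, 2, 3])  # 0=explain_move, 1=compare, 2=forecast, 3=news

vectorize = tf.keras.layers.TextVectorization(output_mode="int", output_sequence_length=12)
vectorize.adapt(tf.reshape(questions, [-1]))

model = tf.keras.Sequential([
    tf.keras.Input(shape=(1,), dtype=tf.string),
    vectorize,
    tf.keras.layers.Embedding(len(vectorize.get_vocabulary()), 16),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(4, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.fit(questions, intents, epochs=200, verbose=0)

query = tf.constant([["how will apple do over the next six months"]])
print(int(tf.argmax(model(query), axis=1)[0]))  # ideally lands on the "forecast" intent
```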

Gardner: And always a moving target. Tell us specifically about SentryWatch and Precog. How do these shake out in terms of your go-to-market strategy?

How everything relates

Ndunda: One of the things that we have to do to be able to answer a lot of questions that our customers may have is to monitor financial markets and what's impacting them on a continuous basis. SentryWatch is literally a byproduct of that process. Because we're monitoring over 70,000 financial news sources, we're analyzing the sentiment, doing deep text analysis, identifying entities and how they're related to each other in all of these news events, and sticking that into a knowledge graph of how everything relates to everything else.

It ends up being a really valuable tool, not only for us but for other people, because, while we're building models, there are also a lot of hedge funds that have proprietary models or proprietary processes that could benefit from that very same organized, relational data store of news. That's what SentryWatch is and that's how it's evolved. It started off as something that we were doing as an input, and it's actually now a valuable output, or a standalone product.
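
A toy version of that kind of knowledge graph -- with made-up entities and sentiment scores standing in for the output of real entity extraction -- might be assembled like this:

```python
import itertools
import networkx as nx

# Hypothetical output of entity extraction and sentiment scoring on a few news items
news_items = [
    {"entities": ["Apple", "European Commission"], "sentiment": -0.6},
    {"entities": ["Apple", "Foxconn"], "sentiment": 0.1},
    {"entities": ["Federal Reserve", "US Treasury"], "sentiment": 0.3},
]

graph = nx.Graph()
for item in news_items:
    for a, b in itertools.combinations(item["entities"], 2):
        if graph.has_edge(a, b):
            graph[a][b]["mentions"] += 1
            graph[a][b]["sentiment"] += item["sentiment"]
        else:
            graph.add_edge(a, b, mentions=1, sentiment=item["sentiment"])

# Which entities does Apple co-occur with in the news, and in what average tone?
for neighbor in graph.neighbors("Apple"):
    edge = graph["Apple"][neighbor]
    print(neighbor, edge["mentions"], round(edge["sentiment"] / edge["mentions"], 2))
```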

Precog is a way for us to showcase the ability of a machine to be predictive and not be backward looking. Again, when people are making investment decisions or allocation of capital across different investment opportunities, you really care about your forward return on your investments. If I invested a dollar today, am I likely to make 20 cents in profit tomorrow or 30 cents in profit tomorrow?

We're using pretty sophisticated machine-learning models that can take into account unstructured data sources as part of the modeling process. That will give you these forward expectations about stock returns in a very easy-to-use format, where you don't need to have a PhD in physics or mathematics.

You just ask, "What is the likely return of Apple over the next six months?" taking into account what's going on in the economy. Apple was fined $14 billion. That can be quickly added into a model to reflect a new view in a matter of seconds, versus sitting down in a spreadsheet and trying to figure out how it all works out.
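
Here is a deliberately simplified sketch of how an unstructured signal such as news sentiment can enter such a model as just another feature and be re-scored in seconds. The features, coefficients, and data are synthetic placeholders, nothing like LogitBot's actual models:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Placeholder features per stock-date: [trailing 1-month return, news sentiment, earnings surprise]
X = rng.normal(size=(500, 3))
# Placeholder target: 6-month forward return, loosely driven by the same factors plus noise
y = 0.02 * X[:, 0] + 0.05 * X[:, 1] + 0.03 * X[:, 2] + rng.normal(scale=0.05, size=500)

model = Ridge(alpha=1.0).fit(X[:400], y[:400])

# News like the $14 billion fine shows up as a sharply negative sentiment input;
# re-scoring the name is just another call to predict() with the updated feature row
apple_features = np.array([[0.01, -0.8, 0.0]])
print(model.predict(apple_features))
```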

Gardner: Even for Apple, that's a chunk of change.

Bishop: It's a lot of money, and you can imagine that there were quite a few analysts on Wall Street in Excel, updating their models around this so that they could have an answer by the end of the day, where we already had an answer.

Gardner: How do the HPE Haven OnDemand APIs help Precog when it comes to deciding on those sources and getting them in the right format, so that you can exploit them?

Ndunda: The beauty of the platform is that it simplifies a lot of development processes that an organization of our size would have to take on themselves.

The nice thing about it is that a drag-and-drop interface is really intuitive; you don't need to be specialized in Java, Python, or whatever it is. You can set up your intent in a graphical way, and then test it out, build it, and expand it as you go along. The Lego-block structure is really useful, because if you want to try things out, it's drag and drop, connect the dots, and then see what you get on the other end.

For us, that's an innovation that we haven't seen with anybody else in the marketplace and it cuts development time for us significantly.

Gardner: Michael, anything more to add on how this makes your life a little easier?

Lowering cost

Bishop: For us, lowering the cost in time to run an experiment is very important when you're running a lot of experiments, and the Combinations product enables us to run a lot of varied experiments using a variety of the HPE Haven APIs in different combinations very quickly. You're able to get your development time down from a week, two weeks, whatever it is to wire up an API to assist them.

In the same amount of time, you're able to wire the initial connection and then you have access to pretty much everything in Haven. You turn it over to either a business user, a data scientist, or a machine-learning person, and they can drag and drop the connectors themselves. It makes my life easier and it makes the developers’ lives easier because it gets back time for us.

Gardner: So, not only have we been able to democratize the querying, moving from SQL to natural language, for example, but we’re also democratizing the choice on sources and combinations of sources in real time, more or less for different types of analyses, not just the query, but the actual source of the data.

Bishop: Correct.

Ndunda: Again, the power of a lot of this stuff is in the unstructured world, because valuable information typically tends to be hidden in documents. In the past, you'd have to have a team of people to scour through text, extract what they thought was valuable, and summarize it for you. You could miss out on 90 percent of the other valuable stuff that's in the document.

With this ability now to drag and drop, and then go through a document in five different iterations just by tweaking a parameter, it's really useful.

Gardner: So those will be IDOL-backed APIs that you are referring to.

Ndunda: Exactly.

Bishop: It’s something that would have been hard for even an investment bank to process a few years ago. Everyone is on the same playing field here, or starting from the same base, but dealing with unstructured data has traditionally been a very difficult problem. You have a lot of technologies coming online as APIs; at the same time, they're also coming out as traditional on-premises [software and appliance] solutions.

We're all starting from the same gate here. Some folks are a little ahead, but I'd say that Facebook is further ahead than an investment bank in its ability to reason over unstructured data. In our world, I feel like we're starting basically at the same place that Goldman or Morgan would be.

Gardner: It's a very interesting reset that we’re going through. It's also interesting that we talked earlier about the divide between where the machine and the individual knowledge worker begin and end, and that's going to be a moving target. Do you have any sense of how that changes the characterization of what the right combination is of machine intelligence and the best of human intelligence?

Empowering humans

Ndunda: I don’t foresee machines replacing humans, per se. I see them empowering humans. To the extent that your role is not completely based on a task -- if it's based on something where you actually manage a process that goes from one end to another -- those particular positions will be there, and the machines will free up people to focus on that.

But, in the case where you have somebody who is really responsible for something that can be automated, then obviously that will go away. Machines don't eat, they don’t need to take vacation, and if it’s a task where you don't need to reason about it, obviously you can have a computer do it.

What we're seeing now is that if you have a machine sitting side by side with a human, and the machine can pick up on how the human reasons with some of the new technologies, then the machine can do a lot of the grunt work, and I think that’s the future of all of this stuff.

Bishop: What we deliver is distillation: we distill a lot of information so that a knowledge worker or decision-maker can make an informed decision, instead of watching CNBC and being a single-source reader. We can go out and scour the best of all the information, distill it down, and present it, and they can choose to act on it.

Our goal here is not to make the next jump and make the decision. Our job is to present the information to a decision-maker.

Gardner: It certainly seems to me that the organization, big or small, retail or commercial, that can make the best use of this technology -- machine learning -- will, in the end, win.

Ndunda: Absolutely. It is a transformational technology, because for the first time in a really long time, the reasoning piece of it is within the grasp of machines. These machines can operate in the gray area, which is where the world lives.

Gardner: And that gray area can almost have unlimited variables applied to it.

Ndunda: Exactly. Correct.
Gardner: I'm afraid we'll have to leave it there. We've been exploring how high-performing big-data analysis powers an innovative artificial intelligence-based investment opportunity in a valuation tool, and we've learned how LogitBot in New York identifies, manages, and contextually categorizes truly massive and diverse data sources.

So please join me in thanking our guests. First, Mutisya Ndunda, Founder and CEO of LogitBot in New York. Thank you, sir.

Ndunda: It was a pleasure. Thank you so much.

Gardner: We've also been here with Michael Bishop, CTO of LogitBot. Thank you, Michael.

Bishop: Thank you, Dana.

Gardner: And a big thank you as well to our audience for joining us for this Hewlett-Packard Enterprise, Voice of the Customer digital transformation discussion.

I'm Dana Gardner, Principal Analyst at Interarbor Solutions, your host for this ongoing series of HPE sponsored interviews. Thanks again for listening, and do come back next time.

Listen to the podcast. Find it on iTunes. Get the mobile app. Download the transcript. Sponsor: Hewlett Packard Enterprise.

Transcript of a discussion on how high-performing big-data analysis powers an innovative artificial intelligence-based investment opportunity. Copyright Interarbor Solutions, LLC, 2005-2016. All rights reserved.

You may also be interested in:

Thursday, June 09, 2016

Alation Centralizes Enterprise Data Knowledge by Employing Machine Learning and Crowdsourcing

Transcript of a discussion on how Alation makes data actionable by keeping it up-to-date and accessible using innovative means.

Listen to the podcast. Find it on iTunes. Get the mobile app. Download the transcript. Sponsor: Hewlett Packard Enterprise.

Dana Gardner: Hello, and welcome to the next edition of the Hewlett Packard Enterprise (HPE) Voice of the Customer podcast series. I'm Dana Gardner, Principal Analyst at Interarbor Solutions, your host and moderator for this ongoing discussion on IT innovation -- and how it’s making an impact on people’s lives.

Our next big-data case study discussion focuses on the Tower of Babel problem for disparate data, and explores how Alation manages multiple data types by employing machine learning and crowdsourcing.

We'll explore how Alation makes data more actionable via such innovative means as combining human experts and technology systems.

To learn more about how enterprises and small companies alike can access more data for better analytics, please join me in welcoming Stephanie McReynolds, Vice-President of Marketing at Alation in Redwood City, California. Welcome.
Stephanie McReynolds: Thank you, Dana. Glad to be here.

Gardner: I've heard of crowdsourcing for many things, and machine learning is more and more prominent in big-data activities, but I haven't necessarily seen them together. How did that come about? How do you, and why do you need to, employ both machine learning and expert crowdsourcing?

McReynolds: Traditionally, we've looked at data as a technology problem. At least over the last 5-10 years, we’ve been pretty focused on new systems like Hadoop for storing and processing larger volumes of data at a lower cost than databases could traditionally support. But what we’ve overlooked in the focus on technology is the real challenge of how to help organizations use the data that they have to make decisions. If you look at what happens when organizations go to apply data, there's often a gap between the data we have available and what decision-makers are actually using to make their decisions.

There was a study that came out within the last couple of years that showed that about 56 percent of managers have data available to them, but they're not using it. So, there's a human gap there. Data is available, but managers aren't successfully applying data to business decisions, and that's where real return on investment (ROI) always comes from. Storing the data, that's just an insurance policy for future use.

The concept of crowdsourcing data, or tapping into experts around the data, gives us an opportunity to bring humans into the equation of establishing trust in data. Machine-learning techniques can be used to find patterns and clean the data. But to really trust data as a foundation for decision-making, human experts are needed to add business context and show how data can be used and applied to solving real business problems.

Gardner: Usually, employing people like that can be expensive and doesn't scale very well. How do you manage a fit-for-purpose approach to crowdsourcing, where you're providing a service that gets people the information they need while still evaluating and curating it? How do you balance that?

Using human experts

McReynolds: The term "crowdsourcing" can be interpreted in many ways. The approach that we’ve taken at Alation is that machine learning actually provides a foundation for tapping into human experts.

We go out and look at all of the log data in an organization, in particular, what queries are being used to access data in databases or Hadoop file structures. That creates a foundation of knowledge so that the machine can learn to identify what data would be useful to catalog or to enrich with human experts in the organization. That's essentially a way to prioritize how to tap into the number of humans you have available to help create context around that data.
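
To make that log-driven prioritization concrete, here is a minimal Python sketch of the general idea; it is not Alation's actual implementation, and the log format, table-name extraction, and sample data are illustrative assumptions.

```python
# Minimal sketch: mine query logs to find which tables are queried most often
# and are therefore worth routing to human experts for curation first.
import re
from collections import Counter

def tables_referenced(sql):
    """Very rough extraction of table names following FROM/JOIN keywords."""
    return set(re.findall(r"\b(?:from|join)\s+([\w.]+)", sql, flags=re.IGNORECASE))

def rank_tables_for_curation(query_log, top_n=20):
    """Rank tables by how often analysts actually query them."""
    usage = Counter()
    for sql in query_log:
        usage.update(tables_referenced(sql))
    return usage.most_common(top_n)

# Hypothetical usage: the most-queried tables become the first candidates
# for human annotation and business context.
sample_log = [
    "SELECT * FROM sales.orders o JOIN sales.customers c ON o.cust_id = c.id",
    "SELECT count(*) FROM sales.orders",
    "SELECT * FROM hr.payroll",
]
print(rank_tables_for_curation(sample_log))
```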

That’s a great way to partner with machines, to use humans for what they're good for, which is establishing a lot of context and business perspective, and use machines for what they're good for, which is cataloging the raw bits and bytes and showing folks where to add value.

Gardner: What are some of the business trends that are driving your customers to seek you out to accomplish this? What's happening in their environments that requires this unique approach of the best of machine and crowdsourcing and experts?

McReynolds: There are two broader industry trends that have converged and created a space for a company like Alation. The first is just the immense volume and variety of data that we have in our organizations. If it weren’t the case that we're adding additional data storage systems into our enterprises, there wouldn't be a good groundwork laid for Alation, but I think more interestingly perhaps is a second trend and that is around self-service business intelligence (BI).

So as we're increasing the number of systems that we're using to store and access data, we're also putting more weight on typical business users to find value in that data and trying to make that as self-service a process as possible. That’s created this perfect storm for a system like Alation which helps catalog all the data in the organization and make it more accessible for humans to interpret in accurate ways.

Gardner: And we often hear in the big data space the need to scale up to massive amounts, but it appears that Alation is able to scale down. You can apply these benefits to quite small companies. How does that work when you're able to help a very small organization with some typical use cases in that size organization?

McReynolds: Even smaller, or younger, organizations are beginning to drive their business based on data. Take an organization like Square, which is a great brand name in the financial services industry but not a huge organization in and of itself, or Inflection or Invoice2go, which are also Alation customers.

We have many customers that have data analyst teams that maybe start with five people or 20 people. We also have customers like eBay that have closer to a thousand analysts on staff. What Alation provides to both of those very different sizes of organizations is a centralized place, where all of the information around their data is stored and made accessible.

Even if you're only collaborating with three to five analysts, you need that ability to share your queries, to communicate on which queries addressed which business problems, which tables from your HPE Vertica database were appropriate for that, and maybe what Hive tables on your Hadoop implementation you could easily join to those Vertica tables. That type of conversation is just as relevant in a 5-person analytics team as it is in a 1000-person analytics team.

Gardner: Stephanie, if I understand it correctly, you have a fairly horizontal capability that could apply to almost any company and almost any industry. Is that fair, or is there more specialization or customization that you apply to make it more valuable, given the type of company or type of industry?

Generalized technology

McReynolds: The technology itself is a generalized technology. Our founders come from backgrounds at Google and Apple, companies that have developed very generalized computing platforms to address big problems. So the way the technology is structured is general.

The organizations that are going to get the most value out of an Alation implementation are those that are data-driven organizations that have made a strategic investment to use analytics to make business decisions and incorporate that in the strategic vision for the company.

So even if we're working with very small organizations, they are organizations that make data and the analysis of data a priority. Today, it’s not every organization out there. Not every mom-and-pop shop is going to have an Alation instance in their IT organization.

Gardner: Fair enough. Those organizations that are data-driven and have a real benefit to gain by doing this well also, as I understand it, want to get as much data involved as possible, regardless of its repository, its type, the silo, or the platform. What have you had to do to satisfy that need for disparity and variety across data types? What was the challenge in getting to all the types of data that you can then apply your value to?
McReynolds: At Alation, we see the variety of data as a huge asset, rather than a challenge. If you're going to segment the customers in your organization, every event and every interaction with those customers becomes relevant to understanding who that individual is and how you might be able to personalize offerings, marketing campaigns, or product development to those individuals.

That does put some burden on our organization, as a technology organization, to be able to connect to lots of different types of databases, file structures, and places where data sits in an organization.

So we focus on being able to crawl those source systems, whether they're places where data is stored or whether they're BI applications that use that data to execute queries. A third important data source for us that may be a bit hidden in some organizations is all the human information that’s created, the metadata that’s often stored in Wiki pages, business glossaries, or other documents that describe the data that’s being stored in various locations.

We actually crawl all of those sources and provide an easy way for individuals to use that information about the data within their daily interactions. Typically, our customers are analysts who are writing SQL queries. All of that context about how to use the data is surfaced to them automatically by Alation within their query-writing interface, so they can save anywhere from 20 percent to 50 percent of the time it takes to write a new query in their day-to-day jobs.
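
As a rough illustration of how crawled structure and human-authored context might be combined and surfaced at query-writing time, here is a simplified Python sketch; the field names, sources, and lookup behavior are assumptions for illustration, not Alation's code.

```python
# Simplified sketch: merge machine-crawled table metadata with human-written
# descriptions (wiki pages, glossaries) into one catalog, then surface that
# context when an analyst references a table in a query editor.
from dataclasses import dataclass, field

@dataclass
class CatalogEntry:
    table: str                       # fully qualified table name
    source_system: str               # e.g. "vertica" or "hive"
    columns: list = field(default_factory=list)
    description: str = ""            # pulled from a wiki page or glossary

def build_catalog(crawled_tables, wiki_docs):
    """Join crawled structure with human context, keyed by table name."""
    catalog = {}
    for name, meta in crawled_tables.items():
        catalog[name] = CatalogEntry(
            table=name,
            source_system=meta["system"],
            columns=meta["columns"],
            description=wiki_docs.get(name, ""),
        )
    return catalog

def context_for(catalog, table):
    """What a query editor might show inline when the analyst types a table name."""
    entry = catalog.get(table)
    if entry is None:
        return "No catalog entry yet; consider asking a data steward."
    return f"{entry.table} ({entry.source_system}): {entry.description or 'undocumented'}"
```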

Gardner: How is your solution architected? Do you take advantage of cloud when appropriate? Are you mostly on-premises, using your own data centers, some combination, and where might that head to in the future?

Agnostic system

McReynolds: We're a young company. We were founded about three years ago and we designed the system to be agnostic as to where you want to run Alation. We have customers who are running Alation in concert with Redshift in the public cloud. We have customers that are financial services organizations that have a lot of personally identifiable information (PII) data and privacy and security concerns, and they are typically running an on-premise Alation instance.

We architected the system to be able to operate in different environments and have an ability to catalog data that is both in the cloud and on-premise at the same time.

The way that we do that from an architectural perspective is that we don’t replicate or store data within Alation systems. We use metadata to point to the location of that data. For any analyst who's going to run a query from our recommendations, that query is getting pushed down to the source systems to run on-premise or on the cloud, wherever that data is stored.
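
The pattern Stephanie describes, keeping only metadata pointers in the catalog and pushing queries down to the source systems, might be sketched like this in Python; the class names and the placeholder connector are assumptions for illustration, not the product's architecture.

```python
# Architectural sketch: the catalog records where data lives, never the data
# itself; queries are pushed down to run on the source system, on-premise or
# in the cloud, and only results come back.
class DataSource:
    def __init__(self, name, location, connector):
        self.name = name              # e.g. "vertica-prod"
        self.location = location      # e.g. "on-prem" or "aws-redshift"
        self._connector = connector   # stand-in for a real database driver

    def push_down(self, sql):
        # Execution happens remotely at the source; nothing is replicated here.
        return self._connector(self.location, sql)

def fake_connector(location, sql):
    # Placeholder for an actual driver call; purely illustrative.
    return f"ran on {location}: {sql}"

sources = {
    "vertica-prod": DataSource("vertica-prod", "on-prem", fake_connector),
    "redshift-analytics": DataSource("redshift-analytics", "aws", fake_connector),
}
print(sources["redshift-analytics"].push_down("SELECT count(*) FROM events"))
```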

Gardner: And how did HPE Vertica come to play in that architecture? Did it play a role in the ability to be agnostic as you describe it?

McReynolds: We use HP Vertica in one portion of our product that allows us to provide essentially BI on the BI that’s happening. Vertica is used as a fundamental component of our reporting capability called Alation Forensics that is used by IT teams to find out how queries are actually being run on data source systems, which backend database tables are being hit most often, and what that says about the organization and those physical systems.

It gives the IT department insight. Day-to-day, Alation is typically more of a business person’s tool for interacting with data.

Gardner: We've heard from HPE that they expect that IT department-specific, ops-efficiency role and use case to grow quite a bit. Do you have any sense of the benefits IT organizations have seen from getting that sort of analysis? What's the ROI?

McReynolds: The benefits of an approach like Alation include getting insight into the behaviors of individuals in the organization. What we’ve seen at some of our larger customers is that they may have dedicated themselves to a data-governance program where they want to document every database and every table in their system, hundreds of millions of data elements.

Using the Alation system, they were able to identify within days the rank-order priority list of what they actually need to document, versus what they thought they had to document. The cost savings comes from taking a very data-driven realistic look at which projects are going to produce value to a majority of the business audience, and which projects maybe we could hold off on or spend our resources more wisely.

One team that we were working with found that about 80 percent of their tables hadn't been used by more than one person in the last two years. In that case, if only one or two people are using those systems, you don't really need to document those systems. That individual or those two individuals probably know what's there. Spend your time documenting the 10 percent of the system that everybody's using and that everyone is going to receive value from.
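
A back-of-the-envelope version of that usage analysis, not the vendor's code and with assumed log fields, could look like this: count the distinct users touching each table over a two-year window and flag tables with at most one user as low documentation priority.

```python
# Sketch: flag tables that only one person (or nobody) has queried in the last
# two years, so documentation effort goes to the widely used tables instead.
from datetime import datetime, timedelta
from collections import defaultdict

def low_priority_tables(query_log, now=None, window_days=730, max_users=1):
    """query_log: iterable of dicts like {'user': 'ana', 'table': 'sales.orders', 'ts': datetime}."""
    now = now or datetime.utcnow()
    cutoff = now - timedelta(days=window_days)
    users_by_table = defaultdict(set)
    for event in query_log:
        if event["ts"] >= cutoff:
            users_by_table[event["table"]].add(event["user"])
    return [t for t, users in users_by_table.items() if len(users) <= max_users]
```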

Where to go next

Gardner: Before we close out, any sense of where Alation could go next? Is there another use case or application for this combination of crowdsourcing and machine learning, tapping into all the disparate data that you can and information including the human and tribal knowledge? Where might you go next in terms of where this is applicable and useful?

McReynolds: If you look at what Alation is doing, it's very similar to what Google did for the Internet, cataloging all of the webpages that were available to individuals and serving them up in meaningful ways. That's a huge vision for Alation, and, to be honest, we're just in the early part of that journey. We'll continue to move in that direction of being able to catalog data for an enterprise and make all of the information stored in that organization easily searchable, findable, and usable.

Gardner: Well, very good. I'm afraid we will have to leave it there. We've been examining how Alation maps across disparate data while employing machine learning and crowdsourcing to help centralize and identify data knowledge. And we've learned how Alation makes data actionable by keeping it up-to-date and accessible using innovative means.
So a big thank you to our guest, Stephanie McReynolds, Vice-President of Marketing at Alation in Redwood City, California. Thank you so much, Stephanie.

McReynolds: Thank you. It was a pleasure to be here.

Gardner: And a big thank you as well to our audience for joining us for this big data innovation case study discussion.

I'm Dana Gardner, Principal Analyst at Interarbor Solutions, your host for this ongoing series of HPE-sponsored discussions. Thanks again for listening, and come back next time.

Listen to the podcast. Find it on iTunes. Get the mobile app. Download the transcript. Sponsor: Hewlett Packard Enterprise.

Transcript of a sponsored discussion on how Alation makes data actionable by keeping it up-to-date and accessible using innovative means. Copyright Interarbor Solutions, LLC, 2005-2015. All rights reserved.

You may also be interested in: