Transcript
of a BriefingsDirect discussion on how to manage machine-to-machine
communication for better big data collection and analysis.
Listen to the podcast. Find it on iTunes. Download the transcript. Sponsor: HP.
Dana Gardner: Hello, and welcome to the next edition of the
HP Discover Podcast Series. I'm
Dana Gardner, Principal Analyst at
Interarbor Solutions,
your host and moderator for this ongoing sponsored discussion on IT
innovation and how it’s making an impact on people’s lives. Once again,
we're focusing on how companies are adapting to the new style of IT to
improve IT performance and deliver better user experiences, as well as
better business results.
Our next innovation case study interview highlights how
Axeda, based in Foxboro, Massachusetts, is creating a
machine-to-machine (M2M) capability for analysis -- in other words, the Axeda Machine Cloud.
We're going to learn how they partner with HP in doing that. We're joined in our discussion today by
Kevin Holbrook, the Senior Director of Advanced Development at Axeda. Welcome, Kevin.
Kevin Holbrook: Thank you for having me.
Gardner: We have the whole
Internet of Things (IoT) phenomenon.
People are accepting more and more devices, end points, sensors, even
things within the human body, delivering data out to applications and
data pools. What do you do in terms of helping organizations start to
come to grips with this M2M and Internet of Things data demand?
Holbrook: It starts with the connectivity space. Our focus has largely been on
OEMs,
original equipment manufacturers. These are people who have the "M" in the M2M
or the "T" in the Internet of Things. They are manufacturing things.
The
initial drivers to have a handle on those things are basic questions,
such as, "Is this device on?" There are multi-million dollar machines
that are currently deployed in the world where that question can’t be
answered without a phone call.
Initial driver
That
was the initial driver, the seed, if you will. We entered into that
space from the remote-service angle. We deployed small-agent software to
the edge to get the first measurements from those systems and get them
pushed up to the cloud, so that users can interact with it.
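To make this concrete, here is a minimal sketch of what such an edge agent's reporting loop might look like. The endpoint URL, device ID, payload shape, and reporting interval are all illustrative assumptions, not Axeda's actual agent protocol.

```python
# Minimal edge-agent sketch: sample local measurements and push them to a
# cloud ingest endpoint. All names and the URL here are hypothetical.
import json
import time
import urllib.request

CLOUD_ENDPOINT = "https://cloud.example.invalid/ingest"  # placeholder URL
DEVICE_ID = "asset-0001"                                 # placeholder ID

def read_measurements():
    # A real agent would read sensors or OS counters here.
    return {"cpu_pct": 12.5, "disk_free_mb": 20480, "mem_free_mb": 512}

def push(measurements):
    body = json.dumps({
        "device_id": DEVICE_ID,
        "ts": time.time(),
        "data": measurements,
    }).encode()
    req = urllib.request.Request(
        CLOUD_ENDPOINT, data=body,
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)  # fire-and-forget, for the sketch

if __name__ == "__main__":
    while True:
        push(read_measurements())
        time.sleep(60)  # report once a minute
```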
That grew into
remote access,
telnet sessions or
remote desktop, being able to get down there, debug, tweak, and look at the
devices that are operating. From there, we grew into software
distribution, or content distribution. That could be anything from
firmware updates
to physically distributing configuration and calibration files for the
instrument. Recently, we've been seeing an uptick in content distribution for
things like digital signage or in-situ ads being displayed on consumer
goods.
From there, we started aggregating data. We have
about 1.5 million assets connected to our cloud now globally, and there
is all kinds of data coming in. Some of it's very, very basic from a
resource standpoint, looking at CPU consumption, disk space, available
memory, things of that nature.
It goes all the way
through to usage and diagnostics, so that you can get a very granular
impression of how this machine is operating. As you begin to aggregate this
data, all sorts of challenges come out of it. HP has proven to be a
great partner for starting to extract value.
We can certainly get to the data, connect the device, and aggregate that data for our partners or for the customer directly. Getting
value from that data is a completely different proposition. Data for
data’s sake is not high value.
Gardner: What are you using Vertica for to do that? Are you creating applications? Are you offering analysis as a service? How is this going to market for you?
Holbrook: From
our perspective, Vertica represents an endpoint. We've carried the
data, cared for the data, and made sure that the device was online,
generating the right information and getting it into Vertica.
When we approach customers, we're approaching it from a joint-sale perspective. We're the connectivity layer, the instrumentation, and the business-automation layer, and we're getting the data into Vertica, so that it can be the seed for applications for
business intelligence (BI) and for
analytics.
So,
we are the lowest component in the stack when we walk into one of these
engagements with Vertica. Then, it's up to them, on a
customer-by-customer basis, to determine what applications to bring to
the table. A lot of that is defined by the group within the organization
that actually manages connectivity.
We find that
there's a big difference between a service organization, which is
focused primarily on keeping things up and running, versus a business
unit that’s driving utilization metrics, trying to determine not only
how things are used, but how that usage can influence their billing.
Business use
We've
found that that's a place where Vertica has actually been quite a pop
for us in talking to customers. They want to know not just the simple
metrics of the machines' operation, but how that reflects the business
use of it.
The entire market has shifted and continues
to shift. I was somewhat taken aback only a couple of weeks ago, when I
found out that you can no longer buy a jet engine. I thought this was a
piece of hardware you purchased, as opposed to something that you may
have rented and paid per use. And so [the model changes to leasing] as
the machines get bigger and bigger. We have GE and the Bureau of
Engraving and Printing as customers.
We certainly have
some very large machines connected to our cloud and we're finding that
these folks are shifting away from the notion that one owns a machine
and consumes it until it breaks or dies. Instead, one engages in an
ongoing service model, in which you're paying for the use of that
machine.
While we can generate that data and provide
some degree of visibility and insight into that data, it takes a massive
analytics platform to really get the granular patterns that would drive
business decisions.
Gardner: It sounds like
many of your customers have used this for some basic blocking and
tackling about inventory, access, and control, then moved up to
business metrics: how is it being used, how are we billing, audit
trails, and that sort of thing. Now, we're starting to look at a whole
new type of economy. It's a services economy, based on cloud
interactivity, where we can give granular insights, and they can manage
their business very, very tightly.
Any thoughts about what's going to be
required of your organization to maintain scale? The more use cases and
the more success, of course, the more demand for larger data and even
better analytics. How do you make sure that you don't run out of runway
on this?
Holbrook: There are a couple of
strategies we've taken, but before I dive into that, I'll say that the problem is further complicated by the issue of data homing. There's not only a ton of data being generated, but also regulatory and compliance requirements that dictate where you can even leave that data at rest.
Just moving it around is one problem, and where it sits on a disk is a
totally different problem. So we're trying to tackle all of these.
The
first way to address the scale for us from an architectural perspective
was to try to distribute the connectivity. In order for you to know
that something's running, you need to hear from it. You might be able to reach out, which we call contactability, and say, "Tell me if you're still
running." But, by and large, you know of a machine's existence and its
operation by virtue of it telling you something. So even if a message is
nothing more than "Hello, I'm here," you need to hear from this device.
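As a sketch of that idea: even a bare "Hello, I'm here" message lets the cloud side answer "Is this device on?" simply by tracking when each asset was last heard from. The timeout value and data structures below are illustrative assumptions.

```python
# Liveness-tracking sketch: a device is presumed "on" only if it has been
# heard from recently. The five-minute timeout is an arbitrary example.
import time

HEARTBEAT_TIMEOUT_S = 300   # no message for five minutes = presumed offline
last_seen = {}              # device_id -> unix timestamp of last message

def on_message(device_id):
    """Record any inbound message, even a bare heartbeat."""
    last_seen[device_id] = time.time()

def is_online(device_id):
    ts = last_seen.get(device_id)
    return ts is not None and time.time() - ts < HEARTBEAT_TIMEOUT_S
```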
From
the connectivity standpoint, our goal is not to try to funnel all of
this into a single pipe, but rather to find where to get a point of
presence that is closest and that is reasonable. We’ve been doing this
on our remote-access technology for years, trying to find the
appropriate geographically distributed location to route data through,
to provide as easy and seamless an experience as possible.
So that’s the first strategy: as opposed to just ruthlessly federating all incoming data, we distribute the connectivity infrastructure and try to get that data routed to its end consumer as quickly as possible.
We
break down data from our perspective into three basic temporal
categories. There's the current data, which is the value you would see
reading a dial on the machine. There's recent data, which would tell you
whether something is trending in a negative direction, say pressure
going up. Then, there is the longer-term historical data. While we focus on the first two, we deliberately don't focus on the long-term historical data, in order to handle the scale problem.
Recent data
I'll
treat recent data as being anywhere from 7 to 120 days and beyond,
depending on the data aggregation rates. We focus primarily on that.
When you start to scale beyond that, where the real long tail of this
is, we try to make sure that we have our partner in place to receive the
data.
We don't want to be diving into two years of
data to determine seasonal trending when we're attempting to collect
data from 1.5 million assets and acting as quickly as possible to
respond to error conditions at the edge.
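A rough sketch of that three-way split follows. The retention window and the hand-off callback are stated assumptions for illustration; they aren't Axeda's published design.

```python
# Temporal-routing sketch: keep the "current" dial value and a rolling
# "recent" window hot, and hand aged-out points to a downstream partner
# (for example, a Vertica load path).
import time
from collections import deque

RECENT_WINDOW_S = 120 * 24 * 3600   # "recent" = up to ~120 days, per above

current = {}        # device_id -> latest reading (the dial you'd read)
recent = deque()    # (device_id, ts, value) tuples for trend detection

def route(device_id, ts, value, forward_to_partner):
    current[device_id] = value              # refresh the current view
    recent.append((device_id, ts, value))   # keep for recent trending
    cutoff = time.time() - RECENT_WINDOW_S
    while recent and recent[0][1] < cutoff:
        forward_to_partner(recent.popleft())  # long tail goes downstream
```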
Gardner: Kevin, what about the issue of
latency?
I imagine some of your customers have a very dire need to get analysis
very rapidly on an ongoing streamed basis. Others might be more willing
to wait and do it in a batch approach in terms of their analytics. How
do you manage that, and what are some of the speeds and feeds about the
best latency outcomes?
Holbrook: That’s a
fantastic question. Everybody comes in and says we need a zero-latency
solution. Of course, it took them about two-and-a-half seconds to say
that.
There's no such thing as real-time, certainly on the Internet. Just negotiating up the
TCP stack
and tearing it down to send one byte is going to take you time. Then,
we send it over wires under the ocean, bounce it off a satellite, you
name it. That's going to take time.
There are two
components to it. One is accepting that near-real-time, which is
effectively the transport latency, is the smallest amount of time it can
take to physically go from point A to point B, absent having a
dedicated fiber line from one location to the other. We can assume that,
on the Internet, that's domestically somewhere in the one- to two-second
range. Internationally, it's in the two- to three-second or beyond
range, depending on the connectivity of the destination.
What
we provide is an ability to produce real-time streams of data outbound.
You could take from one asset, break up the information it generates,
and stream it to multiple consumers in near-real-time in order to get
the dashboard in the control center to properly reflect the state of the
business. Or you can push it to a data warehouse in the back end, where
it then can be chunked and
ETL'd into some other analytics tool.
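Here is a minimal fan-out sketch of that pattern: one asset's inbound points are delivered to several consumers at once, with per-consumer queues so a slow warehouse loader can't stall a live dashboard. The consumer names are hypothetical.

```python
# Fan-out sketch: publish each inbound point to every registered consumer.
# Queues decouple slow consumers (warehouse) from fast ones (dashboard).
import queue
import threading

consumers = {
    "dashboard": queue.Queue(),
    "warehouse": queue.Queue(),
}

def publish(point):
    """Deliver one data point to every consumer's queue."""
    for q in consumers.values():
        q.put(point)

def run_consumer(name, handler):
    q = consumers[name]
    while True:
        handler(q.get())

# Illustrative consumer: print each point as a stand-in for a dashboard update.
threading.Thread(
    target=run_consumer,
    args=("dashboard", lambda p: print("dashboard update:", p)),
    daemon=True,
).start()
```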
For us, we try not to do the batch ETL'ing. We'd rather make sure that we handle what we're good at. We're fantastic at remote service, at automating responses, at connectivity, and at expanding what we do. But we're never going to be a massive ETL engine, transforming and converting data into somebody’s data model or trying to get deep analytics as a result of that.
Gardner: Was it part of this need for
latency, familiarity, and agility that led into Vertica? What were some
of the decisions that led to picking Vertica as a partner?
Several reasons
Holbrook: There
were a few reasons. That was one of them. Also the fact that there's a
massive set of offerings already built on top of it. A lot of the other
options we considered -- and I won't mention the competitors we
looked at -- were more just a piece of the stack, as opposed to a place
that solutions grow out of.
It wasn't just Vertica,
but the ecosystem built on top of Vertica. Some of the vendors we looked
at are currently in the partner zone, because they're now building
their solutions on top of Vertica.
We looked at it as
an entry point into an ecosystem. And certainly the in-memory component,
the fact that you're getting no disk reads for massive datasets, was very
attractive for us. We don’t want to go through that process. We've
dealt with the struggles internally of trying to have a relational data
model scale. That’s something that Vertica has absolutely solved.
Gardner: Now
your platform includes application services, an integration framework, and
data management. Let’s home in on the application services. How are
developers interested in getting access to this? What are their demands
in terms of being able to use analysis outcomes, outputs, and then bring
that into an application environment that they need to fulfill their
requirements to their users?
Holbrook: It
breaks down into two basic categories. The first is the
aggregation and the collection of data, and the second is physical
interaction with the device. So we focus on both about equally. When we
look at what developers are doing, almost always it’s transforming the
data coming in and reaching out to things like a
customer relationship management (CRM) system.
It's opening a ticket when a device has thrown a certain error code or
integrating with a backend drop-ship distribution system in the event
that some consumable has begun to run low.
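A sketch of that kind of integration: when a device reports a particular error code, open a ticket in a CRM via its HTTP API. The endpoint, payload shape, and priority rule are hypothetical placeholders, not any real CRM's API.

```python
# Integration sketch: open a CRM ticket when a device throws an error code.
# The endpoint and payload shape are hypothetical placeholders.
import json
import urllib.request

CRM_TICKETS_URL = "https://crm.example.invalid/api/tickets"  # placeholder

def on_error_code(device_id, code):
    body = json.dumps({
        "subject": f"Device {device_id} reported error {code}",
        "priority": "high" if code >= 500 else "normal",  # illustrative rule
    }).encode()
    req = urllib.request.Request(
        CRM_TICKETS_URL, data=body,
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)
```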
In terms of
interaction, it's been significant. On the data side, we primarily see
that they're extracting subsets of data for deeper analysis. Sometimes,
this comes up in discrete data points. Frequently, this comes up in the
transfer of files. So there's a certain granularity that you can survive coming down the fire hose as discrete data points that you can react to, and there's a whole other order of magnitude of data that you can handle when it's shipped up in a bulk chunk.
A good
example is one of the use cases we have with GE in their oil and gas
division where they have a certain flow of data that's always ongoing
and giving
key performance indicators (KPIs).
But this is nowhere near the level of data that they're actually
collecting. They have database servers that are co-resident with these
massive gas pipeline generators.
So we provide them the
vehicle for that granular data. Then, when a problem is detected
automatically, they can say, "Give me far more granular data for the
problem area." it could be five minutes before or five minutes since.
This is then uploaded, and we hand off to somewhere else.
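A sketch of that trigger pattern: when a KPI crosses a threshold, request the high-resolution window around the event (five minutes either side) and hand the upload off downstream. The threshold, window, and callback names are illustrative.

```python
# Trigger sketch: on an anomalous KPI reading, pull granular data for the
# window around the event and hand it off to another system.
PRESSURE_LIMIT = 95.0    # illustrative threshold
WINDOW_S = 5 * 60        # five minutes before and after the event

def on_kpi(device_id, ts, pressure, request_upload, hand_off):
    if pressure > PRESSURE_LIMIT:
        # Ask the edge device for fine-grained data around the event.
        blob = request_upload(device_id, start=ts - WINDOW_S, end=ts + WINDOW_S)
        hand_off(blob)   # e.g., to a warehouse or analytics pipeline
```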
So
when we find developers doing integration around the data in
particular, it's usually when they're diving in more deeply based on
some sort of threshold or trigger that has been encountered in the
field.
Gardner: And lastly, Kevin, for other organizations that are looking to create data services and something like your
Axeda Machine Cloud,
are there any lessons learned that you could share when it comes to
managing such complexity, scale, and the need for speed? What have you
learned at a high level that you could share?
All about strategy
Holbrook: It’s
all going to be about the data-collection strategy. You're going to
walk into a customer or potential customer, and their default response
is going to be, "Collect everything." That’s not inherently valuable.
Just because you've collected it doesn’t mean that you're going to get
value from it. We find that, oftentimes, 90-95 percent of the data
collected in the initial deployment is not used in any constructive way.
I
would say focus on the data collection strategy. Scale of bad data is
scale for scale’s sake. It doesn’t drive business value. Make sure that
the folks who are actually going to be doing the analytics are in the
room when you're defining your data-collection strategy, when
you're talking to the folks who are going to wire up sensors, and when
you're talking to the folks who are building the device.
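One way to make that discipline concrete is a declarative collection plan in which every metric must name the consumer who asked for it, so nothing is collected "just in case." The shape below is purely illustrative.

```python
# Collection-plan sketch: each metric names the team that will use it.
# A metric with no consumer is a candidate for removal, not collection.
COLLECTION_PLAN = [
    {"metric": "pressure_psi", "interval_s": 60,   "consumer": "service-team"},
    {"metric": "run_hours",    "interval_s": 3600, "consumer": "billing"},
]

def validate(plan):
    unowned = [m["metric"] for m in plan if not m.get("consumer")]
    if unowned:
        raise ValueError(f"metrics with no consumer: {unowned}")

validate(COLLECTION_PLAN)
```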
Unfortunately, within a larger business in particular, these are frequently
completely different groups of people who might report to completely
different vice presidents. So you go to one group, and they have the
connectivity guys. You talk about it and you wire everything up.
Then,
six to eight months later, you walk into another room. They’ll say
"What the heck is this? I can’t do anything with this. All I ever needed
to know was the following metric." It wasn’t collected because the two
hadn't stayed in touch. The success of deployed solutions and the
reaction to scale challenges is going to be driven directly by that
data-collection strategy. Invest the time upfront, and you'll have a much better experience on the back end.
Gardner: Very
good. I'm afraid we’ll have to leave it there. We've been learning how
Axeda, in Foxboro, Massachusetts, is providing services to its customers
through the Axeda Machine Cloud and how, as a partner with HP, it's using
the HP Vertica Analytics Platform to provide those insights.
So I'd
like to thank our guest. We've been joined by Kevin Holbrook, the Senior
Director of Advanced Development at Axeda. Thank you, Kevin.
Holbrook: Thank you.
Gardner: And
a big thank you to our audience for joining us for this special new
style of IT discussion. I'm Dana Gardner, Principal Analyst at
Interarbor Solutions, your host for this ongoing series of HP sponsored
discussions. Thanks again for listening, and come back next time.
Listen to the podcast. Find it on iTunes. Download the transcript. Sponsor: HP.
Transcript
of a BriefingsDirect discussion on how to manage machine-to-machine
communication for better big data collection and analysis. Copyright
Interarbor Solutions, LLC, 2005-2015. All rights reserved.