Thursday, November 12, 2015

Powerful Reporting From YP's Enterprise Data Warehouse Helps SMBs Conjure New Business

Transcript of a BriefingsDirect discussion on how Yellow Pages helps small businesses attract, reach out to, and retain customers using big data.

Listen to the podcast. Find it on iTunes. Get the mobile app. Download the transcript. Sponsor: Hewlett Packard Enterprise.

Dana Gardner: Hello, and welcome to the next edition of the HP Enterprise (HPE) Discover Podcast series. I’m Dana Gardner, Principal Analyst at Interarbor Solutions, your host and moderator for this ongoing discussion on IT innovation and how it’s making an impact on people’s lives.

Our next big-data innovation case study highlights how Yellow Pages (YP) has experimented with and built out a full enterprise data warehouse with powerful reporting capabilities.

We’ll learn how YP pulls massive data and information from across new and legacy resources to report precise metrics to its advertisers, making them more aware of their campaigns -- and how small businesses are faring.

To learn more, welcome Bill Theisinger, Vice President of Engineering for Platform Data Services at YP in Glendale, California. Welcome, Bill.

Bill Theisinger: Thank you, Dana.

Gardner: Tell us about YP, the digital arm of what people would have known as Yellow Pages a number of years ago. You're all about helping small businesses become better acquainted with their customers, and vice versa.

Theisinger: YP is a leading local marketing solutions provider in the U.S., dedicated to helping local businesses and communities grow. We help connect local businesses with consumers wherever they are and whatever device they are on, desktop and mobile.

Gardner: As we know, the world has changed dramatically around marketing and advertising and connecting buyers and sellers. So in the digital age, being precise, being aware, being visible is everything, and that means data. Tell us about your data requirements in this new world.

Theisinger: We need to be able to capture how consumers interact with our customers, and that includes where they interact -- whether it’s a mobile device or web device -- and also within our network of partners. We reach about 100 million consumers across the U.S. and we do that through both our YP network and our partner network.

Gardner: Tell us too about the evolution. Obviously, you don’t build out data capabilities and infrastructure overnight. Some things are in place, and you move on, you learn, adapt, and you have new requirements. Tell us your data warehouse journey.

Needed to evolve

Theisinger: Yellow Pages saw the shift of their print business moving heavily online and becoming heavily digital. We needed to evolve with that, of course. In doing so, we needed to build infrastructure around the systems that we were using to support the businesses we were helping to grow.

And in doing that, we started to take a look at what the systems requirements were for us to be able to report and message value to our advertisers. That included understanding where consumers were looking, what we were impressing to them, what businesses we were showing them when they searched, what they were clicking on, and, ultimately, what businesses they called. We track all of those different metrics.

When we started this adventure, we didn't have the technology and the capabilities to be able to do those things. So we had to reinvent our infrastructure. That’s what we did.

Gardner: And as we know, getting more information to your advertisers to help them in their selection and spending expertise is key. It differentiates companies. So this is a core proposition for you. This is at the heart of your business.

Given the mission criticality, what are the requirements? What did you need to do to get that reporting, that warehouse capability?

Theisinger: We need to be able to scale to the size of our network and the size of our partner network, which means no click left behind, if you will, no impression untold, no search unrecognized. That's billions of events we process every day. We needed to look at something that would help us scale. If we added a new partner, if we expanded the YP network, if we added hundreds, thousands, tens of thousands of new advertisers, we needed the infrastructure to be able to help us do that.

Gardner: I understand that you've been using Hadoop. You might be looking at other technologies as they emerge. Tell us about your Hadoop experience and how that relates to your reporting capabilities.

Theisinger: When I joined YP, Hadoop was a heavy buzz product in the industry. It was a proven product for helping businesses process large amounts of unstructured data. However, it still poses a problem. That unstructured data needs to be structured at some point, and it’s that structure that you report to advertisers and report internally.

That's how we decided that we needed to marry two different technologies -- one that will allow us to scale a large unstructured processing environment like Hadoop and one that will allow us to scale a large structured environment like Hewlett Packard Enterprise (HPE) Vertica.

Business impact

Gardner: How has this impacted your business, now that you've been able to do this and it's been in the works for quite a while? Any metrics of success or anecdotes that can relate back to how the people in your organization are consuming those metrics and then extending that as service and product back into your market? What has been the result?

Theisinger: We have roughly 10,000 jobs that we run every day, both to process data and also for analytics. That data represents about five to six petabytes of data that we've been able to capture about consumers, their behaviors, and activities. So we process that data within our Hadoop environment. We then pass that along into HPE Vertica, structure it in a way that we can have analysts, product owners, and other systems retrieve it, pull and look at those metrics, and be able to report on them to the advertisers.

Gardner: Is there an automation to this as you look to present more and better analytics on top of Vertica? What are you doing to make that customizable to people based on their needs, but at the same time, controlled and managed so that it doesn't become unwieldy?

Theisinger: There is a lot of interaction between customers, both internal and external, when we decide how and what we’re going to present in terms of data, and there are a lot of ways we do that. We present data externally through an advertiser portal. So we want to make sure we work very closely with human factors and ergonomics (HFE) and the user experience (UX) designers as well as our advertisers, through focus groups, workshops, and understanding what they want to understand about the data that we present them.

Then, internally, we decide what would make sense and how we feel comfortable being able to present it to them, because we have a universe of a lot more data than what we probably want to show people.

We also do the same thing internally. We've been able to provide various teams internally, whether it's sales, marketing, or finance, insights into who's clicking on various business listings, who's viewing various businesses, who’s calling businesses, what their segmentation is, and what their demographics look like, and it allows us a lot of analytical insight. We do most of that work through the analytics platform, which is, in this case, HPE Vertica.

Gardner: Now, that user experience is becoming more and more important. It wasn't that long ago when these reports were going to people who were data scientists or equivalent, but now we're taking them out to those 600,000 small businesses. Can you tell us a little bit about lessons learned when it comes to delivering an end analytics product, versus building out the warehouse? They seem to be interdependent, but we're seeing more and more emphasis on that user experience these days.

Theisinger: You need to bridge the gap between analytics and just data storage and processing. So you have to present them in-state. This is what happens. It’s very descriptive of what's going on, and we try to be a little bit more predictive when it comes to the way we want to do analysis at YP. We're looking to go beyond just descriptive analytics.

What has also changed is the platform by which you present the data. It's going highly mobile. Small businesses need to be able to just pick up their mobile device and look at the effectiveness of their campaigns with YP. They're able to do that through a mobile platform we’ve built called YP for Merchants.

They can log in and see their metrics that are core to their business and how those campaigns are performing. They can even see some details, like if they missed a phone call and they want to be able to reach back out to a consumer and see if they need to help, solve a problem, or provide a service.

Developer perspective

Gardner: And given that your developers had to go through the steps of creating that great user experience and taking it to the mobile tier, was there anything about HPE Vertica, your warehouse, or your approach to analytics that made that development process easier? Is there an approach to delivering this from a developer perspective that you think others might learn from?

Theisinger: There is, and it takes a lot more people than just the analytics team in my group or the engineers in my team. It’s a lot of other teams within YP that build this. But first and foremost, people want to see the data as real time and as near real time as they can.

When a small business relies on contact from customers, we track those calls. When a potential customer calls a small business and that small business isn’t able to actually get to the call or respond to that customer because maybe they are on a job, it's important to know that that call happened recently. It's important for that small business to reach back out to the consumer, because that consumer could go somewhere else and get that service from a competitor.

To be able to do that as quickly as possible is a hard-and-fast requirement. So processing the data as quickly as you can and presenting that, whether it be on a mobile device, in this case, as quickly as you can is definitely paramount to making that a success.

Gardner: I've spoken to a number of people over the years and one of the takeaways I get is that infrastructure is destiny. It really seems to be the case in your business that having that core infrastructure decision process done correctly has now given you the opportunity to scale up, be innovative, and react to the market. I think it’s also telling that, in this data-driven decade that we’ve been in for a few years now, the whole small business sector of the economy is a huge part of our overall productivity and growth as an economy.

Any thoughts, generally, about making infrastructure decisions for the long run, decisions you won't regret, decisions that can scale over time and are future proof?

Theisinger: Yeah, speaking from what I've seen through the job that we’ve had here at YP, we reach over half a million paying advertisers. The shift is happening from just telling the advertisers what's happened to helping them actually drive new business.

So it's around the fact that I know who my customers are now, how do I find more of them, or how do I reach out to them, how do I market to them? That's where the real shift is. You have to have a really strong scalable and extensible platform to be able to answer that question. Having the right infrastructure puts you in the position to be able to do that. That’s where businesses are going to end up growing, whether it's ours or small businesses.

And our success hinges on whether or not we can get these small businesses to grow. So we are definitely 100 percent focused on trying to make that happen.

Gardner: It’s also telling that you’ve been able to adjust so rapidly. Obviously, your business has been around for a long time. People are very familiar with the Yellow Pages, the actual physical product, but you've gone to make software so core to your value and your differentiation. I'm impressed and I commend you on being able to make that transition fairly rapidly.

Core talent

Theisinger: Yeah, well thank you. We’ve invested a lot in the people within the technology team we have there in Glendale. We've built our own internal search capabilities, our own internal products. We’ve pulled a lot of good core talent from other companies.

I used to work at Yahoo with other folks, and YP is definitely focused on trying to make this transition a successful one, but we have our eye on our heritage. Over a hundred years of being very successful in the print business is not something you want to turn your back on. You want to be able to embrace that, and we’ve learned a lot from it, too.

So we're right there with small businesses. We have a very large sales force, which is also very powerful and helpful in making this transition a success. We've leaned on all of that and we become one big kind of happy family, if you will. We all worked very closely together to make this transition successful.

Gardner: I am afraid we will have to leave it there. We've been learning about how Yellow Pages, or YP, has experimented with and built out a full enterprise data warehouse capability, along with powerful near real-time reporting capabilities. We've heard why pulling massive data and information from across new and legacy sources is essential to be able to report precise metrics to YP’s advertisers, and how that's differentiating the company in the new world of online marketing and advertising.

So join me in extending a big thank you to Bill Theisinger, Vice President of Engineering for Platform Data Services at YP. Thank you.

Theisinger: Thank you, Dana. I appreciate the time.

Gardner: And a big thank you also to our audience for joining us for this Big Data innovation case study discussion. I'm Dana Gardner, Principal Analyst at Interarbor Solutions, your host for this ongoing series of HPE-sponsored discussions. Thanks again for listening, and do come back next time.

Transcript of a BriefingsDirect discussion on how Yellow Pages helps small businesses attract, reach out to, and retain customers using big data. Copyright Interarbor Solutions, LLC, 2005-2015. All rights reserved.

Monday, November 09, 2015

Internet of Things Brings On Development Demands That DevOps Manages, Say Experts

Transcript of a BriefingsDirect discussion on how continuous processes around development and deployment of applications impact and benefit the Internet of Things trend.

Listen to the podcast. Find it on iTunes. Get the mobile app. Download the transcript. Watch for Free: DevOps, Catalyst of the Agile Enterprise. Sponsor: Hewlett Packard Enterprise.

Dana Gardner: Hello, and welcome to the next edition of the HP Discover Podcast Series. I'm Dana Gardner, Principal Analyst at Interarbor Solutions, your host and moderator for this ongoing discussion on IT innovation and how it’s making an impact on people’s lives.

Our next DevOps thought leadership discussion explores how continuous processes around the development and deployment of applications are both impacted by -- and a benefit to -- the Internet of Things (IoT) trend.

To help better understand the relationship between DevOps and a plethora of new end-devices and data, please welcome Gary Gruver, consultant, author and a former IT executive who has led many large-scale IT transformation projects. Welcome, Gary.

Gary Gruver: Thank you. It’s nice to be here.

Gardner: We're also here with John Jeremiah, Technology Evangelist at Hewlett Packard Enterprise (HPE). He's on Twitter at @j_jeremiah. Welcome, John.

John Jeremiah: Hi, Dana. Thanks.

Gardner: Let’s talk about how the DevOps trend extends not to just traditional enterprise IT and software applications, but to a much larger set of applications -- those in the embedded space, mobile, and end-devices of all sorts. Gary, why is DevOps even more important when you have so many different moving parts as we expect in IoT?

Gruver: In software development, everybody needs to be more productive. Software is no longer just on websites and in IT departments. It’s going on everywhere in the industry. It’s gone to every product in every place, and being able to differentiate your product with software is becoming more and more important to everybody.

Gardner: John, from your perspective, is there a sense that DevOps is more impactful, more powerful when we apply it to IoT?

Jeremiah: The reality is that IoT is moving as fast as mobile is -- and even faster. If you don’t have the ability to change your software to evolve -- to iterate as there is new business innovation -- you're not going to be able to keep up to be competitive. So IoT is going to require a DevOps approach in order to be successful.

Gardner: In the past, we've had a separate development organization and approach to embedded devices. Do we still need to do that, or can we combine traditional enterprise software with DevOps and apply the same systems architecture and technologies to all sorts of development?

Software principles

Gruver: The principles of being able to keep your code base more "releasable," to work under a prioritized backlog, to work through the process of adding automated testing, and frequent feedback to the developers so that they get better at it -- this all applies.

Therefore, for embedded systems you are going to need to develop simulators and emulators for automated testing. A simulator is a representation of the final product that can be run on a server. As much as possible, you want to be able to create a simulator that represents the software characteristics of the final product. You can then use this and trust it to find defects, because the amount of automated testing you are going to need to be running to transform your businesses is huge. If you don’t have an affordable place like a server farm to run that, it just doesn’t work.

If you have custom ASICs in the product, you're also going to need to create an emulator to test the low-level firmware interacting with the ASIC. This is similar to the simulator, but also includes the custom ASIC and electronics from the final product. I see way too many organizations that are embedded and are trying to transform their process giving up on using simulators and emulators because they're not finding the defects that they want to. Yet they haven’t invested in making them robust so they can be effective.

One of the first things I talk about to people that have embedded systems is that you’re not going to be successful transforming your business until you create simulators and emulators that you can trust as a test environment to find defects.

Gardner: How about working as developers and testers with more of an operations mentality?

Gruver: At HPE and HP, we were running 15,000 hours of testing on the code base every day. When it was manual, we couldn’t do that and we really couldn’t transform our business until we fundamentally put that level of automated testing in place.

For laser printer testing, there's no way we would have been able to have enough paper to run that many hours of testing, and we would have worn out printers. There weren’t enough trees in Idaho to make enough paper to do that testing on the final product. Therefore, we needed to create a test farm of simulators and emulators to drive testing upstream as much as possible to get rapid feedback to our developers.

Gardner: Tell us how DevOps helped in the firmware project for HP printers, and how that illustrates where DevOps and embedded development come together?

No new features

Gruver: I had an opportunity to take over leading the LaserJet firmware for our organization several years ago. It had been the bottleneck for the organization for two decades. We couldn’t add a new product or plans without checking the firmware, and we had given up asking for new features.

Then 2008 hit, and we were forced to cut our spending, as were a lot of people in the industry at that time. We could no longer invest to spend our way out of problems. So we had to engineer our solution.

We were fundamentally looking for anything that we could do to improve productivity. We went on a journey of what I would call applying Agile and DevOps principles at scale, as opposed to trying to scale small teams in the organization. We went through this process of continually trying to improve with a group of 400-800 engineers and working through that process. At the end of three years, firmware was no longer the bottleneck.

We had gone from five percent of our capacity going to innovation to 40 percent and we were supporting 1.5 times more products. So we took something that was a bottleneck for the business, completely unleashed that capability, and fundamentally transformed the business.

The details are captured in my first book, A Practical Approach to Large-Scale Agile Development. It’s available at all your finest bookstores. [Also see Gary's newest book, Leading the Transformation: Applying Agile and DevOps Principles at Scale.]

Gardner: And how does this provide a harbinger of things to come? What you’ve done with firmware at HP and Laser Printers several years ago, how does that paint a picture of how DevOps can be powerful and beneficial in the larger IoT environment?

Gruver: Well, IoT is going to move so fast that nobody knows exactly what they need and what the capabilities are. It's the ability to move fast. At HP and HPE, we went 2-3 times faster than we ever thought possible. What you're seeing in DevOps is that the unicorns of the world are showing that software development can go much faster than anybody ever thought was possible before.

That’s going to be much more important as you're trying to understand how this market evolves, what capabilities customers want, and where they want them in IoT. The companies that can move fast and respond to the feedback from the customers are going to be the ones that win.

Gardner: John, we've seen sort of a dip in the complexity around mobile devices in particular when people consolidated around iOS and Android after having hit many targets, at least for a software platform, in the past. That may have given people a sense of solace or complacency that they can develop mobile applications rapidly.

But we are now getting, to Gary's point, to a place where we don't really know what sort of endpoints we're going to be dealing with. We're looking at automated cars, houses, drones, appliances, and even sensors within our own bodies.

What are some of the core principles we need to keep in mind to allow for the rapid and continuous development processes for IoT to improve, but without stumbling again as we hit complexity when it comes to new targets?

New technologies

Jeremiah: One of the first things that you're going to have to do is embrace service virtualization and strategies in order to quickly virtualize new technologies and to be able to quickly simulate those technologies when they come to life. We don't know exactly what they're going to be, but we have to be able to embrace that and to bring that into our process and methodology.

And as Gary was talking about earlier, the strategies of going fast that apply in firmware apply in the enterprise as well: building automated testing, failing as fast as you can, and learning as you go. As we see complexity increase, the real key is going to be the ability to harness that, and to use virtualization as a strategy to move that forward.

Gardner: Any other metrics of success? How do we know we're succeeding with DevOps? We talked about speed. We talked about testing early and often. How do you know you're doing this well? For organizations that want to have a good way to demonstrate success, how do they present that?

Gruver: I wouldn't just start off by trying to do DevOps. If you're going to transform your software development processes, the only reason you would go through that much turmoil is because your current development processes aren't meeting the needs of your business. Start off with how your current development processes aren't meeting your business needs.

The executives are in the best position to clarify exactly this gap and get the organization going down a continuous improvement process to improve the development and delivery processes.

Most organizations will quickly find that DevOps has some key tools in the toolbox that they want to start using immediately to start taking some inefficiencies out of the development process.

But don't go off to do DevOps and measure how well you did it. We're all business executives. We run businesses, we manage businesses, and we need to focus on what the business is trying to achieve and just use the tools that will best help that.

Gardner: Where do we go next? DevOps has become a fairly popular concept now. It's getting a lot of attention. People understand that it can have a very positive impact, but getting it in place isn't always easy. There are a lot of different spinning variables -- culture, organization, management. In an enterprise that's looking to expand in the Internet of Things, perhaps they're not doing that level of development and deployment.

They probably have been a bit more focused on enterprise applications, rather than devices and embedded. How do you start up that capability and do it well within a software development organization? Let's look at moving from traditional development to the IoT development. What should we be keeping in mind?

Gruver: There are two approaches. One is, if you have loosely coupled architectures like most unicorns do, then you can empower the teams, add some operational members, and let them figure it out. Most large enterprise organizations have more tightly coupled architectures that require large numbers of people working together to develop and deliver things together. I don't think those transformations are going to be effective until you find inspired executives who are willing to lead the transformation and work through the process.

Successful transformations

I've led a couple of successful transformations. If you look at examples from the DevOps Enterprise Summit that Gene Kim led, the common thing that you saw in most of those is that the organizations that were making progress had an executive that was leading the charge, rallying the troops, and making that happen. It requires coordinating work across a large number of teams, and you need somebody who can look across the value chain and muster the resources to make the technical and the cultural changes. [Read a recent interview with Kim on DevOps and security.]

Where a lot of my passion lies now, and the reason I wrote my second book, is that I don't think there are a lot of resources for the executives to learn how to transform large organizations. So I tried to capture everything that I knew about how best to do that.

My second book, Leading the Transformation: Applying Agile and DevOps Principles at Scale, is a resource that enables people to go faster in the organization. I think that’s the next key launch point -- getting the executives engaged to lead that change. That’s going to be the key to getting the adoption going much better.

Gardner: John, what about skills? It’s one thing to get the top-down buy-in, and it’s one thing to recognize the need for transformation and put in some of the organizational building blocks. But ultimately you need to have the right people with the right skills.

Any thoughts about how IoT will create demand for a certain set of skills and how well we're in a position to train and find those people?

Jeremiah: IoT requires people to embrace skills and understanding much broader than their narrow silo. They'll need to develop an expertise in what they do, but they have to have the relationships. They have to have the ability to work across the organization to learn. One of the skills is constantly learning as they go. As Gary mentioned earlier, it’s not a "done" for DevOps. It’s a journey of learning. It’s a journey of growing and getting better.

Understanding process and how things are working, so you can continuously improve them, is a skill that a lot of times people don’t bring to the table. They know their piece, but they don’t often think about the bigger picture. So it’s a set of skills. It’s beyond a single technology. It's understanding that they are really not in IT -- they're really a part of the business. I love the way Gary said that earlier, and I agree with him. Seeing themselves as part of the business is a different mindset that they have to have as they go to work.

Then, as they apply their skills, they're focusing on how they deliver business value. That’s really the change.

Gardner: How do you do DevOps effectively when you're outsourcing a good part of your development? You may need to do that to find the skills.

For embedded systems, for example, you might look to an outside shop that has special experience in that particular area, but you may still want to get DevOps. How does that work?

Gruver: I think DevOps is key to making outsourcing work, especially if you have different vendors that you're outsourcing to because it forces coordination of the work on a frequent basis. Continuous integration, automated testing, and continuous deployment are the forcing functions that align the organization with working code across the system.

When you're enabling people to go off and work on separate branches and separate issues and you have an integration cycle late in the process, that’s where you get the dysfunction -- with a bunch of different organizations coming together with stuff that doesn’t work. If you force that to happen on a daily, or multiple times a day, basis, you get that system aligned and working well before people spend time and energy working on something that either doesn’t work together or won’t work well in production.

Gardner: We have been exploring how continuous processes around development and deployment of applications impact and benefit the Internet of Things trend. I'd like to thank our guests, Gary Gruver, consultant, author, and a former IT executive who has led many large-scale IT transformation projects, and John Jeremiah, Technology Evangelist at Hewlett Packard Enterprise, on Twitter at @j_jeremiah.
And I'd also like to extend a big thank you to our audience for joining us for this DevOps and Internet of Things innovation discussion. I'm Dana Gardner, Principal Analyst at Interarbor Solutions, your host for this ongoing series of HPE-sponsored discussions. Thanks again for listening, and come back next time.

Listen to the podcast. Find it on iTunes. Get the mobile app. Download the transcript. Watch for Free: DevOps, Catalyst of the Agile Enterprise. Sponsor: Hewlett Packard Enterprise.

Transcript of a BriefingsDirect discussion on how continuous processes around development and deployment of applications impact and benefit the Internet of Things trend. Copyright Interarbor Solutions, LLC, 2005-2015. All rights reserved.


Tuesday, November 03, 2015

Big Data Generates New Insights into What’s Happening in the World's Tropical Ecosystems

Transcript of a discussion on how large-scale monitoring of rainforest, biodiversity and climate has been enabled and accelerated by cutting-edge, big-data capture, retrieval and analysis.

Listen to the podcast. Find it on iTunes. Get the mobile app. Download the transcript. Sponsor: Hewlett Packard Enterprise.

Dana Gardner: Hello, and welcome to the next edition of the HP Discover Podcast Series. I'm Dana Gardner, Principal Analyst at Interarbor Solutions, your host and moderator for this ongoing sponsored discussion on IT innovation and how it’s making an impact on people’s lives.

Our next big-data case study discussion explores how large-scale monitoring of rainforest biodiversity and climate has been enabled and accelerated by cutting-edge big-data capture, retrieval, and analysis.

We'll learn how quantitative analysis and modeling are generating new insights into what’s happening in tropical ecosystems worldwide, and we'll hear how such insights are leading to better ways to attain and verify sustainable development and preservation methods and techniques.

To learn more about data science, and how hosting that data science in the cloud helps the study of biodiversity, we're pleased to welcome our guests. First, Eric Fegraus, Senior Director of Technology of the TEAM Network at Conservation International in Arlington, Virginia. Welcome, Eric.

Eric Fegraus: Hi, Dana. It’s great to be here. Thank you.
Gardner: We're glad to have you. We're also here with Jorge Ahumada, Executive Director of the TEAM Network, also at Conservation International. Welcome, Jorge.

Jorge Ahumada: Great to be here.

Gardner: Let’s start with the trends. Clearly, knowing what’s going on in environments in the tropics helps us understand what to do and what not to do. How has that changed? We spoke about a year ago, Eric. Are there any trends or driving influences that have made this data gathering more important than ever?

Fegraus: Over this last year, we’ve been able to roll out our analytic systems across the TEAM Network. We're having more-and-more uptake with our protected-area managers using the system and we have some good examples where the results are being used.

For example, in Uganda, we noticed that a particular cat species was trending downward. The folks there were really curious why this was happening. At first, they were excited that there was this cat species, which was previously not known to be there.

This particular forest is a gorilla reserve, and one of the main economic drivers around the reserve is ecotourism, people paying to go see the gorillas. Once they saw that the cat population was declining, they started asking what could be causing this. Our system told them that the way they were bringing in the ecotourists to see the gorillas had shifted, and that was potentially having an impact on where the cats were. It allowed them to readjust and rethink their practices for bringing tourists to the gorillas.

Information at work

Gardner: Information at work.

Fegraus: Information at work at the protected-area level.

Gardner: Just to be clear for our audience, TEAM stands for Tropical Ecology Assessment and Monitoring. Jorge, tell us a little bit about how the TEAM Network came about and what it encompasses worldwide.

Ahumada: The TEAM Network was a program that started about 12 years ago and it was started to fill a void in the information we have from tropical forests. Tropical forests cover a little bit less than 10 percent of the terrestrial area in the world, but they have more than 50 percent of the biodiversity.

So they're the critical places to be conserved from that point of view, yet we didn’t have any information about what's happening in these places. That’s how the TEAM Network was born, and the model was to use data collection methods that were standardized, that were replicated across a number of sites, and have systems that would store and analyze that data and make it useful. That was the main motivation.

Gardner: Of course, it’s super-important to be able to collect and retrieve and put that data into a place where it can be analyzed. It’s also, of course, important then to be able to share that analysis. Eric, tell us what's been happening lately that has led to the ability for all of those parts of a data lifecycle to really come to fruition?

Fegraus: Earlier this year, we completed our end-to-end system. We're able to take the data from the field, from the camera traps, from the climate stations, and bring it into our central repository. We then push the data into Vertica, which is used for the analytics. Then, we developed a really nice front-end dashboard that shows the results of species populations in all the protected areas where we work.

The analytical process also starts to identify what could be impacting the trends that we're seeing at a per-species level. This dashboard also lets the user look at the data in a lot of different ways. They can aggregate it and they can slice and dice it in different ways to look at different trends.
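
As a rough illustration of the aggregating and slicing described here, the sketch below groups detection counts by whichever fields a user picks. It is only a minimal sketch: the record layout, the site and species labels, and the `aggregate` helper are hypothetical and are not taken from the TEAM Network dashboard.

```python
from collections import defaultdict

# Hypothetical records: (site, species, year, number of detections).
records = [
    ("site_A", "cat", 2013, 14),
    ("site_A", "cat", 2014, 9),
    ("site_A", "duiker", 2013, 30),
    ("site_A", "duiker", 2014, 31),
    ("site_B", "cat", 2014, 5),
    ("site_B", "duiker", 2014, 12),
]

# Position of each groupable field inside a record (assumed layout).
FIELDS = {"site": 0, "species": 1, "year": 2}


def aggregate(rows, *keys):
    """Sum detections grouped by the chosen fields, e.g. ("species", "year")."""
    totals = defaultdict(int)
    for row in rows:
        group = tuple(row[FIELDS[key]] for key in keys)
        totals[group] += row[3]
    return dict(totals)


by_species = aggregate(records, "species")
by_site_year = aggregate(records, "site", "year")
cat_by_year = aggregate([r for r in records if r[1] == "cat"], "year")

print(by_species)
print(by_site_year)
print(cat_by_year)
```

The same records answer different questions depending on the grouping keys, which is the essence of looking at one dataset "in a lot of different ways."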

Gardner: Jorge, what sort of technologies are they using for that slicing and dicing? Are you seeing certain tools like Distributed R or visualization software and business-intelligence (BI) packages? What's the common thread or is it varied greatly?

Ahumada: It depends on the analysis, but we're really at the forefront of analytics in terms of big data. As Michael Stonebraker and other big data thinkers have said, the big-data analytics infrastructure has concentrated on the storage of big data, but not so much on the analytics. We break that mold because we're doing very, very sophisticated Bayesian analytics with this data.

One of the problems of working with camera-trap data is that you have to separate the detection process from the actual trend that you're seeing because you do have a detection process that has error.

Hierarchical models

We do that with hierarchical models, and it's a fairly complicated model. Just using that kind of model, a normal computer will take days or even months. With the power of Vertica and the power of processing, we’ve been able to shrink that to a few hours. We can run 500 or 600 species from 13 sites all over the world in five hours. So it’s a really good way to use the power of processing.
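
To make the idea of separating detection from the underlying state concrete, here is a minimal sketch of a basic single-season occupancy model fitted by grid-search maximum likelihood. It is a much simpler stand-in for the Bayesian hierarchical models described above, not the TEAM Network's actual method, and the detection histories are hypothetical. Each site is occupied with probability `psi`, and an occupied site yields a detection on any given survey with probability `p`.

```python
import math


def site_log_likelihood(detections, surveys, psi, p):
    """Log-likelihood of one site's detection count under a basic occupancy model."""
    # Occupied site: each survey detects the species with probability p.
    # (The binomial coefficient does not depend on psi or p, so it is omitted.)
    occupied = psi * (p ** detections) * ((1 - p) ** (surveys - detections))
    # An unoccupied site can only produce an all-zero history.
    empty = (1 - psi) if detections == 0 else 0.0
    return math.log(occupied + empty)


def fit_occupancy(histories, steps=99):
    """Grid-search maximum-likelihood estimate of (psi, p)."""
    grid = [(i + 1) / (steps + 1) for i in range(steps)]  # 0.01 .. 0.99
    best = None
    for psi in grid:
        for p in grid:
            ll = sum(site_log_likelihood(d, k, psi, p) for d, k in histories)
            if best is None or ll > best[0]:
                best = (ll, psi, p)
    return best[1], best[2]


# Hypothetical data: (surveys with a detection, surveys run) per camera trap.
histories = [(0, 5), (1, 5), (0, 5), (2, 5), (0, 5),
             (1, 5), (0, 5), (0, 5), (3, 5), (0, 5)]

# Naive estimate: share of sites with at least one detection.
naive = sum(1 for d, _ in histories if d > 0) / len(histories)
psi_hat, p_hat = fit_occupancy(histories)

print("naive occupancy:", naive)
print("estimated occupancy:", psi_hat, "detection probability:", p_hat)
```

Because some occupied sites are missed on every survey, the model's occupancy estimate comes out higher than the naive share of sites with a detection. A full hierarchical Bayesian version adds covariates, multiple years, and many species, which is where the computing cost mentioned above comes from.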

We’ve also more recently been working with Distributed R, a new package that was written by HP folks at Vertica, to analyze satellite images, because we're also interested in what’s happening at these sites in terms of forest loss. Satellite images are really complicated, because you have millions of pixels and you don’t really know what each pixel is. Is it forest, agricultural land, or a house? So running that on normal R is kind of a problem.
Distributed R is a package that actually takes some of those functions, like random forest and regression trees, and harnesses the full power of the vertical processing of Vertica. So we’ve seen a 10-fold increase in performance with that, and it allows us to get much more information out of those images.

Gardner: Not only are you on the cutting-edge for the analytics, you've also moved to the bleeding edge on infrastructure and distribution mechanisms. Eric, tell us a little bit about your use of cloud and hybrid cloud?

Fegraus: To back up a little bit, we ended up building a system that uses Vertica. It’s an on-premises solution and that's what we're using in the TEAM Network. We've since realized that this solution we built for the TEAM Network can also be readily scaled to other organizations, government agencies, and anyone else who wants to manage camera-trap data and do the analytics.

So now, we're at a point where we’ve been essentially doing software development and producing software that’s scalable. If an organization wants to replicate what we’re doing, we have a solution that we can spin up in the cloud that has all of the data management, the analytics, the data transformations and processing, the collection, and all the data quality controls, all built into a software instance that could be spun up in the cloud.

Gardner: And when you say “in the cloud,” are you talking about a specific public cloud, in a specific country or all the above, some of the above?

Fegraus: All of the above. We'll be using Vertica or we're using Vertica OnDemand. We're actually going to transition our existing on-premise solution into Vertica OnDemand. The solution we’re developing uses mostly open-source software and it can be replicated in the Amazon cloud or other clouds that have the right environments where we can get things up and running.

Gardner: Jorge, how important is it to have that global choice for cloud deployment, both to attract users and to keep your costs limited?

Ahumada: It’s really key, because in many of these countries, it's very difficult for some of those governments to expand out their old solutions on the ground. Cloud solutions offer a very good, effective way to manage data. As Eric was saying, the big limitation here is which cloud solutions are available in each country. Right now, we have something with cloud OnDemand here, but in some of the countries, we might not have the same infrastructure. So we'll have to contract different vendors or whatever.

But it's a way to keep costs down, deliver the information really quickly, and store the data in a way that is safe and secure.

What's next?

Gardner: Eric, now that we have this ability to retrieve, gather, analyze, and now distribute, what comes next in terms of having these organizations work together? Do we have any indicators of what the results might be in the field? How can we measure the effectiveness at the endpoint -- that is to say, in these environments based on what you have been able to accomplish technically?

Fegraus: One of the nice things about the software that we built, which can run in the various cloud environments, is that it can also be connected. For example, if we start putting these solutions in a particular continent, and there are countries next to each other that are doing this, they are not going to be silos. They will be able to share an aggregated level of data with each other, so that we can get a holistic picture of what's happening.

So that was very important when we started going down this process, because one of the big inhibitors for growth within the environmental sciences is that there are these traditional silos of data that people in organizations keep and sit on and essentially don't share. That was a very important driver for us as we were going down this path of building software.

Gardner: Jorge, what comes next in terms of technology? Are the scale issues a hurdle you need to get across? Are there analytics issues? What's the next requirements phase that you would like to work through technically to make this even more impactful?

Ahumada: As we scale up in size and start having more granularity in the countries where we work, the challenge is going to be keeping these systems responsive and information coming. Right now, one of the big limitations is the analytics. We do have analytics running at top speeds, but once we start talking about whole countries, we're going to have on the order of many more species and many more protected areas to monitor.

This is something that the industry is starting to move forward on in terms of incorporating more of the power of the hardware into the analytics, rather than just the storage and the management of data. We look forward to continuing to work with our technology partners, and in particular HP, to help them guide this process. As a case study, we're very well-positioned for that, because we already have that challenge.

Gardner: Also it appears to me that you are a harbinger, a bellwether, for the Internet of Things (IoT). Much of your data is coming from monitoring, sensors, devices, and cameras. It's in the form of images and raw data. Any thoughts about what others who are thinking about the impact of the IoT should consider, now that you have been there?

Fegraus: When we talk about big data, we're talking about data collected from phones, cars, and human devices. Humans are delivering the data. But here we have a different problem. We're talking about nature delivering the data and we don't have that infrastructure in places like Uganda, Zimbabwe, or Brazil.

So we have to start by building that infrastructure, and we have the camera traps as an example of that. We need to be able to deploy much more, much larger-scale infrastructure to collect data and diversify the sensors that we currently have, so that we can gather sound data, image data, temperature, and environmental data at a much larger scale.

Satellites can only take us part of the way, because we're always going to have problems with resolution. So it's really deployment on the ground that is going to be a big limitation, and it's a big field that is developing now.

Gardner: Drones?

Using drones

Fegraus: Drones, for example, have that capacity, especially small drones that are proving to be intelligent and able to collect a lot of information autonomously. This is at the cutting edge of technological development right now, and we're excited about it.

Gardner: Well great. I'm afraid we will have to leave it there. We have been learning and exploring how large-scale monitoring of rainforest, biodiversity and climate has been enabled and accelerated by cutting-edge, big-data capture, retrieval, and analysis. And we've seen how quantitative analysis and modeling are generating new insights into what's happening in tropical ecosystems worldwide.

So a big thanks to our guests, Eric Fegraus, Senior Director of Technology of the TEAM Network at Conservation International, and Jorge Ahumada, the Executive Director of the TEAM Network, also at Conservation International.
And a big thank you to our audience as well for joining us for this big-data innovation case study discussion. I'm Dana Gardner, Principal Analyst at Interarbor Solutions, your host for this ongoing series of HP-sponsored discussions. Thanks again for listening, and come back next time.


Transcript of a discussion on how large-scale monitoring of rainforest, biodiversity and climate has been enabled and accelerated by cutting-edge, big-data capture, retrieval and analysis. Copyright Interarbor Solutions, LLC, 2005-2015. All rights reserved.

You may also be interested in: