
Friday, June 07, 2019

How Real-Time Data Streaming and Integration Set the Stage for AI-Driven DataOps


Transcript of a discussion on the latest strategies for uniting and governing data wherever it resides to enable rapid and actionable analysis.

Listen to the podcast. Find it on iTunes. Download the transcript. Sponsor: Qlik.

Dana Gardner: Hi, this is Dana Gardner, Principal Analyst at Interarbor Solutions, and you’re listening to BriefingsDirect. Our next business intelligence (BI) trends discussion explores the growing role of data integration in a multi-cloud world.

Just as enterprises seek to gain more insights and value from their copious data, they're also finding their applications, services, and raw data spread across a continuum of hybrid and public clouds. Raw data is also piling up closer to the edge -- on factory floors, in hospital rooms, and anywhere digital business and consumer activities exist.

Stay with us now as we examine the latest strategies for uniting and governing data wherever it resides. By doing so, businesses are enabling rapid and actionable analysis -- as well as entirely new levels of human-to-augmented-intelligence collaboration.

To learn more about the foundational capabilities that lead to total data access and exploitation, we're now joined by Dan Potter, Vice President of Product Marketing at Attunity, a Division of Qlik. Welcome, Dan.

Dan Potter: Hey, Dana. Great to be with you.

Gardner: Dan, what are the business trends forcing a new approach to data integration?

Potter: It’s all being driven by analytics. The analytics world has gone through some very interesting phases of late: Internet of Things (IoT), streaming data from operational systems, artificial intelligence (AI) and machine learning (ML), predictive and preventative kinds of analytics, and real-time streaming analytics.

So, it’s analytics driving data integration requirements. Analytics has changed the way in which data is being stored and managed for analytics. Things like cloud data warehouses, data lakes, streaming infrastructure like Kafka -- these are all a response to the business demand for a new style of analytics.

As analytics drives data management changes, the way in which the data is integrated and moved needs to change as well. Traditional approaches to data integration -- batch processes, manual ETL, and script-oriented integration -- are no longer good enough. All of that is moving to a much more agile, real-time style of integration, driven by the movement to the cloud and by the need to move more data, in greater volume and variety, into data lakes -- and then to shape that data and make it analytics-ready.
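[To make that contrast concrete, here is a minimal, hypothetical sketch of the two styles in Python. The record layout, event shape, and function names are invented for illustration and are not drawn from Attunity's products.]

```python
# Hypothetical contrast between batch ETL and change-driven integration.

def transform(row):
    # Placeholder for shaping a record to be analytics-ready.
    return {**row, "loaded": True}

# Batch style: on a schedule, re-extract and re-transform everything,
# whether or not it changed since the last run.
def batch_load(source_rows):
    return [transform(r) for r in source_rows]

# Change-driven style: apply each insert/update/delete as it occurs,
# so the analytics target stays continuously current.
def apply_change(event, target):
    if event["op"] == "delete":
        target.pop(event["key"], None)
    else:  # "insert" or "update"
        target[event["key"]] = transform(event["row"])

target = {}
apply_change({"op": "insert", "key": 1, "row": {"name": "a"}}, target)
apply_change({"op": "update", "key": 1, "row": {"name": "b"}}, target)
print(target)  # {1: {'name': 'b', 'loaded': True}}
```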

With all of these movements, there have been new challenges and new technologies. The pace of innovation is accelerating, and the challenges are growing. The demand for digital transformation and the move to the cloud has changed the landscape dramatically. With that came great opportunities for us as a modern data integration vendor, but also great challenges for companies that are going through this transition.

Gardner: Companies have been doing data integration since the original relational database (RDB) was kicked around. But it seems the core competency of managing the integration of data is more important than ever.

Innovation transforms data integration

Potter: I totally agree, and if done right, in the future, you won’t have to focus on data integration. The goal is to automate as much as possible because the data sources are changing. You have a proliferation of NoSQL databases, graph databases; it’s no longer just an Oracle database or RDB. You have all kinds of different data. You have different technologies being used to transform that data. Things like Spark have emerged along with other transformation technologies that are real-time-oriented. And there are different targets to where this data is being transformed and moved to.
It's difficult for organizations to maintain the skill sets -- and you don't want them to. We want to move to an automated process of data integration. The more we can achieve that, the more valuable all of this becomes. You don't spend time on mundane data integration; you spend time on the analytics -- and that's where the value comes from.

Gardner: Now that Attunity is part of Qlik, you are an essential component of a larger undertaking, of moving toward DataOps. Tell me why automated data migration and integration translates into a larger strategic value when you combine it with Qlik?


Potter: DataOps resonates well for the pain we’re setting out to address. DataOps is about bringing the same discipline that DevOps has brought to software development. Only now we’re bringing that to data and data integration for analytics.

How do we accelerate and remove the gap between IT, which is charged with providing analytics-ready data to the business, and all of the various business and analytics requirements? That’s where DataOps comes in. DataOps is technology, but that’s just a part of it. It’s as much or more about people and process -- along with enabling technology and modern integration technology like Attunity.

We’re trying to solve a problem that’s been persistent since the first bit of data hit a hard drive. Data integration challenges will always be there, but we’re getting smarter about the technology that you apply and gaining the discipline to not boil the ocean with every initiative.

The new goal is to get more collaboration between what business users need and to automate the delivery of analytics-ready data, knowing full-well that the requirements are going to change often. You can be much more responsive to those business changes, bring in additional datasets, and prepare that data in different ways and in different formats so it can be consumed with different analytics technologies.

That's the big problem we're trying to solve. And now, being part of Qlik gives us a much broader perspective on these pains as they relate to the analytics world. It gives us a much broader portfolio of data integration technologies. The Qlik Data Catalyst product is a perfect complement to what Attunity does.

Our role in data integration has been to help organizations move data in real-time as that data changes on source systems. We capture those changes and move that data to where it’s needed -- like a cloud, data lake, or data warehouse. We prepare and shape that data for analytics.

Qlik Data Catalyst then comes in to catalog all of this data and make it available to business users so they can discover and govern that data. And it easily allows for that data to be further prepared, enriched, or to create derivative datasets.

So, it’s a perfect marriage in that the data integration world brings together the strength of Attunity with Qlik Data Catalyst. We have the most purpose-fit, modern data integration technology to solve these analytics challenges. And we’re doing it in a way that fits well with a DataOps discipline.

Gardner: We not only have different data types; we have another level of heterogeneity to contend with: cloud, hybrid cloud, multi-cloud, and edge. We don't even know what more is coming in two or three years. How does an organization stay agile given that level of dynamic complexity?

Real-time analytics deliver agility 

Potter: You need a different approach for a different style of integration technology to support these topologies that are themselves very different. And what the ecosystem looks like today is going to be radically different two years from now.

The pace of innovation just within the cloud platform technologies is very rapid. New databases, transformation engines, and orchestration engines just proliferate. And now you have multiple cloud vendors. There are great reasons for organizations to use multiple clouds, to use the best technologies or approaches that work for your organization, your workgroup, your division. So you need that; you need to prepare yourself for that, and modern integration approaches definitely help.


One of the interesting technologies to help organizations provide ongoing agility is Apache Kafka. Kafka is a way to move data in real-time and make the data easy to consume even as it’s flowing. We see that as an important piece of the evolving data infrastructure fabric.

At Attunity we create data streams from systems like mainframes, SAP applications, and RDBs. These systems weren’t built to stream data, but we stream-enable that data. We publish it into a Kafka stream and that provides great flexibility for organizations to, for example, process that data in real time for real-time analytics such as fraud detection. It’s an efficient way to publish that data to multiple systems. But it also provides the agility to be able to deliver that data widely and have people find and consume that data easily.
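[The pattern Potter describes can be sketched generically. The snippet below is not Attunity's product code; it uses the open-source kafka-python client, and the topic name and event fields are invented for illustration.]

```python
# Generic sketch of publishing captured source-system changes to Kafka.
import json
from kafka import KafkaProducer  # open-source kafka-python client

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def publish_change(event):
    # One event per source-row modification. Any number of consumers --
    # fraud detection, a data lake loader, a warehouse loader -- can
    # read the same stream independently.
    producer.send("orders.changes", value=event)

publish_change({"op": "update", "table": "ORDERS",
                "key": 1042, "row": {"status": "shipped"}})
producer.flush()  # block until buffered events are sent
```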

Such new, evolving approaches enable a mentality that says, “I need to make sure that whatever decision I make today is going to future-proof me.” So, setting yourself up right and thinking about that agility and building for agility on day one is absolutely essential.

Gardner: What are the top challenges companies have for becoming masterful at this ongoing challenge -- of getting control of data so that they can then always analyze it properly and get the big business outcomes payoff?

Potter: The most important competency is at the enterprise architecture (EA) level, more than with the people who traditionally build ETL scripts and integration routines. I think those are the pieces you want to automate.

The real core competency is to define a modern data architecture and build it for agility so you can embrace the changing technologies and requirements landscape. It may be that you have all of your eggs in one cloud vendor today. But you certainly want to set yourself up so you can evolve and push processing to the most efficient place, and to attain the best technology for the kinds of analytics or operational workloads you want.

That’s the top competency that organizations should be focused on. As an integration vendor, we are trying to reduce the reliance on technical people to do all of this integration work in a manual way. It’s time-consuming, error-prone, and costly. Let’s automate as much as we can and help companies build the right data architecture for the future.

Gardner: What’s fascinating to me, Dan, in this era of AI, ML, and augmented intelligence is that we’re not just creating systems that will get you to that analytic opportunity for intelligence. We are employing that intelligence to get there. It’s tactical and strategic. It’s a process, and it’s a result.

How do AI tools help automate and streamline the process of getting your data lined up properly?

Automated analytics advance automation 

Potter: This is an emerging area for integration technology. Our focus initially has been on preparing data to make it available for ML initiatives. We work with vendors such as Databricks, at the forefront of processing, which use a high-performance Spark engine to process data for data science, ML, and AI initiatives.

We need to ask, "How do we apply cognitive engines, things like Qlik, within our own technology and get smarter about the patterns of integration that organizations are deploying, so we can further automate?" That's really the next wave for us.

Gardner: You’re not just the president, you’re a client.

Potter: Yeah, that’s a great way to put it.

Gardner: How should people prepare for such use of intelligence?

Potter: If it’s done right -- and we plan on doing it right -- it should be transparent to the users. This is all about automation done right. It should just be intuitive. Going back 15 years when we first brought out replication technology at Attunity, the idea was to automate and abstract away all of the complexity. You could literally drag your source, your target, and make it happen. The technology does the mapping, the routing, and handles all the errors for me. It’s that same elegance. That’s where the intelligence comes in, to make it so intuitive that you are not seeing all the magic that’s happening under the covers.
This is all about automation done right. It should just be intuitive. When we first brought out replication technology at Attunity, the idea was to automate and abstract away all of the complexity. That's now where the intelligence comes in, to make it so intuitive that you are not seeing all the magic under the covers.

We follow that same design principle in our product. As the technologies get more complex, it's harder for us to do that, so applying ML and AI becomes even more important to us. That's really the future for us: as we automate more of these processes, more of what is happening stays under the covers.

Gardner: Dan, are there any examples of organizations on the bleeding edge? They understand the data integration requirements and core competencies. They see this through the lens of architecture.

Automation ensures insights into data

Potter: Zurich Insurance is one of the early innovators in applying automation to their data warehouse initiatives. Zurich had been moving to a modern data warehouse to better meet the analytics requirements, but they realized they needed a better way to do it than in the past.

Traditional enterprise data warehousing employs a lot of people, building a lot of ETL scripts. It tends to be very brittle. When source systems change you don’t know about it until the scripts break or until the business users complain about holes in their graphs. Zurich turned to Attunity to automate the process of integrating, moving it to real-time, and automatically structuring their data warehouse.

The time they need to respond to business users is a fraction of what it was. They reduced 45-day cycles to two-day cycles for updating and building out new data marts for users. Their agility is off the charts compared to the traditional way of doing it. They can now better meet the needs of the business users through automation.

As organizations move to the cloud to automate processes, a lot of customers are embracing data lakes. It’s easy to put data into a data lake, but it’s really hard to derive value from the data lake and reconstruct the data to make it analytics-ready.

For example, you can take transactions from a mainframe and dump all of those into a data lake, which is wonderful. But how do I create any analytic insights? How do I ensure all those frequently updated files I'm dumping into the lake can be reconstructed into a queryable dataset? The way people have done it in the past is manually, with scripts written in Pig and other languages to try to reconstruct it. We fully automate that process. For companies using Attunity technology, our big investment in data lake automation has had a tremendous impact on demonstrating value.
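[As a rough illustration of what that automation produces, the sketch below replays a sequence of change records into a current-state, queryable table. The record format is hypothetical; a production pipeline would add partitioning, schema handling, and compaction at much larger scale.]

```python
# Hypothetical sketch: rebuild a queryable, current-state dataset from
# change records landed in a data lake.

def reconstruct(change_records):
    state = {}
    # Changes must be applied in commit order for a correct result.
    for rec in sorted(change_records, key=lambda r: r["seq"]):
        if rec["op"] == "delete":
            state.pop(rec["key"], None)
        else:  # insert or update: last write wins
            state[rec["key"]] = rec["row"]
    return state

changes = [
    {"seq": 1, "op": "insert", "key": "A1", "row": {"balance": 100}},
    {"seq": 2, "op": "update", "key": "A1", "row": {"balance": 80}},
    {"seq": 3, "op": "insert", "key": "B7", "row": {"balance": 25}},
]
print(reconstruct(changes))
# {'A1': {'balance': 80}, 'B7': {'balance': 25}}
```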

Gardner: Attunity recently became part of Qlik. Are there any clients that demonstrate the combination of two-plus-two-equals-five effect when it comes to Attunity and the Qlik Catalyst catalog?

DataOps delivers the magic 

Potter: It’s still early days for us. As we look at our installed base -- and there is a lot of overlap between who we sell to -- the BI teams and the data integration teams in many cases are separate and distinct. DataOps brings them together.

In the future, as we take the Qlik Data Catalyst and make that the nexus of where the business side and the IT side come together, the DataOps approach leverages that catalog and extends it with collaboration. That’s where the magic happens.


So business users can more easily find the data. They can send requirements back to the data engineering team as they need them. And, again, applying AI and ML to the patterns we see from the analytics side will help us better match the data that's required and automate the delivery and preparation of that data for different business users.

That’s the future, and it’s going to be very interesting. A year from now, after being part of the Qlik family, we’ll bring together the BI and data integration side from our joint customers. We are going to see some really interesting results.

Gardner: As this next, third generation of BI kicks in, what should organizations be doing to get prepared? What should the data architect, who is starting to think about DataOps, do to put them in an advantageous position to exploit this when the market matures?

Potter: First, they should be talking to Attunity. We get engaged early and often in many of these organizations. The hardest job in IT right now is [to be an] enterprise architect, because there are so many moving parts. But we have wonderful conversations, because at Attunity we've been doing this for a long time, we speak the same language, and we bring a lot of knowledge and experience from other organizations to bear. It's one of the reasons we have deep strategic relationships with many of these enterprise architects and others on the IT side of the house.

They should be thinking about what’s the next wave and how to best prepare for that. Foundationally, moving to more real-time streaming integration is an absolute requirement. You can take our word for it. You can go talk to analysts and other peers around the need for real-time data and streaming architectures, and how important that is going to be in the next wave.
Data integration is strategic, it unlocks the value of the data. If you do it right, you're going to set yourself up for long-term success.

So, prepare for that, and think about the agility and the automation that will get you the desired results. Companies that aren't preparing now are going to be left behind -- and if they are left behind, the business is left behind. It's a very competitive world, and organizations are competing on data and analytics. The faster you can deliver the right data and make it analytics-ready, the faster and better decisions you can make -- and the more successful you'll be.

So it really is a do-or-die proposition. That's why data integration is strategic: it unlocks the value of this data. If you do it right, you're going to set yourself up for long-term success.


Gardner: I'm afraid we'll have to leave it there. You've been listening to a sponsored BriefingsDirect discussion on the role of data integration in a multicloud world. And we have learned how the latest strategies for uniting and governing all of the data, wherever it resides, enable rapid and actionable analysis.

So, a big thank you to our guest, Dan Potter, Vice President of Product Marketing at Attunity, a Division of Qlik.

Potter: Thank you, Dana. Always a pleasure.

Gardner: And a big thank you as well to our audience for joining this BriefingsDirect business intelligence trends discussion. I’m Dana Gardner, Principal Analyst at Interarbor Solutions, your host throughout this series of Qlik-sponsored BriefingsDirect interviews.

Thanks again for listening. Please pass this along to your IT community, and do come back next time.

Listen to the podcast. Find it on iTunes. Download the transcript. Sponsor: Qlik.

Transcript of a discussion on the latest strategies for uniting and governing data wherever it resides to enable rapid and actionable analysis. Copyright Interarbor Solutions, LLC, 2005-2019. All rights reserved.


Wednesday, August 03, 2016

How IT Innovators Turn Digital Disruption into a Business Productivity Force Multiplier

Transcript of a discussion on digital business transformation and how that’s been accomplished by several prominent enterprises.

Listen to the podcast. Find it on iTunes. Get the mobile app. Download the transcript. Sponsor: Citrix.

Dana Gardner: Hi, this is Dana Gardner, Principal Analyst at Interarbor Solutions, and you're listening to BriefingsDirect.

Our next innovation thought leadership panel discussion examines how digital business transformation has been accomplished by several prominent enterprises. We'll explore how the convergence of cloud, mobility, and big-data analytics has prompted companies to innovate and produce new levels of productivity.

We're now joined by some finalists from the Citrix Synergy 2016 Innovation Awards Program. So, please join me in welcoming our panel. We're here with Olaf Romer, Head of Corporate IT and group CIO at Bâloise in Basel, Switzerland. Welcome.

Olaf Romer: Hi, Dana. Thank you very much for your invitation.

Gardner: We're also here with Alan Crawford, CIO of Action for Children in London. Hello, Alan.

Alan Crawford: Hello, Dana. Great to join you.

Gardner: And we're here with Craig Patterson, CEO of Patterson and Associates in San Antonio, Texas. Welcome, Craig.

Craig Patterson: Thank you very much for letting me be here.

Gardner: Olaf, what are the major trends that drove you to reexamine the workplace conceptually, and how did you arrive at your technology direction for innovating in that regard?

Becoming more modern

Romer: First of all, we're a traditional Swiss insurer. So, our driver was to become a little bit more modern, to bring the new generation of people into our company. In Switzerland, this is a little bit of a problem; we also have big companies in Zurich, for example. So, it's very important for us.

We did this in two directions. One direction is on the IT side, and the other is on the real-estate side. We changed from the traditional office boxes to a flex office with open space, like Google has. Nobody has their own desk, not even me. We can go anywhere in our office and sit with whomever we need to. On the IT side, we're going in the same direction: more mobility and an easier way to work in our company.

Gardner: And because you’re an insurance organization, you have a borderless type of enterprise, where you need to interact with field offices, other payers, suppliers, and customers, of course.

Was that ability to deal with many different types of end-point environments also a concern, and how did you solve that?

Romer: The first step was inside our company, and now, we want to go outside to our brokers and to our customers. The security aspect is very, very important. We're still working on being absolutely secure, because we're handling sensitive customer data. We're still in the process of opening our ecosystem outward to the brokers and customers, but also to other companies we work with. [See related post, Expert panel explores the new reality for cloud security and trusted mobile apps delivery.]

Gardner: Alan, tell us about Action for Children and what you’ve been doing in terms of increasing the mobile style of interactions in business.

Crawford: Action for Children is a UK charity. It helps 300,000 children, families, and young people every year. About 5,000 staff operate from between 300 and 500 locations; 300 branches are our own, and a couple of hundred locations are with our partner agencies.

When I started there, the big driver was around security and mobility. A lot of the XP computers were running out of support, and the staff outside the office were working on paper.

There was a great opportunity to give modern tablets to staff to improve productivity. Productivity in our case means that if you spend less time doing unnecessary visits, or do something in one visit instead of three, you can spend more quality time with the family to improve the outcomes for the children.

Gardner: And, of course, as a non-profit organization, costs are always a concern. We’ve heard an awful lot here at Citrix Synergy about lower cost client and endpoint devices. Has that been a good news to your ears? [Learn more about Citrix Synergy 2016.]

Productivity improvements

Crawford: It has. We started with security and productivity as the main drivers, but actually, as we've rolled out, we've seen those productivity improvements arise. Now, we're looking at the costs -- the savings we can make on travel, print, and stationery. Our starting budget this year for those things is £1.3 million ($1.7 million) less than it was the year before we introduced tablets. We're trying to work out exactly how much of that we can attribute to the mobile technology and how much is due to other factors.

Gardner: Craig, you're working with a number of public sector organizations. Tell us about what they are facing and what mobility as a style of work means to them.

Patterson: Absolutely. I'm working with a lot of public housing authorities. One is Lucas Metropolitan, and another is the Hampton Redevelopment Agency. What they're facing is declining budgets and a need to do more with less.

When we look at traditional housing-authority and government-service agencies that are paper-based, paper just continues to multiply. You put one piece in the copier and 20 pieces come out. So, being able to take the documents that contain secure private information of our clients and connect those with the clients out in the field is why we need mobility and efficiency and workflows.

And the cloud is what came to mind with that. With content management, we can capture data out in the field. We can move our staff out in the field. We don’t have to bring all of the clients into the office, which can sometimes pose a hardship, especially for elderly, disabled, and many of those in the greatest need. Mobility and efficiency with the cloud and the security have become paramount in how we perform our business.

Gardner: I suppose another aspect of mobility is the ability to bring data and analytics to the very edge. Have you taken advantage of that yet, or is it something you're going to be working toward?

Patterson: It's something we're working toward. We know from the analytics we've seen so far that mobility is the key. For some time, people have thought that we can't put things like applications for affordable housing online, because people don't have access to the Internet.

Our analytics prove that entirely wrong. Age groups of 75 and 80 were accessing it on mobile devices faster than the younger group was. What it means is that they find a relative, a grandchild, or whoever they need to help them access the Internet. It's been our mindset that has kept us from making the Internet and those mobile avenues into our systems available on a broader scale. So, we're moving in that direction, so that self-service can reach that community much more broadly.

Measuring outcomes

Crawford: On the analytics, and how that's helped by mobile working, we had a very similar result at Action for Children. In the same year we brought out tablets, we started to do outcome measures with the children we work with. For each child, we do a baseline measure when we first meet the family, and then maybe three months later -- whatever the period of the intervention -- we do a further measure.

Doing that directly on a tablet with the family present has really enhanced the outcome measures. We now have measures on 50,000 children and we can aggregate that, see what the trends are, see what the patterns are geographically by types of service and types of intervention.

Gardner: So it’s that two-way street; the more data and analytics you can bring down to the edge, the more you can actually capture and reapply, and that creates a virtuous cycle of improvement in productivity.

Crawford: Absolutely. In this case, we're looking at the data and learning lessons about what works better to improve the outcomes for disadvantaged children, which is really what we're about.

Gardner: Olaf, user experience is a big topic these days, and in insurance it goes right to the very edge -- where there might be a settlement event of some sort -- and back to the broker and the enterprise. User experience improvements at every step of that ultimately mean a better, more productive outcome for your end customers. [See related post, How the Citrix Technology Professionals Program produces user experience benefits from greater ecosystem collaboration.]

How does user experience factor into this mobility and data in an analytics equation?
We're looking at the data and learning lessons about what works better to improve the outcomes for disadvantaged children, which is really what we're about.

Romer: First of all, the insurance business is a little bit of a different business than the others here. The problem is that our customers normally don't want to touch us during the year. They get a one-time invoice from us, and they have to pay the premium. Then, they hope, and we also hope, that they will not have a claim.

We have only one touch a year, and this is a little bit of a problem. We try to do everything to be more attractive for the customers and bring them to us, so that it's clear to them that if they have a problem or need new insurance, they go to Bâloise Insurance.

We're working to bring in a little bit of consumerization. In former years, the insurance business was very difficult and not transparent. Customers have to answer 67 questions before they can take out insurance with us, and this is the point: to make it as simple as possible and to work with new technology, we have to be attractive for the customers -- for example, taking out insurance through an iPhone. That's not so easy.

If you talk with a core insurance guy about calculating the premiums, they won't already have the 67 answers from the customers. So, it's not only the technology; it's working a little bit differently in the insurance business. The technology will also help us there. For me, the buzzword is big data, and now we have to bring out the value of the data we have in our business, so that we can go directly to the right customer area with the right user interface.

Gardner: Another concept that we have heard quite a bit at Synergy is the need to allow IT to say yes more often. Starting with you Craig, what are you seeing in the trends and in the technology that is perhaps most impactful for you to be able to say yes to the requests and the need for agility in these businesses, in these public sector organizations?

Device agnosticism

Patterson: It’s the device agnosticism, where you bring your own device (BYOD). It’s a device that the individuals are already familiar with. I'm going to take it from two angles. It could be the employee that’s delivering a service out to a customer in the field that can bring their own device, or a partner or contractor, so that we can integrate and shrink-wrap certain data. We will still have data security while they're deploying or doing something out in the field for us. It could be inspections, customer service, medical, etc.

But then, on the client end, they have their own devices. We can deliver products through portals that don't care what device they have, based on mobile protocols and security. Those are the types of trends that are going to allow us to collect the big analytics, test what we think we know, find out whether we really know it, and get the facts.

The other piece of it though is to make it easy to access the services that we provide to the community, because now it’s a digital community; it’s not just the hardcore community. To see people in a waiting line now for applications hurts my feelings. We want to see them online, accessing it 24×7, when it makes sense for them. Those are the types of services that I see becoming the greater trends in our industry.
Those are the types of trends that are going to allow us to collect the big analytics, know what we think we know, and find out whether we really know it or not and find it, get the facts for it.

Gardner: Alan, what allows you to say “yes” more often?

Crawford: When I started, with the XP laptops, we were saying no. Compare that with our programs within our centers now: they're using the tablets and the technology. You have closed Facebook groups with those families. There's now peer support outside hours, when children are going to bed, which is often when families have issues.

They use Eventbrite, the booking app. There are some standard off-the-shelf apps. But the real enterprise is in one of our services in a rural community, which currently tells everybody in that community what services it's running through posters and flyers that are printed off. That has moved to developing our own app. The prototypes are already out there, and the full app will be out in a few weeks' time. We're saying yes to all of those things. We want to support them. It's not just yes, but yes -- and how can we help you do that.

Gardner: Olaf, of course, productivity is only as good as the metrics that we need to convince the higher-ups in the board room that we need more investment or that we're doing good work with our technology. Do you have any measurements, metrics, even anecdotes about how you measure productivity and what you've done to modernize your workspaces?

Romer: Yes, for us it's the feedback from the people. It's very difficult to measure on a pure technology level, but the feedback from the people is very good and very important for us. With the BYOD we introduced one and a half years ago, you can see a strong cultural change in collaboration. We work together much more efficiently in the company and across the different departments.

In former times, we had closed file shares, and I couldn't see the files of the department next to me. Now, we're working in a completely modern, collaborative way. Still, in traditional insurance areas -- let's say with the government -- it's very hard for them to work in the new style.

In the beginning, there were very strong concerns about that, and now we're in a cultural shift on this. We get a lot of good feedback that in project teams, or in the case of some problems or issues, we can work much better and faster together.

Metrics of success

Gardner: Craig, of course it’s great to say yes to your constituents, but it’s also good to say that we're doing more with less to your higher-ups and those that control the budget. Any metrics of success that you can recall in some of the public-sector organizations you're working with?

Patterson: Absolutely. I'll talk about files and workflow. Before, when a document came into the organization, we mapped how much time and money it took to get it into a file folder, having been viewed by everyone who needed to view it. For quick context: a document used to take a file folder, a label maker, and a copy machine, and every time a person needed to put a document in that folder, someone had to get it there. Now, the term "file clerk" is actually becoming obsolete.

When a document comes in, it gets scanned, it's instantaneously put in the correct order in the right electronic folder, and an electronic notification is sent to the person who needs to know. That happens in seconds. Month over month, it amounts to savings; before, we were managing files rather than assisting people.
We can now see how many file folders you looked at, how many documents you actually touched, read, and reviewed in comparison with somebody else.

The metrics are in the neighborhood of 75 percent paper reduction, because people aren't making copies. This means they're not going to the copy machine and, along the way, the water cooler and conversation pits. That removes some of the inefficiency, too. We can now see how many file folders you looked at, and how many documents you actually touched, read, and reviewed in comparison with somebody else.

One person touched as few as five documents in a month, in comparison with 1,700 for somebody else. That starts to tell you some things about where your workload is shifting. Not everyone likes that. They might consider it a little bit "big brother," but we need those analytics to know how best to change our workflows to serve our customer, and that's the community.

Gardner: I don’t know if this is a metric that’s easy to measure, but less bureaucracy would be something that I think just about everyone would be in favor of. Can you point to something that says we're able to reduce bureaucracy through technology?

Patterson: When you look at bureaucracy and unnecessary paper flows, there are certain yes-and-no questions that are part of bureaucracy. Somebody has a document cross their desk, and their job is to stamp yes or no on it. What decision do they have to make? Well, they really don't make one; they just have to stamp yes. To me, that's classic bureaucracy.

Well, if the document hits that person's desk and it meets a certain criterion or threshold, the computer automatically and instantaneously approves it, with a documented audit trail. That saves some of our clients in the housing-authority industry when the auditors come and review things. And if you did have to make a decision, the system recorded how long it took you to make it. So, we can look at why it's taking so long, or whether there are questions that you don't need to be answering.
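[A rule like that reduces to a few lines of logic. The sketch below is hypothetical -- the field names and the threshold are invented -- but it shows the shape: auto-approve what meets the criteria, route the rest to a person, and record an audit entry either way.]

```python
# Hypothetical sketch of threshold-based auto-approval with an audit trail.
from datetime import datetime, timezone

audit_log = []

def route(document, auto_approve_limit=500):
    if document["amount"] <= auto_approve_limit:
        decision, decided_by = "approved", "system"
    else:
        decision, decided_by = "pending-review", None  # goes to a person
    audit_log.append({
        "doc_id": document["id"],
        "decision": decision,
        "decided_by": decided_by,
        "at": datetime.now(timezone.utc).isoformat(),
    })
    return decision

print(route({"id": "REQ-17", "amount": 120}))   # approved
print(route({"id": "REQ-18", "amount": 9000}))  # pending-review
```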

Gardner: So let the systems do what they do best and let the people do the exception management and the value-added activities. Alan, you had some thoughts about metrics of success of bureaucracy or both?

Proxy measure

Crawford: Yes, it's the metrics. The Citrix CEO [Kirill Tatarinov] talked at Citrix Synergy about productivity actually going down in the last few years. We've put all these tablets out there, and we have individual case studies where we know a particular family-support worker drove 1,700 miles in the year with the tablet, versus 3,400 miles in the year without. That's a proxy measure of how much time they're spending on the road, with all the associated cost of fuel and wasted time and effort.

We've just installed an app -- actually, I rolled it out in the last month or so -- that measures how many tablets have been switched on in the month, how much they've been used in the day, and what they've been used for. We can break that down by geographical area and give that information back to the line managers, because they're the people to whom it will actually make sense.

I'm right at the stage where it's great information -- it's really powerful -- but the task now is actually to understand how many hours a day they should be using that tablet. We're not quite sure, and it probably varies from one type of service to another.

We look at those trends over a period of months. We can tell managers that, yes, overall staff usage is 90 percent, but it's 85 percent in your area. All managers, I find, are fairly competitive.
There are inhibitors around mobile network coverage and even broadband coverage in some rural areas. We just follow up on all of those user experience information we get back and try and proactively improve them.

Gardner: Well, that may be a hallmark of business agility, when you can try things out, A/B testing. We’ll try this, we’ll try that, we don’t pay a penalty for doing that. We can simply learn from it and immediately apply our lesson back to the process.

Crawford: It's all about how we support those areas where we identify that they're not making the most of the technology they've been given. It might be human factors -- the staff or even the managers are fearful. Or it might be technical factors; there are inhibitors around mobile network coverage, and even broadband coverage, in some rural areas. We just follow up on all of the user-experience information we get back and try to proactively improve.

Gardner: Olaf, when we ask enterprises where they are in their digital transformation, many are saying they're just at the beginning. For you, who are obviously well into a digital transformation process, what lessons learned could you share; any words of advice for others as they embark on this journey?

Romer: The first digital transformation in the insurance business was in the middle of 1990s, when we started to go paperless and work with a digital system. Today, more than 90 percent of our new insurance contracts are completely paperless. In Germany, for example, you can give a digital signature. It’s not allowed for the moment in Switzerland, but from a technical perspective, we can do this.

My advice would be that digitalization gives you a good opportunity to think about making things simple. We built up great complexity over the years, and now we're able to bring this down and make it as simple as possible. We created the slogan, "Simply Safe," to rethink everything that we're doing and make it simple and safe. Again, for insurance, it's very important that digitalization brings us not more complexity, but less.

Gardner: Craig, digital transformation, lessons learned, what advice can you offer others as they embark?

Document and workflow

Patterson: In digital transformation, I'll just use documents and workflow. Start with the higher-end items; there's low-hanging fruit there. I don't know if we'll ever be totally paperless, which would really allow us to go mobile. But at the same time, know what not to scan. Know what to archive and what to just get rid of. And don't hang on to old technologies for too long; that's something else that's starting to happen. The technological revolution has shortened the lifecycle of technology, and we need to plan our strategies along those lines.

Gardner: Alan, words of advice on those also interested in digital transformation?

Crawford: For us, it started with connecting to our cause. We've got social-care staff, and saying we're going to do digital transformation is not going to really enthuse them. However, if you explain that this is about actually improving the lives of children with technology, then they start to get interested. So, there is a bit about using your cause and relating the change to your cause.
You’ve got to follow through on all this change to get the real benefits out of it. You’ve got to be a bit tenacious with it to really see the benefits in the end.

A lot of our people factors are about how to engage and train. It's no longer IT saying, "Here's the solution, and we expect you to do ABC." It's working with those social-care workers: here are the options; what will work for you, and how should we approach that? But then, it's never letting up.

Actually, you’ve got to follow through on all this change to get the real benefits out of it. You’ve got to be a bit tenacious with it to really see the benefits in the end.

Gardner: Tie your digital transformation to the organization's mission so that there is no daylight between them.

Crawford: We've got a project, Digitally Enabling Action for Children, and that was to try to link the two together inextricably.

Gardner: Very good. I'm afraid we’ll have to leave it there. You’ve been listening to a BriefingsDirect discussion, focused on digital business transformation and how that’s been accomplished by several prominent enterprises.

We’ve heard how the convergence of cloud, mobility and big-data analytics has prompted these companies to innovate and produce new levels of productivity. And some of them are finalists from this year’s Citrix Synergy 2016 Innovation Awards program.

So please join me now in thanking our guests, Olaf Romer, Head of Corporate IT and group CIO at Bâloise in Basel, Switzerland; Alan Crawford, CIO of Action for Children in London, and Craig Patterson, CEO of Patterson and Associates in San Antonio, Texas.

And a big thank you to our audience as well for joining this Citrix-sponsored business innovation thought leadership discussion. I'm Dana Gardner, Principal Analyst at Interarbor Solutions, your host and moderator. Thanks again for listening, and do come back next time.

Listen to the podcast. Find it on iTunes. Get the mobile app. Download the transcript. Sponsor: Citrix.

Transcript of a discussion on digital business transformation and how that’s been accomplished by several prominent enterprises. Copyright Interarbor Solutions, LLC, 2005-2016. All rights reserved.


Tuesday, December 16, 2008

MapReduce-scale Analytics Change Business Intelligence Landscape as Enterprises Mine Ever-Expanding Data Sets

Transcript of BriefingsDirect podcast on new computing challenges and solutions in data processing and data management.

Listen to the podcast. Download the podcast. Find it on iTunes/iPod. Learn more. Sponsor: Greenplum.

Dana Gardner: Hi, this is Dana Gardner, principal analyst at Interarbor Solutions, and you're listening to BriefingsDirect. Today, we present a sponsored podcast discussion on the architectural response to a significant and fast-growing class of new computing challenges. We will be discussing how Internet-scale data sets and Web-scale analytics have placed a different set of requirements on software infrastructure and data processing techniques.

Following the lead of such Web-scale innovators as Google, and through the leveraging of powerful performance characteristics of parallel computing on top of industry-standard hardware, we are now focusing on how MapReduce approaches are changing business intelligence (BI) and the data-management game.

More types of companies and organizations are seeking new inferences and insights across a variety of massive datasets -- some into the petabyte scale. How can all this data be sifted and analyzed quickly, and how can we deliver the results to an inclusive class of business-focused users?

We'll answer some of these questions and look deeply at how these new technologies will produce the payback from cloud computing and massive data mining and BI activities. We'll discover how the results can quickly reach the hands of more decision makers and strategists across more types of businesses.

While the challenge is great, the new value for managing these largest data sets effectively offers deep and powerful new tools for business and for social and economic progress.

To provide an in-depth look at how parallelism, modern data infrastructure, and MapReduce technologies come together, we welcome Tim O’Reilly, CEO and founder of O’Reilly Media, and a top influencer and thought leader in the blogosphere. Welcome, Tim.

Tim O’Reilly: Hi, thanks for having me.

Gardner: We're also joined by Jim Kobielus, senior analyst at Forrester Research. Thank you, Jim.

Jim Kobielus: Hi, Dana. Hi, everybody.

Gardner: Also, Scott Yara, president and co-founder at Greenplum. Welcome, Scott.

Scott Yara: Thank you.

Gardner: We're still dealing with oceans of data, even though we have harsh economic times. We see reduction in some industries, of course, but the amount of data and need for analytics across the Internet is still growing rapidly. BI has become a killer application over the past few years, and we're now extending that beyond enterprise-class computing into cloud-class computing.

I want to go to Jim Kobielus first. Jim, why has this taken place now? What is happening in the world that is simultaneously creating these huge data sets, but also making necessary even better analytics across more businesses?

Kobielus: Thanks, Dana. A number of things are happening or have been happening over the past several years, and the trend continues to grow. In terms of the data sets, it’s becoming ever more massive for analytics. It’s equivalent to Moore’s Law, in the sense that every several years, the size of the average data warehouse or data mart grows by an order of magnitude.

In the early 1990s or the mid 1990s, the average data warehouse was in gigabytes. Now, in the mid to late 2000s, it's in the terabytes. Pretty soon, in the next several years, the average data warehouse will be in the petabyte range. That’s at least a thousand times larger than the current middle-of-the-road data warehouse.

Why are data warehouses bulking up so rapidly? One key thing is that organizations, especially in tough times when they're trying to cut costs, continue to consolidate a lot of disparate data sets into fewer data centers, onto fewer servers, and into fewer data warehouses that become ever-more important for their BI and advanced analytics.

What we're seeing is that more data warehouses are becoming enterprise data warehouses and are becoming multi-domain and multi-subject. You used to have tactical data marts, one for your customer data, one for your product data, one for your finance data, and so forth. Now, the enterprise data warehouse is becoming the be all and end all -- one hub for all of those sets.

What that means is that you have a lot of data coming together that never needed to come together before. Also, the data warehouse is becoming more than a data warehouse. It's becoming a full-fledged content warehouse: not just structured relational data, but unstructured and semi-structured data -- from XML, from your enterprise content management (ECM) system, from the Web, and from various other formats. It's all converging into your warehouse environment. That's like the bottom of the iceberg coming up: you're seeing it now, and it's coming into your warehouse.

Also, because of the Web 2.0 world and social networking, a lot of the customer and market intelligence that you need is out there in blogs, RSS feeds, and various formats. Increasingly, that is the data that enterprises are trying to mine to look for customers, marketing opportunities, cross-sell opportunities, and clickstream analysis. That’s a massive amount of data that’s coming together in warehouses, and it's going to continue to grow in the foreseeable future.

Gardner: Let’s go to Tim O’Reilly. Tim, from your perspective, what has changed over the past 10 or 20 years that makes these datasets so important?

Long-term perspective

O'Reilly: If you look at what I would call Web 2.0 in a long-term historical perspective, in one sense it's a story about the evolution of computing.

In the first age of computing, business models were dominated by hardware. In the second age, they were dominated by software. What started to happen in the 1990s, underneath everybody’s nose, but not understood and seen, was the commodification of software via open industry standards. Open source started to create new business models around data, and, in particular, around network applications that built huge data sets through user participation. That’s the essence of what I call Web 2.0.

Look at Google. It's a BI company, based on massive data sets. First, they spider all the activity off the Web, and that's one layer. Then, they do a detailed analysis of the link structure of that Web, and that's another layer. Then, they start saying, "Well, what else can we find?" They start looking at clickstream data. They start looking at browsing history, and where people go afterward. Think of all that data. Then, they deliver services against it.

That’s the essence of Web 2.0, building a massive data set, doing real-time analytics against it, and then figuring out what services you can deliver. What’s happening today is that movement is transferring from the consumer Web into business. People are starting to realize, "Oh, the companies that are doing better are better with their data."

A great example of that is Wal-Mart. You can think of Wal-Mart as a Web 2.0 company. They've got end-to-end analytics in the same way that Google does, except they're doing it with stuff. Somebody takes something off the shelf at Wal-Mart and rings it up. Wal-Mart knows, and it sends a signal downstream to the supplier.

We need to understand that this move to real-time understanding of data at massive scale is going to become more and more important as a lever of competitive advantage -- not just in computer businesses, but in all businesses. Data warehousing and analytics aren't just something you do in the back office as a nice-to-have. They're the very essence of competitive advantage moving forward.

When we think about where this is going, we first have to understand that everybody is connected all the time via applications, and this is accelerating, for example, via mobile. The need for real-time analytics against massive data sets is universal.

Look at some of the things that are happening on the phone. Okay, where am I? What data is relevant to me right now, because you know where I am? Speech recognition is starting to come into focus on the phone. Again, it's a massive data problem, integrating not only speech recognition, but also local dialogs. And with local again, you start to see some cross-connections between data streams that will help you do better.

Even in the case of speech, I was talking with someone from Nuance about why Google is able to do some interesting things in the particular domain of search and speech recognition. It's because they're able to cross-correlate two different data sets -- the speech data set and the search data set. They say, "Okay, yeah, when somebody says that, they are most likely looking for this, because we know that. When they type, they also are most likely looking for that." So this idea of cross-correlation between data sets is starting to come up more and more.

This is a real frontier of competitive advantage. You look at the way that new technologies are being explored by startups. So many of the advantages are in data.

A great example is the company where I'm on the board. It's called Wesabe. They're a personal finance application. People upload their bank statements or give Wesabe information to upload their bank statements. Wesabe is able to do customer analytics for these guys, and say, "Oh, you spent so much on groceries." But, more than that, they're able to say, "The average person who shops at Safeway, spends this much. The average person who shops at Lucky spends this much in your area." Again, it's a massive data problem. That’s the heart of their application.
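[The aggregation O'Reilly describes is simple in miniature; the hard part is running it across millions of uploaded statements. Here is a toy version with invented transactions, just to make the computation concrete.]

```python
# Toy version of per-merchant average-spend analytics; the transactions
# are invented. At scale, the same grouping runs over millions of
# uploaded bank statements.
from collections import defaultdict

transactions = [
    {"merchant": "Safeway", "amount": 82.10},
    {"merchant": "Safeway", "amount": 45.30},
    {"merchant": "Lucky",   "amount": 61.75},
]

totals = defaultdict(lambda: [0.0, 0])  # merchant -> [sum, count]
for t in transactions:
    totals[t["merchant"]][0] += t["amount"]
    totals[t["merchant"]][1] += 1

for merchant, (amount, count) in totals.items():
    print(f"{merchant}: average spend {amount / count:.2f}")
```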

Now, you think the banks are going to get clued into this and they are going to start to say, "Well, what services can we offer?" Phone companies: "What services can we offer against our data?"

One thing that’s going to happen is the migration of all the BI competencies from the back office to the front office, from being something that you do and generate reports from, to something that you actually generate real-time services from. In order to do that, you've absolutely got to have high performance at massive scale.

Second, a lot of these data sets are not the old-fashioned kind of simply structured data.

Gardner: Let’s go to Scott Yara. Scott, we need this transformation. We need this competitive differentiation and new, innovative business approaches by more real-time analytics across larger sets and more diverse sets of content and inference. What’s the approach on the solution side? What technologies are being brought to bear, and how can we start dealing with this at the time and scale that’s required?

A big shift

Yara: Sure. For Greenplum, one of the more interesting aspects of what's going on is that big technology concepts and ideas that have been around for two or three decades are now being brought to bear, because of the big shift that Tim alludes to. We are big believers that we're now entering a new cycle, where companies are going to be defined by their ability to capture and make use of the data and the user contributions coming from their customers and communities. That means being able to make parallel computing a reality.

We look at the other major computing trend today, and it’s a very mainstream thing like virtualization. Well, virtualization itself was born on the mainframe well over 30 years ago. So, why is virtualization today, in 2008, so important?

Well, it took this intersection of major trends. You had x86 and, as Tim mentioned, the commoditization of both hardware and software, and x86 and multi-core machines became incredibly cheap. At the same time, you had a high-level business trend, an industry trend. The rising cost of data centers and power became so significant that CIOs had to think about the efficiency of their data centers and their infrastructure and what could lower the cost of computing.

If you can run applications on a much cheaper and much more efficient set of commodity systems, and consolidate applications through virtualization, that's a really compelling thing -- and we've seen a multi-billion-dollar industry born of that.

You're seeing the same thing here, because business is now driven by Web 2.0, by the success of Google, and by companies' own use of the Web -- they're realizing how important data is to their own businesses. That's become a very big driver, because it turns out that parallel computing, combined with commodity hardware, is a very disruptive platform for doing large-scale data analysis.

As Google has shown, you can take very, very cheap, off-the-shelf PCs and, with the right software, combine them into hundreds, thousands, and tens of thousands of systems to deliver analytics at a scale that people couldn't reach before. It's that confluence, that intersection of market factors, that's making this whole thing possible.

While parallel computing has been around for 30 years, the timing has become such that it’s now having an opportunity to become really mainstream. Google has become a thought leader in how to do this, and there are a lot of companies creating technologies and models that are emblematic of that.

But, at the end of the day, the focus is on software that is purpose-built to provide parallelism out of the box. This allows companies to sift through huge amounts of data, whether structured or unstructured. All the fault tolerance, all the parallelism, all those things you need are done in software, so that you can choose off-the-shelf hardware from HP, IBM, Dell, or white-box systems. That's a model that's as disruptive a shift as client-server and symmetric multiprocessing (SMP) computing were to the mainframe.

Gardner: Jim Kobielus, speak to this point of moving the analytic results, the fruits of this impressive engine and architectural shift from the back office to the front office. This requires quite a shift in tools. We're not going to have those front-office folks writing long SQL queries. They're not going to study up on some of the traditional ways that we interact with data.

What's in the offing for development, so developers can create applications that target this data, now that it's in a usable form and cross-pollinated across huge, diverse data sets? What's in store for app dev, and what's in store for people looking for a graphical way in -- the business-strategist type of user?

Self-service paradigm

Kobielus: One thing we're seeing on the front end of app development -- to take Tim's point even further -- is that it's very much becoming a Web 2.0, user-centric, self-service development paradigm for analytics.

Look at the ongoing evolution of the online analytical processing (OLAP) market, for example. Users are now building data-mining and advanced analytic applications themselves, within their browser and within their spreadsheet. They can pull data from various warehouses and marts, and from online transaction processing (OLTP) systems, all in a visual, intuitive paradigm.

They can cache a lot of that information on the front end -- in other words, on the desktop or the mobile device -- and graphically build ever-richer reports and dashboards, and then share it all out to the others on their teams. You can build a growing, collective analytical knowledge base that can be shared. That whole paradigm is coming to the fore.

At Forrester, we published a number of reports on it. Recently, Boris Evelson and I looked at the next generation of OLAP technology. One very important initiative to look at is what Microsoft is doing with Project Gemini. They're still working on that, but they demoed it a couple of months ago at their BI show.

The front office -- the actual end users and power users -- is where the bulk of BI and analytics application development will happen in this new paradigm. More and more of this development will be offloaded from the traditional high priesthood of data modelers, developers, and data-mining specialists, so they can focus on more sophisticated statistical analysis, and so forth.

The front office will do the bulk of the development. The back office -- in other words, the traditional IT data-modeling professionals -- will be there. They'll be setting the policies and they'll be providing the tooling that the end users and the power users will use to build applications that are personalized to their needs.

So IT then will define the best practices, and they'll provide the tooling. They'll provide general coaching and governance around all of the user-centric development that will go on. That’s what’s going to happen.

It's not just Microsoft. Look at the more user-centric, in-memory, spreadsheet-centric OLAP tooling that IBM Cognos, Oracle, and others are rolling out or have already rolled out in their product sets. This is where it's all going.

Gardner: Tim O'Reilly, in the past, when we've opened up more technological power to more people, we've often encountered much greater innovation, unpredictably so. Should we expect some sort of wisdom-of-crowds effect to come into play, when we take more of these data sets and analytic tools and make them available?

O'Reilly: There's a distinction between the wisdom of crowds and collective intelligence. The wisdom-of-crowds thesis, as expounded by Surowiecki, is that if you get a whole bunch of people independently, really independently, to weigh in on some subject, their average guess is better than any individual expert's. That’s really about a certain kind of quantitative stuff.

But, there's also a machine-learning approach in which you're not necessarily looking for the average, but you're finding different kinds of meaning in data. I think it’s important to distinguish those two.

Google realized that there was meaning in links that every other search engine of the day was throwing away. This was a way of harnessing collective intelligence, but it wasn’t just the wisdom of crowds. This was actually an insight into the structure of the data and the meaning that was hidden in it.
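The link insight is, at bottom, an algorithm. As a rough illustration only -- a toy graph and the textbook power-iteration method, not Google's production system -- PageRank can be sketched like this:

```python
# Toy PageRank by power iteration. The graph and damping factor are
# illustrative; this is the textbook idea, not Google's production system.
links = {
    "a": ["b", "c"],   # page "a" links to "b" and "c"
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}
pages = list(links)
rank = {p: 1.0 / len(pages) for p in pages}
damping = 0.85

for _ in range(50):
    incoming = {p: 0.0 for p in pages}
    for page, outlinks in links.items():
        share = rank[page] / len(outlinks)   # each page splits its rank
        for target in outlinks:
            incoming[target] += share
    rank = {p: (1 - damping) / len(pages) + damping * incoming[p]
            for p in pages}

print(sorted(rank.items(), key=lambda kv: -kv[1]))
# "c" ranks highest: the structure of the links carries the meaning.
```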

The breakthroughs are coming from the ability of people to discern meaning in data. That meaning sometimes is very difficult to extract, but the more data you have, the better you can be at it.

A great example of this recently is from the last election. Nate Silver, who ran 538.com, was uncannily accurate in calling the results of the election. The reason he was able to do that was that he looked at everybody's polls, but didn't just say, "Well, I'm just going to take the average of them." He used all kinds of deep thinking to understand, "Well, what's the bias in this one? What's the bias in that one?" And he was able to develop an algorithm in which he weighted these things differently.
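The mechanics of that weighting are simple, even if the judgment behind the weights is not. A hypothetical sketch, with the house effects and reliability weights invented for illustration:

```python
# Hypothetical poll aggregation: subtract each pollster's estimated
# "house effect" (bias), then weight by estimated reliability.
# All numbers are invented for illustration.
polls = [
    # (candidate share %, estimated house bias, reliability weight)
    (51.0, +1.5, 0.9),
    (48.0, -2.0, 0.6),
    (50.0,  0.0, 1.0),
]

adjusted = [(share - bias, weight) for share, bias, weight in polls]
estimate = sum(s * w for s, w in adjusted) / sum(w for _, w in adjusted)
print(f"weighted, debiased estimate: {estimate:.1f}%")
```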

Gardner: I suppose it’s important for us to take the ability to influence the algorithms that target these advanced data sets and put them into the hands of the people that are closer to the real business issues.

More tools are critical


O'Reilly: That’s absolutely true. Getting more tools for handling larger and more complex data sets, and in particular, being able to mix data sets, is critical.

One of the things that Nate did that nobody else did was that he took everybody’s polls and then created a meta-poll.

Another example is really interesting. You're probably familiar with the Netflix Challenge, where Netflix has put up a healthy sum of money for whoever can improve its recommendation algorithm by 10 percent. What's interesting is that people seem to be stuck at about 8 percent; they haven't been able to get the last couple of percentage points.

It occurred to me in a conversation I was having last night that the breakthroughs will come, not by getting a better algorithm against the Netflix data set, but by understanding some other data set that, when mixed with the Netflix data set, will give better predicted results.

Again, that tells us something about the future of data mining and the future of business intelligence. It's larger, more complex, and more diverse data sets, from which you're able to extract meaning in new ways.
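As a toy illustration of that mixing, assuming invented ratings, an invented external "buzz" data set, and an arbitrary blend weight, joining the two on a shared movie key might look like:

```python
# Toy illustration of mixing data sets: a ratings-only predictor,
# then the same predictor blended with an external signal joined on
# the movie title. All data and weights are invented.
ratings = {          # user -> {movie: stars}
    "u1": {"Alien": 5, "Heat": 4},
    "u2": {"Alien": 4, "Amelie": 5},
}
external_buzz = {"Alien": 0.9, "Heat": 0.4, "Amelie": 0.8}  # second data set

def item_mean(movie):
    scores = [r[movie] for r in ratings.values() if movie in r]
    return sum(scores) / len(scores)

def predict(movie, blend=0.3):
    # Join the two data sets on the movie key and blend the signals.
    baseline = item_mean(movie)            # from the ratings alone
    buzz = external_buzz.get(movie, 0.5)   # from the other data set
    return (1 - blend) * baseline + blend * (1 + 4 * buzz)  # buzz -> 1..5

print(predict("Heat"))  # the external signal nudges the ratings-only answer
```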

One other thing. You were talking earlier about the democratization of these tools. One thing I don’t want to pass by is a comment that was made recently by Joe Hellerstein, who is a computer science professor at UC Berkeley. It was one of those real wake-up-and-smell-the-coffee moments. He said that at Berkeley, every freshman student in CS is now being taught Hadoop. SQL is an elective for seniors. You say, "Whoa, that is a fundamental change in our thinking."

That's why I think what Greenplum is doing is really interesting: trying to marry the old BI world of SQL with the new world of loose, unstructured data sets that are often analyzed with a MapReduce kind of approach. Can we bring the best of these things together?
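The two paradigms are easy to see side by side. Here is a minimal MapReduce-style count in plain Python, with the equivalent declarative SQL in a comment; it sketches the paradigm itself, not Greenplum's implementation:

```python
# Minimal MapReduce-style aggregation in plain Python -- the paradigm,
# not Greenplum's implementation. The equivalent SQL would be:
#   SELECT word, COUNT(*) FROM docs GROUP BY word;
from collections import defaultdict

docs = ["big data", "big parallel data", "parallel computing"]

# Map: emit (word, 1) pairs; in real systems this runs in parallel.
mapped = [(word, 1) for doc in docs for word in doc.split()]

# Shuffle: group the pairs by key.
groups = defaultdict(list)
for word, count in mapped:
    groups[word].append(count)

# Reduce: aggregate each group independently.
counts = {word: sum(vals) for word, vals in groups.items()}
print(counts)   # {'big': 2, 'data': 2, 'parallel': 2, 'computing': 1}
```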

That fits with this idea of crossing data sets being one of the new competencies that people are going to have to get better at.

Kobielus: If I can butt in here for just one moment, I want to tie together something Tim just said and something I said a little earlier. The important thing is that when you add more data sets to your analytic environment, you get the potential to see more cross-correlations among different entities or domains. That's one of the value propositions for an all-encompassing, multi-domain enterprise data warehouse.

Before, you had these subject-specific marts -- customer data here, product data there, finance data over there -- and you didn't have any easy way to cross-correlate them. When you bring them all together into a common repository, implementing common dimensions and hierarchies and conforming to common metadata, it becomes a whole lot easier for the data miners, the power users, and the end users to build applications that tie it all together.

That's the "aha" moment: "Aha, I didn't realize all these things hooked up in these various ways." You can extract more meaning by bringing it all together into a unified enterprise data warehouse.
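That conformed-dimension "aha" can be shown in miniature. In this sketch, SQLite stands in for the warehouse, and the schema and data are invented; once the marts share a customer key, the cross-correlation is a single join:

```python
# Miniature version of the point above: once subject areas share a
# conformed customer dimension, cross-correlation is one join.
# Schema and data are invented; SQLite stands in for the warehouse.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE customer_dim (cust_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE sales_fact   (cust_id INTEGER, amount REAL);
    CREATE TABLE support_fact (cust_id INTEGER, tickets INTEGER);
    INSERT INTO customer_dim VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO sales_fact   VALUES (1, 500.0), (1, 250.0), (2, 900.0);
    INSERT INTO support_fact VALUES (1, 7), (2, 1);
""")

# Revenue and support load side by side -- the "aha" view.
for row in db.execute("""
    SELECT d.name, SUM(s.amount) AS revenue, MAX(t.tickets) AS tickets
    FROM customer_dim d
    JOIN sales_fact s   ON s.cust_id = d.cust_id
    JOIN support_fact t ON t.cust_id = d.cust_id
    GROUP BY d.name
"""):
    print(row)   # ('Acme', 750.0, 7) and ('Globex', 900.0, 1)
```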

Gardner: To you, Scott Yara. There's a great emphasis here on bringing together different data sets from disparate sources, with entirely different technologies underlying them. It's not a trivial problem. It’s not a matter of scale necessarily.

What do you see as the potential? What is Greenplum working on to allow folks to mix and match in such a way that the analytics can be innovative and game-changing in a harsh economic environment?

Price/performance improvement

Yara: A couple of things. First, I definitely agree with the assertion that analysis gets easier the more data you have. Whether it's heterogeneous data sets or just the sheer scale of data people can collect, it's fundamentally easier and cheaper.

In general, these businesses are pretty smart. The executives, analysts, or people that are driving business know that their data is valuable and that insight in improving customer experience through data is key. It’s just really hard and expensive, and that has made it prohibitive for a long, long time.

Now, we're talking about using parallel computing techniques, open-source software, and commodity hardware. It's literally a 10- to 100-fold improvement in price/performance. When the cost of data analysis comes down 10 to 100 times, that's when new things become possible.

O'Reilly: Absolutely.

Yara: We see lots of customers now from the New York Stock Exchange. These are all businesses that are across vertical industries, but are all affected by the Web and network computing at some level.

Algorithmic trading is driving financial services in a way that we haven't seen before. They're processing billions of trades every day. Whether it's security, surveillance, or the real-time support they need to provide to very large trading companies, the need to mine and sift through billions of transactions on a real-time basis is acute.

We were sitting down with one of our large telecom customers yesterday, and you could see the convergence that Tim's talking about. You've got companies with very large mobile-carrier businesses that are also broadband service providers, fixed-line service providers, and Internet companies.

Today, telecom carriers are just at the beginning of trying to do the kind of basic personalization that companies like Amazon, eBay, or Google do. They have to aggregate the consumer event stream from all these disparate communication systems, and it's at massive scale.

Greenplum is solely focused on making that happen and mixing the modalities of data, as Tim suggested. Whether it’s unstructured data, whether those are things that exist in legacy databases, or whether you want to mix and match SQL or MapReduce, fundamentally you need to make it easy for businesses to do those things. That’s starting to happen.

Gardner: I suppose part of the new environment we're in economically is that incremental change is probably not going to cut it. We need to find new forms of revenue and be able to attain them at a very low upfront cost if possible, and be transformative in how we take our businesses out through the public networks to reach more customers and give them more value.

Now that we've established that we have these data sets, we can combine them to a certain degree, and that will improve over time. What are the ways in which companies can start actually making money in new ways using these technologies?

Apple’s Genius comes to mind for me as a way of saying, "Okay, you pick a song in your iTunes library, and we're going to use our data and our analytics, and come back with some suggestions on what you might like as a result of that." Again, this is sort of a first go at this, but it opens my eyes to a lot of other types of business development opportunities. Any thoughts on this, Tim O’Reilly?

O'Reilly: In general, as I said earlier, this is the frontier of competitive advantage. Sure, iTunes has Genius, but it's the same thing with Netflix recommendations. Amazon has been doing this for years. It's part of their competitive advantage. I mentioned earlier how this is starting to be a force in areas like banking. Think about phone companies and all of the opportunities for new local services.

Not only that, one of my pet hobbyhorses is that phone companies have this call-history database, but they're not building new services for users against it. Your phone still only remembers the last few people you called. Why can't I search for somebody I talked to three months ago? "Who the heck was that? Was it the guy from this company?" You should be able to search that. They've got the data.
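The service being asked for here is simple at the data level. A hypothetical sketch, with invented call records and field names, of searching a call history:

```python
# Hypothetical call-history search of the kind described above.
# Records and fields are invented for illustration.
from datetime import date

calls = [
    {"number": "555-0101", "company": "Greenplum", "when": date(2008, 8, 3)},
    {"number": "555-0188", "company": "Forrester", "when": date(2008, 11, 20)},
]

def search(history, company=None, since=None):
    """Filter the call log: 'who was that guy from this company?'"""
    return [c for c in history
            if (company is None or c["company"] == company)
            and (since is None or c["when"] >= since)]

print(search(calls, company="Greenplum"))
```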

So, as I said earlier, the frontier is turning the back office into new user-facing services, and having the analytics in place to be able to do that meaningfully at scale in real-time. This applies to supply chains. It applies to any business that has data that gets better through user interaction.

This is the lesson of the Web. We saw it first in Web applications. I gave you the example earlier of Wal-Mart. They realized, "Oh, wait a minute. Every time somebody buys something, it’s a vote." That’s the same point that Wesabe is trying to exploit. A credit card statement is a voting list.

I went to this restaurant once; that doesn't necessarily mean anything. If I go back every week, that may mean something. I spend, on average, this much, and it's going up -- that means something. I spend, on average, this much, and it's going down -- that means something too. So it's about finding meaning in the data I already have: how could it be useful, not just to me, but to my users and my customers, and what services could I build on it?
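That rising-or-falling signal is a one-line comparison once the transactions are in hand. A sketch with invented numbers:

```python
# Invented numbers: monthly spend at one merchant. The signal described
# above is just recent average compared to an earlier average.
monthly_spend = [80, 85, 90, 110, 120, 140]  # oldest first

earlier = sum(monthly_spend[:3]) / 3
recent = sum(monthly_spend[-3:]) / 3
trend = "rising" if recent > earlier else "falling or flat"
print(f"earlier avg {earlier:.0f}, recent avg {recent:.0f} -> {trend}")
```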

This is the frontier, particularly in the world that we are entering, in which computing is going mobile, because so many of the mobile services are fundamentally going to be driven by BI. You need to be able to say in real-time or close to real-time, "This is the relevant data set for this person based on where they are right now."

Needed: future view


Kobielus: I want to underline what Tim just said. Traditionally, data warehouses existed to give you perfect hindsight on the customer -- massive historical data and that 360-degree view of everything about the customer and everything they have ever done, back to the dawn of recorded time.

Now, it's coming down to managing that customer relationship and evolving and growing with it. You have to have not so much a past or historical view, but a future view of that customer. You need to know that customer, and where they're going, better than they know themselves.

In other words, that's where the killer app of the online recommendation engine becomes critical. The data warehouse, as the platform for recommendation engines, can take not only the historical data that it persists, but also the continuing streams of real-time event data -- on pricing, on customer interactions in various channels, be it on the Web or over the phone, on customer transactions happening now, and on events in the customer's social network.

Then you feed all of that into a recommendation engine -- a predictive-analytics model running inside the data warehouse -- which can optimize that customer's interaction at every touch point. Let's say they're dealing with a call-center person live. The call-center person knows exactly how the world looks to that customer right now, and has a really good sense of what that customer might need now, or in three months, six months, or a year, in terms of new services or products, because other customers like them are doing similar things.

It can have recommendations generated and scripted for the call-center agent in real time: "You know what we think: we recommend that you upgrade to the following service plan, because it provides you with these features that you'll find useful in your lifestyle, blah, blah, blah."
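A skeletal version of that next-best-offer flow, with the offers, customer attributes, and scoring rules all invented for illustration, might look like:

```python
# Skeletal next-best-offer scoring for a call-center screen. Offers,
# features, and weights are all invented for illustration.
customer = {"tenure_months": 30, "data_overage": True, "intl_calls": 2}

offers = {
    "bigger data plan": lambda c: 0.8 if c["data_overage"] else 0.1,
    "international add-on": lambda c: min(1.0, c["intl_calls"] / 10),
    "loyalty upgrade": lambda c: 0.5 if c["tenure_months"] > 24 else 0.2,
}

# Score each offer against the live customer context and script the best.
ranked = sorted(offers.items(), key=lambda kv: kv[1](customer), reverse=True)
best, score = ranked[0][0], ranked[0][1](customer)
print(f"Recommend: {best} (score {score:.2f})")
```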

In other words, it's understanding the customer in their possible future, and suggesting things that they themselves hadn't realized they needed until you suggested them. That's the future of analytics, and of competitive advantage.

O'Reilly: I couldn’t agree more.

Gardner: Scott Yara, we've been discussing this with a little bit of a business-to-consumer (B2C) flavor. In the business-to-business (B2B) world many things are equal in a commoditized market, with traditional types of products and services.

An advantage might be that, as a supplier, I'm going to give you analytics that I can derive from data sets that you might not have access to. I might provide analytical results to you as a business partner free of charge, but as an enticement for you to continue to do business with me, when I don’t have any other way to differentiate. What do you see are some of the scenarios possible on the B2B side?

Yara: You don't have to look much further than what Salesforce.com is doing. In a lot of ways, they're pioneering what it means to be an enterprise technology company that sells services, and ultimately data, back to its customers. By creating a common platform where applications can be built, they're very much thinking about how the data on that platform can be used -- not just by individual customers, but in aggregate.

You're going to see lots of cases where, for traditional businesses selling services and products to other businesses, the aggregation of data is going to be interesting and relevant. At the same time, you have companies for which even the internal analysis of their own data is something they haven't been able to do before.

We were talking about Google, which is an amazing company. They have this big vision to organize the world's information. What the rest of the business world is finding out is that, while it's a great vision and Google has a lot of data, it's only a small fraction of the overall data in the world. Telecommunications companies, financial exchanges, and retail companies have all of this real-world data that's not being indexed or organized by Google. These companies actually have access to amazing amounts of information about their customers and businesses.

They're saying, "Why can't we, at the point of interaction -- like eBay, Amazon, or some of these recommendation engines -- start to take some of this aggregate information and turn it into improving our business, the way the Web companies have done so successfully?" That's going to be true for B2C businesses, as well as for B2B companies.

We're just at the beginning of that. That’s fundamentally what’s so exciting about Greenplum and where we're headed.

Gardner: Jim Kobielus, who does this make sense for right away? Some companies might be a little skeptical. They're going to have to think about this. But where is the low-hanging fruit? Where are the no-brainer applications for this approach to data and analytics?

Kobielus: No-brainers -- I always hate that term; it sounds condescending. But the low-hanging fruit should be one of those "aha!" opportunities that everybody grasps intuitively. You don't have to explain it, so in that sense it's a no-brainer. It's the call center -- the customer-contact center.

The customer-contact center is where you touch the customer, and where you hopefully initiate, cultivate, nurture, maintain, and grow the customer relationship. It's one of the many places where you do that. There are people in your organization who are in that front-line capacity.

It doesn't have to be just people. It could be automated programs on your Website that need to be empowered continuously with the full customer context -- the history of that customer's interactions, the customer's current state, current sentiment and feelings, and a full view of the customer's likely future evolution. So, really, it's the call center.

In fact, I cover data warehousing for Forrester, and I talk to the data warehousing vendors and their customers about where in-database analytics is being sold into real-world deployments right now. The customer call center is, far and away -- with a bullet -- the number one place for in-line analytics to drive customer interactions in a multi-channel fashion.

Gardner: How about you, Tim O’Reilly. Where are some of the hot verticals and early adopters likely to be on this?

O'Reilly: I've already said several times that mobile apps of various kinds are probably highest on the list. But I'm a big fan of supply chain. There's a lot to be done there, and there's a huge amount of data. A BI infrastructure already exists there, but it hasn't really been tuned to work as a customer-facing application; it's really more of a back-office planning tool.

There are enormous opportunities in media, if you want to put it that way. If you think about the amount of money that’s spent on polling and the power of integrating actual data, rather than stated preference, I think it's huge.

How do we actually figure out what people are going to do? There's a great marketing study. I forget who told this story, but it was about a consumer product. They showed examples of different colors; it was a boom box or something like that.

They said, "How many of you think white is the cool color? How many of you think black? How many, blah, blah, blah?" All the people voted, and then there were piles of the boom boxes by the door that people took as their thank-you gift. What they said and what they did were completely at variance.

One of the things that’s possible today is that, increasingly, we are able to see what people actually do, rather than what they say they will do or think they will do.

Gardner: We're just about out of time. Scott Yara, what's your advice for folks who are just getting their heads wrapped around this? How should they get started? It's not a trivial activity. It requires a great deal of concerted effort across multiple aspects of IT, perhaps more so than in the past. How do you get started? What should you be doing to get ready?

Yara: That's one of the real advantages. In a sort of orthogonal way, the age of Web 2.0 has made creating new businesses online fundamentally cheaper and faster. Doing something disruptive inside a business with its data has to be fundamentally cheaper and easier, too. So, not starting with the big vision of where you need to go, but starting with something tactical -- whether it lives in the call center or in some departmental application -- is the best way to get going.

There are technologies, services, and people now that you can actually peel off a real project, and you can deliver real value right away.

I agree with Tim. We're going to see a lot of activity in the mobility and telecommunications space. These companies are just realizing this. Think about the kind of personalization you get from almost every major Internet site today; what's the level of personalization you get from your carrier, relative to how much data they have? You're going to see lots of telecom companies do things with data that have real value.

One of our customers was saying that in the traditional, old data-warehousing world, where everything was back office, the service-level agreement (SLA) was that when a call got placed and logged, it needed to make its way into the warehouse within seven days. Seven days from the point of origination, a call would make its way into a back-office warehouse.

Those are the kinds of things that are going to change if we're really going to provide mobility, locality, and recommendation services to customers.

It's about having a clear idea of the first application that can benefit from the data. Call centers are going to be a good area: giving the service representative a profile of the customer and being able to change the experience. I think we're going to see those things.

So, these are tractable problems. The inability to start small is what held back enterprise data warehousing before, when companies were looking at huge investments of people, capital, and infrastructure. That's really changing.

Gardner: I'm afraid we have to leave it there. We've been discussing new approaches to managing and processing data, mixing data types and sets, and extracting real-time business results from them. We've looked at tools, and we've looked at some of the verticals and business advantages.

I want to thank our panel. We've been joined today by Tim O’Reilly, the CEO and founder of O’Reilly Media. Thank you Tim.

O'Reilly: Glad to do it.

Gardner: Jim Kobielus, Forrester senior analyst. Thank you Jim.

Kobielus: Dana, always a pleasure.

Gardner: Scott Yara, president and co-founder of Greenplum. Appreciate it, Scott.

Yara: Great. Thanks everybody.

Gardner: This is Dana Gardner, principal analyst at Interarbor Solutions. You've been listening to a sponsored BriefingsDirect podcast. Thanks, and come back next time.

Listen to the podcast. Download the podcast. Find it on iTunes/iPod. Learn more. Sponsor: Greenplum.

Transcript of BriefingsDirect podcast on new computing challenges and solutions in data processing and data management. Copyright Interarbor Solutions, LLC, 2005-2008. All rights reserved.