
Friday, May 07, 2010

Delivering Data Analytics Through Workday SaaS ERP Applications Empowers Business Managers at Actual Decision Points

Transcript of a sponsored BriefingsDirect podcast on benefits of moving to a SaaS model to provide accessible data analytics.

Listen to the podcast. Find it on iTunes/iPod and Podcast.com. Download the transcript. Sponsor: Workday.

See a demo on how Workday BI offers business users a new experience for accessing the key information to make smart decisions.

About Workday
This BriefingsDirect podcast features software-as-a-service (SaaS) upstart Workday, provider of enterprise solutions for human resources management, financial management, payroll, spend management, and benefits management.


Dana Gardner: Hi, this is Dana Gardner, principal analyst at Interarbor Solutions, and you’re listening to BriefingsDirect.

Today we present a sponsored podcast discussion on how software-as-a-service (SaaS) applications can accelerate the use and power of business analytics.

We're going to use the example of a human capital management (HCM) and enterprise resource planning (ERP) SaaS provider to show how easily customizable views on data and analytics can have a big impact on how managers and knowledge workers operate.

Historically, the back office business applications that support companies have been distinct from the category of business intelligence (BI). Applications have long had ways of extracting analytics, but the interfaces were often complex, unique, and infrequently used.

Often, when it came to BI, the data and tools were off-limits to line-of-business managers and workers. And the larger analytics gathered from across multiple data sources remained sequestered among the business analysts, not often dispersed among the business application users themselves.

By using SaaS applications and rich Internet technologies that create different interface capabilities -- as well as a wellspring of integration and governance on the back-end of these business applications (built on a common architecture) -- more actionable data gets to those who can use it best. They get to use it on their terms, as our case today will show, for HCM or human resources managers in large enterprises.

The trick to making this work is to balance the need to govern and control the data and analytics against opening up the insights to more users in a flexible, intuitive way. The ability to identify, gather, and manipulate data for business analysis on the terms of the end-user has huge benefits. As we enter what I like to call the data-driven decade, I think nearly all business decisions are going to need more data from now on.

So, to learn more about how the application and interfaces are the analytics, with apologies to Marshall McLuhan, please join me in welcoming our panel today. We have with us Stan Swete, Vice President of Product Strategy and the CTO at Workday, the sponsor of this podcast. Welcome back to the show, Stan.

Stan Swete: Thanks, Dana.

Gardner: We're also here with Jim Kobielus, Senior Analyst for BI and Analytics at Forrester Research. Welcome, Jim.

Jim Kobielus: Hi, Dana. Hello, everybody.

Gardner: And Seth Grimes, Principal Consultant at Alta Plana Corp., and a contributing editor at TechWeb's Intelligent Enterprise. Welcome, Seth.

Seth Grimes: Thank you, Dana.

Gardner: As I said, I have this notion that we're approaching a data-driven decade, that more data is being created, but increasingly more data needs to be brought to more decisions, and the enterprise, of course, is a primary place where this can take place.

So, let me take this first to you, Jim Kobielus. How are business workers and managers inside of companies starting to relate better to data? How is data typically getting into the hands of those who are in a position to take action on it best?

Dominant BI tool

Kobielus: It's been getting into the hands of people for quite some time through their spreadsheets, and the dominant BI tool in the world is Microsoft Excel, although that's a well-kept secret that everybody knows. Being able to pull data from wherever into your Excel spreadsheet and model it and visualize it is how most people have done decision support and modeling for a long time in the business world.

BI has been around for quite a long time as well, and BI and spreadsheets are not entirely separate disciplines. Clearly, Excel, and increasingly the browser and the mobile client, are the clients of choice for BI.

There are so many different tools that you can use now to access a BI environment or capability -- to do reporting, query, dashboarding, and the like -- that in the business world we have a wealth of different access methods to analytics.

One of the areas that you highlighted -- and I want to hear what Stan from Workday has to say -- is the continued growth and resurgence of BI integrated with your line-of-business applications. That's where BI started and that's really the core of BI -- the reporting that's built into your HCM, your financial management systems, and so forth.

Many companies have multiple customer data repositories, and that, by its very nature, creates a quality issue.



Gardner: But, Jim, haven’t we evolved to a point where the quality of the data and the BI and the ability of people to access and use it have, in a sense, split or separated over the years?

Kobielus: It has separated and split simply because there is so much data out there, so many different systems of record. For starters, many companies have multiple customer data repositories, and that, by its very nature, creates a quality issue, consolidating, standardizing, correcting, and so forth. That’s where data warehouses have come in, as a consolidation point, as the data governance focus.

If the data warehouse is the primary database engine behind BI, then BI has shared in that pain and that low quality, and data warehouses aren't complete solutions by themselves. Many companies have scads of data warehouses and marts, and the information is pulled from myriad back-end databases into myriad analytic databases and then pushed out to myriad BI tools.

Quality of data is a huge issue. One approach is to consolidate all of your data down to a single system of record -- a transactional, online transaction processing (OLTP) environment -- a single data warehouse, or a single, or at least a unified, data virtualization layer available to your BI environment. Or, you can do none of those things, but try to consolidate or harmonize it all through common data quality tools or master data management.

The quality issue is just the ongoing pain that every single BI user feels, and there’s no easy solution.

Gardner: Stan, we've heard from Jim Kobielus on the standard BI view of the world, but I am going to guess that you have a little different view in how data and analytics should get in the hands of the people who use it.

Tell us what your experience has been at Workday, particularly as you've gone from your Release 9 to Release 10, and some of the experience you have had with working with managers.

Disparate data sources

Swete: A lot of the view that we have at Workday really supports what Jim said. When I think of how BI is done in enterprises, I think primarily of Excel spreadsheets, and there are some good reasons for that, but there are also some disadvantages that that brings.

One addition I would make is that, when I look at the emergence of separate BI tools, one driver was the fact that data comes from all kinds of disparate data sources, and it needs aggregation and special tooling to help overcome that problem.

Taking an apps focus, there's another cause of separate BI tools. It comes from the fact that traditional enterprise applications have been written for what I would call the back-office user. While they do a very good job of securing access to data, they don't do a very good job of painting a relevant picture for the operational side of the business.

A big driver for BI was taking the information that’s in the enterprise systems and putting a view on some dimensionality that managers or the operational side of the business could relate to. I don’t think apps have done that very well, and that’s where a lot of BI originated as well.

From a Workday perspective, we think that you're always going to need separate tools to be data aggregators, to get some intelligence out of data from disparate sources. But, when the analysis can be focused on the data in a single application, we think there is an opportunity for the people who build that application to build in more BI, so that separate tooling is not needed. That's what we think we are doing at Workday.

Grimes: Dana, I'd love to riff on this a little bit -- on what Jim said and what Stan has just said. We're definitely in a data-driven decade, but there’s just so much data out there that maybe we should extend that metaphor of driving a bit.

The real destination here is business value, and what provides the roadmap to get from data to business value is the competencies, experiences, and the knowledge of business managers and users, picking up on some of the stuff that Stan just said.

It’s the systems, the data warehouses, that Jim was talking about, but also hosted, as-a-service types of systems, which really focus on delivering the BI capabilities that people need. Those are the great vehicle for getting to that business value destination, using all of that data to drive you along in that direction.

Gardner: Traditionally, however, if you look at back office applications -- on-premises, siloed, self-contained stacks on their own servers -- making these integrations and these data connections requires quite a bit of effort from the IT people. So, the IT department crew sits between the data and the integrations and the people who use them.

What’s different now, with a provider like Workday moving to the SaaS model, is that the integration can happen more seamlessly as a result of the architecture and can be built into more frequent updates of the software. The interface, as I said earlier, becomes the analytics, rather than the integration and the IT department becoming the analytics -- or becoming a barrier to the analytics.

I wonder, Jim Kobielus, if you have a sense of what the architecture-as-destiny angle means here -- moving to SaaS, moving to cloud models, looking at what BI can bring vis-à-vis these changes in the architecture. What should we expect to see?

Pervasive BI

Kobielus: "Architecture as destiny." That’s a great phrase. You'd better copyright that, Dana, before I steal it from you.

It comes down to one theme that we use to describe where it's going: pervasive BI ... pervading all decisions, pervading everybody's lives, being there as a ready decision-support tool, regardless of where you are, how you are getting into the data, and where it's hosted.

So in terms of architecture, we can look at the whole emerging cloud space in the most nebulous ways as being this new architecture for pervasive, hosted BI. But that is such a vague term that we have to peel the onion just a little bit more here.

I like what you said just before that, Dana, that the interface is the analytics. That's exactly true. Fundamentally, BI is all about delivering actionable intelligence to decision agents. I use the term agents here to refer to the fact that the agents may be human beings, or they may be workflows that you are delivering analytic metrics, KPIs, and so forth to.

The analytics are the payload, and they are accessed by the decision agents through an interface or interfaces. Really, the interfaces have to fit and really plug into every decision point -- reporting, query, dashboarding, scorecarding, data mining, and so forth.

What we are really talking about is a data virtualization layer for cloud analytics to enable the delivery of analytics pervasively throughout the organization.



If you start to look, then, at the overall architecture we're describing here for really pervasive BI, then hosted, on-demand, SaaS, and cloud are all very important. But it's also very much a front-end virtualization layer: virtualization of access to this cloud of data, virtualization of access by a whole range of decision agents through whatever clients, applications, and tools they wish, and also very much virtualization of access to all the data that's in the middle.

In the cloud, it has to be like a cloud data warehouse ecosystem, but it also has to be an interface. The interfaces between this cloud enterprise data warehouse (EDW) and all the back-end transactional systems have to be through cloud and service-oriented architecture (SOA) approaches as well.

What we are really talking about is a data virtualization layer for cloud analytics to enable the delivery of analytics pervasively throughout the organization. At the very highest level, that’s the architecture that I can think of that actually fits this topic.

Gardner: All right. That’s the larger goal, the place where we can get to. I think what Workday is showing is an intermediary step, but an important one.

Stan, tell us a little bit about what Workday is doing vis-à-vis your release 10 update and what that means for the managers of HR, the ones that are looking at that system of record around all the employee information and activities and processes.

Swete: I agree with the holistic view of trying to develop pervasive analytics, but the thing that frequently gets left out, and it has gotten left out even in this conversation, is a focus on the transactional apps themselves and the things they can do to support pervasive analytics.

Maintaining security

For disparate data sources, you're going to need data warehouses. Any time you've got aggregation and separate reporting tools, you're going to need to build interfaces. But, think back to how you introduced this topic, Dana, how you introduced SaaS: when you look at IT's involvement, if interfaces need to get built to convey data, IT has to get involved to make sure that some level of security is maintained.

From Workday's point of view, what you want to do is reduce the times when you have to move data just to do analysis. We think there is a role that applications can play here -- and this gets IT out of it. If your application, the originator of the transactional data, can also support a level of BI and business insight, IT does not have to become as involved, because they bought the app with trust in the security model that's inherent to the application.

What we're trying to do is leverage the fact that we can be trusted to secure access to data. Then, what we try to do is widen the access within the application itself, so that we don't have to have separate data sources and interfaces.

This doesn’t cover all cases. You still need data aggregation. But, where the majority of the data is sourced in a transaction system, in our case HR, we think that we, the apps vendor, can be relied on to do more BI.

What we've been working on is constantly enhancing managers' ability to get access to their data. Up through 2009, that took the form of trying to enhance our report writer and deliver more options for reports: either the option to render reports in a small footprint -- we call it a Worklet -- and view them side by side as snippets of data, or the option to create more advanced reports.

This is an ability to enhance our built-in report writer to allow managers or back-office personnel to directly create what become little analysis cubes.



We introduced a nice option last year to create what we call contextual reporting, the ability to start with your data -- looking at a worker -- and then create a report about workers from there, with guidance as to all the Workday fields as they apply to the worker. That made it easier for a manager not to have to search or even remember parts of our data dictionary. They could just look at the data they knew.

This year, we're taking, we think, a major step forward in introducing what we are calling custom analytics. This is an ability to enhance our built-in report writer to allow managers or back-office personnel to directly create what become little analysis cubes. We call them matrix reports.

That’s a new report type in our report writer. Basically, you very quickly -- and importantly without coding or migrating data to a separate tool, but by pointing and clicking in our report writer -- get one of these matrix reports that allows slicing and dicing of the data and drilling down into the data in multiple dimensions. In fact, the tool automatically starts with every dimension of the data that we know about based on the source you gave us.

If you say you want to report on workers, we will probably pop up about 12 different dimensions to analyze. Then, you reduce them down to the ones that you want to analyze -- maybe last performance review, business site, management reporting level, and, let's say, salary level. So, you could quickly create a cube for yourself to do the analysis.

Then, we let you share that out to other managers in a way in which you don't have to think about the underlying security. I could write the thing and share it with either someone who works for me or a coworker, and the tool would apply the security that they have in the system, based on its understanding of their roles.

We're trying to make it simple to get this analysis into the hands of managers to analyze their data.
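To make the matrix-report idea concrete, here is a minimal sketch of the same kind of multidimensional slicing in Python with pandas. The column names and sample data are hypothetical illustrations, not Workday's actual data model:

# Minimal sketch of a "matrix report": pivot worker data across
# chosen dimensions. Columns and data are hypothetical, not
# Workday's actual schema.
import pandas as pd

workers = pd.DataFrame({
    "business_site": ["Oakland", "Oakland", "Dublin", "Dublin"],
    "mgmt_level":    [3, 4, 3, 4],
    "last_review":   ["Exceeds", "Meets", "Meets", "Exceeds"],
    "salary":        [120000, 95000, 88000, 130000],
})

# Slice and dice: average salary by site and management level,
# broken out by last performance review.
matrix = workers.pivot_table(
    index=["business_site", "mgmt_level"],
    columns="last_review",
    values="salary",
    aggfunc="mean",
)
print(matrix)

Each row grouping here corresponds to one of the "dimensions" Swete describes reducing down to before the analysis begins.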

Self-service information

Kobielus: What you just mentioned there, Stan, is very important, and it's one thing I left off in my previous discussion: self-service information exploration through hierarchical and dimensional drill-down, and also mashup and collaborative sharing of your mashups. It's where the entire BI space is going -- not just the traditional, big, specialized BI vendors, but also vendors like yourself, who are embedding this technology into back office apps and have adopted a similar architecture. The users want all the power, and they're being given the power to do all of that.

Swete: We would completely agree with that. Actually, we like to think that we completely thought this up on our own, but it really has been a path we have been pushed along by our customers. We see from the end users that same demand that you're talking about.

Gardner: Seth, to you. You've focused on web analytics and the interfaces involved with text and large datasets. When you hear about a specific application, like an HCM, providing these interfaces through the web browser -- rich and intuitive menuing, drop-downs, and graphics -- does something spark an interest in you? When I saw this, I thought, "Wow, why can't we do this with a lot more datasets across much more of the web?" Any thoughts about how what Workday is doing could be applied elsewhere?

Grimes: Let me pull something from my own consulting experience here. A few years ago, I did a consulting stint to look at the analytics and data-warehousing situation at a cabinet-level U.S. federal government agency. It happens to be headed by a former 2008 Presidential candidate, so it's actually internationally distributed.

They were using some very mainstream BI tools, with conventional data warehousing, and they had chaos. They had all kinds of people creating reports in different departments, very duplicative reports.

The web is going to be a great mechanism for interconnecting all of the distributed systems that you might have and bringing in additional data that might be germane to your business problems.



There was a lot of cost involved in all of this duplication, because stuff had to get re-proven over and over again. And with all that distributed report creation and no standards, nothing was ever done quite the same way in two different departments, which only added to the chaos.

There were all kinds of definitional problems, all kinds of standardization problems, and so on. When you do move to the kind of architecture that we are discussing here, architecture is destiny again. The architecture maybe isn't the destiny in my mind, but it creates an imprint for the destiny that you are going to have.

Add in the web. The web is going to be a great mechanism for interconnecting all of the distributed systems that you might have and bringing in additional data that might be germane to your business problems, data that isn't held inside your firewall. The web is definitely a fact of life nowadays, and it's finally so reliable that you can run operational systems on top of it.

That's where some of the stuff that Stan was talking about comes into play. Data movement between systems does create vulnerability. So, it's really great when you can bundle or package multiple functional components on a single platform.

For example, we've been discussing bundling analytics with the operational system. Whether those operational systems are for HCM, ERP, or for other business functions, it makes security sense, but there are a couple of dimensions that we haven’t discussed yet. When you don’t move your data, then you're going to get fresher data available to the analytical systems. When people create data warehouses, they still often do refreshes on a daily or even less-frequent basis.


Data is not moving

You're also going to have better performance, because the data is not moving. All this is also going to add up to lower support costs. We were talking about IT a little bit earlier. In my experience, IT actually wants to encourage this kind of hosted or as-a-service use, because it does speed the time for getting the applications in place. That reduces the IT burden, and it really leverages the competencies, experience, and knowledge of the line-of-business users and managers. So, there's only good stuff that one can say about this kind of architecture-as-destiny that we have been talking about.

Gardner: I'd like to dive in a bit more on this notion of "the interface is the analytics." What I mean by that is this: when you open up the opportunity for people to start getting at the data -- slicing it and dicing it based on what they think their needs are, following their own intuition about where they want to learn more, maybe creating templates along the way so they can reuse their path, maybe even sharing those templates with other people in the organization -- it strikes me that you are getting toward a tipping point of some sort.

The more the people use the data, the better they are at extracting value, and the more that happens, the more that they will use the tools and then share that knowledge, and it becomes a bit of a virtuous adoption opportunity. So, analytics takes on a whole new level of value in the organization based on how it’s being used.

Stan, when you have taken what you are doing with Workday -- rolling out update 10 -- what’s been the response? What’s been the behavioral implication of putting this power in the hands of these managers?

We also have stories from customers who have used this in production to create reports for management that would have taken them weeks, and they did it in less than an hour.



Swete: We have been rolling out 10. I think about half of our customer population is on it, but we have worked through design with our customers and have done early testing. We've also gotten some stories from the early customers in production, and it’s playing out along a lot of the lines that you just mentioned.

A customer we worked particularly closely with took their first look, and we sat back and watched what they would build for themselves. The very first analysis they did involved an aging analysis by job profile in their company. They were able to quickly build a matrix report that showed them the ages by job code across their organization.

Then, they could not only look at sort of just a high-level average age number, but click down on it and see the concentration of the detail. They found certain job categories where not only was there a high average age, but a tight concentration around that average, which is an exposure. That’s insight that they developed for themselves.
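The aging analysis described here boils down to a group-by with a spread measure: a high average age combined with a tight concentration around it flags an exposure. A minimal sketch, with hypothetical data rather than the customer's actual report:

# Aging analysis by job code: a high mean age with a low standard
# deviation signals the concentration risk described above.
# Data is hypothetical, for illustration only.
import pandas as pd

staff = pd.DataFrame({
    "job_code": ["ENG1", "ENG1", "ENG1", "OPS2", "OPS2", "OPS2"],
    "age":      [34, 41, 29, 58, 61, 59],
})

aging = staff.groupby("job_code")["age"].agg(["mean", "std", "count"])
at_risk = aging[(aging["mean"] > 55) & (aging["std"] < 5)]
print(at_risk)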

Pre-Workday 10, the thought might have occurred to us to build that and deliver it as a part of our application, but I don’t think it would have been in the top 10 reports that we would have delivered. And this is something that they wrote for themselves in their first hours using the functionality.

We also have stories from customers who have used this in production to create reports for management that would have taken them weeks, and they did it in less than an hour. That’s because we eliminated the need to move data and think about how that data was staged in another tool, secured in another tool, and then put that all back on to Workday.

Aggressive adoption

So, so far so good, I'd say. Our expectation is that these kinds of stories will just increase, as our customers fully get on to this version of Workday. We've seen fairly aggressive adoption of a lot of the features that I have mentioned driving into Workday. I think that these requirements will continue to drive us forward to place even more power into the insight you can get from our reporting tools.

Grimes: Isn't that what it's all about -- speeding time to insight for the end-users while, at the same time, providing a platform that allows the organization to grow, one that evolves with the organization's needs as they change over time? All of that is really important: both the immediate time to insight and the longer-term goal of having in place a platform that will support the evolution of the organization.

Swete: We totally agree with that. When we think about reporting at Workday, we have three things in mind. First, we're trying to make the development of access to data simple. That's why we make sure it never involves coding. We don't want it to be an IT project, even when it's a more sophisticated report that's being created. And we want it to be simple to share the reports out.

The second word that's top of my list is relevance. We want customers to guide themselves to the relevant data that they want to analyze. We try to put that data easily at hand, so they can get access to it. And once they're analyzing the data, since we are a transaction system, we think we can do a better job of letting them take action on the insight.

I call it transalytics. It's a combination of transaction systems and analytics systems. And really it's a closed loop. It must be.



So, we always have what we call related actions as a part of all the reports that you can create, so you can get to either another report or to a task you might want to do based on something a report is showing you.

Then, the final thing, because BI is complex, we also want to be open. Open means that it still has to be easy to get data out of Workday and into the hands of other systems that can do data aggregation.

Kobielus: That's interesting -- the related-actions capability. I see a lot of movement in that area by a lot of BI vendors to embed action links into analytics. I think the term has been coined before. I call it transalytics. It's a combination of transaction systems and analytics systems. And really it's a closed loop. It must be.

It's actionable intelligence. So, duh, then shouldn't you put an action link in the intelligence to make it really truly actionable? It's inevitable that that’s going to be part of the core uptake for all such solutions everywhere.

Gardner: Jim, have you seen any research, or even some anecdotal evidence, that making these interfaces available -- making the data available without IT, without jumping through the hoops of learning SQL or other languages or modeling tools -- is a tipping point or some catalyst to adoption? It adds more value to the BI analytics, which therefore encourages the investment to bring more data and analytics to more people. Have you seen any kind of a wildfire like that?

Tipping point

Kobielus: Wildfire tipping point. I can reference some recent Forrester research. My colleague, Boris Evelson, has surveyed IT decision makers over the last few years on their priorities for BI and analytics. Of the projects they're green-lighting, more and more involve self-service, pervasive BI -- specifically, more self-service, mashup-style development environments and more SaaS for quick provisioning.

What we're seeing now is that there is the beginnings of a tipping point here, where IT is more than happy to, as you have all indicated, outsource much of the BI that they have been managing themselves, because, in many ways, the running of a BI system is not a core competency for most companies, especially small and mid-market companies.

The analytics themselves though -- the analysis and the intelligence -- are a core competency they want to give the users: information workers, business analysts, subject matter experts. That's the real game, and they don't want to outsource those people or their intelligence and their insights. They want to give them the tools they need to get their jobs done.

What's happening is that more and more companies, more and more work cultures, are analytic savvy. So, there is a virtuous cycle, where you give users more self-service -- user friendly, and dare I say, fun -- BI capabilities or tools that they can use themselves. They get ever more analytics savvy. They get hungry for more analysis. They want more data. They want more ways to visualize and so forth. That virtuous cycle plays into everything that we are seeing in the BI space right now.

What's happening is that more and more companies, more and more work cultures, are analytic savvy.



Boris Evelson is right now doing a Forrester Wave on BI SaaS, and we see that coming along on a fast track, in terms of what enterprises are asking for. It's the analytics-savvy culture here. There is so much information out there, and analytics are so important.

Ten years ago, it may have seemed dangerous to outsource your payroll or your CRM system. Nowadays, everybody is using something like an ADP or a Salesforce, and it's a no-brainer. SaaS BI is a no-brainer. If you're outsourcing your applications, maybe you should outsource your analytics.

Gardner: All right, Stan, let's put this to Workday. You've got your beachhead with the HCM application. You're already into payroll. How far do you expect to go, and what sort of BI payoff from your model will you get when your systems of record start increasing to include more and more business data and more applications?

Swete: There are a couple of ways we can go on that. First of all, Workday has already built up more than just HCM. We offer financial management applications and have spend-management applications.

A big part of how we're trying to develop our apps is to have very tight integration. In fact, we prefer not even to talk about integration; we want these particular applications to be pieces of a whole, from a BI perspective as well. We believe that, as a customer widens their footprint with us, the value of what they can get out of their analysis is only going to increase.

I'll give you an example of that that plays out for us today. In the spend management that we offer, we capture the non-compensation costs that relate to your workforce. A lot of the workforce reporting that you do can all of a sudden take on a cost component in addition to compensation. It is very interesting for managers to look at the total cost of the workforce they've developed and use that as input to how they want to plan.

Cost analysis

We do a good job of capturing and tracking contingent labor. So, you can start to do cost analysis of what your full-time employees and your contingent workers are costing you.

Our vision is that, as we can widen our footprint from an application standpoint, the payoff for what our end-users can do in terms of analysis just increases dramatically. Right now, it's attaching cost to your HR operations' data. In the future, we see augmenting HR to include more and more talent data. We're at work on that today, and we are very excited about dragging in business results and drawing that into the picture of overall performance.

You look at your workforce. You look at what they have achieved through their project work. You look at how they have graded out on that from the classical HR performance point of view. But, then you can take a hard look at what business results they have generated. We think that's a very interesting and holistic picture that our customers should be able to twist and turn with the tools we have been talking about today.

Grimes: There is a kind of truism in the analytics world that one plus one equals three. When you apply multiple methods, when you join multiple datasets, you often get out much more than the sum of what you can get from any single method or any single dataset.

Some users are really going to get down and dirty with the data and with the analytical methods, and you want to support them, but you also want to deliver appropriate sophistication of analytics to other users.



If you can enable that kind of crossing of business functions, analytical functions, and datasets, then your end-users are going to end up farther along than they were before, in terms of optimizing the overall business picture and overall business performance, as well as the individual functional areas. That's just a truism, and I have seen it play out in a variety of organizations and a variety of businesses.

Swete: That's why we think it's really important not to introduce any seams in the application. Even today, when we've got a customer looking at their HR data, they're able to do analysis on the dimensions of how their cost centers are structured, not just how their supervisory organization is structured. So, they can get rollups and analysis along those lines. That's just one example. We have to bridge into wider and wider financial and operational data.

Grimes: You get to a really good place, if your users don’t even know that they are pulling data from multiple sources. They don’t even really know that they are doing analytics. They just think that they are doing their job. That sounds like the direction that you all are going, and I would affirm that’s a very good direction to be going.

Some users are really going to get down and dirty with the data and with the analytical methods, and you want to support them, but you also want to deliver appropriate sophistication of analytics to other users. There are an awful lot of users in the organization who really do need analytics, but they actually don’t need to know that they are doing analytics. They just need to do their job. So, if you can deliver the analytics to them in a very unintrusive way, then you're in really good shape.

Swete: We would agree. Our challenge for doing multidimensional analysis, which you can do on these matrix reports, is to deliver that to a customer without using the word multidimensional.

Grimes: A lot of the jargon words that we have been throwing around in this podcast today -- you don't want to take those words anywhere near your end-users. They don't need to know, and it might just cause some consternation for them. We who provide those services and do the analysis need to know that kind of stuff, but the end-users usually don't.

Using small words

Swete: One vendor, of course, put the word pivot into the name of a product that does this dimensional exploration. Other vendors quite often talk about slice and dice. You definitely want to boil it down to words that maybe have fewer than four syllables.

Gardner: Let me throw this out to our analysts on the call today. Is there something about the SaaS model -- and I'll even expand that to the cloud model -- that will allow BI analytics to move to the end-user faster than it could happen with an on-premise or packaged application? And, is analytics, in effect, an accelerant to the adoption of the SaaS model?

I might be stretching it here, but, Jim Kobielus, what do you think? Is what Workday and Stan have been describing compelling on its own merits, regardless of some of the other SaaS benefit to start adopting more applications in this fashion?

Kobielus: Analytics generally as an accelerant to adopting a SaaS model for platforms and applications?

Grimes: Maybe it's the other way around. Maybe the platform is an accelerant to analytics. As we were talking about before, if you can eliminate some of the data movement and all of the extract, transform, and load, you're going to get faster time to data being analytically ready from the operational systems.

The analytics will migrate to where the data lives. If the data lives in the cloud or in a SaaS environment, the analytics will certainly migrate to that world.



If you adopt an as-a-service model, then you don't need to have your IT staff install all the software or buy the machines to host it, all that kind of stuff. That's a business consideration, not a technical one. You have faster time to analytics, just in the sense of the availability of those analytics.

Then, you also can accelerate the adoption of analytics, because you reduce the entry cost with a hosted solution. You don't have to lay out a lot of money up front to buy the hardware and license the software. The cloud as a service will potentially enable on-demand, pay-as-you-go types of pricing. So, it's a different business model that speeds the availability of analytics, and it's not even a technical question.

Kobielus: I agree. The analytics will migrate to where the data lives. If the data lives in the cloud or in a SaaS environment, the analytics will certainly migrate to that world. If all your data is in premises-based Oracle databases, then clearly you want a premises-based BI capability as well.

If all your data is in SaaS-based transactional systems, then your BI is going to migrate to that world. That’s why BI SaaS is such a huge and growing arena.

Also, if you look at just the practical issues here, more and more of the BI applications, the advanced analytics, that we're seeing out there in the real world involve very large datasets. We're talking about hundreds of terabytes, petabytes, and so forth. Most companies, with typical IT budgets, don't have the money to spend on all of the storage and the servers to process all of that. They'll be glad to rent a piece of somebody's external cloud to host their analytical data mart for marketing campaign optimization and the like.

A lot of that is just going into the SaaS world, because that's the cheapest storage and the cheapest processing -- multitenant. The analytics will follow the data, the huge datasets, to the cloud environment. SaaS is an accelerant for pervasive advanced analytics.

Gardner: Stan, did we miss anything in terms of looking at the SaaS model and your model in terms of where analytics fit in and the role they play?

Change delivery vehicle

Swete: I agree with everything that was just said. The thing that always occurs to me as an advantage of SaaS is that SaaS is a change-delivery vehicle. If you look at the trend that we have been talking about -- this marrying up of transactional systems with BI systems -- it's happening from both ends. The BI vendors are trying to get closer to the transactional systems, and the transactional systems are trying to offer more built-in intelligence. That trend has many, many more steps forward.

The one thing that’s different about SaaS is that, if you have got a community of customers and you have got this vision for delivering built-in BI, you are on a journey. We are not at an endpoint. And, you can be on that journey with SaaS and make the entire trip.

In an on-premise model, you might make that journey, but each stop along the way is going to be three years and not multiple steps during the year. And, you might never get all the way to the end if you are a customer today.

SaaS offers the opportunity to allow vendors to learn from their customers, continue to feed innovation into their customers, and continue to add value, whereas the on-premise model does not offer that.

It’s not just about the time of the journey. It’s about do you bring all your customers along with you, because that’s the real value.



Gardner: So, a logical conclusion from that is that, if an on-premises organization takes three, six, nine years to make a journey, but their competitor is in a SaaS model that takes one, two, three years to make the journey, there is a significant competitive advantage or certainly a disparity between the data and analytics that one corporation is going to have, where it should be, versus the other.

Swete: We think so. It’s not just about the time of the journey. It’s about do you bring all your customers along with you, because that’s the real value, right? If we build the flashiest new analytic tool and there is an expensive upgrade to get there and all of our customers have to go through that at their own pace and with their own on-premise project, that’s sort of one value proposition that’s reduced.

I mentioned we are in the midst of delivering Workday 10. In two or three weeks, all of our customers will be on it, and we'll be looking forward to the next update. That’s the other value of SaaS. Not only are you able to deliver the new functionality, but you are able to keep all your customers up on it.

Gardner: Well, we're just about out of time. We've been discussing how SaaS applications can accelerate the use and power of business analytics.

I want to thank our panel today. We've been joined by Stan Swete. He is the Vice President of Product Strategy and CTO at Workday. Thank you, Stan.

Swete: Thanks.

Gardner: We've also been joined by Jim Kobielus, Senior Analyst at Forrester Research. Thanks, Jim.

Kobielus: It’s been a pleasure.

Gardner: And, Seth Grimes, Principal Consultant at Alta Plana Corp., and a contributing editor at TechWeb's Intelligent Enterprise. Thank you, Seth.

Grimes: You're welcome. Again, I appreciate the opportunity to participate.

Gardner: This is Dana Gardner, Principal Analyst at Interarbor Solutions. You've been listening to a sponsored BriefingsDirect podcast. Thanks for joining us, and come back next time.

See a demo on how Workday BI offers business users a new experience for accessing the key information to make smart decisions.

About Workday
This BriefingsDirect podcast features software-as-a-service (SaaS) upstart Workday, provider of enterprise solutions for human resources management, financial management, payroll, spend management, and benefits management.


Listen to the podcast. Find it on iTunes/iPod and Podcast.com. Download the transcript. Sponsor: Workday.

Transcript of a sponsored BriefingsDirect podcast on moving to a SaaS model to provide accessible data analytics. Copyright Interarbor Solutions, LLC, 2005-2010. All rights reserved.


Monday, November 09, 2009

Part 3 of 4: Web Data Services--Here's Why Text-Based Content Access and Management Plays Crucial Role in Real-Time BI

Transcript of a sponsored BriefingsDirect podcast on information management for business intelligence, one of a series on web data services with Kapow Technologies.

Listen to the podcast. Find it on iTunes/iPod and Podcast.com. Download the transcript. Learn more. Sponsor: Kapow Technologies.

Dana Gardner: Hi, this is Dana Gardner, principal analyst at Interarbor Solutions, and you’re listening to BriefingsDirect.

Today we present a sponsored podcast discussion on how text-based content and information from across web properties and activities are growing in importance to businesses. The need to analyze web-based text in real-time is rising to where structured data was in importance just several years ago.

Indeed, for businesses looking to do even more commerce and community building across the Web, text access and analytics forms a new mother lode of valuable insights to mine.

In Part 1 of our series on web data services with Kapow Technologies, we discussed how external data has grown in both volume and importance across the Internet, social networks, portals, and applications.

As the recession forces the need to identify and evaluate new revenue sources, businesses need to capture such web data services for their business intelligence (BI) to work better, deeper, and faster.

In Part 2, we dug even deeper into how to make the most of web data services for BI, along with the need to share those web data services inferences quickly and easily.

Now, in this podcast, Part 3 of the series, we discuss how an ecology of providers and a variety of content and data types come together in several use-case scenarios. We look specifically at how near real-time text analytics fills out a framework of web data services that can form a whole greater than the sum of the parts, and this brings about a whole new generation of BI benefits and payoffs.

Here to help explain the benefits of text analytics and their context in web data services, is Seth Grimes, principal consultant at Alta Plana Corp. Thanks for joining, Seth.

Seth Grimes: Thank you, Dana.

Gardner: We're also joined by Stefan Andreasen, co-founder and chief technology officer at Kapow Technologies. Welcome, Stefan.

Stefan Andreasen: Thank you, Dana.

Gardner: We have heard about text analytics for some time, but for many people it's been a bit complex and unwieldy -- difficult to manage in terms of volume, and difficult to get to a "noise-free," text-based analytic form. Something is emerging that you can actually work with, and it has now become quite important.

Let's go to you first, Seth. Tell us about this concept of noise free. What do we need to do to make text that's coming across the Web in sort of a fire hose something we can actually work with?

Difficult concept

Grimes: Dana, noise free is an interesting concept and a difficult concept, when you're dealing with text, because text is just a form of human communication. Whether it's written materials or spoken materials that have been transcribed into text, human communications are incredibly chaotic.

We have all kinds of irregularities in the way that we speak -- grammar, spelling, syntax. Beyond those kinds of irregularities, we have slang, sarcasm, abbreviations, and misspellings. Human communications are chaotic and they are full of "noise." So really getting to something that's noise-free is very ambitious.

I'm going to tell you straightforwardly, it's not possible with text analytics, if you are dealing with anything resembling the normal kinds of communications that you have with people. That's not to say that you can't aspire to a very high level of accuracy to getting the most out of the textual information that's available to you in your enterprise.

It's become an imperative to try to deal with the great volume of text -- the fire hose, as you said -- of information that's coming out. And, it's coming out in many, many different languages, not just in English, but in other languages. It's coming out 24 hours a day, 7 days a week -- not only when your business analysts are working during your business day. People are posting stuff on the web at all hours. They are sending email at all hours.

If you want to keep up, if you want to do what business analysts have been referring to as a 360-degree analysis of information, you've got to have automated technologies to do it.



Then, the volume of information that's coming out is huge. There are hundreds of millions of people worldwide who are on the Internet, using email, and so on. There are probably even more people who are using cell phones, text messaging, and other forms of communication.

If you want to keep up, if you want to do what business analysts have been referring to as a 360-degree analysis of information, you've got to have automated technologies to do it. You simply can't cope with the flood of information without them.

That's an experience that we went through in the last decades with transactional information from businesses. In order to apply BI or to get BI out of it, you have to apply automated methods with specialized software.

Fortunately, the software is now up to the job in the text analytics world. It's up to the job of making sense of the huge flood of information from all kinds of diverse sources, high volume, 24 hours a day. We're in a good place nowadays to try to make something of it with these technologies.

Gardner: Of course, we're seeing the mainstream media starts behaving more like bloggers and social media producers. We're starting to see that when events happen around the world, the first real credible information about them isn't necessarily from news organizations, but from witnesses. They might be texting. They might be using Twitter. It seems that if you want to get real-time information about what's going on, you need to be able to access those sorts of channels.

Text analytics

Grimes: That's a great point Dana, and it helps introduce the idea of the many different use-cases for text analytics. This is not only on the Web, but within the enterprise as well, and crossing the boundary between the Web and the inside of the enterprise.

Those use-cases can be the early warning of a Swine flu epidemic or other medical issues. You can be sure that there is text analytics going on with Twitter and other instant messaging streams and forums to try to detect what's going on.

You even have Google applying this kind of technology to look at the pattern of the searches that people are putting in. If people are searching on a particular medical issue centered in a particular geographic location, that's a good indicator that there's something unusual going on there.

It's not just medical cases. You also have brand and reputation management. If someone has started posting something very negative about your company or your products, then you want to detect that really quickly. You want early warning, so that you can react to it really quickly.

We have some great challenges out there, but . . . we have some great technologies to respond to those challenges.



We have a great use case in the intelligence world. That's one of the earliest adopters of text analytics technology. The idea is that if you are going to do something to prevent a terrorist attack, you need to detect and respond to the signals that are out there, that something is pending really quickly, and you have to have a high degree of certainty that you're looking at the right thing and that you're going to react appropriately.

We have some great challenges out there, but, as I said, we have some great technologies to respond to those challenges in a whole variety of business, government, and other applications.

Gardner: Stefan, I think there are very few people who would argue with the fact that there is great information out there on the Web, across these different new channels that have become so prominent, but making that information something you can use is a far different proposition. Seth has been telling us about automated tools. Tell us what you see in terms of web data services and how we can make this information available to automated systems.

Deep data

Andreasen: Thank you, Dana. Let's just look at something like Google. You go there and do a search, and you think that you're searching the entire Internet. But, you're not, because you're probably not going to access data that's hidden behind logins, behind search forms, and so on.

There is a huge amount of what I call "deep web," very valuable information that you have to get to in some other way. That's where we come in and allow you to build robots that can go to the deep web and extract information.
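Kapow's robots are a commercial product, but the underlying idea -- programmatically submitting a search form that an ordinary crawler never fills in, then parsing the result page -- can be sketched with standard Python libraries. The URL, form fields, and CSS selector below are hypothetical placeholders, not a real site or Kapow's actual interface:

# Sketch of deep-web extraction: post a search form, parse the results.
# Endpoint, form fields, and selector are hypothetical placeholders.
import requests
from bs4 import BeautifulSoup

resp = requests.post(
    "https://example.com/search",
    data={"query": "IBM software", "date_after": "2009-10-07"},
    timeout=30,
)
resp.raise_for_status()

soup = BeautifulSoup(resp.text, "html.parser")
for row in soup.select("div.result"):
    link = row.select_one("a")
    if link:
        print(link.get_text(strip=True), link.get("href"))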

I'd also like to talk a little bit more about the noise-free thing and go to the Google example. Let's say you go to Google and you search for "IBM software." You think that you will be getting an article that has something to do with IBM software.

You often actually find an article that has nothing to do with IBM software but, because there were some advertisements from IBM on the page, IBM was a hit, and because some other place on the page links to software, software was a hit too. Basically, you end up with something completely irrelevant.

Eliminating noise is getting rid of all this stuff around the article that is really irrelevant, so you get better results.

The other thing around noise-free is structure. It would be great if you could say, "I want an article about IBM software dated after Oct. 7," or whatever, but that means you also need to have that additional structured information.

It's very important to have tools that can . . . understand where the content is within a page and what's the navigation on that page.



The key here is to get noise-free data and to get full data. It's not only to go to the deep web, but also get access to the data in a noise-free way, and in at least a semi-structured way, so that you can do better text analysis, because text analysis is extremely dependent on the quality of data.

Grimes: I have to agree with you there, Stefan. It's very important to have tools that can strip away not only the ads, but understand where the content is within a page and what's the navigation on that page.

We might not be interested in navigation elements, the fluff that's on a page. We want to focus on the content. In addition, nowadays on the Web, there's a big problem of duplication of material that's been hosted on multiple sites. If you're dealing with email or forums, people typically quote previous items in their replies, and you want to detect and strip that kind of stuff away and focus on the real, relevant content. That is definitely part of the noise-free equation -- getting to the authentic content.
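One common way to detect the duplicated and quoted material Grimes describes is shingling plus Jaccard similarity: break each document into overlapping word n-grams and measure the overlap between documents. A minimal sketch, with toy documents:

# Near-duplicate detection via word shingles and Jaccard similarity.
def shingles(text, k=3):
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    return len(a & b) / len(a | b) if (a | b) else 0.0

doc1 = "IBM announced new software for data analytics this week"
doc2 = "this week IBM announced new software for data analytics"

# A high shingle overlap flags the pair as near-duplicates,
# so one copy can be stripped before analysis.
if jaccard(shingles(doc1), shingles(doc2)) > 0.5:
    print("near-duplicates; keep one copy")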

Gardner: Stefan, you refer to the deep web. I imagine this also has a role, when it comes to organizations trying to uncover information inside of their firewalls, perhaps among their many employees and all the different tools that they're using. We used to call it the intranet, but is there an intranet effect here for this ability to gather noise-free text information that we can then start processing?

Extended intranet

Andreasen: Absolutely. I'd even say the extended intranet. If we're looking at a web browser, which is the way that most business analysts or other persons today are accessing business applications, we're accessing three different kinds of applications.

One involves applications inside the firewall. It could be the corporate intranet, etc. Then there are applications where you have to use a login, and this can be your partners. You're logging in to your supplier to see if some item is in stock. Or, it can be some federal reporting site or something.

The sites behind the login are like the extended enterprise. Then, of course, there is everything out of the World Wide Web -- more than 150 million web pages out there -- which have all kinds of data, and a lot of that is behind search forms, and so on.

Gardner: Seth, as a consultant and analyst, you've been focused on text analytics for some time, but perhaps a number of our listeners aren't that familiar with it. Could you maybe give us a brief primer on what it is that happens when you identify some information -- be it Internet, extended web, deep web? How do you go through some basic steps to analyze, cleanse, and then put data into a form that you can then start working with?

Grimes: Dana, I'm going to first give you an extremely short history lesson, a little factoid for you. Text analytics actually predates BI. The basic approaches to analyzing textual sources were defined in the late '50s. Actually, there is a paper from an IBM researcher from 1958 that defines BI as the analysis of textual sources.

What happened is that enterprises computerized their operations, their accounting, their sales -- all of that -- in the 1960s. That numerical data from transactional systems is readily analyzable, whereas text is much more difficult to analyze. But now we have come to the point, as I said earlier, where there is software and there are great methods for analyzing text.

What do they do? The front-end of any text analysis system is going to be information retrieval. Information retrieval is a fancy, academic type of term, meaning essentially the same thing as search. We want to take a subset of all of the information that's out there in the so-called digital universe and bring in only what's relevant to our business problems at hand. Having the infrastructure in place to do that is a very important aspect here.

Once we have that information in hand, we want to analyze it. We want to do what's called information extraction, entity extraction. We want to identify the names of people, geographical locations, companies, products, and so on. We want to look for pattern-based entities like dates, telephone numbers, and addresses. And we want to be able to extract that information from the textual sources.

In order to do that, people usually apply a combination of statistical and linguistic methods. They look for language patterns in the text. They look for statistics like the co-occurrence of words across multiple texts. When two words appear next to each other, or close to each other, in many different documents -- those can be web pages or other documents -- that indicates a degree of relationship. People apply so-called machine-learning technologies to improve the accuracy of what they are doing.
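
A toy sketch of the two steps Seth names -- pattern-based entity extraction and co-occurrence statistics -- might look like this. The regular expressions, gazetteer, and documents are all invented, and real systems layer linguistic analysis and machine learning on top.

    import re
    from collections import Counter
    from itertools import combinations

    DATE_RE = re.compile(r"\b\w{3,9}\.? \d{1,2}, \d{4}\b")     # e.g. "Oct. 7, 2009"
    PHONE_RE = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")
    KNOWN_ENTITIES = {"IBM", "Ford", "Firestone", "TurboTax"}  # stand-in gazetteer

    docs = [
        "Firestone and Ford faced recall claims filed on Aug. 9, 2000.",
        "Ford said Firestone tires were at issue; call 800-555-0100.",
        "IBM released new software on Oct. 7, 2009.",
    ]

    cooccur = Counter()
    for doc in docs:
        # Naive substring matching stands in for real tokenization and
        # statistical/linguistic entity recognition.
        found = {e for e in KNOWN_ENTITIES if e in doc}
        found |= set(DATE_RE.findall(doc)) | set(PHONE_RE.findall(doc))
        # Entities appearing in the same document count as related; high
        # counts across many documents suggest a strong relationship.
        for pair in combinations(sorted(found), 2):
            cooccur[pair] += 1

    print(cooccur.most_common(3))  # ('Firestone', 'Ford') ranks highest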

Suitable technologies

All of this sounds very scientific and perhaps abstruse -- and it is. But, the good message here is one that I have said already. There are now very good technologies that are suitable for use by business analysts, by people who aren't wearing those white lab coats and all of that kind of stuff. The technologies that are available now focus on usability by people who have business problems to solve and who are not going to spend the time learning the complexities of the algorithms that underlie them.

So, we're at the point now where you can even treat some of these technologies as black boxes. They just work. They produce the results that you need in the form that you need them. That can be in a form that extracts the information into databases, where you can do the same kind of BI that you have been used to for the last 20 years or so with BI tools.

It can be visualizations that allow you to see the interrelationships among the people, the companies, and the products that are identified in the text. If you're working in law enforcement or intelligence, that could be interrelationships among individuals, organizations, and incidents of various types. We have visualization technologies and BI technologies that work on top of this.

Then, we have one other really nice thing coming on the horizon, which is semantic web technology -- the ability to use text analytics to support building a web of data that can be queried and navigated by automated software tools. That makes it even easier for individuals to handle everyday business problems -- and personal ones, for that matter.

Gardner: I'd like to dig into some use-cases and understand a little bit better how this is being used productively in the field. Before we do that, Stefan, maybe you could explain from Kapow Technologies' perspective, how you relate to this text analytics field that Seth so nicely just described. Where does Kapow begin and end, and how do you play perhaps within an ecosystem of providers that help with text analytics?

Andreasen: Text analytics, exactly as Seth was saying, is really a form of BI. In BI, you are examining some data and drawing conclusions, maybe even taking some automated actions based on it.

Obviously, any BI or any text analysis is no better than the data source behind it. There are four extremely important parameters for the data sources. One is that you have the right data sources.

There are so many examples of people building these kinds of BI and text analytics applications while settling for second-tier data sources, because those are the only ones they have. This is one area where Kapow Technologies comes in. We help you get exactly the data sources you want.

The other thing that's very important is that you have a full picture of the data. If relevant data sources exist across all kinds of verticals, all kinds of media, and so on, you really have to be sure you cover them all. Getting full coverage of data sources is another thing that we help with.

Noise-free data

We already talked about the importance of noise-free data -- ensuring that when you extract data from your data source, you get rid of the advertisements and keep the material that matters, because it's very valuable in your text analysis.

Of course, the last thing is the timeliness of the data. We all know that people who do stock research get real-time quotes, and they get them for a reason: the fresher the quotes are, the better they can look into the crystal ball and predict what will happen in the next few seconds.

The world is really changing around us. Companies need to look into the crystal ball at a nearer and nearer future. Predicting what happens in two years doesn't really matter anymore. You need to know what's happening tomorrow. So, the timeliness of the data is important.

Let me get to the approach that we're taking. Business analysts work with business applications through their web browsers, and they often cut and paste data out of a business application into some spreadsheet.

You can think of our product as a web browser that you can teach how to interact with a website, how to extract only the data that's relevant, how to structure that data, and then how to repeat all of it automatically. Our product can give you automated, real-time, and noise-free access to any data you see in a web browser.

How does that apply to text analytics? Well, it gives you a fully covered, real-time data source, with all of the qualities I just explained.

Gardner: I really was intrigued by this notion of the crystal ball -- not two years from now, but tomorrow. It seems to me that so many people are putting up so much information about their lives and their preferences, and people in business are doing the same around their occupations. We have this virtual focus group going on around us all the time. If we could just suck out the right information about our products, we could get that crystal ball polished up.

Let me go back to you, Stefan. Can you give us an example of where a market research, customer satisfaction, or virtual focus group benefit is being derived from these text analytics capabilities?

Knowing the customer

Andreasen: Absolutely. For any company selling services or products, the most important thing to know is what customers think about their products. Are we giving our customers the right service? Are we packaging our products the right way? How do we understand the customers' buying behavior, their communications, and so on?

Intuit is a customer we share with a text analysis company called Clarabridge. They use a text analysis solution to understand their TurboTax customers.

Before they had a text analysis system, they had people sampling about one percent of the web forums, their own customer support system, and the emails coming into their contact center, to get some rudimentary overview of what customers thought.

With Kapow Technologies, they can now get to all of these data sources -- online forums, their own customer support center, and wherever there are networks of TurboTax users -- and extract all the information in near real-time. Then, they use the text analysis engine to make much, much better predictions of what customers think, and they actually have their finger on the pulse.

If a set of customers suddenly starts talking about a feature that doesn't work, or one that is much better in a competitor's product -- thereby looking into the near future of the crystal ball -- the company can react early and try to deal with it in the best possible way.

Gardner: Seth Grimes, is this an area where you have seen a lot of the text analytics work focused on these sorts of virtual focus groups?

Grimes: Definitely. That's an interesting concept. A focus group is a traditional qualitative research tool for market research firms. They get a bunch of people into a room and have a facilitator lead those people through a conversation about brand names, marketing, and positioning, and then get their reactions.

With the web, you don't have to get those people together, because they come together on their own and participate in social media forums of various types. There are a whole slew of them. Together they constitute a virtual focus group, as you say.

The important point here is to get at the so-called voice of the customer. In other words, what is the customer saying in his or her own voice, not in some form where you force that person to tick off one, two, three, four, or five to rate your product? Customers can bring up the issues that are of interest to them, whether good or bad, and they can speak about those issues however they naturally do. That's very important.

I've actually been privileged to share a stage with the analytics manager from Intuit, Chris Jones, a number of times to talk about what he is doing, the technologies, and so on. It's really interesting stuff that amplifies what Stefan had to say.

Broad picture

The idea is that you can use these technologies both to get a broad picture of the issues and to stop bending those issues into categories that your business analysts have predefined. Now, you can generate the topics of most interest from what people are actually saying, using automated, statistical methods. In other words, you let them have their own voice.
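
A minimal sketch of that idea -- topics emerging from the words customers actually use, rather than from analyst-defined categories -- could be as simple as the following. The posts and stopword list are invented, and real systems add stemming, phrase detection, and stronger statistics.

    import re
    from collections import Counter

    STOPWORDS = {"the", "a", "is", "it", "and", "i", "my", "to", "of", "on", "but"}

    posts = [
        "The import feature keeps crashing on my return",
        "Import crashed again, lost my data",
        "Love the interface but import is broken",
    ]

    terms = Counter()
    for post in posts:
        for word in re.findall(r"[a-z']+", post.lower()):
            if word not in STOPWORDS:
                terms[word] += 1

    # The top terms *are* the emergent topics -- nobody predefined "import".
    print(terms.most_common(5))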

You also get the effect of looking not only at the aggregate picture, at the mass of the market, but also at individual cases. If someone posts about a problem with one of the products to an online forum, you can detect that there's an issue there.

You can make sure that each issue gets to the right person, so the company can personally address it and keep it from escalating and getting attention you really don't want it to get. You earn the reputation of being a very responsive company. That's a very important thing.

The goal here is not necessarily to make more money. The goal is to boost your customer satisfaction rating, Net Promoter Score, or however you choose to measure it. The text technologies are a very important part of the overall package of responding to customer issues and boosting customer satisfaction.

While you're doing that, those people are going to buy more, you're going to reduce your support costs, all of that kind of thing, and you're going to make more money. So, by doing the right thing, you're also doing something good for your own company.

Gardner: In business, you want to reduce the guesswork to do better by your customers. Stefan, as I understand it, Kapow Technologies has been quite successful in working with a variety of military, government, and intelligence agencies around the world on getting this real-time information about what's going on, but perhaps with the stakes being a bit higher -- things like terrorism, and even insurrections and uprisings.

Tell us a little bit about a second use case scenario, where text analytics are being used by government agencies and intelligence agencies.

Andreasen: As Seth said, the voice of the customer is a very interesting and very valuable use case for text analysis. I'll add one thing to what Seth said. He was talking about product input, and, of course, we all know that developing products -- maybe not so much a product like TurboTax, but a car, say -- is extremely expensive. So, understanding what kind of product your customers will want in the future is an important part of the voice of the customer.

With a lot of our customers in military intelligence, it's similar. Of course, they would like to know what people are writing from a sentiment point of view, an opinion point of view, but another thing that's actually even more important in the intelligence community is what I will call relationships.

Seth mentioned relationships earlier -- understanding who the real influencers are and who has the most connections. Let's say somebody writes an article about how to mix chemicals together to make an efficient bomb. What you really want to know is who this person knows across all kinds of social networks on the 'Net, so you can map out who the real influencers and the network centers are.

Finding relationships

We see a lot of uses of our product going out to blogs, forums, and so on, in all kinds of languages, often translating the content into English, and doing this relationship analysis. A very popular product for that comes from a partner of ours, Palantir Technologies. It has a very cool, interactive way of finding relationships. I think this is also very relevant for ordinary enterprises.
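
As a generic illustration of the relationship analysis Stefan describes -- emphatically not Palantir's product, just a sketch using the open-source networkx library with fabricated names and edges -- centrality measures are one common way to surface influencers and network centers.

    import networkx as nx  # third-party: pip install networkx

    mentions = [  # (author, person they interact with) -- invented data
        ("alice", "bob"), ("carol", "bob"), ("dave", "bob"),
        ("bob", "eve"), ("eve", "frank"), ("carol", "dave"),
    ]

    g = nx.Graph()
    g.add_edges_from(mentions)

    # Degree centrality: who has the most direct connections.
    # Betweenness centrality: who sits on the paths between others -- brokers.
    degree = nx.degree_centrality(g)
    between = nx.betweenness_centrality(g)

    for person in sorted(g, key=lambda p: between[p], reverse=True):
        print(f"{person:6s} degree={degree[person]:.2f} "
              f"betweenness={between[person]:.2f}")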

Yesterday, I met with one of the big record companies, which is also a customer of ours. As soon as I explained this relationship analysis, they said, "We can really use this for anti-piracy, because it's really just a very few people who do the major work when it comes to getting copies of new films out on the 'Net." So, understanding these relationships can be very relevant for that kind of scenario as well.

Grimes: Dana, when you introduced our podcast today, you used the term ecology, or ecosystem, and that's a really great concept that we can apply here in a number of dimensions. We have an ecosystem in at least two dimensions.

Stefan mentioned one of the Kapow partners, Palantir. We earlier mentioned the text analytics partner, Clarabridge. Through integration technologies like Kapow, we now have the ability to bring together very disparate information sources with different characteristics, to provide an ecosystem of information that can be analyzed and brought to bear to solve particular business or government problems.

We have a set of software technologies that can similarly be integrated into an ecosystem to help you solve those problems. That might be text analysis technologies. It might be traditional BI or data warehousing technologies. It might be visualization technologies, whatever it takes to handle your particular business problem.

As we've been discussing, we do see applications in a whole variety of business and government issues, whether it's customer insight or intelligence or many other things that we haven't even discussed today. So, I find that ecosystem concept very useful in framing the discussion of how the text technologies fit into something that's a much larger picture.

Gardner: So, we are looking at the ecologies. We are looking at some of these use-cases. It seems to me that we also want to be able to gather information from a variety of different players, perhaps in some sort of a supply chain, ecosystem, business process, channel partners, or value added partners. The ecology and ecosystem concept works not only in terms of what we do with this information, but how we can apply that information back out to activities that are multi-player, beyond the borders or boundaries of any one organization.

I'm thinking about product recall, health, and public-health types of issues. Seth, have you worked with any clients or do you have any insights into how text analytics is benefiting an extended supply chain of some sort, and how the ecosystem of insight into the text analytics solves some unique problems there?

Product recall

Grimes: Product recall is an interesting one. Let me give you an example there. This is, like most examples that we are going to discuss, a multifaceted one.

People are all familiar with the problems with Firestone tires a number of years ago, early in this decade, when the tread was coming off the tires. Well, there are a number of parties that are going to be interested in this problem.

Put aside the consumers, who are obviously affected by it -- very badly affected by it. We also have the manufacturers, not only of the tires, but of the vehicles, the Ford Explorer in this case.

We have the regulatory bodies in the government, parts of the U.S. Department of Transportation. We have the insurance industry. All of these are stakeholders who have an interest in early detection, early addressing, and early correction of the problem.

You don't want to wait until there are so many cases that it's obvious to everyone, the issues spill out into the press, and there are questions of negligence, and so on. So, how can you address something like a problem with tires where the tread is coming off?

Well, one way is warranty claims. For example, someone might file a claim through the vehicle manufacturer, Ford in this case, or through the tire manufacturer, claiming a defective product. Sometimes, just an individual tire is defective, but sometimes that's an indication of manufacturing or design issues. So you have warranty claims.

You also have accident reports that are filed by police departments or other government agencies and find their way into databases in the Department of Transportation and other places. Then, you have news reports about particular incidents.

There are multiple sources of information. There are multiple stakeholders here. And there are multiple ways of getting at this. But, like so many problems, you're going to get at the issue much faster if you combine information from all of these different sources rather than relying on a single source.
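
As a toy sketch of that corroboration idea -- with invented records and a deliberately crude matching rule; a real system would score and weight evidence far more carefully -- combining independent sources can trigger an alert earlier than any single source would.

    from collections import Counter

    # Invented records from three independent source types.
    sources = {
        "warranty_claims": ["tread separation on highway", "tread came off at speed"],
        "accident_reports": ["rollover after tread separation"],
        "news": ["lawsuit alleges tire tread separation"],
    }

    corroboration = Counter()
    for name, records in sources.items():
        if any("tread separation" in r for r in records):
            corroboration[name] += 1

    # Alert when independent sources agree -- earlier than waiting for any
    # single source to accumulate an undeniable number of cases.
    if len(corroboration) >= 2:
        print("Early warning:", sorted(corroboration))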

Again, that's where building up an ecosystem of different data sources that bear on your problem is really important, and this is just a typical use case. I know of other organizations -- manufacturing organizations -- that are using this technology in conjunction with data-mining technologies for warranty claims, for example. Consumer appliances is another area I've heard a lot about, but really there's no limit on where you can apply this.

Gardner: Stefan, from your perspective, for these extended supply chains, public health issues, and so on, we again get down to this critical time element -- for example, the swine flu outbreak last spring. If folks could identify through text analytics where it was starting to crop up, they wouldn't necessarily have to wait for the hospital reports. Is that an instance where some of these technologies can really play an important role?

Big pitfall

Andreasen: Absolutely. Before I get into some more real examples, I want to emphasize what Seth was saying about getting to multiple data sources. I cannot stress enough that one of the biggest pitfalls I've seen, when people build a text analysis solution -- or any BI solution, actually -- is that they look at what data sources they already have and settle for that.

They should instead have asked, "What are the optimal data sources to get the best predictions and the best outcome out of this text analysis?" They should settle for no less than that.

The example here will actually illustrate that. I also have a tire example. We have two different kinds of customers using our product to look at tires, tire blowouts, and tire recalls.

One is a tire company itself. They go to automotive forums and monitor whether people are doing exactly what Seth described -- filing claims or writing on an automotive blog: "I got this tire, and it exploded." "It's just really bad." "Don't buy it." All those kinds of information from different sources.

If you cover enough of the data sources and you get that data in real-time, you can actually go in and contain a potential tire-recall situation before it happens, which, of course, could be very valuable for your company.

The other use case is stock research. We have a lot of customers doing financial and market research with our technology. One of them uses our product, for example, to go out and check the same forums, but their objective is to predict whether there will be a tire recall. Then, they can predict that the stock will take a hit when that happens, and anticipate it beforehand.

Many different players here can use the same kind of information for different purposes, and that makes this really interesting as well.

Gardner: Well, it really seems the age-old part of this is that getting information first has many, many advantages, but the new element is that more and more of the information you can analyze is out on the web.

I wonder if we could cap this discussion -- we are about out of time -- by looking at the future. Seth, you mentioned earlier the semantic web. How automated can this get, and what needs to take place in order for that vision of a semantic web to take place?

Grimes: Well, the semantic web right now is a dream. It's a dream that was first articulated over a decade ago by Tim Berners-Lee, the person who created the World Wide Web, but it is one that is on the fast track to being realized. Being realized in this case means creating meaning.

Realizing it means doing what Stefan was referring to earlier when he talked about the date of a published article, the title, and perhaps other metadata fields such as the author -- creating information that describes what's out there on the web and in databases.

Machine processable

Rendering that information into a form that's machine-processable -- not only for analysis, but also for making interconnections among different pieces of information -- is what the semantic web is really about. It's about structuring the information that's out there on the Web, which can include what Stefan referred to as the deep web, and creating tools that allow people to search and issue other types of queries against that web data.
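
A tiny sketch of that "web of data" idea, using the open-source rdflib library (the library choice, URIs, and property names are illustrative assumptions, not anything named in this discussion): facts become machine-queryable triples instead of free text.

    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import RDF

    EX = Namespace("http://example.org/")
    g = Graph()

    article = URIRef("http://example.org/articles/42")
    g.add((article, RDF.type, EX.Article))
    g.add((article, EX.title, Literal("IBM updates software lineup")))
    g.add((article, EX.publishedOn, Literal("2009-10-12")))
    g.add((article, EX.mentions, EX.IBM))

    # Once the metadata is structured, software can query it directly.
    results = g.query("""
        PREFIX ex: <http://example.org/>
        SELECT ?title WHERE {
            ?a ex:mentions ex:IBM ;
               ex:title ?title .
        }
    """)
    for row in results:
        print(row.title)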

People are working hard on this now, but I don't think it will really be realized in terms of broadly usable business applications for a fair number of years. Not next year or the year after, but maybe three to five years out, we will start to see very broadly useful business applications. There will be niche applications in the near term, but later something much broader.

It's a direction that really hits on the themes that we have been talking about today, integrating applications and data from multiple sources and of multiple types in order to create a whole that is much greater than each of the parts.

We need software technologies that can do that, and fortunately we have them, as we've been discussing. We need a path that evolves us toward something that creates much greater value for much larger-scale applications in the future, and fortunately the technologies we have now are evolving in that direction.

Gardner: Very good. I think we have to leave it there. I want to thank both of our guests. We've been discussing the role of text analytics, how companies can take advantage of it and bring it into play with their BI, marketing, and other activities, and how the mining of this information is now being done by tools and is increasingly automated.

I want to thank Seth Grimes, principal consultant at Alta Plana Corp., for joining us. Thanks so much, Seth.

Grimes: Again, thank you Dana, and thanks to Kapow for making this possible.

Gardner: Also, Stefan Andreasen, co-founder and CTO at Kapow Technologies. Thanks again for sponsoring and joining us, Stefan.

Andreasen: Well, thank you. That was a great discussion. Thank you.

Gardner: This is Dana Gardner, principal analyst at Interarbor Solutions. This is Part Three of a series from Kapow Technologies on using BI and web data services in unique forms to increase business benefits.

You have been listening to a sponsored BriefingsDirect podcast. Thanks and come back next time.

Listen to the podcast. Find it on iTunes/iPod and Podcast.com. Download the transcript. Learn more. Sponsor: Kapow Technologies.

Transcript of a sponsored BriefingsDirect podcast on information management for business intelligence, one of a series on web data services with Kapow Technologies. Copyright Interarbor Solutions, LLC, 2005-2009. All rights reserved.