Dana Gardner: Hello, and welcome to a special BriefingsDirect thought leadership interview series coming to you in conjunction with The Open Group Conference on Jan. 28 in Newport Beach, California.
We are here now with one of the main speakers at the conference, Michael Cavaretta, PhD, Technical Leader of Predictive Analytics for Ford Research and Advanced Engineering in Dearborn, Michigan.
We’ll see how Ford has exploited the strengths of big data analytics by directing them internally to improve business results. In doing so, they scour the metrics from the company’s best processes across myriad manufacturing efforts and through detailed outputs from in-use automobiles, all to improve and help transform their business. [Disclosure: The Open Group is a sponsor of BriefingsDirect podcasts.]
Cavaretta has led multiple data-analytic projects at Ford to break down silos inside the company to best define Ford’s most fruitful data sets. Ford has successfully aggregated customer feedback and extracted all the internal data to predict how new features and technologies will best improve its cars.
As a lead-in to his Open Group presentation, Michael and I will now explore how big data is fostering business transformation by allowing deeper insights into more types of data efficiently, and thereby improving processes, quality control, and customer satisfaction.
With that, please join me in welcoming Michael Cavaretta. Welcome to BriefingsDirect, Michael.
Michael Cavaretta: Thank you very much.
Gardner: Your upcoming presentation for The Open Group Conference is going to describe some of these new approaches to big data and how they offer valuable insights into internal operations and, therefore, help make a better product. To start, what's different now in being able to get at this data and do this type of analysis from, say, five years ago?
Cavaretta: The biggest difference has to do with the cheap availability of storage and processing power, where a few years ago people were very much concentrated on filtering down the datasets that were being stored for long-term analysis. There has been a big sea change with the idea that we should just store as much as we can and take advantage of that storage to improve business processes.
Gardner: That sounds right on the money, but how did we get here? How did we get from the technology benefits you mention, such as better storage, networks, and being able to move big datasets, to wrenching out benefits? What's the process behind the benefit?
Sea change in attitude
Cavaretta: The process behind the benefits has to do with a sea change in the attitude of organizations, particularly IT within large enterprises. There's this idea that you don't need to spend so much time figuring out what data you want to store and worrying about the cost associated with it, and that you should think more about data as an asset. There is value in being able to store it, and being able to go back and extract different insights from it. This comes from really cheap storage, access to parallel processing machines, and great software.
Gardner: It seems to me that for a long time, the mindset was that data is simply the output from applications, with applications being primary and the data being almost an afterthought. It seems like we sort of flipped that. The data now is perhaps as important, even more important, than the applications. Does that seem to hold true?
Gardner: I suppose earlier, when cost considerations and technical limitations were at work, we would just go for a tip-of-the-iceberg level. Now, as you say, we can get almost all the data. So, is this a matter of getting at more data, different types of data, bringing in unstructured data, all of the above? How much are you really going after here?
Cavaretta: I like to talk to people about the possibility that big data provides and I always tell them that I have yet to have a circumstance where somebody is giving me too much data. You can pull in all this information and then answer a variety of questions, because you don't have to worry that something has been thrown out. You have everything.
You may have 100 questions, and each one of the questions uses a very small portion of the data. Those questions may use different portions of the data, a very small piece, but they're all different. If you go in thinking, "We’re going to answer the top 20 questions and we’re just going to hold data for that," that leaves so much on the table, and you don't get any value out of it.
Gardner: I suppose too that we can think about small samples or small datasets and aggregate them or join them. We have new software capabilities to do that efficiently, so that we’re able to not just look for big honking, original datasets, but to aggregate, correlate, and look for a lifecycle level of data. Is that fair as well?
Cavaretta: Definitely. We're big believers in mash-ups and we really believe that there is a lot of value in being able to take even datasets that are not specifically big-data sizes yet, and then not go deep, not get more detailed information, but expand the breadth. So it's being able to augment it with other internal datasets, bridging across different business areas as well as augmenting it with external datasets.
A lot of times you can take something that is maybe a few hundred thousand records or a few million records, and then by the time you’re joining it, and appending different pieces of information onto it, you can get the big dataset sizes.
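The breadth-first mash-up described here can be illustrated with a minimal sketch. Everything in it is hypothetical: the record fields, the plant and weather lookups, and the values are invented for illustration and are not Ford data or Ford's method. It simply shows a small dataset being widened by appending columns from one internal and one external source, without dropping any rows.

```python
# Hypothetical records; none of these fields or values come from Ford.
warranty_claims = [
    {"vin": "VIN001", "plant": "P1", "claim": "wiper motor"},
    {"vin": "VIN002", "plant": "P2", "claim": "door seal"},
    {"vin": "VIN003", "plant": "P1", "claim": "wiper motor"},
]

# Hypothetical internal dataset from another business area, keyed by plant.
plant_quality = {
    "P1": {"line_speed": 62, "shift_count": 3},
    "P2": {"line_speed": 55, "shift_count": 2},
}

# Hypothetical external dataset, keyed by vehicle; VIN002 has no match.
regional_weather = {
    "VIN001": {"avg_humidity": 0.81},
    "VIN003": {"avg_humidity": 0.78},
}


def mash_up(records, by_plant, by_vin):
    """Left-join extra columns onto each record, keeping every original row."""
    widened = []
    for rec in records:
        row = dict(rec)                               # copy, leave input intact
        row.update(by_plant.get(rec["plant"], {}))    # internal augmentation
        row.update(by_vin.get(rec["vin"], {}))        # external augmentation
        widened.append(row)
    return widened


wide = mash_up(warranty_claims, plant_quality, regional_weather)
for row in wide:
    print(row)
```

The row count stays the same while each record gains columns, which is the "expand the breadth" idea; unmatched records are kept rather than discarded, so nothing is thrown away before the questions are known.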
Gardner: Just to be clear, you’re unique. The conventional wisdom for big data is to look at what your customers are doing, or just the external data. You’re really looking primarily at internal data, while also availing yourself of what external data might be appropriate. Maybe you could describe a little bit about your organization, what you do, and why this internal focus is so important for you.
Cavaretta: I'm part of a larger department that is housed over in the research and advanced-engineering area at Ford Motor Company, and we’re about 30 people. We work as internal consultants, kind of like Capgemini or Ernst & Young, but only within Ford Motor Company. We’re responsible for going out and looking for different opportunities from the business perspective to bring advanced technologies. So, we’ve been focused on the area of statistical modeling and machine learning for I’d say about 15 years or so.
And in this time, we’ve had a number of engagements where we’ve talked with different business customers, and people have said, "We'd really like to do this." Then, we'd look at the datasets that they have, and say, "Wouldn’t it be great if we had had this? So now we have to wait six months or a year."
These new technologies are really changing the game from that perspective. We can turn on the complete fire-hose, and then say that we don't have to worry about that anymore. Everything is coming in. We can record it all. We don't have to worry about if the data doesn’t support this analysis, because it's all there. That's really a big benefit of big-data technologies.
Gardner: If you've been doing this for 15 years, you must be demonstrating a return on investment (ROI) or a value proposition back to Ford. Has that value proposition been changing? Do you expect it to change? What might be your real value proposition two or three years from now?
Cavaretta: The real value proposition definitely is changing as things are being pushed down in the company to lower-level analysts who are really interested in looking at things from a data-driven perspective. From when I first came in to now, the biggest change has been when Alan Mulally came into the company, and really pushed the idea of data-driven decisions.
Before, we were getting a lot of interest from people who are really very focused on the data that they had internally. After that, they had a lot of questions from their management and from upper-level directors and vice presidents saying, "We’ve got all these data assets. We should be getting more out of them." This strategic perspective has really changed a lot of what we’ve done in the last few years.
Gardner: As I listen to you, Michael, it occurs to me that you are applying this data-driven mentality more deeply. As you pointed out earlier, you're also going after all the data, all the information, whether that’s internal or external.
In the case of an automobile company, you're looking at the factory, the dealers, what drivers are doing, what the devices within the automobile are telling you, factoring that back into design relatively quickly, and then repeating this process. Are we getting to the point where this sort of Holy Grail notion of a total feedback loop across the lifecycle of a major product like an automobile is really within our grasp? Are we getting there, or is this still kind of theoretical? Can we pull it all together and make it a science?
Cavaretta: The theory is there. The question has more to do with the actual implementation and the practicality of it. We are still talking about a lot of data. Even with new advanced technologies and techniques, that’s a lot of data to store, a lot of data to analyze, and a lot of data to make sure that we can mash up appropriately.
And, while I think the potential is there and the theory is there, there is also work in being able to get the data from multiple sources. So everything which you can get back from the vehicle, fantastic. Now if you marry that up with internal data, is it survey data, is it manufacturing data, is it quality data? What are the things you want to go after first? We can’t do everything all at the same time.
Our perspective has been let’s make sure that we identify the highest value, the greatest ROI areas, and then begin to take some of the major datasets that we have and then push them and get more detail. Mash them up appropriately and really prove up the value for the technologists.
Gardner: Clearly, there's a lot more to come in terms of where we can take this, but I suppose it's useful to have a historic perspective and context as well. I was thinking about some of the early quality gurus like Deming and some of the movement towards quality like Six Sigma. Does this fall within that same lineage? Are we talking about a continuum here over the last 50 or 60 years, or is this something different?
Cavaretta: That’s a really interesting question. From the perspective of analyzing data, using data appropriately, I think there is a really good long history, and Ford has been a big follower of Deming and Six Sigma for a number of years now.
The difference, though, is this idea that you don't have to worry so much upfront about getting the data. If you're doing this right, you have the data right there, and this has some great advantages. You don’t have to wait until you get enough history to look for some of these patterns. Then again, it also has a disadvantage, which is that you’ve got so much data that it’s easy to find things that could be spurious correlations or models that don’t make any sense.
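The risk of spurious correlations can be shown with a small simulation. This is a sketch under stated assumptions, not anything from Ford's analyses: the target and every feature are pure random noise, and the row and feature counts are arbitrary. The function names are made up for this example. The point is only that the more unrelated columns you scan, the stronger the best chance correlation looks.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def strongest_chance_correlation(n_rows, n_features, seed=0):
    """Largest |r| between a random target and n_features random columns.

    Every column is independent noise, so any correlation found is spurious.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_rows)]
    best = 0.0
    for _ in range(n_features):
        feature = [rng.gauss(0, 1) for _ in range(n_rows)]
        best = max(best, abs(pearson(feature, target)))
    return best


few = strongest_chance_correlation(n_rows=20, n_features=5)
many = strongest_chance_correlation(n_rows=20, n_features=1000)
print(f"strongest |r| among 5 random features:    {few:.2f}")
print(f"strongest |r| among 1000 random features: {many:.2f}")
```

Because both runs share a seed, the 1,000-feature scan includes the same first five columns, so its best correlation can only be equal or higher. That is why a pattern found by scanning everything still needs a check against domain knowledge.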
The piece that is required is good domain knowledge, in particular when you are talking about making changes in the manufacturing plant. It's very appropriate to look at things and be able to talk with people who have 20 years of experience to say, "This is what we found in the data. Does this match what your intuition is?" Then, take that extra step.
Gardner: Tell me a little about sort of a day in the life of your organization and your team to let us know what you do. How do you go about making more data available and then reaching some of these higher-level benefits?
Cavaretta: We're very much focused on interacting with the business. Most of what we do has to do with working on pilot projects and working with our business customers to bring advanced analytics and big data technologies to bear against these problems. So we work in kind of what we call a push-and-pull model.
We go out and investigate technologies and say these are technologies that Ford should be interested in. Then, we look internally for business customers who would be interested in that. So, we're kind of pushing the technologies.
From the pull perspective, we’ve had so many successful engagements and such good contacts and good credibility within the organization that we've had people come to us and say, "We’ve got a problem. We know this has been in your domain. Give us some help. We’d love to be able to hear your opinions on this."
So we have pull from the business side, and then our job is to match up those two pieces. It's best when we are looking at a particular technology and we have somebody come to us and we say, "Oh, this is a perfect match."
Those types of opportunities have been increasing in the last few years, and we've been very happy with the number of internal customers that have really been very excited about the areas of big data.
Gardner: Because this is The Open Group Conference and an audience that’s familiar with the IT side of things, I'm curious as to how this relates to software and software development. Of course there are so many more millions of lines of code in automobiles these days, software being more important than just about everything. Are you applying a lot of what you are doing to the software side of the house, or are the agile and the feedback loops and the performance management issues a separate domain, or is there crossover here?
Cavaretta: There's some crossover. The biggest area that we've been focused on has been picking information, whether from internal business processes or from the vehicle, and then being able to bring it back in to derive value. We have very good contacts in the Ford IT group, and they have been fantastic to work with in bringing interesting tools and technology to bear, and then looking at moving those into production and what’s the best way to be able to do that.
A fantastic development has been this idea that we’re using some of the more agile techniques in this space and Ford IT has been pushing this for a while. It’s been fantastic to see them work with us and be able to bring these techniques into this new domain. So we're pushing the envelope from two different directions.
Gardner: It sounds like you will be meeting up at some point with a complementary nature to your activities.
Gardner: Let’s move on to this notion of the "Internet of things," a very interesting concept that a lot of people talk about. It seems relevant to what we've been discussing.
We have sensors in these cars, wireless transfer of data, more-and-more opportunity for location information to be brought to bear, where cars are, how they're driven, speed information, all sorts of metrics, maybe making those available through cloud providers that assimilate this data.
So let’s not go too deep, because this is a multi-hour discussion all on its own, but how is this notion of the Internet of things being brought to bear on your gathering of big data and applying it to the analytics in your organization?
Cavaretta: It is a huge area, and not only from the internal process perspective -- RFID tags within the manufacturing plants, as well as out on the plant floor -- but also all of the information that’s being generated by the vehicle itself.
The Ford Energi generates about 25 gigabytes of data per hour. So you can imagine selling a couple of million vehicles in the near future with that amount of data being generated. There are huge opportunities within that, and there are also some interesting opportunities having to do with opening up some of these systems for third-party developers. OpenXC is an initiative that we have going on at Research and Advanced Engineering.
Huge number of sensors
We have a lot of data coming from the vehicle. There’s a huge number of sensors and processors that are being added to the vehicles. There's data being generated there, as well as communication between the vehicle and your cell phone and communication between vehicles.
There's a group over in Ann Arbor, Michigan, the University of Michigan Transportation Research Institute (UMTRI), that’s investigating that, as well as communication between the vehicle and let’s say a home system. It lets the home know that you're on your way and it’s time to increase the temperature, if it’s winter outside, or cool it in the summertime.
The data that’s being generated there is invaluable information and could be used for a lot of benefits, both from the corporate perspective, as well as just the very nature of the environment.
Gardner: Just to put a stake in the ground on this, how much data do cars typically generate? Do you have a sense of what now is the case, an average?
Cavaretta: The Energi, according to the latest information that I have, generates about 25 gigabytes per hour. Different vehicles are going to generate different amounts, depending on the number of sensors and processors on the vehicle. But the biggest key has to do with not necessarily where we are right now but where we will be in the near future.
With the amount of information that's being generated from the vehicles, a lot of it is just internal stuff. The question is how much information should be sent back for analysis and to find different patterns? That becomes really interesting as you look at external sensors, temperature, humidity. You can know when the windshield wipers go on, and then to be able to take that information, and mash that up with other external data sources too. It's a very interesting domain.
Gardner: So clearly, it's multiple gigabytes per hour per vehicle and probably going much higher.
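To put the figures quoted above in perspective, here is a back-of-the-envelope calculation. It uses only the 25 gigabytes per hour mentioned in the interview and reads "a couple of million vehicles" as exactly 2 million, which is an assumption. The `share_driving` parameter and its 5 percent value are made up purely to show how the total scales, and the result is data generated on board, not data sent back for analysis.

```python
GB_PER_HOUR_PER_VEHICLE = 25   # figure quoted in the interview
VEHICLES = 2_000_000           # assumption: "a couple of million" read as 2 million
GB_PER_PB = 1_000_000          # decimal units: 1 petabyte = 1,000,000 gigabytes


def fleet_gb_per_hour(vehicles, gb_per_hour, share_driving=1.0):
    """Gigabytes generated on board per hour across the whole fleet.

    share_driving is the fraction of vehicles running at once; 1.0 gives
    an upper bound in which every vehicle is on the road simultaneously.
    """
    return vehicles * gb_per_hour * share_driving


upper_bound_gb = fleet_gb_per_hour(VEHICLES, GB_PER_HOUR_PER_VEHICLE)
print(f"Upper bound: {upper_bound_gb / GB_PER_PB:.0f} PB per hour")

# Hypothetical: only 5% of the fleet is being driven at any given moment.
partial_gb = fleet_gb_per_hour(VEHICLES, GB_PER_HOUR_PER_VEHICLE, share_driving=0.05)
print(f"At 5% driving: {partial_gb / GB_PER_PB:.1f} PB per hour")
```

Even under the more conservative assumption the volume is measured in petabytes per hour, which is why the question of how much to send back for analysis matters.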
Gardner: Let's move forward now for those folks who have been listening and are interested in bringing this to bear on their organizations and their vertical industries, from the perspective of skills, mindset, and culture. Are there standards, certification, or professional organizations that you’re working with in order to find the right people?
It's a big question. Let's look at what skills you target for your group, and in what ways you think you can improve on that. Then, we’ll get into some of those larger issues about culture and mindset.
Cavaretta: The skills that we have in our department, in particular on our team, are in the area of computer science, statistics, and some good old-fashioned engineering domain knowledge. We’ve really gone about this from a training perspective. Aside from a few key hires, it's really been an internally developed group.
The biggest advantage that we have is that we can go out and be very targeted with the amount of training that we have. There are such big tools out there, especially in the open-source realm, that we can spin things up with relatively low cost and low risk, and do a number of experiments in the area. That's really the way that we push the technologies forward.
Gardner: Why The Open Group? Why is that a good forum for your message, and for your research here?
Cavaretta: The biggest reason is the focus on the enterprise, where there are a lot of advantages and a lot of business cases, looking at large enterprises and where there are a lot of systems, companies that can take a relatively small improvement, and it can make a large difference on the bottom-line.
Talking with The Open Group really gives me an opportunity to be able to bring people on board with the idea that you should be looking at a difference in mindset. It's not "Here’s a way that data is being generated, look, try and conceive of some questions that we can use, and we’ll store that too." Let's just take everything, we’ll worry about it later, and then we’ll find the value.
Gardner: I'm sure the viewers of your presentation on January 28 will be gathering a lot of great insights. A lot of the people that attend The Open Group conferences are enterprise architects. What do you think those enterprise architects should be taking away from this? Is there something about their mindset that should shift in recognizing the potential that you've been demonstrating?
Cavaretta: It's important for them to be thinking about data as an asset, rather than as a cost. You even have to spend some money, and it may be a little bit unsafe without really solid ROI at the beginning. Then, move towards pulling that information in, and being able to store it in a way that allows not just the high-level data scientist to get access to and provide value, but people who are interested in the data overall. Those are very important pieces.
The last one is how do you take a big-data project, how do you take something where you’re not storing in the traditional business intelligence (BI) framework that an enterprise can develop, and then connect that to the BI systems and look at providing value to those mash-ups. Those are really important areas that still need some work.
Gardner: Another big constituency within The Open Group community is the business architects. Is there something about mindset and culture, getting back to that topic, that those business-level architects should consider? Do you really need to change the way you think about planning and resource allocation in a business setting, based on the fruits of things that you are doing with big data?
Cavaretta: I really think so. The digital asset that you have can be monetized to change the way the business works, and that could be done by creating new assets that then can be sold to customers, as well as improving the efficiencies of the business.
High quality data
This idea that everything is going to be very well-defined and there is a lot of work that’s being put into making sure that data has high quality, I think those things need to be changed somewhat. As you're pulling the data in, as you are thinking about long-term storage, it’s more the access to the information, rather than the problem in just storing it.
Gardner: Interesting that you brought up that notion that the data becomes a product itself and even a profit center perhaps.
Cavaretta: Exactly. There are many companies, especially large enterprises, that are looking at their data assets and wondering what they can do to monetize this, not only to just pay for the efficiency improvement but as a new revenue stream.
Gardner: We're almost out of time. For those organizations that want to get started on this, are there any 20/20 hindsights or Monday morning quarterback insights you can provide? How do you get started? Do you appoint a leader? Do you need a strategic roadmap, getting this culture or mindset shifted, pilot programs? How would you recommend that people might begin the process of getting into this?
Cavaretta: We're definitely huge believers in pilot projects and proof of concept, and we like to develop roadmaps by doing. So get out there. Understand that it's going to be messy. Understand that it may be a little bit more costly and the ROI isn't going to be there at the beginning.
But get your feet wet. Start doing some experiments, and then, as those experiments turn from just experimentation into really providing real business value, that’s the time to start looking at a more formal aspect and more formal IT processes. But you've just got to get going at this point.
Gardner: I would think that the competitive forces are out there. If you are in a competitive industry, and those that you compete against are doing this and you are not, that could spell some trouble.
Gardner: We’ve been talking with Michael Cavaretta, PhD, Technical Leader of Predictive Analytics at Ford Research and Advanced Engineering in Dearborn, Michigan. Michael and I have been exploring how big data is fostering business transformation by allowing deeper insights into more types of data and all very efficiently. This is improving processes, updating quality control and adding to customer satisfaction.
Our conversation today comes as a lead-in to Michael’s upcoming plenary presentation. He is going to be talking on January 28 in Newport Beach, California, as part of The Open Group Conference.
You will hear more from Michael and others, the global leaders on big data that are going to be gathering to talk about business transformation from big data at this conference. So a big thank you to Michael for joining us in this fascinating discussion. I really enjoyed it and I look forward to your presentation on the 28th.
Cavaretta: Thank you very much.
Gardner: And I would encourage our listeners and readers to attend the conference or follow more of the threads in social media from the event. Again, it’s going to be happening from January 27 to January 30 in Newport Beach, California.
This is Dana Gardner, Principal Analyst at Interarbor Solutions, your host and moderator throughout this thought leadership interview series. Thanks again for listening, and come back next time.
Transcript of a BriefingsDirect podcast on how Ford Motor Company is harnessing multiple big data sources to improve products and operations. Copyright The Open Group and Interarbor Solutions, LLC, 2005-2013. All rights reserved.