Monday, January 05, 2009

A Technical Look at How Parallel Processing Brings Vast New Capabilities to Large-Scale Data Analysis

Transcript of BriefingsDirect podcast on new technical approaches to managing massive data problems using parallel processing and MapReduce technologies.

Sponsor: Greenplum.

Dana Gardner: Hi, this is Dana Gardner, principal analyst at Interarbor Solutions, and you're listening to BriefingsDirect. Today we present a sponsored podcast discussion on new data-crunching architectures and approaches, ones designed with petabyte data sizes in their sights.

It's now clear that the Internet-size data gathering, swarms of sensors, and inputs from the mobile device fabric, as well as enterprises piling up ever more kinds of metadata to analyze, have stretched traditional data-management models to the breaking point.

In response, advances in parallel processing using multi-core chipsets have prompted new software approaches, such as MapReduce, that can handle these data sets at surprisingly low total cost.

We'll examine the technical underpinnings that support the new demands being placed on, and by, extreme data sets. We'll also uncover the means by which powerful new insights are being derived from massive data compilations in near real time.

Here to provide an in-depth look at parallelism, modern data architectures, MapReduce technologies, and how they are coming together, is Joe Hellerstein, professor of computer science at UC Berkeley. Welcome, Joe.

Joe Hellerstein: Good to be here, Dana.

Gardner: Also Robin Bloor, analyst and partner at Hurwitz & Associates. Thanks for joining, Robin.

Robin Bloor: It's good to be here.

Gardner: We're also joined by Luke Lonergan, CTO and co-founder of Greenplum. Welcome to the show, Luke.

Luke Lonergan: Hi, Dana, glad to be here.

Gardner: The technical response to oceans of data is something that has been building for some time. Multi-core processing has also been something in the works for a number of years. Let's go to Joe Hellerstein first. What's different now? What is in the current confluence of events that is making this a good mixture of parallelism, multi-core, and the need to crunch ever more data?

Hellerstein: It's an interesting question, because it's not necessarily a good thing. It's a thing that's emerged that seems to work. One thing you can look at is data growth. Data growth has been following and exceeding Moore's Law over time. What we've been seeing is that the data sets that people are gathering and storing over time have been doubling even faster than every 18 months.

That used to track Moore's Law well enough. Processors would get faster about every 18 months. Disk storage densities would go up about every 18 months. RAM sizes would go up by a factor of two about every 18 months.

What's changed in the last few years is that clock speeds on processors have stopped doubling every 18 months. They're growing very slowly, and chip manufacturers like Intel have moved instead to utilizing Moore's Law to put twice as many transistors on a chip every 18 months, but not to make those transistors run your CPU faster.

Instead, what they are doing is putting more processing cores on every chip. You can expect the number of processors on your chip to double every 18 months, but they're not going to get any faster.

So data is growing faster, and we have chips basically standing still, but you're getting more of them. If you want to take advantage of that data, you're going to have to program in parallel to make use of all those processors on the chips. That's the confluence that's happening. It's the slowdown in clock speed growth against the continued growth in data.

Effects on mainstream compute problems

Gardner: Joe, where do you expect that this is going to crop up first? I mentioned a few examples of large data sets from the Internet, such as with Google and what it's doing. We're concerned about the mobile tier and how much data that's providing to the carriers. Is this something that's only going to affect a select few problems in computing, or do you expect this to actually migrate down into what we consider mainstream computing issues?

Hellerstein: We tend to think of Google as having the biggest data sets around. The amazing thing about the Web is the amount of data there that was typed in by people. It's just phenomenal to think about the amount of typing that's gone on to generate that many petabytes of data.

What we're going to see over time is that data production is going to be mechanized and follow Moore's Law as well. We'll have devices generating data. You mentioned sensors. Software logs are big today, and there will be other sources of data ... camera feeds and so on, where automated generation is going to pump out lots of data.

That data doesn't naturally go to Web search, per se. That's data that manufacturers will have, based on their manufacturing processes. There is security data that people who have large physical plants will have coming from video cameras. All the retail data that we are already capturing with things like Universal Product Code (UPC) and radio-frequency identification (RFID) is going to increase as we get finer-grain monitoring of objects, both in the supply chain and in retail.

We're going to see all kinds of large organizations gathering data from all sorts of automated sources. The only reason not to gather that data is when you run out of affordable processing and storage. Anybody with the budget will have as much data as they can budget for and will try to monetize that. It's going to be pervasive.

Gardner: Robin Bloor, you've been writing about these issues for some time. Now, we have had multi-core silicon, and we've had virtualization for some time, but there seems to be a lag between how the software can take advantage of what's going on on the metal. What's behind this discrepancy, and where do you expect that to go?

Bloor: There are different strands to this, because if we talk about parallelization, then with large database products, to a certain extent, we have already moved into the parallelization.

It's an elastic lag that comes from the fact that, when a chip maker does something new on the chip, unless it's just speed -- which was a great thing about clock speed -- you have to change your operating system to some degree to take advantage of what's new on the chip. Then, you may have to change the compilers and the way you write code in order to take advantage of what's on the chip.

It immediately throws a lag into the progress of software, even if the software can take advantage of it. With multi-core, we don't have specific tools to write parallel software, except in one or two circumstances, where people have gone through the trouble to do that. They are not pervasive.

You don't have operating systems naturally built for sharing the workload of multi-core. We have applications like virtualization, for example, that can take advantage of multi-core to some degree, but even those were not specifically written for multi-core. They were written for single-core processors.

So, you have a whole lag in the works here. That, to a certain extent, makes multi-core compelling for where you have parallel software, because it can attack those problems very, very well and can deliver benefit immediately. But you run into a paradox when Intel comes out with a four-way or an eight-way or a 16-way chip set. Then the question is how are you going to use that?

Multi-core becomes the killer app

Gardner: You've written recently, Robin, that the killer app, so to speak, for multi-core is data query. Why do you feel that's the case?

Bloor: There are a lot of reasons for it. First of all, it parallelizes extremely well. Basically, you have a commanding node that's looking after a data query. You can divide the data and the resources in such a way that you just basically run everything in parallel.

The other thing that's really neat about this application is it's a complete batch application, in the sense that you just keep pushing the data through an engine that keeps doing the queries. So, you're making pretty effective use of all the processors that are available to you. It's very high usage.

If you run an operating system that's based upon interrupts, you're waiting for things to happen. At various times, the operating system is idle. It doesn't seem like they're very long times, but mostly on a PC the operating system is never doing anything. Even when you're running applications on a PC, it's rarely doing very much, even in a single-CPU situation. In a multiple-CPU situation, it's very hard to divide the workload.

So that's the situation. You've got this problem that we have with very large heaps of data. They've been growing roughly at a factor of about 1,000 every six years. It's an awesome growth rate. At the same time, we have the technology where we can take a very good dash at this and use the CPU power we've got very effectively.
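Illustration (not from the podcast): the growth figures quoted in this discussion can be compared with a few lines of arithmetic. The sketch below takes the "factor of about 1,000 every six years" figure and the 18-month doubling period at face value; the function names are hypothetical and chosen only for this example.

```python
import math

def doubling_time_months(growth_factor, period_years):
    """Months needed to double, given total growth over a period."""
    doublings = math.log2(growth_factor)
    return period_years * 12 / doublings

def growth_over(period_years, doubling_months):
    """Total growth factor over a period for a fixed doubling time."""
    return 2 ** (period_years * 12 / doubling_months)

# Data: roughly 1,000x every six years (figure quoted in the discussion).
data_doubling = doubling_time_months(1000, 6)

# Anything that doubles every 18 months, over the same six years.
moore_growth = growth_over(6, 18)

print(f"Data doubles about every {data_doubling:.1f} months")
print(f"An 18-month doubling yields {moore_growth:.0f}x in six years")
```

Under those assumptions, data doubles roughly every seven months, while an 18-month doubling gives only a 16-fold increase over the same six years.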

Gardner: Luke Lonergan, we now have a data problem, and we have some shifts and changes in the hardware and infrastructure. What now needs to be brought into this to create a solution among these disparate variables?

Lonergan: Well, it's interesting. As I listened to Joe and Robin talk about the problem, what comes to mind is a transition in computing that happened in the 1970s and 1980s. What we've done at Greenplum is to make a parallel operating system for data analysis.

If you look back on super computing, there were times when people were tackling larger and larger problems of compute. We had to invent different kinds of computers that could tackle that kind of problem with a greater amount of parallelism than people had seen before -- the Connection Machine with 64,000 processors and others.

What we've done with data analysis is to make what Robin brings forward happen -- have all units available within a group of commodity computers, which is the popular computing platform. It's really required for cost-efficient analysis to bring that to bear automatically on structured query language (SQL) queries and a number of different data-intensive computing problems.

The combination of the software-switch interconnect, which Greenplum built into the Greenplum product, and the underlying use of commodity parallel computers, is brought together in this database system that makes it possible to do SQL query and languages like MapReduce with automatic parallelism. We're already handling problems that involve thousands of individual cores on petabytes of data.

The problem is very much real. As Joe indicated, there are very many people storing and analyzing more data. We're very encouraged that most of our customers are finding new uses for data that are earning them more money. Consequently, the driver to analyze more and more data continues to grow. As our customers get more successful, this use of data is becoming really important.

Gardner: Back to Joe. This seems to be a bright spot in computer science, tackling these issues, particularly in regard to massive data sets, not just relational data, of course, but a multitude of different types of content and data. What's being done at the research level that backs up this direction or supports this new solution direction?

Data-centered approach has huge power

Hellerstein: It's an interesting question, because the research goes back a ways. We talked about how database systems and relational query, like SQL, can parallelize neatly. That comes straight out of the research literature: projects like the Gamma Research Project at Wisconsin in the 1980s and the Bubba Project at MCC. What's happened with that work over time is that it has matured in products like Greenplum, but it's been kind of cornered in the SQL world.

Along came Google and borrowed, reused, and reapplied a lot of that technology to text- and Web-processing with their MapReduce framework. The excitement that comes from such a successful company as Google tackling such a present problem as we have today with the Web, has begun to get the rest of computer science to wake up to the notion that a data-centric approach to parallelism has enormous power.

The traditional approach to parallelism and research in the 1980s was to think about taking algorithms -- typically complicated scientific algorithms that physicists might want to use -- and trying to very cleverly figure out how to run them on lots of cores.

Instead, what you're seeing today is people say, "Wow, well, let's get a lot of data. It's easy to parallelize the data. You break it up into little chunks and you throw it out to different machines. What can we do cleverly in computing with that kind of a framework?" There are a lot of ideas for how to move forward in machine learning and computer vision, and a variety of problems, not just databases now, where you are taking this massively parallel data-flow approach.

Gardner: I've heard this term "shared nothing architecture," and I have to admit I don't know anything about what it means. Robin, do you have a sense of what that means, and how that relates to what we are discussing?

Bloor: Yeah, I do. The first time I ran into this was not in respect to this at all. I did some work for the Hong Kong Jockey Club in the 1990s. What they do is take all the gambling on all the horse racing that goes on in Hong Kong. It's a huge operation, much, much bigger than its name sounds.

In those days, they got, I think, the largest transaction rate in the world, or at least it was amongst the top ten. They were getting 3,000 bets in the last second before a race, and they lose the money from the bet if the bet doesn't go on.

The law in Hong Kong was that the bet has to be registered on disk, before it was actually a real bet. So, if in any way, anything fell over or broke during the minute leading up to a race, a lot of money could be lost.

Basically they had an architecture that was a shared nothing architecture. They had a router in front of an awful lot of servers, which were doing nothing but taking bets and writing them to disk. It was server, after server, after server. If at any point, there was any indication that the volume was going up, they would just add servers, and it would divide the workload into smaller and smaller chunks, so it could do it.

You can think of it as almost being like a supermarket, in the sense of lots and lots of different tills and lots of queues for people, but each till is a resource on its own, and it shares nothing with anything else. Therefore, no bottlenecks can build up around any particular line.

If you have somebody directing the traffic, you can make sure that the flow goes through. So you go from that, straight into a query on a very large heap of data, if you manage to divide the data up in an efficient way.

A lot of these very big databases consist of nothing more than one big fact table -- a little bit more, but not much more than one big fact table. You split that over 100 machines, and you have a query against a whole fact table. Then, you just actually have 100 queries against 100 different data sets, and you bring the answer back together again.

You can even do fault tolerance in terms of the router for all this. So, with that, you can end up with nothing being shared, and you just have the speed. Basically, any device that's out there is doing a bit of query for you. If you've got 1,000 of them, you go 1,000 times faster. This scales extraordinarily well, because nothing is shared.
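Illustration (not from the podcast): the scatter-and-gather pattern described above, one query becoming many identical queries over separate slices of a fact table, can be sketched in a few lines. This is a toy under stated assumptions: threads stand in for independent machines, the fact table is a short list of (region, amount) rows invented for the example, and every function name is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def partition(rows, n_nodes):
    """Deal rows round-robin into n_nodes independent partitions."""
    return [rows[i::n_nodes] for i in range(n_nodes)]

def local_query(rows):
    """Runs on one node: total amount per region, local rows only."""
    totals = {}
    for region, amount in rows:
        totals[region] = totals.get(region, 0) + amount
    return totals

def merge(partials):
    """Runs on the coordinating node: combine the per-node answers."""
    combined = {}
    for partial in partials:
        for region, amount in partial.items():
            combined[region] = combined.get(region, 0) + amount
    return combined

def parallel_query(rows, n_nodes=4):
    parts = partition(rows, n_nodes)
    # Each worker sees only its own partition; nothing is shared.
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        partials = list(pool.map(local_query, parts))
    return merge(partials)

fact_table = [("east", 10), ("west", 5), ("east", 7),
              ("north", 3), ("west", 1)]
result = parallel_query(fact_table, n_nodes=3)
print(result)
```

The merged answer is the same whether the table is split across one node or many. In a real system, the speedup depends on how evenly the data divides and on the cost of the final merge.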

Gardner: Luke, tell me how these concepts of being able to scale relate to what the developers need to do. It seems to me that we've got some infrastructure benefits, but if we don't have the connection between how these business analysts and others that are seeking the results can use these infrastructure benefits, we're not capitalizing. What needs to happen now in terms of the logic as that relates to the data?

The net effects on users

Lonergan: It's a good question, because, in the end, it's about users being able to gain access to all that power. What really turned the corner for general data analysis using SQL is the ability for a user not to have to worry about what kind of table structure they have. They can have lots of small tables joining to lots of big tables, and big tables joining to each other.

These are things they do to make the business map better to the data analysis they're doing. That throws a monkey wrench in the beautiful picture of just subdividing the data and then running individual queries.

What the developer needs is an engine that doesn't care how the data is distributed, per se, just being able to use all of that parallelism on the problems of interest. The core problem we've solved is the ability for our engine to redistribute the data and the computation on the fly, as these queries and analysis are being performed.

It's the combination, as Robin put it earlier, of a compiler technology, which is our parallelizing optimizer, and a software interconnect, which we call a soft switch technology. The combination of those two things enables a developer of business logic and business analysis not to have to worry about what is underneath them.

The physical model of how the database is distributed in a shared nothing architecture in a Greenplum system is not visible to the developer. That is where the SQL-focused data analytics realm has gone by necessity. It really has made it possible to continue to grow the amount of data, and continue to be able to run SQL analysis against that data. It's the ability to express arbitrarily constructed business rules against a large-scale data store.
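Illustration (not from the podcast): one generic way to redistribute data on the fly for a join is to re-shuffle both tables by a hash of the join key, so that matching rows land on the same node and each node can then join locally. The sketch below shows that general technique only. It is not a description of Greenplum's engine, and the table contents and function names are invented for the example.

```python
def redistribute(partitions, key_index, n_nodes):
    """Re-shuffle rows so equal join keys land on the same node."""
    out = [[] for _ in range(n_nodes)]
    for part in partitions:
        for row in part:
            out[hash(row[key_index]) % n_nodes].append(row)
    return out

def local_join(left_rows, right_rows, left_key, right_key):
    """Plain hash join over the rows held by a single node."""
    index = {}
    for r in right_rows:
        index.setdefault(r[right_key], []).append(r)
    return [l + r for l in left_rows for r in index.get(l[left_key], [])]

def distributed_join(left_parts, right_parts, left_key, right_key):
    """Assumes both tables are spread over the same number of nodes."""
    n = len(left_parts)
    left = redistribute(left_parts, left_key, n)
    right = redistribute(right_parts, right_key, n)
    joined = []
    for node in range(n):  # each iteration is independent work
        joined.extend(local_join(left[node], right[node],
                                 left_key, right_key))
    return joined

# (customer_id, item) rows, initially placed without regard to the key.
orders_parts = [[(1, "book"), (2, "pen")], [(1, "lamp"), (3, "desk")]]
# (customer_id, name) rows, placed differently from the orders.
customers_parts = [[(3, "Cy")], [(1, "Ann"), (2, "Bob")]]

result = sorted(distributed_join(orders_parts, customers_parts, 0, 0))
print(result)
```

The caller never states where any row lives. The shuffle step hides the physical layout, which is the property the developer is said to rely on here.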

Gardner: We did one of these podcasts not too long ago with Tim O'Reilly. He mentioned that he'd heard from Joe Hellerstein that every freshman now at UC Berkeley studying computer science is being taught Hadoop, which is related on an open-source development level and community to MapReduce. SQL is now an elective for seniors.

It seems that maybe we've crossed a threshold here in terms of how people are preparing themselves for this new era. Joe, how does that relate to how this new logic and ability to derive queries from these larger data sets is unfolding?

Hellerstein: What you're seeing there is three things happening at once. The first is that we have a real desire on the educational side to teach the next generation of programmers something about parallelism. It's really sticking your head in the sand to teach programming the way we have always taught it and not address the fact that every efficient program over the next ... forever is going to have to be a parallel program. That's the first issue.

The second issue is what's the simplest thing you can teach to computer science students to give them a tangible feeling for parallelism, to actually get them running code on lots of machines and get it going? The answer is data parallelism -- not a complicated scientific algorithm that's been carefully untangled, but simple data parallelism in a language that doesn't really require them to learn any new conceptual ideas that they wouldn't have learned in a high school AP course where they learned, say, Python or Java.

When you look at those requirements, you come up with the Google MapReduce model as instantiated in the open-source code of Hadoop. They can write simple straight-line programs that are procedural. They look just like "For" loops and "If-Then" statements. The students can see how that spreads out over a lot of data on a lot of machines. It's a very approachable way to get students thinking about parallelism.

The third piece of this, which you can't discount, is the fact that Google is very interested in making sure that they have a pipeline of programmers coming in. They very aggressively have been providing useful pedagogical tools, curriculum, and software projects, to universities to ramp this up.

So it's a win-win for the students, for the university, and frankly for Google, Yahoo, and IBM, who have been pushing this stuff. It's an interesting thing, an academic-industrial collaboration for education.
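Illustration (not from the podcast): the "For loops and If-Then statements" flavor of the MapReduce model can be seen in a word count written as plain Python. This is a single-process sketch of the map, shuffle, and reduce steps. It does not use the Hadoop API, and the function names are hypothetical.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(word, counts):
    """Reduce: collapse the values for one key into a single result."""
    return word, sum(counts)

def run_job(lines):
    # In a real framework the map calls run on many machines at once,
    # because each line is processed independently of the others.
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(w, c) for w, c in shuffle(pairs).items())

counts = run_job(["the cat sat", "the dog sat down"])
print(counts)
```

The programmer writes only the straight-line map and reduce functions. Spreading the work over many machines is the framework's job, which is what makes the model approachable for students.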

At the business level

Gardner: Let's bring this from a slightly abstract level down to a business level. We seem to be focusing more on purpose-built databases, appliances, packaging these things a little differently than we had in a distributed environment. Luke, what's going on in terms of how we package some of these technologies, so that businesses can start using them, perhaps at a crawl, walk, run type of a ramp up?

Lonergan: Businesses have invested a tremendous amount of their time over the last 15 to 25 years in SQL, and some of the more traditional kinds of business analysis that pay off very well are ensconced in that programming model. So, packaging a system that can do transactional, mixed workloads with large amounts of concurrency, with applications that use the SQL paradigm, is very important.

Second, the ability to leverage the trends in microprocessors and inexpensive servers, and combine those with this kind of software model that scales and takes advantage of very high degree of parallelism, requires a certain amount of integration expertise.

Packaging this together as software plus hardware, making that available as a reference architecture for customers, has been very important and has been very successful in our accounts at New York Stock Exchange, Fox, MySpace, and many others.

Finally, as Joe and you were hinting at, there are changes in the programming paradigm. In being able to crawl, walk, and then run, you have to support the legacy, but then give people a way to get to the future. The MapReduce paradigm is very interesting, because it bridges the gap between traditional data-intensive programming with SQL and the procedural world of unstructured text analysis.

This set of technologies, put together into a single operating system-like formulation and package, has been our approach, and it's been very popular.

Gardner: Robin Bloor, this whole notion of legacy integration is pretty important. A lot of enterprises don't have the luxury of starting out "green field," don't have the luxury of hiring the best and brightest new computer scientists, and working on architecture from a pure requirements-based perspective. They have to deal with what they have in place. Increasingly, they want to relate more of what they have in place into an analytic engine of some kind.

What's being done from your perspective vis-à-vis parallelization and things like MapReduce that allow for backward compatibility, as well as setting yourself up to be positioned to expand and to take advantage of some of these advancements?

Bloor: The problem you have with what is fondly called legacy by everybody is that it really is impossible. The kind of things that were done in the past, very strongly bound the software to the data, to the environment it ran in. Therefore, unhooking that, other than starting again from scratch, is a very difficult thing to do.

Certainly, a lot of work is going on in this area. One thing that you can do is to create something -- I don't know if there is an official title to it, but everybody seems to use the word data fabric. The idea being that you actually just siphon data off from all of the data pools that you have throughout an organization, and use the newer technology in one way or another to apply to the whole data resource, as it exists.

This isn't a trivial thing to do, by the way. There are a lot of things involved, but it's certainly a direction in which things are actually going to move. It's possibly not as well acknowledged as it should be, but most of the things that we call data warehouses out there, the implementations that have been done in the area of business intelligence (BI), actually don't run very well.

You have situations where people post queries, and it may take hours to answer a query. Because it takes hours to answer a query, and you have a whole scheme, a reason why you are actually mining the data for something, if every step takes a couple of hours, it's very difficult to carry out an analysis like that in a particularly effective way.

A 100-to-1 value improvement

If you take something like the Greenplum technology, and you point to the same problem, even though you are not dealing with petabytes of data, you can still have this parallel effect. You can get answers back that used to take 100 minutes, and you will get 100 to 1 out of this. You may get more, but you will certainly get 100 to 1 out of this, and it changes the way that you do the job that you have.

One thing that's kind of invisible is that there is a lot of data out there that's not being analyzed fast enough to be analyzed effectively. That's something that I think parallelism is going to address.

The other thing where it is going to play a part is that organizations are going to build data fabrics. In one way or another, they will siphon the data off and just handle it in a parallel manner. There is a lot you can do with that, basically.

Gardner: Joe Hellerstein, is there more being brought to this from the data architecture perspective, jibing the old with the new, and then providing yet better performance when it comes to these massive analytic chores?

Hellerstein: What I'm excited about, and I see this at Greenplum -- there's another company called Aster Data that's doing this, and I wouldn't be surprised if we see more of this in the market over time -- is the combination of SQL and MapReduce in a unified way in programming environments. This is a short-term step, but it's a very pragmatic one that can help with people's ability to get their hands on data in an organization.

The idea is that, first of all, you want to have the same access to all your data via either an SQL interface or a MapReduce programming interface. When I say all the data, I mean the stuff you used to get with SQL, the database data, and the stuff you might currently be getting with MapReduce, which might be text files or log files in a distributed-file system. You ought to be able to access those with whatever language suits you, mix and match.

So, you can take your raw log files, which are raw text, and use SQL to join those against a customer table. Or, if you're a MapReduce programmer who does analytics and doesn't know SQL, say you're a statistician, you can write a MapReduce program that does some fancy statistical analysis. You can point it at text fields in a database full of user comments, or at purchase records that you used to have to dump out of the database into text formats to get your hands on. So, part of this is getting more access to more people who have programming paradigms at their fingertips.

Another piece of this is that some things are easier to do in MapReduce, and some things are easier to do in SQL, even when you know both. Good programmers have a lot of tools in their tool belt. They like to be able to use whatever tool is appropriate for the task. Having both of these things interleaved is really quite helpful.
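Illustration (not from the podcast): the idea of using SQL to join raw log text against a customer table can be sketched with Python's built-in sqlite3 module. Everything specific here is an assumption made for the example: the log format, the table and column names, and the use of a single in-memory SQLite database as a stand-in for a parallel engine.

```python
import sqlite3

# Hypothetical raw log text: date, time, customer id, action.
RAW_LOG = """\
2009-01-05 10:00:01 customer=1 action=view
2009-01-05 10:00:07 customer=2 action=buy
2009-01-05 10:01:15 customer=1 action=buy
"""

def parse_line(line):
    """Procedural step: turn one raw text line into a typed row."""
    date, time, cust, action = line.split()
    return (f"{date} {time}", int(cust.split("=")[1]),
            action.split("=")[1])

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "Ann"), (2, "Bob")])
conn.execute("CREATE TABLE log (ts TEXT, customer_id INTEGER, action TEXT)")
conn.executemany("INSERT INTO log VALUES (?, ?, ?)",
                 [parse_line(line) for line in RAW_LOG.splitlines()])

# Declarative step: join the parsed log against the customer table.
purchases = conn.execute(
    "SELECT c.name, COUNT(*) FROM log l "
    "JOIN customers c ON c.id = l.customer_id "
    "WHERE l.action = 'buy' GROUP BY c.name ORDER BY c.name"
).fetchall()
print(purchases)
```

The parsing is easier to write procedurally and the join is easier to write in SQL, which is the mix-and-match point being made.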

Gardner: Luke, to what degree are they interleaved now, and to what degree can we expect to see more?

Lonergan: It's been very gratifying that just making some of those pragmatic capabilities available and helping customers to use them has so far yielded some pretty impressive results. We have customers who have solved core business problems, in ways they couldn't have before, by unifying the unstructured text-file data sources with the data that was previously locked up inside the database.

As Joe points out, it's a good programmer who knows how to use all of the various tools that they have at their disposal. Being able to pull one that's just right for the task off the shelf is a great thing to do. With the Greenplum system we've made this available as a simple extension and just another language that one can use with the same parallel data engine, and that's been very successful so far.

Impact on cloud computing

Gardner: Let's look at how this impacts one of the hot topics of the day, and that's cloud computing, the idea that sourcing of resources can come from a variety of organizations. You're not just going to get applications as a service or even Web services, but increasingly infrastructure functionality as a service.

Does this parallelization, some of these new approaches to programming, and the ability to scale have an impact on how well organizations can start taking advantage of what's loosely defined as cloud computing? Let's start with you, Joe.

Hellerstein: I'm not quite sure how this is going to play out. There are a couple of questions about how an individual organization's data will end up in the cloud. Inevitably it will, but in the short-term, people like to keep their data close, particularly database data that's traditionally been in the warehouses, very carefully managed. Those resources are very carefully protected by people in the organization.

It's going to be some time until we really see everybody's data warehouses up in the cloud. That said, as services move into the cloud, the data that those services spit out and generate, their log files, as well as the data that they're actually managing, are going to be up in the cloud anyway.

So, there is this question of, how long will it be until you really get big volumes of data in the cloud. The answer is that certainly new applications will be up there. We may start to see old data getting uploaded in the cloud as well.

There's another class of data that's already becoming available in the cloud. There is this recent announcement from Amazon that they are going to make some large data sets available on their platform for public access. I think we'll see more of this, of data as a utility that's provided by third parties, by governments, by corporations, by whomever has data that they want to share.

We'll start to see big data sets up there that don't necessarily belong to anyone, and they are going to be big. In that environment, you can imagine big data analytics will have to run in the cloud, because that's where the data will be.

One of the fun things about the cloud that's really exciting is the elasticity of the resources. You don't buy yourself a data center full of machines, but you rent as many machines as you need for a task.

If you have a task that's going to look at a lot of data, you would rent a lot of machines for a few hours, and then you would shrink your pool. What this is going to allow people to do is that even small organizations may, for a short period of time, look at an enormous amount of data, which perhaps doesn't originate in their own data production environment, but is something that they want to utilize for their purposes.

There is going to be a democratization of the ability to take advantage of information, and it comes from this ability to share these resources that compute, as well as the actual content to share them in a temporary way.

Gardner: Let's go to Robin on that. It seems that there is a huge potential payoff if, as Joe mentioned, you can gather data from a variety of sources, perhaps not in your own applications, not your own infrastructure and/or legacy, but go out and rent or borrow some data, but then do some very interesting things with it. That requires joins, that requires us to relate data from one cloud to another or to suck it into one cloud, do some wonderful magic-dust pixie sprinkling on it, and then move along.

How do you view this problem of managing boundaries of clouds, given that there is such a potential, if we could do it well, with data?

Looking at networks

Bloor: There would have to be, because you are looking at a technical problem, and you really are going to have to have specific interfaces for doing that, especially if you are joining data across clouds. Let's drop the word "cloud" and just think large network, because everything that is representative of the cloud ultimately comes down to being somewhat of a larger network.

When you've got something very large, like what Google and Amazon have, then you have this incredible flexibility of resources. You can push resources in or redeploy these resources very, very effectively. But you're not going to be able to do joins across data heaps in one cloud and another cloud, and in perhaps a particular network without there being interfaces that allow you to do that, and without query agents sitting in those particular clouds that are going to go off and do the work. You're going to care very much as to how fast they do that work as well.

This is going to be a job for big engines like Greenplum, rather than your average relational database, because your average relational database is going to be very slow.

Also, you have to master the join. In other words, the result has to arrive somewhere, and be brought together. There are a number of technical issues that are going to have to be addressed, if we're going to do this effectively, but I don't see anything that stops it being done. We have the fast networks to enable this. So, I think it can be done.
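The idea of query agents sitting in each cloud, with the result brought together in one place, can be sketched in Python. This is a minimal illustration under stated assumptions, not a description of how Greenplum or any other engine works: the two lists stand in for data held in separate clouds, and `query_agent`, `hash_join`, and the sample records are hypothetical names and data invented for the example.

```python
from collections import defaultdict


def query_agent(rows, key, columns, predicate=lambda row: True):
    """Runs inside one cloud: filter and project locally so that only the
    rows and columns needed for the join cross the network."""
    return [{c: row[c] for c in (key, *columns)} for row in rows if predicate(row)]


def hash_join(left, right, key):
    """Runs where the results arrive: build a hash index on one side,
    then probe it with each row from the other side."""
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


# Hypothetical data sets living in two different clouds.
cloud_a_customers = [
    {"customer_id": 1, "region": "west", "name": "Ann"},
    {"customer_id": 2, "region": "east", "name": "Bob"},
]
cloud_b_orders = [
    {"customer_id": 1, "amount": 30, "sku": "x"},
    {"customer_id": 1, "amount": 12, "sku": "y"},
    {"customer_id": 3, "amount": 5, "sku": "z"},
]

left = query_agent(cloud_a_customers, "customer_id", ["region"])
right = query_agent(cloud_b_orders, "customer_id", ["amount"],
                    predicate=lambda row: row["amount"] > 10)
joined = hash_join(left, right, "customer_id")
```

Pushing the filter and projection into the agents keeps the data that has to move between networks small, which is the part of the problem where speed matters most.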

Gardner: Luke, last word goes to you. I don't expect you to pre-announce necessarily, but how do you, from Greenplum's perspective, address this need for joining, but recognizing it's a difficult technical problem?

Lonergan: Well, the cloud really manifests itself as a few different things to us. When Joe was talking about how people are going to be putting, and are already putting, a lot of services up in the cloud that are generating a lot of new data, then it requires that the kinds of data analysis, as Robin was hinting at, scale to meet that demand.

We already have the engine that implements that kind of ability to join between networks. So we are cloud capable. The real action is going to be when people start to do business that counts on public clouds to function properly, and are generating enormous amounts of very valuable data that requires the kind of parallel compute that we provide.

Joining inside clouds, using cloud resources to do the kind of data analysis work, this is all happening as we speak, and this is another aspect of what's forcing the change from an earlier paradigm of database to the modern massively parallel one.

Gardner: I just want to wrap up quickly now. Thank you. Joe Hellerstein, you mentioned earlier on Moore's Law and how it stalled a bit on the silicon. Are we going to see a similar track, however, to what we did with processing over the last 15 years -- a rapid decrease in the total cost associated with these tasks? Even if we don't necessarily see the same effect in terms of the computing, are we going to be able to do what we've been describing here today at an accelerating decreased total cost?

Hellerstein: Absolutely. The only barrier to this is the productivity of programmers.

Just think about storage. I have a terabyte disk in my basement that holds videos, and it costs $100 or so at Amazon.com. Ten years ago a terabyte was referred to by the experts in the field as a "terror byte." That's how worried people were about data volumes like that.

We'll see that again. Disk densities show no signs of slowing down. So, data is going to be essentially no cost. The data-gathering infrastructure is also going to be mechanized. We're going through what I call the industrial revolution of data production. We're just going to build machines to generate data, because we think we can get value out of that data, and we can store it essentially for free.

The compute cost of multi-core with parallelism is going to continue Moore's Law. It's just going to continue it in a parallel programming environment. If we can get all those cores looking at all that data, it won't cost much to do that, and the cost of that will continue to shrink by half.

The only real barrier to the process is to make those systems easy to program and manageable. Cloud helps somewhat with manageability, and programming environments like SQL and MapReduce are well-suited to parallelism. We're going to just see an enormous use of data analysis over time. It's just going to grow, because it gets cheaper and cheaper and bigger and bigger.
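To make the point about MapReduce suiting parallelism concrete, here is a minimal Python sketch of the pattern. It is an illustration only, not the implementation used by Google, Greenplum, or any other system: `map_reduce`, `count_words`, and `total` are hypothetical names, and a thread pool stands in for what would be a cluster of machines or cores in practice.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_reduce(chunks, map_fn, reduce_fn, workers=4):
    # Map phase: each chunk is processed independently of the others,
    # which is what allows the work to be spread across many workers.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_fn, chunks))

    # Shuffle phase: group the intermediate (key, value) pairs by key.
    groups = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)

    # Reduce phase: each key is reduced independently of every other key.
    return {key: reduce_fn(key, values) for key, values in groups.items()}


def count_words(chunk):
    """Map function: emit (word, 1) for every word in one chunk of text."""
    return [(word, 1) for word in chunk.split()]


def total(key, values):
    """Reduce function: sum the counts collected for one word."""
    return sum(values)


counts = map_reduce(["big data big", "data big"], count_words, total)
```

The programmer writes only the two small functions; the framework decides how to distribute them. The same computation in SQL would be a `GROUP BY` with a `COUNT`, which a parallel database can likewise split across cores without the query author saying how.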

Gardner: Well, great, that's very exciting. We've been discussing advances in parallel processing using multi-core chipsets and how that's prompted new software approaches such as MapReduce that can handle these large data sets, as we have just pointed out, at surprisingly low total cost.

I want to thank our panel for today. We have been joined by Joe Hellerstein, professor of computer science at UC Berkeley, and I should point out also an adviser at Greenplum. Thank you for joining, Joe.

Hellerstein: It was a pleasure.

Gardner: Robin Bloor, analyst and partner at Hurwitz & Associates. I appreciate your input, Robin.

Bloor: Yeah, it was fun.

Gardner: Luke Lonergan, CTO and co-founder at Greenplum. Thank you, sir.

Lonergan: Thanks, Dana.

Gardner: This is Dana Gardner, principal analyst at Interarbor Solutions. You have been listening to a sponsored podcast from BriefingsDirect. Thanks and come back next time.


Transcript of BriefingsDirect podcast on new technical approaches to managing massive data problems using parallel processing and MapReduce technologies. Copyright Interarbor Solutions, LLC, 2005-2009. All rights reserved.

Monday, December 29, 2008

BriefingsDirect Analysts Make 2009 Predictions for Enterprise IT, SOA, Cloud and Business Intelligence

Edited transcript of BriefingsDirect Analyst Insights Edition podcast, Vol. 35, on how analysts see cloud computing, SOA, the economy, and Obama Administration in 2009, recorded Dec. 19, 2008.

Listen to the podcast. Download the podcast. Find it on iTunes/iPod. Charter Sponsor: Active Endpoints.

Special offer: Download a free, supported 30-day trial of Active Endpoints' ActiveVOS at www.activevos.com/insight.

Dana Gardner: Hello, and welcome to the latest BriefingsDirect Analyst Insights Edition, Vol. 35. This periodic discussion and dissection of IT infrastructure related news and events, with a panel of industry analysts and guests, comes to you with the help of our Charter Sponsor, Active Endpoints, maker of the ActiveVOS visual orchestration system. I'm your host and moderator Dana Gardner, principal analyst at Interarbor Solutions.

Our topic this week, and this is the week of Dec. 15, 2008, marks our year-end show. Happy holidays to you all! But, rather than look back at this year in review, because the year changed really dramatically after September, I think it makes a lot more sense to look forward into 2009.

We're going to look at what trends may have changed in 2008, but with an emphasis on the impacts for IT users, and buyers and sellers in the coming year. We're going to ask our distinguished panel of analysts and experts for their predictions for IT in 2009.

To help us gaze into the crystal ball, we're joined by this week's BriefingsDirect Analyst Insights panel. Please let me welcome Jim Kobielus, senior analyst at Forrester Research.

Jim Kobielus: Hi, Dana. Hi, everybody.

Gardner: Tony Baer, senior analyst at Ovum.

Tony Baer: Happy holidays, Dana.

Gardner: Brad Shimmin, principal analyst at Current Analysis.

Brad Shimmin: Hi there, Dana, thanks for having me.

Gardner: Joe McKendrick, independent analyst and prolific blogger.

Joe McKendrick: Hi, Dana, and a happy Festivus to all.

Gardner: Dave Linthicum, founder of Linthicum Group.

Dave Linthicum: Hey, guys.

Gardner: Mike Meehan, senior analyst at Current Analysis.

Mike Meehan: Hello, all.

Gardner: And joining us for the first time, JP Morgenthal, senior analyst at Burton Group. Good to have you, JP.

JP Morgenthal: Thanks, Dana, and I'll jump on the Festivus wagon as well.

Shadow IT


Gardner: Let me start with the predictions. It gives me a chance to steal the thunder and get out there first.

My first prediction for 2009 is that spending from shadow IT activities will actually grow, and that the amount of money devoted to shadow IT activities will come from outside traditional IT budgets, from a variety of different sources, maybe even petty cash, and we'll see a bit of growth in these rogue activities.

At the same time, I think we will see a flattening, and in many cases a reduction, in officially sanctioned IT activities, but that the net result will actually be more spending overall across a variety of activities based on services and consulting as much as actual buying of licensed software and hardware products.

The risk is that these rogue applications can make it complex for governance, management, and even security, but that moving into these areas for business development purposes is going to be an overwhelming temptation. There will be more opportunities in the cloud, software as a service (SaaS), applications as a service, and for folks like marketers, business analysts, and business development professionals to take advantage and move in the market.

We're going to be looking at aggressive sales activities and new ways of reaching consumers of all kinds, across B2B and B2C activities.

I expect very little staff erosion in IT, but I think there will be a change in emphasis as to what IT is, defining it differently. Service-oriented architecture (SOA) is going to continue to grow, but Web oriented architecture (WOA) will probably overtake it and perhaps become a catalyst to some of these rogue activities. There will be a blurring between which WOA activities happen inside IT and outside.

So, my second prediction is that inside of traditional IT we're going to find a lot of new ways to quickly cut costs. This is going to be a drill for organizations to not spend money or spend less money. Virtualization will be a big part of that. Hypervisors will perhaps go commodity, and the value-add in the virtualized environment is going to be at the stacks -- virtualized stacks or containers at the applications level.

This could then lead to more direction toward a cloud operating system and a de-facto standard could begin to emerge, which would then spur even more adoption of virtualization.

We're going to see a lot more dumping of Unix and mainframes. We are going to sunset a lot of applications that aren't essential and save on the underlying costs of supporting them. There will be some modernization of applications, but only in areas where there is low risk.

There are still going to be a lot of organizations that aren't going to want to tinker with applications that are important, even if they are running on expensive infrastructure.

My third prediction is around extreme business intelligence (BI). There will be a move in scale, larger sets of data, larger sets of content, and more mingling or joining of disparate types of data and content in order to draw inferences about what the customers are willing to do and pay across both B2B and B2C activities.

We'll start to see an increased use of multi-core and parallelism to support these BI activities, and we will begin to see IT have a big role in this. This isn't something you can do as a rogue activity, but it might end up supporting rogue activities. That is to say, these new extreme BI activities might lead organizations to seek out services outside of IT. They then can execute on what they find through their analysis.

I also predict, at number four, that upgrades will suffer. We're not going to see a lot of swapping out of one system for another, unless there's a very compelling return-on-investment (ROI) scenario with verifiable short-term metrics. This is going to hurt companies like SAP and Microsoft, and Oracle and IBM to a lesser extent, given their diversification.

Trouble for Windows 7

I think Windows 7 is in trouble. People are not going to just run to Windows 7. They're going to continue to stay with XP, and this makes the timing around the Vista debacle all the more injurious to Microsoft. In hindsight, Vista needed to be a winner. Now that we're in a downturn, people are going to stick with what they have, and, of course, upgrades are essential for Microsoft to continue with its back-end strategy on data-center architecture and infrastructure.

This provides more of an opening for Linux and non-Microsoft virtualization, and that will continue. This could mean that Microsoft needs to move to its cloud offerings all the more quickly, which then could actually spell earnings troubles for the company, at least in the short to medium term.

My last prediction is that the role of social media and networks will continue to grow and be impactful for enterprises, as marketers and salespeople begin to look to these organizations for the metadata and inference about what customers are willing to buy, particularly under tight economic conditions.

There's going to be a need to tie traditional customer relationship management (CRM) and sales applications with some sort of a process overlay into the metadata that's available from these Web-based cloud environments, where users have shared so much inference and data about themselves.

So, I look for some mashups between social data and the sales and business development, perhaps through these rogue applications and approaches outside of IT, but IT activities nonetheless, in 2009. Thanks.

Jim Kobielus, you're up. What are your five predictions?

Kobielus: I need to go home now. You stole all my predictions. Actually, that was great, Dana. I was taking notes, just to make sure that I don't repeat too many of your points unnecessarily, although I do want to steal everything you just said.

My five predictions for 2009 ... I'll start by listing them under a quick phrase and then I'll elaborate very quickly. I don't want to steal everybody else's thunder.

The five broad categories of prediction for 2009 are: Number one, Obama. Number two, cloud. Number three, recession. Number four, GRC -- that's governance, risk, and compliance. Then, number five, social networking.

Let me just start with [U.S. President Elect Barack] Obama. Obviously, we're going to have a new president in 2009. He'll most likely appoint a national chief technology officer or a national tech policy coordinator. Based on his appointments so far, I think Obama is going to choose a heavy hitter who has huge credibility and stature in the IT space.

We've batted around various names, and I'm not going to add more to the mix now. Whoever it is, it's going to be someone who's going to focus on SOA at a national level, in terms of how we, as a country, can take advantage of reuse, agility, transformation, optimization, and all the other benefits that come from SOA properly implemented across different agencies.

So, number one, I think Obama is going to make a major change in how the government deploys IT assets and spends them.

The maturing of clouds


Number two, cloud. Dana went to town on cloud, and I am not going to say much more, beyond the fact that in 2009, clouds are going to become less of a work in progress, in terms of public clouds and private clouds, and become more of a mature reality, in terms of how enterprises acquire functionality, how they acquire applications and platforms.

I break out the cloud developments in 2009 into a long alliterative list. Clouds will start up in greater numbers. They will stratify, which means that the vendors, like Google, Microsoft, and Amazon and others with their cloud offerings, will build full stacks, strata, in their cloud services that include all the appropriate layers, application components, integration services, and platforms. So, the industry will converge on more of a reference model for cloud in 2009.

They'll also stabilize the clouds. In other words, they'll become more mature, stable and less scary for corporate IT to move applications and data to. They'll standardize, and the clouds will standardize around SOA and WOA standards. There will be more standards, interfaces, and application programming interfaces (APIs) focused on cloud computing, so you can move your applications and data from one cloud to another a bit more seamlessly than you can now with these proprietary clouds that are out there. And, there are other "S" items that I won't share here.

Number three, recession. Clearly, we are in a deep funk, and it might get a lot worse before it gets better. That's clearly hammering all IT budgets everywhere. So, as Dana said, every user and every organization is going to look for opportunities to save money on their IT budgets.

They're going to put a freeze on projects. They're going to delay or cancel upgrades. Their users, as you said very nicely, Dana, are going to dip into petty cash and go around IT to get what they need. They're going to go to cloud offerings. So, the recession will hammer the entire IT industry and all budgets.

As far as GRC, government is cracking down. If it has to bail out the financial-services industry, bail out the auto industry, and bail out other industries, the government is not going to do it with no strings attached.

Compliance, regulations, reporting requirements, the whole apparatus of GRC will be brought to bear on the industries that the government is saving and bailing out.

Then finally, social networking. Dana provided a very good discussion of how social networking will pervade everything in terms of applications and services.

The Obama campaign set the stage clearly for more WOA-style, Web 2.0, or social-networking style governance in this country and other countries. So, we'll see more uptake of social networking.

We'll see more BI become social networking, in the sense of mashup as a style of BI application, reporting, dashboards, and development. Mashups for user self-service BI development will come to the fore. It will be a huge theme in the BI space in 2009 and beyond.

That really plays into the whole cost control theme, which is that IT will be severely constrained in terms of budget and manpower. They're going to push more of the development work to the end user. The end user will build reports that heretofore you've relied on data modelers to build for you. Those are my five.

Gardner: Thank you, Jim. Tony Baer, you're up. What did we miss?

Cost savings, cost savings

Baer: It's going to be hard to top both of you folks, so I'm going to just add some things in the margins. If I were to make one elevator statement on this, I feel like the guy [Kenan Thompson as Oscar Rogers] from Saturday Night Live, the economic expert, who they interview on "Weekend Update." He starts to give all the causes. Then, he just says, "Well, just fix it!"

That's essentially going to be the theme this year. The top five are going to be cost savings, cost savings, cost savings.

That does involve a lot of the strategies that both you and Jim have just described. For one thing, it's going to put a lot more emphasis on using the resources and infrastructure that you already have. It's going to damp down entering into new long-term contracts for anything.

Ironically, one result of that is that for the moment, you'll actually see a little less emphasis on outsourcing, because that does imply a long-term contract. The fact is, I don't think anyone is really doing any meaningful projecting beyond Q1. I was just reviewing Adobe's year-end numbers and projections. Normally, they project out for the full fiscal year, and they are only going to project out for Q1.

I'll just go through a very quick laundry list. For one thing, as I mentioned, it's going to be a lot of low cost, no cost. There will be a lot more use of open source, a lot more. This is definitely the year that the cloud and SaaS come into their own, but with a key qualification.

I think it's going to be managed clouds. Essentially, to take advantage of raw clouds, like Amazon EC2, you have to put in more of your own management infrastructure. I don't see the use of what I would call "clouds in the wild." I see more managed clouds from that standpoint.

For IT organizations, it's going to dictate more attention to IT service management to show that we're not just keeping systems going and keeping the lights on, but more along the lines of, "Here are the services that we're delivering to the business," as they try to justify the system.

On the back-end, it will be "Use more of what you have," and huge renewed investments in BI. So, Jim, I do think you still have a job this year.

Finally, because it's going to take a while for this to unfold -- you just don't regulate overnight -- there will be much greater attention to GRC.

Gardner: Thank you, Tony. Brad Shimmin, you're up.

Shimmin: Thanks, Dana. For my predictions for 2009 I took a different tack in anticipation of a new analytical concern we're starting up here in January. It's going to focus on collaboration. So, everything I did settled on that.

All the predictions I have stem from the themes that you guys have been talking about: cutting cost, such as travel, and squeezing efficiencies out of the IT infrastructure, as well as the users themselves. So, bear that in mind as I go through this.

Collaborative social networks


The first one for me is vendors tackling enterprise-plus-consumer based social networks, a blended view of those. Enterprise-focused vendors are going to do more than simply sync info from public sites like Facebook. They're going to take that information and build into or out from the enterprise into those social networks and drive information from those. It's going to become a two-way street.

You're going to see folks like Facebook, and most notably, LinkedIn, working in the other direction themselves, and with third parties, to develop enterprise-bound social networks. Look for those to emerge next year.

The second thing for me is cloud software, now that it's jumped the shark. I know we've all been talking about it, but it's definitely jumped the shark for me. I see the vendors within the collaboration space settling beyond the small and medium business (SMB) market and looking more toward the larger enterprises that are looking to squeeze more out of their existing IT infrastructure or cut costs.

Folks like IBM and Microsoft have already shown us that they can hit the long tail with stuff like Bluehouse and Microsoft Online Services (MOS) for collaboration. But, you're going to see vendors like Cisco and Oracle take up this challenge with more of a focus on managed hosting services that look more like SaaS, but they are really managed.

That's something that will appeal to the larger enterprises, owing to security, manageability, and other assurances that you get from that, not just pure-play, do-it-yourself SaaS.

The third thing for me is that enterprises are going to move away from a steep hierarchy, or the word might be "oligarchy," of an organizational model internally. This is just about how enterprises structure themselves.

This goes back to what you were saying, Dana, with stuff going off the books, and what Tony was saying about driving revenue from places other than CAPEX. Instead, to become not just more efficient and agile, companies are going to want to self organize to create these internal ecosystems, if you will, where organizations are built around employee experience, associations, interests, and energy levels -- what they want to focus on.

That's going to allow companies to more efficiently harness the users. The people, as Jim was saying earlier, perhaps are going to be tasked with setting up their own BI queries and mashing up their own applications. It's really thinking about those people, giving them the ability to run the show inside of an organization, instead of waiting for everything to come top-down.

The fourth thing for me is -- speaking in terms of communities, both internally and externally -- I am seeing silos break down between those.

Gone are the days of consumer-faced social networking and enterprise-faced social networking existing as independent entities, as I was saying earlier. Thanks to user profile standards like OpenID and expansion of APIs, community providers and third-party aggregation and integration tool vendors are going to allow applications and users to flow between what were heretofore closed communities.

For example, you already have vendors moving in that direction with Yahoo's YOS, which now allows the My Yahoo start page to host third-party applications from nemesis Google.

The fifth and final thing for me -- and this might be more of a wish than a prediction; I'm an eternal optimist I guess -- I'm looking for virtual worlds to gain a foothold in the enterprise.

We've seen folks like [Cisco Chairman and CEO] John Chambers use Second Life to do a dog-and-pony show. Those are great marketing tools, but they're nothing compared to the efficiencies and benefits you can gain from using the software for other things. Dana, you alluded earlier to being able to leverage that mechanism for communication with CRM. I think we're going to see that change how virtual networks can be utilized inside the enterprise.

It's not just for marketing and sales, but also to support B2B and B2C communities, where effective communication between your supply channel members is really paramount. To date, nobody has tackled that.

So, we'll see virtual worlds actually make an impact in terms of allowing these global, loosely coupled entities to communicate more effectively in 2009. That's it for me.

Gardner: Thanks. Joe McKendrick, how do you see things shaking out?

McKendrick: Thanks, Dana. You guys are a hard act to follow. My first prediction -- are you ready for this -- the government, the U.S. Treasury, is going to swoop in with the Troubled Assets Relief Program (TARP) funds and swoop up all the troubled IT assets across the country -- those IBM mainframes, older mainframes, DEC units, Windows NT.

Then, the Fed is going to come in with zero percent liquidity to help finance it, and that's going to raise all boats.

Gardner: Joe, are you defining a new sector called "Toxic IT?"

McKendrick: Toxic IT, there you go.

Gardner: Joe, April 1 is not for several months.

McKendrick: Okay, just kidding. My other prediction: President Obama is going to make Tony Baer the National CTO/CIO, because he wants to "just fix it," and that's a good philosophy.

It's the economy

Okay, all seriousness aside now. The top issue, of course, is the economy. It's going to dominate our thinking through 2009. But, recession planning is so 2008, because SOA, which I focus on as well as IT, is a long-term process. You need to look three years down the road.

The economy is going to turn around. I see it turning around at some point in 2009. That's what economists are saying, and companies have to prepare for a growth mode and the ability to grow within a new environment.

Let's face it. IT has already been tight. IT has been tight since the dot-bomb era of 2001-2002. As some of us have already been saying, there probably is not going to be a huge diminishment in IT departments, because the budgets have been lean, things have already been tight, companies have already been running very efficiently, and IT departments have been overworked as it is.

An interesting sidelight is the whole Enterprise 2.0. JP, you and I have discussed this a little bit. The recession and downturn isn't going to be like it's been in the past. People are more empowered with social networking tools, as employees and as people looking for jobs. They're looking to start new businesses.

We have a lot of tools available to us now that we didn't have back in 2000, or we didn't have back in 1991 or 1982, or any of those previous eras. People don't have to be victims of an economic downturn, as they have been in the past. We have the capability to network across the globe. We have the capability to start new businesses.

I've talked on this webcast before about a company that started a business with an $80 investment in IT infrastructure, thanks to cloud computing. I just heard about another company that spent about $200 for its first two months of IT.

Gardner: The question is, Joe, are they getting their money's worth?

McKendrick: I think they are. They don't have to invest in servers. They don't have to go out and buy servers. They don't have to go out and buy disk arrays, and worry about the maintenance, hiring people, and know how to maintain those things. There are a lot of opportunities for companies, and we are going to see that. We are going to see folks -- maybe IT people, or people who work for vendors and have been laid off -- have the ability to start their own business at a very low cost of entry.

On the flip side of that, the whole social-networking and cloud-computing phenomena, companies have these tools as well to employ low-cost methods to reach their markets and to interact with their customers. We're going to see a lot more of that as well.

A marketing campaign doesn't have to cost $200,000 to reach your customers. You can use the social network, the Web 2.0 tools, to interact and collaborate and find out what's going on in your markets at a relatively low cost.

Gardner: From your mouth to God's ears. All right. Dave Linthicum, we have the entire future before us. What should we expect?

Linthicum: You guys took a lot of my better ideas, but I'll just expand on some of them.

The first thing I'd like to do is throw my firm out there for a bailout from the government. I think a billion dollars. I'm cash-flow positive, but I think I can do a lot with the money, including throwing one hell of a New Year's Eve party. So, hopefully the money will start coming in.

Cloud comes into its own

Number one is that the interest in cloud computing, which I have been focusing on in my career, at least for the last eight years, is finally going to come into its own, like everybody has been saying here. That's rather obvious at this point.

As far as what I can add to what's been said so far, what we're going to see in 2009 is a lot of startups, specifically some cloud-computing startups. You're going to see even more around what I call "cloud mediation." That is guys like RightScale, and a few other folks in the space that sit between you and the major cloud providers. They basically mediate issues around data semantics, performance management, load balancing, and those sorts of things.

One thing that's a big hole in the cloud computing movement so far is that most of the solutions out there, even the database solutions, are proprietary. They use different APIs, different interfaces, and different sets of standards. It's going to be a play for a lot of companies to get in there and provide more reliable infrastructure in and between these various guys out there.

I'm aware of one startup a week, and they're coming in through the funders, not necessarily through the entrepreneurs, which is unusual.

The links to social networking will be there. They're not going to be quite as pervasive as everybody thinks. Social networking is going to have its place, but once we figure it out, it will be, "Okay, yeah." It's going to have its value, but we're just going to move on as far as this revolution goes. I don't think that's going to happen in 2009.

People are going to use it as a marketing opportunity, just like they used email, Web sites and those sorts of things, and now blogging opportunities, but eventually it's just going to fall into place.

There will be a huge explosion in the rogue cloud movement, as you mentioned, Dana, and also the platform-as-a-service (PaaS) space. The architects and CIOs out there are going to be scrambling around trying to figure out how to place governance around that.

Everybody is going to be building applications, typically using free platforms like Google App Engine. They're going to start launching these things into production, and there is going to be no rhyme or reason around how they fit into the existing infrastructure. That's happening now and it's going to happen more in 2009.

In switching gears to SOA, there's going to be a larger focus on inter-domain SOA technology. The focus will still be on the short-term tactical and the ability to provide quick value in the SOA space to justify it, so you can get additional funding.

As we start building these things, people are going to look at the departments that are implementing their SOA projects and try to figure out how to bind these things at an enterprise level. I call this the micro domain versus the macro domain.

Technology doesn't scale typically to that point, as people are finding, and it's going to take a different set of technologies and a different set of architectural skill sets to solve that problem.

On the downside, the jig will be up for poor SOA technology out there. Guys who haven't been able to get acquired or haven't been able to hit that inflection point and are still stumbling along -- typically making $2-$5 million a year and burning about that much in cash -- are eventually just going to have the plug pulled. And, 2009 is going to be when it's going to happen. They're just going to run out of steam.

We have a few of them right now. Ultimately, they're going to have lots of cuts, start hemorrhaging cash, and they're just going to go out. Some of them may be bought on the cheap, but the majority of them are just going to shut their doors.

Decline of the SOA buzzword

Finally, the SOA buzzword out there is going to diminish in relevancy. I'm talking about the buzzword, not necessarily the notion of SOA. SOA predates when the buzzword was created, and it's going to postdate when the word "SOA" was created. It's going to morph into different things, and the cloud computing movement is going to get into it and define it in different directions.

Enterprise architecture had a chance to get in there and figure out how SOA relates back into their world. They've been fairly successful in some aspects of it, but they have been too slow in moving. The whole SOA movement is going to be more defined by the cloud. That's good for me and probably for everybody on this call.

Gardner: You predicted a couple of years ago, Dave, that SOA would get subsumed into enterprise architecture. I assume that's what you are talking about?

Linthicum: Yeah, that's what I am talking about. Most SOA is going to get practiced in '09 and '10, at least the new stuff, in the cloud-computing movement, even though it’s still SOA. Basically, it's going to encompass cloud resources. Enterprise architecture will ultimately morph with SOA, and they'll become fundamentally the same concept.

SOA, which has always been an architectural pattern under the domain of enterprise architecture, will be subsumed by enterprise architecture and will be an architectural pattern under enterprise architecture. But, we're not going to be talking as much about SOA in '09.

Gardner: Just one quick follow-up. In terms of startups, you don't seem to think that there is going to be much funding left, no IPOs to speak of. What's the business model for these startups that you're seeing, the ones that can take advantage of PaaS with low upfront costs? How do they get funded? Do they need funding? And, what's their end strategy as a business?

Linthicum: They do need funding, but they don't need as much funding as a company did a couple of years ago, just because of everything you can get on demand. The strategy for the business is basically to glom onto the cloud-computing movement.

Some of the larger enterprises out there, some of my clients who are moving into the cloud-computing space by leaps and bounds, are realizing there are huge holes in the area, such as monitoring, event management, security, data mediation, all these sorts of things that aren't built into the larger cloud providers out there.

They have an immediate demand right now, a pent-up demand that's being created by the desire to lower cost, and driving a lot of these enterprises out into cloud computing. They're seeing these holes, and they are looking for solutions to make these happen. Both the entrepreneurs and the funders have realized that these things exist, and they are scrambling around trying to get them up and running.

As far as funding goes, it doesn't take that much to get a company, the assets, and the infrastructure up and running. Most of these solutions you will find will be leveraging on-demand platforms themselves. So, they'll be coming out of the cloud, providing services to clouds.

Gardner: They might actually find some engineers to hire from all those other startups that went away.

Linthicum: There are a lot of them on the streets right now.

Gardner: All right. Mike Meehan, there must be something we've missed so far.

Meehan: I don't know if there's anything you really missed, but I am going to pretend like you have and try to get some stuff in there.

The first three have to do with the economy, because obviously everybody is dealing with what we expect to be a down economy.

Rise of the 'Yankee Swap'


The first one is going to be a blast from the recent past. If everybody remembers back in 2001, when that recession hit, all of a sudden you could buy wonderful amounts of gear on eBay for next to nothing. I remember talking to one guy who was smiling like a Cheshire Cat, because he had replaced $45,000 worth of Unix with $500 worth of Linux. I think you are going to see a lot of that.

People are going to be shutting down data centers. That's going to cause a glut of servers and storage gear and network gear, and you are going to be able to get it cheap and affordable. That's going to hit the storage and network and server companies.

New sales are going to be tough to come by, because you're going to be able to get previously owned gear at affordable prices.

Gardner: So, a great disruption to the existing channel then?

Meehan: Exactly. It's really going to hit the channel vendors. CIOs are going to be able to come in and say, "Hey, look, I'm a genius. I bought all of this stuff for next to nothing." And, there are going to be other CIOs who come in and say, "Hey, you know what? I was able to get some money by liquidating our assets." That financial pressure is going to affect everybody in the hardware market.

Gardner: They used to call it a Yankee Swap. Didn't they?

Meehan: Yeah. I think you are going to see a big international Yankee Swap. So that's going to be out there.

The next one is license wars. The CIOs are coming in, they are going to be asked to cut budget, and there is only so much flesh you can cut out before you have to deal with that maintenance license budget. I think every company in the world is aware of the fact that they pay more in licenses than they want to. They have always theoretically wanted to lower those costs. The pressure now is going to be too great for them to not consider options.

This is going to be great for open source companies, which are going to be able to come in and say, "All right, you don't have to pay me a rolling license. Here is my support cost. See how much it's going to lower your license."

It is going to be bad for Microsoft, because again, to a degree they are becoming commoditized across their portfolio, and that's going to hit them right in the breadbasket.

Gardner: Do you agree with me that in hindsight the fact that Vista didn't live up to its potential is really going to hurt them?

Meehan: Absolutely. There are still companies out there working on Windows 2000, and those companies are going to be looking to switch. That they haven't gone to Vista just makes them free agents. And this is going to also apply to Office.

Gardner: Whoever that architect was on that Vista project, he's fired, right?

Meehan: I think he's long gone. I think he is running the charitable foundation. They not only missed it, but they reinforced every negative perception of Microsoft when they came out with Vista: The inability to meet a product deadline; the security flaws that have been long associated with Microsoft; you need a zillion patches just to get it to work and do basic things.

Everything that they were supposed to have addressed, they failed to address, and then they reinforced that. Now, companies are just sitting there asking, "Why am I paying this much money for bad software?"

Bad year on the sell side

Gardner: So, it will be a really good year, if you are a negotiator on the buy side, but a terrible year if you're on the sell side.

Meehan: I'd think so. This should hit some enterprise resource planning (ERP) vendors too. Anybody who can sell SaaS in the ERP market is going to be doing better. I think you are going to see some erosion on the SAP and Oracle side, as far as enterprise apps go.

"Make my life easier or go away." That basically means, users are going to need productivity and ease-of-use integration. You're going to see those in requests for proposals (RFPs). If they're not stated explicitly, they will be there implicitly.

Referring to SOA projects, for example, don't come in and tell me how much work I'm going to have to do to make all of this come together. Come in and tell me how this is going to make my life easier on day one. The companies that can deliver that will be the ones making the sales. The ones who are telling you that you're going to need to do eight months of work to get this up and running are going to be pushed to the back burner.

I really think that's the lure of the Web-oriented stuff. I take issue with the notion of WOA, because I don't necessarily buy into the architecture portion of it, but I do buy into the notion that it makes your life easier. It makes things easier to do. If you are a developer, it can get your stuff up and running quickly. If you can do that in some sort of organized governable fashion, then go with that.

What you're going to see in a lot of the SOA projects out there in particular is, "All right. Make it easy for me to assemble an application. Make it easy for me to reuse my assets. Make it easy for me to modify my existing applications. Make it easy for me to integrate different applications and even information between different divisions of my company."

Gardner: When you say "make it easy," are you talking about governance?

Meehan: I'm actually just talking about the mechanical process of doing it. You almost want it to be governable on the fly. What you really want is that you don't have to dedicate too much time and resources to undertake these functions. Users aren't going to have that much time or that many resources.

For example, imagine I'm a financial-services company and I've picked up a good loan portfolio from a distressed corporate loan company that had to sell their good loans off, because they were distressed, because they had made bad private loans. I got a good package of corporate loans from them. I need to integrate that quickly into my system, otherwise I am not going to be able to effectively govern that. I'm also not going to be able to effectively create the future programs around those customers, which is what I am looking to do.

So, how quickly can I do things now, as opposed to how thoroughly can I do things? You're going to want to be thorough to an extent, but really it's going to be speed to market and speed to end of project that's going to be a determinant in there.

Telecom shakeup. The U.S. government is going to start treating telecom like it's our national road system, and you are going to see some serious investment in that area. That's going to become one of the key points in the economic stimulus package that you're going to see.

I also think you are going to see European telcos begin to encroach, either through acquisition or just through offering services into the U.S. market.

The last one, HP buys Sun. Somebody is going to get bought this year, somebody fairly big. I'm saying HP is buying Sun.

Gardner: They don't need to buy them. They can just replace all their servers in the marketplace.

Meehan: Basically.

Gardner: JP Morgenthal, you're up. The predictions swan song. We must be missing something?

Morgenthal: The funny thing is, I have had you on mute, listening to everybody, and struggling, because while this was going on, I had a visit from my media-services-in-the-cloud provider. He had to come set up my new entertainment in-the-cloud service box. We still need people is the point there. So, I found that very interesting and humorous to be going on when everyone was talking about clouds.

Age of reformation

Gardner: You're talking about the cable guy?

Morgenthal: Exactly, the cable guy. The cable guy was here setting up my TiVo box. I'm going to preface my five by saying that I see we're entering into a modern age of reformation, and there are some really interesting things that are going to start occurring this year, moving forward to 2012. I know. It's my own prophecy, and it's out there, hanging on a limb.

My first prediction is that we're going to see a greater focus on the business process. Not business process management (BPM) per se, although initially people will target that thinking they are doing business process, but eventually they will get it.

I think SOA is dead, and I believe companies have no stomach for IT initiatives that cannot immediately be attributed to a value. They're going to do some small-scale business process re-engineering, they're going to get tremendous value from it, and they are going to get it.

They're going to see that simplification is the way to go. Why are we doing all these complex things -- this hooking to that, hooking to this, hooking to that? I can just go into this one box and get everything done there. I don't care that it's not sexy, okay.

The age of disposable computing is here. We have had disposable electronics, disposable cars, and disposable appliances. The age of disposable computing is here.

Number two: The backlash of social networking. We're just on the precipice. Everyone is getting into it, having a little fun. Certain ones of us are on the leading edge. We're already getting bombarded and tired. We're already fried and overloaded from these social networks. The new people think it's a great new toy.

Give it a couple of years and you are going to see a tremendous backlash. You're going to see a rise of firms that will get paid to get people off the grid -- people who made big mistakes in thinking they were having fun during their early social networking experiment.

Gardner: This is sort of like tattoos, but in the cloud?

Morgenthal: Exactly. Angelina Jolie has got to get Bobby off her butt, and it's going to cost her. We're going to start to see that. We'll see the real backlash come into effect in 2010, but we'll start to see forms of it in this coming year.

Third, the pain from the economy is going to impact the open-systems market. We're seeing the rise of what I call the "anti IT." You hit upon that. You read about people reaching into petty cash, doing things on the cheap, finding other ways to get things done.

The one that's going to be the biggest impact is that people are treating open source like free software. That will destroy the open-source market for sure. It's the death knell. It's the stake in the vampire's heart.

People don't get it. I remind every one of my customers of that, when I talk to them, and they ask about an open-source solution. I've got to put my warning out there. Open source is not free software. You're either contributing dollars to the team that's doing it, or you are contributing your time and effort. It's not free software. You just don't take it and use it. That will be the death knell for open source for sure.

Gardner: Wait a minute, a death knell for open source or death knell for commercial open source as a business model?

Morgenthal: That's a good question. I won't differentiate at this point, because I'm looking at it from the perspective of the event horizon, where people are treating it like free software. There is no free lunch. Somewhere it's going to take hold. There's going to be a lack of support or a lack of desire to continue this thing, if people are abusing the system. It happens all the time. Nothing will drive greater abuse of open source than a bad economy, where there are no dollars.

Gardner: Okay. What else have you got?

Morgenthal: Number four: the millennial workforce is starting. This is going to change everything, and it's starting to already. These people have an attitude that I haven't seen in a workforce since marketing people came out in the dot-com era.

They definitely feel like, "I want my toys. I want to be able to use my phone at work. I want to use my computer at work. I want to be able to access my sites at work." I see companies dealing with this issue in a unique way.

Their attitude isn't, "If you want a job, then you have to deal with it in our way." It's, "I'm scared. I don't know where I am going to get my workforce for the 21st Century, and I don't know how to deal with these people." Their first inclination isn't to push back with the old adage and the old way of talking about it, saying, "Hey, it's our way or the highway. We've got the money." It's "Okay, what do you want?"

This is going to really change things. How? It's yet to be seen, but clearly the introduction of a much more mobile force, more telecommuters.

Gardner: Most of us.

Morgenthal: That's a lifestyle choice. Yeah, it's pretty interesting. The millennial workforce is going to change things dramatically.

Shift in patent landscape

The last one is that there's a big change coming in Digital Rights Management (DRM) and patent and copyright. It's being led by this initiative out of Harvard with the Recording Industry Association of America (RIAA). RIAA may have just started a war for everybody in the industry who has any copyright or any patent infringement suit. The judge in the case said, "All you people, you big companies with big lawyers and big money, are taking on these poor little schnooks, and it has got to stop. They are coming in here and they don't even know what their legal rights are."

Gardner: Do you think this is what Nathan Myhrvold is up to?

Morgenthal: I didn't see his name associated with it. It was actually a Harvard law class, I believe, represented by a Harvard law professor [Charles Nesson], backing it. They're representing it as unconstitutional. So this case could be landmark for DRM, copyright infringement, and patent infringement.

Gardner: So, the basic message is kill all the patent trolls.

Morgenthal: It could be, and it would have a tremendous impact going into the potential for a startup economy. Dave talked about the startup economy, where downtime is a great time to start a new company and a great time to get out there and get your technology done early.

Landmark cases like this will do a lot to further the opportunities of these firms to go out there and build something without worrying, "Am I going to get taken out by Microsoft? Am I going to get taken out by Apple? I can't afford that." It's really interesting what could happen, given that cases like this are now falling on the side of the small guy, and not on the side of big companies.

Gardner: Right. Big companies were the victims of the patent trolls, now they are becoming patent trolls themselves.

Morgenthal: Yeah. They're hiring companies to go eat these things up, and then they are going after the small guy. We had multi-million dollar lawsuits over patent infringement for technologies that people hadn't even built or owned. I really think that the greed of Wall Street is also going to see that backlash, and it's going to lead to more of the same, or at least help those cases significantly.

People who have made big money pillaging the system over the years, in the age of reformation, are the ones that are going to get hung in the next two to three years.

Gardner: We're just about out of time. Let's go quickly down our list for any last synthesis insights.

Jim Kobielus, senior analyst at Forrester Research, thanks for joining. What's your synthesis of what you have heard?

Kobielus: My synthesis is that we are living in a very turbulent and volatile time in the industry. Things are changing on many levels simultaneously, and a lot of it will just be hammered by the recession. Approaches like cloud, social networking, and everything will be driven by the need to cut cost and to survive through fiscal austerity for an indefinite period.

Gardner: Tony Baer, senior analyst, Ovum, what's your takeaway?

Baer: It's hard to know where to start, but if there is one way to look at it, it's back to basics. There are a lot of complex issues, and I think it's all going to be resolved locally, which in the long run, is going to present a huge governance challenge.

Gardner: Brad Shimmin, principal analyst, Current Analysis, what's your current analysis?

Shimmin: Currently, I'm thinking that the millennial generation and the down economy are converging like a perfect storm to wipe away what we have known for the last 10 years, and then ushering in either perfect terror or a great new economy. I'm not sure which yet.

Gardner: Joe McKendrick, independent analyst and blogger, what's your toxic IT prediction?

McKendrick: We're definitely at a turning point. I agree with what everybody is saying out there about growth mode. Dana, I like your observations about the rogue or the shadow IT. You're going to see a lot more of that. It's been predicted for quite a few years actually that IT is going to be less of an entity unto itself and more of a function that's built into business units.

Business people are getting more involved in IT. Business people are getting more savvy about IT. JP talked about the millennial generation. They're very savvy about what IT and the power of IT can provide. We're going to see less of IT as a distinct area of the business and more part of the business, an enabler of the business. This year is going to accelerate that.

Gardner: Dave Linthicum, founder of Linthicum Group, what are you seeing from what you have heard and what's your net-net?

Linthicum: I think it's going to be one of the most exciting couple of years in IT. Just by sheer cost pressure, we're going to have to get down to simplifying and solving some of these issues, and not just playing around with technology. Things are going to get more simplistic, more effective, and more efficient than they have been over the last 20 years of building layer upon layer of complexity. We just can't afford to do that anymore, and now we are going to have to go fix it.

Gardner: Mike Meehan, senior analyst, Current Analysis, any additional takeaways?

Meehan: There's a lot of panic out there, and in keeping with one of the great holiday traditions, I think the winner is going to be Mr. Potter. The future belongs to warped, frustrated old men.

Gardner: He's buying up all those mortgages for pennies.

Meehan: Exactly.

Gardner: Alright. JP Morgenthal, one last go. What do you see from what you have heard on a high-level takeaway?

Morgenthal: Opportunity and fear -- and it's a matter of which one is stronger. I have no prediction as to which will win out. They're both equally powerful right now, and it's going to be, as Dave said, exciting to watch these two clash and see which one wins.

Gardner: I guess my takeaway is that we don't know how long it's going to take, but we will come out of this period. Survive any way you can, but be mindful that on the other end it's going to be something quite new, with a lot of opportunities, and it's going to look a lot more like Internet time, and the clicks will mean more than the bricks.

Well, thanks all very much. Have a great holiday season. Please take a few days off and relax with your families.

I also want to thank our Charter Sponsor for the BriefingsDirect Analyst Insights Edition podcast series, and that is Active Endpoints, maker of the ActiveVOS visual orchestration system.

This is Dana Gardner, principal analyst at Interarbor Solutions, thanks for listening. Have a good year in 2009, somehow.


Special offer: Download a free, supported 30-day trial of Active Endpoints' ActiveVOS at www.activevos.com/insight.

Edited transcript of BriefingsDirect Analyst Insights Edition podcast, Vol. 35, on how analysts see cloud computing, SOA, the economy, and Obama in 2009. Copyright Interarbor Solutions, LLC, 2005-2008. All rights reserved.