Monday, April 07, 2008

XML-Empowered Documents Extend SOA’s Connection to People and Processes

Transcript of BriefingsDirect podcast on XML structured authoring tools and dynamic documents’ extended role in SOA.

Listen to the podcast here. Sponsor: JustSystems.

Dana Gardner: Hi, this is Dana Gardner, principal analyst at Interarbor Solutions, and you’re listening to BriefingsDirect. Today, a sponsored podcast discussion about two large growth areas in IT, and these are two areas that are actually going to coalesce and intersect in a relationship that we are still defining. This is very fresh information.

We're going to talk about dynamic documents. That is to say, documents that have form and structure and that are things end-users are very familiar with and have been using for generations, but with a twist. That's the ability to bring content and data, on a dynamic lifecycle basis, in and out of these documents in a managed way. That’s one area.

The second area is service-oriented architecture (SOA), the means to automate and reuse assets across multiple application sets and data sets in a large complex organization.

We're seeing these two areas come together. Structured documents and the lifecycle around structured authoring tools come together to provide an end-point for the assets and resources managed through an SOA, but also provide a two-way street, where the information and data that come in through end-users can be reused back in the SOA to combine with other assets for business process benefits.

To help us understand this interesting intersection and the somewhat complex relationship between structured documents and SOA, we are joined by Jake Sorofman. He is the senior vice president of marketing and business development for JustSystems North America. Welcome to the show, Jake.

Jake Sorofman: Thank you, Dana, great to be here.

Gardner: There has been a lot of comment around SOA. It’s been discussed and debated for some time. What I'm seeing in the market is the need for bringing more assets, more information, more data, and more aspects of application activities into SOA to validate the investment and the growth.

Tell us what it is about SOA, in the sense that it is data-obsessed? What is it that we need to bring more of into SOA to make it valued?

Sorofman: We’ve all heard the statistics for ages about 80-plus percent of all the information in the enterprise being unstructured information, and how it’s contained within documents, reports, email, etc., and doesn't fit between the columns and rows in the database.

That’s the statistic we’ve all grown comfortable with. The reality, though, is that the SOA initiative today and the whole SOA conversation has really centered on structured data, transactional data, and hierarchical data, as opposed to unstructured content that’s stored within these documents. The documents, as they are created and managed today, are often monolithic artifacts and all the information within those artifacts is locked up and isolated from the business services that comprise our SOA.

Our premise is that you need to find new and unique ways to author your content as extensible markup language (XML), to make it more richly described and widely accessible in the context of SOAs, because it’s an important target source for a lot of these services that comprise your SOA applications.

Gardner: So, there are a number of tactical benefits to recognizing the dynamic nature of documents. Then, to me, there is also this strategic benefit from XML enabling them to provide a new stream or conduit between the content within the lifecycle of these documents and then what can be used in applications and composite applications that an SOA underpins. Help us understand the tactical, and then perhaps the strategic, when it comes to a lifecycle of document and content.

Sorofman: That’s a really good way to think about it. A lot of companies will take on this notion of XML authoring from a tactical perspective. They are looking for new and improved ways to accelerate the creation, maintenance, quality, and consistency of the content that they produce.

It could be all their branded language, all their lock-down regulated language, various technical publications, etc. They need to streamline and improve that process. So, they embrace XML authoring tools as the basis for creating valid XML, to manage the lifecycle of those documents and deliverables.

What they realize in the process of doing so is that there is a strategic byproduct to creating XML content. Now, it’s more accessible by various line-of-business applications and composite applications that can consume it much more readily.

So, it’s enriching the corpus that various applications can draw from, beyond traditional or relational databases, and allowing this more unstructured content to be more widely accessible.

Gardner: In the past, we’ve seen this document management and content management value through some very large, complex, cumbersome, and frankly expensive, standalone management infrastructure that would, in a sense, find a way of bringing these structured and the unstructured worlds together. It seems to me you’ve found a quicker and more direct way of doing this, or am I overstating it?

Sorofman: I think that’s largely right, to the extent that, at author time, the content is created as XML, particularly when that XML is organized within a taxonomy that makes some sense and makes it discoverable in context. Then, that content can just be reused. It can be reused like any other data asset that’s richly described and that doesn’t require heavyweight infrastructure or sizable strategic investments in content infrastructure.

Gardner: Another thing that fascinates me about this topic is a problem with SOA, and that has been the disconnect between the people and the processes that the IT systems can support. We've heard it referred to as "human-oriented architecture," versus SOA. The people that are in the trenches, that are in maintenance types of activities, that are in highly compliance-oriented environments, need to adhere very closely to regulations, and the documents become the way that they do that.

It seems to me that if you take the documents that these people thrive on and create en masse, and make those available to the SOA and the composite business processes that that architecture is supporting, then you are able to bridge this gap between the people, the process, and the systems. Help me understand that a little better.

Sorofman: That makes a great deal of sense. Thus far we’ve been talking about the notion of unstructured content as a target source to SOA-based applications, but you can also think about this from the perspective of the end application itself -- the document as the endpoint, providing a framework for bringing together structured data, transactional data, relational data, as well as unstructured content, into a single document that comes to life.

Let me back up and give you a little context on this. You mentioned the various documents that line workers, for example, need to utilize and consume as the basis for their jobs. Documents have unique value. Documents are portable. You can download a document locally, attach it to an email, associate it with a workflow, and share it into a team room. Documents are persistent. They exist over a period of time, and they provide very rich context. They're how you bring together disparate pieces of information into a cohesive context that people can understand.

Documents allow information to stand alone. They're how knowledge is transferred, and how information is shared between people. Those are all the good things about documents. But, historically, documents have been a snapshot in time. So, even when you have embraced an XML publishing process, the document is published as a static artifact. It’s a snapshot in time. As the information feeding these documents changes, what you see within the documents as a published artifact is effectively out of date.

Gardner: I suppose one way that people have gotten around that is to create portals and Web applications, where there is a central way of controlling the data that gets distributed through many views and could be updated. I suppose there must be some drawbacks to the portal perspective. What do we do here? Do we take the best of a Web and portal application and the best of a document and try to bring them together?

Sorofman: Bingo! It’s really about blurring the lines between documents and data or documents and applications. So, you keep the portability, the persistence, and the rich context of a document, because documents matter, and sometimes an on-the-glass, portal-style application experience is just not a substitute for what you need out of the document.

But you're providing a container for much more dynamic and interactive information, and ensuring that what you find in that document is always authoritative, a direct reflection of the sources of truth in the enterprise. All this information is introduced as a set of persistent links back to the sources of record. What you are looking at isn’t an embedded snapshot. You are looking at a reflection of these various systems of record.
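
The difference between an embedded snapshot and a persistent link back to a source of record can be sketched in a few lines of Python. This is only an illustration of the idea as described in the discussion, not how xfy or any JustSystems product works; the `systems_of_record` dictionary, the link string, and the `resolve` and `render` helpers are all hypothetical names invented for the example.

```python
# Hypothetical stand-in for the enterprise systems of record.
# In practice these values would come from live services, not a dict.
systems_of_record = {"inventory/part-42/stock": 17}


def resolve(link):
    """Look up the current value behind a persistent link."""
    return systems_of_record[link]


# A static document embeds the value at publish time (a snapshot).
static_doc = f"Stock on hand: {resolve('inventory/part-42/stock')}"

# A dynamic document stores the link and resolves it at render time.
dynamic_doc = {
    "template": "Stock on hand: {value}",
    "link": "inventory/part-42/stock",
}


def render(doc):
    """Fill the template with whatever the source of record says right now."""
    return doc["template"].format(value=resolve(doc["link"]))


# The source of record changes after both documents were created.
systems_of_record["inventory/part-42/stock"] = 3

print(static_doc)           # still shows the value captured at publish time
print(render(dynamic_doc))  # reflects the current system of record
```

The static string goes stale as soon as the source changes, while the linked version is only ever as old as its last render.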

Gardner: I was reminded of the importance of the format of a document, just recently when I was doing some tax forms. It’s fine for me to have all this information on my computer about the numbers and the figures, but I have to then present that back to the IRS through this very refined and mandatory format. I need to bring these two together, and, once I have done that, I can see that the IRS is benefiting from the standardization that the format and document brings, and I am of course benefiting from the fact that I can bring fresh data into that.

But, we are now proposing instead these documents that hold value based on their format, their taxonomy, their relevance to a specific regulatory impetus or a vertical industry imperative. What we get beyond that is not just bringing that data from a Web application out, but from perhaps myriad applications and/or this entire SOA, and using the policy-driven benefits of an enterprise service bus (ESB) and governance to help direct the right data to the right document.

Sorofman: Absolutely. The other thing that I mentioned is making these documents semantically aware. The document actually becomes intelligent about its environment. It knows who you are as a user, what your role is, what your permission profile is.

Gardner: And that’s because of the XML that they can make that leap to intelligence?

Sorofman: Well, it’s actually because of the various dynamic document formats that are emerging today, including xfy from JustSystems. We provide the ability to embed this application logic within the document format. The document becomes very attuned to its environment, so it can render information dynamically, based on who you are, what your role is, and where the document is within a process. It can even interact with its environment. The example I would like to use is interactive electronic technical manuals (IETM) for aerospace and defense. These are all the methods and procedures for maintaining the aircraft, often very, very complex documents.

Gardner: We're talking about large tomes, not just a document, but really a publication.

Sorofman: Exactly, and there are really a couple of different issues at work here. The first is the complexity of a document makes it very difficult to keep it up to date. It’s drawing from many different sources of record, both structured and unstructured, and the problem is that when one of the data elements changes, the whole document needs to be republished. You simply can’t keep it up-to-date.

This notion of dynamic documents ensures that what you’re presenting is always an authoritative reflection of the latest version of the truth within the enterprise. You never run the risk of introducing inaccurate, out-of-date, or stale information to field-based personnel.

The second issue is pinpointing the information that someone needs in the context of the task they are performing, so, targeting the information appropriately. You can lose valuable minutes and hours by thumbing through manuals and trying to find the appropriate protocols for addressing a hydraulic fluid leak, for example.

The environment can actually ping the document. For example, a fault is detected in-flight, and the fault detection that happens in real time can actually interact with the document itself, ping the document, and serve up the set of methods and procedures that represent the fix that needs to be made when the plane reaches its destination. The maintenance crew can start picking the parts and preparing to make the fix before the plane lands.

Gardner: It almost sounds like we are bringing some of the benefits that people associate with search into the realm of documents, because they are now structured XML-published and authored documents. There’s XML integration among and between them and their sources. You could do a search and not just come up with an 800-page document, but a search within discrete aspects of that document.

Sorofman: That’s exactly right. You start seeing some blurring between all these categories of technology around information, search and retrieval, semantics, and document management and data integration. It’s all resulting in a much richer way of working with and utilizing information.

Gardner: So, we are bringing together what had been document management, content management, data integration, data mashups, compound documents, forms, and requirements for regulatory compliance. That’s why I think it relates to SOA so well.

We're finding a commonality between these, rather than having them be completely separate things that only people physically shuffling complex documents around their desktops could manage. We're starting to automate and bring the IT infrastructure to help in this mixing and matching between these formerly siloed activities.

Sorofman: Yes, pretty much so.

Gardner: Alright. One of the things that is a little bit complex for me is understanding the way that the content, the XML, and the data flows among and between documents, and then also how it could flow within the SOA. I think this is still a work in progress. We are really on the cutting edge of how these two different areas come together.

Maybe we could go a little bit into the blue-sky realm for a moment. How do you think the SOA architects should start thinking about dynamic documents, and, then perhaps conversely, how should those that are into structured document authoring start thinking of how that might benefit a larger SOA type of activity?

Sorofman: Great questions. To start with, I don’t think that SOA architects have given a great deal of thought to date to unstructured content and how it plays into SOA architectures. So, there certainly needs to be consideration paid to how you get the information in, in a way that makes it richly described, reusable, and more akin to relational data than to documents themselves.

Structured authoring needs to be part of the thinking around any company’s knowledge management (KM) strategy in general and with a specific importance around how it feeds into the overall SOA strategy. Today, I don’t think that there has really been an intersection between KM and SOA in this respect.

Structured authoring professionals need to start looking beyond their traditional domain of technical publications and into other areas where XML authoring is relevant and appropriate in the enterprise. That’s becoming much more broadly deployed and considered outside of traditional domains of tech-docs.

There’s also this convergence that’s happening between structured documents, structured authoring, and application development, particularly as it relates to this notion of dynamic documents that we are talking about. The creation of business-critical documents becomes much more akin to application development processes, where you are essentially assembling various reusable fragments and components from across the enterprise, into a document that’s really treated more like an application than a monolithic artifact itself, an application that has its own life cycle and needs to be treated and governed in more of an adaptive centric way. So, it’s starting to really impact people’s role in thinking, both from the architect side and on the traditional structured authoring side.

Gardner: Sure, it’s really about people, process, and policy coming together, not just inside the domain of IT, but in the domain of where people actually do their work and where they have traditionally done work for generations.

Sorofman: Very true.

Gardner: Okay, I think I get it now. But to help better understand this it's not just "tell." A lot of times it helps to "show." Can you give me some examples in the real world, where people are starting to move towards these values, where there are some use-case scenarios around dynamic documents extending beyond the document function and getting into application development too?

Sorofman: Absolutely. There are three usage patterns I like to speak about that are illustrative of dynamic documents and how they are being applied today. The first I call "information sharing" sort of broadly. It’s the idea of one-to-many dissemination of information in the form of a document, to various distributed field-based personnel.

A good example of that is the IETM, any kind of business-critical technical manual or a publication that needs to be shared with a variety of different people and where there is a very high cost of that information being either poorly targeted or easily out of date.

This is the idea of bringing together all these different information sources mashed up into the single dynamic document that comes to life. So, as the source information changes, what you see in that document changes and it also has the ability to be semantically intelligent about its environments, about the person who is accessing it, so it can render a view of information that’s appropriate to the context of its usage.

The second example is really taking the same concept of dynamic documents and applying it to collaborative processes, where you need to bring together various stakeholders internally and externally toward the goal of getting some sort of team based process executed or completed.

Think about something like sales and operations planning (S&OP), where you have various stakeholders cross-functionally come together periodically, maybe monthly or quarterly, to make trade-off decisions, horse-trading decisions, about which projects to invest in and which ones to disinvest in, and how to optimally align supply and demand.

That’s typically the sales and marketing group, the manufacturing group with a view of capacity and a view of inventory, and then the finance team with a view of return on investment, return on assets, and internal rate of return. These teams are coming together to work on making these decisions, and they often do this by sharing documents. They pull reports from all their various systems of record, manufacturing execution systems, inventory control systems, ERP systems, supply chain, CRM.

Even though these systems have fairly authoritative trustworthy information within them, as soon as you pull a report, it’s frozen in time. So, these teams tend to wrestle with validating and reconciling all this disconnected and static information, before they can make decisions. The dynamic document allows all this information to come together as an authoritative reflection of all these different source systems, but still allows these teams to work in the format they are most comfortable with, which is to say, documents.

Gardner: Because there is a semantic and intelligent aspect of this, this content has been shared collaboratively and would present itself to each of these individuals through a different document format, based on what it is that they are doing within their traditional role.

Sorofman: That’s exactly right. It will serve itself up dynamically, based on what’s appropriate for stakeholders to see, based on their permission profile or on their role. It could be a different level of abstraction or a little different level of detail. It can actually change the information that’s being displayed, based on where it is in a workflow process. The document can actually become aware of its workflow lifecycle state and render different information based on where it’s been, where it’s going, and where it is in the process.
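
As a rough illustration of a document serving up different views by role and workflow state, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. The XML vocabulary (`section`, `roles`, `state`) and the `render_for` function are assumptions made up for this example and do not represent the xfy format.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: each section declares which roles may see it and,
# optionally, the workflow state in which it becomes relevant.
DOC = """
<document>
  <section roles="sales finance">Demand forecast</section>
  <section roles="manufacturing">Plant capacity</section>
  <section roles="finance" state="review">Return on investment</section>
</document>
"""


def render_for(xml_text, role, state):
    """Return the text of the sections visible to this role in this state."""
    root = ET.fromstring(xml_text)
    visible = []
    for section in root.findall("section"):
        roles = section.get("roles", "").split()
        required_state = section.get("state")  # None means "any state"
        if role in roles and required_state in (None, state):
            visible.append(section.text)
    return visible


# The same document yields different renditions for different readers
# and for different points in the workflow.
print(render_for(DOC, "finance", "draft"))
print(render_for(DOC, "finance", "review"))
print(render_for(DOC, "manufacturing", "review"))
```

The content is stored once; only the filter applied at render time differs per reader, which is the contrast with the one-size-fits-all shared spreadsheet.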

Gardner: This is strikingly different than what's done by many organizations that I am aware of. They have one big spreadsheet that everyone shares, which really is sort of one-size-fits-all, which isn’t the way people really work.

Sorofman: Everyone has had some experience with spreadsheets gone wrong and the high cost and perverse consequences of trying to force-fit spreadsheets into critical planning processes. So, I think most people can empathize with this specific challenge.

Gardner: Alright. Let’s talk about the business case for this. Now, it sounds good theoretically. We’ve certainly got a technology that can help this productivity improvement by extending data in the formats that people are familiar with. There is compliance, and regulatory and risk reduction as a result.

And, of course, as we mentioned earlier, there is the sharing and repurposing and reusing of this across the SOA value stream in the business. But, dollars and cents, how do people go and say, “Wow, this sounds like a good idea. I want to convince somebody to invest in it, but I need to talk to them about return on investment.”

Sorofman: You can make a business case for this sort of approach from a very basic to a much more sophisticated level. At the most basic level, the ROI around XML authoring is pretty straightforward. Rather than authoring documents as monolithic artifacts, creating them as reusable components helps to accelerate and reduce the cost of creating new documents and deliverables, and it makes information much more reusable. That has a cost implication and a time-to-market implication.

If, for example, you are launching a product that’s highly dependent on documentation -- and documentation is typically one of the things that we do at the end of the product launch cycle -- that becomes a bottleneck that can have implications for forgone revenue, excessive cost, missed deadlines, etc.

There is also an issue around localization, multi-format output, and multi-channel output of this various content, taking the content, translating it into different languages and into different output formats.

Gardner: Localization. So, you have the same document format, but the input and output can be in a variety of different languages.

Sorofman: That’s exactly right.

Gardner: That would save a lot of time and money. Instead of the full soup-to-nuts translation, you only have to translate exactly the metadata that’s required.

Sorofman: That’s exactly right, and that’s a tremendous ROI. There are many companies that look at the ROI of XML authoring exclusively from the perspective of localization, and it’s often said to have between a 40 and 60 percent cost impact on localization itself.

Gardner: In fact, you are automating a large portion of the translation process.

Sorofman: Yes. Also, think about the change time implications of what we are talking about. In the traditional monolithic model, when you need to make changes to documentation, you are making changes across all the various documents that consume information fragments in all the various formats, in all the various localized versions, and all the derivations and permutations of an information source. That becomes extremely complex, extremely costly, and error prone.

In the XML authoring world, you are authoring once, publishing many times, and maintaining a single native format. So, you are maintaining that one reusable component and allowing those changes to be propagated across all the various consuming documents and deliverables.
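
The "author once, publish many" pattern can be sketched as follows, again in Python with the standard library's `xml.etree.ElementTree`. The `include` element, the `ref` attribute, the `components` library, and the `publish` function are hypothetical names chosen for this illustration; real structured-authoring toolchains have their own reuse mechanisms.

```python
import xml.etree.ElementTree as ET

# Hypothetical component library: each fragment is authored and maintained once.
components = {
    "warning-hydraulic": "<note>Depressurize the system before servicing.</note>",
}

# Two deliverables reference the same component by id instead of copying its text.
manual = "<doc><title>Maintenance Manual</title><include ref='warning-hydraulic'/></doc>"
quick_card = "<doc><title>Quick Reference</title><include ref='warning-hydraulic'/></doc>"


def publish(doc_xml):
    """Replace each <include> with the current content of the referenced component."""
    root = ET.fromstring(doc_xml)
    for index, child in enumerate(list(root)):
        if child.tag == "include":
            root[index] = ET.fromstring(components[child.get("ref")])
    return ET.tostring(root, encoding="unicode")


# One change to the shared component...
components["warning-hydraulic"] = (
    "<note>Depressurize and tag out the system before servicing.</note>"
)

# ...shows up in every deliverable the next time it is published.
print(publish(manual))
print(publish(quick_card))
```

Because the deliverables hold references rather than copies, there is exactly one place to edit, translate, and review, which is where the change-time savings described above come from.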

Gardner: And, because we are doing this separation, it also strikes me that there is a security benefit here. One of the things that troubles a lot of IT folk and keeps them up at night is the idea that there are different versions and copies of full-blown documents and databases, in some cases, on each and every PC and laptop, some of which may disappear in an airport. It strikes me that by separating this out, what might only go into some nefarious hands at the airport would be the form, but not the data.

Sorofman: It’s a great point.

Gardner: So, there’s a security benefit here as well, when you are able to control things, and not have all the dynamic data distributed at the end point, but, in a sense, communicated to that end point when it’s the right time.

Sorofman: Absolutely. I guess the benefits we are looking at are really these sorts of operational benefits of the XML authoring and how that impacts the bottom line and time to market etc. There are also bigger benefits that come from the actual consumption of dynamic documents, and how you ensure that you are only putting information in the hands of the people that need it, that it's always up-to-date.

That clearly has an implication for risk and compliance in many different application areas, and accelerating, improving, and optimizing business processes by eliminating the error introduction that comes from the re-keying of information between disconnected process steps, where documents are involved.

Gardner: So the human error factor goes down as well?

Sorofman: Dramatically.

Gardner: How does that work exactly?

Sorofman: Let me give you a quick example of one of the other usage patterns that’s worth speaking about. It's what I like to call "document process transformation." If you think about any business process flow, there are typically silos of automation, and these are the flows within the process that are highly tuned, very transactional, with virtually no human intervention.

They are highly automated, because they can be. Everything can be reduced down to a transaction and thus handled by machines, but then there are manual gaps between these silos of operations or automation that often eliminate, or at least erode, some of the benefits of automation.

These are typically highly human-centric phases of a process, often very document-centric. It’s where people need to get involved. For example, if you think of a loan application, in the front end of the application there is a form. It’s very form-based, and it’s about capturing information about the applicant.

Some of the information can be handled transactionally, so the form is able to send the information to a back-end system where it’s processed transactionally, but some of the information needs to be viewed and analyzed by human beings, who actually have to look at it in context and make a judgment about the applicant.

In the front end of the form, it becomes a transaction, and then it needs to be served up as a set of document renditions, based on the roles of the various people within the process who need to view it to make a judgment about the loan.

The document can actually morph as it moves through the process, based on what that person needs to see or what’s appropriate for them to see. At the end of the process, a judgment is made about the loan. It’s either approved or it’s rejected and it becomes a transaction again.

The information can be extracted from the document set itself automatically, pulled down to a back-end process, like "open the account," the account opening procedure, and then information can be extracted from the document set to serve a traditional publishing pipeline to send a custom acknowledgment letter back to the applicant, welcoming them to the bank, and letting them know that the loan has been approved.

So, you've gone from silos of automation separated by manual gaps, to a much more streamlined and straightened process, where you have transactions driving document renditions and document renditions driving transactions.

Gardner: This is a great example of why this is relevant for SOA. First off, you're talking about how the human input of the data needs to be improved -- and that’s the garbage-in, garbage-out value. If you are going to be reusing this data across multiple applications, you want to make sure that’s good data to begin with. So, that’s one value.

The other is this controlled workflow, an event-driven workflow, which again is part of what people are trying to build that SOAs will support, these composite workflow process oriented types of activities that are very much core to any business.

Then, the last fascinating aspect is the notion that we are combining what needs to be a human judgment with what is going to be a computer-driven process. These dynamic documents are, in a sense, giving little stop signs that say, “Stop, wait, let the human activity take place.” The human can relate back to the document, the document relates back to the business process, and the business process is managed and directed through the SOA.

Sorofman: That’s exactly right. As long as people are involved, there will be documents, but traditionally documents have been fairly unintelligent and inefficient in how they have been authored, organized, managed, and used as a basis for consuming information. This is just what documents have always wanted to be.

Gardner: I dare say that documents have been under-appreciated in the context of SOA.

Sorofman: I couldn’t agree more.

Gardner: Well, great! Thanks for shedding some more light on these issues. Tell us a little bit about how JustSystems works its value in regard to the dynamic documents that are now holding much more relevance in a larger SOA.

Sorofman: JustSystems has two product lines that are very relevant to this discussion. The first is the product called XMetaL, which is one of the leading structured authoring and publishing solutions that provides the basis for creating valid XML content, as part of the authoring process. I mentioned this idea of being able to create valid XML, as opposed to monolithic document artifacts at author time. This provides a basis for both technical authors, but also business authors. It’s sort of the occasional contributor or the subject matter expert, the accidental author, the occasional author to create valid XML without ever seeing an angle bracket, so a very intuitive WYSIWYG environment for creating XML as a byproduct of a very intuitive authoring process.

That’s how you feed the beast, how you get the XML into the systems, to make the content much more richly described and more reusable as part of downstream processes.

On the other side of the equation, we have a product line called xfy, which is a document-centric composite application framework that allows you to bring together all these various information sources, structured and unstructured, and mash them up within a single dynamic document application.

It’s blurring the lines between documents and applications, providing the user experience that people appreciate from a document, but with the authoritative, dynamic, and interactive information that has been most closely associated with traditional business applications. The document becomes the application.

Gardner: Of course, we are using XML, which is a standardized markup language. We are also going to be using vertical industry taxonomies and schemas that are shared, and, therefore, this is a fairly open opportunity to share and communicate and collaborate.

Sorofman: That’s right.

Gardner: Well, great! Thanks again. We’ve been talking about XML empowerment of documents and how to extend service-oriented architecture’s connection to people and process through these types of documents and structured authoring tools. To help us understand this, we have been talking with Jake Sorofman. He is the senior vice president of marketing and business development at JustSystems North America. Thanks for joining us, Jake.

Sorofman: Thank you, Dana.

Gardner: This is Dana Gardner, principal analyst at Interarbor Solutions. You have been listening to a sponsored BriefingsDirect podcast. Thanks, and come back next time.

Transcript of BriefingsDirect podcast on XML structured authoring tools and dynamic documents’ role in SOA. Copyright Interarbor Solutions, LLC, 2005-2008. All rights reserved.