Friday, March 16, 2012

Information Agility and Data Virtualization: A Thought Experiment

In an agile world, what is the value of data virtualization?

For a decade, I have been touting the real-world value-add of what is now called data virtualization – the ability to see disparate data stores as one, and to perform real-time data processing, reporting, and analytics on the “global” data store – in the traditional world of IT and un-agile enterprises. However, in recent studies of agile methodologies in software development and in other areas of the enterprise, I have found that technologies and solutions well-adapted to delivering risk management and new-product development in the traditional short-run-cost/revenue enterprises of the past can actually inhibit the agility, and therefore the performance, of an enterprise seeking the added advantage that true agility brings. In other words: your six-sigma, quality-above-all approach to new product development may seem to be doing good things for margins and the bottom line, but it is preventing you from implementing an agile process that delivers far more.

In a fully agile firm – and most organizations today are far from that – just about everything seems mostly the same in name and function, but the CEO or CIO is seeing it through different eyes. The relative values of people, processes, and functions change, depending no longer on their value in delivering added profitability (that’s just the result), but rather on the speed and effectiveness with which they evolve (“change”) in a delicate dance with the evolution of customer needs.

What follows is a conceptualization of how one example of such a technology – data virtualization – might play a role in the agile firm’s IT. It is not all that theoretical: it is based, among other things, on findings (which my employer at the time, Aberdeen Group, has allowed me to cite) about the agility of the information handling of the typical large-scale organization. I find that, if anything, data virtualization – and here I must emphasize, if it is properly employed – plays an even more vital role in the agile organization than in the traditional one.

Information Agility Matters

One of the subtle ways that the ascendance of software in the typical organization has played out is that, wittingly or unwittingly, the typical firm is selling not just applications (solutions) but also information. By this, I don’t just mean selling data about customers, the revenues from which the Facebooks of the world fatten themselves on. I also mean using information to attract customers, to sell to them, and to make follow-on sales, as well as selling information itself (product comparisons, for example) to customers.

My usual example of this is Amazon’s ability to note customer preferences for certain books/music/authors/performers, and to use that information to pre-inform customers of upcoming releases, adding sales and increasing customer loyalty. The key characteristic of this type of information-based sale is knowledge of the customer that no one else can have, because it involves their sales interactions with you. And that, in turn, means that the information-led sell is on the increase, because the shelf life of application comparative advantage is getting shorter and shorter – but comparative advantage via proprietary information, properly nurtured, is practically unassailable. In the long run, applications keep you level with your competitors; information lets you carve out niches in which you are permanently ahead.

This, however, is the traditional world. What about the agile world? Well, to start with, the studies I cited earlier show that in almost any organization, information handling is not only almost comically un-agile but also almost comically ineffective. Think of information handling as an assembly line (yes, I know, but here it fits). More or less, the steps are:

1. Inhale the data into the organization (input)

2. Relate it to other data, including historical data, so the information in it is better understood (contextualize)

3. Send it into data stores so it is potentially available across the organization if needed (globalize)

4. Identify the appropriate people to whom this pre-contextualized information is potentially valuable, now and in the future, and send it as appropriate (target)

5. Display the information to the user in the proper context, including both corporate imperatives and his/her particular needs (customize)

6. Support additional ad-hoc end-user analysis, including non-information-system considerations (plan)

The agile information-handling process, at the least, needs to add one more task:

7. Constantly seek out new types of data outside the organization and use those new data types to drive changes in the information-handling process – as well, of course, as in the information products and the agile new-information-product-development processes (evolve).
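To make these seven steps a bit more concrete, here is a minimal sketch of the pipeline in Python. It is purely illustrative – the function names, arguments, and data shapes are my own inventions, not any vendor’s API – but it shows where each stage sits relative to the others.

```python
# A toy, purely illustrative model of the seven-step information-handling
# pipeline described above. Names and data shapes are hypothetical.

def inhale(raw):                                  # 1. input
    return {"payload": raw}

def contextualize(record, history):               # 2. contextualize
    topic = record["payload"].get("topic")
    record["context"] = [h for h in history if h.get("topic") == topic]
    return record

def globalize(record, global_store):              # 3. globalize
    global_store.append(record)
    return record

def target(record, users):                        # 4. target
    topic = record["payload"].get("topic")
    return [u for u in users if topic in u["interests"]]

def customize(record, user):                      # 5. customize
    return {"user": user["name"], "view": record["payload"], "context": record["context"]}

def plan(view):                                   # 6. plan (ad-hoc analysis hook)
    return {"analysis_ready": True, **view}

def evolve(known_types, observed_types):          # 7. evolve (the agile addition)
    return set(observed_types) - set(known_types)

# Example run-through with made-up data:
store = []
history = [{"topic": "churn", "note": "last quarter's churn analysis"}]
rec = globalize(contextualize(inhale({"topic": "churn", "text": "new churn signal"}), history), store)
for user in target(rec, [{"name": "analyst", "interests": ["churn"]}]):
    print(plan(customize(rec, user)))
print(evolve({"relational", "csv"}, {"relational", "csv", "social-media"}))  # -> {'social-media'}
```

Even in toy form, the point is visible: each stage is a hand-off, and each hand-off is a place where information can leak.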

The understandable but sad fact is that in traditional IT and business information handling, information “leaks” and is lost at every stage of the process – not seriously if we consider any one stage on its own, but (by self-reports that seem reasonable) yielding a loss of more than two-thirds of actionable information by the end of steps one through six. And “fixing” any one stage completely will only improve matters by about 11% – after which, as new types of data arrive, that stage typically begins leaking again.

Now add agile task 7, and the situation is far worse. By self-reports from three years ago (my sense is that things have only marginally improved since), users inside an organization typically see important new types of data, on average, half a year or more after those data types surface on the Web. Considering the demonstrated importance of things like social-media data to the average organization, as far as I’m concerned, that is about the same as saying that one-half of the information remaining after steps one through six lacks the up-to-date context to be truly useful. And so the information leading to effective action that actually reaches the user is now about 17% of what should have been delivered – half of the roughly one-third that survived steps one through six.
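For the skeptical, a quick back-of-the-envelope check of that arithmetic in Python, using the self-reported figures above (the fractions are the article’s estimates, not measurements of mine):

```python
surviving_steps_1_to_6 = 1 / 3   # roughly one-third of actionable information survives the six-step leak
still_in_context = 1 / 2         # half of what survives still has up-to-date context when task 7 is neglected
print(f"{surviving_steps_1_to_6 * still_in_context:.0%}")  # -> 17%
```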

In other words, task 7 is the most important task of all. Stay agilely on top of the Web information about your customers, or of key changes in your regulatory, economic, and market environment, and you will almost inevitably find ways to improve your ability to get that information to the right users in the right way – as today’s agile ad-hoc analytics users (so-called business analysts) are suggesting is at least possible. Fail to deliver on task 7, and the “information gap” between you and your customers will remain, while your information-handling process falls further and further out of sync with the new mix of data, only periodically catching up again, and meanwhile doubling down on the wrong infrastructure. Ever wonder why it’s so hard to move your data to a public cloud? It’s not just the security concerns; it’s all that sunk cost in the corporate data warehouse.

In other words, in information agility, the truly agile get richer, and the not-quite-agile get poorer. Yes, information agility matters.

And So, Data Virtualization Matters

It is one of the oddities of the agile world that “evolution tools” such as more agile software development tools are not less valuable in a methodology that stresses “people over processes”, but more valuable. The reason why is encapsulated in an old saying: “lead, follow, or get out of the way.” In a world where tools should never lead, and tools that get out of the way are useless, evolution tools that follow the lead of the person doing the agile developing are priceless. That is, short of waving a magic wand, they are the fastest, easiest way for the developer to get from point A to defined point B to unexpected point C to newly discovered point D, etc. So one should never regard a tool such as data virtualization as “agile”, strictly speaking, but rather as the best way to support an agile process. Of course, if most of the time a tool is used that way, I’m fine with calling it “agile.”

Now let’s consider data virtualization as part of an agile “new information product development” process. In this process, the corporation’s traditional six-step information-handling process is the substructure (at least for now) of a business process aimed at fostering not just better reaction to information but also better creation and fine-tuning of combined app/information “products.”

Data virtualization has three key, time-tested capabilities that can help in this process. First is auto-discovery. From the beginning, one of the nice side benefits of data virtualization has been that it will crawl your existing data stores – or, if instructed, the Web – and find new data sources and data types, fit them into an overall spectrum of data types (from unstructured to structured), and represent them abstractly in a metadata repository. In other words, data virtualization is a good place to start proactively searching out key new information as it sprouts outside “organizational boundaries”, because these tools have the longest experience and the best “best practices” at doing just that.
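As a rough illustration of what auto-discovery amounts to in practice, here is a minimal sketch of a crawler that scans a list of candidate sources, classifies each one on the unstructured-to-structured spectrum, and registers anything new in a metadata repository. The source list, the classification rule, and the repository structure are all hypothetical, not any vendor’s actual interface.

```python
# Hypothetical auto-discovery loop: scan candidate sources, classify the data
# type, and register anything new in a (toy) metadata repository.

metadata_repository = {}  # name -> {"kind": ..., "location": ...}

candidate_sources = [
    {"name": "orders_db",      "location": "jdbc://warehouse/orders", "sample": {"order_id": 1}},
    {"name": "support_tweets", "location": "https://example.invalid", "sample": "loving the new release!"},
]

def classify(sample):
    # Crude stand-in for real profiling: dicts look structured, text looks unstructured.
    if isinstance(sample, dict):
        return "structured"
    if isinstance(sample, str):
        return "unstructured"
    return "semi-structured"

def discover(sources):
    newly_registered = []
    for src in sources:
        if src["name"] not in metadata_repository:
            metadata_repository[src["name"]] = {
                "kind": classify(src["sample"]),
                "location": src["location"],
            }
            newly_registered.append(src["name"])
    return newly_registered

print(discover(candidate_sources))  # -> ['orders_db', 'support_tweets']
```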

Data virtualization’s second key agile-tool capability is global contextualization. Data virtualization is not a cure-all for understanding every relationship between an arriving piece of data and the data already in your stores, but it does provide the most feasible way of pre-contextualizing today’s data. A metadata repository has proven time and again to be the best real-world compromise between over-abstraction and zero context, and a global one can handle any data type thrown at it. For global contextualization, a global metadata repository that holds an abstraction of all your present-day key information is your most agile tool. It does have some remaining problems with evolving existing data types, but that is a discussion for another day – not important until our information agility reaches a stage that is still far ahead.
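To give a feel for what pre-contextualizing via a global metadata repository means, here is a small, hypothetical sketch: a newly discovered source’s fields are matched against canonical business entities already abstracted in the repository, so that arriving records carry at least a minimal link to existing context. The entity names and the matching rule are mine, purely for illustration.

```python
# Toy global contextualization: link a new source's fields to canonical
# entities already abstracted in the metadata repository (names are hypothetical).

canonical_entities = {
    "customer": {"customer_id", "email", "name"},
    "product":  {"sku", "product_name", "price"},
}

def contextualize_source(source_fields):
    """Return the canonical entities this source appears to describe."""
    links = {}
    for entity, fields in canonical_entities.items():
        overlap = fields & set(source_fields)
        if overlap:
            links[entity] = sorted(overlap)
    return links

# A newly discovered social-media feed that mentions customers and products:
print(contextualize_source({"email", "handle", "sku", "sentiment"}))
# -> {'customer': ['email'], 'product': ['sku']}
```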

Data virtualization’s third key agile-tool capability is the “database veneer.” This means that it allows end users, applications, and (although this part has not been used much) administrators to act as if all enterprise data stores were one gigantic data store, with near-real-time data access to any piece of data. It is a truism in agile development that the more high-level the covers-all-cases agile software on which you build, the more rapidly and effectively you can deliver added value. The database veneer that covers all data types, including ones that are added over time, means more agile development of both information and application products on top of the veneer. Again, as with the other two characteristics, data virtualization is not the only tool out there to do this; it’s just the one with a lot of experience and best practices to start with.
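Here is a minimal sketch of the “database veneer” idea: one query interface, with the fan-out to heterogeneous back ends (a relational table and a document-style feed, both faked in memory) hidden behind it. The class and method names are hypothetical and are not meant to represent any particular product’s API.

```python
import sqlite3

# Back end 1: a relational store (in-memory SQLite standing in for a warehouse).
rdb = sqlite3.connect(":memory:")
rdb.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
rdb.execute("INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace')")

# Back end 2: a document-ish store (a list of dicts standing in for a NoSQL feed).
doc_store = [{"id": 3, "name": "Alan"}, {"id": 4, "name": "Edsger"}]

class VirtualCustomerView:
    """Toy 'database veneer': one logical view over two physical stores."""

    def rows(self):
        # Fan out to each back end, normalize, and present one result set.
        for cid, name in rdb.execute("SELECT id, name FROM customers"):
            yield {"id": cid, "name": name, "source": "warehouse"}
        for doc in doc_store:
            yield {"id": doc["id"], "name": doc["name"], "source": "feed"}

# The consumer sees one data store, not two.
for row in VirtualCustomerView().rows():
    print(row)
```

The point of the sketch is the shape, not the code: consumers program against one logical view, and new back ends can be added behind it without those consumers having to change.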

And, for some readers, that will bring up the subject of master data management, or MDM. In some agile areas, MDM is the logical extension of data virtualization, because it begins to tackle the problems of evolving old data types and integrating new ones in a particular use case (typically, customer data) – which is why it offers effectively identical global contextualization and database-veneer capabilities. However, because it does not typically have auto-discovery and is usually focused on a particular use case, it isn’t as good an agile-tool starting point as data virtualization. In agile information-product evolution, the less agile tool should follow the lead of the more agile one – or, to put it another way, a simple agile tool that covers all cases trumps an equivalently simple agile tool that only covers some.

So, let’s wrap it up: You can use data virtualization, today, in a more agile fashion than other available tools, as the basis of handling task 7, i.e., use it to auto-discover new data types out on the Web and integrate them continuously with the traditional information-handling process via global contextualization and the database veneer. This, in turn, will create pressure on the traditional process to become more agile, as IT tries to figure out how to “get ahead of the game” rather than drowning in a continuously-piling-up backlog of new-data-type requests. Of course, this approach doesn’t handle steps 4, 5, and 6 (target, customize, and plan); for those, you need to look to other types of tools – such as (if it ever grows up!) so-called agile BI. But data virtualization and its successors operate directly on task 7, directly following the lead of an agile information-product-evolution process. And so, in the long run, since information agility matters a lot, so does data virtualization.

Some IT-Buyer Considerations About Data Virtualization Vendors

To bring this discussion briefly down to a more practical level: if firms are really serious about business agility, then they should consider the present merits of the various data virtualization vendors. I won’t favor one over another, except to say that I think today’s smaller vendors – Composite Software, Denodo, and possibly Informatica – should be considered first, for one fundamental reason: they have survived over the last nine years in the face of competition from IBM, among others.

The only way they could have done that, as far as I can see, is to constantly stay one step ahead in the kinds of data types that customers wanted them to support effectively. In other words, they were more agile – not necessarily very agile, but more agile than an IBM or an Oracle or a Sybase in this particular area. Maybe large vendors’ hype about improved agility will prove to be more than hype; but, as far as I’m concerned, they have to prove it to IT first, and not just in the features of their products, but also in the agility of their data-virtualization product development.

The Agile Buyer’s Not So Bottom Line

Sorry for the confusing heading, but it’s another opportunity to remind readers that business agility applies to everything, not just software development. Wherever agility goes, it is likely that speed and effectiveness of customer-coordinated change are the goals, and improving top and bottom lines the happy side-effect. And so with agile buyers of agile data-virtualization tools for information agility – look at the tool’s agility first.

To recap: the average organization badly needs information agility. Information agility, as in other areas, needs tools that “follow” the lead of the person(s) driving change. Data virtualization tools are best suited for that purpose as of now, both because they are best able to form the core of a lead-following information-product-evolution toolset, and because they integrate effectively with traditional information-handling and therefore speed its transformation into part of the agile process. Today’s smaller firms, such as Composite Software and Denodo, appear to be farther along the path to lead-following than other alternatives.

Above all: just because we have been focused on software development and related organizational functions, that doesn’t mean that information agility isn’t equally important. Isn’t it about time that you did some thought experiments of your own about information agility? And about data virtualization’s role in information agility?
