FT IT - November 7 2001
The key to successful searches
By Paola di Maio
Published: November 5 2001 13:03GMT | Last Updated: November 5 2001 15:30GMT

In a world where corporate data is proliferating at an unprecedented rate, the cost of managing information is very high. But the cost of not managing it is even higher.

To support and optimise knowledge organisation in complex environments, a new class of IT products is gaining popularity with enterprises: taxonomy creation software.

Taxonomies, normally associated with the plant and animal world, are classification structures, and their definition can vary. Jim Nisbet, senior vice-president of California-based Semio, describes a taxonomy as a "systematic classification of a conceptual space".

Semio is one of the leading vendors in the emerging "information categorisation solutions" market. Other big names include Autonomy and Verity, and a string of smaller, specialist companies are active too.

The technologies used by these vendors vary widely (see separate article on solutions providers). However, all the vendors believe that a well-defined taxonomic structure, used as part of a corporate information system, can address some of the inefficiencies caused by the limitations and imperfections of current search and retrieval methods.

As San Francisco-based EoExchange, another of the vendors in this field, points out, corporations are implementing business-to-employee portals with the common goal of helping employees find information.

Today's portal solutions, it says, "present employees with multiple search boxes that provide only a lens into the information contained in enterprise applications. What staff really want is universal search - the ability to search across all information sources at once".

Taxonomies, says EoExchange, are the cornerstone of a successful universal search application. "The fundamental purpose of a taxonomy is to lay structure over content so it can be categorised and organised to improve relevancy of search results."

However, it is still early days for these technologies. "Most products are still untested on a large scale, and the market is still small," says Dan Rasmus of Giga Information Group.

"The lack of accuracy of classification has been the most noted reason to date for enterprises to suspend their effort in this area.

"There is a clear trend towards the adoption of more structured data models, contributing to a proliferation of products in the market," adds Mr Rasmus.

There is general agreement about what an ideal taxonomy should offer. "It should be user-oriented, transparent and be able to sustain development," says Alan Gilchrist, senior associate with UK-based TFPL, an information management solutions provider and consultancy.

"Among essential functionalities we find the automatic categorisation of text topics by machine analysis, coupled with editing facilities to allow human intervention at any stage of a project."

"It is important to remember that a taxonomy should be built around users' requirements," agrees Bob Ainsbury, chief executive at EoExchange.

"A well-designed end product will demand the expertise of different professional figures within the organisation, such as domain expert, search and portal expert, classification expert as well as IT and business managers."

A good taxonomy, he adds, should include metadata or "data about data", providing information about the content of a document when available, yet still classify documents that are poorly tagged or described.
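As a rough illustration of that behaviour, the sketch below prefers a document's metadata tag when it names a known category and falls back to simple keyword matching on the text when the tag is missing or unrecognised. The category names, keywords and the `classify` function are hypothetical, invented for this example; none of the vendors' products is claimed to work this way.

```python
import re

# Hypothetical taxonomy: category name -> indicative keywords (assumed for illustration).
TAXONOMY = {
    "finance": {"invoice", "budget", "revenue"},
    "research": {"trial", "compound", "assay"},
}

def classify(document):
    """Return a category, preferring explicit metadata over text analysis."""
    tag = document.get("metadata", {}).get("category")
    if tag in TAXONOMY:
        return tag  # trust the tag when it names a known category
    # Fallback for poorly tagged documents: count keyword hits in the body text.
    words = set(re.findall(r"[a-z]+", document.get("text", "").lower()))
    scores = {name: len(words & keywords) for name, keywords in TAXONOMY.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unclassified"
```

A document tagged "finance" is filed there regardless of its text, while an untagged document that mentions a compound and an assay would land in "research". Real products would use far richer text analysis than keyword counts.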

It is important, says Mr Ainsbury, that taxonomy developers have "a solid understanding of not only the semantic meaning and context of the content being classified, but also the applications and technologies environment used to build and support the taxonomy".

One important use of taxonomies in the corporate world is to support the "reasoning" that allows knowledge management functions, such as elicitation and storage, to be partially automated.

Another crucial business problem, where management often fails to succeed and which technology may help to solve, is the integration of information systems after the merger of two large companies, or their separation after a de-merger. A well-defined taxonomic structure will make those processes more manageable.

One of the most significant projects in this area that Mr Gilchrist recalls is the combination of various sets of scientific thesauri required by the merger in 1995 of Glaxo and Wellcome, the two large UK pharmaceutical companies.

Taxonomies can also help in the creation of an explicit and functional map of an organisation's knowledge base.

This facility will aid the valuation and management of its digital assets and intellectual property, a task often vastly underestimated by managements.

However, using taxonomies in the corporate search effort is not without its problems. For a start, taxonomy creation software can start operating only after humans have defined an ontology - a high-level categorisation hierarchy - for the relevant domains, says Mr Nisbet at Semio.

The need for human intervention in the categorisation process seems widely accepted, as most emerging technologies tend to integrate human editing functionalities with default automatic categorisation functions.
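The division of labour described here can be sketched in a few lines: people define the hierarchy, software proposes a category by default, and an editor's manual assignment takes precedence. Everything in the sketch (the `ONTOLOGY` contents, the keyword matching and the function names) is an assumption made for illustration, not a description of any product named in this article.

```python
# Hypothetical human-defined hierarchy: domain -> subcategory -> keywords.
ONTOLOGY = {
    "science": {"chemistry": {"compound", "reaction"}, "biology": {"cell", "gene"}},
    "business": {"mergers": {"merger", "acquisition"}},
}

def auto_categorise(text):
    """Return the (domain, subcategory) path with the most keyword hits, or None."""
    words = set(text.lower().split())
    best_path, best_hits = None, 0
    for domain, subcategories in ONTOLOGY.items():
        for subcategory, keywords in subcategories.items():
            hits = len(words & keywords)
            if hits > best_hits:
                best_path, best_hits = (domain, subcategory), hits
    return best_path

def categorise(doc_id, text, overrides):
    """An editor's manual assignment, if present, wins over the automatic one."""
    return overrides.get(doc_id, auto_categorise(text))
```

Passing an `overrides` dictionary such as `{"d1": ("science", "chemistry")}` forces document `d1` into that category whatever its text says; documents without an override receive the automatic suggestion.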

Secondly, the limitations of knowledge representation systems are widely acknowledged. "Hybrid" data, which cannot easily be classified in one category or another, does not fit well in predefined structures, and it could be impractical to expand the taxonomic hierarchy indefinitely to accommodate the exceptions.

Finding the optimal level of "data granularity" - the balance between generalisation and detail - is also a challenge, and one that calls for skilled, if somewhat arbitrary, human judgement.

Furthermore, taxonomies are relatively static structures unless they are linked dynamically to data sources. "They provide a datum point or semi-permanent structure, but to be effective must be capable of being updated. These problems are not insoluble," says Mr Gilchrist.

Finally, there are the traditional problems associated with integrating new technologies with legacy systems. This will be a continuing challenge for the software industry as more companies seek to add taxonomies to their existing information system infrastructures.

Paola di Maio is editor of Content-Wire: www.content-wire.com