FT Telecoms May 16 2001 - Voice recognition
VoiceXML is key to new web voice services
by Edwin Colyer
Published: May 14 2001 10:04GMT | Last Updated: May 16 2001 09:52GMT

First it was personal computers, then access to the internet. Now everyone wants a mobile phone. But all-pervasive technology does not just depend on clever marketing. Only when manufacturers agree on common standards do sales really take off.

Without international telephony standards, for instance, we would never make a call abroad. Without HTML (hypertext mark-up language), the world wide web would just be a handful of unconnected sites. And without VoiceXML, the development of a speech-powered internet would be the plaything of a few large corporations.

VoiceXML is a language for writing internet pages accessible by voice over the phone. Just as HTML is used to design visual web pages - where to put text and pictures, where links will take you - VoiceXML lays out their spoken counterparts.

It specifies when to play voice prompts, what speech to recognise and how to respond to touch-tone inputs, and it provides basic commands to control call connections. Most importantly, VoiceXML uses the same technology as the web. It connects the telephone to the internet - and promises the explosion of a so-called voice web.
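To make that concrete, here is a minimal sketch, in Python, that assembles a small VoiceXML-style page containing the four ingredients just described: a spoken prompt, a speech grammar, a touch-tone grammar and a call transfer. The element names (vxml, form, field, prompt, grammar, dtmf, transfer) follow the VoiceXML 1.0 vocabulary as generally described; the form name, the prompt wording, the grammar contents and the phone number are hypothetical placeholders, and real platforms each expect their own grammar format.

```python
import xml.etree.ElementTree as ET


def build_voice_page():
    """Assemble an illustrative VoiceXML-style page (a sketch, not a validated document)."""
    vxml = ET.Element("vxml", version="1.0")

    # A form is the spoken counterpart of a web form: it collects input from the caller.
    form = ET.SubElement(vxml, "form", id="weather")  # "weather" is a hypothetical name
    field = ET.SubElement(form, "field", name="city")

    # When to play a voice prompt.
    prompt = ET.SubElement(field, "prompt")
    prompt.text = "Which city would you like the weather for?"

    # What speech to recognise (placeholder grammar; the format is vendor-specific).
    grammar = ET.SubElement(field, "grammar")
    grammar.text = "london | paris | new york"

    # How to respond to touch-tone input (placeholder key grammar).
    dtmf = ET.SubElement(field, "dtmf")
    dtmf.text = "1 | 2 | 3"

    # A basic call-control command: hand the caller to another number (placeholder).
    ET.SubElement(form, "transfer", name="operator", dest="phone://0000000000")
    return vxml


page = build_voice_page()
xml_text = ET.tostring(page, encoding="unicode")
print(xml_text)
```

The page is built with an ordinary XML library and could be served by an ordinary web server, which is the sense in which VoiceXML "uses the same technology as the web".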

VoiceXML 1.0, the current version of the standard, stems from the collaboration of four big voice experts: IBM, Lucent, AT&T and Motorola. "We talked to our customers about using the web to deliver voice services," says Pete Danielsen, a researcher at Lucent's Bell Labs. "But companies didn't want to learn another proprietary language."

Developing a common mark-up language made sense. Similar to HTML, it would put the design of voice pages within the scope of many companies and help voice applications to expand. Furthermore, it would separate the proprietary aspects of voice services from the universal functions, making applications more portable across different telephony and web platforms.

To cut down on the time-consuming discussions that usually accompany the development of industry standards, IBM, Lucent, AT&T and Motorola made an astute decision: they thrashed out the specifications of VoiceXML 1.0 between themselves, with only a little consultation. At the same time they encouraged its support through their VoiceXML Forum.

But limiting conflict has also limited VoiceXML's utility. Some developers claim that it is not robust, ignores many scenarios, and is hardly a standard at all. "VoiceXML is open to interpretation," acknowledges Mr Danielsen. "Even so, it will be beneficial to customers and vendors as something against which products can be compared."

"In the interests of getting 1.0 out, there were things left out," admits Bill Dykas, strategic alliance manager at IBM and the current chairman of the VoiceXML Forum. "These other factors are now being built around the standard."

"You don't get a technical leap with VoiceXML," echoes Steve Chirokas, director of product and channel marketing at SpeechWorks. "Hardly any installations use it at the moment because it is still not standardised enough.

"Adoption won't happen overnight, but we're excited by this. People will use VoiceXML once it has solidified."

This "solidification" will take the form of Version 2.0, now in the final stages of approval in the Voice Browser working group of the World Wide Web Consortium (W3C), the internationally recognised body that sets internet standards. The W3C took over the technical development of VoiceXML from the VoiceXML Forum when it adopted Version 1.0 as the working standard.

Dave Raggett, a staff member of the W3C, sees the development of the voice web from a much wider perspective. VoiceXML 2.0, for instance, will provide standard grammar rules for speech dialogues, but the Voice Browser group is working on much more.

Mr Raggett lists natural language processing, a speech synthesis mark-up language (used to create speech from text "on the fly") and the development of standards for multimodal systems (Voice-in/Wap-out, for example). He also mentions a standard for marking pronunciation.

"An engine uses its own rules to go from text to phonetics. If a developer finds that the engine he's using doesn't recognise the terms, he has to go to the vendor. It would be better if he could use a standard way of showing how a piece of text should be pronounced."

Despite the flaws, many companies are doing their bit to drive VoiceXML's acceptance. SpeechWorks, for instance, has made its VoiceXML interpreter available as open source. It also offers developers access to so-called speech links, which allow the transfer of both voice and data over an internet connection.

Nuance has taken another approach by offering several pre-built modules for common voice operations. "At the moment, there are great limits to the standard. If a web developer doesn't know much about speech then our SpeechObjects may help," says John Shea, director of product marketing and management. "There's an art to building dialogue flow and SpeechObjects encapsulate the 'best of breed' understanding."

Nuance has also submitted to the W3C around 25 SpeechObjects which could be incorporated into the standard as set routines.

While developers wait for Version 2.0, there is little doubt that VoiceXML will soon be defining and expanding the voice web. "The goal of VoiceXML is to move forward," says Bill Dykas. "Today, you can write applications that are 70-80 per cent portable. One or two functions aren't standard, but it's much better than it was."

Mr Dykas says that for all its apparent weaknesses, VoiceXML will drive the expansion of voice applications. "When you create a standard, you create a certain commoditisation which drives down the cost of technology," he says.

Voice applications may never be as prevalent as the world wide web, where people jump about for information or surf around for fun. Instead, users will access voice sites for very specific information - there will be far less interlinking between separate sites. Consequently, as interoperability is less of an issue, there may not be the same convergence of the VoiceXML standard as seen with HTML. Even so, Mr Shea of Nuance believes that VoiceXML marks the beginning of a "voice era".

"The acceptance of VoiceXML will enable people with not a lot of speech skill to build an application. Just like there's an art to good visual web design, there's an art to laying out a dialogue, but the people who implement the application could be an average web developer. Making voice accessible to these millions of developers could get the web dynamics to happen in the voice world."