FT Telecoms May 16 2001 - Voice recognition
Technology that needs to speak up for itself
by Fiona Harvey
Published: May 14 2001 11:01GMT | Last Updated: July 18 2001 09:32GMT

It sounds like the most obvious thing in the world: how else would one communicate with a phone but by voice? And yet we do not - only half of Nokia's current range of handsets incorporates voice recognition technology, for example, and most older units still in use lack it. While we chatter away happily to other people, when we communicate with the phones themselves we are tied to a tiny alphanumeric keypad.

We take the technology involved in sending voice conversations between mobile devices for granted. But the technology required to take the spoken word and convert it into terms that a digital device can recognise turns out to be far more complex.

It requires a means to recognise the words spoken, then convert them into the relevant instructions. However, recognising speech is notoriously hard for computers, so voice recognition on handsets has mostly been limited to a few basic commands letting users call people in their contacts book without using the keyboard. In fact, the only thing voice command technology really has in common with normal phone technology is the means of collecting the voice signal - the microphone within the handset.
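
As a rough illustration of the second step, turning recognised words into an instruction, the sketch below assumes the recogniser has already produced a text transcript. It is not how any particular handset works: the contacts, numbers, trigger words and the `interpret` function are all invented for this example, and the fuzzy matching from Python's standard `difflib` module merely stands in for a phone's tolerance of slightly misheard names.

```python
import difflib

# Hypothetical contacts book; names and numbers are invented for illustration.
CONTACTS = {
    "alice": "+44 20 7946 0001",
    "bob": "+44 20 7946 0002",
    "office": "+44 20 7946 0003",
}

# Assumed trigger words for the single supported command.
DIAL_WORDS = ("call", "ring", "dial")


def interpret(transcript, contacts=CONTACTS):
    """Turn recognised text such as 'call alice' into an (action, number) pair."""
    words = transcript.lower().split()
    if len(words) < 2 or words[0] not in DIAL_WORDS:
        return ("unknown", None)
    spoken_name = " ".join(words[1:])
    # Tolerate small recognition errors by picking the closest contact name.
    matches = difflib.get_close_matches(spoken_name, list(contacts), n=1, cutoff=0.6)
    if not matches:
        return ("unknown", None)
    return ("dial", contacts[matches[0]])


if __name__ == "__main__":
    print(interpret("call alice"))
    print(interpret("play music"))
```

The hard part, recognising the speech in the first place, is deliberately left out; the sketch only shows why a handful of fixed commands tied to the contacts book is a comparatively easy target.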

The advantages of having speech recognition built into the phone are that it frees up our hands - useful when driving, for instance - and allows the handsets to adopt completely different interfaces to the conventional 10 alphanumeric buttons.

As Hitesh Seth, chief technology evangelist for software company Silverline Technologies, says: "Let's face it, using a cellphone's keypad to enter numeric information such as your phone or credit card numbers, let alone alphanumeric information such as your name and address, can be a challenge."

As safety concerns about emissions drive more people to use hands-free sets, and as manufacturers look to radically overhaul phone design to incorporate the new capabilities third-generation networks will bring, both these considerations take on greater significance.

The first voice-enabled phones started appearing in the mid-1990s, to little excitement. Philips, for instance, was one of the first electronics companies to take up the technology, and for several years has been producing phones capable of being controlled by simple commands - say the name of the person you want to ring, for example, and the phone will open the address book, find the number and ring it automatically. Higher-end phones have slightly more advanced voice features: the Genie range lets users record a voice command and associate it with a specified function.

In the UK, voice recognition has been championed chiefly by Orange, with its Wildfire product.

Quest to find the 'killer application'

Users seem to have been slow to take to the new capabilities, however. Bernt Ostergaard, research director at the Giga Information Group, says: "Voice recognition is just a nice feature to users, not a 'killer application'."

Ray Woabank, mobile commerce researcher at the Butler Group research company, believes that mobile phone companies are wary of pushing users too hard towards voice recognition because the technology does not yet deliver high enough quality, and they fear turning users off.

They want to avoid another scenario such as Wap (Wireless Application Protocol), the format for presenting internet pages on mobile phone screens, which has so far failed to live up to the expectations created in its initial marketing.

When 3G services start arriving next year, that situation may change. Operators want users to take advantage of 3G capabilities, such as e-mail and internet access, so they have encouraged handset manufacturers to accommodate these functions. In order to have room for a screen, some of these handsets dispense with the conventional buttons. Many use handwriting recognition or touch screens with styluses, but manufacturers are also experimenting with speech-driven interfaces.

However, analysts such as Mr Ostergaard believe that the real developments in voice recognition will not be on the handset, but at the back end. Voice-enabled servers running websites will allow us to speak to sites instead of having to navigate through a normal browser. The most important technology in this space, says Mr Ostergaard, is the VoiceXML (Voice eXtensible Mark-up Language) standard. "VoiceXML will be the real crunch - and in two years' time we will see it being used widely on the internet," he predicts.

VoiceXML, the development of which has been led by IBM, Lucent, AT&T and Motorola (see report below), will allow websites to exchange information gathered by voice systems in the same way that they do textual information today. Version 1.0 of the standard has already been approved by the World Wide Web Consortium.

Voice recognition technology carries another advantage - it can be used for added security, as each person's voice imprint is unique. "In the next 12 months we will see phones that use speech recognition instead of passwords, because the voice is a very accurate instrument for determining identity, and it will work better on phones than other biometrics such as fingerprint recognition," says Mr Ostergaard.

Persay, a UK subsidiary of Comverse Technology, is one company specialising in the development of speaker verification technology. Persay's software, intended for use in banks, identifies the user during the course of a natural conversation based on the unique pattern or 'signature' of his or her voice. This eliminates the need for passwords or PINs (personal identification numbers).
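
To give a feel for what matching a voice 'signature' can mean, here is a deliberately simplified sketch. It is not Persay's method, which the article does not describe. It assumes each voice has already been reduced to a short list of numbers (a hypothetical feature vector), compares the enrolled signature with a fresh sample using cosine similarity, and accepts the speaker if the score clears a threshold. The function names, example vectors and 0.9 threshold are all assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def verify_speaker(enrolled, sample, threshold=0.9):
    """Accept the sample if it is close enough to the enrolled voice signature.

    The threshold is an assumed value; a real system would tune it to balance
    false acceptances against false rejections.
    """
    return cosine_similarity(enrolled, sample) >= threshold


if __name__ == "__main__":
    enrolled = [0.8, 0.1, 0.6]  # hypothetical stored signature
    same_voice = [0.7, 0.2, 0.6]  # hypothetical new sample, similar pattern
    other_voice = [0.1, 0.9, 0.1]  # hypothetical sample from someone else
    print(verify_speaker(enrolled, same_voice))
    print(verify_speaker(enrolled, other_voice))
```

Because the comparison yields a score rather than an exact match, verification of this kind can in principle run on whatever the caller happens to say, which fits the idea of identifying a user in the course of a natural conversation.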

Voice recognition systems on phones are really just the beginning. Handsfree units in cars, for instance, show the way to a future in which we will be able to request and receive information on traffic, say, or the amount of petrol in the tank, simply by speaking aloud. Motorola already supplies its voice recognition systems to car manufacturers.

Services are already springing up to take advantage of this technology. In April, Yeoman Group in the UK launched what it claims is the world's first navigation technology that gives drivers turn-by-turn voice directions through their GSM phone, based on the belief that voice is the best way of delivering information to mobile phones while the owner is driving. It has enlisted 20/20 Speech to provide the voice synthesis and recognition technologies for the system.

Mobile phone companies are even looking to extend voice technology to static environments. Orange's Wildfire really comes into its own in the company's prototype 'home of the future', a detached house in rural Hertfordshire, UK, that cost £2m to equip with state-of-the-art technology.

Here, residents speak into Bluetooth headsets or directly into microphones set about the house in order to activate the many labour-saving features. Commands such as "Wildfire, run me a bath" or "Wildfire, make me a cup of tea" result in a satisfyingly instant response.

Orange estimates that the technology in the house will be within the reach of an average family in a few years' time.