The challenges of adding speech recognition to hand-held devices have as much to do with the devices' small size and restricted processing power and memory as with the complexities of modelling speech patterns and controlling background noise. Sophisticated voice recognition systems capable of handling a variety of languages and accents are only just becoming available on personal computers. It will take time before voice recognition can be shrunk to fit into the limited space of hand-held devices.
Yet necessity is driving innovation. Mobile phones and personal digital assistants (PDAs) are becoming smaller and richer in features, and are converging with other technologies. Display screens are growing as keyboards shrink, making voice control of phone functions increasingly important. Many new mobile phones are equipped with simplified voice recognition in which one-word commands activate the dialling mechanism and connect calls to pre-recorded and pre-stored numbers. Advanced phones include voice commands to manage and direct incoming calls, to record and play memos, or to act as shortcuts through menus.
Adding more vocabulary to phone books and calendars, and accommodating sophisticated speech recognition, consume a great deal of memory. To contain costs, and because there is no room to add a third processor on the circuit board, speech recognition must be integrated within the existing architecture of the device. This is becoming easier as more powerful digital signal processors (DSPs) are developed.
"Batteries and chips are getting more efficient and using less power, making it possible to add more features," says Gordon Clyne, products manager for the enterprises group at Palm, the California-based hand-held manufacturer. "Handset chips have become smaller - and size is now a question of design. It is possible to obtain 64Mbps [megabits per second] chips the size of a fingernail.
"New lithium polymer batteries are like sheets of paper and can be stacked to achieve the right size and moulded into any shape to fit around the chips. They can be layered to a thickness of 1mm and connected together for more wattage," he says.
Vocabularies
Incorporating automatic speech recognition is complicated because phones do not have fast internal access rates, so data moves too slowly between the processor and memory. Speed and capacity problems can be reduced by using modern flash memory, which can handle bigger vocabularies and enables more sophisticated speech-controlled functions. However, natural speech is too fast, too complex, and too hungry for power and memory for today's mobile devices.
"The art of making mobile phones is a careful balance between processing power, memory, power consumption and radio functions," says Pekka Isosomppi, communications manager at Nokia Mobile Phones.
Since mobiles are used everywhere, the speech recogniser must be robust enough to eliminate the highly variable background noise which degrades voice signals. Techniques such as spectral subtraction filter out so-called stationary noise, which does not vary over time. Non-stationary background noise, which includes other voices, is more difficult to filter, yet phones must be able to respond only to the voice giving instructions.
Stefan Dobler, group manager for speech recognition at Ericsson, the Swedish telecoms equipment group, says noise is the most severe problem for mobile speech recognisers: "In the trade-off between noise capability and vocabulary size, noise wins. The most important criterion for voice recognition systems is that they must be very robust and able to function everywhere - in the noisiest situations.
"Speech can be modelled more accurately and more words added to vocabularies. Memory is increasing and it is possible to adapt the hardware architecture and re-write algorithms to solve the memory access problem."
In speaker-dependent systems, users train the device to associate a particular word command with a given function. The logic in the handset uses the principle of pattern matching, so that any word can be used.
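To make the pattern-matching idea concrete, here is a minimal sketch in Python. It assumes dynamic time warping (DTW) as the matching method, which is one classic choice for speaker-dependent recognisers; the article does not say which method any handset actually uses. The function names, the command names and the one-number-per-frame "features" are all hypothetical, chosen only for illustration.

```python
from math import inf


def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of feature vectors.

    Assumption: each utterance is a list of equal-length numeric vectors,
    one per audio frame. Real systems would use richer acoustic features.
    """
    n, m = len(a), len(b)
    # cost[i][j] holds the cheapest cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Euclidean distance between the two frames being compared.
            d = sum((x - y) ** 2 for x, y in zip(a[i - 1], b[j - 1])) ** 0.5
            # Extend the cheapest of the three allowed alignment moves.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognise(utterance, templates):
    """Return the command whose stored template is closest to the utterance."""
    return min(templates, key=lambda name: dtw_distance(utterance, templates[name]))


# Hypothetical templates recorded when the user trained the phone.
templates = {
    "call home": [[1.0], [3.0], [5.0], [3.0], [1.0]],
    "voicemail": [[5.0], [4.0], [1.0], [1.0], [4.0]],
}

# The same command spoken again, slightly slower and with small variations.
spoken = [[1.0], [1.2], [3.1], [4.9], [5.0], [2.8], [1.1]]

print(recognise(spoken, templates))
```

In this toy run the slower repetition is still matched to "call home", because the warping step absorbs differences in speaking rate. Nothing in the matching depends on what the word means or which language it is in, which is why any word can serve as a command.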
Speaker-dependent recognisers can therefore be installed in phones at manufacture and will work regardless of their ultimate destination and user language.
"Training phones is a huge barrier to use," suggests Dominic Strowbridge, director for developer support at Motorola. "Yet all voice recognition systems in mobiles now are cheap-and-cheerful speaker-dependent systems. They pose very few difficulties and simply need very little extra memory, making them easy to install."
Although more sophisticated voice recognition systems can reside in network servers, putting some functions in handsets gives local control and improves signal accuracy. "The best way to capture and digitise voice is in the phone, rather than sending it to the network where voice patterns may be lost in transmission," explains Mark Frankel, director of product management at Qualcomm. "Voice prints will be taken in the device and a digital print sent to the network. This will be important in next-generation applications where voice is used to navigate the web."
Intelligent software which works in real time and understands natural language is a few years away for mobile handsets, although versions are appearing on network servers. Laureen Cook, partner for mobile telecoms at KPMG, the consultancy, says 500m speech-enabled devices are expected to be in the world marketplace within the next five years, which should equate to $1,000bn in speech recognition business by 2005.