Speech - the future interface
Feature (for TES 2001)
The fantasy of people talking to machines must be one of the most common science fiction clichés. In fanciful tales of the future, computers listened, replied and even translated from one alien dialect to another. It was all part of growing up in the last century. As we enter this century with my printer still thinking it has US Letter, not A4, paper, it is easy to underrate the progress. But with computer power skyrocketing, some of that future inescapably comes into view.
Let's imagine it is time for end-of-term reports, so you pick up the phone, dial an extension and dictate till you are done. Next, your words are transcribed by machine, checked against the stored recording and, in some way, finally get to the parent. Desirable or otherwise, it's half an idea based on a system found in US hospitals. Developed by Dictaphone, the company that transformed the "take a letter, Miss Smith" culture, it lets medics file their reports into a wireless mouthpiece. With just a bit of human intervention - offsite and somewhere on the Internet - their voice recordings are transcribed. The text is standardised so that a machine can analyse it and generate the patients' bills. Not only are the savings in time great, but the millions of reports each day form a huge bank of symptoms, history and treatment that bodes well for making medicine work better in future. The only fantasy here is that teachers' time might be as valued as a doctor's.
The reporting system is the result of welding together technology by Lernout and Hauspie. This speech and language company owns Dictaphone and Dragon Systems, and makes call centre, translation and speech recognition systems. It is well poised for the start of a century where the SUI (speech user interface) may succeed the mouse-clicking GUI (graphical user interface). In Asia, an L&H creation called RealSpeak is used to give language teaching an unusual boost.
With RealSpeak, the computer reads back text on screen using a more natural, human-sounding voice. To make the voice, someone has read aloud and had their recorded words broken into phonemes. RealSpeak, which is now available as part of Voice Express dictation software, gets a plain PC to read text by assembling the phonemes according to the rules of a language. It can take months to build the voice for just one language, but the result is less nasal than the computer-speak of the past.
In China, the demand for learning English is very high. Almost 98 percent of the population are Cantonese and few go overseas to be educated, so the region suffers a severe shortage of native English teachers. As Louis Woo, president of L&H Asia Pacific, explained, RealSpeak can help. "If I'm the teacher and know that my pronunciation is questionable, I can ask students to listen to a passage and they can hear it with a professional voice. They will go to the lab, read the passage and repeat it after. The quality of RealSpeak offers tremendous opportunities to help both students and teacher. The computer can also measure their pronunciation and be a wonderful help."
For a taste of a stage further along, you can visit Ananova (www.ananova.com), the Press Association site where daily headlines are read by Ananova, a virtual newscaster. The character, called an avatar, was developed using scanned images of human expressions, lip movements and eye blinks. The result is a library of facial effects that can be triggered by the computer. Given news to read, the Ananova machine matches RealSpeak sounds to a moving face. The Millennium Dome offers a clue to where this is heading: here visitors can make an avatar of themselves. It could one day be the character that answers the phone, appears to read an e-mail or teaches your lesson. A bull's-eye for firms like L&H lies in making computers understand context.
For instance, Ananova still needs to be told how to say things: stories are tagged to ensure she doesn't throw a smile while reading a sad story, and her intonation is still a touch flat. Likewise, with an understanding of context, an Internet search might sift out just the facts you're looking for. The keys to understanding are part way there - software such as Power Translator can today make working translations between endless language pairs (French to Japanese; English to Portuguese). Even if the results need work to achieve the best copy, an instant but rough translation can be good enough.
Only last month the company showed software - an intelligent agent - that would dig into the Internet to find nuggets of information even if they were in a foreign language. That's good for the majority of the world that speaks no English. There were also tools for audio mining, where recorded government speeches could be searched, like the doctors' reports. That's very good all round. And then there were awesome devices - options on BMW and Megane cars that told you where to turn, read out your emails and let you reply to them as you drive. That's a killer application for sure.
But I jest. As the company's Jo Lernout put it, using those tiny mobile phones to get information will create new markets. It could be for toothpicks to press the keys, magnifying glasses to read them, or a speech interface. The contest between SUI and GUI has begun but it's not a fair fight.