Drivers often use infotainment systems in motor vehicles, such as systems for navigation, music, and phones. However, operating the visual-manual interfaces of these systems can distract drivers. Speech interfaces may be less distracting. To help in designing easy-to-use speech interfaces, this paper identifies key speech interfaces (e.g., CHAT, Linguatronic, SYNC, Siri, and Google Voice), their features, and what was learned from evaluating them and other systems. Also included is information on key technical standards (e.g., ISO 9921, ITU P.800) and relevant design guidelines. This paper also describes relevant design and evaluation methods (e.g., Wizard of Oz) and how to make driving studies replicable (e.g., by referencing SAE J2944). Throughout the paper, there is discussion of linguistic terms (e.g., turn-taking) and principles (e.g., Grice’s Conversational Maxims) that provide a basis for describing user-device interactions and errors in evaluations.

1. Introduction

In recent years, automotive and consumer-product manufacturers have incorporated speech interfaces into their products. Published data on the number of vehicles sold with speech interfaces are not readily available, though the numbers appear to be substantial.

Speech interfaces are of interest because visual-manual alternatives are distracting: they lead drivers to look away from the road, which increases crash risk. Stutts et al. [1] reported that adjusting and controlling entertainment and climate-control systems and using cell phones accounted for 19% of all distraction-related crashes. Because entertainment-system use ranked second among the major causes of these crashes, there is a case for using speech interfaces for music selection. Tsimhoni et al. [2] reported that drivers needed 82% less time to enter an address with a speech interface than with a touch-screen keyboard, suggesting that speech is the preferred interface for that task. However, a speech interface still imposes cognitive demand, which can interfere with the primary task of driving. For example, Lee et al. [3] showed that drivers’ reaction time increased by 180 ms when they used a complex speech-controlled e-mail system (three levels of menus with four to seven options per menu) rather than a simpler alternative (three levels of menus with two options per menu).

Given these potential advantages, suppliers and auto manufacturers have put significant effort into developing speech interfaces for cars. Nonetheless, they still have a long way to go. The influential automotive.com website notes
References
[1]
J. C. Stutts, D. W. Reinfurt, and L. Staplin, “The role of driver distraction in traffic crashes,” AAA Foundation for Traffic Safety, Washington, DC, USA, 2001, https://www.aaafoundation.org/sites/default/files/distraction%20%281%29.pdf.
[2]
O. Tsimhoni, D. Smith, and P. Green, “Address entry while driving: speech recognition versus a touch-screen keyboard,” Human Factors, vol. 46, no. 4, pp. 600–610, 2004.
[3]
J. D. Lee, B. Caven, S. Haake, and T. L. Brown, “Speech-based interaction with in-vehicle computers: the effect of speech-based e-mail on drivers' attention to the roadway,” Human Factors, vol. 43, no. 4, pp. 631–640, 2001.
[4]
F. Weng, B. Yan, Z. Feng et al., “CHAT to your destination,” in Proceedings of the 8th SIGdial Workshop on Discourse and Dialogue, pp. 79–86, Antwerp, Belgium, 2007.
[5]
F. Weng, S. Varges, B. Raghunathan et al., “CHAT: a conversational helper for automotive tasks,” in Proceedings of the 9th International Conference on Spoken Language Processing (Inter-Speech/ICSLP '06), pp. 1061–1064, Pittsburgh, Pa, USA, September 2006.
[6]
J. H. L. Hansen, J. Plucienkowski, S. Gallant, B. Pellom, and W. Ward, “‘CU-Move’: robust speech processing for in-vehicle speech system,” in Proceedings of the International Conference on Spoken Language Processing (ICSLP '00), vol. 1, pp. 524–527, Beijing, China, 2000.
[7]
R. Pieraccini, K. Dayanidhi, J. Bloom et al., “Multimodal conversational systems for automobiles,” Communications of the ACM, vol. 47, no. 1, pp. 47–49, 2004.
[8]
P. Heisterkamp, “Linguatronic product-level speech system for Mercedes-Benz cars,” in Proceedings of the 1st International Conference on Human Language Technology Research, pp. 1–2, Association for Computational Linguistics, San Diego, Calif, USA, 2001.
[9]
W. Minker, U. Haiber, P. Heisterkamp, and S. Scheible, “The SENECA spoken language dialogue system,” Speech Communication, vol. 43, no. 1-2, pp. 89–102, 2004.
[10]
Sync, http://www.ford.com/syncmyride/#/home/.
[11]
P. Geutner, F. Steffens, and D. Manstetten, “Design of the VICO spoken dialogue system: evaluation of user expectations by wizard-of-oz experiments,” in Proceedings of the 3rd International Conference on Language Resources and Evaluation (LREC '02), Las Palmas, Spain, 2002.
[12]
J. C. Chang, A. Lien, B. Lathrop, and H. Hees, “Usability evaluation of a Volkswagen Group in-vehicle speech system,” in Proceedings of the 1st International Conference on Automotive User Interfaces and Interactive Vehicular Applications (ACM '09), pp. 137–144, Essen, Germany, September 2009.
[13]
T. Dorchies, “Come again? Vehicle voice recognition biggest problem in J.D. Power and Associates study,” http://blogs.automotive.com/come-again-vehicle-voice-recognition-biggest-problem-in-j-d-power-and-associates-study-100161.html#axzz2GOtmAz3l.
[14]
“Consumer Reports, The Ford SYNC system reads text messages…in theory,” http://news.consumerreports.org/cars/2010/08/the-ford-sync-system-reads-text-messages-in-theory-mustang.html.
[15]
JustAnswer, “Chrysler 300c I have a problem with Uconnect telephone,” http://www.justanswer.com/chrysler/51ssy-chrysler-300c-problem-uconnect-telephone.html.
[16]
B. Pellom, W. Ward, J. Hansen et al., “University of Colorado dialog systems for travel and navigation,” in Proceedings of the 1st International Conference on Human Language Technology Research, Association for Computational Linguistics, San Diego, Calif, USA, 2001.
[17]
C. Carter and R. Graham, “Experimental comparison of manual and voice controls for the operation of in-vehicle systems,” in Proceedings of the 14th Triennial Congress of the International Ergonomics Association and 44th Annual Meeting of the Human Factors and Ergonomics Society (IEA/HFES '00), vol. 44, pp. 286–289, Human Factors and Ergonomics Society, Santa Monica, Calif, USA, August 2000.
[18]
C. Forlines, B. Schmidt-Nielsen, B. Raj, K. Wittenburg, and P. Wolf, “A comparison between spoken queries and menu-based interfaces for in-car digital music selection,” in Proceedings of the International Conference on Human-Computer Interaction (INTERACT '05), pp. 536–549, Rome, Italy, 2005.
[19]
L. Garay-Vega, A. K. Pradhan, G. Weinberg et al., “Evaluation of different speech and touch interfaces to in-vehicle music retrieval systems,” Accident Analysis and Prevention, vol. 42, no. 3, pp. 913–920, 2010.
[20]
U. Gärtner, W. König, and T. Wittig, “Evaluation of manual vs. speech input when using a driver information system in real traffic,” in Proceedings of Driving Assessment 2001: The First International Driving Symposium on Human Factors in Driver Assessment, Training and Vehicle Design, Aspen, Colo, USA, 2001.
[21]
K. Itoh, Y. Miki, N. Yoshitsugu, N. Kubo, and S. Mashimo, “Evaluation of a voice-activated system using a driving simulator,” SAE Technical Paper 2004-01-0232, SAE World Congress & Exhibition, Society of Automotive Engineers, Warrendale, Pa, USA, 2004.
[22]
J. Maciej and M. Vollrath, “Comparison of manual vs. speech-based interaction with in-vehicle information systems,” Accident Analysis and Prevention, vol. 41, no. 5, pp. 924–930, 2009.
[23]
M. C. McCallum, J. L. Campbell, J. B. Richman, J. L. Brown, and E. Wiese, “Speech recognition and in-vehicle telematics devices: potential reductions in driver distraction,” International Journal of Speech Technology, vol. 7, no. 1, pp. 25–33, 2004.
[24]
T. A. Ranney, J. L. Harbluk, and Y. I. Noy, “Effects of voice technology on test track driving performance: implications for driver distraction,” Human Factors, vol. 47, no. 2, pp. 439–454, 2005.
[25]
J. Shutko, K. Mayer, E. Laansoo, and L. Tijerina, “Driver workload effects of cell phone, music player, and text messaging tasks with the Ford SYNC voice interface versus handheld visual-manual interfaces,” SAE Technical Paper 2009-01-0786, SAE World Congress & Exhibition, Society of Automotive Engineers, Warrendale, Pa, USA, 2009.
[26]
O. Tsimhoni, D. Smith, and P. Green, “Destination entry while driving: speech recognition versus a touch-screen keyboard,” Tech. Rep. UMTRI-2001-24, University of Michigan Transportation Research Institute, Ann Arbor, Mich, USA, 2002.
[27]
J. Villing, C. Holtelius, S. Larsson, A. Lindstrom, A. Seward, and N. Aberg, “Interruption, resumption and domain switching in in-vehicle dialogue,” in Proceedings of the 6th International Conference on Natural Language Processing, pp. 488–499, Gothenburg, Sweden, 2008.
[28]
V. E.-W. Lo, P. A. Green, and A. Franzblau, “Where do people drive? Navigation system use by typical drivers and auto experts,” Journal of Navigation, vol. 64, no. 2, pp. 357–373, 2011.
[29]
U. Winter, T. J. Grost, and O. Tsimhoni, “Language pattern analysis for automotive natural language speech applications,” in Proceedings of the 2nd International Conference on Automotive User Interfaces and Interactive Vehicular Applications (ACM '10), pp. 34–41, Pittsburgh, Pa, USA, November 2010.
[30]
K. Takeda, J. H. L. Hansen, P. Boyraz, L. Malta, C. Miyajima, and H. Abut, “International large-scale vehicle corpora for research on driver behavior on the road,” IEEE Transactions on Intelligent Transportation Systems, vol. 12, pp. 1609–1623, 2011.
[31]
A. Barón and P. A. Green, “Safety and usability of speech interfaces for in-vehicle tasks while driving: a brief literature review,” Tech. Rep. UMTRI-2006-5, University of Michigan Transportation Research Institute, Ann Arbor, Mich, USA, 2006.
[32]
B. Faerber and G. Meier-Arendt, “Speech control systems for handling of route guidance, radio and telephone in cars: results of a field experiment,” in Vision in Vehicle—VII, A. G. Gale, Ed., pp. 507–515, Elsevier, Amsterdam, The Netherlands, 1999.
[33]
A. Kun, T. Paek, and Z. Medenica, “The effect of speech interface accuracy on driving performance,” in Proceedings of the 8th Annual Conference of the International Speech Communication Association (Interspeech '07), pp. 1326–1329, Antwerp, Belgium, August 2007.
[34]
A. W. Gellatly and T. A. Dingus, “Speech recognition and automotive applications: using speech to perform in-vehicle tasks,” in Proceedings of the Human Factors and Ergonomics Society 42nd Annual Meeting, pp. 1247–1251, Santa Monica, Calif, USA, October 1998.
[35]
R. M. Schumacher, M. L. Hardzinski, and A. L. Schwartz, “Increasing the usability of interactive voice response systems: research and guidelines for phone-based interfaces,” Human Factors, vol. 37, no. 2, pp. 251–264, 1995.
[36]
Intuity Conversant Voice Information System Version 5.0 Application Design Handbook, AT&T Product Documentation Development, Denver, Colo, USA, 1994.
[37]
L. J. Najjar, J. J. Ockeman, and J. C. Thompson, “User interface design guidelines for speech recognition applications,” presented at IEEE VARIS 98 Workshop, Atlanta, Ga, USA, 1998, http://www.lawrence-najjar.com/papers/User_interface_design_guidelines_for_speech.html.
[38]
Z. Hua and W. L. Ng, “Speech recognition interface design for in-vehicle system,” in Proceedings of the 2nd International Conference on Automotive User Interfaces and Interactive Vehicular Applications, pp. 29–33, ACM, Pittsburgh, Pa, USA, 2010.
[39]
“Ergonomics—Assessment of Speech Communication,” ISO Standard 9921, 2003.
[40]
“Ergonomics—Construction and Application of Tests for Speech Technology,” Tech. Rep. ISO/TR 19358, 2002.
[41]
“Information Technology—Vocabulary—Part 29: Artificial Intelligence—Speech Recognition and Synthesis,” ISO/IEC Standard 2382-29, 1999.
[42]
“Acoustics—Audiometric Test Methods—Part 3: Speech Audiometry,” ISO Standard 8253-3, 2012.
[43]
“Ergonomics of human-system interaction—Usability methods supporting human-centered design,” ISO Standard 16982, 2002.
[44]
“Road vehicles—Ergonomic aspects of transport information and control systems—Specifications for in-vehicle auditory presentation,” ISO Standard 15006, 2011.
[45]
Voice User Interface Principles and Guidelines (Draft), SAE Recommended Practice J2988, 2012.
[46]
R. Hopper, Telephone Conversation, Indiana University Press, Bloomington, Ind, USA, 1992.
[47]
B. Balentine and D. P. Morgan, How to Build A Speech Recognition Application, Enterprise Integration Group, San Ramon, Calif, USA, 1999.
[48]
M. H. Cohen, J. P. Giangola, and J. Balogh, Voice Interface Design, Pearson, Boston, Mass, USA, 2004.
[49]
R. A. Harris, Voice Interaction Design, Morgan Kaufmann, San Francisco, Calif, USA, 2005.
[50]
J. R. Lewis, Practical Speech User Interface Design, CRC Press, Boca Raton, Fla, USA, 2011.
[51]
S. C. Levinson, Pragmatics, Cambridge University Press, New York, NY, USA, 1983.
[52]
G. Skantze, “Error detection in spoken dialogue systems,” 2002, http://citeseer.ist.psu.edu/cache/papers/cs/26467/http:zSzzSzwww.ida.liu.sezSz~nlplabzSzgsltzSzpaperszSzGSkantze.pdf/error-detection-in-spoken.pdf.
[53]
J. L. Austin, How To Do Things With Words, Harvard University Press, Cambridge, Mass, USA, 1962.
[54]
A. Akmajian, R. A. Demers, A. K. Farmer, and R. M. Harnish, Linguistics: An Introduction To Language and Communication, MIT Press, Cambridge, Mass, USA, 5th edition, 2001.
[55]
J. R. Searle, “A taxonomy of illocutionary acts,” in Language, Mind and Knowledge, Minnesota Studies in the Philosophy of Science, K. Gunderson, Ed., vol. 7, pp. 344–369, University of Minnesota Press, Minneapolis, Minn, USA, 1975.
[56]
H. P. Grice, “Logic and conversation,” in Syntax and Semantics 3: Speech Acts, P. Cole and J. L. Morgan, Eds., pp. 41–58, Academic Press, New York, NY, USA, 1975.
[57]
J. Véronis, “Error in natural language dialogue between man and machine,” International Journal of Man-Machine Studies, vol. 35, no. 2, pp. 187–217, 1991.
[58]
N. Chomsky, Aspects of the Theory of Syntax, The MIT Press, Cambridge, Mass, USA, 1965.
[59]
M. L. Bourguet, “Towards a taxonomy of error-handling strategies in recognition-based multi-modal human-computer interfaces,” Signal Processing, vol. 86, no. 12, pp. 3625–3643, 2006.
[60]
C.-M. Karat, C. Halverson, D. Horn, and J. Karat, “Patterns of entry and correction in large vocabulary continuous speech recognition systems,” in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 568–575, ACM, Pittsburgh, Pa, USA, May 1999.
[61]
K. Larson and D. Mowatt, “Speech error correction: the story of the alternates list,” International Journal of Speech Technology, vol. 6, no. 2, pp. 183–194, 2003.
[62]
D. Litman, M. Swerts, and J. Hirschberg, “Characterizing and predicting corrections in spoken dialogue systems,” Computational Linguistics, vol. 32, no. 3, pp. 417–438, 2006.
[63]
V. E.-W. Lo, S. M. Walls, and P. A. Green, “Simulation of iPod music selection by drivers: typical user task time and patterns for manual and speech interfaces,” Tech. Rep. UMTRI-2007-9, University of Michigan Transportation Research Institute, Ann Arbor, Mich, USA, 2007.
[64]
J. F. Kelley, “An empirical methodology for writing user-friendly natural language computer applications,” in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 193–196, ACM, Boston, Mass, USA, 1983.
[65]
J. D. Gould, J. Conti, and T. Hovanyecz, “Composing letters with a simulated listening typewriter,” Communications of the ACM, vol. 26, no. 4, pp. 295–308, 1983.
[66]
P. Green and L. Wei-Haas, “The Wizard of Oz: a tool for rapid development of user interfaces,” Tech. Rep. UMTRI-1985-27, University of Michigan Transportation Research Institute, Ann Arbor, Mich, USA, 1985.
[67]
A. K. Sinha, S. R. Klemmer, J. Chen, J. A. Landay, and C. Chen, “Suede: iterative, informal prototyping for speech interfaces,” in Proceedings of the CHI 2001 Conference on Human Factors in Computing Systems, Association for Computing Machinery, New York, NY, USA, 2001.
[68]
L. Dybkjær, N. O. Bernsen, and W. Minker, “Evaluation and usability of multimodal spoken language dialogue systems,” Speech Communication, vol. 43, no. 1-2, pp. 33–54, 2004.
[69]
M. Walker, C. Kamm, and D. Litman, “Towards developing general models of usability with PARADISE,” Natural Language Engineering, vol. 6, pp. 363–377, 2000.
[70]
M. Hajdinjak and F. Mihelič, “The PARADISE evaluation framework: issues and findings,” Computational Linguistics, vol. 32, no. 2, pp. 263–272, 2006.
[71]
J. Schweitzer and P. A. Green, “Task acceptability and workload of driving urban roads, highways, and expressways: ratings from video clips,” Tech. Rep. UMTRI-2006-6, University of Michigan Transportation Research Institute, Ann Arbor, Mich, USA, 2007.
[72]
P. Green, B. T.-W. Lin, J. Schweitzer, H. Ho, and K. Stone, “Evaluation of a method to estimate driving workload in real time: watching clips versus simulated driving,” Tech. Rep. UMTRI-2011-29, University of Michigan Transportation Research Institute, Ann Arbor, Mich, USA, 2011.
[73]
P. Green, “Using standards to improve the replicability and applicability of driver interface research,” in Proceedings of the 4th International Conference on Automotive User Interfaces and Interactive Vehicular Applications (AutomotiveUI '12), Portsmouth, NH, USA, October 2012.
[74]
M. R. Savino, Standardized names and definitions for driving performance measures [Ph.D. thesis], Department of Mechanical Engineering, Tufts University, Medford, Mass, USA, 2009.
[75]
Operational Definitions of Driving Performance Measures and Statistics (Draft), SAE Recommended Practice J2944, 2012.