
1. SAMMIE system (test drive) / SAMMIE variant (stationary) / command-word system (test drive) / TALK system (driving simulation)
9. What other remarks do you have about the SAMMIE variant compared with the SAMMIE system?
Version Final 1.1, Distribution: public
2. [Figure 74: Answers to question 8, "Which system would you use in the long run?"]
Footnote 50: This question, here including NA SAMMIE, was repeated in the final questionnaire.
Version Final 1.1, Distribution: public; IST 507802 TALK D6.4 Part I, 25 January 2007, page 67/96
4.4 Statistical tests
Originally there were several hypotheses concerning various performance and dialogue criteria of the systems. The hypothesis concerning acceptance was: "The SAMMIE system achieves higher user acceptance than the NA system." Acceptance is operationalised relatively well by question 1 of the intermediate questionnaire ("General impression") and question 8 of the NA SAMMIE questionnaire ("Which system preferred?"). For the general impression of the systems, asked in the intermediate questionnaire, a Wilcoxon matched-pairs test revealed no significant difference between the systems (n = 18, T = 19.5, p = 0.41), i.e. immediately after the runs the Subjects had a similarly positive impression of both systems (see Figure 39). For the preference for one of the systems, asked in the NA SAMMIE questionnaire, a χ² test was performed. The alternatives in the question were a) Full SAMMIE, b) NA SAMMIE, c) C&C and d) Baseline; alternative d was offered only to Subjects who had already participated in the baseline study. The result of …
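The Wilcoxon matched-pairs statistic reported above can be illustrated with a short sketch. It assumes that each Subject contributes one pair of ratings (e.g. the general-impression rating for SAMMIE and for C&C); pairs with equal ratings are dropped, the absolute differences are ranked with tied values sharing the average rank, and T is the smaller of the positive and negative rank sums. The function name `wilcoxon_t` and the data layout are hypothetical, and the sketch computes only T, not the p-value.

```python
def wilcoxon_t(x, y):
    """Wilcoxon matched-pairs signed-rank statistic T for paired ratings.

    Returns (T, n), where n is the number of non-zero differences.
    """
    # Differences per Subject; zero differences carry no sign and are dropped.
    diffs = [a - b for a, b in zip(x, y) if a != b]
    # Indices sorted by absolute difference, so ranks 1..n can be assigned.
    order = sorted(range(len(diffs)), key=lambda k: abs(diffs[k]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        # Extend j over the group of tied absolute differences.
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    # T is the smaller of the two signed rank sums.
    return min(w_plus, w_minus), len(diffs)


# Hypothetical ratings of five Subjects for two systems.
sammie = [4, 3, 5, 2, 4]
cc = [3, 3, 3, 4, 2]
print(wilcoxon_t(sammie, cc))  # (3.0, 4)
```

For real analyses, `scipy.stats.wilcoxon` provides a tested implementation that also reports p-values.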
3. Figure 11 shows the frequencies of modality changes during task processing (all changes during task processing were counted). Subjects changed from speech to iDrive more than twice as often as vice versa (from iDrive to speech). By far the most frequent reason for a change from speech to manual input was repeated rejections or false system reactions to speech input. One reason for a change from manual operation to speech input was, e.g. in task 7, scrolling to the song "99 Luftballons" manually and then copying it to the playlist by voice. Another reason was an unsuccessful manual search for a song, album or playlist in tasks 6, 7, 9 or 10, which led to a change to speech input. A further result is that many more changes (in both directions) occurred with the C&C system; here a change between modalities was easier because both modalities were menu-based. In the baseline study a somewhat different method of calculation was used.
[Figure 11: Modality changes during tasks (changes to iDrive, changes to speech) for SAMMIE and C&C, for all SAMMIE tasks and for tasks 4, 5, 9, 10, averaged across tasks and Subjects]
Figure 12 shows the preferred modality for each level of MP3 experience, averaged over systems and over the Subjects of each MP3-experience group. Experien…
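As a sketch of how modality changes can be counted, assume each task is logged as the ordered list of input modalities a Subject used (the log format, the labels "speech"/"idrive" and the function names are hypothetical). Every switch between consecutive inputs counts as one change, keyed by direction, which yields "changes to iDrive" and "changes to speech" tallies like those in Figure 11; the second function averages one direction over several task logs.

```python
from collections import Counter


def count_modality_changes(inputs):
    """Count direction-specific switches between consecutive input modalities."""
    changes = Counter()
    for prev, cur in zip(inputs, inputs[1:]):
        if prev != cur:
            changes[(prev, cur)] += 1
    return changes


def mean_changes(task_logs, direction):
    """Average number of changes in one direction over several task logs."""
    # A Counter returns 0 for directions that never occurred in a log.
    return sum(count_modality_changes(log)[direction] for log in task_logs) / len(task_logs)


# Hypothetical log of one task.
task_log = ["speech", "speech", "idrive", "idrive", "speech", "idrive"]
changes = count_modality_changes(task_log)
print(changes[("speech", "idrive")])  # 2
print(changes[("idrive", "speech")])  # 1
```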
4. [Figure 65: Answers to questions 8/9, "Which of the following aspects represented advantages/disadvantages of multimodal input for you?"]
When asked which functions they would like to use in the car with multimodal interaction including the natural-speech system SAMMIE, the Subjects gave a partly different order compared with the baseline study (see Figure 66). Now most Subjects named the more advanced functions such as desk diary, navigation and internet, while infotainment functions were named more frequently in the baseline study.
[Figure 66: Answers to question 12, "Which functions in the car would you like to use with the multimodal input?" (percent of Subjects; SAMMIE vs. Baseline)]
In the last question (15), the Subjects were asked for suggestions for improvement. The following statements were made:
• Better speech recognition (4×)
• Better adaptation to speech level (2×)
• Other iDrive position, possibly at the steering wheel (2×)
• …
5. Question 15: What suggestions for improvement do you have for the further development of the multimodal, natural-language SAMMIE system?
If you happen to know other people who are taking part in this experiment, it is important that you do not exchange any information or personal assessments until all of you have completed the experiment independently of each other. Thank you very much for your participation!
8.5 NA SAMMIE questionnaire
Intermediate survey for the SAMMIE variant (stationary)
Name: ____ Date: ____
Please answer each of the following questions after the corresponding video clips.
Information outputs
Video clips: "Zeigen Sie mir alle Alben" / "Zeige mir alle Alben" (Show me all albums, formal vs. informal address)
1. How highly do you rate the usefulness of the differentiated personal address with "Sie"/"Du"? (very high … very low)
Video clips: "Zeigen Sie mir alle Alben" (Show me all albums) / "Nennen Sie mir alle Künstler" (Name all artists)
2. How highly do you rate the usefulness of the distinction between "Zeige" (show) and "Nenne" (name/read out)? (very high … very low)
Video clip: "Welche Lieder sind auf der Playliste Cool Hits?" (Which songs are on the playlist Cool Hits?)
3. How highly do you rate the usefulness of presenting the albums together with the artists? (very high … very low)
Video clip: "Zeige mir die Alben von Herber…" (Show me the albums by Herber…)
6. [Figure 3: Examples of MP3 SAMMIE displays]
2.2 Experimental course
The experimental course had to meet the following criteria:
• Long enough to allow about 10 tasks
• Short enough to keep the overall session time within 2.5–3 hours
• Start and end of the drive at a point easily accessible to the Subjects (Karlsruhe main station)
• No demanding driving or traffic situations (no sharp curves, not too much traffic)
• Preferably speed limits of ≤ 100 km/h, to keep traffic noise within limits (no motorways)
• Preferably four-lane express highways or country roads with little traffic for long or complex tasks, to keep oncoming traffic within limits
• Roads with a certain amount of changing speed limits, to study th…
7. [Figure 57: Answers to question 25, "How do you judge the extent of the display?" (scale: not sufficient … too extensive; percent of Subjects)]
The next two figures show the answers to the statements about the dialogue (cf. Communicator evaluations [4]), separately for the SAMMIE and the C&C system (Figure 58, Figure 59). The answers were spread over the first four categories, with no clear preference for either system. The statement about understanding what the system said received the best scores; it is not clear, however, whether Subjects read it as acoustic or as content-related understanding. A relatively poor judgment was given to the statement that it was easy to get the information the user wanted, particularly for the C&C system. In fact, the Subjects were often disoriented about the current system state: e.g., although they had asked for a specific album, they were still at the general album menu level because of misunderstandings. This applies more to the C&C system.
[Figure 58: Answers to the statements of questions 27–31, "Do you agree with the statements?" (SAMMIE): 27 Understood system (91), 28 Got information (65), 29 Knew what to do (56), 30 Function as expected (59), 31 …]
8. The main variable was the multimodal interaction system. Full SAMMIE was the main system; the C&C system served as a reference system for direct comparison with SAMMIE. The Non-Adaptive (NA) SAMMIE system was presented by the experimenter at the end of the session, to obtain a subjective judgment and a comparison with Full SAMMIE. As far as possible, the results should include a comparison between Full SAMMIE and C&C on the one hand, and between Full SAMMIE and the results of the baseline evaluation on the other. The first action of each of tasks 1–3 in C&C mode had to be performed by speech input, to ensure that the Subjects used speech input at least a few times.
[Figure 5: Experimental design in terms of systems and tasks: introduction to iDrive; introduction to SAMMIE speech input; introduction to C&C speech input; presentation of NA SAMMIE; examples 1–6; task order permuted except task 8; italic: first action with speech input]
The Full SAMMIE run (SAMMIE) and the C&C reference run were balanced across Subjects to obtain a fair comparison with respect to traffic situation, order and learning effects. Thus about half of the Subjects began with SAMMIE and continued with C&C, while the other Subjects began with C&C and continued with SAMMIE. For the same reaso…
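The balancing of run order can be sketched as simple alternation: Subjects are assigned, in order of participation, to start alternately with SAMMIE or with C&C. This is an assumed scheme for illustration only (the function name `assign_run_order` and the Subject IDs are hypothetical); the report states only that about half of the Subjects began with each system.

```python
def assign_run_order(subject_ids):
    """Alternate the first system across Subjects in order of participation."""
    orders = {}
    for index, subject in enumerate(subject_ids):
        if index % 2 == 0:
            orders[subject] = ("SAMMIE", "C&C")
        else:
            orders[subject] = ("C&C", "SAMMIE")
    return orders


plan = assign_run_order(["S01", "S02", "S03", "S04"])
print(plan["S02"])  # ('C&C', 'SAMMIE')
```

Alternation keeps the two orders within one Subject of each other at any point during recruitment, which is one way to approach the balance described above.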
9. … (solve multimodally): By which artist is "Romeo und Julia" on the playlist Cool Hits? Who is the performer of "Romeo und Julia" on the playlist Cool Hits?
Task 10 (SAMMIE: solve multimodally; Command: solve multimodally): Please choose a song of your choice from the genre Rock and play it back. So, find any piece of the genre Rock and listen to it.
8.2 Introduction to the experiment
On-site explanation of the experiment
Dear participant, thank you very much for coming. We assume that you have read the preliminary explanation; if not, please do so now. Here are further details about today's experiment. Today you are testing the SAMMIE dialogue system for in-car MP3 players in different variants. These are the variants you will use in the two test drives:
A) SAMMIE system: speech input with natural language, or manual input with the iDrive knob
B) Command system: speech input with single words, or manual input with the iDrive knob
In addition, at the end we will demonstrate a further variant and ask you to assess it:
C) SAMMIE system variant: essentially similar to A, but with some…
10. [Figure 45: Answers to question 9, "How do you judge the microphone characteristics, i.e. the automatically opening microphone?" (scale: very good relief … very disturbing; mean SAMMIE 81, C&C 78)]
The next three figures are dedicated to system output in general. For Figure 46 and Figure 47, with questions about the attitude towards information output and the support given by the system, the maximum lies in scale category 3, i.e. there is some reservation about these criteria. This was partly due to the extent of speech output and the limited context sensitivity of the help prompts; the system outputs did not resolve user disorientation in every case. The question on the distribution of information between speech and display (Figure 48) was answered more positively, but still with some negative judgements. Subjects tended to feel that there were sometimes too many spoken outputs, e.g. the hint to the help system.
[Figure 46: Answers to question 11, "How did you like the system output (optically/acoustically)?" (scale: very good … very bad; mean SAMMIE 72, C&C 70)]
Footnote 5: The equivalent ques…
11. 8.4 Final questionnaire
Post-experimental questionnaire after the SAMMIE experiment
Name: ____ Date: ____
Please answer the following questions and assess the multimodal SAMMIE system and the command system that you became acquainted with in today's driving experiment. We need answers that reflect your experiences and assessments exactly.
As a reminder: in the driving experiment you became acquainted with three systems:
A) Multimodal SAMMIE system, during the drive
B) Command system, during the drive
C) Multimodal SAMMIE variant, as a demonstration
You may have tested systems A and B in a different order. Here are some screens of A as a reminder:
[Screenshots of system A: welcome screen "Mp3 Steuerung" (MP3 control), genres, album "Mensch" by Herbert Grönemeyer, playlist "Road Mix" with titles and artists]
If you took part in the TALK driving-simulation experiment at BEF in November 2005, we ask you at various points in the questionnaire to compare the SAMMIE systems tested now with the TALK system of that time. As a reminder, you will see in the follo…
12. [Figure 72: Answers to question 6, "How useful is the adaptation to the user's vocabulary?"]
After the questions on the individual features, a general question summarised all features with an emphasis on "advantage for you" (Figure 73). The order reflects the individual judgements to some extent, but extended user guidance now ranks higher and implicit confirmation lower. If it is not preceded by an immediate presentation, extended user guidance seems to be viewed as basically positive.
[Figure 73: Answers to question 7, "Which features are useful for you?" (percent of Subjects)]
In the last question, the Subjects were asked which system they preferred (Figure 74). There was not a single vote for either NA SAMMIE or the baseline system. After direct experience, the C&C system is better accepted than the Full SAMMIE system. Nobody preferred NA SAMMIE, since all of its features had been presented and judged negatively in the additional part of the session.
[Figure 74: Answers to question 8, "Which system would you use in the long run?" (percent of Subjects)]
13. After the tasks we will ask you about your workload, which please rate on a scale from 1 to 5 (no intermediate values): 1 = no workload, 5 = high workload. This refers to the overall workload, i.e. driving and operating together. After each drive we will give you a short questionnaire to fill in immediately.
Route: Südtangente towards Wolfartsweier > B3 towards Ettlingen > B3 towards Rastatt > road towards Mörsch > B36 > road from Forchheim > B3 towards Karlsruhe > motorway feeder > Südtangente towards main station
[Map: test route (Versuchsstrecke)]
Safe driving always has absolute priority, also while working on the tasks. Please observe the road traffic regulations. At the end you will receive a large questionnaire with a return envelope; please fill it in at home today or tomorrow at the latest. Have fun!
8.3 Intermediate questionnaire
(For the SAMMIE and C&C runs; identical apart from the system name: "SAMMIE-Bediensystem" (SAMMIE operating system) and "Kommando-Bediensystem" (command operating system))
Intermediate survey…
14. [Figure 43: Answers to question 7, "How easy was the decision for speech or manual input for you?" (scale: very easy … very hard; mean SAMMIE 81, C&C 81, Baseline 84)]
[Figure 44: Answers to question 8, "How easy was the change between speech and manual input for you?" (scale: very easy … very hard; mean SAMMIE 81, C&C 87, Baseline 80)]
Figure 45, on the automatically opening microphone, shows that not all Subjects fully agreed with the autonomous opening of the microphone. There were many situations in which the Subjects continued an interaction with iDrive or talked to passengers while the microphone opened autonomously. In those cases the system tried to interpret the human conversation, which was irritating and interfered with the interaction that had meanwhile progressed.
[Figure 45: question 9, "Automatically open microphone"]
15. From here on, the scales have 7 options.
33. Below you will find word pairs with which you can assess the system you have just used. Each pair represents extreme opposites, between which gradations are possible. Please rate the system as spontaneously as possible using the adjective pairs below, by marking the appropriate box with a cross. If you feel unable to assign an adjective pair, please mark the midpoint of the scale (0).
The system was: technical / human; complicated / simple; impractical / practical; cumbersome / direct; unpredictable / predictable; confusing / clearly structured; unruly / manageable; isolating / connective; amateurish / professional; tacky / stylish; inferior / valuable; excluding / including; separates me from people / brings me closer to people; unpresentable / presentable; conventional / original; unimaginative / creative; cautious / bold; conservative / innovative; dull / captivating; undemanding / challenging; ordinary / novel; unpleasant / pleasant; ugly / beautiful; disagreeable / likeable; rejecting / inviting; bad / good; repelling / appealing; discouraging / motivating
Scale: -3, -2, -1, 0, +1, +2, +3
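A minimal sketch of scoring question 33, assuming each mark is recorded as the box position 1 to 7 from left to right (a hypothetical encoding), so that box 4 is the midpoint 0 and the outer boxes map to -3 and +3. The mean per word pair then summarises the ratings across Subjects; the function name `differential_means` is illustrative only.

```python
def differential_means(responses):
    """Mean rating per word pair on the -3..+3 scale.

    `responses` holds one dict per Subject, mapping a word pair to the
    marked box position 1..7 (left to right); box 4 is the midpoint 0.
    """
    values = {}
    for response in responses:
        for pair, box in response.items():
            if not 1 <= box <= 7:
                raise ValueError(f"box position out of range: {box}")
            values.setdefault(pair, []).append(box - 4)
    return {pair: sum(v) / len(v) for pair, v in values.items()}


ratings = [
    {"complicated/simple": 6, "dull/captivating": 4},
    {"complicated/simple": 7, "dull/captivating": 3},
]
print(differential_means(ratings))
# {'complicated/simple': 2.5, 'dull/captivating': -0.5}
```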
16. E.g., if all Subjects had marked "very good", an overall score of 100 would have resulted; if all Subjects had marked "very bad", an overall score of 0 would have resulted.
Figure 40 concerns ease of use and shows that speech operation was easier with the present systems than with the baseline system (free run). With the present systems, by far most of the Subjects tended towards a positive rating (ratings on the left side: SAMMIE 90, C&C 80). The difference between SAMMIE and C&C was relatively slight. For the present systems the answers were spread over the positive part of the scale, i.e. the systems were felt to be easy, but individuals differed in how easy they judged them to be. There is a significant correlation between the answers to question 2 and the respective rejection rates of the systems (Pearson correlation coefficient r = 0.41, p < 0.05).
[Figure 40: Answers to question 2, "How easy was the system operation for you?" (scale: very simple … very difficult; mean SAMMIE 75, C&C 77, Baseline 65)]
Question 3 concerned the problem of distractio…
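The 0–100 overall score mentioned above can be read as a rescaling of each rating, averaged over Subjects. The sketch assumes ratings coded 1 (best, e.g. "very good") to n (worst) and a linear mapping between the two anchors; the report only fixes the end points, so the linearity and the default of five categories are assumptions. The second function computes the Pearson correlation coefficient r, as used above for question-2 answers versus rejection rates. Both function names are hypothetical.

```python
import math


def overall_score(ratings, n_categories=5):
    """Mean 0-100 score: category 1 (best) -> 100, category n (worst) -> 0.

    Assumes a linear mapping between the two anchors.
    """
    scores = [100 * (n_categories - r) / (n_categories - 1) for r in ratings]
    return sum(scores) / len(scores)


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and of squared deviations.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


print(overall_score([1, 1, 1]))  # 100.0
print(overall_score([1, 5]))     # 50.0
```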
17. 2.4 Tasks
The basic principles for defining the tasks were:
• to use a considerable number of tasks from the baseline study
• to cover the performance of the SAMMIE system
• to include tasks with pure information content
• to consider more demanding functions, i.e. to neglect the simple playback functions
• to choose items that do not require lengthy scrolling in the displayed lists
Bosch and BEF, together with the partners, therefore defined the following tasks for the test (see Attachments):
Table 3: Tasks used in the SAMMIE evaluation study
No. | SAMMIE task | Baseline no.
1 | Ask for the existing albums | 1.4
2 | Play back the song "Der Weg" by Herbert Grönemeyer | 1.3
3 | Find out the songs on the playlist "Pur Klassiker" | 3.3
4 | Browse within the albums, search for the album "Live" by Pur and play it back | 1.5
5 | Find and play back a swing song by Michael Buble | -
6 | Add the song "99 Luftballons" by Nena to the new playlist | 3.5
7 | Find the song "Yesterday" by the Beatles and play it back | -
8 | Create a new playlist | 3.4
9 | Find the artist of "Romeo und Julia" on the playlist "Cool Hits" | -
10 | Choose any song of the genre Rock and play it back | -
All tasks were given in the Full SAMMIE run. The tasks with a baseline number were transferred from the baseline study. Only the grey-highlighted tasks were given in the C&C run, to keep the whole
18. • Lists should not be read out by speech output
• Time delay until a song has been found (egg-timer icon)
• Not enough information on the display
• Picture of the album is superfluous
• Details like song duration, number of list elements etc. are missing
Concerning the open evaluation of the C&C system as a whole, two Subjects found it better than the SAMMIE system. There were statements like:
• iDrive device positioned too far back (confirmed informally by several female Subjects)
• Expected natural, spontaneous speech input
• Speech understanding problems with a low voice and high ambient noise, e.g. in a tunnel
• More reliable but less fun
• One song chosen, then all the other songs were played back, which I did not want
• Microphone icon should be near the tachometer
During the sessions, Subjects made spontaneous comments or were asked by the experimenter about their behaviour. During the SAMMIE runs there were statements like:
• Thinking about the formulation is strenuous; a complete sentence takes longer than a command
• Understanding problems in the tunnel
• Angry when not understood
• Safe control of car or system
• Annoying if another song is added to the playlist instead of the one I looked for
• The Beatles at…
19. session within time limits. All tasks except no. 8 could be performed by speech input or by iDrive; task 8 had to be solved exclusively by speech input. The experimenter presented the tasks in a consistent way by reading them from paper. Each task was repeated once with a different formulation, to avoid predefining a single specific formulation and to aid recall. Formulation and presentation of tasks 1–4, 6 and 8 were the same as in the baseline study, with some slight differences in wording. The songs and albums that had to be played were actually played back acoustically, in part. If the Subject did not stop playback, the respective song or the next song of the album continued to play until the next task, or even the one after that. The grouping of tasks into scenarios, as in the baseline study, was abandoned, as was the categorisation into different difficulty levels. The presentation of the tasks was a critical aspect; presenting them visually was ruled out because of visual distraction.
2.5 Experimental design
The study was conceived as a critical experiment, i.e. hypotheses were defined on the basis of the baseline study and other considerations (see chapter 4.5). Moreover, additional results were expected concerning the multimodality and efficiency of the SAMMIE system.
20. [Table 4: Example of a repeated modality change (Subject 2, SAMMIE, task 6); the dialogue alternates between user and system turns and ends with the system output "Der Song wurde zur Wiedergabeliste AUTOFAHRT hinzugefügt" (The song was added to the playlist AUTOFAHRT), with the display showing the context panel AUTOFAHRT]
Footnote: The chi-square test is a nonparametric test that compares the observed and expected frequencies in each category to test whether or not all categories contain the same proportion of values. It assumes ordinal or nominal levels of measurement.
3.2 Task Completion Rate
The Task Completion Rate (TCR) is defined as the number of accomplished tasks relative to the number of given tasks. The objective TCR represents the correctly accomplished tasks. The subjective TCR represents the tasks that the Subjects believed they had accomplished correctly; the discrepancy usually arose from a wrong parameter or from not playing back a song or album that was already displayed correctly (e.g. in task 4). After a first failure, the Subjects were permitted to repeat the task until the end of the course segment was reached. The Subjects could also stop working on a task whenever they would have done so in real life, an option they rarely used (see below). The experimental concept included the possibility of help from the experimenter (see below). These hints
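The chi-square comparison described in the footnote above can be sketched for the case of equal expected proportions in every category, e.g. the number of Subjects choosing each system alternative. The function `chi_square_uniform` is hypothetical and returns only the statistic and the degrees of freedom; the p-value would then be looked up in the chi-square distribution.

```python
def chi_square_uniform(observed):
    """Chi-square statistic against equal expected frequencies.

    Returns (statistic, degrees_of_freedom).
    """
    total = sum(observed)
    k = len(observed)
    expected = total / k  # same expected count in every category
    statistic = sum((o - expected) ** 2 / expected for o in observed)
    return statistic, k - 1


# Hypothetical counts of Subjects choosing three alternatives.
print(chi_square_uniform([6, 2, 4]))  # (2.0, 2)
```

`scipy.stats.chisquare(f_obs)` computes the same statistic with equal expected frequencies by default and also returns a p-value.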
21. were given particularly when the Subject forgot a parameter. In relatively rare cases the experimenter gave additional support when he had the impression that the Subject did not understand the task. E.g., several Subjects understood task 3 (playlist "Pur Klassiker") as a playlist or album by Pur called "Klassiker"; in those cases the experimenter pointed out the obvious misunderstanding.
[Figure 13: Overall Task Completion Rate (subjective and objective, percent) for SAMMIE and C&C, averaged over tasks 1–5, 9, 10 and Subjects]
In general, a fair comparison with the baseline study concerning task completion rate is not possible, for several reasons:
• The experimental setup and environment were quite different (lab vs. car).
• The driving task was different (driving simulation vs. real driving conditions).
• The experimental design had to be changed. In the baseline study the subjects were given 5 attempts to accomplish a task, without a distinct time limit. In the in-car evaluation, tasks had to be completed within predefined course segments, which resulted in tighter time constraints and usually fewer attempts to finish the task.
To recall: in the baseline study a maximum of 5 attempts was permitted. If the Subject performed the task after an experimen…
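The two Task Completion Rates defined in section 3.2 can be sketched as follows, assuming each given task is labelled with one of three hypothetical outcome codes: "correct" (objectively accomplished), "believed_correct" (the Subject thought it was accomplished, e.g. with a wrong parameter) or "failed". The objective TCR counts only correct tasks; the subjective TCR also counts the believed ones. The coding scheme and the function name are assumptions for illustration.

```python
from collections import Counter


def task_completion_rates(outcomes):
    """Return (objective TCR, subjective TCR) in percent of the given tasks."""
    counts = Counter(outcomes)
    given = len(outcomes)
    objective = 100 * counts["correct"] / given
    # Subjectively completed = objectively correct + believed correct.
    subjective = 100 * (counts["correct"] + counts["believed_correct"]) / given
    return objective, subjective


print(task_completion_rates(["correct", "correct", "believed_correct", "failed"]))
# (50.0, 75.0)
```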
22. which explained the experiment as a whole and the SAMMIE and C&C systems in detail (see attachment 8.2). The introduction to the experiment and the SAMMIE system comprised:
• Objectives and experimental realisation (multimodality, sequence of activities)
• Functions of speech input and microphone buttons (microphone opening/closing, functions, reformulation after misunderstandings, signals, possibility of human conversation)
• iDrive movements and functions
• Experimental design (tasks, input modalities, runs)
• MP3 display (microphone icon, lists, cursor)
The different dialogue and speaking styles of the two systems were explained explicitly. The training run, consisting of pure driving without additional tasks, took about 5 min; during it the Subject became accustomed to the specific features and driving behaviour of the experimental car (pedals, indicators, darkened windscreen etc.) and was advised to try some simple manoeuvres such as braking. Before each run, corresponding training video clips were shown with typical functions (playing back a specific song, adding a song to a playlist, requesting information about an album). These functions were shown with Full SAMMIE (iDrive), Full SAMMIE (speech input) and the C&C system (speech input), each at the appropriate point in the session. As Figure 5 and Figure 7 show, the training video for the iDrive was shown once before the first run. The sample training video illustrated a few possible formulations and dialo…
23. … 2002 Int. Conf. on Spoken Language Processing, vol. 1, pp. 269–272, Denver, CO, USA, Sept. 2002.
8 Attachments
8.1 Tasks
Task 1 (SAMMIE: solve multimodally; Command: start with speech input): Please find out which albums are available in the system. So you want to know which albums there are.
Task 2 (SAMMIE: solve multimodally; Command: start with speech input): Please have the song "Der Weg" by Herbert Grönemeyer from the album "Mensch" played back. So you would like to hear the song "Der Weg" by Herbert Grönemeyer from the album "Mensch".
Task 3 (SAMMIE: solve multimodally; Command: start with speech input): Now please find out which songs are on the playlist "Pur Klassiker". So you want to know which titles the playlist "Pur Klassiker" contains.
Task 4 (SAMMIE: solve multimodally; Command: solve multimodally): Please browse through the albums, look for the album "Live" by Pur until it is displayed, and have it played back. So: find the album "Live" by Pur by going through the list, and listen to it.
Task 5 (SAMMIE: solve multimodally; Command: solve multimodally): Look for a swing piece by Michael Buble and have it played back. Si…
24. 3.4 Task duration
The duration of a task was measured from the end of the experimenter's task announcement to the Subject's confirmation that he had finished the task. Only tasks with full subjective or objective accomplishment are considered here, i.e. tasks with TCR = 0 are neglected, so the longer, unsuccessful tasks are not included in the following figures. Figure 22 shows the overall task duration for the systems in the present and the baseline study, averaged over Subjects and selected tasks. For the SAMMIE run and the baseline study (free run), only the identical tasks performed in both studies were considered (tasks 1–4, 6, 8; footnote 26).
Footnote 26: In the baseline study a task was considered finished at the last system output; here, however, the Subject's confirmation marked the end of a task (see chapter 2.6).
The average task with SAMMIE and C&C took about 40–50 s, the minimal task durations about 10–12 s. The parallel tasks in the baseline study, however, took clearly longer (footnote 27). None of the pairwise differences is significant (Mann-Whitney U test; SAMMIE vs. C&C: n1 = 10, n2 = 7, U = 34, p = 0.92; SAMMIE vs. Baseline: n1 = 10, n2 = 6, U = 19, p = 0.23).
[Figure 22 axes: overall task duration (min:s) for tasks 1–4, 6, 8 and for tasks 1–5, 9, 10; SAMMIE, C&C]
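The Mann-Whitney U values above can be illustrated with a pairwise-counting sketch: U counts, over all cross-sample pairs, how often a value from the first sample exceeds one from the second, with ties counting one half, which is equivalent to the rank-sum formulation. The function name `mann_whitney_u` is hypothetical, it returns the smaller of the two U values, and no p-value is computed.

```python
def mann_whitney_u(a, b):
    """Smaller Mann-Whitney U of two independent samples (ties count 1/2)."""
    u_a = sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)
    u_b = len(a) * len(b) - u_a
    return min(u_a, u_b)


# Hypothetical task durations in seconds.
print(mann_whitney_u([40, 45, 50], [55, 60]))  # 0.0
print(mann_whitney_u([1, 3], [2, 3]))          # 1.5
```

For n1 = 10 and n2 = 7 there are 70 cross-sample pairs, so the reported U = 34 lies close to the midpoint of 35, which fits the non-significant p = 0.92.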
25. [Figure 67: Answers to question 1, "How useful is the differentiated addressing by 'Du'?" (scale: very high … very low)]
The feature of a differentiated visual/acoustic presentation of artists, albums and songs was judged positively overall (Figure 68). However, the answers ranged over the positive categories, which may be interpreted as a possibly disturbing effect of long spoken lists.
[Figure 68: Answers to question 2, "How useful is the differentiation between show and read out?" (scale: very high … very low)]
The features "Presentation of albums with artists" (Figure 69) and "Implicit confirmation" (e.g. feedback of the entered artist as a headline, Figure 70) are clearly judged positively (means: 85 and 70).
[Figure 69: Answers to question 3, "Usefulness of albums with artists" (scale: very high … very low)]
26. [Figure 22: Overall task duration for the systems in the present and the baseline study (SAMMIE, C&C, Baseline), averaged over selected tasks and Subjects (means and standard deviations)]
Figure 23 shows the overall task duration for the systems and the MP3-experience levels, averaged over tasks and Subjects within subgroups. As for TCR and number of turns (see above), there was no very strong difference between MP3-experience levels. However, MP3-experienced Subjects were somewhat faster with SAMMIE, which mirrors the number of turns (see above).
[Figure 23: Overall task duration as a function of system and MP3 experience, averaged over tasks and Subjects within subgroups]
Footnote 27: If tasks 1–5, 9, 10 were considered, the average SAMMIE task duration would be 43 s, to be compared with 49 s for C&C. If all tasks are considered, the average SAMMIE task duration would be 48 s. A comparison of SAMMIE and baseline task durations on the basis of all single tasks of all Subjects with the Mann-Whitney U test would presumably have reached significance, but was too costly. A t-test does reveal significance, but task duration is not normally distributed, which rules out this test.
27. (CER, 31.1%) for in-grammar data. One may assume that this also holds for a significant number of out-of-grammar utterances. Figure 37 shows a different picture for the Command & Control system: although the error rates for in-grammar data are high as well, the overall sentence error rate and word error rate (WER) are in a more acceptable range owing to a rather low out-of-grammar rate. Misunderstandings could possibly be reduced by advising the subjects to use displayed items as commands ("what you see is what you can speak"). Figure 38 compares the average error rates of the Full SAMMIE, the Command & Control and the Baseline system, the latter evaluated in November 2005 (see deliverable D6.3.2). Footnote 38: There is no tool support to compute the concept error rate for all data, i.e. including out-of-grammar utterances. [Chart: "In-Car Showcase Evaluation: Comparison of Error Rates" for Full SAMMIE, C&C SAMMIE and Baseline; out-of-grammar rate, out-of-vocabulary rate, sentence and word error rate (overall and in-grammar), concept error rate (in-grammar)] Figure 38: Speech recognition error rates for all evaluated systems. There are two eye-catching differences between the systems. First of all, t
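The distinction between overall and in-grammar error rates can be made concrete with a small sketch. It assumes hypothetical per-utterance records carrying a manual transcription, the recognizer output and an in-grammar flag; it covers only the sentence error rate and the out-of-grammar rate, since the concept error rate needs semantic annotation that is not modelled here.

```python
from dataclasses import dataclass

@dataclass
class Utterance:
    reference: str    # manual transcription of what the subject said
    hypothesis: str   # recognizer output
    in_grammar: bool  # True if the grammar covers the reference utterance

def sentence_error_rate(utterances):
    """Share of utterances whose recognized word sequence differs from the reference."""
    if not utterances:
        return 0.0    # no utterances in this subset
    wrong = sum(u.reference.split() != u.hypothesis.split() for u in utterances)
    return wrong / len(utterances)

def error_rate_summary(utterances):
    """Overall vs. in-grammar sentence error rate plus the out-of-grammar rate."""
    if not utterances:
        raise ValueError("no utterances")
    in_grammar = [u for u in utterances if u.in_grammar]
    return {
        "out_of_grammar_rate": 1 - len(in_grammar) / len(utterances),
        "ser_overall": sentence_error_rate(utterances),
        "ser_in_grammar": sentence_error_rate(in_grammar),
    }
```

A high out-of-grammar rate pushes the overall rate up even when in-grammar recognition is unchanged, which is the pattern the text describes for Full SAMMIE versus C&C.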
28. Figure 52: Answers to question 18, "How do you judge the formulation of the speech output?" The present systems received very high scores for acoustic quality (Figure 53), and Subjects also remarked on this spontaneously several times during the runs. [Chart: "19 How good was acoustical quality", percent of Subjects from very good to very bad; mean SAMMIE 90%, C&C 91%, Baseline 57%] Figure 53: Answers to question 19, "How good was the acoustic quality of the speech output?" The next figures (Figure 54 to Figure 57) concern the subjective evaluation of the display. There is a general trend towards a positive judgement, but often clearly below the maximum. The display was judged more helpful in the present study than in the baseline study (Figure 54), the opposite of the result for the helpfulness of the speech output. As the informal interviews revealed, the display was felt to be clear and easy to survey, which was not the case in the baseline study. [Chart: "22 How helpful was display", percent of Subjects; mean SAMMIE 77%, C&C 82%]
29. Use system in future: 71%. Figure 59: Answers to the statements of questions 27–31, "Do you agree with the statements?", for C&C. [Chart: panels "27–31 Do you agree to", items 27 Understood system, 28 Got information, 29 Knew what to do, 30 Function as expected, 31 Use system in future, scale from "agree totally" to "not at all"; one panel lists means of 90% (27), 60% (28), 72% (29), 61% (30) and 76% (31)] Figure 60: Overall scores for selected questions (SAMMIE and C&C). The last figure represents the overall scores of the selected questions concerning the dialogue. In the intermediate questionnaires there were some open questions asking for general remarks on the speech output, the display presentation and the operation of the SAMMIE system (question 20 26 3
30. clear advantage of the SAMMIE system over the C&C system. All Subjects gave a positive answer for the SAMMIE system, by far most of them in the upper two categories. This distinct vote for SAMMIE regarding comfort should be attributed to the Subjects' experience of tasks completed with a single SAMMIE speech input. This subjective result is one of the most distinct ones in the comparison between the systems. [Chart: "5 Which comfort did you feel", percent of Subjects from very much to very low; mean SAMMIE 82%, C&C 69%] Figure 42: Answers to question 5, "Which comfort did you feel?" As the next figures (Figure 43, Figure 44) show, deciding on and changing between modalities was easy or very easy for most of the Subjects. This is an important result for the concept of multimodality, since it shows that Subjects can easily switch between modalities at will. Interestingly, the decision was easier in the baseline study, which can possibly be explained by the permanently open microphone. (Footnote: If no baseline data are shown in the figures, the equivalent question was not asked in the baseline study.) (Footnote: Even if not all Subjects profited from this, they became acquainted with it in the video clip introduction.)
31. concerning advantages and disadvantages of the multimodal input. The most frequently named advantage was avoiding the problems of the other modality, which nearly all Subjects named. (Footnote: This question was not asked in the baseline study.) The next most frequently named option was free choice of the operation mode, i.e. a main motivation for SAMMIE, the free choice of input modality, was perceived positively by a considerable part of the sample. More than half of the Subjects named adaptation to traffic and to the tasks. Compared with the advantages, far fewer disadvantages were named. The main disadvantage was the uncertainty as to which task was feasible with which input device, which was more severe in the baseline study. This result is somewhat astonishing, since all tasks but one (creating a new playlist) were feasible with both modalities. The need to choose between input modalities was no longer named by anyone. [Charts: "8 Advantages of multimodal SAMMIE input" and "9 Disadvantages of multimodal SAMMIE input", percent of Subjects, SAMMIE vs. Baseline]
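Because Subjects could tick several advantages or disadvantages, each option's share is computed over all Subjects and the shares need not add up to 100%. A minimal sketch of that tally, assuming a hypothetical layout of one set of ticked options per Subject; the option labels in the test are invented.

```python
from collections import Counter

def answer_shares(answers):
    """Percentage of Subjects who ticked each option of a multiple-choice item.

    answers: one collection of ticked options per Subject (hypothetical layout).
    Subjects may tick several options, so the percentages need not sum to 100.
    """
    if not answers:
        raise ValueError("no answers")
    # set() guards against counting the same option twice for one Subject
    counts = Counter(option for ticked in answers for option in set(ticked))
    return {option: 100 * n / len(answers) for option, n in counts.items()}
```

Subjects who ticked nothing still count in the denominator, so the shares refer to the whole sample rather than to respondents who named at least one option.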
32. departure error was defined as crossing the middle or edge line of the lane with the edge of the car. Lane departures lasting longer than about 7–8 s were counted repeatedly. After each run, the experimenter and the supervisor assessed the driving quality on five 5-point scales:
A very safe – very unsafe
B defensive – aggressive
C adapted – not adapted
D rule conformity – no rule conformity
E concentrated – not concentrated
The two persons made their assessments independently of each other, and the ratings were averaged afterwards, which achieved a certain level of objectivity. This categorization system
(Footnote: This categorization is a result of preceding tests at BEF and is similar to other projects, e.g. INVENT, where it was gradually developed.)
(Footnote: This critical duration of lane departure was assessed subjectively by the supervisor depending on the driving situation.)
(Footnote: The German word "sicher" can be interpreted as "confident" or as "safe"; both interpretations were used, depending on the Subject. Self-assured drivers were confident but not necessarily safe drivers.)
Footnote 33: There was no specific training for the subjective assessment of driving quality. While categories C and D could be reduced to some objective criteria, the other categories depended on the evaluators' experience and their personal driving behaviour. By far most of the categorizations of the two evaluators were identi
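The combination of the two independent ratings can be sketched as follows. The averaging is taken from the text; flagging dimensions where the raters differ by more than one point is an assumption based on the remark that larger differences were discussed afterwards, and the rating values in the test are invented.

```python
DIMENSIONS = ("A", "B", "C", "D", "E")  # safe, defensive, adapted, rule conformity, concentrated

def combine_ratings(experimenter, supervisor, max_gap=1):
    """Average two independent 5-point ratings per dimension.

    Returns the averaged ratings and the dimensions where the two raters
    differ by more than max_gap points (assumed threshold for a follow-up
    discussion).
    """
    combined, to_discuss = {}, []
    for dim in DIMENSIONS:
        a, b = experimenter[dim], supervisor[dim]
        for value in (a, b):
            if not 1 <= value <= 5:
                raise ValueError(f"rating {value} for {dim} outside 1..5")
        combined[dim] = (a + b) / 2
        if abs(a - b) > max_gap:
            to_discuss.append(dim)
    return combined, to_discuss
```

Averaging halves the influence of either rater, but, as Footnote 33 notes, it cannot remove a bias both raters share.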
33. familiar interaction mode of natural speech input. In this way they achieved a higher TCR with SAMMIE than with C&C. The older group with less MP3 experience relied more on the better-known manual operation, with its direct link between input device and display. [Chart: "Overall number of turns/Task", speech turns and iDrive turns for much vs. few MP3 experience, SAMMIE tasks 1–10] Figure 20: Overall number of turns per task as a function of system and MP3 experience (speech and iDrive turns separated), averaged over tasks and Subjects within subgroups. [Chart: "Number of turns/Task" for SAMMIE and C&C, tasks 1–10] The previous figure, Figure 21, shows the number of turns for the systems and single tasks, averaged over Subjects. The number of turns differed enormously between tasks. Many more turns were necessary for tasks with more parameters (e.g. tasks 4, 6, 7) and for tasks where the system performance was lower than elsewhere (tasks 6, 7, 10). A specific situation with recollection and pronunciation problems arose in task 5 (Michael Bublé). Figure 21: Number of turns for the systems and tasks, averaged over Subjects.
34. input. Deciding on a modality and changing between modalities was easy for most of the Subjects (about 80–85%). This is an important result for the concept of multimodality, since it shows that Subjects can easily switch between modalities at will. The information output was not fully accepted with respect to liking (70%), support (50–55%), information distribution (65–70%) and assistance (65%). The speech output was assessed as fairly good (65–70%), sufficiently extensive, with relatively good formulation (75%) and very good quality (90%). The speech output was judged better than in the baseline study with respect to quality, extent and formulation, but not with respect to content and assistance; these aspects had been highly appreciated in the baseline study. The display was judged relatively well. This holds for assistance (80%), contents (70–80%), design (75–80%) and extent. Here the SAMMIE display was mostly judged better than the baseline display (difference 7–15%), apart from the extent. The dialogue also tended to be judged positively, and SAMMIE was generally judged better than C&C. The best score went to the statement about understanding what the system said (90%). Relatively poor judgements concerned the statements that it was easy to get the information the user wanted and that the system worked as expected (55–65%). Actually the Subjects were relatively often
35. of the hypotheses missed significance, but the tendencies confirmed some of them:
Hypothesis 1: Users prefer speech input more with the SAMMIE system than with the C&C system. Tendency: yes. Significance: yes.
Hypothesis 2: Users with much MP3 experience tend towards manual operation. Tendency: contrary. Significance: no.
Hypothesis 3: Users with much MP3 experience achieve a higher operation efficiency, particularly with a lower number of turns. Tendency: yes. Significance: no.
Hypothesis 4: Users get a higher Task Completion Rate with SAMMIE than with C&C. Tendency: yes. Significance: no.
Hypothesis 5: Users are faster with the SAMMIE system than with the C&C system. Tendency: yes. Significance: no.
Hypothesis 6: The number of turns per task is higher with C&C than with SAMMIE. Tendency: yes. Significance: yes.
Hypothesis 7: SAMMIE needs fewer iDrive actions. Tendency: no. Significance: no.
Hypothesis 8: The number of system errors with SAMMIE is only marginally higher than with C&C. Tendency: false reactions clearly higher, rejections lower. Significance: no.
Hypothesis 9: SAMMIE distracts less from driving than C&C. Tendency: no. Significance: no.
Hypothesis 10: The SAMMIE system leads to a higher user acceptance than the C&C system. Tendency: contrary. Significance: no.
Hypothesis 11: Users can assess well what the system has understood. Tendency: yes. Significance: n/a.
On the basis of the objective and subjective results, as well as of observations and informal discussions, the following recommendations can be given. Generally: Pursue the concept of multimodality, i.e. fully parall
36. speech output was regarded as rather positive, better with the SAMMIE system than with the C&C system (Figure 52). This can possibly be attributed to a general tendency to judge the C&C speech output worse than the SAMMIE speech output (halo effect, i.e. generalizing the judgement of one aspect to the judgement of others). Footnote 37: Here the Subjects missed a central scale category "OK". Footnote 48: When a dialogue seemed to have reached a deadlock, the system offered help with the announcement "Choose one of the following menu items: playlists, artists, albums, titles, music genres. With the command 'Hilfe' (help) you can get useful information on operating the system at any time." Figure 51: Answers to question 17, "How do you judge the extent of the speech output?" (percent of Subjects for SAMMIE, C&C and Baseline; categories from "not sufficient" via "OK" to "too extensive"). [Chart: "18 How good was formulation of speech output", percent of Subjects from very good to very bad; mean SAMMIE 78%, C&C 72%, Baseline 70%]
37. tasks (means, standard deviations). [Chart: "C&C Average Error Rates" (%): out-of-grammar rate, out-of-vocabulary rate, word error rate and sentence error rate (in-grammar and overall), concept error rate (in-grammar)] Figure 37: Speech recognition error rates for the C&C system, averaged over subjects and tasks (means, standard deviations). The figure for the Full SAMMIE system shows high error rates for all test data as well as for in-grammar data. Given the reasonably low out-of-vocabulary rate (1.6%), the high out-of-grammar rate (47.7%) is somewhat surprising. The grammar nevertheless seems to contain almost all the necessary words, but evidently does not sufficiently cover the variety of phrases used by the subjects, who were encouraged to use natural language. However, in some cases the semantic concept of the user utterance could still be preserved although not all words were recognized correctly. This can be seen from the difference between the sentence error rate (SER, 40.1%) and the concept error rate
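The word error rate quoted for both systems is conventionally defined as the number of substituted, deleted and inserted words divided by the number of reference words, found by a word-level edit-distance alignment. The sketch below implements that standard definition; it is not the evaluation tool used in the study, and the German phrases in the test are illustrative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words,
    computed with a word-level Levenshtein alignment."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the reference words processed so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(prev[j] + 1,             # deletion of a reference word
                          curr[j - 1] + 1,         # insertion of a hypothesis word
                          prev[j - 1] + (r != h))  # substitution, or match at no cost
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions count as errors, the WER can exceed 100%. A sentence counts as wrong as soon as one word differs, which is why the sentence error rate is typically higher than the WER on the same data.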
38. the χ² test depends on which systems are considered and which frequencies were expected. If all three present systems, or all four systems including the baseline system, are considered, the result is highly significant in favour of Full SAMMIE and C&C (e.g. four systems: χ² = 22, df = 3, p < 0.001). If just the two systems SAMMIE and C&C are considered, there is no statistical difference (χ² = 0.43, df = 1, p = 0.51). Altogether there is a tendency towards a better spontaneous impression of SAMMIE (see Figure 39), but towards preferring the C&C system in the long run (see Figure 74); neither result is significant. The hypothesis concerning distraction was "Full SAMMIE distracts from driving less than C&C". Since the objective driving-error data are very similar for both systems (see chapter 3.6), the subjective evaluation of distraction is cited here (intermediate questionnaire, question 3). There was no significant difference between the systems (Wilcoxon matched pairs, n = 18, T = 19.5, p = 0.41). The statistical tests for the objective data are included in the corresponding chapters. (Footnote: Subjects for whom the SAMMIE system was active instead of C&C were excluded, as was Subject 1, whose questionnaires were structured differently, so that n = 18 resulted.) 4.5 AttrakDiff
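The report gives the χ² and Wilcoxon statistics but not the raw answers. The sketch below shows how such statistics are computed, assuming equal expected frequencies across the preference alternatives (the text notes that the result depends on the expected frequencies, so this choice is an assumption). The p-value uses the standard series for the regularized incomplete gamma function; for the Wilcoxon T only the statistic is computed, since small-sample p-values are normally read from tables. All counts in the test are invented.

```python
import math

def chi2_sf(x, df):
    """Upper tail P(X >= x) of a chi-square distribution with df degrees of freedom,
    via the series for the regularized lower incomplete gamma function."""
    if x <= 0:
        return 1.0
    a, t = df / 2, x / 2
    term = total = 1.0 / a
    n = 0
    while term > 1e-15 * total:
        n += 1
        term *= t / (a + n)
        total += term
    lower = math.exp(a * math.log(t) - t - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)

def chi_square_uniform(observed):
    """Goodness-of-fit test against equal expected frequencies (assumption).
    Returns (statistic, degrees_of_freedom, p_value)."""
    k, n = len(observed), sum(observed)
    expected = n / k
    stat = sum((o - expected) ** 2 / expected for o in observed)
    return stat, k - 1, chi2_sf(stat, k - 1)

def wilcoxon_t(first, second):
    """Wilcoxon matched-pairs statistic T: the smaller of the positive and negative
    rank sums. Zero differences are dropped; tied |differences| get average ranks.
    Returns (T, number_of_nonzero_pairs)."""
    if len(first) != len(second):
        raise ValueError("samples must be paired")
    diffs = [a - b for a, b in zip(first, second) if a != b]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        i = j + 1
    plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return min(plus, minus), len(diffs)
```

As a plausibility check, chi2_sf(0.43, 1) evaluates to about 0.51, matching the reported two-system comparison, and chi2_sf(22, 3) lies well below 0.001.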
39. to formulate freely and the new technology were named much more frequently than in the baseline study. Concerning the disadvantages of natural speech input, "looking for formulations" was named nearly as often as in the baseline study. This can be interpreted either as a lack of acceptance of the still restricted freedom of formulation or as an effect of the instruction to formulate whole sentences. Concerning the advantages of manual input, the option "correct system reaction" was named most frequently, even more frequently than in the baseline study. The main disadvantages were safety aspects such as eyes off the road, hands off the steering wheel and searching by hand. Subjectively, the most important advantage of multimodal input was avoiding the problems of the other modality. The free choice of operation mode was another important argument, i.e. one main motivation for SAMMIE, the free choice of input modality, was perceived positively by a considerable part of the Subjects. Compared with the advantages, far fewer disadvantages of multimodal input were named. The main disadvantage was the uncertainty as to which task was feasible with which input device, which was more severe in the baseline study. The need to choose between input modalities was no longer named by anyone. Asked which functions they would like to use in the car by multimodal interaction, including the
40. was usually five to seven. Only the tasks with full subjective or objective accomplishment are considered here, i.e. tasks with TCR = 0 are disregarded. The lengthy unsuccessful tasks with a long series of turns are therefore not included in the following figures. Figure 17: Overall number of turns per task (speech and iDrive turns added), averaged over tasks and Subjects (mean, standard deviation); SAMMIE tasks 1–10, C&C tasks 1–5, 9, 10. Figure 18: Overall number of turns per task, averaged over tasks and Subjects, with speech and iDrive turns displayed separately; SAMMIE tasks 1–10, C&C tasks 1–5, 9, 10. The previous figures show the overall number of turns for the two systems in the present study, averaged over the successfully performed tasks and Subjects. In Figure 17 all turns of each task were counted, i.e. the speech turns and the iDrive turns were added up (with rounding errors). In Figure 18 the speech and iDrive turns were counted separately. With the SAMMIE and the C&C system about
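The filtering and aggregation behind Figures 17 and 18 can be sketched in a few lines. The sketch assumes a hypothetical record per task and Subject holding speech turns, iDrive turns and the TCR; tasks with TCR = 0 are dropped, as described above, and the test values are invented.

```python
from statistics import mean, stdev

def turns_summary(records):
    """records: (speech_turns, idrive_turns, tcr) per task and Subject (hypothetical layout).

    Tasks with TCR == 0 are left out. Returns the mean speech and iDrive turns
    (as in Figure 18) and the mean and standard deviation of the total turns
    per task (as in Figure 17).
    """
    kept = [(s, i) for s, i, tcr in records if tcr > 0]
    if len(kept) < 2:
        raise ValueError("need at least two successful tasks for a standard deviation")
    totals = [s + i for s, i in kept]
    return {
        "speech_mean": mean(s for s, _ in kept),
        "idrive_mean": mean(i for _, i in kept),
        "total_mean": mean(totals),
        "total_sd": stdev(totals),  # sample standard deviation
    }
```

Dropping failed tasks before averaging is what keeps the long, unsuccessful turn sequences out of these figures.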
41. 0 shows the driving errors for the individual error categories. There is no obvious difference in the individual driving errors between the systems; any distraction from driving is equivalent for both systems in all measured categories. Lane departures and low speeds were the most frequent errors. A rate of 1.2 lane departure errors per minute seems relatively high and can be attributed to visual distraction while looking at the display, i.e. the display was presumably looked at as frequently with SAMMIE as with C&C, although a more speech-based dialogue would have been possible. As observed during the test, there were rather few glances at iDrive. There were clearly more "speed too low" than "speed too high" errors: operating the systems required some visual attention, which the drivers compensated for by reducing their speed. The experimental car was overtaken relatively often, even on two-lane roads. There were very few dangerous situations. Since a relatively broad definition of "dangerous" was used, these were mainly situations in which the supervisor issued a warning, which he did [...] for more than one point on the scales. Larger differences were discussed after the scores had been specified, so that a certain degree of mutual adaptation cannot be excluded. (Footnote: The lower standard deviation with C&C should not be interpreted, because fewer tasks and Subjects were considered with C&C.) (Footnote: Though not comparable to the present results, a resul
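Driving errors were counted only during task processing and normalized to one minute, which makes rates such as "1.2 lane departures per minute" comparable across Subjects and tasks of different length. A minimal sketch, assuming a hypothetical log of error category names and the total task-processing time in minutes; the log below is invented.

```python
from collections import Counter

def errors_per_minute(error_log, task_minutes):
    """Normalize driving-error counts to errors per minute of task processing.

    error_log: category names of errors recorded while a task was being processed
    task_minutes: total task-processing time in minutes
    """
    if task_minutes <= 0:
        raise ValueError("task_minutes must be positive")
    return {category: n / task_minutes for category, n in Counter(error_log).items()}

# Invented log for illustration only.
rates = errors_per_minute(
    ["lane departure", "speed too low", "lane departure", "lane departure"], 2.5)
print(rates)  # {'lane departure': 1.2, 'speed too low': 0.4}
```

Normalizing by task time rather than by run length matches the rule that errors outside task processing were not counted.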
42. 2 In the SAMMIE questionnaire there were single comments on the system output, such as: "Speech output has become more melodious and therefore more comprehensible compared with the baseline system"; "I could not take advantage of the speech output since I was seldom understood"; "It should not say what has been done, but that something has been done"; "Display was very simple and clear"; "In long lists, selecting a letter would be good"; "Priorities should be set to: which album and song is currently playing"; "Button for submenu, e.g. create playlist"; "Accommodation of the eyes is a problem". In the open evaluation of the SAMMIE system as a whole, five Subjects stated that the SAMMIE system has become faster or better than the baseline system. There were statements like: "System understands better and works faster than the TALK Baseline system"; "System has been very much improved"; "Complete tasks in one sentence are appropriate to reduce distraction"; "It is a pleasure to work with the system, though it has not yet reached production maturity". In the C&C questionnaire there were single comments on the system output, such as: "I was not always informed when I was not understood and what I can choose"; "Speech output holds you up"; "After a change from speech input to iDrive, the speech output should be stopped"; "Speaking speed somewhat too slow"; "Lists should be specified by speech output".
43. For most tasks there were more turns with C&C than with SAMMIE, particularly for the rather complex tasks with several parameters. In all tasks more turns occurred than the minimum; this was very pronounced with the SAMMIE system. In tasks 1 and 8, however, there was only a slight difference between the minimal and the actual number of turns. The following example shows a typical number of turns for task 5. Subject 15, SAMMIE, task 5: 4 speech turns, 3 iDrive turns, t = 1:17 min, 2 false reactions, 1 rejection, 3 driving errors, TCR = 1. ASR input: not understood. Input: "zeige alle musikrichtungen" (show all genres). Status: "Live in MP3 Player geladen" (Live loaded in MP3 player). DISPLAY Kontextpanel: Musikrichtungen (genres). User input: "swing". DISPLAY Kontextpanel: Interpreten von Swing-Musik (artists of swing music). System TTS output: "der einzige Interpret mit Swing-Musik heisst Michael Bublé" (the only artist with swing music is Michael Bublé). Status: "Pur vom Album Live in Player geladen" (Pur from the album Live loaded in player). ASR input: "spelema mehri". DISPLAY Kontextpanel: Michael Bublé > Caught In The Act. Status (shown twice): "Michael Bublé vom Album Caught In The Act in Player geladen" (Michael Bublé from the album Caught In The Act loaded in player). Table 6: Example of a typical number of turns (Subject 15, SAMMIE, task 5).
44. 3 Dis: Baseline, male, 49, yes, 2, 10000, 5; 14 Hat: Baseline, male, 41, yes, 2, 12000, 4; 15 Rot: Baseline, male, 31, no, 2, 14000, 3; 16 Sau: Baseline, male, 21, yes, 2, 2000, 3; 17 Sch: Baseline, male, 21, yes, 2, 8500, 2. Table 2: Subjects of the SAMMIE experiment (column headings: short name, studies, technician background, driving experience, self-assessment, experience). Four persons from the baseline study were excluded in advance because of more than one self-induced accident within the last years, or for other reasons. Subject 9 did not specify his profession beyond the general statement "employee". Footnote 7: The motivation of these technicians was very high; they sometimes participated in the experiment even during their working hours. Nearly all of this subgroup had an academic background, some still studying. Persons with low current driving experience were included because they had already participated in the baseline study without many accidents (11, 16) or were known from other BEF experiments as safe drivers. Footnote 8: Subject 8 does not own a car. The numbering does not correspond to the chronological session order, but roughly speaking, most persons with low numbers had their session in the first part of the study and most persons with high numbers in the second part.
45. 3, "How useful is the presentation of albums with artists?" [Chart: "4 Usefulness of implicit confirmation", percent of Subjects from very high to very low] Figure 70: Answers to question 4, "How useful is the implicit confirmation?" [Chart: "5 Usefulness of extended user guidance", percent of Subjects from very high to very low] Figure 71: Answers to question 5, "How useful is the comprehensive user guidance?" The extended user guidance was regarded as basically positive (see Figure 71). However, most of the answers were distributed over the three positive, partly non-maximal rating categories. The step-by-step guidance was not fully accepted, presumably because of the somewhat lengthy dialogue. The feature of adapting to the user's vocabulary was judged very diversely (Figure 72). Despite the tendency towards positive acceptance, a group of 40% had a more or less negative attitude towards the usefulness of this feature. As the informal statements showed, it was seen as a marginal feature. [Chart: "6 Usefulness of adaptation to user's vocabulary", percent of Subjects from very high to very low]
46. 5 turns were necessary on average to complete a task. Considering the complexity of most tasks, this seems an acceptable level. The difference in the number of turns between SAMMIE and C&C in Figure 17 is significant (Mann-Whitney U test, p < 0.05, n1 = 171, n2 = 95, U = 6629.24). With the SAMMIE system, however, there were at most 0.5 turns fewer than with the C&C system. This seems a marginal difference, because with SAMMIE one turn would mostly have been sufficient in theory. But several factors affected the number of turns:
• Subjects frequently did not choose the direct and shortest possible dialogue but split the task into several steps, e.g. first calling up the albums or playlists, then specifying them.
• Subjects had to repeat their input after rejections and false reactions by the system, which occurred more frequently with the SAMMIE system (see chapter 3.5).
Speech and iDrive turns are roughly balanced for SAMMIE as well as for C&C. The standard deviation of the number of turns is very large for both SAMMIE and C&C, which shows the enormous inter-individual differences. The next figure, Figure 19, shows the number of turns for the systems and the MP3 experience levels, averaged over tasks and Subjects within subgroups; the speech and iDrive turn data are summed. As for the TCR results (see above), there is no strong difference between MP3 e
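The Mann-Whitney U test compares two independent samples, here the per-task turn counts of SAMMIE and C&C, through their joint ranks. The sketch below uses the textbook definition with average ranks for ties and the large-sample normal approximation without tie correction; it is not necessarily the procedure of the statistics package used in the study, and the test samples are invented.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def mann_whitney_u(x, y):
    """Return (U, z, two-sided p) for independent samples x and y, using the
    normal approximation without tie correction (suited to large samples)."""
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2   # U statistic of the first sample
    u = min(u1, n1 * n2 - u1)
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - n1 * n2 / 2) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))
    return u, z, p
```

Under this approximation, with n1 = 171 and n2 = 95, a U of 6629 corresponds to |z| ≈ 2.48, which is consistent with the reported p < 0.05.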
47. SAMMIE system, thinking about the formulation or reformulation after rejections was felt to be straining by many Subjects. With the C&C system, the Subject was more bound to the menu and had to make more turns. These factors seem to be more or less equivalent in terms of subjectively felt mental load. 5.4 Subjective Results: The Subjects filled out intermediate questionnaires after both runs and a final questionnaire at home. For comparability, most of the questions were identical to those of the final baseline questionnaire and consisted mainly of 6-point rating scales. With the present systems, by far most of the Subjects tended towards a positive rating. Summarized and normalized scores of general impression: SAMMIE 75%, C&C 70%. The baseline system had received a lower rating (61%), i.e. there was a clear improvement in the subjective overall impression from the baseline to the SAMMIE systems, the more so as the present systems were judged easier to use (75%) than the baseline system (65%). SAMMIE (65%) was assessed as less distracting than C&C (59%) and much less distracting than the baseline system (46%), but most Subjects felt a certain distracting effect. A markedly higher comfort was felt with the SAMMIE system (82%) than with the C&C system (69%). This distinct vote for SAMMIE regarding comfort should be attributed to the Subjects' experience of tasks completed with a single SAMMIE speech
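The text reports "summarized and normalized scores" in percent for 6-point rating scales but does not state the mapping. A simple possibility, shown here purely as an assumption, maps the scale linearly so that the best category equals 100% and the worst 0%, then averages over Subjects; which end of the scale counts as best is a parameter.

```python
from statistics import mean

def normalized_score(ratings, best=1, worst=6):
    """Map ratings on a scale from `best` to `worst` linearly onto 0-100 %,
    where 100 % means every Subject chose the best category (assumed mapping,
    not confirmed by the report)."""
    if not ratings:
        raise ValueError("no ratings")
    span = worst - best
    return mean((worst - r) * 100 / span for r in ratings)
```

Under this mapping a score of 75% corresponds to an average rating of 2.25 on a 1 (best) to 6 (worst) scale. Other normalizations would shift the absolute numbers but not the ranking of the systems.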
48. AttrakDiff facilitates the evaluation of a chosen product by customers, users, etc. The evaluation data make it possible to assess how the attractiveness of the product is experienced in terms of usability and appearance, and whether optimisation is necessary. AttrakDiff 1 was applied as a measuring instrument in the form of semantic differentials. It consists of 23 seven-step items whose poles are opposite adjectives (e.g. confusing–clear, unusual–ordinary, good–bad). Each set of adjective items is ordered into a scale of intensity. The mean value of each item group yields a scale value for pragmatic quality (PQ), hedonic quality (HQ) and attractiveness (ATT). The two constituent aspects of hedonic quality, stimulation and identity, are reported separately. The hedonic and pragmatic qualities are perceived consistently and independently of each other, and both contribute equally to the rating of attractiveness. The data of the present study were used to simulate the participation of 20 Subjects for SAMMIE (all Subjects but No. 1) and 18 Subjects for C&C (all Subjects but Nos. 1, 14 and 21). AttrakDiff reported the following results: [Chart: "Overview of AttrakDiff Results": mean values of the dimensions with confidence rectangles for Full SAMMIE and C&C; axis: hedonic quality (HQ); project part A, study SAMMIE; evaluation data entered: 20/21] Projec
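Each AttrakDiff scale value is the mean of its item group. The sketch below illustrates that aggregation only. The item-to-scale assignment is hypothetical and is not the official AttrakDiff item list. Answers are assumed to be coded −3 to +3 and recoded so that higher values mean a more positive judgement, and combining stimulation and identity into one HQ value by averaging is also an assumption.

```python
from statistics import mean

# Hypothetical item-to-scale assignment for illustration; NOT the official
# AttrakDiff item list. Answers are assumed coded -3..+3 and recoded so that
# higher values mean a more positive judgement.
SCALES = {
    "PQ": ["confusing-clear"],
    "HQ-S": ["unusual-ordinary"],
    "HQ-I": ["hypothetical_identity_item"],
    "ATT": ["good-bad"],
}

def attrakdiff_scale_values(answers):
    """answers: item name -> list of coded answers (one per Subject).

    Each scale value is the mean of its items' mean answers; HQ is taken as
    the mean of stimulation (HQ-S) and identity (HQ-I), an assumption."""
    values = {
        scale: mean(mean(answers[item]) for item in items)
        for scale, items in SCALES.items()
    }
    values["HQ"] = mean([values["HQ-S"], values["HQ-I"]])
    return values
```

Taking the mean of item means weights every item equally even if some Subjects skipped an item; with complete data it equals the mean over all answers of the scale.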
49. Please make one cross per row. If you took part in the TALK driving simulation, please also make a cross in the 3rd row.
SAMMIE system: scale from much easier to much harder
Command system: scale from much easier to much harder
TALK system: scale from much easier to much harder
Individual aspects of the input methods
Speech input
When answering the following three questions, please think of the test drives with the natural-language SAMMIE system and of the tasks in which you mainly used speech input.
3. Did one or more aspects of the following list represent, for you personally, advantages of operating the natural-language SAMMIE system by speech compared with manual input?
[ ] No looking away from the traffic
[ ] Greater mental concentration on the traffic
[ ] Relatively free formulation of questions
[ ] Modern technology
[ ] Other
4. Did one or more aspects of the following list represent, for you personally, disadvantages of operating the natural-language SAMMIE system by speech compared with manual input?
[ ] Misrecognition of speech input
[ ] Necessary search for a suitable formulation
[ ] Longer-lasting inputs
[ ] Geg
50. D6.4 Final Report on Multimodal Experiments, Part I: Evaluation of the SAMMIE System
Hartmut Mutschler (BEF); Frank Steffens, Andreas Korthauer (BOSCH)
Final 1.1, Distribution: public
TALK: Talk and Look: Tools for Ambient Linguistic Knowledge, IST-507802, Deliverable 6.4, 25 January 2007
Project funded by the European Community under the Sixth Framework Programme for Research and Technological Development.
The deliverable identification sheet is to be found on the reverse of this page.
Project ref. no.: IST-507802
Project acronym: TALK
Project full title: Talk and Look: Tools for Ambient Linguistic Knowledge
Instrument: STREP
Thematic Priority: Information Society Technologies
Start date / duration: 01 January 2004 / 36 months
Security: Consortium internal
Contractual date of delivery: M36 (December 2006)
Actual date of delivery: 20 December 2006
Deliverable number: D6.4 Part I
Deliverable title: Final Report on Multimodal Experiments Part I: Evaluation of the SAMMIE system
Type: Report
Status & version: Final 1.1
Number of pages: 96
Contributing WP: WP6
WP/Task responsible: BOSCH
Other contributors: BEF (c/o BOSCH), BMW, DFKI, USAAR
Author(s): Hartmut Mutschler, Frank Steffens, Andreas Korthauer
EC project officer: Evangelia Markidou
Keywords: Evaluation, final in-car showcase, multimodal experiments
The partners in TALK are
51. "I relied on the acoustic dialogue, but remembering is difficult"; "Would like to adjust volume and bass by speech input"; "Permanently open microphone when driving alone"; "Music should become quieter when PTT (push-to-talk) is activated"; "A shuffle mode would be good". During the C&C runs there were statements like: "Distraction by system errors"; "Mental load through distraction"; "Display 'Kein Lied geladen' (no song loaded) is irritating"; "I turned off the music to prevent disturbance of the speech input"; "Not clear whether I have to wait for the end of the speech output to proceed manually". Concerning iDrive there were statements like: "Position too far back"; "Faster"; "At the beginning speech input, because it was a new device"; "Better structured input and overview"; "I did not think of moving through the list by turning"; "Delay time with iDrive is irritating". 4.2 Final questionnaire: The Subjects completed the final questionnaire at home, i.e. after having got to know both systems. It contained several questions on a general view of multimodal interaction while driving. Question 1, "Which input modality would you prefer in the long run?", was asked because it was assumed that learning was not yet complete during the sessions. As Figure 61 illustrates, there was a slight preference for the C&C sys
52. MIE, task 6. ASR input: "album leuchtturm gern vielleicht das erste lied in die playlist autofahrt einfügen" (roughly: "album Leuchtturm, gladly, maybe add the first song to the playlist Autofahrt"). DISPLAY: context panel AUTOFAHRT. Table 8: Example of rejections with SAMMIE (Subject 17, task 6). 3.6 Driving quality. Driving quality was measured by recording driving errors online during the runs and by scoring the overall driving quality. The following driving error categories were considered (No., category: driving errors):
  1 Dangerous situation: intervention of the driving instructor, etc.
  2 Speed too low: speed more than 20 km/h below the limit; speed too low with respect to the traffic situation.
  3 Speed too high: speed ≥ 10 km/h above the limit; speed too high with respect to the traffic situation.
  4 Distances too low: longitudinal distance too low (< 1/4 of the tachometer value); lateral distance too low (0.5-1.5 m, depending on situation and StVO, the German road traffic regulations).
  5 Keeping lane inexactly: lane departure; wrong lane used.
  6 Insufficient observation: poor observation of traffic ahead, behind or beside; blind spot disregarded, etc.
  7 Inappropriate braking: hard braking; late braking.
  8 Other driving errors: wrong gear; no indicator, etc.
Driving errors were counted only during task processing. To enable a comparison of driving errors between Subjects and tasks, they were normalized to one minute. A lane
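The normalization of driving errors to one minute can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the evaluation software used in the study: the function names `errors_per_minute` and `mean_rate` are hypothetical, task duration is assumed to be given in seconds, and per-task rates are assumed to be averaged afterwards, as in the figures that average over tasks and Subjects.

```python
def errors_per_minute(error_count: int, duration_s: float) -> float:
    """Normalize the driving errors counted during one task to errors per minute."""
    if duration_s <= 0:
        raise ValueError("task duration must be positive")
    return error_count / (duration_s / 60.0)


def mean_rate(tasks: list[tuple[int, float]]) -> float:
    """Average the per-minute rates of several (error_count, duration_s) task records."""
    if not tasks:
        raise ValueError("need at least one task")
    rates = [errors_per_minute(count, duration) for count, duration in tasks]
    return sum(rates) / len(rates)


if __name__ == "__main__":
    # Example: 5 driving errors during a task lasting 1:37 min (97 s)
    print(f"{errors_per_minute(5, 97):.2f} errors/min")
```

For 5 errors in a task lasting 1:37 min this yields about 3.09 errors per minute. Averaging rates (rather than pooling all counts and all minutes) gives every task the same weight regardless of its length, which is one plausible reading of "normalized, then averaged".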
53. MIE system and a Command & Control-like reference system (C&C). The SAMMIE dialogue system with its evaluated variants and the C&C system, as well as their integration into the BMW car, are described in detail in TALK deliverable D5.3 [1]. The objectives of the evaluation study were to find out how efficiently the Final In-Car Showcase SAMMIE system is used for interacting with an MP3 system in a car, and to what extent it is accepted. 21 Subjects performed two runs, one with the C&C system and one with SAMMIE, on a course of 19 km or 35 km with 7 or 10 tasks, respectively. The experimental design also allowed a comparison with the corresponding evaluation of the In-car Baseline system (cf. TALK deliverable D6.3 [2]). When directly comparing the results of both studies, it is important to note that the evaluation conditions for the Baseline system differed from those of the final SAMMIE evaluation: • The Baseline system was evaluated as a laboratory prototype using a simulated driving task, whereas the final evaluation took place in the BMW car under real driving conditions on the road. • Due to the missing vestibular feedback of acceleration, the simulated driving task in the Baseline study was unfamiliar and more demanding for some Subjects than the real driving task in the final evaluation. • A headset with a close-talk microphone was used for the Baseline system, versus a far-talk microphone array for the final SAMMIE evaluation, resultin
54. NG QUALITY; 3.7 MENTAL LOAD; 3.8 SPEECH RECOGNITION PERFORMANCE; 4.1 INTERMEDIATE QUESTIONNAIRES; 4.2 FINAL QUESTIONNAIRE; 4.3 QUESTIONNAIRE ADAPTIVE/NON-ADAPTIVE SAMMIE; 4.4 STATISTICAL TESTS; 4.5 ATTRAKDIFF; 5.1 OBJECTIVES; 5.2 METHODS; 5.3 OBJECTIVE RESULTS; 5.4 SUBJECTIVE RESULTS; 8.3 INTERMEDIATE QUESTIONNAIRE; 8.4 FINAL QUESTIONNAIRE; NA SAMMIE QUESTIONNAIRE. Executive Summary. The TALK deliverable D6.4 is split into two parts: (1) the first part concentrates on the evaluation of the final SAMMIE in-car system; (2) in the second part we report on the data collection experiments SAMMIE, MIMUS and SACTI, and we present results from the evaluation experiments using the TownInfo system. This part of deliverable D6.4 reports on the results of the evaluation of the final SAMMIE in-car system. A user test was performed in an experimental car with the SAM
55. Random selection (2x); other display design, better display quality (2x); more flexibility in formulation (1x); stopping the speech output when manual input starts (1x); a combination of SAMMIE and the Command system (1x); higher sensitivity to the dialogue context, e.g., no erasing instead of playing back (1x); additional functions such as charts, statistics, etc. (1x); submenu (1x). 4.3 Questionnaire Adaptive/Non-adaptive SAMMIE. After the runs with the Full (adaptive) SAMMIE and the C&C system, the Non-Adaptive (NA) SAMMIE system was presented at the end of the session in the form of 6 examples (see Attachment 5). Each example consisted of a double presentation, first with Full SAMMIE and then with NA SAMMIE, and was dedicated to one specific feature that differed between Full and NA SAMMIE. After each example, the corresponding question about this feature was asked (see Attachment 5). The following figures show the answers to these questions. While all other features were judged positively, the usefulness of the personal, differentiated form of address (formal "Sie" vs. informal "Du") was scored rather negatively (Figure 67). However, a group of 30% chose the second-highest category, i.e., the Subjects were divided with respect to this feature. [Figure 67: question 1, "Usefulness of personal addressing"; y-axis: percent of Subjects; with mean.]
56. SAMMIE mode there was still a considerable preference for speech input. In most tasks, most Subjects preferred to interact by speech rather than manually via iDrive, even after experiencing rejections and false reactions. They took advantage of the possibility to get their MP3 item quickly, often within one or a few actions, e.g., one phrase or sentence including all parameters. For the tasks in C&C mode, however, the preferred modalities were balanced during the ongoing interactions: speech and iDrive were preferred similarly often. The C&C mode required the user to follow the menu in the same manner as in iDrive mode, so there was no basic difference in effort between the modalities, except for the additional drawback of rejections and false system reactions with speech input. Even more interesting is the result that iDrive was preferred somewhat more frequently in SAMMIE mode during the periods of free interaction, although it was clearly less preferred during the interactions to fulfil a given task. One possible explanation is that the first bad experiences with system reactions to speech input partly induced a shift to iDrive during free interaction periods. In addition, users were probably able to explore the system more easily and systematically by browsing the hierarchical menu structure using the familiar haptic-visual modality. A Wilcoxon Matched Pairs test revealed that the difference of preferred modality
57. [Figure 24: Task duration (min:s) for tasks 1-10 as a function of system (SAMMIE, C&C, Baseline), averaged over Subjects.] The previous Figure 24 shows the task duration for the single tasks averaged over Subjects. The number of turns contributed most to the task duration: broken down to the single tasks of the individual Subjects, task duration closely reflects the number of turns (see Figure 21), with a Pearson correlation coefficient of r = 0.7 (highly significant, p < 0.001). In most of the tasks, task processing was faster with SAMMIE than with C&C and than in the baseline study. The inverse result in task 5 is attributed to the recollection and pronunciation problem with "Michael Buble", which the menu-driven dialogue of C&C relieved. Very striking is the much shorter task duration with SAMMIE in task 6 compared to the baseline study, which reflects the relation in TCR (see Figure 14). Whenever problems occurred, whether with ambiguity (e.g., two "Live" albums), recollection and pronunciation ("Michael Buble") or system performance ("99 Luftballons", "Beatles Rock"), the task duration generally increased. The similarl
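The relation between number of turns and task duration is quantified above by a Pearson product-moment correlation. A minimal standard-library sketch, assuming paired per-task values for individual Subjects; the function name is hypothetical and the sample values in the usage block are invented for illustration, not data from this study:

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson product-moment correlation coefficient of two paired samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired samples of length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # sum of cross-products and sums of squared deviations from the means
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return s_xy / math.sqrt(s_xx * s_yy)


if __name__ == "__main__":
    # Invented (turns, task duration in s) pairs, for illustration only
    turns = [2, 4, 5, 7, 9]
    durations_s = [20.0, 35.0, 52.0, 60.0, 95.0]
    print(f"r = {pearson_r(turns, durations_s):.2f}")
```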
58. Saarland University (USAAR), University of Edinburgh, HCRC (UEDIN), University of Gothenburg (UGOT), University of Cambridge (UCAM), University of Seville (USE), Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI), Linguamatics (LING), BMW Forschung und Technik GmbH (BMW), Robert Bosch GmbH (BOSCH). For copies of reports, updates on project activities and other TALK-related information, contact: The TALK Project Co-ordinator, Prof. Manfred Pinkal, Computerlinguistik, Fachrichtung 4.7 Allgemeine Linguistik, Postfach 15 11 50, 66041 Saarbrücken, Germany, pinkal@coli.uni-sb.de, Phone +49 681 302 4343, Fax +49 681 302 4351. Copies of reports and other material can also be accessed via the project's administration homepage, http://www.talk-project.org. © 2006, The Individual Authors. No part of this document may be reproduced or transmitted in any form, or by any means, electronic or mechanical, including photocopy, recording, or any information storage and retrieval system, without permission from the copyright owner. Contents: 1; 2.1; 2.2; 2.3; 2.4; 2.5; 2.6; 3.1 PREFERRED MODALITY; 3.2; 3.3 NUMBER OF TURNS; 3.4 TASK DURATION; 3.5 SYSTEM ERRORS; 3.6 DRIVI
59. ...of the speech outputs? Very good / very bad. How do you judge the extent of the speech outputs? Not sufficient / too extensive. How do you judge the wording of the speech outputs? Very good / very bad. How do you judge the acoustic quality of the speech outputs? Very good / very bad. Do you have any further remarks on the speech outputs? If you took part in the driving simulation, you can also compare them with the speech outputs at that time. The following questions concern the display presentation, i.e., what is shown on the screen. 21 How did you like the visual display? Very good / very bad. 22 How helpful was the visual display for you? Very helpful / not helpful at all. 23 How good did you find the content of the visual display? Very good / very bad. 24 How good did you find the design of the visual display? Very good / very bad. 25 How do you judge the extent of the visual display? Not sufficient
60. [Figure 54: Answers to question 22, "How helpful was the display for you?"; series SAMMIE, C&C, Baseline.] Similarly, the contents of the display were judged to be good, notably better than in the baseline system (Figure 55). [Figure 55: Answers to question 23, "How did you judge the contents of the display?"; scale from very good to very bad; series SAMMIE, C&C, Baseline.] Concerning the design of the display, there were some reservations with respect to the C&C system (Figure 56): here the Subjects missed the specifications in the headings. The extent of the display was regarded as OK (Figure 57); here the Subjects missed a central scale category "OK". [Figure 56: Answers to question 24, "How did you judge the design of the display?"; scale from very good to very bad; means: SAMMIE 79, C&C 74, Baseline 55.] [Figure: question 25, "How good was the extent of the display?"; series SAMMIE, C&C, Baseline.]
61. ...questionnaire after the SAMMIE system. Date: Name: What is your general impression of the entire SAMMIE operating system? Very good / very bad. Operating the entire SAMMIE system, i.e., with speech and iDrive operation, was for you: very easy / very difficult. How strongly did you feel distracted from driving while operating the SAMMIE system? Do not distinguish between the input modes, but consider it as an overall system. Not distracted at all / very distracted. How safe did you feel while operating the SAMMIE system? Do not distinguish between the input modes, but consider it as an overall system. Very safe / very unsafe. How much comfort did you experience when operating the entire SAMMIE system? Very high / very low. How much fun did you have when operating the entire SAMMIE system? Very much / very little. How easy or difficult was it for you to decide on an input mode in each case? Very easy / very difficult. How easy or difficult was it for you to switch between speech input and the control unit? Very
62. aneously an increased performance of the present systems compared to the Baseline system. Subjectively, the most important advantage of multimodal input was avoiding the problems of one modality by choosing the other; consequently, the free choice of operation mode was rated positively by a considerable part of the Subjects. Recommendations. Finally, recommendations are given concerning the multimodal interaction concept, system performance and system output. The most important ones are the following: • Pursue the concept of multimodality with free choice of modality at any time. Keep the concept of barge-in by Push-to-Talk button and possibly extend it to modality changes from speech to iDrive. • Further improve speech recognition and language understanding performance with regard to acoustic conditions, large vocabulary and grammar coverage; this is considered an important aspect of multimodal systems featuring speech dialogue. • Reduce the amount and length of the speech output to the necessary information. • Keep the display as it is, but leave out unnecessary information. 1 INTRODUCTION. Within the TALK project, the multimodal interaction system SAMMIE (TALK In-Car Showcase) was to be evaluated in a user field test. The objectives were to analyse: • the usage of the multimodal
63. between systems for the given tasks is significant (Wilcoxon Matched Pairs¹⁷: T = 0, T = 28, p < 0.05; tasks 1-5, 9, 10 included), i.e., speech input was preferred significantly more often with the SAMMIE system than with the C&C system. Conversely, iDrive input was preferred significantly less often with the SAMMIE system than with the C&C system. [Footnote 17: The nonparametric Wilcoxon Matched Pairs Test compares two variables. It assumes that the variables were measured on a scale that allows the rank ordering of observations based on each variable (i.e., an ordinal scale) and that allows rank ordering of the differences. This test can be almost as powerful as the t-test.] The next Figure 10 shows how often tasks and free interaction periods were processed consistently in one modality throughout. Speech input alone was used relatively often with the SAMMIE system, much more frequently than iDrive alone. Within the free interaction periods, however, speech and iDrive were used similarly often, particularly with SAMMIE (see explanation above). [Figure 10: Overall pure modality (speech tasks, iDrive tasks, speech free, iDrive free) for SAMMIE and C&C (tasks 1-5, 9, 10), averaged over tasks and Subjects.]
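The Wilcoxon Matched Pairs test described in footnote 17 can be illustrated by computing its statistic T directly. The sketch below is an illustration under stated assumptions, not the statistics package used for this report: it drops zero differences, ranks the absolute differences with average ranks for ties, and returns T as the smaller of the positive and negative rank sums. The function names are hypothetical, and the p-value (from tables or a normal approximation) is not computed here.

```python
def average_ranks(values: list[float]) -> list[float]:
    """1-based ascending ranks; tied values receive the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # grow the block of tied values that begins at position `start`
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        tied_rank = (start + end) / 2 + 1  # mean of the ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = tied_rank
        start = end + 1
    return ranks


def wilcoxon_t(x: list[float], y: list[float]) -> tuple[float, int]:
    """Wilcoxon matched-pairs statistic T (smaller signed rank sum) and number of non-zero pairs."""
    if len(x) != len(y):
        raise ValueError("samples must be paired")
    diffs = [a - b for a, b in zip(x, y) if a != b]  # pairs without a difference are dropped
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return min(w_plus, w_minus), len(diffs)


if __name__ == "__main__":
    # Invented paired scores (one value per Subject under system A and system B)
    system_a = [5, 3, 4, 2, 4]
    system_b = [3, 3, 1, 4, 2]
    print(wilcoxon_t(system_a, system_b))
```

With T defined as the smaller rank sum, as in this sketch, T = 0 means that every non-zero paired difference points in the same direction.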
64. cal or did not differ; some scales differentiated, e.g., between confident but risky drivers (B, C, D low; E high; left high, right low) and slow, jerky drivers (A, C low; B, D, E high). The following Figure 29 shows the overall driving errors per minute for the systems, averaged over error categories, tasks and Subjects. There was no pronounced difference in the mean number of driving errors between systems. The use of both systems seems to be coupled with some loss of driving quality; however, since no reference run without any tasks was performed, no statement can be made about the effect of multimodal operation on driving safety in general. The standard deviations were remarkably high, since there was a very large interindividual range of driving errors: with some Subjects there were no more than occasional driving errors, while others crossed the lane edges continuously during task processing. The difference in driving errors between systems is not significant (Mann-Whitney U test, SAMMIE vs. C&C: n = 119 and 209, U = 12382, p = 0.95). [Figure 29: Overall driving errors per minute for the systems (SAMMIE tasks 1-10, C&C tasks 1-5, 9, 10), averaged over error categories, tasks and Subjects; means and standard deviations.] The next Figure 3
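The Mann-Whitney U test used above compares two independent samples, here per-task driving error rates under SAMMIE and C&C. The sketch below computes U from rank sums and a two-sided p-value from the large-sample normal approximation without tie correction. The function names are hypothetical, and the approximation is a simplification, not necessarily the procedure behind the reported value.

```python
import math


def average_ranks(values: list[float]) -> list[float]:
    """1-based ascending ranks; tied values receive the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end) / 2 + 1  # mean rank of the tied block
        start = end + 1
    return ranks


def mann_whitney_u(a: list[float], b: list[float]) -> float:
    """Smaller of the two Mann-Whitney U statistics for independent samples a and b."""
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    ranks = average_ranks(list(a) + list(b))  # rank the pooled sample
    rank_sum_a = sum(ranks[: len(a)])
    u_a = rank_sum_a - len(a) * (len(a) + 1) / 2
    u_b = len(a) * len(b) - u_a
    return min(u_a, u_b)


def u_two_sided_p(u: float, n1: int, n2: int) -> float:
    """Two-sided p-value from the normal approximation of U (no tie correction)."""
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    return math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    # Reported values: U = 12382 with sample sizes 119 and 209
    print(f"p = {u_two_sided_p(12382, 119, 209):.2f}")
```

Plugging in the reported U = 12382 with sample sizes 119 and 209 gives p ≈ 0.95 under this approximation, in line with the reported result: U lies very close to its expected value n1·n2/2.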
65. ced Subjects clearly preferred speech more than the less experienced Subjects did, and vice versa for iDrive. [Footnote: Though having learned the manual, but not obvious, copying into the playlist (see chapter 2.6).] The result of the statistical χ² test, however, shows no significant difference in preferred modality between the experience levels, neither for speech nor for iDrive use (much vs. low experienced: p = 0.11). [Figure 12: Preferred modality (speech, iDrive) for different MP3 experience (much, low), averaged over systems and over the Subjects of each group.] The following dialogue shows an example of several changes between modalities, although there were no system errors: the Subject changed the modality three times when she did not succeed with the preceding one. Subject 2, SAMMIE, task 6: 3 speech turns, 8 iDrive turns, t = 1:37 min, 1 false reaction, 1 rejection, 5 driving errors. DISPLAY: context panel "Titel von Nena" (titles by Nena). TTS output: "Ich habe 53 Titel von Nena gefunden." (I have found 53 titles by Nena.) TTS output: "Die ersten 6 zeige ich dir auf dem Bildschirm." (I'll show you the first 6 on the screen.) TTS output: "Nennen Sie den Namen eines der angezeigten Titel, um ihn abzuspielen." (Say the name of one of the displayed titles to play it.) System output: "Nennen Sie den Namen eines der angezeigten Titel, um ihn abzuspielen." (Say the name of one of the displayed titles to play it.)
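The χ² test mentioned above checks whether preferred modality is independent of MP3 experience. A minimal sketch for a 2×2 contingency table (rows: much vs. low MP3 experience; columns: speech vs. iDrive preferred), assuming raw counts as input and no continuity correction. The table layout, the function name and the example counts are hypothetical; the report does not give the cell counts here.

```python
import math


def chi_square_2x2(table: list[list[float]]) -> tuple[float, float]:
    """Pearson chi-square statistic and p-value (1 degree of freedom) for a 2x2 table.

    No continuity (Yates) correction is applied.
    """
    if len(table) != 2 or any(len(row) != 2 for row in table):
        raise ValueError("expected a 2x2 table")
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # expected count if rows and columns were independent
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # with 1 df, chi2 = z**2, so P(X >= chi2) = P(|Z| >= sqrt(chi2)) = erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical counts: rows = much / low MP3 experience, columns = speech / iDrive preferred
    counts = [[30, 10], [22, 16]]
    chi2, p = chi_square_2x2(counts)
    print(f"chi2 = {chi2:.2f}, p = {p:.3f}")
```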
66. ... If a task cannot be completed, this is not a failure on your part but a gain in knowledge for us; simply continue with the further instructions. For all systems: you can only speak after you have pressed the microphone button and the microphone indicator has switched to green. You can interrupt the speech output at any time with the microphone button and then speak yourself. Once again: with a few exceptions, you have a free choice between speech input and manual input (iDrive) at all times, even while working on a task (multimodal input). The system output is given visually on the display and acoustically as speech output. Of particular interest to us are your use and assessment of the dialogue system, including the system outputs. If you took part in the driving simulation in November 2005, your comparison with the earlier TALK system of the driving simulation is also of interest. MP3 tasks: • listening to songs • obtaining information • working with playlists. Each task will be announced twice in succession at specific times. Please begin working on the task only after the second announcement. Please say so aloud or give a hand signal when you have finished the task. After that you may talk with the other occupants of the vehicle again.
67. ...following questions refer to the drive with multimodal input in the natural-language SAMMIE version. 8. Did one or more aspects of the following list represent personal advantages for you of multimodal operation in the natural-language SAMMIE version, i.e., advantages of the free choice between spoken and manual input? ○ Free choice of the input medium according to personal taste ○ Adaptation of the input medium to the task ○ Adaptation of the input medium to the driving situation ○ Avoiding problems of one medium by choosing the other ○ Variety ○ Other. 9. Did one or more aspects of the following list represent personal disadvantages for you of multimodal input in the natural-language SAMMIE version, i.e., disadvantages of the free choice between spoken and manual input? ○ Conceptual rethinking between the input media required (with speech input I have to formulate, with manual input I have to grip in a certain way) ○ Uncertainty whether the task can actually be carried out with the desired input medium ○ Having to decide on an input medium, since both input modes are possible ○ Unfamiliar choice between two input media ○ Other. 10. What were your reasons, in the natural-language SAMMIE version, ...
68. disoriented about the present system state. The Subjects who had already participated in the baseline study often spontaneously reported an increased performance of the present systems compared to the baseline system, particularly concerning the recognition performance and the speed of the systems. Recommendations for further improvements concerned the extent of speech output and display, the selection of items in the lists and the position of the iDrive button. Concerning the preferred system in the long run, there was a slight preference for the C&C system in the final questionnaire (SAMMIE 48, C&C 52). In the intermediate questionnaire the difference was even more pronounced (SAMMIE 45, C&C 60). This is an unexpected result, because the C&C system was meant as a reference system for the SAMMIE system. It can presumably be attributed to the better speech recognition performance of the C&C system. The better orientation along the menu with C&C is possibly another reason: a change to iDrive operation was easier with C&C, since the Subjects were always up to date with the display. Concerning the advantages of natural speech input with SAMMIE compared to manual iDrive input, safety aspects dominated, as in the baseline system (no averting glances). The possibility
69. driving quality score for the system, averaged over scales, tasks and Subjects (means and standard deviations). [Figure 33: Driving quality score (positive to negative) for the systems (SAMMIE, C&C) and the quality scales, averaged over tasks and Subjects.] 3.7 Mental load. After each task the Subjects had to rate their mental load (German: Beanspruchung) on a 5-point scale (1 = no mental load, 5 = strong mental load). This score represents the overall load caused by driving and the task. The following Figure 34 shows the overall mental load score for both systems averaged over tasks and Subjects; in Figure 35 the scores are additionally shown for the individual tasks. [Figure 34: Overall mental load score for the systems (SAMMIE tasks 1-10, C&C tasks 1-5, 9, 10), averaged over tasks and Subjects; mean and standard deviation.] Figure 35: Mental load score for the syst
70. [Figure 76: Mean values of the AttrakDiff word pairs for the products Full SAMMIE (project part A) and C&C (project part B); word pairs include ...e/innovative, dull/captivating, undemanding/challenging, ordinary/novel, unpleasant/pleasant, ugly/attractive, disagreeable/likeable, rejecting/inviting, bad/good, repelling/appealing, discouraging/motivating.] 5 Summary. 5.1 Objectives. The objectives of the evaluation study were to find out: • the usage of the multimodal systems • the efficiency of the dialogue • the acceptance of the systems • the efficiency of the speech system • the influence on driving quality. 5.2 Methods. The experimental set-up for the user test comprised the experimental car (BMW 335) including the iDrive button, an MP3 system, the full and non-adaptive SAMMIE system as well as the Command & Control (C&C) system, and a video system with two cameras for the Subject and the traffic scene. An experimenter and a supervisor controlled the experiment and recorded the data. The resulting experimental course was 34.5 km long for the SAMMIE run and 19 km long for the C&C run and was driven within 35-40 min and 20-25 min, respectively. The streets had two lanes with light or medium traffic, or four lanes with medium or dense traffic. There were sp
71. e distraction from driving while operating the SAMMIE systems (provocation of driving errors) • Avoiding approach roads and traffic lights within the task segments as far as possible (no forced interruptions at approach roads, no task completion in a stationary car) • Some structuring for setting task begin and end marks. The resulting experimental course is shown in the next table and figure. The distance was 34.5 km for the SAMMIE run, which was shortened to 19 km for the C&C run (see below). A typical Subject needed about 35-40 min to drive the SAMMIE run and about 20-25 min to drive the C&C run. The task segments of the course had two lanes with light or medium traffic, or four lanes with medium or dense traffic. They had no more than two approach roads (within task 5) and no traffic lights during task processing. Mostly there were speed limits of 70, 80 or 100 km/h, which changed in the majority of segments. In the pre-tests, the distances for the tasks were chosen to allow the complete performance of a task when no serious driving or dialogue problems occurred. Since the criterion of having as many tasks as possible was more important than allowing long task performance times, several tasks followed closely on each other and/or had a rather limited performance time, particularly tasks 1, 5 and 10. As a consequence, task 2 was started immediately after the end of task 1, even if the mark (traffic sign "Oststadt") had not be
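As a rough plausibility check, the distances and driving times above imply the following average speeds over the whole course. This assumes that the stated times cover the complete run including stops; it is an arithmetic illustration only, not a figure reported in the study:

```latex
\bar v_{\mathrm{SAMMIE}} = \frac{34.5\,\mathrm{km}}{40\,\mathrm{min}} \;\text{to}\; \frac{34.5\,\mathrm{km}}{35\,\mathrm{min}}
  \approx 51.8 \;\text{to}\; 59.1\,\mathrm{km/h},
\qquad
\bar v_{\mathrm{C\&C}} = \frac{19\,\mathrm{km}}{25\,\mathrm{min}} \;\text{to}\; \frac{19\,\mathrm{km}}{20\,\mathrm{min}}
  = 45.6 \;\text{to}\; 57.0\,\mathrm{km/h}
```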
72. e options, whereby several options could be checked. The following Figure 63 shows the answer frequencies concerning advantages and disadvantages of natural speech input with SAMMIE. Safety aspects dominated, as in the baseline system (no averting glances). The possibility to formulate freely and the new technology were mentioned much more frequently than in the baseline study, while looking for formulations was mentioned nearly as often as in the baseline study. This can be interpreted either as a still missing acceptance of the still restricted freedom of formulation or as an effect of the instruction to formulate in whole sentences. There was a considerable decrease in the number of Subjects who objected to the longer inputs. [Figure 63: Answers to questions 3 and 4, "Which of the following aspects represented advantages/disadvantages of speech input for you in relation to manual input?"; panels "3. Advantages of SAMMIE speech input" and "4. Disadvantages of SAMMIE speech input"; percent of Subjects; series SAMMIE, Baseline.] The following Figure 64 shows the answer frequencies concerning advantages and disadvantages of manual i
73. e system. [Footnote 3: 20 Subjects were originally planned for the evaluation study; the 21st Subject was included as a reserve. The standard sample of BEF ensures safe driving, reliability and a certain degree of sophisticated expressiveness.] No more than 11 Subjects of the baseline study met the conditions and were available, i.e., 11 Subjects had already participated in the baseline study, and 6 Subjects had participated in other BEF studies, e.g., in the VICO field study. The average age was 36.2 years, ranging from 20 to 56. Relatively many Subjects (9) had a technical background, meaning here engineer or software specialist; this has to be treated as a bias within the experiment. Most Subjects had a current driving experience of at least 7000 km/year and rated themselves as at least averagely experienced (rating scale 1-5, with 5 = maximal experience). [Table: Subject overview (ID; previous study: Baseline, VICO, other or new; gender; age; further columns with yes/no, 1/2, annual km and a 1-5 rating). Legible rows include: 10 Rau, new, female, 23, no, 1, 10000, 3; 11 Ros, Baseline, female, 39, no, 2, 2500, 3; 12 Beh, Baseline, male, 38, yes, 2, 20000, 4.]
74. ...So you want a piece of the genre Swing by Michael Buble and to listen to it. Self-directed dialogue (SAMMIE 1 / Command): You can now operate the system on your own as you wish. Please try out any functions you like. Task 6 (SAMMIE 1: solve multimodally; Command: omitted): Please add the song "99 Luftballons" on the album "Leuchtturm" by Nena to the playlist "Autofahrt". In other words, put the song 99 Luftballons from the album Leuchtturm by Nena into the playlist. Self-directed dialogue (SAMMIE 1 / Command): You can now operate the system on your own as you wish. Please try out any functions you like. Task 7 (SAMMIE 1: solve multimodally; Command: omitted): Find out whether the song "Yesterday" is on the album "Number One Hits" by the Beatles. Tell me, and if so, play it. In other words: is the piece Yesterday by the Beatles on the album Number One Hits? Possibly listen to it. Task 8 (SAMMIE 1: solve with speech input; Command: omitted): Please create a new playlist. In other words, you want to set up a new playlist. Task 9 (SAMMIE 1: solve multimodally; Command:
75. ects achieved a higher TCR with SAMMIE than with C&C. Those Subjects relied more on speech input and accomplished tasks more often with fewer turns and somewhat faster. On average, 4.9 turns were necessary to complete a task with the SAMMIE system and 5.4 turns with the C&C system, the difference being significant. Considering the complexity of most of the tasks, this seems to be an acceptable level. With the SAMMIE system, however, there were no more than 0.5 turns fewer than with the C&C system. This is partly because Subjects frequently did not use the direct, shortest dialogue path; in addition, the number of necessary iDrive actions is independent of the respective system. There was a tremendous difference in the number of turns between tasks: many more turns were necessary for tasks with more parameters and/or lower system performance. In all tasks more turns occurred than the minimum number necessary to fulfil the task, which was very pronounced with the SAMMIE system. The average task duration with SAMMIE and C&C was about 40-50 s, and the minimal task durations were about 10 s and 12 s. The comparable tasks in the baseline study, however, took clearly longer. For both number of turns and task duration there were no very prominent differences with regard to MP3 experience, but MP3-experienced Subjects were somewhat faster with SAMMIE according to thei
76. eed limits between 70 and 130 km/h. A sample of 21 Subjects was recruited. Essential requirements for participation were some or much experience with MP3 hardware or software and, if possible, participation in the TALK baseline evaluation study. The Subjects were safe drivers without a strong dialect, and the age was limited to the young and middle age groups. The basic principles for the tasks were to reuse a considerable number of tasks from the baseline study and to cover the functionality of the SAMMIE system. A sample of 10 tasks was chosen, with browsing, playback and information functions as well as playlist functions. The study was conceived as a critical experiment with the multimodal interaction system as the main variable. The Full SAMMIE system was the main system; the C&C system, like the baseline system, served as a reference. The Non-Adaptive (NA) SAMMIE system was presented at the end of the session to obtain a comparison with the Full SAMMIE system. The SAMMIE and C&C systems were balanced across Subjects to obtain a fair comparison with respect to traffic situation, order and learning effects. Additional balancing between low and much MP3 experience and between times of day was included. After all devices had been set up, the Subject was successively introduced to the car functions, the MP3 system and the interaction systems, including several video clips. During the two test runs the experimenter gave the tasks at the specific ma
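The report states that the SAMMIE and C&C systems were balanced across Subjects, with further balancing for MP3 experience and time of day, but it does not spell out the assignment procedure. One simple scheme consistent with that description is to alternate the run order within each MP3 experience group. The sketch below is hypothetical: the Subject IDs, group labels and function name are invented, and time-of-day balancing is left out for brevity.

```python
from collections import Counter

# The two possible orders of the test runs
ORDERS = (("SAMMIE", "C&C"), ("C&C", "SAMMIE"))


def counterbalance(subjects: list[tuple[str, str]]) -> dict[str, tuple[str, str]]:
    """Assign a run order to each (subject_id, experience_group) pair.

    Within each experience group the two orders alternate, so every group ends up
    with as many SAMMIE-first as C&C-first Subjects (up to one for odd group sizes).
    """
    assigned_in_group = Counter()  # how many Subjects of each group already have an order
    plan: dict[str, tuple[str, str]] = {}
    for subject_id, group in subjects:
        plan[subject_id] = ORDERS[assigned_in_group[group] % 2]
        assigned_in_group[group] += 1
    return plan


if __name__ == "__main__":
    # Hypothetical Subject IDs and MP3 experience groups
    subjects = [("S01", "much"), ("S02", "much"), ("S03", "low"), ("S04", "low"), ("S05", "much")]
    for sid, order in counterbalance(subjects).items():
        print(sid, " -> ".join(order))
```

Alternating within groups, rather than over the whole sample, keeps the order effect from being confounded with MP3 experience.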
77. el input modes with a free choice of modality at any time. Keep most of the features, e.g. free access to any menu level by speech, the back function, etc. Optimise all acoustic signals so that microphone opening and closing can be clearly distinguished.
Speech input: Further improve speech recognition and language understanding performance. Either improve the grammar, e.g. by extending its coverage (do not claim a natural language system if many common German expressions are not covered), or reduce the vocabulary and grammar to a very limited set and provide a user manual. Make the automatic opening of the microphone configurable; in favour of a consistently user-driven concept, activation by the PTT button should be provided for every single speech input. Allow a verification dialogue for low-confidence understanding results.
iDrive: Move the iDrive device further forward in the centre console. Mark the possible actuations on the iDrive device.
Speech output: Keep the concept of barge-in by PTT button, and possibly extend the barge-in concept to modality changes. Reduce the amount and length of speech output. Do not read out lists unless the user explicitly requests it. Provide a button to switch off speech output completely, so that users are free to choose whether to have speech output. D
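The recommendation to allow a verification dialogue for low-confidence understanding results amounts to a three-way decision on the recognizer's confidence. A minimal sketch, assuming a confidence score in [0, 1]; both thresholds and the `Hypothesis` type are hypothetical placeholders that would have to be tuned on real data.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    command: str       # hypothetical semantic command, e.g. "play_album"
    confidence: float  # recognition/understanding confidence in [0, 1]


def next_dialogue_step(hyp, accept_threshold=0.8, reject_threshold=0.4):
    """Return "execute", "verify" or "reject" for one understood utterance."""
    if not 0.0 <= reject_threshold <= accept_threshold <= 1.0:
        raise ValueError("need 0 <= reject_threshold <= accept_threshold <= 1")
    if hyp.confidence >= accept_threshold:
        return "execute"  # confident enough: act directly
    if hyp.confidence >= reject_threshold:
        return "verify"   # ask the user to confirm before acting
    return "reject"       # too unreliable: ask the user to repeat


print(next_dialogue_step(Hypothesis("play_album", 0.6)))  # verify
```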
78. ems and tasks, averaged over Subjects. While driving, zero mental load is impossible; the lowest level was therefore defined as "minimal mental load", i.e. no more than driving without additional tasks. This was explained to the Subjects. Since the question was asked immediately after each task, there were no recollection effects, and the score can be assumed to be reliable and consistent. The mental load was generally low, at about 2, which can be translated into strain somewhat above the minimum. There was no difference in mental load between the systems. Asked for the reasons for scores of 2 to 3, the Subjects mentioned operating the MP3 system in a demanding traffic situation, dialogue and speech recognition problems, and searching in lists. The combined demand of driving and operating is presumably an essential factor that does not depend on the system. Tasks that progressed well, without serious driving or operation problems, were generally not rated as demanding; here a score of 1 was given very often. The iDrive functionality was identical in both modes. With the SAMMIE system, many Subjects found it straining to think about how to formulate or reformulate an utterance after rejections (see chapter 4.2). Additionally, there were clearly more rejections and false reactions with SAMMIE (see chapter 3.5). With the C am
79. en reached yet. The traffic density was low in the course segments of tasks 3, 4, 6 and 7, where the mostly two-lane roads were rather narrow. Altogether there was a pronounced load, either from oncoming traffic on narrow streets or from denser traffic at higher speed. Very high loads, e.g. from heavy traffic on curvy roads or from very high speeds, did not occur. In some course segments the Subjects were free to operate the systems on their own.
Course schedule (time in mm:ss, distance in km):
00:00, 0 km: Start on express highway Südtangente
01:00, 1.2 km: Task 1, Albums
02:00, 2.3 km: Task 2, Song "Der Weg"; then turning right onto main road B3
04:00, 4.3 km: Task 3, Playlist "Pur Klassik"
06:00, 6.4 km: Task 4, Live by Pur
07:00, 8.5 km: Task 5, Swing song; turning right twice
Route notes: station parking; approach road to the express road with several traffic signs (80, etc.) and signs to Oststadt; several exits passed; end of the express highway, continuing on B3 (traffic sign 80); traffic lights; several traffic signs ("free", etc.); Hedwigshof; several traffic signs (70, etc.); two approach roads; traffic light; traffic sign Rastatt
80. ...the driving simulation set-up and the TALK display.
[Screenshot: TALK display (OAAMp3PlayerGUIAgent) with an album list showing album, artist and genre, e.g. albums by Falco, Udo Lindenberg and Herbert Grönemeyer]
The answer options of most questions are marked with 6 boxes ranging, for example, from "very good" to "very bad". For these questions, please choose exactly one box (do not tick more than one, and do not tick between boxes). For other questions, whose answers are marked with circles, you can tick several answers. For the open questions, marked with lines, no answers are given; here you can answer freely, but please keep it short enough to fit the space.
Comparison of input methods
1. Which input method would you probably use in the long run, with more practice than today? (Please tick only one.) If you took part in the TALK driving simulation, you can choose among all three options; otherwise choose between the first two.
○ SAMMIE system (test drive) ○ Command system (test drive) ○ TALK system (driving simulation)
2. How easy were the systems to operate compared with the respective other system variants
81. [Figure 39 chart: "General impression about system", answer frequencies for SAMMIE, C&C and Baseline on a scale from "very good" to "very bad"; summarized scores SAMMIE 75, C&C 70, Baseline 61]
Figure 39: Answers to question 1, "How is your general impression of the entire operating system?"
In the baseline study, however, the question was asked only in the final questionnaire. The literature discusses whether rating scales should have an even or an odd number of categories. The even scale used here urges the Subjects to commit to a rating tendency, avoiding the tendency towards the scale centre found with odd scales. Subject 1 is excluded from the intermediate questionnaire data (SAMMIE and C&C) because the questions were allocated differently to the intermediate and final questionnaires after her session. Subjects 14 and 21 are excluded from the C&C data of the intermediate questionnaire because they partly operated the wrong system in the C&C trial. Subjects 2 and 15 were included even though they had the non-adaptive system, because the system outputs of the NA system and of C&C were identical. Thus 20 Subjects were considered for SAMMIE and 18 for C&C. The overall data on the right side of the following figures are summarized results normalized to 0 to 100. This was done by weighting the answer categories from 1 to 6 and then scaling the range to 0 to 100.
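The normalization described above (weighting the six answer categories and scaling to 0 to 100) can be reproduced in a few lines. This sketch assumes a linear weighting in which category 1, the positive end of the scale (e.g. "very good"), maps to 100 and category 6 maps to 0; the report does not spell out the exact weights, and the frequency counts in the usage line are invented.

```python
def normalized_score(counts):
    """Map answer frequencies on a 6-point scale to a 0..100 score.

    counts[i] is the number of answers in category i + 1; category 1 is assumed
    to be the most positive answer (-> 100) and category 6 the most negative (-> 0).
    """
    if len(counts) != 6:
        raise ValueError("expected counts for exactly 6 answer categories")
    total = sum(counts)
    if total == 0:
        raise ValueError("no answers to score")
    mean_category = sum((i + 1) * c for i, c in enumerate(counts)) / total
    return (6 - mean_category) / 5 * 100


print(normalized_score([2, 8, 6, 3, 1, 0]))  # invented answer counts for 20 Subjects
```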
82. ...mutual interference between speech input and human conversation; noises ○ Other
5. How do you rate the possibility of formulating relatively freely with the natural-language SAMMIE system, compared with using command words? (much better ... much worse)
Manual input
When answering the following two questions, please think of the tasks in which you used the manual control unit during one of the two drives.
6. Did one or more aspects of the following list represent advantages of manual input over speech operation for you personally?
○ Operation by hand (I can grip something) ○ Correct system reaction (it does exactly what I want) ○ Selecting from a list (turning the knob, pressing) ○ Other
7. Did one or more aspects of the following list represent disadvantages of manual input compared with speech operation for you personally?
○ Operation by hand (I have to take my hand off the steering wheel) ○ Searching for the manual control unit with the hand ○ Searching for the control unit with the eyes (looking away from traffic) ○ Mapping the individual actuation types (turning, shifting, pressing) to the functions (moving the cursor, selecting playback functions, etc.) ○ Other
Multimodal operation
When answering the following
83. eptable level. A task performed with SAMMIE or C&C took about 40 to 50 s on average, and the minimal task durations were about 10 to 12 s. The comparable tasks in the baseline study, however, took clearly longer.
Driving quality and mental load
About 2.5 driving errors per minute occurred, without a pronounced difference between the SAMMIE and C&C systems. Lane departures and low speeds were the most frequent driving errors and can be attributed to visual distraction when looking at the display. The subjectively judged driving quality was nearly equal for both systems, which confirms the objective driving quality results. A comparison with the baseline system is not applicable. The mental load was generally low, at about 2 (scale 1 to 5), with no difference between the systems. Higher scores resulted from operating the MP3 system in a demanding traffic situation and from dialogue or speech recognition problems.
Modality preference
Basically, the multimodal combination of speech and manual input was used extensively. At the beginning of a task there was a very clear preference for speech input with both systems. As the interaction within a task went on, speech preference clearly declined. MP3-experienced Subjects tended to use speech more than the le
84. ...version to use speech input in cases where you could also have made the input manually with the control unit?
11. With the natural-language SAMMIE version, what reasons did you have for using the manual control unit (iDrive), given that you could also have used speech input?
12. Which further functions would you like to use in the vehicle with the multimodal, natural-language SAMMIE system?
○ Navigation (dynamic route guidance) ○ Restaurant/hotel reservation ○ SMS ○ Calendar ○ Radio ○ Telephone ○ Cassette/CD player ○ Traffic information ○ Internet access ○ Other
13. Do you have any further remarks on multimodal operation of the natural-language SAMMIE system while driving, i.e. on operation with free switching between speech and manual input? If you took part in the driving simulation, you can also compare it with the combined input used then.
14. Please think over the entire experiment once more. If you noticed aspects of any of the operating systems that you have not yet been asked about, please explain and rate them here, i.e. additional remarks on the SAMMIE system when stationary, the SAMMIE system while driving, and the command word system while driving.
85. ew times. Preference was determined on the basis of turns and the most effective modality. For example, when a Subject operated successfully 4 times by speech and 2 times by iDrive, the preference was set to speech. When a Subject started with 3 more or less unsuccessful speech inputs and ended up with 2 successful iDrive inputs, the preference was set to iDrive.
Figure 9 shows the results of the tasks, averaged over the Subjects and the selected tasks. The right side shows the results of the free interactions (SAMMIE: two periods; C&C: one period).
[Figure 9 chart: "Overall preferred modality (complete task)", percentage of Speech and iDrive preference for SAMMIE and C&C in tasks 4, 5, 9, 10 and in free interaction]
Figure 9: Overall preferred modality in selected tasks, averaged over tasks and Subjects
Compared with the previous figure, speech preference declines markedly as the interaction within a task goes on. Rejections and false reactions of the systems led to changes to iDrive mode, where the Subjects were sure to get the tasks done. Sometimes a long, cumbersome speech interaction was followed by a short, successful iDrive interaction; visibly frustrated Subjects eventually changed to iDrive. For the tasks in the
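The preference rule above (successful turns per modality decide, so that 4 successful speech turns beat 2 successful iDrive turns, while 3 failed speech inputs followed by 2 successful iDrive inputs count as iDrive) could be coded as follows. The tie-break (modality of the last successful turn) and the fallback when no turn succeeded are assumptions for illustration, not rules stated in the report.

```python
from collections import Counter


def preferred_modality(turns):
    """Preferred modality of one task from an ordered list of turns.

    turns: list of (modality, successful) pairs, modality "speech" or "iDrive".
    """
    if not turns:
        return None
    successes = Counter(m for m, ok in turns if ok)
    if not successes:
        # assumption: with no successful turn, take the most frequently used modality
        return Counter(m for m, _ in turns).most_common(1)[0][0]
    best = max(successes.values())
    leaders = {m for m, c in successes.items() if c == best}
    if len(leaders) == 1:
        return leaders.pop()
    # assumption: on a tie, the modality of the last successful turn wins
    return next(m for m, ok in reversed(turns) if ok and m in leaders)


print(preferred_modality([("speech", True)] * 4 + [("iDrive", True)] * 2))   # speech
print(preferred_modality([("speech", False)] * 3 + [("iDrive", True)] * 2))  # iDrive
```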
86. g in noisier speech signals for speech recognition and language understanding.
The conditions for task completion were more restrictive in the final evaluation, as the tasks were linked to fixed segments of the experimental course, i.e. a task was considered failed if it was not successfully completed within the given course segment.
The following is a summary of the main results of the final SAMMIE evaluation.
Task completion
The task completion rate (TCR) reached about 80%. This has to be interpreted as a generally high level, considering the partly tight time and driving conditions. Tasks were completed somewhat, but not significantly, more often with SAMMIE than with C&C. The SAMMIE TCR was about 6% above the baseline TCR. Given the different conditions of the present study, with a tighter schedule for the tasks, this is a clear advantage of the SAMMIE system over the baseline system. Tasks were often not completed because of a combination of understanding, dialogue and system problems, particularly for less experienced Subjects.
Dialogue efficiency
Users frequently did not choose the direct, shortest dialogue path, and they made a considerable number of iDrive actions. Significantly more turns were needed on average to complete a task with the C&C system (5.4 turns) than with the SAMMIE system (4.9 turns). Considering the complexity of most of the tasks, this still seems to be an acc
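Under the stricter completion rule above, a task counts as completed only if its goal was reached before the end mark of its course segment. A minimal sketch of the resulting TCR per system, assuming hypothetical per-task records with times in seconds from the start of the run:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaskRun:
    system: str                      # "SAMMIE" or "C&C"
    completed_at_s: Optional[float]  # time the task goal was reached, None if never
    segment_end_s: float             # time the segment's end mark was passed


def task_completion_rate(runs, system):
    """Share of a system's tasks completed within their course segment."""
    relevant = [r for r in runs if r.system == system]
    if not relevant:
        raise ValueError(f"no task runs for system {system!r}")
    completed = sum(
        1 for r in relevant
        if r.completed_at_s is not None and r.completed_at_s <= r.segment_end_s
    )
    return completed / len(relevant)


runs = [TaskRun("SAMMIE", 42.0, 60.0), TaskRun("SAMMIE", 75.0, 60.0),
        TaskRun("SAMMIE", None, 90.0), TaskRun("C&C", 50.0, 60.0)]
print(task_completion_rate(runs, "SAMMIE"))  # 0.3333333333333333
```

The second SAMMIE run reached its goal, but only after the segment's end mark, so it counts as failed under this rule.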
87. g of the microphone was indicated by slightly different acoustic signals and a large green/red microphone icon on the display. With an additional button the dialogue could optionally be interrupted. The MP3 display (SAMMIE display) showed the MP3 elements and the lists of artists, songs, albums, etc. (see Figure 3). The iDrive button allowed several operations: turning (2 directions), pushing (1 direction) and shifting (4 directions). Turning scrolled the cursor, and pushing activated the highlighted item. Shifting upwards led to a higher menu level or to a previous display view. Shifting downwards paused the playing song. Shifting to the left or right changed to the preceding or next song. The Subject camera recorded the Subject, including body motions and manual activities. The scene camera recorded the traffic scene. The split screen displayed the Subject, the traffic scene and the MP3 display together with the current date and time, and the split image was recorded by a VHS video recorder. The clocks of the laptop, the video recorder and the supervisor's extra clock were synchronized to obtain a uniform time base. The supervisor sat beside the Subject and guided him through the course. He monitored driving safety by observing and warning in potentially dangerous situations. Moreover, he noted some essential data (rough task times, chosen modalities, task completion) and identified most of the driv
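The iDrive operations listed above form a small event-to-action mapping. The sketch below models it with a cursor over the current list; the event names, the clamping of the cursor at the list ends and the action strings are assumptions made for illustration, not the SAMMIE implementation.

```python
from enum import Enum, auto


class IDriveEvent(Enum):
    TURN_FORWARD = auto()
    TURN_BACK = auto()
    PUSH = auto()
    SHIFT_UP = auto()
    SHIFT_DOWN = auto()
    SHIFT_LEFT = auto()
    SHIFT_RIGHT = auto()


# Actions that do not move the cursor, as described in the text
FIXED_ACTIONS = {
    IDriveEvent.PUSH: "activate highlighted item",
    IDriveEvent.SHIFT_UP: "go to higher menu level / previous view",
    IDriveEvent.SHIFT_DOWN: "pause song",
    IDriveEvent.SHIFT_LEFT: "previous song",
    IDriveEvent.SHIFT_RIGHT: "next song",
}


def handle(event, cursor, list_length):
    """Return (new_cursor, action) for one iDrive event on a list of list_length items."""
    if list_length <= 0:
        raise ValueError("list must contain at least one item")
    if event is IDriveEvent.TURN_FORWARD:
        return min(cursor + 1, list_length - 1), "scroll"
    if event is IDriveEvent.TURN_BACK:
        return max(cursor - 1, 0), "scroll"
    return cursor, FIXED_ACTIONS[event]


print(handle(IDriveEvent.TURN_FORWARD, 0, 6))  # (1, 'scroll')
print(handle(IDriveEvent.SHIFT_DOWN, 1, 6))    # (1, 'pause song')
```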
88. gue sequences of the operation, including rejections. After the instructions for the SAMMIE system, the Subject practised with the system of the next run, using exercises in an unstructured sequence and content. Basic functions such as searching for albums and songs, playing back songs and adding songs to a playlist were included. The training sessions, the two test runs and the completion of the intermediate questionnaires were conducted one after the other with short pauses in between. The experimenter gave the tasks at the specific marks on the course (the invitation letter already contained an overview of the experiment). The Subject signalled the end of a task, i.e. that he did not intend to continue working on it in any way. If a task was not completed within the given segment, it was broken off at the corresponding mark. The Subject was, however, free to stop earlier if he would do so in real life. He was asked for his mental load on a 5-point scale from 1 (not stressed at all) to 5 (very much stressed). The supervisor showed the way and supervised the driving, with the possibility to intervene verbally. During the test segments he identified driving errors and signalled them to the experimenter by gestures. After each run, the experimenter and the supervisor evaluated the driving performance on standardized scales independen
89. h a direct connection between input device and display. With SAMMIE, modality choice was similar to that in the TALK baseline study, even though speech input tended to be used and preferred somewhat more frequently. The TCR in the present study was about 80%. This has to be interpreted as a generally high level, considering the partly tight time conditions. Tasks were completed somewhat, but not significantly, more often with SAMMIE than with C&C. The SAMMIE TCR was 6% above the baseline TCR. Since the baseline study allowed 5 attempts, whereas the course segments of the present study usually allowed fewer, this is a clear advantage of the SAMMIE system over the baseline system. Indeed, many tasks were completed rather quickly, often with the minimal number of turns. Without the experimenter's help, however, the TCR would have been lower. The experimenter's help consisted of repeating the parameters (10% of the tasks) and more substantial help such as explaining a task, loudness, etc. (10% of the tasks). Tasks were often not completed because of a combination of understanding, dialogue and system problems, particularly for Subjects with less MP3 experience. Experienced Subj
90. hat the word "Beatles" was often not understood, and specifying the four elements (artist, album, song, playback) verbally within a relatively short driving segment led to relatively many rejections. The following example illustrates a missed task completion for task 2.
Subject 17, SAMMIE, task 2: 6 speech turns, 2 iDrive turns, t = 1:35 min, 4 rejections, 2 driving errors, TCR = 0
TTS output: "Choose one of the following menu items: playlists, artists, albums, titles, genres. With the ..."
TTS output STOPPED by barge-in
DISPLAY context panel: Albums
TTS output: "There are 29 albums."
TTS output: "The first 6 are shown on the screen."
DISPLAY context panel: Albums by Herbert Grönemeyer
TTS output: "4 albums by Herbert Grönemeyer found."
TTS output: "Sorry, I did not understand that."
TTS output: "Say the name of a displayed album to ..."
iDriveKeyEvent DOWN/UP triggered (4x)
iDriveKeyEvent SPACE triggered
TTS output: "Highlight a title and say 'play' to ..."
Table 5: Example of a missed task completion (Subject 17, SAMMIE, task 2)
The following Figure 15 shows the perceived TCR for the systems and the MP3 experience levels, averaged over tasks and Subjects of the subgroups. One result is that the difference between Subjects with different MP3 experience is not very distinct. Even persons with little knowledge of MP3 systems a
91. he SAMMIE system was revised and restructured using the data collected during the evaluation of the baseline system. Additionally, a second grammar was developed for the Command & Control (C&C)-like system. When comparing the results, we have to keep in mind that the SAMMIE and C&C systems were evaluated in the running car with a far-talk microphone, while the baseline system was evaluated in the lab with a headset. The different acoustic environment therefore has a prominent influence on recognition performance. Figure 36 and Figure 37 below give an overview of the speech recognition performance metrics for the SAMMIE system and for the C&C system. They show the most relevant error rates of the speech recognizer, with mean values over all tasks and test subjects and the corresponding standard deviation interval.
[Figure 36 chart: "Full SAMMIE Average Error Rates" (%): out-of-grammar rate; sentence error rate and word error rate (overall); sentence error rate, word error rate and concept error rate (in grammar); out-of-vocabulary rate]
Figure 36: Speech recognition error rates for the Full SAMMIE system, averaged over subjects and
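The word error rate in Figure 36 is conventionally defined as the minimum number of word substitutions, deletions and insertions needed to turn the recognized word sequence into the reference transcription, divided by the number of reference words. The report does not describe its scoring tool, so the following is a generic sketch of that standard definition; the example utterances are invented.

```python
def word_edits(reference, hypothesis):
    """Minimum number of word substitutions, deletions and insertions (Levenshtein)."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))  # distances for an empty reference prefix
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # deletion of reference word r
                cur[j - 1] + 1,          # insertion of hypothesis word h
                prev[j - 1] + (r != h),  # substitution (or match if equal)
            )
        prev = cur
    return prev[-1]


def corpus_wer(pairs):
    """Word error rate over (reference, hypothesis) pairs, pooled over all words."""
    edits = sum(word_edits(ref, hyp) for ref, hyp in pairs)
    words = sum(len(ref.split()) for ref, _ in pairs)
    if words == 0:
        raise ValueError("references contain no words")
    return edits / words


pairs = [("spiele das album mensch", "spiele album mensch"),  # one deletion
         ("naechster titel", "naechster titel")]               # no error
print(corpus_wer(pairs))
```

Pooling edits over all utterances before dividing gives a corpus-level WER; averaging per-utterance rates instead weights short utterances more heavily.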
92. ...too extensive
26. Do you have any further remarks on the display presentations? If you took part in the driving simulation, you can also compare them with the visual displays used then.
Please state your opinion on the following statements about the system, i.e. statements that you may agree with to a greater or lesser extent, or disagree with. From here on, the scales have 5 options.
27. It was easy for me to understand what the system said. (strongly agree ... strongly disagree)
28. It was easy to get the information I wanted. (strongly agree ... strongly disagree)
29. At every point in the dialogue I knew what I could say or do. (strongly agree ... strongly disagree)
30. The system worked the way I expected it to. (strongly agree ... strongly disagree)
31. I think I would like to use the system in the future. (strongly agree ... strongly disagree)
32. Do you have any further remarks on the SAMMIE operating system? If you took part in the TALK driving simulation, you can also include the TALK system used then.
93. here is a big difference in WER and SER for Full SAMMIE and C&C compared with the Baseline system, especially for the in-grammar utterances. The obvious reason for the degraded speech recognition performance is the noisy car environment and the use of a far-talk microphone, compared with the Baseline lab environment where the subjects wore a headset. Secondly, there is a big difference in out-of-grammar rates between the C&C and the Full SAMMIE system. As already pointed out, the subjects were advised to use only specific command words or displayed items when operating the C&C system. The Full SAMMIE, however, claims to enable natural language input, so the subjects could use their own wording with only few hints from the experimenter. The results show that this freedom is not sufficiently supported by the coverage of the grammar. On the other hand, an additional effect qualifies the high WER and SER of the systems that enable natural language input: the concept error rate is in general significantly lower than the sentence error rate, i.e. the semantic information passed to the dialogue manager is often correct even if some words have not been recognized. This can only be proven for the in-grammar data, but it can be assumed for the out-of-grammar utterances as well. For C&C there is no difference between sentence error rate and concept error rate, due to the short co
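Why the concept error rate can be lower than the sentence error rate is easy to see in code: a sentence counts as wrong if any word differs, whereas a concept error only occurs if the extracted meaning differs. In this sketch the concept extractor `toy_concept` and its item list are hypothetical stand-ins for the real language understanding component.

```python
def sentence_error_rate(pairs):
    """Share of (reference, recognized) utterances with at least one word error."""
    if not pairs:
        raise ValueError("no utterances")
    return sum(ref.split() != hyp.split() for ref, hyp in pairs) / len(pairs)


def concept_error_rate(pairs, to_concept):
    """Share of utterances whose extracted meaning differs from the reference meaning."""
    if not pairs:
        raise ValueError("no utterances")
    return sum(to_concept(ref) != to_concept(hyp) for ref, hyp in pairs) / len(pairs)


KNOWN_ITEMS = {"mensch", "yesterday"}  # hypothetical item vocabulary


def toy_concept(utterance):
    """Hypothetical parser returning (action, item) and ignoring all other words."""
    words = utterance.lower().split()
    action = "play" if "play" in words else None
    item = next((w for w in words if w in KNOWN_ITEMS), None)
    return action, item


pairs = [
    ("please play the album mensch", "lease play the album mensch"),  # word error, same meaning
    ("play yesterday", "play mensch"),                                # word error, wrong meaning
]
print(sentence_error_rate(pairs))              # 1.0
print(concept_error_rate(pairs, toy_concept))  # 0.5
```

For a command-word grammar like C&C, where a recognized command is either fully right or completely wrong, the two rates coincide, which matches the observation in the text.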
94. ing errors, which he signalled to the experimenter for registration. He noted all relevant times and events on an experimental sheet. The experimenter sat behind the supervisor. She supervised the experimental set-up, announced the tasks and pressed F keys on the keyboard to time-stamp the exact beginning and end of each task. Moreover, she registered the driving errors by means of a data recorder (communicator). In the event of a system crash or hang-up she triggered a reset. iDrive is used as a synonym for ergocommander. Dangerous situations occurred very rarely, and accidents could easily be avoided by this additional control.
[Diagram: experimental set-up with video cameras, microphone, loudspeaker, clock, Subject, MP3 display, supervisor, data recorder, experimenter, SAMMIE display, video recorder and SAMMIE system]
Figure 2: Operating the PTT and iDrive button
[Screenshot: SAMMIE display with the MP3 control menu (artists, albums, titles, genres, playlists such as "Road Mix") and the titles of the album "Mensch", e.g. "Neuland", "Der Weg"]
95. ive respectively three times for C&C speech input. The reasons given were the perceived potential of speech input, such as low distraction, easy operation and comfort. Another reason could be the novelty of speech input. As the interaction within a task went on, speech preference clearly declined. Rejections and false reactions of the systems during speech interaction led to changes to iDrive mode, where the Subjects were sure to get the task done. Sometimes a long, cumbersome speech interaction was followed by a short, successful iDrive interaction. For the tasks in SAMMIE mode there was still a considerable preference for speech input even during ongoing task performance: speech input was used exclusively in almost 60% of the tasks. For the tasks in C&C mode, however, the preferred modalities were balanced during the ongoing interactions. Within free interaction periods, iDrive was used relatively often, more frequently than in the mandatory tasks. This may indicate that the experimental situation affected modality choice. MP3-experienced Subjects tended to use speech more than less experienced Subjects, and vice versa for iDrive. This younger group took more advantage of the natural speech interaction mode and thereby achieved a higher task completion rate (TCR) with SAMMIE than with C&C. The older group with less MP3 experience relied more on the well-known manual operation wit
96. lanes, several approach roads, yield right of way, medium traffic
Road characteristics: 4 lanes, straight, heavy traffic; 4 lanes, straight, heavy traffic; 2 lanes, wide curves, light traffic; 2 lanes, straight, light traffic; 2 lanes; 4 lanes, several approach roads, yield right of way, medium traffic
14:00, 15.2 km: Task 6, "99 Luftballons" (country road L506: after the approach road several traffic signs (80, etc.), traffic sign 50, railway crossing; 2 narrow lanes, straight, light traffic)
19:00, 18.8 km: Free interaction (main road B36, country road K3581)
23:00, 22.6 km: Task 7, Song "Yesterday"
26:00, 25.2 km: Task 8, New playlist
Route notes: turning right after the approach road; several traffic signs (70, etc.); roundabout; tunnel; traffic sign "Light"; traffic sign 70; traffic sign "free"; several traffic signs (80, etc.); several exits passed; traffic sign Karlsruhe. Road characteristics: 2 narrow lanes, straight, roundabout, tunnel, light traffic; 2 narrow lanes, wide curves, medium traffic; 4 lanes, several approach roads, yield right of way, medium traffic; permitted speeds listed: 100, 80, 60, 100, 70 km/h
30:00, 28.7 km: Task 9, "Romeo and Julia" (express highway Brauerstr.); then turning right onto express highway Südtangente
33:00, 31.5 km: Task 10, Rock song; then turning to the righ
97. mmands, which are either right or completely wrong.
4 SUBJECTIVE RESULTS
4.1 Intermediate questionnaires
The Subjects filled out intermediate questionnaires after both runs (see Attachment 3) and a final questionnaire with their own subjective evaluation at home (see Attachment 4). For reasons of recollection, they were urged to complete it on the same or the following day. For comparability, most of the questions were identical to the final questionnaire of the baseline study and mainly used 6-point rating scales. The following Figure 39 shows the frequency of answers to the first question of each intermediate questionnaire, concerning the general impression of the interaction systems. With the present systems, by far most of the Subjects gave a positive rating (ratings on the left side of the scale: SAMMIE 90%, C&C 80%; summarized and normalized scores: SAMMIE 75, C&C 70). With the baseline system, however, the maximum lay near the centre of the scale, i.e. there is a clear improvement in the subjective overall impression from the baseline to the SAMMIE systems. SAMMIE and C&C differ mainly at the highest score: 25% of the Subjects judged their general impression of SAMMIE to be very good, but only 5% did so for C&C.
98. multimodal interaction. Although some Subjects tended to operate the systems exclusively by speech or manually in some tasks, all Subjects switched between modalities, particularly when problems arose. The multimodal systems allowed faster and more efficient interaction with the MP3 system than the baseline system, and they were clearly better accepted. They offered a certain freedom, so that Subjects could even be observed playing with the system, verbally or manually. However, multimodality often served as a way to avoid the other modality: changes from speech to manual input often occurred when system errors occurred. The SAMMIE system made considerable progress relative to the baseline system, in most of the objective and subjective results. The most obvious improvements concern speed, TCR and the display. It was very striking that the C&C system was somewhat more preferred than the Full SAMMIE system. This has to do with system performance, the tight coupling of input to output, and the possibility to enter single commands, i.e. to avoid searching for a way to formulate a sentence. A future system featuring natural language interaction should therefore also allow C&C-like interaction. Natural speech input seems to be coupled to a quite different inner model of the user, namely formulating all desired functions and parameters within one or a few sentences. A pure acoustical dialogue would be p
99. ...special features. For example, the following display may be shown:
[Screenshot: "Welcome" screen with the MP3 control menu (artists, genres, ...), no song loaded]
The tasks are solved either with natural speech input or manually with a control element (iDrive). We call this multimodal. There are several buttons on the steering wheel; for the experiment, only the two inner, marked buttons on the right-hand side are relevant.
Upper inner button on the right: opens the microphone for speech input. The microphone indicator on the display changes from red to green and a specific gong signal sounds. The microphone closes automatically after each speech input, with a different gong signal.
Lower inner button on the right: closes the microphone or ends speech output. You can use this button if needed.
The large iDrive knob is located to the right of your seat on the centre console.
Turning: highlights an element in the displayed or read-out list
Short press: selects the element, e.g. plays a song
Long press: adds the selected song to the playlist
Shifting right/left: selects the next/previous song
Shifting down: stops/pauses the song or album
Shifting up: goes back to the
100. n from driving (see the following Figure 41). As in the preceding figures, the present systems show similar curves, apart from the highest rating, "not at all". Most Subjects felt a certain distracting effect, more with C&C than with SAMMIE, as the maxima lie near the scale centre. The Pearson correlation coefficient of r = 0.21 between the answers to this question and the individual driving errors is, however, not significant. The SAMMIE and C&C systems were rated as less distracting than the baseline system (free run). Again, when comparing with the baseline evaluation results, one should consider that the experimental set-up was quite different, because the baseline evaluation used a driving simulation. It is conceivable that there are further correlations, e.g. with the number of turns or task duration, but this is beyond the scope of this report.
[Figure 41 chart: "No distraction from driving", answer frequencies for SAMMIE, C&C and Baseline on a scale from "not at all" to "very much"; summarized scores SAMMIE 65, C&C 59, Baseline 46]
Figure 41: Answers to question 3, "To which degree were you distracted from driving during operation?"
As the next Figure 42 about the perceived comfort reveals, there was a
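The non-significance of r = 0.21 can be checked with the usual t-test for a Pearson correlation, t = r * sqrt(n - 2) / sqrt(1 - r^2) with n - 2 degrees of freedom. The report does not give n for this correlation; assuming n = 20 pairs purely for illustration, t is about 0.91, well below the two-sided 5% critical value of about 2.10 for 18 degrees of freedom, which is consistent with the reported result. The helper names are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long samples."""
    if len(xs) != len(ys) or len(xs) < 3:
        raise ValueError("need at least 3 paired observations")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t value for H0: rho = 0, to be compared with the t distribution with n - 2 df."""
    if n < 3 or abs(r) >= 1:
        raise ValueError("need n >= 3 and |r| < 1")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


T_CRIT_DF18 = 2.101  # two-sided 5% critical value of t with 18 degrees of freedom
t = t_statistic(0.21, 20)  # n = 20 is an assumption, not a figure from the report
print(round(t, 2), abs(t) > T_CRIT_DF18)  # 0.91 False
```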
101. …natural speech system SAMMIE, the order was partly different from that in the baseline study. This time most Subjects named the more advanced functions such as desk diary, navigation and internet, while in the baseline study infotainment functions were named more frequently. In addition to the runs with the Full Adaptive SAMMIE and the C&C system, six example videos were presented at the end of the session, contrasting the Adaptive and the Non-Adaptive variants of the SAMMIE system. After each example, the Subject was asked a corresponding question about a feature of the adaptive presentation strategy. All other features of the Full SAMMIE were judged positively, but the usefulness of personal, differentiated addressing was scored rather negatively (37%). The differentiation between visual and acoustic presentation of items (75%), the presentation of albums together with their artists (78%) and implicit confirmation (85%) were judged positively. The extended user guidance was basically regarded as positive (70%). The step-by-step guidance was not fully accepted, presumably because of the somewhat lengthy dialogue. The adaptation to the user's vocabulary received very mixed judgements (57%).
6 OUTLOOK
The field test with different variants of the final In-Car Showcase SAMMIE revealed an extensive use of the…
102. …nd structure could operate the systems to some degree. There is no pronounced difference between the systems for Subjects with little MP3 experience, but there is an obvious difference for those with much MP3 experience: experienced Subjects achieved a somewhat higher TCR with SAMMIE than with C&C. A further analysis shows that these Subjects relied more on speech input and completed tasks more frequently, with fewer turns and somewhat faster (see Chapter 3.3). This group with much MP3 experience had a mean age of 32 years, while the group with little experience was about 41 years old on average. It can be speculated that this age difference was an additional factor in taking advantage of the less familiar interaction mode of natural speech input.
[Figure 15: Task Completion Rate (objective and subjective) for the systems (SAMMIE tasks 1–10, C&C tasks 1–5, 9, 10) by Subjects' MP3 experience, averaged over tasks and Subjects of the subgroups.]
The following Figure 16 shows the tasks aborted by the Subjects themselves and the help given by the experimenter for the two systems, averaged over tasks and Subjects. Help is split into "parameter" (the Subject forgot a parameter) and "else" (additional support, e.g. the task was not understood). Since the experimenter's help…
103. …nd tasks, averaged over Subjects. The previous Figure 14 shows the TCR for the systems of the present and the baseline study for the individual tasks, averaged over Subjects. In tasks 1 and 2 the C&C TCR was better than the SAMMIE TCR. A further analysis shows that most of the Subjects who failed these tasks had started with SAMMIE and were not very experienced with MP3 systems. As a consequence, problems understanding the dialogue and system problems were confounded. For these Subjects, the problems diminished in the later tasks. Particularly in tasks 4 and 5, the SAMMIE TCR was clearly superior to the C&C TCR. These were complex tasks with three information items (e.g. task 4: album, artist and playback). With optimal performance of the SAMMIE system, a single speech input should have been sufficient. In practice, at least two actions were necessary even when the system reacted correctly, e.g. in task 4: (a) "Play me the album Live by Pur" ("Spiele mir das Album Live von Pur"), (b) "by Pur" ("von Pur"). The performance increase from 40% for the baseline to 81% for the SAMMIE system was very striking for task 6, where "99 Luftballons" was often not recognized in the baseline study. (Footnote 22: If all tasks 1–10 with SAMMIE had been counted, an equal TCR of 82% would result, i.e. objective TCR 74% and subjective TCR 8….)
The completion of task 7 suffered from the fact t…
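The per-task TCR discussed above is the share of Subjects who completed a task. The following is a minimal sketch, assuming each Subject's outcome for a task is coded as 1 (completed) or 0 (not completed). The function names and the outcome data are hypothetical and only illustrate the arithmetic; they are not the study's data or tooling.

```python
def task_completion_rate(completed):
    """Percentage of Subjects who completed a task (1 = completed, 0 = not)."""
    if not completed:
        raise ValueError("no outcomes")
    return 100.0 * sum(completed) / len(completed)

def mean_tcr(outcomes_by_task):
    """Mean TCR over tasks, each task weighted equally."""
    rates = [task_completion_rate(v) for v in outcomes_by_task.values()]
    return sum(rates) / len(rates)

# Hypothetical outcomes for illustration only (not the study's data).
outcomes = {
    4: [1, 1, 1, 0, 1],
    5: [1, 0, 1, 1, 1],
    6: [1, 1, 1, 1, 0],
}
for task, done in outcomes.items():
    print(f"task {task}: TCR = {task_completion_rate(done):.0f}%")
print(f"mean TCR = {mean_tcr(outcomes):.0f}%")
```

Averaging per-task rates weights every task equally. Pooling all Subject-task outcomes instead gives the same value only when every task has the same number of Subjects.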
104. …nding than real driving, so that Subjects were more dependent on speech output.
[Figure 49: Answers to question 15, "How helpful were the speech outputs for you?" Scale from "very much" to "not at all"; mean scores: SAMMIE 65%, C&C 64%, Baseline 81%.]
The content of the speech output was rated worse for the C&C system than for the SAMMIE system (Figure 50). This can be attributed to the verbal listing of items, which part of the Subject sample did not accept.
[Figure 50: Answers to question 16, "How did you judge the contents of speech output?" Scale from "very good" to "very bad"; mean scores: SAMMIE 69%, C&C 66%, Baseline 74%.]
The amount of speech output was often judged to be relatively good (Figure 51). There was a slight tendency for Subjects to find SAMMIE's speech output somewhat too extensive, possibly because of more rejections and referrals to the help system. The formulation of…
105. …input. Again, the option "correct system reaction" was named most frequently, even more frequently than in the baseline study. The easy choice from a list was named next. As an additional advantage, faster operation was noted by 5 Subjects. The main disadvantages were safety aspects such as eyes off the road, hands off the steering wheel and searching by hand. Since the first option was not included in the baseline study, the related options (searching by hand, searching by eye) were presumably named more frequently there than in the present study. Other disadvantages of manual input noted were similar, such as having to search for the iDrive button and the cursor position on the display, and the iDrive button being positioned too far back.
[Figure 64: Answers to questions 6 and 7, "Which of the following aspects represented advantages/disadvantages of manual input for you in relation to speech input?" Two panels (advantages and disadvantages of manual input), percent of Subjects, SAMMIE vs. Baseline.]
The following Figure 65 represents the answer frequencies…
106. …ns, a balance between little and much MP3 experience was included. In addition, session times were balanced across the day, since traffic differed considerably over the day. For example, a similar number of Subjects with little MP3 experience started with SAMMIE as with C&C in the early and in the late afternoon. This resulted in the following experimental design. Maintaining the balance for experienced Subjects was difficult because of scheduling problems.
[Figure 6: Experimental design in terms of Subject numbers (left side of each box). Rows: little and much MP3 experience; columns: morning, early afternoon, late afternoon. Each box gives the system order, either 1 SAMMIE / 2 C&C or 1 C&C / 2 SAMMIE.]
2.6 Experimental realisation
The following Figure 7 illustrates the experimental realisation. The preparation comprised setting up all devices, including the experimental vehicle, the SAMMIE system and the video recorders, and synchronizing all clocks. The Subject was introduced step by step through an explanation of the main car functions, several video clips with typical speech and iDrive examples, and written instructions.
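The balancing described above can be illustrated with simple counterbalancing. Within each cell (MP3 experience × time of day), the system order alternates between SAMMIE-first and C&C-first. This is a hypothetical sketch, assuming Subjects are assigned in recruitment order. The report does not describe the assignment procedure, and the real design had to absorb scheduling problems. The names `assign_orders` and `is_balanced` are illustrative only.

```python
from collections import defaultdict
from itertools import cycle

# The two possible system orders in a session.
ORDERS = (("SAMMIE", "C&C"), ("C&C", "SAMMIE"))

def assign_orders(subjects):
    """Alternate the system order within each (MP3 experience, time slot) cell.

    subjects: (subject_id, experience, slot) tuples in recruitment order.
    Returns {subject_id: (first_system, second_system)}.
    """
    next_order = defaultdict(lambda: cycle(ORDERS))  # one alternating cycle per cell
    return {sid: next(next_order[(exp, slot)]) for sid, exp, slot in subjects}

def is_balanced(plan, subjects):
    """True if, in every cell, the two orders differ in count by at most one."""
    tally = defaultdict(lambda: [0, 0])
    for sid, exp, slot in subjects:
        tally[(exp, slot)][ORDERS.index(plan[sid])] += 1
    return all(abs(a - b) <= 1 for a, b in tally.values())

# Hypothetical Subjects, not the study's sample.
subjects = [
    (1, "few", "morning"), (2, "few", "morning"), (3, "much", "morning"),
    (4, "few", "early afternoon"), (5, "few", "morning"), (6, "much", "morning"),
]
plan = assign_orders(subjects)
print(plan)
print(is_balanced(plan, subjects))
```

Alternating within cells keeps each cell's order counts within one of each other, regardless of how many Subjects a cell ends up with.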
107. …nts as to modality as well (free run; see TALK deliverable D6.3, 17.02.2006, chapter 4.1). At the beginning of a task there was a very clear preference for speech input with all systems: speech input was used 2.5 to 5 times more frequently than the iDrive. One of the most important reasons was less visual and manual distraction from driving, as the Subjects' statements revealed (see chapter 4.2). Moreover, for many Subjects, especially the technicians and young Subjects, speech input seemed to be the more interesting mode, which they could compare with earlier systems (baseline etc.).
[Figure 8: Overall preferred modality of the first action (speech vs. iDrive, percent) for SAMMIE, C&C and Baseline, averaged over tasks and Subjects.]
The following Figure 9 shows the overall preferred modality for the SAMMIE and C&C systems over the complete tasks. The SAMMIE data include all tasks; the C&C data include only those tasks that were given without any constraints on modality. Task 8, which was given only in SAMMIE mode, could be performed exclusively by speech input. The first action of tasks 1–3 in C&C mode had to be done by speech input, to ensure that the Subjects used speech input at least a f…
108. …Do not announce very obvious system activities, e.g. "The first seven are shown on the screen" ("Die ersten sieben werden auf dem Bildschirm dargestellt"); a short tone is often enough to signal a display output. Do not refer to the incomplete help system.
Optical display: Keep the display basically as it is. Leave out any unnecessary information, particularly the album pictures, and instead increase the size of the current artist, album and song or playlist. Increase the graphics resolution. Position the display centrally, i.e. at or above the dashboard. Signal the pause status of the MP3 player visually.
7 References
[1] Tilman Becker, Nate Blaylock, Ciprian Gerstenberger, Andreas Korthauer, Nadine Perera, Peter Poller, Jan Schehl, Frank Steffens, Rosmary Stegmann, Jochen Steigner: In-Car Showcase Based on TALK Libraries. Deliverable D5.3, TALK project, 2006.
[2] Andreas Korthauer, Holger Banski, Frank Steffens, Hartmut Mutschler, Peter Poller: Evaluation of the Baseline System. Deliverable D6.3, TALK project, 2006.
[3] AttrakDiff website, http://www.attrakdiff.de
[4] M. A. Walker, A. Rudnicky, R. Prasad, J. Aberdeen, E. Owen Bratt, J. Garofolo, H. Hastie, A. Le, B. Pellom, A. Potamianos, R. Passonneau, S. Roukos, G. Sanders, S. Seneff, D. Stallard: DARPA Communicator Cross-System Results for the 2001 Evaluation. ICSLP…
109. …do not reproduce the number of turns per task well (see Figure 21), i.e. the specific items, formulations and dialogue context seem to matter more for false reactions than the number of turns. The rejections, however, reproduce the number of turns relatively well, i.e. more turns resulted in a higher probability of rejections.
[Figure 27: False system reactions per task for the systems (SAMMIE, C&C) and tasks, averaged over Subjects.]
[Figure 28: Rejections per task for the systems and tasks, averaged over Subjects.]
Concerning the rejections, the critical tasks were mainly task 4 (C&C) and task 6 (SAMMIE). The number of rejections for each task was relatively balanced across Subjects, i.e. there were no strong outliers. The following example shows task 6 with one rejection; the task was only completed with the subsequent iDrive turns. Subject 17, SAMMIE, task 6: 2 speech turns, 2 iDrive turns, t = 0:32 min, 3 driving errors, TCR = 1. This example is below the mean rejection rate. Subject 17 SAM…
110. …often led to a successfully completed task, the TCR data (see above) have to be interpreted together with the help data. For the SAMMIE run there was relatively much help in tasks 4, 5 and 7. The reasons were the task complexity (3–4 elements), an ambiguity (two albums named "Live") and the unusual pronunciation of "Michael Bublé". For the C&C run there was relatively much help in tasks 2, 3, 5 and 9. In task 2, the same word "Mensch" as album name and as song name was somewhat irritating. In task 3, the playlist name "Pur Klassiker" was misleading for several Subjects (see above). Without this help the TCR would have been lower; in particular, the more complex tasks would have been solved less frequently within the given course segment than shown in Figure 14.
[Figure 16: Tasks aborted by the Subject and experimenter's help ("parameter" and "else") for the systems (SAMMIE tasks 1–10, C&C tasks 1–5, 9, 10), averaged over tasks and Subjects; percent of Subjects.]
3.3 Number of turns
A turn is defined as a pair of a user's input and the corresponding system output. With speech input in SAMMIE mode, a single utterance was theo…
111. …operation.
3. Users with much MP3 experience achieve a higher operation efficiency, particularly with a lower number of turns.
4. Users achieve a higher Task Completion Rate with SAMMIE than with C&C.
5. Users are faster with the SAMMIE system than with the C&C system.
6. The number of turns is higher with C&C than with SAMMIE.
7. SAMMIE needs fewer iDrive actions.
8. The number of system errors with SAMMIE is only marginally higher than with C&C.
9. SAMMIE distracts the user less from driving than C&C.
10. The SAMMIE system leads to a higher user acceptance than the C&C system.
11. Users can assess well what the system has understood.
2 EVALUATION DESIGN
2.1 Experimental set-up
The basic components of the experimental set-up were (see Figure 1 and Figure 2):
• experimental car (BMW 335)
• SAMMIE system with microphones, loudspeaker and iDrive
• two cameras for the Subject and the traffic scene
• split screen and video recorder
• additional electronics
• data recorder (keyboard, writing pad)
The exterior elements of the SAMMIE system were a microphone for speech input and the iDrive device for manual input. In contrast to the baseline system, the microphone could be opened by the user with the Push-To-Talk (PTT) button on the steering wheel, or automatically by the system during the dialogue. Opening and closin…
112. …For that, a reference trial without any interaction tasks would be necessary, including additional measurements, e.g. of eye movements. The driving quality scores were calculated by averaging those of the experimenter and the supervisor. This subjectively judged driving quality was nearly equal for both systems, which confirms the objective driving quality results. Some Subjects were observed to drive very cautiously and relatively slowly throughout the session, more or less independently of the system and of whether a task was being performed. They wanted to perform well and did not play with the MP3 system or the car, and they often relied somewhat more on manual input with the iDrive. Other Subjects, mostly the younger ones, drove confidently, played with the MP3 system and the car, and often used speech input. These individual differences affected driving quality more than the interactive system used. The mental load was generally low, at about 2 on a scale of 1–5, and did not differ between systems. Higher scores resulted from operating the MP3 system in demanding traffic situations and from dialogue or speech recognition problems. Tasks that progressed well, without serious driving or operation problems, were generally not considered demanding. With the S…
113. …possible. In cases of lists or system problems, however, falling back to the display is necessary and represents a break in the model. With a command-based system, speech input goes along with the display presentations and allows an easy change to manual input. Although not verified in this experiment, some distraction from driving can be assumed, which may affect lane keeping and speed. Speech input per se is not very prone to distraction, but the associated visual activities directed at the display are. Nevertheless, a speech-only system without a display would not be accepted. The experimental conditions affected the results, particularly because predefined tasks were given. Most of the Subjects felt some time pressure and acted differently than they otherwise would have. There may even have been an artefact of complying with assumed expectations of the experimenter (e.g. "Speech input is a relatively new interaction system. Prefer it."). The free interactions showed that Subjects behaved partly differently when choosing their own music and interacting with the system in their own, possibly more familiar, way. In free interactions the iDrive was used as often as speech input. This suggests that the familiar manual input is still a well-accepted input modality, at least for users without considerable familiarity with speech input. Long-term studies could reveal changes in interaction behaviour and choice of modality.
Hypotheses
Most…
114. …C&C system, the Subject was more bound to the menu and had to make more turns. As far as the subjectively felt mental load is concerned, these factors seem to compensate each other more or less. Mental load scores were low in tasks 1 and 8, both of which were usually done quickly and with 1–2 turns (task 1 either verbally or manually, task 8 exclusively verbally). The mental load curve over the tasks most closely resembles that of task duration (except for task 8), and to a somewhat lesser degree those of the number of turns and rejections. Task duration includes the effort of the turns as well as the driving situation and pauses for reflection. The highest scores were given in tasks 6 and 7, each with four elements (e.g. task 6: artist, album, song, playlist), on a relatively narrow two-lane road. Apart from task 10, these two tasks needed the most turns, took longest and led to the most rejections and false reactions, accompanied by one of the highest driving error scores.
3.8 Speech Recognition Performance
The evaluation of the baseline system had already shown the strong influence of speech recognition performance on the evaluation of the system. However, improvements over the baseline system could only be achieved by revising the grammar and tuning some recognition parameters, since the same speech recognition engine (Nuance 8.5) was still used. The natural language grammar for t…
115. …p system. The following Figure 25 and Figure 26 show the number of false reactions and rejections per task for the present and the baseline systems, averaged over tasks and Subjects. The SAMMIE bar includes all tasks that were also given in the baseline study (tasks 1–4, 6–8). The baseline bar represents the data averaged over the tasks of the free run, i.e. with free choice of modality (footnote 29). The C&C bar includes all tasks given in the C&C run (tasks 1–5, 9, 10). For the selected tasks there were as many false reactions with SAMMIE as with the baseline system. On average, nearly every second one of these rather complex tasks was affected by a false reaction of the system, which usually irritated the user more than a rejection. Over all tasks, SAMMIE even reached a mean of 0.46 false reactions per task. Over the tasks given in the C&C run (1–5, 9, 10), SAMMIE had a mean of 0.42 false reactions per task. There were considerably fewer false reactions with the C&C system (0.08 per task). The difference in false reactions between SAMMIE and C&C is significant (Wilcoxon matched pairs, n = 7, T = 28, p < 0.05).
[Figure 25: Overall false reactions per task for SAMMIE (tasks 1–4, 6–8), C&C (tasks 1–5, 9, 10) and Baseline.]
Figure 25: False reactions for the systems a…
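For a small number of paired differences, the exact null distribution of the Wilcoxon signed-rank statistic can be counted directly. The following is a minimal sketch, assuming no ties among the absolute differences and dropping zero differences, as is usual. With n = 7 non-zero differences, the largest possible rank sum is 1 + 2 + … + 7 = 28. A rank sum of 28 therefore means all differences point the same way, and the exact two-sided p is 2/2^7 ≈ 0.016. The per-Subject differences below are invented for illustration; they are not the study's data.

```python
def signed_rank_stat(diffs):
    """Return (n, W_plus, W_minus) for paired differences.

    Zero differences are dropped, as is usual for the signed-rank test.
    This sketch assumes no ties among the absolute differences.
    """
    nonzero = [d for d in diffs if d != 0]
    if len({abs(d) for d in nonzero}) != len(nonzero):
        raise ValueError("tied |differences|; this sketch handles untied data only")
    ranked = sorted(nonzero, key=abs)  # rank 1 = smallest |difference|
    n = len(ranked)
    w_plus = sum(rank for rank, d in enumerate(ranked, start=1) if d > 0)
    return n, w_plus, n * (n + 1) // 2 - w_plus

def exact_two_sided_p(n, w):
    """Exact two-sided p-value for a rank sum w under H0, counting all 2**n sign patterns."""
    max_sum = n * (n + 1) // 2
    counts = [1] + [0] * max_sum  # counts[s] = number of rank subsets summing to s
    for k in range(1, n + 1):
        for s in range(max_sum, k - 1, -1):
            counts[s] += counts[s - k]
    lower = min(w, max_sum - w)  # the null distribution is symmetric
    return min(1.0, 2 * sum(counts[: lower + 1]) / 2 ** n)

# Hypothetical per-Subject differences (SAMMIE minus C&C false reactions per task).
diffs = [0.5, 0.3, 0.7, 0.2, 0.6, 0.4, 0.1]
n, w_plus, w_minus = signed_rank_stat(diffs)
print(n, w_plus, w_minus, exact_two_sided_p(n, w_plus))
```

For real data with ties or larger samples, a statistics library that handles ties and normal approximation would be the more appropriate tool.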
116. …r fewer turns. A general impression was that task duration was not a critical factor as long as task processing made progress. There were as many false reactions with the SAMMIE system as with the baseline system, but more than with the C&C system. On average, nearly every second task was affected by a false reaction of the system, which usually irritated the user more than a rejection. With the SAMMIE system there was about one rejection per task, fewer than with the C&C and baseline systems. The rejections correlated with the number of turns, i.e. more rejections corresponded to more turns. Driving quality was measured by recording driving errors online during the runs (normalized to one minute) and by scoring the overall driving quality. There was no pronounced difference in the mean number of driving errors between systems. Some Subjects made no more than occasional driving errors, while others crossed the lane boundaries continuously during task processing. Lane departures and too low speed were the most frequent driving errors. More than one lane departure error per minute and about 0.7 "speed too low" errors per minute seem relatively high and can be attributed to visual distraction when looking at the display. The experimental car was overtaken relatively often, even on the two-lane roads. However, no definite statement can be made about the effect of multimodal operation on driving safety in general. F…
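Normalizing error counts to one minute can be done in two ways, and the report does not say which was used. The first averages the per-run rates, so every run counts equally. The second pools all errors over total driving time, so longer runs weigh more. The following is a minimal sketch of both, using hypothetical run data given as (error count, duration in seconds); the function names are illustrative only.

```python
def errors_per_minute(error_count, duration_s):
    """Normalize a driving-error count to one minute of driving."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return error_count * 60.0 / duration_s

def mean_of_rates(runs):
    """Average of per-run rates: every run counts equally, whatever its length."""
    rates = [errors_per_minute(c, d) for c, d in runs]
    return sum(rates) / len(rates)

def pooled_rate(runs):
    """Total errors divided by total driving time: longer runs weigh more."""
    total_errors = sum(c for c, _ in runs)
    total_s = sum(d for _, d in runs)
    return errors_per_minute(total_errors, total_s)

# Hypothetical runs as (error_count, duration in seconds); not the study's data.
runs = [(2, 60), (1, 120)]
print(mean_of_rates(runs))  # 1.25
print(pooled_rate(runs))    # 1.0
```

The two figures diverge whenever runs differ in length, which matters when task durations vary as much as they did across tasks here.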
117. …(scale: very easy to very difficult). How do you judge the system's behaviour of opening the microphone on its own during a dialogue? (scale: very supportive to very confusing). Was it clear to you when you could speak? (scale: always to very rarely). (Next page)
Information output
First, questions on information output in general, i.e. regardless of whether it was visual or acoustic.
11. How did you like the system output (visual and acoustic)? (very good to very bad)
12. Did the system support you when there were problems in the dialogue? (very good to very bad)
13. How good did you find the distribution of information between speech output and visual display? (very good to very bad)
Next, questions on speech output, i.e. the system's spoken announcements to you.
14. How did you like the speech output? (very good to very bad)
15. How helpful were the speech outputs for you? (very helpful to not helpful at all)
16. How good did you find the content of the…
118. …theoretically enough to perform a task if the user included all parameters in one utterance and the system reacted correctly. So the minimal number of turns with speech input was one if the operation was done exclusively by speech. If the dialogue was not optimal, e.g. due to misrecognitions, or if the system needed additional information, more than one turn was necessary; e.g. in task 4 an additional choice between two artists was needed. With speech input in C&C mode, as many turns were necessary as there were menu presentations. For most tasks the minimal number of turns was three or four if the operation was exclusively verbal. For tasks 1 and 3, fewer turns were sufficient (one and two, respectively). In principle, one iDrive action was counted as a single input if one system output followed. For example, pushing the iDrive controller down, forward or backward together with the corresponding system output was counted as a single turn. For turning, an action sequence was counted as one turn when it was followed by one system output in the log file. So quickly turning the iDrive controller over several raster points, together with the corresponding system response in the log file, was one turn. This more or less reproduced the user's mental model of what constitutes one action. Consequently, with the iDrive the minimal number of turns depended very much on the scrolling speed and was not fixed. E.g. the lower limit of the number of turns for the rather complex tasks 3–7…
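The counting rule above (one or more user actions, closed by one system output, form one turn) can be sketched over a chronological event log. The log format here, pairs of ("user" or "system", label), is a hypothetical simplification; the actual log files are not described in the report. The rule that system outputs without a preceding user action (e.g. an initial prompt) do not count is an assumption that follows from the definition of a turn as an input/output pair.

```python
def count_turns(events):
    """Count turns in a chronological event log.

    events: sequence of (source, label) pairs, source is "user" or "system".
    A turn is one or more consecutive user actions (e.g. several iDrive
    raster steps, or one utterance) closed by the next system output.
    System outputs not preceded by a user action (e.g. prompts) are not counted.
    """
    turns = 0
    pending_user_input = False
    for source, _label in events:
        if source == "user":
            pending_user_input = True
        elif source == "system" and pending_user_input:
            turns += 1
            pending_user_input = False
    return turns

# Hypothetical log excerpt.
log = [
    ("system", "prompt"),        # no user input yet: not a turn
    ("user", "idrive_turn"),
    ("user", "idrive_turn"),
    ("user", "idrive_turn"),     # quick turning over several raster points...
    ("system", "list_update"),   # ...closed by one system output: 1 turn
    ("user", "idrive_push_down"),
    ("system", "pause"),         # 1 turn
    ("user", "speech"),
    ("system", "confirmation"),  # 1 turn
]
print(count_turns(log))  # 3
```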
119. …marks on the course. The Subject signalled the completion of a task. If a task was not completed within the given segment, it was broken off at the corresponding mark and the Subject was asked to rate their mental load.
5.3 Objective Results
The SAMMIE evaluation of the final In-Car Showcase produced a number of results about the use and usefulness of different variants of the SAMMIE dialogue system for the MP3 domain during real driving. Detailed dialogue and driving performance data were collected, as well as subjective evaluation data on speech input and output, manual iDrive input and the display. The main concern was the usability and usefulness of multimodal interaction. The multimodal combination of speech and manual input was used extensively. Users changed from speech to manual operation in about 30–60% of tasks, and from manual to speech operation in about 15–30%. The main reason for the former were system errors or dialogue deadlocks in which the user did not succeed in solving a task. The main reason for the latter were functions where the user did not find the correct item in a list or did not recall the right manual action. At the beginning of a task there was a very clear preference for speech input with all systems. For the first action, SAMMIE speech input was used five times more frequently than iDr…
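The modality-change percentages above count tasks with at least one change in a given direction. The following is a minimal sketch, assuming each task is available as the chronological list of input modalities the Subject used. This representation and the example tasks are hypothetical.

```python
SPEECH, IDRIVE = "speech", "idrive"

def modality_changes(inputs):
    """Count (speech->iDrive, iDrive->speech) transitions in one task's chronological inputs."""
    to_idrive = to_speech = 0
    for prev, curr in zip(inputs, inputs[1:]):
        if prev == SPEECH and curr == IDRIVE:
            to_idrive += 1
        elif prev == IDRIVE and curr == SPEECH:
            to_speech += 1
    return to_idrive, to_speech

def percent_tasks_with_change(tasks):
    """Percentage of tasks with at least one change to iDrive, and to speech."""
    counts = [modality_changes(t) for t in tasks]
    n = len(counts)
    with_to_idrive = sum(1 for a, _ in counts if a > 0)
    with_to_speech = sum(1 for _, b in counts if b > 0)
    return 100.0 * with_to_idrive / n, 100.0 * with_to_speech / n

# Hypothetical tasks (input modality sequences), not the study's data.
tasks = [
    [SPEECH, SPEECH, IDRIVE, IDRIVE],  # one change to iDrive
    [SPEECH],                          # no change
    [IDRIVE, SPEECH, IDRIVE],          # one change in each direction
    [SPEECH, IDRIVE],                  # one change to iDrive
]
print(percent_tasks_with_change(tasks))  # (75.0, 25.0)
```

Counting tasks with at least one change differs from counting all changes. Figure 11 of the report counts every change during task processing, so a task with repeated switching contributes more there.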
120. …less experienced Subjects, and vice versa for the iDrive.
Subjective ratings
With the present systems, by far most Subjects tended towards a positive judgement of the multimodal interaction systems. There was a clear improvement in the subjective overall impression from the Baseline to the SAMMIE systems, all the more so as the present systems were judged easier to use than the baseline system. SAMMIE was rated less distracting and more comfortable than the C&C and Baseline systems. Choosing a modality and changing between modalities was easy for most Subjects. This is an important result in favour of the multimodality concept, since it shows that users can switch between modalities at will. Overall, the speech output and the display were judged relatively positively. The information output, however, was not fully accepted with regard to liking, support, information distribution and assistance. For the dialogue there was a tendency towards a positive judgement, and SAMMIE was generally rated better than C&C. We used statements from the COMMUNICATOR evaluations [4] to assess aspects of the dialogue. The statement about understanding what the system said received the best scores. Reservations concerned the statements that it was easy to get the information the user wanted and that the system worked as expected. The Subjects who had already participated in the baseline study often stated spont…
121. …name entries that are not visible because they are further down or up in the list. If there are comprehension problems, it may help to rephrase the speech input (in the parallel column of the instructions: to repeat the speech input). In addition, you can switch to manual input at any time, including during a task, and vice versa. The following control commands apply to both systems. With "Weiter" (Next) or a similar command, the system pages to the next/previous part of a list, similar to turning the iDrive knob several times. With "Zurück" (Back) or a similar command, the system returns to one of the previous displays; this often corresponds to pushing the iDrive knob up. With "Hauptmenü" (Main menu) or a similar command, the system goes to the main menu, like pressing the Main menu button. You can abort the dialogue and start again from the main menu. If you consider further work on a task completely hopeless, you can also abort it. When we reach certain marks on the course while you are working on the tasks, the experimenter asks you to abort the task…
122. …systems (choice of modality)
• the dialogue efficiency (Task Completion Rate, number of turns, dialogue times)
• the acceptance of the system (questionnaires with subjective evaluation)
• the efficiency of the speech system (false reactions, rejections)
• the influence on driving quality (driving errors, driving scores)
The main variable was the multimodal interaction system. The Full SAMMIE system was to be compared with the Command & Control (C&C) system as the reference system, as well as with the baseline system. The Non-Adaptive (NA) SAMMIE system was also to be included in the evaluation. For a more detailed description of the evaluated system variants and their specific features see deliverable D5.3 [1]; the results of the baseline system evaluation can be found in deliverable D6.3 [2]. The study was conceived as a critical experiment, i.e. hypotheses were defined on the basis of the results of the baseline study and other considerations. Additional results were also expected concerning the multimodality and efficiency of the SAMMIE system. Essential aspects of the method were:
• system variants, experimental set-up
• experimental course
• Subjects
• evaluation tasks
• experimental design, realisation
• measurements, questionnaires
The following hypotheses were established:
1. Users prefer speech input more with the SAMMIE system than with the C&C system.
2. Users with much MP3 experience tend to manual…
123. [Table 1: Experimental course with tasks, segments and details. Surviving entries include traffic signs, passing several exits, the Wolfartsweier exit, 2 tunnels, the Hauptbahnhof exit, station parking, and four-lane straight road with medium and heavy traffic.]
[Figure 4: Experimental course as map.]
2.3 Subjects
A sample of 21 Subjects was recruited (see Table 2). Essential requirements for participation were:
1. some or much experience with MP3 players or similar software
2. participation in the baseline evaluation study, if possible
3. very safe driver
4. regular driving experience
5. able to avoid any strong dialect
6. participation in former BEF studies, if possible
7. knowledge of local roads, if possible
No specific design for other Subject parameters was envisaged, but some variance in sex and professional background was sought (not too many technicians). Because of conditions 1 and 3, age was in practice limited to the young and middle age groups. As the following table shows, 10 Subjects had some MP3 experience (1) and 11 Subjects had much MP3 experience (2). Much MP3 experience means having already used an iPod, or regularly using MP3 hardware or softwar…
124. …Herbert Grönemeyer"
4. How highly do you rate the usefulness of implicit confirmation? (very high to very low)
Video clip: "Ich will ein Rock-Lied" (I want a rock song)
5. How highly do you rate the usefulness of the more detailed user guidance? (very high to very low)
Video clip: "Nennen Sie mir alle Künstler" (Name all artists for me)
6. How highly do you rate the adaptation of the system to the user's vocabulary ("Künstler"/"Interpreten", i.e. artists/performers)? (very high to very low)
Please answer the following questions at the end, after you have seen all video clips.
Comparison of input methods
7. Do one or more aspects of the following list represent advantages of the SAMMIE system (test drive) compared with the SAMMIE variant (stationary) for you?
[ ] Personal address ("Sie"/"Du")
[ ] Distinction between "Zeige" (show) → visual presentation and "Nenne" (name) → acoustic presentation
[ ] Presentation of albums with their artists
[ ] Implicit confirmation ("Alben von Herbert Grönemeyer")
[ ] More detailed user guidance
[ ] Adaptation of the system to the user's vocabulary ("Künstler"/"Interpreten")
[ ] Other
8. Which input method would you probably use in the long run? Please tick only one…
… from the baseline study should be mentioned: on average, there were 2.0 lane departure errors per minute in the free run of the baseline study.
… safety reasons very early. Only one situation occurred in which an accident was certainly prevented by the supervisor's warning.
Figure 30: Driving errors per minute for the systems (SAMMIE tasks 1-10, C&C tasks 1-5, 9, 10) and error categories, averaged over tasks and Subjects (means, standard deviations)
Figure 31: Overall driving errors per minute as a function of system and MP3 experience, averaged over tasks and Subjects within subgroups
Figure 31 above shows the overall driving errors for the systems and the different MP3 experience levels, averaged over tasks and Subjects within subgroups. As for TCR, number of turns and task duration (see above), there was no strong difference between the MP3 experience levels. This may indicate that driving errors depend much more on the individual driving performance than on the operation of the multimodal systems.
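The report expresses driving errors as a rate per minute but does not spell out how the rate was aggregated over tasks. The following is a minimal sketch, assuming each task's error count is divided by that task's duration and the per-task rates are then averaged; the pooled alternative (total errors over total task time) is shown for comparison because the two generally differ. The function names are hypothetical, and the sample input reuses the Subject 4 example from Table 7 (4 driving errors in 1 min 16 s).

```python
from typing import List, Tuple


def errors_per_minute(errors: int, duration_s: float) -> float:
    """Driving errors normalised to one minute of task time."""
    return errors / (duration_s / 60.0)


def mean_of_task_rates(tasks: List[Tuple[int, float]]) -> float:
    """Assumed aggregation: average of per-task rates; each task is (errors, duration in s)."""
    rates = [errors_per_minute(e, d) for e, d in tasks]
    return sum(rates) / len(rates)


def pooled_rate(tasks: List[Tuple[int, float]]) -> float:
    """Alternative aggregation: total errors divided by total task time in minutes."""
    total_errors = sum(e for e, _ in tasks)
    total_minutes = sum(d for _, d in tasks) / 60.0
    return total_errors / total_minutes


# Subject 4, SAMMIE, task 7 (Table 7): 4 driving errors in 1 min 16 s = 76 s
print(round(errors_per_minute(4, 76), 2))  # 3.16
```

With tasks of unequal length the two aggregations diverge, e.g. 2 errors in 60 s plus 2 errors in 120 s give a mean rate of 1.5 but a pooled rate of about 1.33 errors per minute.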
… part B: Study C&C; evaluation data entered: 18/21
Figure 75 shows the results for the dimensions pragmatic quality (PQ), hedonic quality - identity (HQ-I), hedonic quality - stimulation (HQ-S) and attractiveness (ATT). For all dimensions, the C&C system performs slightly better than the SAMMIE system. However, this difference is not statistically significant.
Figure 75: Mean values of the four AttrakDiff dimensions (PQ, HQ-I, HQ-S, ATT) for the products Full SAMMIE (project part A) and C&C (project part B)
Word pairs (adjectives)
Figure 76 shows the mean values of the word pairs. The extreme values are of particular interest, since they show which characteristics are particularly critical or particularly well resolved. Description of word pairs: technical / human; complicated / simple; impractical / practical; cumbersome / straightforward; unpredictable / predictable; confusing / clearly structured; unruly / manageable; isolating / connective; unprofessional / professional; tacky / stylish; cheap / premium; alienating / integrating; separates me from people / brings me closer to people; unpresentable / presentable; conventional / inventive; unimaginative / creative; cautious / bold; conservativ…
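AttrakDiff dimension values such as those in Figure 75 are means over word-pair items. The following is a minimal sketch, assuming the standard AttrakDiff grouping of seven word pairs per dimension (the first seven pairs listed above forming PQ and the next seven HQ-I) and item scores coded from -3 to +3 so that the right-hand adjective as listed here scores +3. The ratings are hypothetical, and HQ-S and ATT are omitted because their word lists are cut off in this section.

```python
from statistics import mean
from typing import Dict, List

# Word pairs in the order listed in the report. Grouping them seven per
# dimension follows the standard AttrakDiff questionnaire (an assumption here).
PQ = ["technical/human", "complicated/simple", "impractical/practical",
      "cumbersome/straightforward", "unpredictable/predictable",
      "confusing/clearly structured", "unruly/manageable"]
HQ_I = ["isolating/connective", "unprofessional/professional", "tacky/stylish",
        "cheap/premium", "alienating/integrating",
        "separates me from people/brings me closer to people",
        "unpresentable/presentable"]


def dimension_mean(ratings: List[Dict[str, int]], items: List[str]) -> float:
    """Mean over all Subjects and all items of one dimension.

    ratings: one dict per Subject, mapping word pair -> score from -3 to +3.
    """
    return mean([r[item] for r in ratings for item in items])


# Two hypothetical Subjects (not data from the study)
subject_a = {**{i: 1 for i in PQ}, **{i: 0 for i in HQ_I}}
subject_b = {**{i: 2 for i in PQ}, **{i: -1 for i in HQ_I}}
print(dimension_mean([subject_a, subject_b], PQ))    # 1.5
print(dimension_mean([subject_a, subject_b], HQ_I))  # -0.5
```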
This is an unexpected result, because the C&C system was intended as a reference system for the SAMMIE system. It could be attributed to the better speech recognition performance of the C&C system; the easier orientation along the menu structure of C&C is possibly another reason.
Figure 61: Answers to question 1, "Which system would you use in the long run?" (percent of Subjects choosing SAMMIE, C&C or the TALK baseline)
The ease of use of the present systems was judged much better than that of the baseline system (Figure 62). Whereas Figure 40, for a similar question, used the original data of the baseline study, the data here were collected from Subjects who knew both systems and include new ratings of the baseline system.
Figure 62: Answers to question 2, "How easy were the systems to operate compared with other systems?" (scale from much easier to much harder; values: SAMMIE 70, C&C 66, baseline 42)
In questions 3-7, the Subjects were asked about the advantages and disadvantages of the input modalities. There were fiv…
…ter's help, this was counted as TCR = 1 but was noted separately.
- A different selection of tasks was used: the very simple playback tasks (pause song, continue song, etc.) were dropped in the present study.
Figure 13 above shows the overall objective and subjective TCR (i.e. perceived TCR), averaged over these tasks and Subjects. The bars include all tasks given in both runs (tasks 1-5, 9, 10). The perceived TCR of SAMMIE and C&C is 83% and 79%, respectively. Overall, the TCR in the present study was about 80%, which has to be interpreted as a generally high level, especially considering the partly tight time conditions: the average time available was about 1:30 min. Tasks were completed somewhat more frequently with SAMMIE than with C&C. However, a Wilcoxon matched-pairs test showed that the difference in perceived TCR between the systems for these tasks is not significant (Wilcoxon matched pairs: n = 7, rank sums T = 10 and T = 18, p = 0.5; tasks 1-5, 9, 10 included).
Figure 14: Task Completion Rate (objective and subjective, tasks 1-10) for the systems a…
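For the Wilcoxon matched-pairs test above, the two reported rank sums 10 and 18 add up to n(n+1)/2 = 28 for n = 7, which suggests they are the positive and negative rank sums. The reported p = 0.5 also matches the large-sample normal approximation without tie or continuity correction, and the same approximation gives about 0.58 for the SAMMIE experience-group comparison reported elsewhere (n = 10, T = 22). This is an inference from the numbers, not something the report states. The sketch below, with hypothetical function names, computes the rank sums from paired differences (zero differences dropped, average ranks for ties) and the two-sided p-value from n and a rank sum.

```python
import math
from typing import List, Tuple


def signed_rank_sums(diffs: List[float]) -> Tuple[float, float]:
    """Return (T_plus, T_minus) for paired differences.

    Zero differences are dropped; tied absolute values receive their average rank.
    """
    nonzero = [d for d in diffs if d != 0]
    order = sorted(range(len(nonzero)), key=lambda k: abs(nonzero[k]))
    ranks = [0.0] * len(nonzero)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of equal absolute values
        while j + 1 < len(order) and abs(nonzero[order[j + 1]]) == abs(nonzero[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    t_plus = sum(r for r, d in zip(ranks, nonzero) if d > 0)
    t_minus = sum(r for r, d in zip(ranks, nonzero) if d < 0)
    return t_plus, t_minus


def wilcoxon_p_normal(n: int, t: float) -> float:
    """Two-sided p for rank sum t with n non-zero pairs (normal approximation,
    no tie or continuity correction)."""
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (t - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


print(round(wilcoxon_p_normal(7, 10), 2))   # compare with the reported p = 0.5
print(round(wilcoxon_p_normal(10, 22), 2))  # compare with the reported p = 0.58
```

Either rank sum can be passed in, since the approximation is symmetric around n(n+1)/4; for very small n, an exact enumeration of the rank-sum distribution would be the more conservative choice.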
…questions in the baseline study were asked several weeks after the study, with considerable recollection problems, so the baseline data are not included here.
Figure 47: Answers to question 12, "Were you supported in case of dialogue problems?" (scale from very good to very bad; mean: SAMMIE 54, C&C 51)
Figure 48: Answers to question 13, "How do you judge the distribution of the information between speech output and display?" (scale from very good to very bad; mean: SAMMIE 71, C&C 63)
The next figures (Figure 49 to Figure 53) concern the subjective evaluation of the speech output. There is a general trend towards a positive judgement, but often clearly below the maximum. Speech output was judged less helpful in the present study than in the baseline study (Figure 49). This may be explained by the good display presentation in the present systems (see below), which makes the speech information seem excessive, whereas the reverse applied to the baseline system. Another explanation could be that the simulated driving task was more dema…
…tly from each other. After each run, the Subject filled in the corresponding intermediate questionnaire (see attachment 8.3). The final questionnaire (see attachment 8.4) was handed out after the session, to be filled in at home soon afterwards. The Subjects were paid 40 euros for participation.
The non-adaptive (NA) SAMMIE system was presented at the end of the session. To this end, six video clips were shown in which the same task was presented first in the non-adaptive version and then in the adaptive version. The tasks represented functions of the systems in which the different features could be illustrated: personal addressing, differentiation of optical and acoustical presentation, presentation of albums without/with artists, usefulness of confirmation, user guidance, and adaptation to the user's vocabulary (see attachment 8.5).
Three pre-tests were carried out at Bosch and three at BEF to test the envisaged method for the main evaluation sessions. The tasks and the experimental design were tested with respect to feasibility and duration.
Figure 7: Experimental realisation (stages shown include: introduction to experimental car, introduction to iDrive, training run, introduction to SAMMIE / C&C, training video, training tasks, trial run, intermediate questionnaire (2x), NA SAMMIE presentation, questionnaire NA, final questionnaire)
Within the C&C run there…
…averaged over Subjects and the selected tasks. As Figure 26 illustrates, there was about one rejection per task, with fewer rejections per task for the SAMMIE system than for the C&C and baseline systems. The difference in rejections between SAMMIE and C&C narrowly misses significance (Mann-Whitney U test, SAMMIE vs. C&C: n = 209 and n = 119, U = 10988, p = 0.058, considering all single tasks of all Subjects). If all tasks are considered, the mean for the SAMMIE system is as high as 0.96 rejections per task.
Corresponds to the green data of tasks 1.4, 1.3, 3.3, 1.5, 3.5 and 3.4 in Figures 20 and 21 of the final report "Evaluation of the TALK baseline system" (BEF, 31.01.2006).
Figure 26: Rejections per task for the systems, averaged over Subjects and selected tasks
Figure 27 and Figure 28 show the false reactions and rejections per task for the systems and single tasks, averaged over Subjects. There were no false reactions with the C&C system in tasks 1, 9 and 10 for the relevant Subjects. One explanation is that in task 1 "Alben" (albums) was recognized very well and in task 10 iDrive was used most of the time. The false reactions per task d…
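The rejection comparison uses the Mann-Whitney U test on all single tasks of the two systems. Rejection counts per task are small integers with many ties, so the handling of ties affects the p-value. The following is a minimal sketch, assuming average ranks for ties and the usual tie-corrected normal approximation; the report does not state which software or correction was used, and the sample data are hypothetical.

```python
import math
from typing import List, Tuple


def mann_whitney_u(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Return (U, two-sided p) for independent samples x and y.

    U is the smaller of U_x and U_y; p uses the tie-corrected normal approximation.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    values = sorted(x + y)
    avg_rank = {}  # value -> average 1-based rank in the pooled sample
    tie_term = 0   # sum of (t^3 - t) over groups of tied values
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[j + 1] == values[i]:
            j += 1
        t = j - i + 1
        avg_rank[values[i]] = (i + j) / 2 + 1
        tie_term += t ** 3 - t
        i = j + 1
    r1 = sum(avg_rank[v] for v in x)
    u1 = r1 - n1 * (n1 + 1) / 2
    u = min(u1, n1 * n2 - u1)
    mean = n1 * n2 / 2
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var == 0:  # all observations identical: no evidence of a difference
        return u, 1.0
    z = (u1 - mean) / math.sqrt(var)
    return u, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical rejections per task for two systems (not data from the study)
u, p = mann_whitney_u([0, 1, 0, 2, 1, 0], [1, 2, 1, 3, 2, 1])
print(u, round(p, 3))
```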
…individual driving performance than on the operation of the multimodal systems. Better mastery of MP3 systems does not necessarily lead to better driving.
Figure 32 and Figure 33 show the driving quality scores for the systems, evaluated subjectively by the experimenter and the supervisor (see above), averaged over tasks and Subjects (in the first figure additionally over scales). There is no pronounced difference between the driving quality scores for the systems, i.e. the subjectively judged driving quality of the Subjects was nearly equal with both systems, which confirms the objective driving quality results. As could be observed, some Subjects drove very cautiously and relatively slowly during the whole session, more or less independently of system and tasks (with or without tasks). They wanted to perform well and did not play with the MP3 system and the car; they often relied somewhat more on manual input via iDrive. Other Subjects, mostly the younger ones, drove in a confident style, played with the MP3 system and the car, and often operated with speech input. These individual differences affected the driving quality more than the respective interactive system did.
Figure 32: Overall … (overall driving quality score; SAMMIE tasks 1-10, C&C tasks 1-5, 9, 10)
… previous menu level, i.e. to the previous screen. Next to it there is also a "Hauptmenü" (main menu) button that takes you to the top level of the menu. This level will also be the starting point before each task. As you know from other MP3 systems, there is, among others, the menu item "Musikrichtungen" (genres), e.g. Pop, Rock, Deutsch Rock, Jazz, etc.
In the following test drive you will use the dialogue system that the experimenter will now name: the SAMMIE system or the command system (Kommando System).
SAMMIE system: You operate the SAMMIE system with speech input in natural language or, alternatively, manually via the iDrive knob. If you choose speech input for a task or part of a task, you basically speak as if you were talking to a person, i.e. in natural language. If possible, you should speak in simple, complete sentences. You conduct the dialogue roughly as in interpersonal communication, i.e. taking turns with the system. You can also … artists, albums, titles and playli…
Command system: You operate the command system with speech input in command language or, alternatively, manually via the iDrive knob. If you choose speech input for a task or part of a task, you basically speak the individual words that you see on the display, or the control words explained below ("Weiter" (next), etc.), i.e. in command form.
…were some erroneous settings of the system mode, i.e. for Subjects 2 and 15 the non-adaptive SAMMIE variant was set instead of the C&C variant, and for Subjects 14 (partly) and 21 the Full SAMMIE was set instead of C&C. These data were excluded from the objective results and partly from the subjective results.
14 This sign was necessary to differentiate between the objective and the subjective TCR. Even when it was obvious to the experimenter that the task was finished, she nevertheless asked for the sign.
3 OBJECTIVE RESULTS
3.1 Preferred modality
One of the main objectives of the evaluation study was to investigate the multimodal interaction between the user and the SAMMIE system: do users prefer one modality to another, or do they change between modalities during the dialogue? Since the Subjects usually had a free choice between modalities, this question could be answered clearly. The following Figure 8 shows the overall preferred modality for the SAMMIE, C&C and baseline systems. Here the modality of the respective first action was considered, i.e. the input mode with which the Subjects started to perform the task. The SAMMIE and baseline data include all tasks of the respective study; the C&C data include only those tasks without any constraints as to modality (tasks 4, 5, 9, 10). The baseline data comprise the free run without any constrai…
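The preferred-modality measure counts only the input mode of the first action in each task. The following is a minimal sketch of that counting rule, assuming a hypothetical log format of one (task number, chronological list of modalities) pair per Subject and task; the optional task filter mirrors the restriction of the C&C data to the unconstrained tasks 4, 5, 9 and 10.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Set, Tuple


def preferred_modality_shares(
    records: Iterable[Tuple[int, List[str]]],
    allowed_tasks: Optional[Set[int]] = None,
) -> Dict[str, float]:
    """Percentage of tasks started with each modality (only the first action counts).

    records: one (task number, modalities in chronological order) pair per Subject and task.
    """
    first_modalities = [
        actions[0]
        for task, actions in records
        if actions and (allowed_tasks is None or task in allowed_tasks)
    ]
    total = len(first_modalities)
    if total == 0:
        return {}
    return {m: 100.0 * c / total for m, c in Counter(first_modalities).items()}


# Hypothetical log entries (not data from the study)
log = [(1, ["iDrive"]), (4, ["speech", "iDrive"]), (5, ["speech"]),
       (9, ["iDrive", "speech"]), (10, ["speech"])]
print(preferred_modality_shares(log, allowed_tasks={4, 5, 9, 10}))
# {'speech': 75.0, 'iDrive': 25.0}
```

Later modality changes within a task (such as the switch to speech in task 9 above) do not affect this measure; they are what the separate modality-change counts capture.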
…experiences. Similarly, a Wilcoxon matched-pairs test shows no significant difference between the experience groups, for neither SAMMIE nor C&C (e.g. SAMMIE: Wilcoxon matched pairs, n = 10, T = 22, p = 0.58).
Figure 19: Overall number of turns per task as a function of system and MP3 experience (speech and iDrive turns added), averaged over tasks and Subjects within subgroups (SAMMIE tasks 1-10, C&C tasks 1-5, 9, 10)
2 Since a different principle of counting turns was applied in the baseline study, a comparison with the baseline study is not possible.
The Mann-Whitney U test for independent samples was used for these comparisons; single tasks of all 21 (17) Subjects were considered here.
5 The calculation of the standard deviation requires a normal distribution, which is not given here. It is nevertheless used as a rough measure of the data variance. It was calculated directly over all tasks from all Subjects.
As the diagrams in Figure 20 reveal, however, Subjects with little MP3 experience relied much more on iDrive operation, while Subjects with much MP3 experience operated more with speech. This younger group took more advantage of the usually less…
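The footnote on the standard deviation says it was calculated directly over all tasks from all Subjects. The following is a minimal sketch of that pooling, assuming the sample (n - 1) standard deviation; the variant based on per-Subject means is included only to show that the two choices generally give different values. The function names and data are hypothetical.

```python
import statistics
from typing import List


def pooled_sd(values_by_subject: List[List[float]]) -> float:
    """SD over every task value of every Subject, pooled into one sample."""
    pooled = [v for subject in values_by_subject for v in subject]
    return statistics.stdev(pooled)


def sd_of_subject_means(values_by_subject: List[List[float]]) -> float:
    """Alternative for comparison: SD of the per-Subject means."""
    return statistics.stdev([statistics.mean(s) for s in values_by_subject])


# Hypothetical turns per task for two Subjects (not data from the study)
turns = [[1, 3], [5, 7]]
print(round(pooled_sd(turns), 3))            # 2.582
print(round(sd_of_subject_means(turns), 3))  # 2.828
```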
… complex task 2 ("Der Weg von Herbert Grönemeyer") led to good system performance and involved parameters that were easier to remember, which resulted in a short task duration. A general impression was that task duration was not a very critical factor as long as task processing progressed. The infotainment domain seemed to be a playground for several Subjects, and the driving task did not depend on any MP3 results. Particularly in the free interaction periods, the Subjects browsed the MP3 system for a considerable time.
Subject 4, SAMMIE, task 7: "spiele das Lied Yesterday aus diesem Album" ("play the song Yesterday from this album"); MP3 player: play triggered.
Task 7 needed the longest task duration. The following example, with a task duration of 1 min 16 s for task 7 and a comparatively low number of rejections, is typical: Subject 4, SAMMIE, task 7: 2 turns, 3 iDrive turns, 1 rejection, 4 driving errors, TCR = 1.
Table 7: Example of a typical task duration (Subject 4, SAMMIE, task 7)
3.5 System errors
There were different system errors, which can be classified into false system reactions and rejections. False reactions were, in general, all incorrect reactions of the system as perceived from the user's point of view. Rejections were system reactions such as "I am afraid I did not understand" or a reference to the hel…
