
Anna Johnsson

Sara Garmark

Speech recognition - possibility and usability for people with disabilities

Certec, Department of Design Sciences, Lund Institute of Technology, Lund University, September 2000


Preface

This master thesis was performed from April to September 2000 as a part of the master education in Computer Science and Technology at Lund Institute of Technology, Lund University. The thesis consists of 20 university points, which is equivalent to twenty weeks of full-time studies.

A big thank you to:

Our supervisors Mikael Reinholdsson at Softhouse and Charlotte Magnusson at Certec, Lund Institute of Technology, for support and good ideas.

Tord Olsson for ideas and for helping us with the callcentre visits.

Claes Ericson at Arbetsmarknadsinstitutet for giving us a realistic perspective on issues concerning people with disabilities.

Alf Holmlund at Trygg Hansa, Roland Andersson and Paul Nilsson for being there to fix our computers when they did not work.

Håkan Rick and Ola Nilsson for the programming help.

Martin Dremo and Marie Höglund for telling us the secrets about SpeechMania and for lots of laughs and nice company during the summer.

Kajsa Studt, Karolin Bengtsson, Johan Alfvén and Magnus Flatholm for proofreading.

Malmö, 8 September 2000
Anna Johnsson and Sara Garmark



Abstract

Finding work for people with severe physical disabilities is not easy. This master thesis investigates the possibility of creating new job opportunities for these people by using speech recognition. One possibility is to construct an application that lets a person work in a callcentre by using his/her voice.

To test the idea, a co-operation with the callcentre at Trygg Hansa in Malmö was initiated. At Trygg Hansa, a project was started to find out whether it was possible to voice-control a workstation for an insurance agent. Both the computer software used for insurance matters and the telephone were to be voice-controlled, using a speech recognition program. Usability theories and design rules were studied in order to make a good application. The solution that was developed did not work as well as hoped, mainly because the speech recognition program and the software that was to be controlled were not good enough. Trygg Hansa’s insurance program was unreliable since it crashed too often, and the speech recognition program was too slow and difficult to use with other programs.

As the work proceeded at Trygg Hansa, an idea arose to make an automatic telephone service where the customer is asked to give his/her insurance number before being connected to manual service. This would make the work easier for an insurance agent with physical disabilities. The idea developed into a separate project, also based on speech recognition. A prototype was developed to investigate the idea. The prototype works and clearly shows the possibilities of such a solution. The telephone service could be developed into a full-scale solution, but it is not very useful for persons with physical disabilities without a good voice-controlled workstation application.

The idea of using speech recognition to create job opportunities for persons with physical disabilities at callcentres is still good. The only thing missing is technology that is good enough, but this will hopefully come in the future.



Table of contents

1. INTRODUCTION
  1.1 AIM
  1.2 WORK METHOD
  1.3 CHAPTER OVERVIEW
2 BACKGROUND
  2.1 SPEECH RECOGNITION
    2.1.1 Phoneme or word based speech recognition
    2.1.2 Speaker dependent or speaker independent
    2.1.3 The speech recognition process
    2.1.4 Speech understanding
    2.1.5 Programs used in this master thesis
  2.2 DISABILITIES AND POSSIBILITIES
    2.2.1 Medical terminology
    2.2.2 What can cause disability?
    2.2.3 Possibilities
3. USABILITY
  3.1 DEFINITIONS
    3.1.1 Nielsen’s definition of usability
    3.1.2 Löwgren’s definition of usability
    3.1.3 Other definitions
  3.2 THE DESIGN PROCESS
    3.2.1 Usability engineering
      3.2.1.1 Know the user
      3.2.1.2 Competitive analysis
      3.2.1.3 Goal setting
      3.2.1.4 Parallel design
      3.2.1.5 Participatory design
      3.2.1.6 Co-ordinating the total interface
      3.2.1.7 Guidelines and heuristic evaluation
      3.2.1.8 Prototyping
      3.2.1.9 Interface evaluation
      3.2.1.10 Iterative design
      3.2.1.11 Follow-up studies of installed systems
    3.2.2 Other design approaches
      3.2.2.1 Theory-based design
      3.2.2.2 Contextual design
      3.2.2.3 Participatory design
  3.3 USABILITY TESTING
    3.3.1 Concept testing
    3.3.2 Heuristic evaluation
    3.3.3 Checklists
    3.3.4 Structured walkthrough
    3.3.5 Empirical testing
    3.3.6 Thinking aloud
    3.3.7 Field trial
    3.3.8 Follow-up study and field study
  3.4 DESIGN RULES
    3.4.1 Shneiderman’s Eight Golden Rules
    3.4.2 Other design rules
4. VOICE-CONTROLLED WORKSTATION AT A CALLCENTRE
  4.1 PHILIPS FREESPEECH 2000
    4.1.1 Training
    4.1.2 Program Modes
    4.1.3 The Command Explorer
    4.1.4 Evaluation
    4.1.5 How FreeSpeech follows the design rules
  4.2 WORKSTATION AT TRYGG HANSA
    4.2.1 FRONTER
      4.2.1.1 Evaluation
      4.2.1.2 How FRONTER follows the design rules
    4.2.2 Centenium
      4.2.2.1 Evaluation
  4.3 PRACTICAL APPROACH
    4.3.1 FreeSpeech and FRONTER solution
      4.3.1.1 Performance
    4.3.2 FreeSpeech and Bojnet solution
    4.3.3 FreeSpeech and Centenium solution
  4.4 EVALUATION AND CONCLUSIONS
5. VOICE-CONTROLLED TELEPHONE DIALOGUE
  5.1 PHILIPS SPEECHMANIA 99
    5.1.1 The SpeechMania package
    5.1.2 Developing SpeechMania applications
      5.1.2.1 HDDL
      5.1.2.2 DCOFFLIN
      5.1.2.3 HDDLPARS
      5.1.2.4 Recording Station
      5.1.2.5 Lexicon Manager
      5.1.2.6 Transactions
      5.1.2.7 Running the application online
      5.1.2.8 Transcription Station
      5.1.2.9 Corpus Manager
      5.1.2.10 Training tools
      5.1.2.11 Work process in summary
  5.2 PRACTICAL APPROACH
  5.3 EVALUATION AND CONCLUSIONS
6. SUMMARY AND CONCLUSIONS
7. FUTURE POSSIBILITIES
8. WORDLIST
9. REFERENCES
APPENDIX A: REQUIREMENT SPECIFICATION VER 1.0
APPENDIX B: REQUIREMENT SPECIFICATION VER 2.0
APPENDIX C: SOLUTION SPECIFICATION
APPENDIX D: DIALOGUE FLOW SCHEME



1. Introduction

Finding work for people with severe physical disabilities is not easy. One example is people who break their neck in a traffic accident and become paralysed from the neck down. Most of them cannot go back to their old work, and a new job has to be found. One idea is to give these people a better chance on the labour market by using speech recognition. One possibility is to construct an application that lets a person with physical disabilities work in a callcentre by using his/her voice.

To see what possibilities there are to create work opportunities at callcentres, a couple of different callcentres were visited. They turned out to work in very different ways: some work with just paper and pen, while others use sophisticated computer solutions. The callcentre that seemed best suited for this project was the one at Trygg Hansa in Malmö, because it is a big callcentre with a well-developed technical work environment. A co-operation with Trygg Hansa was initiated and the project took place at their office.

As the work proceeded at Trygg Hansa, an idea arose to make the work easier for an insurance agent with physical disabilities. The idea was to make an automatic service where the customer is asked to give his/her insurance number before being connected to manual service. This would be done with a telephone dialogue, using speech recognition. The idea developed into a separate project that was not done in co-operation with Trygg Hansa; all of that work was done at Softhouse.

1.1 Aim

From the beginning, the aim was to voice-control a workstation at Trygg Hansa. A physically disabled person should be able to work with both computer and telephone by using only his/her voice. This was to be realised with the speech recognition program Philips FreeSpeech 2000. For more specific information, refer to the requirement specification in Appendix B.

The aim of the telephone dialogue project was to show the possibilities of an automatic telephone service by developing a prototype. The prototype should be able to ask for information, show it to the insurance agent and transfer the call. For further details, see the solution specification in Appendix C.



1.2 Work method

The work was divided into three phases:

1. Investigation:
• The speech recognition programs Philips FreeSpeech 2000 and Philips SpeechMania 99 were studied in order to see what they could be used for.
• Different callcentres were visited to see how the work was performed there.
• Usability theories and design rules were studied, to be used later in the projects when developing the applications.

2. Development of the Trygg Hansa application:
• A requirement specification was written.
• An application was developed based on the requirement specification. The design rules and usability theories were considered as much as possible. This was the main focus of the master thesis and it took the most time.
• The software used in the project and the application developed were evaluated from a design perspective.

3. Development of the telephone dialogue application:
• A solution specification was written.
• The application was developed and tested in accordance with the usability theories and design rules. Since this application was only a prototype, some simplifications were made to keep the work within reasonable proportions.

1.3 Chapter overview

This thesis has the following structure:

Chapter 2 (Background) - gives background information about speech recognition and physical disabilities.

Chapter 3 (Usability) - introduces the usability concept and the design process. It describes the theory used when working on the projects described in chapter 4 and chapter 5.

Chapter 4 (Voice-controlled workstation at a callcentre) - describes the project at Trygg Hansa. Products used, practical approach, evaluation of the solution and conclusions are discussed in detail.

Chapter 5 (Voice-controlled telephone dialogue) - concerns a possible further development of the Trygg Hansa project. The development of a telephone dialogue based on speech recognition is described.

Chapter 6 (Summary and conclusions) - discusses conclusions of the Trygg Hansa and the telephone dialogue projects.

Chapter 7 (Future possibilities) - is about possible future improvements of the solutions in this master thesis.

Chapter 8 (Wordlist) - contains a wordlist.

Chapter 9 (References) - contains all references used in this thesis.

Appendix A: Requirements specification for the Trygg Hansa project, version 1.0.

Appendix B: Requirements specification for the Trygg Hansa project, version 2.0.

Appendix C: Solution specification for the telephone dialogue project.

Appendix D: Dialogue flow scheme for the telephone project.



2 Background

To make the work in this thesis easier to follow, this chapter gives an introduction to speech recognition and physical disabilities.

2.1 Speech Recognition

A speech understanding system consists of many different components. The recognition itself is only one part of the speech recognition flow, see figure 2.1.

Figure 2.1 The speech recognition flow of a speech understanding system.

The following sections describe the different components of a speech understanding system. All facts in this section are from references [1]-[9].

2.1.1 Phoneme or word based speech recognition

Each spoken word is formed by a sequence of phonemes, much as a written word consists of characters. Phonemes are the smallest acoustical parts of spoken words, and together they are sufficient to describe the pronunciation of all our words. Most languages consist of forty to fifty phonemes; Swedish has forty-five.

Speech recognition based on phonemes is very flexible compared to recognition based on words. When a new word is introduced to a system based on phonemes, the application already knows all the phonemes in the word and thereby the acoustic model (more about the acoustic model in section 2.1.3). If the system were based on word models, a new model would have to be trained for each new word. Phoneme based recognition also makes it possible to recognise natural speech, that is, speech without pauses between the words.
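The flexibility of the phoneme based approach can be illustrated with a minimal sketch. The phoneme symbols, the `PHONEME_MODELS` table and the function names are hypothetical and invented for illustration; they are not taken from any Philips product. A string stands in for each trained acoustic model.

```python
# Hypothetical illustration: a phoneme based system keeps one acoustic
# model per phoneme and builds word models from phonetic transcriptions.

# Placeholder models: in a real recogniser each entry would be a trained
# acoustic model; here a string stands in for it.
PHONEME_MODELS = {p: f"model<{p}>" for p in ["h", "a", "m", "b", "u", "r", "g", "l", "n", "d"]}

# Lexicon: word -> phonetic transcription (invented, simplified symbols).
lexicon = {"hamburg": ["h", "a", "m", "b", "u", "r", "g"]}


def add_word(word, transcription):
    """Add a new word without any training; every phoneme must already be known."""
    unknown = [p for p in transcription if p not in PHONEME_MODELS]
    if unknown:
        raise ValueError(f"unknown phonemes: {unknown}")
    lexicon[word] = list(transcription)


def word_model(word):
    """Build the model of a word by concatenating its phoneme models."""
    return [PHONEME_MODELS[p] for p in lexicon[word]]


add_word("lund", ["l", "u", "n", "d"])
print(word_model("lund"))
```

Adding "lund" only requires a transcription, since all four phoneme models already exist. A word based system would instead need new training recordings of the word itself.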

2.1.2 Speaker dependent or speaker independent

Speech recognition can be either speaker dependent or speaker independent. Either way, the system has to be trained to increase the recognition accuracy.

For a speaker dependent solution, each person who is going to use the system has to train it before using it. This is done individually, so that every user creates his/her own voice profile. The system then learns the user’s voice and speaking characteristics.

A speaker independent system is trained by a wide variety of people before it is put into use. The training group needs to include both sexes, and all dialects have to be represented. The data from the training is used to derive statistical information, which is later used to recognise incoming speech.

Speaker dependent recognition is mostly used for dictation. Dictation requires an extensive wordlist, and for the recognition accuracy to be acceptable the words need to be pronounced the same way every time.

Speaker independent recognition is common in telephone solutions. One example is SJ’s timetable information system. Here the dialogue is controlled by the system and only a limited set of words is expected from the users. The system tries to match every input to these words, so a difference in pronunciation is not so critical.

2.1.3 The speech recognition process

When a speaker’s words come into the system they are recorded and digitised. The speech is then acoustically analysed using the Fast Fourier Transform (FFT), which results in a sequence of feature vectors, see figure 2.2. This reduces the speech input to only the information needed to recognise it. The feature vectors consist of energy and frequency components and they are the input to the speech recogniser, see figure 2.1.

The recognition itself is actually a search process. The system tries to find the most likely sequence of words that matches the input, that is, the feature vectors. The feature vectors are assigned to the basic acoustic units (words or phonemes) with the help of an acoustic model.

Figure 2.2 Feature extraction process.
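As a rough illustration of the feature extraction step, the sketch below splits a digitised signal into frames and computes, for each frame, an energy value and a few averaged frequency-band magnitudes. It is only a sketch under stated assumptions: the frame length, the number of bands and the function names are hypothetical, and a real recogniser is likely to use more elaborate processing. A small hand-written radix-2 FFT keeps the example self-contained.

```python
import cmath
import math


def fft(samples):
    """Recursive radix-2 FFT; len(samples) must be a power of two."""
    n = len(samples)
    if n == 1:
        return [complex(samples[0])]
    even = fft(samples[0::2])
    odd = fft(samples[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def feature_vectors(signal, frame_len=64, bands=4):
    """Return one [energy, band_1, ..., band_n] vector per non-overlapping frame."""
    features = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        # Energy component: mean squared amplitude of the frame.
        energy = sum(x * x for x in frame) / frame_len
        # Frequency components: magnitudes of the lower half of the spectrum,
        # averaged into a few equally wide bands (assumed band layout).
        spectrum = [abs(c) for c in fft(frame)[:frame_len // 2]]
        width = len(spectrum) // bands
        band_mags = [sum(spectrum[b * width:(b + 1) * width]) / width
                     for b in range(bands)]
        features.append([energy] + band_mags)
    return features


# Test signal: a 1000 Hz sine tone sampled at 8000 Hz.
rate = 8000
signal = [math.sin(2 * math.pi * 1000 * t / rate) for t in range(256)]
vectors = feature_vectors(signal)
print(len(vectors), len(vectors[0]))
```

With these settings the 256 samples are reduced to four vectors of five numbers each, and the tone shows up as a peak in the second frequency band.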



The acoustic model describes the pattern of how a certain person or group of people speaks, see figure 2.3. During the recognition the incoming signal is compared to the patterns known from training. The pattern that differs least from the incoming signal gives the most likely sequence of phonemes. Each acoustic unit is modelled using Hidden Markov Model (HMM) theory. An HMM describes the speaking rate and the acoustic unit, for example a phoneme or a word. The speaking rate is important to model since the system should manage to understand both slow and fast speakers. The Hidden Markov Model is a simplified model, but it can be trained by efficient algorithms to recognise speech.

Figure 2.3 An acoustic model.
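The search for the most likely sequence of acoustic units can be sketched with the Viterbi algorithm, a standard method for decoding Hidden Markov Models. The thesis sources do not state which search algorithm the Philips engine uses, so this is an assumption. The two states, the coarse "high"/"low" observations and all probabilities below are invented toy values.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely state sequence for the observations (log domain)."""
    # best[s] = (log probability, path) of the best path that ends in state s
    best = {
        s: (math.log(start_p[s]) + math.log(emit_p[s][observations[0]]), [s])
        for s in states
    }
    for obs in observations[1:]:
        new_best = {}
        for s in states:
            # Pick the predecessor state that gives the highest score.
            score, path = max(
                (best[prev][0] + math.log(trans_p[prev][s]), best[prev][1])
                for prev in states
            )
            new_best[s] = (score + math.log(emit_p[s][obs]), path + [s])
        best = new_best
    return max(best.values())[1]


# Toy model with two phoneme states. The self-transitions ("a" -> "a") let a
# phoneme last for several frames, which is how slow speech is accommodated.
states = ["a", "m"]
start_p = {"a": 0.6, "m": 0.4}
trans_p = {"a": {"a": 0.7, "m": 0.3}, "m": {"a": 0.4, "m": 0.6}}
emit_p = {"a": {"high": 0.8, "low": 0.2}, "m": {"high": 0.1, "low": 0.9}}

frames = ["high", "high", "low", "low"]
print(viterbi(frames, states, start_p, trans_p, emit_p))  # ['a', 'a', 'm', 'm']
```

Each of the four frames is assigned to one phoneme state, and the path with the highest total probability is returned.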

When a sequence of phonemes is found, the system must find a word that matches its acoustic pattern. A lexicon containing the phonetic transcription of every word, that is, its sequence of phonemes, is used to find a match. To increase the accuracy a language model is also used. It contains statistical data or grammar rules that describe in what order words are most likely to come. The speech recognition technique used for dictation takes great advantage of the language model, since it drastically reduces the number of likely words that can follow a given word.
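How a statistical language model narrows down the candidates can be sketched with word-pair (bigram) counts. The counts, the acoustic scores and the function names are hypothetical, and multiplying the two scores is a simplification chosen for illustration.

```python
# Hypothetical bigram counts: how often word B followed word A in training text.
bigram_counts = {
    "go": {"to": 9, "two": 1},
    "to": {"hamburg": 6, "go": 2, "homburg": 2},
}


def next_word_probabilities(previous):
    """Relative frequencies of the words seen after `previous`."""
    followers = bigram_counts.get(previous, {})
    total = sum(followers.values())
    return {word: count / total for word, count in followers.items()}


def pick(previous, acoustic_scores):
    """Weigh each candidate's acoustic score by its language model probability."""
    lm = next_word_probabilities(previous)
    combined = {w: score * lm.get(w, 0.0) for w, score in acoustic_scores.items()}
    return max(combined, key=combined.get)


# "to" and "two" sound alike, so their acoustic scores are close.
# The language model decides in favour of the likelier word order.
print(pick("go", {"to": 0.48, "two": 0.52}))  # to
```

Although "two" has the slightly higher acoustic score, "go to" is far more frequent than "go two" in the assumed counts, so "to" is chosen.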

2.1.4 Speech understanding

What words should the speech recogniser pass on to the understanding unit? If only the most likely word sequence is passed on, there is a risk of losing information if one of the words is incorrectly interpreted. To be sure not to lose any information, the system would have to pass on several possible word sequences, which would be highly redundant. The solution is to use a word graph, see the example in figure 2.4.



Figure 2.4 An example of a word graph describing the phrase “I would like to go to Hamburg”.

In a word graph, a sentence hypothesis can be extracted by following a complete path through the graph. A statistical grammar is used to find the optimal path in the graph and thereby the most likely sentence. For speaker dependent recognition the word graph is based on language grammar, while for speaker independent recognition it is based on expected words.
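Extracting the most likely sentence from a word graph can be sketched as a best-path search. In this sketch the graph is a list of edges between numbered time points, and the edge probabilities are invented values that stand in for the scores a statistical grammar would provide. The alternative words ("wood", "two", "too", "Homburg") are hypothetical misrecognitions added for illustration.

```python
def best_sentence(edges, start, end):
    """Return (score, words) of the highest-scoring path from start to end.

    edges: list of (from_node, to_node, word, probability). Nodes are numbered
    in time order, so every edge goes from a lower to a higher node number.
    """
    best = {start: (1.0, [])}
    # Sorting by source node guarantees that a node's best score is final
    # before any edge leaving that node is processed.
    for src, dst, word, prob in sorted(edges):
        if src not in best:
            continue  # node cannot be reached from the start
        score = best[src][0] * prob
        if dst not in best or score > best[dst][0]:
            best[dst] = (score, best[src][1] + [word])
    return best[end]


edges = [
    (0, 1, "I", 1.0),
    (1, 2, "would", 0.7), (1, 2, "wood", 0.3),
    (2, 3, "like", 1.0),
    (3, 4, "to", 0.6), (3, 4, "two", 0.4),
    (4, 5, "go", 1.0),
    (5, 6, "to", 0.8), (5, 6, "too", 0.2),
    (6, 7, "Hamburg", 0.9), (6, 7, "Homburg", 0.1),
]

score, words = best_sentence(edges, 0, 7)
print(" ".join(words))  # I would like to go to Hamburg
```

The graph stores sixteen possible sentences in eleven edges, which is why it is less redundant than passing on a list of complete word sequences.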

2.1.5 Programs used in this master thesis

In this master thesis two different programs were used, Philips FreeSpeech 2000 and Philips SpeechMania 99. Both are phoneme based and use the same speech recognition engine. SpeechMania is a program used for telephone applications and is therefore speaker independent. It samples at the same frequency as the telephone network, that is, 8000 Hz. FreeSpeech is used to control your personal computer and to dictate. It is speaker dependent and samples at a higher rate than SpeechMania. The programs are discussed further in chapters 4 and 5.

2.2 Disabilities and possibilities

Most people make no distinction between persons with different physical disabilities. People often think that "She is disabled" or "He is in a wheelchair" describes a person's situation satisfactorily. That this perception is totally wrong will hopefully be obvious after reading this section. As Peter Anderberg, PhD student at Certec, Lund Institute of Technology, and a wheelchair user for a few years, says: "What unites can sometimes be much less than what differs".

2.2.1 Medical terminology

This section introduces terminology that is often used when talking about physical disabilities [10].

• Paralysis - the loss of the ability to move all or part of your body or to feel anything in it.
• Tetraplegia - inability to move your arms, your legs and your body. See figure 2.5.
• Quadriplegia - same as tetraplegia. Used in the English speaking parts of the world.
• Hemiplegia - inability to move one half of your body, that is an arm and a leg on the same side of your body. See figure 2.6.
• Paraplegia - inability to move your legs and the lower part of your body. See figure 2.7.
• Diplegia - same as paraplegia. Used when you talk about congenital disability.

Figure 2.5 Tetraplegia (or Quadriplegia).

Figure 2.6 Hemiplegia.

Figure 2.7 Paraplegia (or Diplegia).

All of these conditions are central injuries, which means that the damage can be located to the brain, the prolongation of the spinal cord or the spinal marrow.

2.2.2 What can cause disability?

Physical disabilities can have many causes. The list below is not complete, but it gives an idea of what can cause disability. More information and further causes can be found on Neurologiskt Handikappades Riksförbund’s web site [11].

Injuries on the spinal marrow

The most common causes of injuries on the spinal marrow are a congenital spinal marrow hernia, a rupture of the spinal marrow after an accident, or illness. The degree of paralysis differs depending on which vertebra is injured. [12]

CP - Cerebral Palsy

Cerebral Palsy, CP, is not a disease but an injury in the parts of the brain that have to do with the ability to move. The injury is most often congenital but can appear up to the age of two. The paralysis can be paraplegic, quadriplegic or hemiplegic. Contrary to what many people think, most people with Cerebral Palsy do not have an intellectual disability. [13]

MS - Multiple Sclerosis

Multiple Sclerosis is an inflammatory disease in the central nervous system. The immune system attacks the fatty layer that forms a protective shield around the nerves. Each attack leaves scars on the nerves, which makes it harder, or impossible, for signals from the brain and the spinal marrow to reach other parts of the body. [14]

Stroke

Stroke is a common name for cerebral infarct and cerebral haemorrhage, and it is the most common cause of neurological physical disabilities in Sweden. A cerebral infarct is a clot of blood in an artery in the brain. This means that no oxygenated blood reaches the nerves beyond the clot, which are then damaged and stop functioning. A cerebral haemorrhage is when blood bursts into the brain tissue. Stroke often causes hemiplegia. [15]

2.2.3 Possibilities

Although a disability makes it hard to find a job, it is not at all impossible. It is necessary to find work that suits the person’s ability and to adapt the work environment to fit his/her needs, not the other way around. According to Claes Ericson, physiotherapist at Arbetsförmedlingen-RESURS, the most common workplaces for people with physical disabilities are offices and manufacturing and assembly lines. Ericson also mentions that it is common for people with physical disabilities to work only four to six hours per day. This is because they are often stiff in their joints in the morning and therefore go to work later.

The group with the most difficulties finding work is people with quadriplegia, and this master thesis investigates the possibility for them to work with customer relations via telephone, using speech recognition. Perhaps speech recognition could also be used to make it easier for them to work with administrative tasks, such as processing invoices and salary payments. Another possible option is web design and other sedentary work at the computer. The possibilities with speech recognition are numerous and many are yet to be explored.



3. Usability

Usability aspects should always be considered throughout the whole development process, not only when designing user interfaces. This chapter is about usability, what it is and how to achieve it. It also discusses some ways to test the usability of a product. The theory in this chapter has been applied in the evaluations in chapter 4 and in the practical approaches described in chapters 4 and 5.

3.1 Definitions

There seem to be as many definitions of usability as there are authors on the subject. Two of the most frequently used definitions seem to be Nielsen’s and Löwgren’s. In the sections below these and some others are discussed.

3.1.1 Nielsen’s definition of usability

Nielsen’s definition does not only concern usability but also usefulness and utility. According to Nielsen [16], usefulness as a concept can be divided into utility and usability:

• “Usefulness is the issue of whether the system can be used to achieve some desired goal.”
• “Utility is the question of whether the functionality of the system in principle can do what is needed.”
• “Usability is the question of how well users can use that functionality.”

Figure 3.1 Nielsen’s definition of usability.



Nielsen then divides usability into five attributes:

• “Learnability: The system should be easy to learn so that the user can rapidly start getting some work done with the system.”
• “Efficiency: The system should be efficient to use, so that once the user has learned the system, a high level of productivity is possible.”
• “Memorability: The system should be easy to remember, so that the casual user is able to return to the system after some period of not having used it, without having to learn everything all over again.”
• “Errors: The system should have a low error rate, so that users make few errors during the use of the system, and so that if they do make errors they can easily recover from them. Further, catastrophic errors must not occur.”
• “Satisfaction: The system should be pleasant to use, so that users are subjectively satisfied when using it; they like it.”

Nielsen states that only by looking at all these measurable attributes is it possible to do a successful design and evaluation.

Learnability is perhaps the most important of the five. Learning is the first contact a user has with a system, and if the system is hard to learn it is easy to get a negative attitude towards it. It is also important to reach a certain level of productivity in a reasonable time. Different users have different goals. For a novice user a short learning period is more important than high efficiency, while for an expert user it is the other way around. An expert user uses the system often and for long periods, so a longer learning period is acceptable if high efficiency is achieved, see figure 3.2. A positive thing about learnability is that it is easy to measure: you simply ask a group of test users to perform a specific task and measure the time it takes them to learn the task.

Figure 3.2 Learning time and efficiency.

It is when the learning curve flattens out for the expert users that you can look at the efficiency. Not all expert users reach this level quickly, but they do eventually. There are systems so complex that it can take years to reach expert level, and for some systems the curve never flattens out because there are always new things to learn. To measure efficiency you, of course, need expert users, and you must define an appropriate level of expertise.

Not all users are novice or expert users. There are also casual users, people who do not use the system regularly but every now and then. A system should be designed so that once a user has learned it, it does not take much effort to refresh his/her memory when using the system again. Memorability is also important for users who have been away from work for some time, for example due to vacation or sick leave. The best way to test memorability is to let test users who have been away from the system for some time perform some specific tasks.

According to Nielsen “…an error is defined as any action that does not accomplish the desired goal…”. The error rate is measured by counting the number of errors made during a performed task. Most errors do not lead to any bigger problems and can easily be recovered from. However, there are catastrophic errors that are not so easy to recover from. These errors should be counted separately and must be avoided!

Nielsen’s final usability attribute is satisfaction. If a program is not pleasant to use it will not be used, unless there is no alternative. The easiest way to find out what the users think of a program is to ask them. This is most often done with a questionnaire. To get a reliable result you need to have many users fill out the form and then make some sort of average of their opinions.
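The measurements described above can be summarised with a few lines of code. The sketch below is only an illustration: the session records, the field names and the 1 to 5 satisfaction scale are assumptions and do not come from Nielsen.

```python
from statistics import mean

# Hypothetical test log: one record per test user.
sessions = [
    {"learn_minutes": 12, "minor_errors": 3, "catastrophic": 0, "satisfaction": 4},
    {"learn_minutes": 20, "minor_errors": 5, "catastrophic": 1, "satisfaction": 2},
    {"learn_minutes": 16, "minor_errors": 1, "catastrophic": 0, "satisfaction": 5},
]


def summarise(sessions):
    """Aggregate learnability, error and satisfaction measures over all users."""
    return {
        # Learnability: average time needed to learn the task.
        "mean_learn_minutes": mean(s["learn_minutes"] for s in sessions),
        # Error rate: average number of recoverable errors per task.
        "mean_minor_errors": mean(s["minor_errors"] for s in sessions),
        # Catastrophic errors are counted separately and reported as a total.
        "catastrophic_total": sum(s["catastrophic"] for s in sessions),
        # Satisfaction: average questionnaire score (assumed 1 to 5 scale).
        "mean_satisfaction": mean(s["satisfaction"] for s in sessions),
    }


result = summarise(sessions)
print(result)
```

Keeping the catastrophic errors as a separate total, rather than folding them into the average, follows the advice that such errors should be counted separately.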

3.1.2 Löwgren’s definition of usability

Löwgren [17] suggests that usability consists of Relevance, Efficiency, Attitude and Learnability, REAL:
• “The relevance of a system is how well it serves the users’ needs.”
• “The efficiency states how efficiently the users can carry out their tasks using the system.”
• “Attitude is the users’ subjective feelings towards the system.”
• “The learnability of a system is how easy it is to learn for initial use and how well the users remember the skills over time.”

Löwgren’s definition is quite similar to Nielsen’s definition but there are two major differences. There is no correspondence to the relevance concept in Nielsen’s definition, which can be seen as a deficiency. On the other hand Löwgren does not include the error attribute in his definition. Löwgren’s learnability includes Nielsen’s learnability and memorability while the attitude concept resembles Nielsen’s satisfaction attribute.



3.1.3 Other definitions

The most natural definition of usability is Eason’s definition. According to Eftring [18], Eason defines usability as:
• “The major indicator of usability is whether a system or facility is being used.”

This definition has many drawbacks compared to others. It makes usability hard to predict and offers no way to measure it.

The International Organization for Standardization, ISO, has tried to standardise the usability definition. In “ISO DIS 9241 Ergonomic requirements for office work with VDTs, Part 11 Guidance on usability” usability is defined as [18]:
• “The extent to which a product can be used by specified users to achieve specified goals with effectiveness, efficiency and satisfaction in a specified context of use.”

Effectiveness is defined as:
• “Measures of the accuracy and completeness of goals achieved.”

Efficiency is defined as:
• “Measures of the accuracy and completeness of goals accomplished relative to the resources (e.g. time, human effort) used to achieve the specific goals.”

None of the definitions mentioned focuses on the significance a product can have in a user’s life situation. This caused Eftring [18] to introduce a new concept, useworthiness, which he defines as:
• “Useworthiness is the individual user’s assessment of the extent to which the technology meets the user’s high-priority needs.”

Eftring’s definition is based on Nielsen’s definition of usability. Usefulness has been replaced with useworthiness and “the user’s high-priority needs” has been added, see figure 3.3.

Figure 3.3 Eftring’s definition of useworthiness.



3.2 The design process

Just as there are many definitions of usability, there are many ways to design a new product. However, some things are common to all design approaches. The usability aspect should be considered in all steps of the process. It is especially important that usability-oriented design is used during the early design stages, before the implementation. After the implementation it is very expensive to correct mistakes. The design process should not be considered a straight line. The process should be iterative so that the knowledge gained from an evaluation leads to a new and improved design, see figure 3.4.

Figure 3.4 The iterative design process.

3.2.1 Usability engineering

This section describes usability engineering according to Jakob Nielsen [16]. He divides the design process into eleven stages, which are explained below. Together these stages form the usability engineering lifecycle model.

3.2.1.1 Know the user

A user study is the first stage in the lifecycle. The study should be made in the users’ working environments. It is important to find all users of the system, including system administrators and other support staff. It can be hard to find users who are willing to participate and sometimes it is difficult to know who the users are, especially for a new product. Once you have found the users you must find out their individual characteristics, such as age, educational level, work experience and previous computer experience. This makes it easier to judge how complex the user interface can be. A task analysis has to be made in order to investigate the users’ goals and how they currently perform their tasks.

3.2.1.2 Competitive analysis

If there are similar products on the market, perhaps even competing ones, it is a good idea to do usability tests on these products. The tests are easy to carry out since they are done on a fully implemented product. You can also assume that much effort has gone into these products, so the tests will be more realistic than tests using prototypes. From these tests you can find good and bad things about the products and use this knowledge when designing your own product.

3.2.1.3 Goal setting

Based on the user study you define goals for the system. This might mean that you have to prioritise some of Nielsen’s usability attributes above others. If, for example, you design a product for a company that has a high staff turnover, learnability is the most important attribute since new people are continuously introduced to the system. The goals you set must be measurable in terms of usability and you have to define what performance is acceptable and what is not, see figure 3.5.

Figure 3.5 An example of a usability goal line

Usability goals can be difficult to set for new systems since it is hard to estimate what is reasonable. Another problem is that goal setting may lead to favouring goals that are easily measured. At this stage it is also time to do a financial impact analysis of the system’s usability. Improvements made to the usability design must not cost more than what is earned in the end by having a better product. The amount earned in the end is estimated by looking at how many users use the system, their salaries and other costs, and the time they spend using the system.
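The financial impact analysis described above can be illustrated with a small calculation. The sketch below is a simplification made for this example, not Nielsen’s method: the function names, the parameters and all figures are hypothetical, and "hourly cost" is assumed to include salary and other costs per user.

```python
def yearly_usability_saving(n_users, hourly_cost, hours_saved_per_user_per_year):
    """Estimated yearly saving from a usability improvement.

    hourly_cost is assumed to cover salary and other costs for one user.
    """
    return n_users * hourly_cost * hours_saved_per_user_per_year


def improvement_is_worthwhile(improvement_cost, yearly_saving, years=1):
    """True if the improvement costs no more than it is expected to earn."""
    return improvement_cost <= yearly_saving * years


# Made-up figures: 50 users, 200 per hour, 10 hours saved per user and year.
saving = yearly_usability_saving(
    n_users=50, hourly_cost=200, hours_saved_per_user_per_year=10
)
print(saving)
print(improvement_is_worthwhile(improvement_cost=80_000, yearly_saving=saving))
```

With these invented numbers the yearly saving is 50 × 200 × 10 = 100 000, so an improvement costing 80 000 would pay for itself within the first year.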

3.2.1.4 Parallel design

To get several design ideas you often let many design groups work in parallel at the very beginning of the design process. The groups must be kept strictly separate to get independent proposals that are not influenced by each other. After completing the design ideas you merge them by picking the best parts of the proposals, which hopefully will lead to a good basic design. Parallel design is especially important when designing a new system since you have little guidance. If a system has competitors on the market the competitive analysis described in 3.2.1.2 can be used instead of parallel design.



3.2.1.5 Participatory design

Even though you should already “know the user” at this stage of the design, you should involve the users throughout the whole design process. The designers do not always understand what a task actually involves for the user, which can lead to mistakes. If the users are involved in the process these mistakes will be found early and thus save a lot of problems later on. As a designer you should not expect the users to come up with design ideas on their own. However, their reactions to your suggestions are very valuable.

3.2.1.6 Co-ordinating the total interface

One of the most important things to remember when designing a product is consistency. The whole solution, including user interface, online tutorials and course material, should be consistently designed. Consistency is also important at all development levels, even down to code writing. This eases code sharing and makes it easier for other developers to understand the code. Consistency should also be considered over time so that new releases are consistent in design with older products. However, you must be careful not to stick to an old bad design just to be consistent.

3.2.1.7 Guidelines and heuristic evaluation

When designing a product there are of course established guidelines, formulated by specialists, to follow. Examples are to always give the user feedback on his/her actions, or not to use blinking or too colourful symbols on a web page. The guidelines should be adapted to the specific design case and made more specific. These guidelines are then used to set the development standards. The difference between guidelines and standards is that guidelines are recommendations to keep in mind when designing, whereas standards must be followed to achieve a consistent design. The guidelines can then be used to do a heuristic evaluation of the product.

3.2.1.8 Prototyping

The traditional way of software engineering is to implement executable programs at the last minute. This makes it difficult to involve the users early in the process, as there is little point in showing them abstract documents they will probably not comprehend anyway. Instead it is a good idea to build a prototype that can be tested by the users earlier in the process. The reason for building a prototype is to save time and money, and this is achieved by reducing the prototype compared to the real system. This is accomplished either by reducing the number of features (vertical prototyping) or by reducing the level of functionality of the features (horizontal prototyping).



3.2.1.9 Interface evaluation

Testing a product’s interface on users before the product is released is of course mandatory. Otherwise you risk releasing a product with major flaws in usability, which in the end will be much more expensive to correct. No matter how you choose to test your product, it will eventually result in a list of usability problems. Probably not all of these can be solved and you must find a way to prioritise the problems. The decisions must then be based on data from the user testing. For example, you can consider how many users have the problem, how often it occurs and how much extra time it costs them. It could also be a problem that only arises the first time the users try the product. Unfortunately there is not always data to refer to and then you have to trust your intuition. It is often a good solution to let a group of usability specialists rate the severity of the problems. Since there are so many opinions on usability you need several specialists to avoid overly subjective judgements. The severity of a problem is often presented on a single scale from zero to four, see figure 3.6, or as a combination of how many users experience the problem and how badly they are affected, see figure 3.7.

0 = This is not a usability problem at all.
1 = Cosmetic problem only - need not be fixed unless extra time is available on the project.
2 = Minor usability problem - fixing this should be given low priority.
3 = Major usability problem - important to fix, so should be given high priority.
4 = Usability catastrophe - imperative to fix before product can be released.

Figure 3.6 Example of a single rating scale for the severity of usability problems.
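Combining the ratings of several specialists on the zero-to-four scale can be sketched as follows. Taking the mean of the individual ratings is an assumption made for this example; the text only says that several specialists are needed. The function name and the example problems are hypothetical.

```python
from statistics import mean


def rank_problems(ratings):
    """Return (problem, mean severity) pairs, most severe first.

    ratings maps a problem description to the list of 0-4 ratings
    given independently by each specialist.
    """
    averaged = {problem: mean(scores) for problem, scores in ratings.items()}
    return sorted(averaged.items(), key=lambda item: item[1], reverse=True)


# Made-up ratings from three specialists.
example = {
    "No feedback after saving": [3, 4, 3],
    "Inconsistent button labels": [1, 2, 1],
}
for problem, severity in rank_problems(example):
    print(f"{severity:.1f}  {problem}")
```

The sorted list gives a simple priority order for deciding which problems to fix first.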



Figure 3.7 Example of table used to estimate the severity of usability problems.

3.2.1.10 Iterative design

The information you get from letting users test a system or prototype should be used to create a new and improved version. The information can be things the users said during the test or log files showing where the users paused and which errors happened most frequently. An iteration does not always improve the design. Sometimes a change does not solve a problem and sometimes a new problem appears. It is also possible that what some users see as an improvement, others consider a change for the worse. As users become familiar with the system, it is not a good idea to let the same users perform many tests, since after a while they no longer represent novice users.

3.2.1.11 Follow-up studies of installed systems

When a product has been released one should start collecting usability data for new versions and products. Economic data should also be studied to see what impact the new product has had on its users and whether the new product is profitable.

3.2.2 Other design approaches

While Nielsen describes only one design approach, Löwgren describes four [17]: Participatory design, Contextual design, Usability engineering and Theory-based design. Figure 3.8 shows how Löwgren rates the involvement and power the users get in these different approaches.



Figure 3.8 User involvement in different design approaches.

The usability engineering approach is described in section 3.2.1 and here follows an introduction to the other approaches.

3.2.2.1 Theory-based design

Theory-based design is the oldest of these four approaches and was developed in the late 1970s and early 1980s. The approach is based on psychological studies, from which guidelines on how to design usable systems were derived. Unfortunately, the guidelines are too general and the approach is hard to apply early in the development process.

3.2.2.2 Contextual design

The main purpose of contextual design is to take more direct advantage of the users’ experience and to get an overall view of the context. Context means users, tasks, working environment, technical possibilities and so on. The new system is developed and tested together with the users.

3.2.2.3 Participatory design

In participatory design, users and designers design the system together. The process can be seen as an exchange of knowledge between users and designers. The designers become acquainted with the users’ tasks while the users get to know how technology can make their work easier. This approach is particularly interesting from a research point of view.



3.3 Usability testing

There are many different ways to test the usability of a product. Which method you choose depends on many things. The first thing to decide is whether you are going to evaluate a design proposal, a detailed prototype or a finished product. Other important aspects are whether there is a well-defined user group that you can get access to, and what time and resources you have for the testing, see figure 3.9. The following sections describe some of the many test methods used [16], [17].

3.3.1 Concept testing

Concept tests are done early in the process, before detailed prototyping is started. They are done to evaluate the results of a user and work study. Since you do not have any prototype to test you have to use paper solutions and scenario storyboards.

3.3.2 Heuristic evaluation

Heuristic evaluation is a design evaluation made by usability specialists. A group of three to five specialists has been found to be sufficient to discover most usability problems of a product. They work from specified design guidelines relevant to the problem. It is important that the work is done individually during the first evaluation so that the specialists are not affected by each other’s opinions. After the initial analysis, all the specialists should discuss their opinions together to figure out which problems are relevant.
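The step where the individually produced problem lists are brought together can be sketched as below. The data structure and the function name are hypothetical and serve only as an illustration; counting how many specialists reported each problem is one possible starting point for the joint discussion, not a prescribed part of the method.

```python
from collections import Counter


def merge_findings(findings_per_evaluator):
    """Combine independently produced problem lists.

    findings_per_evaluator is a list with one collection of problem
    descriptions per specialist. The result maps each problem to the
    number of specialists who reported it.
    """
    counts = Counter()
    for findings in findings_per_evaluator:
        counts.update(set(findings))  # each specialist counts once per problem
    return counts


# Made-up findings from three specialists working individually.
evaluations = [
    {"unclear error message", "no undo"},
    {"no undo"},
    {"no undo", "tiny buttons"},
]
for problem, n_specialists in merge_findings(evaluations).most_common():
    print(n_specialists, problem)
```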

3.3.3 Checklists

Checklists can be used to find flaws in a design. There are general lists in many books on design, and they consist of rules of thumb to keep in mind when designing a new product. Sometimes it is important to adapt these checklists to the project you are working on so that they become more relevant. If this is necessary it should be done before you start your design work.

3.3.4 Structured walkthrough

Structured walkthrough methods are intended to be used by designers or usability experts. The cognitive walkthrough method is one example of many. It is based on theories of human action and learning. Methods like these require a lot of time since you need to get down into all the details of the user’s work.

3.3.5 Empirical testing

Empirical testing is one of the best test methods, since you can try the design on users. It is not always possible to do this because it can be hard to find representative users to try the product on. There are many things to think about when preparing empirical tests, such as good test users, the test environment, the tasks to be performed by the test users and so on [16]. It is also important to consider psychological aspects when performing tests on a user so that you do not get misleading results. Today many companies have special usability laboratories with different equipment, so that all the users’ reactions can be recorded and analysed. It is important to think of ethical aspects, such as not showing a user’s results to others.

3.3.6 Thinking aloud

Thinking aloud is a special form of empirical testing. You let the users perform specific tasks and tell them to say out loud what they think when using the product. As a test leader it is then easy to understand how the user views the system and which parts of the design cause problems. However, there are disadvantages. For most people it is not natural to speak constantly as they use a product. As a test leader you will probably need to ask the test users questions to get them to speak. The fact that the user constantly speaks may also slow down the performance, since he/she needs to verbalise his/her thoughts, and this will make any performance measurements less representative.

3.3.7 Field trial

A field trial is almost the same as empirical testing. The difference is that the tests are performed with real users and in the real environment, not in a usability laboratory.

3.3.8 Follow-up study and field study

Some time after the first real installation of a system, when the product has been used for some time, you can make a follow-up study. This is to evaluate the users’ opinions when they have just learned the new product. To study the product’s use over a longer period of time you can make a field study.



Figure 3.9 Usability test methods. S, D, P and I stand for study, design, prototyping and implementation. Dark blue means that the method is very suitable and purple means that it may be useful. The different resources are real users (RU), simulated users (SU), usability experts (UE) and the development team (DT). In the resource part of the table, dark blue means that the resource in question has to be used and purple means that it may be used.

3.4 Design rules

When designing a user interface, one should try to keep to some simple rules. One set of rules that is well known is Ben Shneiderman’s “Eight Golden Rules of Interface Design” [19], which are discussed in section 3.4.1.

3.4.1 Shneiderman’s Eight Golden Rules

Ben Shneiderman is a professor at the department of Computer Science at the University of Maryland. He is well known worldwide for his research in human-computer interaction. “The Eight Golden Rules of Interface Design” are based on his many years of research and experience on the subject. The rules focus on the dialogue with the users and can be applied to almost all interactive systems. Below you find the rules with short descriptions. Naturally, the rules must be interpreted and adjusted to each project.



1. Strive for consistency

Similar situations should be handled in the same way with the same kind of actions. The terminology should be consistent throughout the interface and you should use the same colours, layout, menus, help screens and so on, so that the user will easily find his/her way around. This is the most frequently violated rule.

2. Enable frequent users to use shortcuts

Most interfaces should suit both experienced and novice users. The more a program is used, the more important it is to simplify the interaction to speed up the work. Frequent users appreciate shortcuts, special keys and macro facilities. If the program is frequently used it is also important to have fast graphics and short response times.

3. Offer informative feedback

Every user action should give some kind of feedback. It is especially important to get substantial feedback for unusual and important actions, whereas frequent and less important actions need little feedback. It is also essential to give informative feedback in order to tell the user what actually happened or what he/she should do. To indicate change in a graphical interface, visual representation is recommended.

4. Design dialogs to yield closure

Sequences of actions should have a beginning and a distinct end. This will give the user a satisfying sense of accomplishment and make him/her ready to go on with the next action without concerns about the previous sequence.

5. Offer error prevention and simple error handling

The system should be designed so that the users cannot make serious errors. If an error is made, the system must be able to find the error and help the user correct it, so informative error messages are important. It should not be necessary to redo a whole sequence of actions, only the erroneous parts. Faulty commands must not change the state of the system, but if they do the system should tell the user how to restore the state.

6. Permit easy reversal of actions

If actions are reversible, the users will be less anxious since they know that it is possible to correct errors. Besides this, it also encourages users to explore the possibilities of the system. Reversibility should apply to data entry tasks, single actions or groups of actions.



7. Support internal locus of control

Experienced users expect the system to respond to their actions and they like the feeling of being in charge. Difficulties retrieving necessary information, unexpected system actions and boring sequences of data entry create apprehension and aversion. See to it that the user is in charge of the system and is not reduced to a passive system responder.

8. Reduce short-term memory load

Human short-term memory can only handle five to nine information units. This implies that you should avoid spreading information over several pages and that the design should be kept simple. It is also important that the users have the time and possibility to learn to handle sequences of actions using mnemonics. On-line help should be used to decrease the memory load by displaying abbreviations, codes and command-syntax forms.

3.4.2 Other design rules

Of course there are other design rules than Shneiderman’s. Here are some complementary rules taken from Kirsten Rassmus-Gröhn and Henrik Danielsson’s lecture “Introduction to MMI” given in the course “Rehabiliteringsteknik AK” at Lund Institute of Technology [20].

Give visual clues

A well-designed product should give the user clues to how the product should be used and what it should be used for. For example a knob should say “Please, turn me” to the user. You should also consider where you place control devices in an interface. Crucial buttons with diverse functions must not be placed next to each other. For example it is not a good idea to place a save button next to an erase button in a drawing program.

Mapping

Mapping is to take advantage of natural analogies, cultural standards and general biological relationships to get a simple and unambiguous understanding. An example of a cultural standard is that all reading in the western world is done left to right and top to bottom. This can be used when placing buttons in a user interface, where the most important button should be placed in the upper left corner. A clear example of bad mapping is a bike that turns left when you turn the handlebars to the right. For mapping to work it is essential that each control has only one function.

Metaphors

Metaphors are a good way to achieve mapping between a computer system and the real world. Using metaphors in an interface often helps the user to better understand the functions of a program. A good example is the recycle bin used in personal computers, which works like a real recycle bin. You can put files in the bin and pick them up and use them again if you change your mind.

Colours

Using colours is a good way to emphasise things. It is an effective way of drawing attention to something and of bringing out information. Colours can group objects together or keep them apart. They also make an interface more appealing to the user. However, you must be careful when using colours. It can be hard to grasp information presented with too many colours. Many colours have certain meanings; red is often connected to stop or heat, green is associated with OK and yellow together with black stands for danger. For example, if you use green to indicate high temperature on a temperature meter in a car it will mislead the driver into thinking that high temperature is okay. As many people are colour-blind, you should not use colours alone to indicate things. A good design example is a traffic light, where position is also used to signal “drive” and “do not drive”. A useful colour combination is black and yellow since these are the best contrast colours. This combination is often used for persons with visual impairments.

Shapes and movement

It is not only colour that captures the users’ attention but also shapes and movements. Large objects are easier to notice than small ones, irregular shapes draw more attention than regular ones and moving objects are more visible than static ones. Sometimes moving objects take too much attention though, just think of the little Office Assistant. It is hard to ignore him even if you try to.

Sound

Sound is also good for getting the user’s attention and for giving feedback or enhancing a message. As with everything else, sound should be used moderately. For frequent actions sound should be avoided since it gets tiresome.



4. Voice-controlled workstation at a callcentre

In this part of the master thesis the aim was to voice-control a workstation at Trygg Hansa with Philips FreeSpeech 2000. Trygg Hansa is a Swedish insurance company with offices in several cities in Sweden. This project took place at the office in Malmö, which works with customer service. All customer contacts are made over the telephone so the personnel in Malmö spend most of their time on the phone. This Trygg Hansa office deals with accident, house and home, vehicle and boat insurance. The personnel are divided into two groups: those who take incoming calls, the insurance agents, and those who make outgoing calls, the sales agents. The insurance agents work daytime and the sales agents work evenings. The sales agents work with campaigns and call potential customers from a campaign database. A campaign can for example be to give all Statoil’s customers special offers on car insurance. As Trygg Hansa is a Swedish company, all the work is done in Swedish and therefore the Swedish version of FreeSpeech was used. This chapter describes FreeSpeech and the programs used at Trygg Hansa. It also includes a description of the work done, an evaluation and conclusions from the project.

4.1 Philips FreeSpeech 2000

Philips FreeSpeech 2000 is a program that enables you to create and edit documents, surf through web links, control your Windows environment and more by speaking to your computer [21]. FreeSpeech runs on top of all other programs in the Windows environment, see figure 4.1. With FreeSpeech you can control other programs such as Microsoft Office and Microsoft Internet Explorer. The Swedish version of FreeSpeech requires the Swedish version of Microsoft Windows. In this project a microphone called SpeechMike, which comes with the program, was used to give input to the computer. It is possible to use other input devices if you prefer.



Figure 4.1 The graphical user interface of FreeSpeech with the command explorer open. The labels in the figure mark the active user, the recording level, the current mode, the command explorer and a command group.

4.1.1 Training

To obtain reasonably good recognition accuracy you must train FreeSpeech. The program is speaker dependent and needs to be trained individually for each user. You create your own vocal reference file by reading FreeSpeech’s training texts aloud. It takes approximately eight hours to get a proper result. Additional training is added to your profile every time you use the program if you choose to use this feature. If a person uses the program with another user’s profile, this profile will be damaged. If you install FreeSpeech on another computer you can export your trained voice profile and import it into the new installation.

4.1.2 Program Modes

The program has four different modes, which enable you to use different functions of FreeSpeech.
• Command Mode: This mode is used to control the operating system, FreeSpeech and other programs. It also enables you to format and edit texts. This was the most frequently used mode in the project.
• Dictation Mode: In this mode you can dictate texts directly to other programs.
• Spelling Mode: In this mode it is possible to spell out words and numbers.
• Sleeping Mode: Puts FreeSpeech on hold and waits for your wakeup call.

The command mode is the most useful mode since it is used to perform many different tasks. This mode is used to control the programs you are currently using. All the program windows can be closed, maximised and minimised. You can also switch between open programs by saying “Nästa/föregående program”. If a program has several windows, FreeSpeech cannot, in most cases, switch between them.

You can also make your own commands to meet your needs. If, for example, you would like to open a program that does not have a shortcut on the desktop, you can create a new command where you specify the path to the program file. If you have a shortcut on the desktop, you will not need to create a command since FreeSpeech knows what is on the desktop. You can also control a program that is not supported by FreeSpeech by creating new commands. In this case you record mouse and/or keyboard macros. A macro is a short mouse or keyboard sequence that you record and give a command name. It is then played every time you say the corresponding command. For example, if you want to create a command that clicks the Task Bar with the right mouse button you simply name the command “Högerklicka Aktivitetsfältet” and record a macro where you put the mouse pointer on the Task Bar and click the right button. Now when you say “Högerklicka Aktivitetsfältet”, the program will run its macro and the task will be performed. Keyboard macros do not have to be created for the most used keys, such as Tab, Enter, Space and the “F”-keys, because FreeSpeech already has commands for these.

The dictation mode can be used for all kinds of dictation, independent of what programs you use. You can dictate directly into dialog boxes and web browsers. If you need to correct your dictation you can use the EasyCorrect function, which is activated by saying “Correct <text>”. However, the easiest way to correct a dictation with your voice is to switch to command mode. In the command mode you can move back and forth in the text and cut and paste as you wish.

To activate the sleeping mode you just say “Vila” and FreeSpeech stops listening. By saying “Vakna” you return to the previous mode used.
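The behaviour described above, with named macros that are played back in command mode and a sleeping mode that returns to the previous mode, can be summarised in a toy model. The sketch below is not FreeSpeech’s programming interface, which is not described here. The class, its methods and the representation of a macro as a list of action strings are all hypothetical. Only the spoken phrases “Vila”, “Vakna” and “Högerklicka Aktivitetsfältet” are taken from the text.

```python
class VoiceModes:
    """Toy model of the mode behaviour described in the text.

    This is NOT FreeSpeech's API; the class, its methods and the
    macro representation are hypothetical.
    """

    def __init__(self):
        self.mode = "command"
        self.previous_mode = None
        self.macros = {}  # spoken command -> recorded list of actions

    def record_macro(self, phrase, actions):
        """Store a recorded mouse/keyboard sequence under a command name."""
        self.macros[phrase] = list(actions)

    def hear(self, phrase):
        """Return the actions triggered by a spoken phrase (possibly none)."""
        if self.mode == "sleeping":
            if phrase == "Vakna":
                self.mode = self.previous_mode  # return to the mode used before
            return []  # everything else is ignored while sleeping
        if phrase == "Vila":
            self.previous_mode, self.mode = self.mode, "sleeping"
            return []
        if self.mode == "command":
            return self.macros.get(phrase, [])
        return []


voice = VoiceModes()
voice.record_macro(
    "Högerklicka Aktivitetsfältet",
    ["move mouse pointer to Task Bar", "click right button"],
)
print(voice.hear("Högerklicka Aktivitetsfältet"))  # macro is played
voice.hear("Vila")
print(voice.hear("Högerklicka Aktivitetsfältet"))  # ignored while sleeping
voice.hear("Vakna")
print(voice.mode)
```

The model also makes it easy to see why a wrongly recognised “Vila” is costly: every command is ignored until “Vakna” is recognised.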

4.1.3 The Command Explorer

The Command Explorer assists you with available commands when using FreeSpeech. It follows your work and displays the commands available in the mode you are working in. There are several alternatives for each command so that you do not have to search for a command, just say what comes naturally. If you still do not like any of the command alternatives you can add your own.



If you use FreeSpeech to surf the Internet the Command Explorer will keep track of the available hyperlinks. Just say the name of a hyperlink and you will soon be there. If the site you are visiting uses more than one language you have to pronounce the links with the accent of the active context language.

4.1.4 Evaluation Philips FreeSpeech 2000 is a good product for home environment purposes. It is fun to play with and you get a “future feeling” when talking to your computer. However, for professional applications it is simply not good enough. It needs to be easier to adjust to programs you have made yourself. Mouse macros do what they are supposed to, but they are time consuming to work with. This even though a Pentium 266, which is within the system requirements, was used. FreeSpeech should also be compatible with more programming languages. It cannot handle some programs, for example those written in Visual Basic. A program’s user interface can be very similar to Microsoft Windows program, which FreeSpeech supports, but still be unrecognisable to FreeSpeech. FreeSpeech should support more than one language at the same time. That would make it easier to navigate the Internet, since it is common with Internet sites that mix languages. To surf the Internet with FreeSpeech and have the Command Explorer activated at the same time is slow since the Command Explorer cannot keep up with the pace of the browser. When a page is downloaded and you directly want to “click” on a link, FreeSpeech will not hear you since it is busy updating the Command Explorer. If two hyperlinks on a page have the same name, the command explorer will only find the first. Also, if a link contains an &-symbol the link will not be displayed in the Command Explorer. A big problem with the Command Explorer is that you cannot use your voice to control it completely. You can open it and choose between the different command groups but once there you cannot move down to the different commands. This means that you cannot select a specific command and see the alternative command names. The fact that you cannot move down to the commands also means that if there are too many commands to fit in the window, you cannot see them all. The only mode that works satisfactorily is the command mode. 
In dictation mode or in spelling mode it is difficult to correct a wrong dictation. If you try to correct the mistakes directly in the respective mode, FreeSpeech often does not understand that you want to make a correction and makes an even bigger mistake. To avoid this you have to change to command mode, which is time consuming. The sleeping mode is quite often wrongly activated when FreeSpeech misinterprets your commands and dictation. This is very frustrating as FreeSpeech then ignores all your commands. The opposite, FreeSpeech waking up when you do not want it to, is equally common and just as frustrating.

Since the project took place at both Trygg Hansa and Softhouse it was necessary to move voice profiles between the offices. When FreeSpeech was first installed at Trygg Hansa it was necessary to create a new user to be able to start the program the first time. This is time consuming and unnecessary if you already have a user you want to import. When you finally are allowed to import a user it is not at all certain that you get what you expect. Sometimes all your personal commands have vanished somewhere in the export/import process.

Whatever you are trying to do with FreeSpeech there is always quite a long response time. Every time you tell the program to do something it takes some time to interpret the command. This also applies to dictation and spelling mode. Even when you want to exit FreeSpeech you have to wait a long time and very often you have to use the Task Manager to close the program.

Finally, if you encounter problems using this program, you should expect only limited help from the FreeSpeech help service. They will not tell you anything that is not in the manuals and it will take them at least 24 hours to answer by e-mail. The FreeSpeech Direct Help included in the program will give you some help, but do not believe everything you read.

4.1.5 How FreeSpeech follows the design rules

This section is about how FreeSpeech follows the design rules discussed in section 3.4. Generally FreeSpeech is a well-designed program, perfect for novice users. However, the frequent and expert users seem to have been forgotten.

• The consistency of FreeSpeech is good. The layout is consistent and the pop-up windows all look the same. Different functions have similar sequences of actions.

• FreeSpeech’s user interface is very easy to use. It is no problem to understand what to do with it and it is therefore perfect for novice users. However, there are no functions to speed up the working process for frequent users. No shortcut keys or special commands are available and some functions are very time consuming since they are very instructive. One example is when you want to create your own commands. You have to open “Inställningar” and then you get a pop-up window. One of the folders, “Mina kommandon” or “Dikteringskommandon”, has to be selected to be able to push the “Nytt” button. After one or two seconds, which is too long, you select what kind of command you want to create, for example “Musmacro-kommando”. Then a new pop-up window appears where you can finally create your command. Every time you want to create a new command you have to start all over again by selecting the right folder and so on. This is very time consuming and annoying if you want to create many commands. There should be a faster way to create commands, to meet all users’ needs. Perhaps the program could suggest the same folder and command type as used previously when you want to create your second command.

• FreeSpeech also has long response times, especially when you change user. However, this may have technical reasons and it can therefore be hard for Philips to “correct”.

• The feedback given in FreeSpeech is at a good level. You do not get feedback unless it is needed. The most important feedback is the little balloon that shows when you speak. It displays how the program interpreted what you said. Another feedback you get when you speak is the volume control in the right corner of the taskbar, which shows at what volume you speak. When creating your own commands you can choose if and when to get feedback. Pushing the button “Kör macro” will show you what command you have recorded. When pushing the taskbar microphone button you get good visual feedback by the text displayed in the right corner of the taskbar, telling you if the microphone is on or off.

• Dialogs are clearly finished and you know when you are done with an operation. Most of the time it is shown by the fact that the dialog window closes.

• No errors have been made in the Trygg Hansa project when using FreeSpeech. This implies that it is well designed to prevent you from doing erroneous actions.

• All data entry tasks are reversible in FreeSpeech and most of the actions as well. When using FreeSpeech the short-term memory can be a bit overloaded since you need to keep all the commands in your head. The command explorer does not give you much help since you cannot voice control it properly. If this was solved you would probably not need to load your short-term memory at all.

• FreeSpeech’s user interface gives you clear visual clues on how it is to be used. However, there is a big flaw in the Microsoft Windows program menu from which you open FreeSpeech. The entry for uninstalling FreeSpeech is placed next to the entry for starting the program. These two menu alternatives should not be placed next to each other!

• Mapping and metaphors are successfully used and this is one of the reasons that FreeSpeech is so easy to use. The buttons used to record or to play a message are shaped like those on a real tape recorder. The designers have also used appropriate colours to indicate what to do with the buttons.

4.2 Workstation at Trygg Hansa

A workstation at Trygg Hansa consists of a telephone and a computer. The workstations are placed in open-plan offices with approximately sixteen workstations in each. The two most frequently used programs at the workstations are FRONTER and Centenium, which are described in sections 4.2.1 and 4.2.2.



4.2.1 FRONTER

FRONTER is a program developed by Trygg Hansa for insurance handling. It is used for retrieving customer information and making insurance cost estimates. FRONTER is only a new graphical user interface on top of the old software, which is here referred to as the Main Computer. It is written in Visual Basic, which is not supported by FreeSpeech. The version used in this project was FRONTER 2.1, but version 3.0 was released in August 2000. Since it is only used in Sweden its graphical user interface is in Swedish.

Figure 4.2 shows the graphical user interface of FRONTER 2.1. The Main window displays information about the customer’s household and all its insurance. In order to display the information you need to enter a social security number or an insurance number. To get more information you highlight an object and press Enter or double-click. A new window appears on the screen with more specific facts. In the “Kundjournal” window you can see previous contacts with the customer; contact dates, contact persons at Trygg Hansa and possible notes are displayed.

Figure 4.2 Graphical user interface of FRONTER. [Labels in the figure: Menus, Insurance number / social security number, Main Window, Kundjournal Window.]



If you want to make an estimate for an accident insurance you open the “Olycksfall” window shown in figure 4.3. You do this by opening the “Arkiv” menu in the Main window and choosing “Ny” > “Olycksfallsförsäkring”. In order to make the estimate you choose product, payment specification, health declaration and possible discounts by filling in the fields. The price is automatically displayed in the window.

Figure 4.3 The “Olycksfall” Window in FRONTER. [Labels in the figure: Product choice, Groups, Products.]

4.2.1.1 Evaluation

The version of FRONTER that was used in this project was very unstable and quite often the program would not start. The only solution to the problem is to uninstall FRONTER and install it again. Hopefully, this problem will be solved in version 3.0 of the program. Another flaw in the program is that the windows are inconsistently designed when it comes to actions. Different windows have different keyboard commands for the same operation, for example to drop down a combo box. This makes it very hard to navigate through the forms needed to make an estimate. You need to remember exactly which keyboard commands should be used in every window and therefore you need a lot of practice to be able to navigate easily. It is also necessary that the cursor is placed correctly in a window for certain commands to work properly and this too requires that you have a good memory.

4.2.1.2 How FRONTER follows the design rules

After having used FRONTER it was clear that some design rules had been broken, while others had been followed rather well when designing the interface. The design is very similar to the Microsoft Office programs, which is an advantage to those who are familiar with them.

• Visually FRONTER is very consistent; all windows look alike and have the same menus. When it comes to actions, though, the consistency fails. As long as you use the mouse there is no problem, but when using the keyboard some actions are inconsistent. For example, to drop down a combo box you sometimes must press Enter and other times the down arrow key.

• Shortcut keys are available in FRONTER but they are not that easy to find. Instead of displaying the available shortcuts in the dropdown menus you find them on a page on the Intranet, that is if you know that they are there.

• FRONTER offers informative feedback that gives closure to sequences of actions when it is needed. One example is that when you have made an estimate and want to save it, FRONTER opens a message box that tells you that the insurance has been created.

• Sometimes the program crashes without the user doing anything wrong. The error message shown then is not informative and it is impossible for the user to understand what went wrong. However, many errors are prevented since FRONTER uses combo boxes in which you can choose from some alternatives instead of filling in the fields yourself.

• FRONTER applies the mapping principle well. For example, the most important menus are at the top of the windows, with the most important to the left.

• When it comes to metaphors, FRONTER is sometimes good and sometimes bad. The little button with the calculator in the “Olycksfall” window is a good example – it is rather obvious that this button is for calculating the cost of insurance. A bad example is the button with the little person in the main window. It is rather hard to see that he is waving goodbye, indicating that this is the button for clearing the window from customer information.

• FRONTER does not have that many colours, which is a good thing since a lot of colours would make the windows blurry. An active window is indicated with a blue border and the line where the cursor is placed is also marked in blue. This is sufficient for the user to know where in the program he/she is.

4.2.2 Centenium

Centenium is a program used by the sales agents who call potential customers and sell them insurance. The program is developed by Portal Connect, which is an American company, and thus the basic parts of the program are in English. Centenium is based on the predictive dialling principle and is used to call up campaign customers. This means that the program continuously calls up new customers from a database. When the sales agent is finished with a customer, a new customer is always on the line. Centenium sorts out all customers who do not answer the phone and most of the answering machines, which leads to much higher efficiency for the sales agent.

When the sales agent connects to the campaign, he/she presses a button on the phone and blows into the headset to confirm that he/she is ready to take calls. As soon as the customer has answered the phone, Centenium displays customer information such as name and social security number. Centenium also displays a help text for the sales agent on how to present his/her business, see the graphical user interface in figure 4.4. These parts are adjusted by the administrator at Trygg Hansa to suit a certain campaign and are therefore presented in Swedish.

Figure 4.4 Graphical user interface of Centenium.



If a customer does not have time to speak to the sales agent, a note of a callback arrangement can be made in the program, see figure 4.5. When it is time to call the customer again the program will do this automatically.

Figure 4.5 Call-back Window

4.2.2.1 Evaluation

Centenium has a good and intuitive user interface but it works poorly. The sales agents are quite often thrown out of the program and need to log on again. There is also supposed to be a connection to FRONTER so that the customer information is shown automatically in FRONTER. However, most of the time this does not work.

4.3 Practical approach

Before starting the work at Trygg Hansa a requirement specification was made, see Appendix A. It was made without much knowledge about the computer environment at Trygg Hansa and the work performed there. As the knowledge increased the requirement specification had to be revised, see Appendix B. Somewhere along the way it was decided that this master thesis report was to be written in English, which is why the first version of the specification is in Swedish and the second in English. As you can see, there are many differences concerning the telephone solution between the two versions. This will be discussed in more detail in section 4.4.

The first step at Trygg Hansa was to get to know the users and to get acquainted with their work environment. What does an agent do? Which computer programs are used? Which are the most common tasks performed? The differences between incoming and outgoing calls were also studied and the conclusion was that for incoming calls only FRONTER is used. The most common tasks performed for incoming calls were answering questions about existing insurance and insurance terms and making estimates for new insurance. For outgoing calls both FRONTER and Centenium were used. The aim for a sales agent making outgoing calls is to sell insurance to new customers. Thus, the only task he/she performs is making estimates. At this point it was decided that the project should be limited to outgoing calls, since it would otherwise be hard to carry the project through in practice; more about this later in this section.

Meeting some potential users with physical disabilities was considered, in order to find out what they would expect from an application like this. However, it is hard to explain how the work is carried out at Trygg Hansa and how the application is supposed to work without having a prototype to show. Besides this, it is usually hard for users to know what they want without having a proposal to start from.

4.3.1 FreeSpeech and FRONTER solution

The first step towards a solution was to take a closer look at FRONTER. To be able to make estimates and perform the other tasks of an agent without doing them for real, a test version of FRONTER was used. It took quite some time to learn FRONTER although the program itself is very intuitive. However, it was only intuitive when using the mouse, not when using the keyboard. Navigating FRONTER with the keyboard was inconsistent and therefore took some time to learn. Shortcut keys existed but were not included in the menus.

After this it was time to make a solution for FRONTER using FreeSpeech. Since FreeSpeech does not support programs written in Visual Basic, it could not easily handle FRONTER. Instead, a lot of mouse and keyboard macros had to be recorded. For the sake of consistency, the command names for handling the windows were all composed in the same way: window name followed by menu name. The macros recorded for FRONTER are:

• FRONTER Arkiv         Opens the “Arkiv” menu in FRONTER’s main window.
• FRONTER Redigera      Opens the “Redigera” menu in FRONTER’s main window.
• FRONTER Visa          Opens the “Visa” menu in FRONTER’s main window.
• FRONTER Verktyg       Opens the “Verktyg” menu in FRONTER’s main window.
• FRONTER Hjälp         Opens the “Hjälp” menu in FRONTER’s main window.
• FRONTER Menyrad       Marks the first menu in FRONTER’s main window.
• Avsluta kund          Clears the main window.
• Kundjournal Arkiv     Opens the “Arkiv” menu in FRONTER’s “Kundjournal” window.
• Kundjournal Redigera  Opens the “Redigera” menu in FRONTER’s “Kundjournal” window.
• Kundjournal Visa      Opens the “Visa” menu in FRONTER’s “Kundjournal” window.
• Kundjournal Verktyg   Opens the “Verktyg” menu in FRONTER’s “Kundjournal” window.
• Kundjournal Hjälp     Opens the “Hjälp” menu in FRONTER’s “Kundjournal” window.
• Kundjournal Menyrad   Marks the first menu in the “Kundjournal” window.
• Kundfakta Menyrad     Marks the first menu in the “Kundfakta” window.
• Nästa grupp           Jumps to the next group in the “Olycksfall” window.
• Föregående grupp      Jumps to the previous group in the “Olycksfall” window.
• Olycksfall Menyrad    Marks the first menu in the “Olycksfall” window.
• Spara som försäkring  Converts an estimate to insurance.
• Nästa flik            Next tab in the “Kundfakta” window.
• Bakåttab              Tabs backward, Shift + Tab.

When first using the new commands, they sometimes worked and sometimes did not. FreeSpeech support service was contacted but they had no solution to the problem. By sheer accident, after a few days of struggling, it was discovered that the main window in FRONTER changes name and that this was what caused the problem. When first starting FRONTER the main window is named “FRONTER” but as soon as customer information is shown in the window, the name changes to “Kundengagemang”. FreeSpeech probably identifies a window by its name, but this has not been confirmed by FreeSpeech support service since they say that they do not answer technical questions. If you create a macro when the window is called “FRONTER”, the macro does not work when the window is called “Kundengagemang”. This meant that all the macros had to be recorded once again.

FRONTER has many windows and this caused some problems when using FreeSpeech. As mentioned in section 4.1.2 FreeSpeech can switch between programs but not between windows in the same program. However, FRONTER has a shortcut key for switching between the main window and the “Kundjournal” window. When another window pops up in FRONTER this becomes the active window. If, for some reason, the window is deactivated it is impossible to activate it again with your voice. This means that you cannot finish what you started.
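The window name problem described above can be illustrated with a minimal sketch. It assumes, as the text only suspects, that a macro is bound to the title of the window in which it was recorded. The function names and the stored action strings are hypothetical and are not part of FreeSpeech.

```python
# Hypothetical model of title-keyed voice macros. The thesis only suspects
# that FreeSpeech identifies a window by its title; this sketch assumes it.

MENUS = ["Arkiv", "Redigera", "Visa", "Verktyg", "Hjälp"]


def command_name(window, menu):
    """Build a command name as window name followed by menu name."""
    return f"{window} {menu}"


def record_macros(window_title, spoken_window, menus):
    """Return macros bound to one window title (assumed matching rule)."""
    return {
        (window_title, command_name(spoken_window, menu)): f"open menu {menu}"
        for menu in menus
    }


def dispatch(macros, active_title, spoken_command):
    """Look up the action for a spoken command in the active window."""
    return macros.get((active_title, spoken_command))


# Macros recorded while the main window was still titled "FRONTER".
macros = record_macros("FRONTER", "FRONTER", MENUS)
before_customer = dispatch(macros, "FRONTER", "FRONTER Arkiv")
after_customer = dispatch(macros, "Kundengagemang", "FRONTER Arkiv")

# Recording the same commands again under the second title covers both states.
macros.update(record_macros("Kundengagemang", "FRONTER", MENUS))
after_rerecording = dispatch(macros, "Kundengagemang", "FRONTER Arkiv")
```

Under this assumption the lookup succeeds before customer information is shown and fails afterwards, which matches the observation that the commands sometimes worked and sometimes did not.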

4.3.1.1 Performance

The FRONTER solution made with FreeSpeech works but it is very slow. The macros were made to be as fast as possible but they were still very slow. By using iterative design the performance of the solution was improved. To speed up some of the macros, keyboard macros were used instead of mouse macros. For example, the shortcut key Alt + A was used to highlight the first menu. However, this is not efficient since you also need to say “Enter” to open the menu. In addition, it is a lengthy procedure to get to the other menus since Alt + A only takes you to the first menu in the row, and you then have to move to the right to get to the others.

Another time consuming operation is entering a social security number or an insurance number in order to open customer information. For FreeSpeech to understand, you have to say each digit separately, that is, you have to use the spelling mode. This is very time consuming, and even more so if you make a mistake. If you for example say 1976 as “one-thousand-nine-hundred-seventy-six”, FreeSpeech will interpret it as 1000 900 70 6. Creating macros for every year (1900-2000) and for all numbers 01-99 solved this problem. This also improves the performance in case of mistakes since you do not have to change modes to correct them.

In order to get an idea of how much slower it is to use FRONTER with your voice than with your hands, some measurements were made of the most important tasks, see tables 4.1-4.3.
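The year and number macros described above can be sketched as a lookup table from one spoken chunk to the digits it should type. This is only an illustration: the macro names such as "year 1976" are placeholders for whatever phrase is actually recorded, the example digits are made up, and the word-by-word function merely models the misinterpretation reported in the text.

```python
# Hypothetical sketch: macro names such as "year 1976" are placeholders for
# the recorded spoken phrase; they are not FreeSpeech identifiers.


def build_number_macros():
    """Map one spoken chunk to the digits it should type."""
    macros = {}
    for year in range(1900, 2001):      # every year 1900-2000
        macros[f"year {year}"] = str(year)
    for number in range(1, 100):        # all numbers 01-99, zero-padded
        macros[f"number {number:02d}"] = f"{number:02d}"
    return macros


def type_chunks(macros, spoken_chunks):
    """Concatenate the keystrokes of each recognised macro."""
    return "".join(macros[chunk] for chunk in spoken_chunks)


def dictate_word_by_word(words, values):
    """Model of the failure: each number word becomes its own number."""
    return " ".join(values[word] for word in words)


macros = build_number_macros()
entered = type_chunks(macros, ["year 1976", "number 03", "number 14"])

word_values = {"one-thousand": "1000", "nine-hundred": "900",
               "seventy": "70", "six": "6"}
misheard = dictate_word_by_word(
    ["one-thousand", "nine-hundred", "seventy", "six"], word_values)
```

With this scheme a year is entered with a single command instead of four separately spelled digits, at the cost of recording 200 macros.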

               1st try   2nd try   3rd try   Average
Hands          3.0s      3.4s      3.5s      3.3s
Voice          5.3s      5.3s      4.5s      5.0s
Table 4.1 Time for opening FRONTER.

               1st try   2nd try   3rd try   Average
Voice          27.7s     30.3s     25.3s     27.7s
Voice macros   18.7s     30.5s     18.6s     22.6s
Table 4.2 Time for opening customer information in FRONTER.

               1st try   2nd try   3rd try   Average
Hands          30.7s     29.2s     28.1s     29.3s
Voice          109.8s    106.4s    102.9s    106.4s
Table 4.3 Time for making accident insurance estimate.

An ordinary stopwatch was used and therefore the human reaction time has to be considered. This means that for small measurements the relative error is large. Another thing to consider is that the test user knows exactly where to go and what to do which gives shorter times.
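As a worked example of how the measurements can be compared, the sketch below computes the averages for table 4.3, the slowdown of voice relative to hands, and the corresponding relative productivity. The stopwatch uncertainty of 0.3 seconds is an assumed value chosen only to illustrate why short measurements have a large relative error; it is not a figure from the measurements.

```python
from statistics import mean

# Times in seconds from table 4.3 (accident insurance estimate).
HANDS = [30.7, 29.2, 28.1]
VOICE = [109.8, 106.4, 102.9]

# Assumed stopwatch uncertainty in seconds; not a figure from the thesis.
ASSUMED_REACTION_TIME = 0.3


def slowdown(voice, hands):
    """How many times longer the task takes with voice than with hands."""
    return mean(voice) / mean(hands)


def relative_productivity(voice, hands):
    """Fraction of the hand-controlled work rate reached with voice."""
    return mean(hands) / mean(voice)


def relative_error(measured, uncertainty=ASSUMED_REACTION_TIME):
    """Uncertainty as a fraction of the measured time."""
    return uncertainty / measured


print(f"hands average: {mean(HANDS):.1f} s, voice average: {mean(VOICE):.1f} s")
print(f"slowdown: {slowdown(VOICE, HANDS):.2f}")
print(f"relative productivity: {relative_productivity(VOICE, HANDS):.0%}")
print(f"relative error at 3.3 s: {relative_error(3.3):.1%}")
print(f"relative error at 106.4 s: {relative_error(106.4):.1%}")
```

For table 4.3 this gives a slowdown of roughly 3.6 and a relative productivity of about 28%. With the assumed uncertainty, the relative error for opening FRONTER (about 3 seconds) is far larger than for making an estimate (over 100 seconds).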



If FRONTER had been written in a programming language supported by FreeSpeech, the solution would have been faster. You would not need macros for the menus and all dialogue boxes would be easier to handle since you could just say, for example, “Okej” to click an OK button.

No usability testing was done for this solution since the result would be misleading. If the test users were agents working at Trygg Hansa who are familiar with FRONTER, they would probably be too focused on the speed of the solution to be able to evaluate its functionality. On the other hand, if the test users had never worked with FRONTER before, the evaluation would be more of FRONTER than of the voice-controlled solution.

4.3.2 FreeSpeech and Bojnet solution

A lot of information can be found on Bojnet, which is the Intranet at Trygg Hansa. Insurance terms, price lists and company information are some of the things found on Bojnet. This means that you have to be able to navigate the Intranet with FreeSpeech for the application to be of practical interest. Fortunately, FreeSpeech is very easy to surf with, as mentioned in section 4.1.3, and no adjustment was needed to meet this demand. To see how much slower it was to navigate Bojnet with FreeSpeech than with a mouse, some measurements were made, see tables 4.4-4.6. The measurements were made in the same way as for FRONTER.


               1st try   2nd try   3rd try   Average
Voice          5.1s      5.3s      3.8s      4.7s
Table 4.4 Time for opening Bojnet.

               1st try   2nd try   3rd try   Average
Voice          12.40s    13.27s    10.97s    12.21s
Table 4.5 Time for looking for information about accident insurance on Bojnet.

               1st try   2nd try   3rd try   Average
Voice          11.3s     17.3s     10.5s     13.0s
Table 4.6 Time for looking for information about scooter insurance on Bojnet.



4.3.3 FreeSpeech and Centenium solution

After finishing the work with Bojnet it was time to study Centenium. Unfortunately there is no test version of the program, which made it impossible to make a complete FreeSpeech solution. The only way to study it properly was to watch a sales agent using it. At the first attempt to do so, Centenium did not work at all and the calls had to be made manually. At the second attempt it was working, but very slowly. This was due to an almost finished campaign with few customers left and the fact that it was holiday season, so not many people were at home to answer the telephone. In addition to this the connection to FRONTER did not work and no one seemed to know why.

In order to make a solution, even though not a complete one, Centenium was opened without activating a campaign. Surprisingly, FreeSpeech supported the menus although they were in English! This was very strange since the Swedish version of FreeSpeech was used. The surprise was even bigger when it was found that the graphical user interface was written in Visual Basic, which FreeSpeech is not supposed to support! There was one exception though: for some reason the “Record” menu did not work. For the adjustable part of Centenium it was necessary to create a few commands, which were named after the button labels so that the commands would be easy to remember. The commands are:

• Record              Marks the Record menu.
• Argument            Clicks the “Argument” button.
• Kampanjbeskrivning  Clicks the “Kampanjbeskrivning” button.
• Bolagsjämförelse    Clicks the “Bolagsjämförelse” button.
• Avsluta samtal      Clicks the “Avsluta samtal” button.
• Telefonsvarare      Clicks the “Telefonsvarare” button.

With the menus and the macros you should be able to control Centenium properly, but this has not been tested.

If a person with physical disabilities were to use Centenium, a mechanical solution would be needed so that he/she could press the button on the telephone to indicate that he/she is ready to take calls. Since the system will not be used in practice this was not done. However, there is one question mark left. When a sales agent books a callback, a dialogue box is displayed, see figure 4.5. Can this be handled with FreeSpeech? Unfortunately there is no answer to the question since it is impossible to test.

In order to fulfil the requirements of the project a telephone solution had to be thought of. The main concern is how to switch between talking to the customer and to the computer. You do not want the customer to hear the commands to the computer since the customer would probably find it confusing. Neither can the computer listen to the customer conversation since it would probably misinterpret it as commands. Switching too many times during a conversation would also be time consuming and rude to the customer and must therefore be avoided as much as possible. This was the main reason why it was decided to limit the project to outgoing calls. Outgoing calls need less switching as the social security number is displayed in FRONTER when using Centenium. If you used FreeSpeech for incoming calls you would have to cut off the customer at the very beginning of the call to enter the social security or insurance number, which would be inappropriate. A possible solution to the problem is described in chapter 5.

As it is necessary to switch between customer and computer, how can it be done? In Centenium there is a mute button that could be used to cut off the customer. However, the corresponding function in FreeSpeech, that is the Sleeping Mode, does not work satisfactorily. If it did, the headset would need to be adjusted so that it works with both the computer and the telephone. A better solution would probably be to use a manual switch. In this case the lines would be completely separated and you would avoid the problem with the Sleeping Mode. The switch, of course, would have to be adapted to the ability of the user.

The telephone solution was never implemented because the FreeSpeech solution for FRONTER was too slow to use in practice. It was no use spending time and money on a solution that would never be used. This led to big changes in the requirement specification and the final version is found in Appendix B. No usability testing of the Centenium solution was done since there is no test version of the program.
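The manual switch discussed above was never built, but its intended behaviour can be sketched as a two-position router for the headset microphone. The class and its names are hypothetical; the sketch only shows that each utterance reaches exactly one line and that the number of switches per call can be counted.

```python
from enum import Enum


class Line(Enum):
    CUSTOMER = "telephone"
    COMPUTER = "speech recogniser"


class ManualSwitch:
    """Hypothetical two-position switch for a shared headset microphone."""

    def __init__(self):
        self.line = Line.CUSTOMER   # a call starts with the customer
        self.switch_count = 0       # each switch costs time in the call

    def toggle(self):
        """Move the microphone to the other line."""
        if self.line is Line.CUSTOMER:
            self.line = Line.COMPUTER
        else:
            self.line = Line.CUSTOMER
        self.switch_count += 1

    def route(self, utterance):
        """Deliver the utterance to exactly one line, never both."""
        return {self.line: utterance}


switch = ManualSwitch()
heard = [switch.route("Good morning, I am calling from Trygg Hansa.")]
switch.toggle()
heard.append(switch.route("Olycksfall Menyrad"))
switch.toggle()
heard.append(switch.route("Thank you for waiting."))
```

Because the two lines are separate, the customer never hears a command and the recogniser never hears the conversation. The switch counter makes the cost of each interruption visible.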

4.4 Evaluation and conclusions

The measurements in section 4.3 show that the voice-controlled solution using FreeSpeech is much slower than the ordinary, hand-controlled one. There are also big differences within the same series of measurements. This is because FreeSpeech was sometimes busy doing other things and did not listen. Another reason is that FreeSpeech did not always understand what was said and therefore the command had to be repeated. Out of one hundred commands given in command mode there were three misunderstandings, that is, the error rate is approximately 3%. In dictation mode the error rate is significantly higher, but since there is no need to use dictation mode in this solution no measurements were made of that.

The attempt to speed up the opening of customer information was a success, see table 4.2. There are quite big differences within the measurements because different social security numbers were tried. Some macros are faster than others and therefore some social security numbers are faster to enter. However, the average time needed is better for macros than for ordinary dictation. To use the macros you have to be in command mode, which is better to use than dictation mode. This is because the error rate of the command mode is much lower than the error rate of the dictation mode. In addition to this the worst-case scenario, that is the case when FreeSpeech misinterprets you, is much worse in dictation mode since it is much harder to correct mistakes there. The conclusion is therefore that it is better to use macros than dictation to display customer information in FRONTER.

The slow solution will of course lead to lower productivity. By just looking at the numbers in table 4.3 you can see that it takes more than three times as long to do an estimate with your voice as with your hands. This gives a relative productivity level of approximately 30%. The numbers in tables 4.1 and 4.4 are not relevant when looking at productivity; neither are the numbers in table 4.2. This is because you only have to open the programs once and the customer information is displayed automatically for outgoing calls. Searching Bojnet, tables 4.5 and 4.6, is not something that is done very often. Even if it were, it would not affect relative productivity since this also takes approximately three times as long to do with your voice. The only thing that will reduce the relative productivity level further is having to switch between customer and computer: the more switching, the lower the productivity.

Switching between customer and computer not only affects productivity but also the customer relation. To cut off the customer too often would probably be considered rude and annoying. To avoid this misunderstanding you could explain to the customer why you have to cut him/her off. This has several drawbacks though. It would be very difficult to explain to persons who are not at all familiar with computers. There is also a risk that the customer thinks that the call will take a very long time, which is correct to some extent, and therefore ends the call. Another possibility is that the customer gets very interested and wants to know more about speech recognition, and unfortunately there is no time to discuss it.
However, the most important aspect is the moral and emotional situation for the sales agent. It would not be much fun explaining to total strangers why you voice-control your computer. Finally, switching would also prevent the sales agent from keeping the conversation going. It is very important for a sales agent to lead and control the conversation in order to get the customer to buy insurance.

As mentioned earlier in this section relative productivity would only be about 30%. This may not sound all that good, at least not for Trygg Hansa. However, for people with physical disabilities who have difficulties finding employment this could mean very much. The social aspect of having a job to go to and the self-esteem it gives you cannot be measured in money. The relative productivity will of course increase if better programs are developed in the future.

Many problems occurred during this project. Since FRONTER crashed every now and then a lot of time was spent waiting for it to be fixed. The working environment at Trygg Hansa was also a problem. As the sales agents work in open-plan offices the background noise was sometimes too loud for FreeSpeech to filter. There was also a problem having two offices since a lot of exporting and importing of users had to be done in FreeSpeech. As mentioned in section 4.1.4 exporting and importing users sometimes worked and sometimes did not.

All the problems above can be solved. A better working environment is easily achieved by placing the workstation in a separate room and thereby reducing the background noise. In a real solution the exporting/importing problem would not exist since the same workstation would be used all the time. A new version of FRONTER is under construction and hopefully this will be more stable. One can also assume that new and improved versions of FreeSpeech will appear in the future.

If this solution were to be used in reality it would be necessary to evaluate it further and a lot of usability tests would have to be made. Finding a good test user would be rather difficult. If the person has never used FRONTER before, the usability test would be more of FRONTER than of the voice control. On the other hand, if a sales agent at Trygg Hansa were to be the test user the disability part would be lost, since there are no persons with physical disabilities working at Trygg Hansa at the moment. It is essential that the test user is unable to use his/her arms since a person with physical disabilities has other frames of reference.

To sum up the project, the solution would work in theory but not in practice, at least not for now. A person with physical disabilities would not be able to manage the whole workstation on his/her own as FRONTER, Centenium and FreeSpeech are unstable. Without the ability to use the arms, and thus without a mouse, it would be impossible to handle program crashes. With a faster FreeSpeech and better Trygg Hansa programs the solution would probably work in practice as well. Even though this solution is slow it might still be useworthy, since having a job is important to most people.

The idea that you could create job opportunities for persons with physical disabilities at callcentres is still good. The only thing missing is technology that is good enough.



5. Voice-controlled telephone dialogue

The idea for this project arose when considering all the switching that had to be done for incoming calls in the Trygg Hansa application, that is, when reading the insurance number into FRONTER. Is it possible to make a solution where the insurance number appears automatically in FRONTER when a customer calls? The aim of this project is to show that it is possible, by building a prototype using Philips SpeechMania 99. This chapter begins with a description of SpeechMania and its possibilities. It is followed by a description of how the work was carried out, an evaluation of the project and some conclusions.

5.1 Philips SpeechMania 99

Philips SpeechMania 99 is made for developing, implementing and operating automatic real-time inquiry and transaction systems. It is especially suited for systems with speech recognition and natural language dialogue over the telephone. It can be used for many different applications, such as timetable information, catalogue ordering and bank systems, all over the telephone. SpeechMania 99 is a development tool that allows you to make your own applications by writing program code. It is also possible to communicate with other self-developed programs or databases. The speech recognition of the software can be adjusted to the users of your new system so that it can understand different accents. All of this is described in more detail later in this chapter. All the facts in this chapter are gathered from the Philips SpeechMania instruction binders [1], [22].

5.1.1 The SpeechMania package

The SpeechMania package consists of five programs:

• License Manager Used to handle the licences provided by Philips. It monitors the computer resources and the allowed number of run-time licences. This also includes monitoring how many applications are running online at the same time.

• Recording Station The Recording Station is used to record the output phrases of the application.

• Lexicon Manager In the Lexicon Manager you decide what language to use. It is also used to handle word lists with the users’ expected input. Different pronunciations of the users’ words can be declared by writing phonetics.

• TrueDialog This is the program that runs the application online.

• Transcription Station Used to train the application. The sound files from the online recordings can be replayed, and it is then possible to decide whether the system interpreted the answers correctly. Gathering information this way improves the system’s performance.

5.1.2 Developing SpeechMania applications

To make it easier to understand how the SpeechMania online system works, the system architecture is shown in figure 5.1. The different components have been discussed in chapter 2, except for Dialogue Control. As the name indicates, this is where the dialogue is formed and controlled. Dialogue Control and Speech Understanding are controlled by an HDDL program, which is explained in section 5.1.2.1.

Figure 5.1 SpeechMania online system architecture.

The SpeechMania developing process consists of many steps as shown in figure 5.2. Not all of these steps have to be done to get a working solution. The grey colour in the picture shows the parts that are needed to be able to run the application. The other parts are optional to train the system for better performance, but they are recommended. The most important parts in figure 5.2 and other important features are explained in this section.



Figure 5.2 The development process flow

5.1.2.1 HDDL

The first thing that must be done is to write an HDDL program. It is the base of all applications and it is here that the dialogue structure is shaped. HDDL is a programming language developed for SpeechMania. There is no tool for code generation, so all the code has to be written by hand in an editor. The code must be saved in an HDDL program file, an hdl-file (.hdl). The program must have a Main Dialog, which has several sections:

• SETTINGS This section handles all the settings that influence the speech understanding. For example, you define how long the caller can be silent before the system interprets the caller as being finished.

• RULES The rules section is needed for the speech understanding process. All possible user input must be stated here and also how it should be interpreted by the system.

• VARIABLES All variables used in the HDDL program must be declared here, both the variables in the rules section and the global variables.



• EVENT HANDLER This is where default actions for errors are handled. If no specified error actions are stated in the Actions section (see below), these actions will be performed. Some typical events are when nothing is recorded, when nothing is understood or when there are recurring problems.

• TRANSACTIONS In this section you declare objects of other classes that the HDDL program is communicating with.

• ACTIONS This is the heart of the HDDL program. In the Actions section the dialogue structure is defined. It states what the system output should be and how the system should respond to different inputs. This also includes error handling, for example when nothing is recorded and so on. The system response can be in the form of output or actions performed by other classes, via transactions.
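The relationship between the Event Handler and Actions sections can be illustrated outside HDDL. The sketch below is plain Python, not HDDL, and it does not use any SpeechMania interface. The event names, the prompts and the function `handle_event` are hypothetical, chosen only to show how an error action stated for a specific question takes precedence over a default action.

```python
# Illustrative only: plain Python, not HDDL. All names and prompts are hypothetical.

# Default actions, playing the role of the Event Handler section.
DEFAULT_HANDLERS = {
    "nothing_recorded": "I did not hear anything. Please try again.",
    "nothing_understood": "I did not understand. Please try again.",
    "recurring_problem": "You will be connected to an agent.",
}

# Question-specific error actions, playing the role of the Actions section.
ACTION_HANDLERS = {
    "ask_insurance_number": {
        "nothing_understood": "Please say your insurance number one figure at a time.",
    },
}


def handle_event(question: str, event: str) -> str:
    """Return the prompt for an event, preferring the question-specific action."""
    specific = ACTION_HANDLERS.get(question, {})
    if event in specific:
        return specific[event]
    # No specific action stated for this question: fall back to the default.
    return DEFAULT_HANDLERS[event]
```

With these tables, a misunderstood insurance number gets its own prompt, while silence during the same question falls back to the default action.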

The Main Dialog can also contain other sections, but these are not discussed here since they are seldom used. Besides the Main Dialog there are optional parts that can be added to the program. If these parts are included, they need to be declared at the beginning of the hdl-file. The optional parts are:

• CLASS Class declarations of classes that the HDDL program is communicating with. All class operations with attributes must be declared. A class object needs to be declared in the Transactions section.

• STRUCT Struct is used when creating new types. A new type consists of two or more of the HDDL standard data types: INTEGER, BOOLEAN, FLOAT, STRING and DATE.

• CONVERSIONS The Conversions section is used for transforming internal parameters to speech output. For example, if the parameter month_2 should be output as “February”, this section is needed.
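The kind of mapping a Conversions section performs can be sketched in a general-purpose language. The following is plain Python, not HDDL. The function name `month_to_speech` is hypothetical, and only the `month_2` to “February” example comes from the text above.

```python
# Illustrative only: plain Python, not HDDL. The helper name is hypothetical.
MONTH_NAMES = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def month_to_speech(parameter: str) -> str:
    """Convert an internal parameter such as 'month_2' to the phrase to be spoken."""
    prefix, _, number = parameter.partition("_")
    if prefix != "month" or not number.isdecimal():
        raise ValueError(f"not a month parameter: {parameter!r}")
    index = int(number)
    if not 1 <= index <= 12:
        raise ValueError(f"month out of range: {parameter!r}")
    return MONTH_NAMES[index - 1]


print(month_to_speech("month_2"))  # prints "February"
```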

5.1.2.2 DCOFFLIN

To check the HDDL program you use an offline tool called DCOFFLIN. This tool performs a syntax check and generates an output phrase list, which is later used in the Recording Station, see section 5.1.2.4. If the HDDL program code passes the syntax check, DCOFFLIN runs the program and puts the application’s questions to the user. The user responds by typing answers. The whole dialogue can be tested this way, and if the HDDL program works in DCOFFLIN it will also work online, via telephone.

To be able to run DCOFFLIN two other files are needed besides the hdl-file: an HDDL application program description file (.apd) and a parameter file (.prm). The application file lists all files that belong to an application. For example, more than one hdl-file can be used in the same application. The parameter file contains information about the source directories for the HDDL files and the phrase lists.

5.1.2.3 HDDLPARS

HDDLPARS is a tool whose function is very similar to that of DCOFFLIN. The difference is that HDDLPARS generates an input word list but does not simulate the dialogue. The word list is later used in the Lexicon Manager, see section 5.1.2.5.

5.1.2.4 Recording Station

As mentioned before, DCOFFLIN or HDDLPARS generates an output phrase list that is used in the Recording Station. When the phrases are recorded they are stored in a phrase database. The database defines the relationship between the written phrases and their sound files. The recorded phrases are post-processed for two reasons. One is to normalize the output volume of the phrases and the other is to reduce the silence at the beginning and at the end of each phrase.
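The two post-processing goals can be sketched as follows. The source does not describe how the Recording Station achieves them, so this Python sketch makes its own assumptions: samples are floats between -1 and 1, silence is anything below a fixed threshold, and the volume is normalized by scaling to a common peak. The function names and default values are hypothetical.

```python
from typing import List


def trim_silence(samples: List[float], threshold: float = 0.02) -> List[float]:
    """Drop leading and trailing samples whose magnitude is below the threshold."""
    loud = [i for i, s in enumerate(samples) if abs(s) >= threshold]
    if not loud:
        return []
    return samples[loud[0]: loud[-1] + 1]


def normalize_peak(samples: List[float], target_peak: float = 0.9) -> List[float]:
    """Scale the samples so that the largest magnitude equals target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)
    gain = target_peak / peak
    return [s * gain for s in samples]


def post_process(samples: List[float]) -> List[float]:
    """Trim the silence first, then bring the phrase to the common peak level."""
    return normalize_peak(trim_silence(samples))
```

Applying `post_process` to every recorded phrase would give all prompts the same peak level and remove the pauses before and after the speech.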

5.1.2.5 Lexicon Manager

Each application needs to have a defined background lexicon, a language lexicon that includes most words of the language. As SpeechMania is speaker independent, the application has to be able to understand different pronunciations of the same word. To achieve this, the input word list, which was extracted by HDDLPARS, is modified in the Lexicon Manager. All possible pronunciations of a word in the list must have a corresponding phonetic description. This has to be done by the developer. The result is a Pronunciation Lexicon that is used by the SpeechMania Online System, that is, TrueDialog. For specific applications it is possible to create specific lexicons, which can for example contain technical terms.
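As a rough illustration of what a pronunciation lexicon contains, the Python sketch below maps each word to several phonetic descriptions and builds the reverse index a recogniser would need. It is not the Lexicon Manager’s file format. The phonetic strings use a made-up notation and the variants are illustrative only, and the function name is hypothetical.

```python
from typing import Dict, List

# Hypothetical lexicon: each word has one or more phonetic descriptions.
# The notation is invented for this sketch and is not SpeechMania's.
PRONUNCIATIONS: Dict[str, List[str]] = {
    "sju": ["x u:", "S u:"],
    "tjugo": ["C u: g u", "C u: g e"],
}


def build_recognition_index(lexicon: Dict[str, List[str]]) -> Dict[str, str]:
    """Map every phonetic description back to the word it stands for."""
    index: Dict[str, str] = {}
    for word, variants in lexicon.items():
        for variant in variants:
            if variant in index and index[variant] != word:
                raise ValueError(f"ambiguous pronunciation: {variant!r}")
            index[variant] = word
    return index
```

Each variant added for a word is one more way of saying it that leads back to the same entry, which is how extra phonetic descriptions help with dialects.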

5.1.2.6 Transactions

As mentioned earlier in this chapter, the HDDL program can communicate with external classes. For an information system it might be useful to connect the customers to manual service. This cannot be done in the HDDL program itself, so an external class has to be written for this purpose. The external classes are declared in the HDDL program, and the class objects that are declared in the Transactions section can call the class operations. The external classes can be written in C, C++, COBOL, FORTRAN or BASIC. The transaction interface between an external class and an HDDL program is based on DLLs (Dynamic Link Libraries).



5.1.2.7 Running the application online

To be able to run the application online six files are needed:

• An HDDL application program description file (.apd). All the hdl-files are listed in this file.

• One or more HDDL program files (.hdl). Explained in section 5.1.2.1.

• Customer parameter files (.prm). In addition to the prm-file mentioned in section 5.1.2.2 a file named “spmania.prm” is needed. This file contains information about what application TrueDialog should run and what language, acoustic model and lexicons should be used. This file connects the different parts of the application.

• An acoustic phonetic model (.amo). The acoustic model was discussed in section 2.1.2.

• A language model (.lm). This was also discussed in section 2.1.3.

• A runtime lexicon (.lex). A file generated by the Lexicon Manager.

Now it is possible to start TrueDialog and the program will run automatically. The program opens telephone channels, which makes it possible to call the application and use the system.
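The list of required file types lends itself to a simple check before starting the online system. The Python sketch below is not part of SpeechMania. The function name and all file names except “spmania.prm” are hypothetical, and only the extensions come from the list above.

```python
from pathlib import Path
from typing import Iterable, List

# File types needed to run an application online, as listed above.
REQUIRED_EXTENSIONS = [".apd", ".hdl", ".prm", ".amo", ".lm", ".lex"]


def missing_file_types(filenames: Iterable[str]) -> List[str]:
    """Return the required extensions that no file in the collection has."""
    present = {Path(name).suffix.lower() for name in filenames}
    return [ext for ext in REQUIRED_EXTENSIONS if ext not in present]


# Hypothetical application directory in which the runtime lexicon is missing.
files = ["insurance.apd", "main.hdl", "spmania.prm", "swedish.amo", "swedish.lm"]
print(missing_file_types(files))  # prints ['.lex']
```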

5.1.2.8 Transcription Station

The Transcription Station cannot be used until the application has been run in TrueDialog. In the Transcription Station you can listen to what a caller said and compare it to what the system recognised. This way it is possible to tell the system how the utterance should be interpreted. All the application dialogues should be transcribed and corrected this way to improve system performance.

5.1.2.9 Corpus Manager

The input to the tool Corpus Manager is the transcribed dialogs from the Transcription Station. The Corpus Manager is responsible for creating the corpus, which is a list of recognised words from the transcribed dialogs. The corpus is used to train the system with the two tools LMTRAIN and SUTRAIN, see section 5.1.2.10.

5.1.2.10 Training tools

To improve the speech recognition accuracy it is possible to create a new language model for your application. This is done with help of the tool LMTRAIN and the new language model replaces Philips’ standard model. The tool SUTRAIN is used to determine grammar probabilities automatically. SUTRAIN generates a training file that should be included in the application’s apd-file to further improve the performance of the online application.
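The source does not describe how LMTRAIN or SUTRAIN work internally. To give a feeling for what training a language model on a corpus of transcribed utterances can mean, the Python sketch below estimates bigram probabilities by relative frequency, a common textbook technique that is assumed here and not taken from the SpeechMania documentation. The function name, the start marker `<s>` and the sample utterances are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable


def train_bigram_model(transcriptions: Iterable[str]) -> Dict[str, Dict[str, float]]:
    """Estimate P(word | previous word) by relative frequency in the corpus."""
    counts = defaultdict(Counter)
    for utterance in transcriptions:
        words = ["<s>"] + utterance.lower().split()  # <s> marks the utterance start
        for previous, current in zip(words, words[1:]):
            counts[previous][current] += 1
    model: Dict[str, Dict[str, float]] = {}
    for previous, followers in counts.items():
        total = sum(followers.values())
        model[previous] = {word: n / total for word, n in followers.items()}
    return model


# Hypothetical transcribed caller utterances.
corpus = ["mitt nummer är ett två", "mitt försäkringsnummer är tre", "hjälp"]
model = train_bigram_model(corpus)
```

In this toy corpus two of the three utterances start with “mitt”, so the model gives that word a probability of two thirds at the start of an utterance.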



5.1.2.11 Work process in summary

To make it easier to comprehend the SpeechMania working method, table 5.1 shows the development steps. As mentioned before, not all of these steps have to be done to get a working solution.

Step  Activity to be performed
1     Perform detailed analysis of the requirements.
2     Analyse the type and structure of the caller dialog (inputs).
3     Establish the SpeechMania online system boundaries.
4     Develop, test and document an HDDL program in combination with the HDDLPARS and DCOFFLIN tools.
5     Extract the response phrases from the HDDL program using HDDLPARS, for recording with the Recording Station.
6     Extract the word list from the HDDL program.
7     Embed and integrate a SpeechMania online system by customising the transaction interfaces using DLLs.
8     Run the online prototype.
9     Transcribe the dialogs collected with the prototype system using the Transcription Station.
10    Refine the system by improving the HDDL program and by training the language model and the speech understanding with the LMTRAIN and SUTRAIN tools.
11    Run the optimised system; reiterate from the transcription step (Step 9) as necessary.

Table 5.1 Steps involved in application development.

5.2 Practical approach

Unlike the project at Trygg Hansa, there were no existing products to consider in this project. This meant that the product could be developed from scratch. During the design process the usability aspects from chapter 3 were considered. When developing the solution specification, see Appendix C, there were some things to consider: What tasks should the system perform? Who would the users be? After some analysis it was decided that the system should ask for the insurance number or social security number and then transfer the call to a telephone. At the same time the number should be sent to the corresponding computer and be displayed on the screen. The users are ordinary people calling an insurance company to ask for information about their insurance. This makes it easier to understand the users and predict their difficulties.



The next step was to design the dialogue. As Softhouse has similar products, the dialogues in these were studied in order to get a better design, compare with section 3.2.1.2. In most telephone applications there are parts of the dialogue dealing with problems such as when the caller is silent or when the caller needs help. Experiences from previous products were used to form these parts of the dialogue since they have proved to work well. A scheme of the dialogue flow is shown in Appendix D.

Since the dialogue is the user interface, the design rules in section 3.4 were considered when designing it. Some of the rules were easy to follow. For example, consistency was achieved by making it possible to say “Hjälp” (“Help”) after each question asked. Other rules were more difficult to follow; for example, there are no shortcuts for frequent users. On the other hand, this product will probably not have that many frequent users.

After this it was time to create the application with SpeechMania. This meant that a lot of time had to be spent studying the user manuals and the developer’s guide. The HDDL program was first developed as a stand-alone program without the call transfer and without forwarding the information. DCOFFLIN was used to check the syntax and to simulate the dialogue. Then a phrase database with recorded phrases and a word list were generated, and the program was ready to run in TrueDialog. After running the application for some time, enough data had been recorded to use the Transcription Station. This enhanced the performance somewhat.

The information given by the caller cannot be sent directly from the HDDL program to the person taking the call; a transaction had to be made, compare with section 5.1.2.6. An external class was written in C++ to handle the information transfer and the transaction DLL was generated when compiling the C++ code.
Since this is a prototype and only the principle of sending information was important to show, a simple solution was developed. A database, in which the C++ class could write the caller information, was created. Then a simple graphical user interface that could read from the database was made, see figure 5.3. The interface displays the information from the database and updates every second. Every time a new caller uses the application, the database is cleared. The interface was written in Visual Basic since it is very easy to use. Since Visual Basic and MS Access work well together, the database was created in MS Access. In a real deployment this solution would not work, since it would not be able to serve several insurance agents. Instead a more advanced solution, using for example DCOM (Distributed Component Object Model) communication or socket communication, would have to be created.
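The principle of the prototype, one side writing the caller’s number to a database and the other side reading it back, can be sketched as follows. This is not the thesis code. The prototype used a C++ class, MS Access and Visual Basic, whereas this sketch swaps in Python and SQLite so that it is self-contained. The table name, the function names and the sample number are hypothetical.

```python
import sqlite3
from typing import Optional


def open_database(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and make sure the single caller table exists."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS caller (number TEXT NOT NULL)")
    return conn


def register_new_caller(conn: sqlite3.Connection, number: str) -> None:
    """Dialogue side: clear the database, then store the new caller's number."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM caller")
        conn.execute("INSERT INTO caller (number) VALUES (?)", (number,))


def read_current_caller(conn: sqlite3.Connection) -> Optional[str]:
    """Interface side: return the stored number, or None if no caller is stored."""
    row = conn.execute("SELECT number FROM caller").fetchone()
    return row[0] if row else None
```

A user interface would call `read_current_caller` about once a second to mimic the polling described above. Because the table only ever holds one caller, the sketch shares the prototype’s limitation that it cannot serve several insurance agents.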

Figure 5.3 The graphical user interface of the prototype.



To transfer the call from the SpeechMania application to the fictitious insurance agent, an existing transfer solution was modified. A similar solution exists from an earlier Softhouse project and parts of it were re-used for this project. To show that it is possible to transfer calls, the call was transferred to a telephone at Softhouse.

Three test users, who had no previous knowledge of the application, tested the product. None of the test users noticed that the social security number should be stated with twelve figures. The first two figures, which indicate the century, were always left out. This led to some design changes in the dialogue to make it easier to understand. This iterative design improved the understanding, which was shown when three new test users tried the application.
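The problem found in the test can be made concrete with a small check on the recognised figures. The Python sketch below is not taken from the prototype. It assumes that a caller who leaves out the century gives ten figures, and the function name and return values are hypothetical.

```python
def classify_number(recognised: str) -> str:
    """Classify a recognised social security number by its number of figures."""
    digits = "".join(ch for ch in recognised if ch.isdigit())
    if len(digits) == 12:
        return "ok"
    if len(digits) == 10:
        # Most likely the two figures for the century were left out.
        return "century_missing"
    return "invalid"
```

A dialogue could use the `century_missing` result to ask the caller again, this time explicitly requesting all twelve figures including the century.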

5.3 Evaluation and conclusions

The application is working but needs more training to reach an acceptable level of understanding. One of the problems is that the caller input consists almost entirely of figures. Figures are short words that consist of few phonemes and are therefore more difficult for the speech recogniser to distinguish. The longer the words, the easier they are to understand. This problem is not specific to this application but appears in all phoneme-based applications. Some improvements can be achieved by training.

Another problem is the dialect in the southern part of Sweden, which was spoken by many of the test users. The acoustic model is developed for Standard Swedish and the dialect in the southern part of Sweden differs very much from it. Adding many different phonetic descriptions of the words in the word list will improve the understanding, but it is hard to cover all the different pronunciations.

Neither the information transfer nor the call transfer solution is, as mentioned earlier, complex enough to be used in a real solution. A better solution could be created with more time and effort.

If an application like this were used at an insurance company, together with a solution like the one in chapter 4, it would be possible for a person with physical disabilities to work with incoming calls. If a product like this one were used at Trygg Hansa it would no longer be necessary to dictate the insurance number or the social security number into FRONTER. As mentioned in section 4.3.3 this is a big problem, since you would have to switch between the customer and the computer at the very beginning of the call. With this solution the switching would be reduced and time would be saved. Another modification that has to be made is that the calls have to be separated into those where the caller gives an insurance or social security number and those where the caller does not. This is so that insurance agents with physical disabilities only get calls from people who did give their number.
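The separation of calls described above amounts to a simple routing rule. The following Python sketch only illustrates that rule. The queue names and the function name are hypothetical and nothing like it was implemented in the prototype.

```python
from typing import Optional


def choose_queue(number: Optional[str]) -> str:
    """Route a call depending on whether the caller gave a number in the dialogue."""
    if number:
        # The number can be shown on screen, so no dictation is needed.
        return "voice_controlled_workstation"
    # No number was given: let an agent who can type it in take the call.
    return "ordinary_agents"
```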



Time and money would also be saved with a product like this. The time spent speaking to the insurance agent would decrease since he/she would not have to ask for a number. This means that with the same number of insurance agents more calls could be handled, which would decrease the waiting time for the customers.

To summarise, this project is a good complement to the Trygg Hansa project described in chapter 4, but not very useful without it, at least if the aim is to create new job opportunities for people with physical disabilities. The prototype is working and clearly shows the possibilities of a solution like this, and thus the aim is reached.



6. Summary and conclusions

Creating job opportunities for persons with physical disabilities at callcentres is still a good idea, but unfortunately this master thesis did not succeed in creating a complete working prototype that can be tested by real persons.

The speech recognition application developed at Trygg Hansa is too slow to be used in practice. This is partly because some commands take longer to say than to carry out with the mouse. The main reason, though, is that each command has to be interpreted by FreeSpeech and this takes time. If FreeSpeech misinterprets a command the operation takes even longer; however, this is not a big problem as the error rate is only about 3%. Since the solution is slow to work with, it leads to lower productivity than the traditional method: the relative productivity is approximately 30%. In the eyes of an employer this is not a good value, but for people with physical disabilities who have difficulties finding employment a job like this could be worth a lot.

The problem of putting the customer on hold when talking to the computer, or the other way around, has not been solved in this project. This is because the sleeping mode in FreeSpeech is not good enough yet. As it is today, the only well-functioning way to tell FreeSpeech to stop listening is to shut off the microphone. The only way to turn it on again is by using the mouse. This problem can hopefully be solved with an improved version of FreeSpeech.

To make it easier to use the FreeSpeech solution for incoming calls a telephone dialogue prototype was developed. The time-consuming operation of reading in the customer’s insurance or social security number would be avoided this way, which would save the insurance agent a lot of time and switching. The prototype developed is working but not satisfactorily. The biggest problem is that the caller input consists almost entirely of figures, which speech recognition engines have problems with since the figures consist of few phonemes. The fewer phonemes, the harder the word is to recognise. Extensive training of the application, which has not been done on the prototype, will lead to some improvements in this area. Both the information transfer and the call transfer solutions are very simplified compared to what would be needed in a fully integrated real solution. The prototype is good enough to show that it is possible to send customer information and transfer the call to an insurance agent.

To sum up the projects, the telephone dialogue prototype is working rather well, but it is not very useful unless the FreeSpeech solution is improved. At least this is true when it comes to finding jobs for people with physical disabilities.



7. Future possibilities

The FreeSpeech solution at Trygg Hansa needs many improvements before it will work satisfactorily. All the programs used, FRONTER, Centenium and FreeSpeech, need to be more reliable so that they do not crash. FreeSpeech also needs to be faster when interpreting commands from the user, and a working sleeping mode would almost solve the switching problem. These problems, however, cannot be solved by the authors of this master thesis.

A possible future improvement of the telephone dialogue is to use voice barge-in. Voice barge-in makes it possible for the user to interrupt the system by answering a question while it is still being asked. This would make the system faster for frequent users, since they would not have to listen to the whole prompts if they already know how to use the system.

If these two projects were to be used together, a new information and telephone transfer solution would have to be developed. If the incoming calls to the office come through a telephone exchange, neither the database solution nor the call transfer would work. Instead a more complex solution would have to be developed, using for example DCOM or sockets.
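To indicate what the socket alternative could look like, the Python sketch below sends a caller’s number over a socket as one line of text and reads it on the other side. It is an assumption-laden illustration, not a design from the thesis. The line-based message format and the function names are hypothetical, and `socket.socketpair()` stands in for the network connection that would link the dialogue system and an agent’s workstation in a real installation.

```python
import socket


def send_number(connection: socket.socket, number: str) -> None:
    """Dialogue side: send the caller's number as one newline-terminated line."""
    connection.sendall((number + "\n").encode("ascii"))


def receive_number(connection: socket.socket) -> str:
    """Agent side: read from the socket until a complete line has arrived."""
    buffer = b""
    while not buffer.endswith(b"\n"):
        chunk = connection.recv(1024)
        if not chunk:  # the other side closed the connection
            break
        buffer += chunk
    return buffer.decode("ascii").strip()


def demo() -> str:
    """Send one hypothetical number through a local socket pair and read it back."""
    dialogue_side, agent_side = socket.socketpair()
    with dialogue_side, agent_side:
        send_number(dialogue_side, "190001010000")
        return receive_number(agent_side)
```

With one connection per workstation, each insurance agent could receive the number for his/her own call, which the single shared database of the prototype cannot offer.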



8. Wordlist

Binder: Pärm
Callcentre: En arbetsplats där allt arbete sker över telefon. Används till exempel för försäljning, support och kundservice.
Cerebral Palsy: Cerebral Pares, CP
Cerebral haemorrhage: Hjärnblödning
Cerebral infarction: Hjärninfarkt
Congenital: Medfödd
Corpus: En samling information eller data som ska studeras
DCOM: En samling Microsoft-koncept och programgränssnitt i vilka klientobjekt kan be serverobjekt på andra datorer i ett nätverk om tjänster
Deficiency: Allvarlig brist
Digitise: Digitalisera, översätta analogt till digitalt
Heuristic: Heuristik, metod syftande till att låta någon vinna kunskap steg för steg, genom egen tankeverksamhet
Knob: Runt ratt eller knapp, exempelvis volymkontrollen på en stereo
Mnemonics: Minnesregel
Physiotherapist: Sjukgymnast
Sedentary: Stillasittande
Socket: Kombinationen av en stations IP-nummer och ett portnummer utgör en socket. Används vid dataöverföring
Spinal cord: Ryggmärgsnerven
Spinal marrow: Ryggmärg
Spinal marrow hernia: Ryggmärgsbråck
Unambiguous: Otvetydig
Usability: Användbarhet
Usefulness: Nytta
Userfriendliness: Användarvänlighet
Useworthiness: Användvärdhet
Utility: Funktionalitet
Utterance: Någonting man säger, till exempel ett ord eller en mening
Vertebra: Ryggkota



9. References

[1] “SpeechMania 99 True Dialog Application Creation Environment Developer’s Guide”, Philips Speech Processing, Aachen, 1999
[2] http://www.speech.philips.com/ud/get/Pages/t_131_u.htm, 4 August 2000
[3] http://www.speech.philips.com/ud/get/Pages/t_1311_u.htm, 4 August 2000
[4] http://www.speech.philips.com/ud/get/Pages/t_1312_u.htm, 4 August 2000
[5] http://www.speech.philips.com/ud/get/Pages/t_1321_u.htm, 4 August 2000
[6] http://www.speech.philips.com/ud/get/Pages/t_1322_u.htm, 4 August 2000
[7] http://www.speech.philips.com/ud/get/Pages/t_13121_u.htm, 4 August 2000
[8] http://www.speech.philips.com/ud/get/Pages/t_13122_u.htm, 4 August 2000
[9] http://www.speech.philips.com/ud/get/Pages/t_13123_u.htm, 4 August 2000
[10] G. Knall and Certec, “Medicinsk terminologi”, http://www.certec.lth.se/lectures/peter/f1/f1_text.html, 4 August 2000
[11] http://www.nhr.se, 4 August 2000
[12] http://www.nhr.se/diagnos/ryggmargsskador.html, 4 August 2000
[13] http://www.nhr.se/diagnos/cp.html, 4 August 2000
[14] http://www.nhr.se/diagnos/ms.html, 4 August 2000
[15] http://www.nhr.se/diagnos/stroke.html, 4 August 2000
[16] J. Nielsen, “Usability engineering”, Academic Press, 1993
[17] J. Löwgren, “Human-computer interaction”, Studentlitteratur, 1993
[18] H. Eftring, “The Useworthiness of Robots for People with Physical Disabilities”, Doctoral Dissertation, Certec, Lund Institute of Technology, number 1:1999
[19] B. Shneiderman, “Designing the user interface”, Third Edition, Addison-Wesley Longman Inc., 1998
[20] K. Rassmus-Gröhn and H. Danielsson, “Introduktion till MMI eller Hur du gör teknik lättanvänd”, http://www.certec.lth.se/lectures/kirre/mmi/, 4 August 2000
[21] “FreeSpeech 2000 English UK Manual”, Philips Speech Processing, Aachen, 1999
[22] “HDDL Dialog Description Language Version 2.2 User’s Guide”, Philips Speech Processing, Aachen, 1999
[23] Longman Dictionary of contemporary English, Third Edition, Longman Group Ltd, 1995
[24] Stora svensk-engelska ordboken, Nordstedts Förlag, 1988



Appendix A

Requirement Specification for Master Thesis

”Speech recognition – possibility and usability

for people with disabilities”

Version 1.0

2000-04-28

Anna Johnsson Sara Garmark

Softhouse

Certec, Department of Design Sciences, Lund Institute of Technology



Table of contents

1 INTRODUCTION
2 EXPLANATIONS
2.1 UML DEFINITIONS
2.2 OTHER DEFINITIONS
3 USE OF THE SYSTEM
3.1 DESCRIPTION OF USERS AND SYSTEM
3.2 INSURANCE AGENT
3.2.1 Answering the telephone
3.2.2 “Pressing” the secrecy button
3.2.3 Storing insurance-specific information
3.2.4 Searching for information
3.2.5 Connecting to/disconnecting from the telephone loop
3.2.6 Ending a call



1. Introduction

Finding work for people with limited mobility in their arms is not easy. For example, many people injured in traffic accidents break their necks and therefore cannot return to their previous work tasks. One idea that has come up for giving these people a new chance on the labour market is to construct a voice-controlled workplace. One can imagine an application where a person with a disability works in a computer-controlled callcentre using only the voice.

This document describes the requirements placed on a voice-controlled workplace for an insurance agent at Trygg Hansa. The agent shall be able to make outgoing campaign calls and sell accident insurance. Everything an agent does at the workplace using the computer and the telephone shall be possible to do by voice.

The workplaces at Trygg Hansa are equipped with a PC running Windows NT and a telephone with a headset. The PC is connected to the mainframe, an AS/400 system, on which all information is stored. The telephone is connected to a predictive dialling loop that continuously supplies the insurance agent with new calls. When the insurance agent gets in contact with a customer, information about the customer automatically appears on the screen. The information is presented in FRONTER, which is Trygg Hansa’s user interface for insurance sales. New information is also entered in FRONTER and then stored on the mainframe.



2. Explanations

This document uses a notation called UML. It begins with a short definition of the parts of UML that are used.

2.1 UML definitions

This section explains the UML notation used in the document.

Figure 1. UML notation

Actor: An actor is a person or a computer that uses the system or that the system uses. The actor is not part of the system to be developed.

Use case: The use case describes an actor’s use of the system. The use case always gives something useful back to its user.

Actor connected to a use case: An actor who uses a use case in the system. The direction of the arrow shows who initiates the use.



2.2 Other definitions

Bojnet: Trygg Hansa’s intranet. Information about all insurance terms and conditions is found here.

FRONTER: Trygg Hansa’s user interface for insurance sales. Contains all customer data and the calculation logic for quotations. FRONTER is based on the information in the mainframe.

Predictive dialling: Program used for calling campaign customers. The program continuously calls new customers so that there is a new customer on the line for the sales agent when he/she has finished with the previous customer.

UML: Unified Modelling Language



3 Use of the system

This section describes how the insurance agent interacts with the system. The people who are called are not users of the system and are therefore not included in the UML descriptions. The insurance agent can use the voice, but also the keyboard and mouse, to control the application.

Figure 2. Use of the system. [Use-case diagram. Actors: Insurance agent, Mainframe (AS/400), Bojnet (Internet Explorer). Use cases: Answer, “Press” the secrecy button, Store insurance-specific information, Search for information, Connect to/disconnect from the telephone loop, End call.]



3.1 Description of users and system

The following actors interact with the system:

Figure 3. Users and system

Insurance agent: An insurance agent who uses the system to call customers and sell accident insurance. He/she also uses the system to store data, calculate quotations and search for information.

Mainframe (AS/400): Application that contains all information about the customers and supplies it to FRONTER.

Bojnet (Internet Explorer): The insurance agent can search for information on Bojnet with the help of the system.



3.2 Insurance agent
This section describes what the insurance agent must be able to do with the help of the system.

3.2.1 Answer the telephone
The insurance agent must be able to answer the telephone when it rings. Whether this is done by voice control or in some other way, for example with a blow-operated mouthpiece, depends on Trygg Hansa's existing software.

3.2.2 "Press" the privacy button
When the insurance agent needs to communicate with the computer, he/she must be able to cut the customer off from the line using a so-called privacy (mute) button. How this is done also depends on Trygg Hansa's existing software.

3.2.3 Store insurance-specific information
Estimates, changes and registration of insurance policies are handled in FRONTER, which in turn stores the information in the Main Computer. All communication with FRONTER must be voice-controlled.

3.2.4 Search for information
General information about insurance and about Trygg Hansa is found on Bojnet, Trygg Hansa's intranet. The information search must be voice-controlled.

3.2.5 Connect to/disconnect from the telephone loop
The insurance agent must be able to connect to and disconnect from the predictive dialling loop. How this is done depends on Trygg Hansa's existing system.

3.2.6 End call
When a call is over, the receiver must be hung up. Whether this is voice-controlled or not also depends on Trygg Hansa's existing system.



Appendix B

Requirement Specification for part I of Master Thesis

“Speech recognition – possibility and usability for people with disabilities”


Version 2.0


2000 06 07

Softhouse

Certec, Department of Design Sciences, Lund Institute of Technology



Table of contents

1. Introduction
2. Explanations
   2.1 UML definitions
   2.2 Other definitions
3. The system and its users
   3.1 Description of the actors
   3.2 System requirements
      3.2.1 Open FRONTER
      3.2.2 Get customer in FRONTER
      3.2.3 Get customer details in FRONTER
      3.2.4 Make an insurance estimate
      3.2.5 Clear customer
      3.2.6 Close FRONTER
      3.2.7 Open Bojnet
      3.2.8 Search Bojnet for insurance information
      3.2.9 Close Bojnet
      3.2.10 Use Centenium



1. Introduction
Finding jobs for people who are paralysed from the neck down is not easy. This project is based on the idea that speech recognition could be used to bring these people back to work. The idea is to build an application with which a person who cannot move his/her arms can work in a computer-controlled callcentre using only his/her voice.

This document describes the requirements for a voice-controlled workstation at Trygg Hansa. A sales agent must be able to call up campaign customers and sell them accident insurance. Everything a sales agent uses the computer for must be voice-controlled.

The workstations at Trygg Hansa are equipped with a PC running Microsoft Windows NT and a telephone with a headset. To sell insurance, the sales agent works with a programme called FRONTER. This is a new graphical interface based on the application formerly used at Trygg Hansa, the Main Computer. The Main Computer contains all customer information and the logic for making insurance estimates. The telephone at the workstation is connected to a programme called Centenium, which dials up customers. When Centenium calls up a new customer, the customer information is automatically displayed in FRONTER. If a sales agent needs to find product information, he/she can browse Bojnet, the intranet at Trygg Hansa.

All programmes mentioned above will be voice-controlled with Philips FreeSpeech 2000.



2. Explanations
This document uses a notation called UML. It begins with a definition of the parts of UML that are used.

2.1 UML definitions
This section explains the UML notation used in this document.

Figure 2.1 UML notation

Actor
An actor is a person or a computer that uses the system or that the system uses. The actor is not a part of the system to be developed.

Use Case
A Use Case describes an actor's use of the system. A Use Case always gives something useful back to the user.

Actor and Use Case
An actor that uses a Use Case in the system. The direction of the arrow indicates who initiates the use.



2.2 Other definitions

Bojnet: The intranet at Trygg Hansa. It contains company and insurance-related information.

Centenium: The predictive dialling programme used at Trygg Hansa.

FRONTER: The graphical user interface for insurance handling at Trygg Hansa. FRONTER uses information from the Main Computer.

Predictive dialling: A technique used in computer programmes for calling new customers. The programme continuously calls up new customers so that there is always a customer on the line for the sales agent: as soon as he/she has finished with one customer, a new one is on the line. This technique can only be used at big callcentres with many sales agents and many customers.

UML: Unified Modelling Language



3. The system and its users
This section describes how the sales agent interacts with the system, that is, FRONTER, Centenium and Bojnet. The sales agent should be able to control the whole system by voice using Philips FreeSpeech 2000. A customer who gets a call from a sales agent is not a user of the system and is therefore not included in the UML-description of the system requirements in figure 3.1.

Figure 3.1 UML-description of the system requirements
(Use case diagram. Actors: Sales agent; Main Computer; Telephone System. Use cases: Open FRONTER; Get customer in FRONTER; Get customer details in FRONTER; Make an insurance estimate; Clear customer; Close FRONTER; Open Bojnet; Search Bojnet for insurance information; Close Bojnet; Use Centenium.)



3.1 Description of the actors
The following actors interact with the system:

Main Computer
The former AS/400 system used for insurance handling. It holds all customer information and the logic for making insurance estimates.

Sales agent
A sales agent is a person at Trygg Hansa who uses the system to call up customers and sell accident insurance. He/she also uses the system to save customer information, make estimates and look for information on Bojnet.

Telephone System
The telephone switch connected to Centenium.

3.2 System requirements
These are the functions that the sales agent needs to perform in his/her work. All of them must be possible to carry out by voice using Philips FreeSpeech 2000.

3.2.1 Open FRONTER
The sales agent must be able to open FRONTER just by saying “Öppna FRONTER” (“Open FRONTER”).

3.2.2 Get customer in FRONTER
When a social security number or insurance number is entered in FRONTER, the customer and his/her household's insurance policies are displayed.

3.2.3 Get customer details in FRONTER
It must be possible to retrieve more detailed information about a customer, such as his/her addresses and phone numbers.

3.2.4 Make an insurance estimate
The sales agent must be able to fill in the information in FRONTER that is needed to make an insurance estimate.



3.2.5 Clear customer
When the sales agent has finished with a customer, he/she needs to be able to remove the customer information from the window.

3.2.6 Close FRONTER
After finishing the work in FRONTER, the sales agent must be able to close FRONTER.

3.2.7 Open Bojnet
The sales agent must be able to open Bojnet just by saying “Öppna Bojnet” (“Open Bojnet”).

3.2.8 Search Bojnet for insurance information
Bojnet is used to search for information about insurance and about Trygg Hansa. The sales agent must be able to browse Bojnet by voice.

3.2.9 Close Bojnet
When the sales agent has finished using Bojnet, he/she must be able to close the Internet browser.

3.2.10 Use Centenium
To use Centenium, the sales agent needs to be able to open all the menus and click all the buttons.
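The requirements in section 3.2 amount to a mapping from spoken phrases to workstation actions. The following is a minimal sketch of such a mapping, operating on text that a recogniser has already produced. It does not use the Philips FreeSpeech 2000 interface, which is not described here. Only the phrases “Öppna FRONTER” and “Öppna Bojnet” come from this specification; the other phrases and all function and action names are assumptions made for illustration.

```python
# Hypothetical sketch: maps recognised phrases to workstation actions.
# Only "Öppna FRONTER" and "Öppna Bojnet" come from the specification;
# every other phrase and all names below are illustrative assumptions.


def normalise(phrase: str) -> str:
    """Lower-case the phrase and collapse whitespace for tolerant matching."""
    return " ".join(phrase.lower().split())


COMMANDS = {
    normalise("Öppna FRONTER"): "open_fronter",    # 3.2.1 (phrase from the specification)
    normalise("Rensa kund"): "clear_customer",     # 3.2.5 (assumed phrase)
    normalise("Stäng FRONTER"): "close_fronter",   # 3.2.6 (assumed phrase)
    normalise("Öppna Bojnet"): "open_bojnet",      # 3.2.7 (phrase from the specification)
    normalise("Stäng Bojnet"): "close_bojnet",     # 3.2.9 (assumed phrase)
}


def dispatch(recognised_text: str) -> str:
    """Return the action name for a recognised phrase, or "unknown"."""
    return COMMANDS.get(normalise(recognised_text), "unknown")


if __name__ == "__main__":
    for spoken in ["Öppna FRONTER", "öppna  bojnet", "Hej"]:
        print(spoken, "->", dispatch(spoken))
```

Keeping the phrases in one table makes it easy to see which requirements have a spoken command and which, such as filling in an estimate, would need dictation rather than a fixed phrase.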



Appendix C

Solution Specification for part II of Master Thesis

“Speech recognition – possibility and usability for people with disabilities”


Version 1.0


2000 08 08

Softhouse

Certec, Department of Design Sciences, Lund Institute of Technology



Table of contents

1. Introduction
2. Explanations
   2.1 UML definitions
   2.2 Other definitions
3. The system and its users
   3.1 Description of the actors
   3.2 System requirements
      3.2.1 Customer interaction
      3.2.2 System administrator interaction
   3.3 The customer dialogue



1. Introduction
A speech recognition system could be used to speed up the service at Trygg Hansa customer service. When a customer calls Trygg Hansa, he/she could be prompted to say his/her insurance number or social security number. The number, and thereby the customer information, could then appear on the screen of the insurance agent who takes the call. This would save time, since the insurance agent does not have to ask for the number and then enter it. The insurance agent's work will thus become more efficient, as he/she can take more calls.

This document describes a proposed solution for a speech recognition system at Trygg Hansa. However, the solution is not being developed in co-operation with Trygg Hansa, so a prototype will be developed. The speech recognition programme that will be used for the application is Philips SpeechMania.



2. Explanations
This document uses a notation called UML. It begins with a definition of the parts of UML that are used.

2.1 UML definitions
This section explains the UML notation used in this document.

Actor
An actor is a person or a computer that uses the system or that the system uses. The actor is not a part of the system to be developed.

Use Case
A Use Case describes an actor's use of the system. A Use Case always gives something useful back to the user.

Actor and Use Case
An actor that uses a Use Case in the system. The direction of the arrow indicates who initiates the use.

Use Case 1 and Use Case 2
Use Case 2 uses results from Use Case 1.



2.2 Other definitions

UML: Unified Modelling Language.

SpeechMania: An automatic speech recognition programme with language dialogue control over the telephone.



3. The system and its users
This section describes how the customer interacts with the system. The system is a speech recognition system that allows the customer to say his/her insurance number or social security number in order to get faster service. If the customer chooses not to say a number, he/she will be connected to an insurance agent. The insurance agent is not a user of the system and is therefore not in the UML-description of the system requirements in figure 3.1.

Figure 3.1 UML-description of the system requirements.
(Use case diagram. Actors: Customer; System administrator; Telephone system; Insurance agent. Use cases: Call the speech recognition system at Trygg Hansa; Read in insurance number or social security number; Present information; Start and stop the system; Set system parameters.)



3.1 Description of the actors
The following actors interact with the system:

Customer
A customer calls Trygg Hansa customer service. He/she chooses either to say the insurance number or social security number, or to be connected directly to an insurance agent.

System administrator
An administrator changes parameters and starts/stops the speech recognition system.

Telephone System
The telephone switch, which connects the customers to the insurance agents.

Insurance agent
The insurance agent receives the phone call, and the customer information is displayed on his/her computer screen.

3.2 System requirements
These are the functions that the customer and the system administrator need to perform with the system.

3.2.1 Customer interaction
When the customer calls Trygg Hansa customer service, he/she will be prompted by the speech recognition system to say his/her insurance number or social security number in order to get faster service. After saying the number, the customer will be connected to an insurance agent and the number will be displayed in a window on the agent's computer. If the customer chooses not to say a number, he/she will be connected to an insurance agent.



3.2.2 System administrator interaction

Start and stop the system
The system administrator shall be able to start and stop the speech recognition programme.

Set system parameters
The administrator shall be able to set system parameters, for example how many mistakes are allowed before the customer is automatically connected to an insurance agent.

3.3 The customer dialogue
All communication between the customer and the system will be verbal. An example of a dialogue is presented below, with an English translation in parentheses.

System: “Välkommen till Trygg Hansas kundservice. Vill du få snabbare service genom att uppge försäkringsnummer eller personnummer?” (Welcome to Trygg Hansa customer service. Would you like faster service by giving your insurance number or social security number?)
Customer: “Ja.” (Yes.)
System: “Var god tala in ditt niosiffriga försäkringsnummer, siffra för siffra, eller ditt tolvsiffriga personnummer, två siffror åt gången.” (Please say your nine-digit insurance number, digit by digit, or your twelve-digit social security number, two digits at a time.)
Customer: “Nitton femtiotvå nollett nollett tolv trettiofyra.” (Nineteen fifty-two zero-one zero-one twelve thirty-four.)
System: “Stämmer det att ditt nummer är nitton femtiotvå nollett nollett tolv trettiofyra?” (Is it correct that your number is nineteen fifty-two zero-one zero-one twelve thirty-four?)
Customer: “Ja.” (Yes.)
System: “Du kopplas nu till en försäkringstjänsteman.” (You will now be connected to an insurance agent.)

This dialogue only shows a typical case with no mistakes by either the customer or the system. At most points in the dialogue it should be possible to get more information by saying “Hjälp” (Help) or to have information repeated by saying “Upprepa” (Repeat). When the help function has been activated, it should be possible to say “Koppla” (Connect) and be connected to manual service. If the system has misinterpreted the customer twice, the customer will be connected to manual service.
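The dialogue rules can be sketched as a small state machine. The sketch below is an illustration under stated assumptions, not the SpeechMania implementation: it works on text that has already been recognised, it assumes that a spoken number arrives as a plain digit string, and it assumes the word “Nej” (No) for declining, which the specification does not name. The class, method and action names are hypothetical. The limit of two mistakes follows section 3.3 and is exposed as the kind of system parameter described in section 3.2.2.

```python
# Hypothetical sketch of the dialogue rules in section 3.3.
# It processes already-recognised text and does not use the SpeechMania API.
# All names (Dialogue, step, max_mistakes, action strings) are assumptions.


def is_valid_number(digits: str) -> bool:
    """Nine-digit insurance number or twelve-digit social security number."""
    return digits.isdigit() and len(digits) in (9, 12)


class Dialogue:
    def __init__(self, max_mistakes: int = 2):
        self.max_mistakes = max_mistakes  # system parameter set by the administrator
        self.mistakes = 0
        self.help_active = False
        self.state = "offer"              # offer -> number -> confirm -> connected
        self.number = None                # number shown to the insurance agent, if any

    def _connect(self) -> str:
        self.state = "connected"
        return "connect"

    def _mistake(self) -> str:
        """Count a misinterpretation; hand over to manual service at the limit."""
        self.mistakes += 1
        if self.mistakes >= self.max_mistakes:
            return self._connect()
        return "retry"

    def step(self, heard: str) -> str:
        """Process one recognised utterance and return the system's next action."""
        word = heard.strip().lower()
        if self.state == "connected":
            return "connect"
        if word == "hjälp":
            self.help_active = True
            return "help"
        if word == "upprepa":
            return "repeat"
        if word == "koppla" and self.help_active:
            return self._connect()
        if self.state == "offer":
            if word == "ja":
                self.state = "number"
                return "ask_number"
            if word == "nej":             # assumed word for declining
                return self._connect()
            return self._mistake()
        if self.state == "number":
            if is_valid_number(word):
                self.number = word
                self.state = "confirm"
                return "confirm_number"
            return self._mistake()
        # Remaining state: "confirm"
        if word == "ja":
            return self._connect()
        self.number = None                # the customer rejected the number
        self.state = "number"
        return self._mistake()
```

In the error-free example dialogue, the calls `step("Ja")`, `step("195201011234")` and `step("Ja")` would return `"ask_number"`, `"confirm_number"` and `"connect"`, leaving the twelve-digit number available for display to the insurance agent.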



Appendix D: Dialogue flow scheme


