
Group assignment Final report, IN5480

How can chatbots communicate abilities and limitations

IN5480

Aleksander Erichson | [email protected]

Martin Arentsen Espeland | [email protected]

Marius Bråthen | [email protected]

18.10.2018

1. Description of group

Our group consists of three second-year master's students from the programme Informatics: Design, Use, Interaction: Aleksander Erichson, Martin Arentsen Espeland and Marius Bråthen.

2. Area of interest

Our interest in interaction with AI is focused on communication involving chatbots or conversational agents. We are interested in commercial chatbots that provide customer service in the form of information or by executing services for the users. In particular, we are interested in how chatbots present their limitations in terms of what they can and cannot help with, as well as how they present themselves when something goes wrong.

We have chosen this theme because of the recent development of chatbots and their popular use in several settings, such as commercial, social, informational and entertainment purposes (Brandtzaeg and Følstad, 2017).

As a basis for finding research questions in the area of interaction with AI, we tested some Norwegian text-based conversational agents, namely those of DNB, Nordea and Hafslund Strøm, and Kommune-Kari.

3. Questions

Below we present the questions we are most interested in addressing within the area we have chosen.

1. How can chatbots efficiently present their features and limitations to the user regarding what they can and cannot do?

We noticed that, at the start of the interaction, the chatbots we tested began by presenting a text describing their purpose. Additionally, the chatbots gave some information about their limitations regarding what they were able to answer questions about, and sometimes specified which sentence structures they understand best. We found this interesting and wanted to know whether this is an efficient way to present the limitations of the agent. We wonder whether a poor presentation of limitations could affect the expectations users have of the chatbot. Conversational agents that fail to present their limitations, yet are seemingly good in some areas, can give users false expectations of what the agent can and cannot do (Luger and Sellen, 2016).

2. How can chatbots provide meaningful information when something goes wrong or

they cannot answer the questions the user asks?

We believe an interesting question to address is how chatbots provide information when something goes wrong or when the chatbot is not able to answer a question. We saw several similarities in how the chatbots we tested presented themselves when we asked a question they did not understand or were otherwise unable to answer.

4. Background

4.1 Conversational agents (CA) / Chatbots

In this section we will first present some of the existing research and knowledge on the

chosen topic Conversational Agents (CA) / Chatbots and then provide what we believe is

missing as well as position our work. We start with a brief introduction to CAs and chatbots.

Chatbots are on the rise. “In Spring 2016, Facebook and Microsoft provided resources for creating chatbots to be integrated into their respective messaging platforms, Messenger and Skype. One year later, more than 30,000 chatbots have been launched on Facebook Messenger” (Brandtzaeg and Følstad, 2017). The recent popularity of and renewed interest in chatbots can be attributed to advancements in artificial intelligence and machine learning. Chatbots are potentially a great interface for people to interact with services that can be automated by AI or robots.

But what exactly do we mean when we talk about chatbots? Whether you call them conversational agents, chatbots, or something else of this nature, the term usually refers to applications where a human can communicate with a machine through a natural language interface, using natural language in some form (Dale, 2016). The most common forms of communication are speech and text, often with the intent of obtaining information, assistance with a task or entertainment, or for relational purposes (Brandtzaeg & Følstad, 2017).

Chatbots can typically be divided into three categories according to how they function on a technical level:

● Generative

● Action and intent

● Scripted

In this paper we narrow in on the “Action and intent” type of chatbot.

4.2 Related work

To position our work, we needed an overview of the field and of the research already done on how chatbots present features and limitations, in addition to our own observations from the initial testing of different chatbots. Firstly, we wanted to know whether there is any need to look at the feedback chatbots give about their features and limitations. To this end, we read Luger and Sellen's study of people's daily usage of task-driven conversational agents (CAs) and the interactional factors that affect that usage (Luger & Sellen, 2016). The study presents a key finding relevant to our interest:

● They pointed out the problem users have in assessing the system's capabilities and “intelligence”, as well as a lack of feedback. Half of the participants reported that they did not know what the CA could do (Luger & Sellen, 2016).

The paper lists a series of suggestions for developers designing CAs. We want to highlight the following two:

1. Consider new ways of conveying CA capability through interactions

Here they suggest looking at other ways the CA can convey its capability, especially when it struggles. They say that users often tend to blame themselves and abandon particular types of task requests (Luger & Sellen, 2016).

2. Rethink system feedback and design goals in light of the dominant use case

Because a majority of users stop using the CA when it stops providing utility, the authors suggest rethinking the design goals of the CA system to deliver a more compelling user experience and to reflect the dominant use case (Luger & Sellen, 2016).

In the article “I’m Sorry, Dave, I’m Afraid I Can’t Do That: Chatbot Perception and Expectations”, Jennifer Zamora explores how to better understand the role and purpose of chatbots. This is done from a user-centred perspective and aims to clarify the expectations users build towards interacting with chatbots. The concluding remarks of the article identify knowing what input a chatbot accepts as one of the main issues for users. As a design guideline, the article suggests that to enhance the user experience and better set expectations, future chatbots should be designed with multiple input modalities: “Providing a secondary input channel such as displaying menu options or including voice input in addition to text allows for better experiences that can reduce error or recovery time” (Zamora, 2017).

This brings us to our next question: what design implications are the most common among articles? The article “Here’s What I Can Do: Chatbots’ Strategies to Convey Their Features to Users” (Valério et al., 2017) attempts to answer this.

The goal of this paper was to analyze the communicative strategies that popular chatbots use to convey their features to users. The chatbots were selected based on their popularity, taken as an indication of possibly good design decisions. The bots were related in that they were all available on the Facebook Messenger platform and had a similar purpose. The chatbots selected were TechCrunch, CNN, and the Wall Street Journal (WSJ) (Valério et al., 2017). After the analysis, Valério et al. presented 11 strategies used by chatbots to convey their features to users:

● S1 – Showing the main feature on the first message

● S2 – Guiding the user through a short tutorial during first messages

● S3 – Suggesting the next possible set of actions to the user.

● S4 – Having a persistent menu with main features.

● S5 – Having a main menu with main features.

● S6 – Having a list of available commands

● S7 – Offering contextual help about a feature.

● S8 – Showing the main menu or the most frequent features when user says “help”.

● S9 – Showing the main menu or main features when user says something the bot

cannot understand.

● S10 – Showing the persistent menu instead of a text-input box.

● S11 – Highlighting the most important features.
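To make a few of these strategies concrete, the following is a minimal sketch of how S1, S8 and S9 could look in a simple keyword-matching bot. It is not taken from Valério et al. or from any of the chatbots we tested; the menu entries, the keyword table and the function names are hypothetical, and a real “Action and intent” chatbot would use a trained language-understanding component rather than keyword lookup.

```python
# Hypothetical feature menu for an illustrative banking bot.
MAIN_MENU = ["Check balance", "Block card", "Opening hours"]

# Hypothetical keyword-to-intent table (an assumption for this sketch);
# a real bot would classify intents with a trained model instead.
INTENT_KEYWORDS = {
    "balance": "Check balance",
    "block": "Block card",
    "hours": "Opening hours",
}


def menu_text() -> str:
    """Render the main features as one sentence."""
    return "I can help with: " + ", ".join(MAIN_MENU) + "."


def greeting() -> str:
    # S1: show the main features in the first message.
    return "Hi! " + menu_text()


def reply(message: str) -> str:
    text = message.lower()
    if "help" in text:
        # S8: show the main menu when the user says "help".
        return menu_text()
    for keyword, intent in INTENT_KEYWORDS.items():
        if keyword in text:
            return f"Starting: {intent}"
    # S9: fall back to the main menu when the input is not understood.
    return "Sorry, I did not understand that. " + menu_text()
```

Calling `greeting()` gives the opening message, and `reply(...)` handles each later user message; an unrecognized message is answered with an apology followed by the same menu text that “help” would return.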

4.3 Our position

Having found that it is important for a chatbot to show its capabilities, and having looked at some of the main strategies designers use to inform users about a chatbot's features, the question remains whether these strategies are effective and how users respond to them. In our paper we want to test some of the most popular strategies with real users to determine what works and what doesn't.

5. Methods

We consider these research questions and this paper to belong to the interpretive paradigm and to be set within a case study. The methods we intend to use reflect this general approach to the paper and to the questions we have chosen to investigate within the theme of interaction with artificial intelligence.

5.1 Overall approach

On a general level, we want to perform a literature review of the field to understand what has already been done on the research questions we have proposed.

From there, it would be beneficial to assess some chatbots that are released and in real-life use today, and to review how they inform their users of their limitations and capabilities.

After generating enough data through the first iterations of the literature review and the assessment of today's solutions, it could be interesting to create chatbot prototypes that follow design guidelines given in earlier papers, and to try out more experimental ways of presenting limitations and capabilities.

5.2 Data collection methods

We want to use qualitative research methods to answer our research questions. The methods we want to use to address the questions in the assignment are as follows:

1. User testing | Usability testing

We want to give test groups a couple of open-ended tasks to be solved through interaction with a chatbot. Through these sessions we want to gather data on how users respond to the chatbot's presentation of its limitations, and how they react when a limitation occurs.

Usability testing will also be applied in our design process, when creating chatbot prototypes with which we can evaluate different approaches to presenting limitations and capabilities.

2. Semi-structured interviews

We also want to use semi-structured interviews to gather qualitative data on users' views on how limitations in chatbots can be presented in an efficient way, and how chatbots should handle the interaction when limitations actually occur.

Observations and questionnaires are additional methods we are considering.

5.3 Design approach

We want to create the prototypes used in our assignment through a user-centered design process (User-Centered Design Basics | Usability.gov).

For this we will find a specific context of use, such as questions regarding banking, as with the DNB chatbot, or communal services, as with “Kommune-Kari”. Within this context we will try to find suitable use cases and scenarios for users to accomplish, and specify the requirements for these conditions to be met.

To create the design solutions, we will use services like Dialogflow or Chatfuel to build high-fidelity prototypes that can be used in usability testing.

As mentioned earlier in our paper, we will use user testing as a method of data collection. This will be applied in our design approach as well, to create prototypes that can produce a solid data foundation to analyse for our research questions.

6. Sketches

7. Findings

No prototypes have been developed and no evaluations have been carried out at this time.

8. Evaluation plan

To decide upon an evaluation method for our study, we asked ourselves what kind of data we need to best answer our research questions. To find the right method for our project, we present our study in the light of task-oriented and ability-oriented evaluation methods, and then look further into UX evaluation methods and principles.

Task-oriented evaluation and ability-oriented evaluation

First we looked into the possibility of evaluating the chatbot we are developing through task-oriented evaluation, since we are making a “Specialized AI system”. We originally thought this evaluation would fit because the goal was not to evaluate the system's intelligence, but rather the presentation of abilities and limitations related to banking and communal services. After some research we did not find that any of the categories “Human discrimination”, “Problem benchmarks” or “Peer confrontation” would help us answer our research question or be a sufficient way of evaluating our chatbot (Hernández-Orallo, 2017). We then looked at the possibility of ability-oriented evaluation, because the paper suggests that “there are some kinds of AI systems for which task-oriented evaluation is not appropriate” (Hernández-Orallo, 2017), with chatbots given as an example. The main problem here is that we are not evaluating an AI system as a whole or its intelligence; rather, we evaluate the different strategies it uses to convey its features and limitations to the user, from the user's perspective.

UX

Because our research questions are better suited to being measured from a user-centred perspective, we want to use UX-related principles when evaluating our study. Lindblom and Andreasson (2016) explain two main aspects of evaluating AI from a UX perspective: pragmatic and hedonic. It is within the pragmatic aspect that we want to study the “usability” of the techniques chatbots use to display their abilities and limitations. Here we will also try to evaluate effectiveness, efficiency, satisfaction, ease of use and learnability by applying the design techniques to chatbot prototypes and testing them on users, benchmarking them against neutral chatbots. Our evaluation can be described as formative, as the design of the chatbots is an iterative process, and we will use the methods described in 5.2 as well as the evaluation method described here.
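As an illustration of how two of these values could be compared between a prototype and a neutral chatbot, the sketch below treats effectiveness as the share of completed tasks and efficiency as the mean time spent on completed tasks. This is one common way to operationalize the terms, not a procedure we have carried out; the `Session` record, the variant labels and all numbers are made-up placeholders, not collected data.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Session:
    variant: str     # hypothetical label, e.g. "menu" or "neutral"
    completed: bool  # did the participant finish the task?
    seconds: float   # time spent on the task


def effectiveness(sessions, variant):
    """Share of tasks completed for one prototype variant."""
    runs = [s for s in sessions if s.variant == variant]
    return sum(s.completed for s in runs) / len(runs)


def efficiency(sessions, variant):
    """Mean time in seconds over completed tasks for one variant."""
    times = [s.seconds for s in sessions
             if s.variant == variant and s.completed]
    return mean(times)


# Made-up placeholder sessions, for illustration only.
sessions = [
    Session("menu", True, 40.0),
    Session("menu", True, 60.0),
    Session("neutral", True, 90.0),
    Session("neutral", False, 120.0),
]
```

Satisfaction, ease of use and learnability are less suited to this kind of counting and would more naturally come from the interviews described in 5.2.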

Reflection

We are happy with the proposed research plan, but we would have liked to have made a working prototype implementing some of the strategies presented in “Here’s What I Can Do: Chatbots’ Strategies to Convey Their Features to Users” (Valério et al., 2017). This would have given us the opportunity to gather data and present some initial findings.

We would also have liked to find literature discussing different strategies for providing meaningful information when something goes wrong or when a chatbot cannot answer the questions the user asks. Here we would need to do our own research into how chatbots normally handle those situations.

9. References

Brandtzaeg, P. B. and Følstad, A. (2017) ‘Why People Use Chatbots’, in Kompatsiaris, I. et

al. (eds) Internet Science. Cham: Springer International Publishing, pp. 377–392. doi:

10.1007/978-3-319-70284-1_30.

Luger, E. and Sellen, A. (2016) ‘“Like Having a Really Bad PA”: The Gulf between User

Expectation and Experience of Conversational Agents’, in Proceedings of the 2016 CHI

Conference on Human Factors in Computing Systems - CHI ’16. the 2016 CHI Conference,

Santa Clara, California

Dale, R. (2016) ‘The return of the chatbots’, Natural Language Engineering, 22(05), pp. 811–

817. doi: 10.1017/S1351324916000243.

Hernández-Orallo, J. (2017) ‘Evaluation in artificial intelligence: from task-oriented to

ability-oriented measurement’, Artificial Intelligence Review, 48(3), pp. 397–447. doi:

10.1007/s10462-016-9505-7.

Lindblom, J. and Andreasson, R. (2016) ‘Current Challenges for UX Evaluation of Human-

Robot Interaction’, in Schlick, C. and Trzcieliński, S. (eds) Advances in Ergonomics of

Manufacturing: Managing the Enterprise of the Future. Springer International Publishing

(Advances in Intelligent Systems and Computing), pp. 267–277.

User-Centered Design Basics | Usability.gov (no date). Available at:

https://www.usability.gov/what-and-why/user-centered-design.html (Accessed: 12 October

2018).

Valério, F. A. M. et al. (2017) ‘Here’s What I Can Do: Chatbots’ Strategies to Convey Their

Features to Users’, in Proceedings of the XVI Brazilian Symposium on Human Factors in

Computing Systems - IHC 2017. Brazilian Symposium on Human Factors in Computing

Systems, Joinville, Brazil: ACM Press, pp. 1–10. doi: 10.1145/3160504.3160544.

Zamora, J. (2017) ‘I’m Sorry, Dave, I’m Afraid I Can’t Do That: Chatbot Perception and

Expectations’, in. ACM Press, pp. 253–260. doi: 10.1145/3125739.3125766.

Appendix

Appendix 1 - Conversational interaction assignment

Creating Filmbot

“Filmbot” is a chatbot created for people who struggle to decide what movie to watch. Filmbot suggests movies based on the user's preferred categories, or gives randomized suggestions if the user just wants a movie suggestion without any category preference.
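The suggestion logic described above can be sketched in a few lines of Python. Filmbot itself was built in Chatfuel, not in code, so this is only an illustration of the behaviour; the catalogue contents and the `suggest` function are hypothetical.

```python
import random

# Hypothetical catalogue; the report does not list Filmbot's actual movies.
MOVIES = {
    "comedy": ["Comedy A", "Comedy B"],
    "horror": ["Horror A", "Horror B"],
}


def suggest(category=None, rng=random):
    """Suggest a movie from a category, or from all movies if none is given."""
    if category is None:
        # No preference: pick from the whole catalogue.
        pool = [title for titles in MOVIES.values() for title in titles]
    else:
        pool = MOVIES.get(category.lower(), [])
    if not pool:
        # Unknown category: return nothing so the bot can say so
        # instead of guessing.
        return None
    return rng.choice(pool)
```

Returning `None` for an unknown category leaves it to the conversation layer to tell the user which categories exist, which ties in with the question of how a chatbot communicates its limitations.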

Creating Filmbot was overall a pleasant experience, but there were some bumps in the road connected to the use of Chatfuel. There are three members in the group, and only two of us managed to get access to make changes to the chatbot in Chatfuel. This was annoying, but we worked around it by creating Filmbot on a big screen so that everyone could take part in the process.

The creation of Filmbot began with a planning process in which we decided what kind of chatbot we wanted to create. The idea of a chatbot that could come up with movie suggestions came quickly, and we then started working out how we wanted the chatbot to interact and respond (“setting up the AI”). Creating Filmbot was not very complicated, but it took some time for everybody to get used to Chatfuel and to learn how to use it efficiently. Once we had become familiar with the tool, the creation of Filmbot went on without any problems, and the process consisted of creating and testing each new “feature” as it was implemented.

The main things we considered important when creating Filmbot were supplying the users with enough options (not too many, as there is a fine line between too many choices and just enough choices) and creating a chatbot that does exactly what it is expected to do.

Appendix 2 - Machine Learning assignment

Unfortunately, none of the team members were able to set up the environment to run the Python scripts for the movie chatbot. We therefore did not get much output from this group assignment before the second lecture on machine learning. During the lecture there were good and interesting examples of how configuring the different layers affected the accuracy and validity of the answers the AI gave back when communicating with it.

It would have been interesting to try out the generative chatbot model as well. Either way it

was a fun and interesting discussion in class on the subject of machine learning.

Appendix 3 - Problems with AI

For this assignment we selected a clip from the season twenty-one premiere episode of South Park, "White People Renovating Houses". The clip is from Twitter and located here: https://twitter.com/MoritzWittmann/status/908319633660416001

What happened/ what was the problem?

In the animated TV show, one of the main characters (Eric Cartman) gives a set of voice commands to his Amazon Alexa, for example “tell me a joke” and “set an alarm for 7:00 AM”, and adds different things to his shopping cart in the Amazon store. These voice commands triggered multiple Amazon Alexa devices in the real world to do the same as the device did on the TV show. It was reported that a lot of people woke up to an alarm at 7 AM the day after the episode aired and found a lot of weird items added to their shopping carts.

So what went wrong?

We think the Amazon Alexa device could not tell the difference between commands coming from the TV and commands given by the actual user of the device. We don't think Amazon anticipated that someone would broadcast voice commands in a TV series.

Solution

One solution that we know is present in other Natural Language Interfaces is the ability to

differentiate voices and train the interface to only execute commands given from the actual

owner of the device.

Appendix 4 - Human-machine partnership

We envision that the human resources intelligent agent (robot) could have the following

functionality and ability to perform tasks such as:

1. Able to find suitable candidates for job openings

2. Conduct interviews with candidates

3. Conduct tests with possible candidates

4. Perform background checks on candidates

5. Make appointments with candidates for interviews and tests

6. Create drafts for employment contracts

7. Create job postings

The functionality is based on both current technology and what we envision will be possible

in the near future.

Scenario 1 - automation level 7

In the first scenario the robot generates all recommendations for the human, for all tasks from 1 to 7. The robot generates all ideas for actions and only needs the approval of the human before it performs the tasks; alternatively, it performs the tasks the human inputs instead of the robot's own solutions.

Advantages and disadvantages

Advantages:

- Efficient; cuts the time spent on tasks

- Needs less manpower

- A possibly better foundation on which to base human resources decisions

Disadvantages:

- The data generated for consideration could be overwhelming for human resources, creating human bottlenecks.

Possible problems and how to overcome them

A problem related to Task 1 is whether the robot has actually selected the right people for the job. If it scans sites like LinkedIn and the job applications sent in, and generates options for the human to decide upon, it can be difficult to know whether the right people are selected. Here the answer lies in either having a form that applicants fill out, so that the input is standardized each time, or letting the human look through the recommended people before deciding. Another possible solution is that, if the robot is uncertain of a candidate's experience or abilities, it can ask the person, or place the candidate in a “Possible candidate” or “Needs more info” group for human resources to investigate.

If the robot is to conduct interviews, it needs a high level of semantic understanding so that it does not misunderstand the applicant. Here we need to let the human observe the interview, either live or after it is done.
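The grouping idea for Task 1 can be sketched as a small triage function over a standardized application form. This is only an illustration under our own assumptions: the form fields, the experience threshold and the third group “Not recommended” are hypothetical and not part of the scenario description.

```python
# Hypothetical fields of the standardized application form.
REQUIRED_FIELDS = ("experience_years", "education")


def triage(application: dict, min_years: int = 2) -> str:
    """Sort one standardized application into a group for human review."""
    # Missing answers mean the robot is uncertain, so it asks for more info
    # rather than deciding on incomplete data.
    if any(application.get(field) is None for field in REQUIRED_FIELDS):
        return "Needs more info"
    if application["experience_years"] >= min_years:
        return "Possible candidate"
    # Assumed extra group; the human can still review these candidates.
    return "Not recommended"
```

For example, `triage({"experience_years": 5, "education": "MSc"})` places the applicant in “Possible candidate”, while an application with a missing field ends up in “Needs more info”. In every case the output is a recommendation, and the decision stays with the human, as automation level 7 requires.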

Scenario 2 - automation level 10

In the second scenario the robot has an automation level of 10. Here the robot ignores the human and acts autonomously. Tasks 1 to 7 are performed by the robot and the whole hiring process is decided by it. The final decision to hire the selected person is also carried out by the robot, effectively as a task number 8.

Advantages and disadvantages

Advantages:

- Big time and cost saving

- No human errors or biased hirings

Disadvantages:

- The human resources intelligent agent may not be able to interpret people in the same

way as humans.

- No way of knowing that you actually hired the right person

- The robot is in charge, not the humans

Possible problems and how to overcome them

Here the problem is that the human has no way of controlling the decisions made by the robot. The only solution we see is to restrict the tasks the robot can perform if it is to work fully autonomously. Another possible problem is that a candidate who knows how the robot works and what it is looking for could possibly trick it into hiring them.