Group assignment Final report, IN5480
How can chatbots communicate abilities and limitations
Aleksander Erichson | [email protected]
Martin Arentsen Espeland | [email protected]
Marius Bråthen | [email protected]
18.10.2018
1. Description of group
Our group consists of three second-year master's students from the programme Informatics:
Design, Use, Interaction: Aleksander Erichson, Martin Arentsen Espeland and Marius Bråthen.
2. Area of interest
Our interest in interaction with AI is focused on communication involving chatbots, or
conversational agents. We are interested in commercial chatbots that provide customer service
in the form of information or by executing services for users. In particular, we are interested
in how chatbots present their limitations, in terms of what they can and cannot help with, as
well as how they present themselves when something goes wrong.
We have chosen this theme because of the recent development of chatbots and their
popularity in several settings, such as commercial, social, informational and
entertainment purposes (Brandtzaeg and Følstad, 2017).
As a basis for finding research questions in the area of interaction with AI, we tested some
Norwegian text-based conversational agents, namely DNB, Nordea, Hafslund Strøm and
Kommune-Kari.
3. Questions
Below we present the questions we are most interested in addressing within the area we
have chosen.
1. How can chatbots efficiently present their features and limitations to the user regarding
what they can and cannot do?
We noticed that the chatbots we tested opened the interaction with a text describing their
purpose. Additionally, the chatbots gave some information about their limitations regarding
which questions they were able to answer, and sometimes specified which sentence structures
they understand best. We found this interesting and wanted to know whether it is an efficient
way to present the limitations of the agent. We wonder whether a poor presentation of
limitations when interacting with a chatbot could affect the expectations users have of the
chatbot. We have seen that when conversational agents fail to present their limitations while
being seemingly good in some areas, users get a false expectation of what the agent can
and cannot do (Luger and Sellen, 2016).
2. How can chatbots provide meaningful information when something goes wrong or
they cannot answer the questions the user asks?
We believe an interesting question to address is how chatbots provide information when
something goes wrong or when they are not able to answer a question. We saw several
similarities in how the chatbots we tested responded when we asked a question they did not
understand or were otherwise unable to answer.
4. Background
4.1 Conversational agents (CA) / Chatbots
In this section we will first present some of the existing research and knowledge on the
chosen topic Conversational Agents (CA) / Chatbots and then provide what we believe is
missing as well as position our work. We start with a brief introduction to CAs and chatbots.
Chatbots are on the rise. “In Spring 2016, Facebook and Microsoft provided resources for
creating chatbots to be integrated into their respective messaging platforms, Messenger and
Skype. One year later, more than 30,000 chatbots have been launched on Facebook
Messenger” (Brandtzaeg and Følstad, 2017). The recent popularity of and renewed interest in
chatbots can be attributed to advancements in artificial intelligence and machine
learning. Chatbots are potentially a great interface for people to interact with services that
can be automated by AI or robots.
But what exactly do we mean when we talk about chatbots? Whether you call them
conversational agents, chatbots, or something else of this nature, the term usually refers to
applications where a human can communicate with a machine through a natural language
interface (Dale, 2016). The most common forms of communication are speech and text, often
with the intent of obtaining information or assistance with a task, or for entertainment or
relational purposes (Brandtzaeg & Følstad, 2017).
Chatbots can typically be divided into three categories according to how they function on a
technical level:
● Generative
● Action and intent
● Scripted
In this paper we focus on the “Action and intent” type of chatbot.
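To illustrate what we mean by an “Action and intent” chatbot, the following is a minimal sketch and not the implementation of any of the chatbots we tested. It assumes a simple keyword overlap for detecting the intent; the intent names, keywords and replies are hypothetical and chosen only for illustration. Commercial platforms typically use trained language models for this step.

```python
import re
from typing import Callable, Dict, Optional, Set, Tuple


# Hypothetical actions the chatbot can execute once an intent is detected.
def check_balance() -> str:
    return "You can see your balance under 'Accounts'."


def block_card() -> str:
    return "I can help you block your card. Which card is it?"


# Each intent maps a set of trigger keywords to one action (illustrative only).
INTENTS: Dict[str, Tuple[Set[str], Callable[[], str]]] = {
    "check_balance": ({"balance", "account"}, check_balance),
    "block_card": ({"block", "lost", "stolen"}, block_card),
}


def detect_intent(message: str) -> Optional[str]:
    """Return the intent whose keywords overlap most with the message, if any."""
    words = set(re.findall(r"[a-z]+", message.lower()))
    best_intent, best_overlap = None, 0
    for name, (keywords, _action) in INTENTS.items():
        overlap = len(words & keywords)
        if overlap > best_overlap:
            best_intent, best_overlap = name, overlap
    return best_intent


def reply(message: str) -> str:
    """Run the action for the detected intent, or admit that nothing matched."""
    intent = detect_intent(message)
    if intent is None:
        return "Sorry, I did not understand that."
    _keywords, action = INTENTS[intent]
    return action()
```

The last branch of `reply` is the situation our second research question is concerned with: what the chatbot should say when no intent matches.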
4.2 Related work
To position our work we needed an overview of the field and of the research that had already
been done on how chatbots present features and limitations, in addition to our own
observations from the initial testing of different chatbots. Firstly, we wanted to know whether
there is any need to look at the feedback chatbots give about their features and limitations.
To establish this, we read Luger and Sellen's study of people's daily usage of task-driven
Conversational Agents (CAs) and the interactional factors that affect that usage (Luger &
Sellen, 2016). It presents a key finding regarding our interest:
● They pointed out the problem users have in assessing the system's capabilities and
“intelligence”, as well as a lack of feedback. Half of the participants
reported that they did not know what the CA could do (Luger & Sellen, 2016).
The paper lists a series of suggestions for developers designing CAs. We want to
highlight the following two:
1. Consider new ways of conveying CA capability through interactions
Here they suggest looking at other ways the CA can convey its capability, especially
when it struggles. They say that users often tend to blame themselves and abandon
particular types of task requests (Luger & Sellen, 2016).
2. Rethink system feedback and design goals in light of the dominant use case
Because a majority of users stop using the CA when it stops providing utility, the
authors suggest that designers rethink the design goals of the CA system to deliver a more
compelling user experience and reflect the dominant use case (Luger & Sellen, 2016).
In the article “I’m Sorry, Dave, I’m Afraid I Can’t Do That: Chatbot Perception and
Expectations”, Jennifer Zamora explores how to better understand the role and purpose of
chatbots. This is done from a user-centred perspective and aims to clarify the expectations
that users build toward interacting with chatbots. According to the concluding remarks of the
article, one of the main issues for users is knowing what input is valid for a chatbot. As a
design guideline, the article suggests that, to enhance the user experience and better set
expectations, future chatbots should be designed with multiple input modalities: “Providing a
secondary input channel such as displaying menu options or including voice input in addition
to text allows for better experiences that can reduce error or recovery time” (Zamora, 2017).
This brings us to our next question: what design implications are the most common among
articles? The article “Here’s What I Can Do: Chatbots’ Strategies to Convey Their Features to
Users” (Valério et al., 2017) attempts to answer this.
The goal of this paper was to analyze the communicative strategies that popular chatbots
use to convey their features to users. The chatbots were selected based on their popularity,
taken as an indication of possibly good design decisions. They were all available on the
Facebook Messenger platform and had a similar purpose. The chatbots selected were
TechCrunch, CNN, and the Wall Street Journal (WSJ) (Valério et al., 2017). After the analysis,
Valério et al. presented 11 strategies used by chatbots to convey their features to users:
● S1 – Showing the main feature on the first message
● S2 – Guiding the user through a short tutorial during first messages
● S3 – Suggesting the next possible set of actions to the user
● S4 – Having a persistent menu with main features
● S5 – Having a main menu with main features
● S6 – Having a list of available commands
● S7 – Offering contextual help about a feature
● S8 – Showing the main menu or the most frequent features when user says “help”
● S9 – Showing the main menu or main features when user says something the bot
cannot understand
● S10 – Showing the persistent menu instead of a text-input box
● S11 – Highlighting the most important features
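As an illustration of how three of these strategies (S1, S8 and S9) could be combined in one reply function, we include a minimal sketch. It is our own simplification and not code from Valério et al. or from any of the chatbots they analyzed; the feature names and reply texts are hypothetical.

```python
from typing import List

# Hypothetical main features of a news chatbot (illustrative only).
MAIN_FEATURES: List[str] = ["top stories", "subscribe", "search"]


def feature_menu() -> str:
    """Text listing the main features, reused by S1, S8 and S9."""
    return "Here is what I can do: " + ", ".join(MAIN_FEATURES) + "."


def respond(message: str, is_first_message: bool = False) -> str:
    text = message.strip().lower()
    if is_first_message:
        # S1: show the main features on the first message.
        return "Hi! " + feature_menu()
    if text == "help":
        # S8: show the main features when the user says "help".
        return feature_menu()
    for feature in MAIN_FEATURES:
        if feature in text:
            return f"OK, starting '{feature}'."
    # S9: show the main features when the input is not understood.
    return "Sorry, I did not understand that. " + feature_menu()
```

A prototype built this way would let us switch individual strategies on and off between test sessions, for example by removing the S9 branch and returning only the apology.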
4.3 Our position
Having found that it is important for a chatbot to show its capabilities, and having looked
at some of the main strategies designers use to inform users about a chatbot's features,
the question remains whether these strategies are effective and how users respond to them. In
our paper we want to test some of the most popular strategies with real users to determine
what works and what does not.
5. Methods
We consider these research questions and this paper to be in the interpretive paradigm and to
be set within a case study. The methods we intend to use reflect this general approach to the
paper and to the questions we have chosen to investigate within the theme of interaction with
artificial intelligence.
5.1 Overall approach
On a general level we want to perform a literature review of the field to understand
what has already been done on the research questions we have proposed.
From there it would be beneficial to assess some chatbots that are released and in use in
real-life cases today, and to review how they inform their users of their limitations and
capabilities.
After generating enough data through the first iterations of the literature review and the
assessment of today's solutions, it could be interesting to create prototypes of chatbots that
use design guidelines given in earlier papers, and to try out more experimental methods of
presenting limitations and capabilities.
5.2 Data collection methods
We want to use qualitative research methods to answer our research questions.
The methods we want to use to address the questions in the assignment are as follows:
1. User testing | Usability testing
We want to give test groups a couple of open-ended tasks to be solved through
interaction with a chatbot. Through these sessions we want to gather data on how users respond
to the chatbot's presentation of its limitations, and how they react when a limitation occurs.
The usability testing will be applied in our design process when creating prototypes of
chatbots, so that we can evaluate different approaches to how chatbots can present limitations
and capabilities.
2. Semi-structured interviews
We also want to use semi-structured interviews to gather qualitative data on
the users' views on how limitations in chatbots can be presented in an efficient way, and how
chatbots should handle the interaction when limitations actually occur.
Observations and questionnaires are further methods we are considering.
5.3 Design approach
We want to create the prototypes for our assignment using a user-centered design
process (User-Centered Design Basics | Usability.gov).
For this we will find a specific context of use, such as questions regarding banking, as with
the DNB chatbot, or communal services, as with “Kommune-Kari”. Within this
context we will try to find suitable use cases and scenarios for users to accomplish, and specify
the requirements for these conditions to be met.
To create the design solutions, we will use services like Dialogflow or Chatfuel to build
high-fidelity prototypes that can be used in usability testing.
As mentioned earlier in our paper, we will use user testing as a method of data collection.
It will be applied in our design approach as well, so that the prototypes can produce a solid
data foundation to analyse for our research questions.
6. Sketches
7. Findings
At this time no prototypes have been developed and no evaluations have been carried out.
8. Evaluation plan
In order to decide upon an evaluation method for our study, we asked ourselves what
kind of data we need to best answer our research questions. To find the
right method for our project, we present our study in the light of task-oriented and
ability-oriented evaluation methods, and look further into UX evaluation methods and
principles.
Task-oriented evaluation and ability-oriented evaluation
First we looked into the possibility of evaluating the chatbot we are developing by
means of task-oriented evaluation, since we are making a “Specialized AI system”. We
originally thought this evaluation would fit because the goal was not to evaluate the system's
intelligence, but rather the presentation of abilities and limitations related to banking and
communal services. After some research we did not find that any of the categories “Human
discrimination”, “Problem benchmarks” or “Peer confrontation” would help us answer our
research question or be a sufficient way of evaluating our chatbot (Hernández-Orallo,
2017). We then looked at the possibility of ability-oriented evaluation, because the paper
suggests that “there are some kinds of AI systems for which task-oriented evaluation is not
appropriate” (Hernández-Orallo, 2017) and references chatbots. The main problem
here is that we are not evaluating an AI system as a whole or its intelligence; rather, we
evaluate, from the user's perspective, the different strategies it uses to communicate its
features and limitations to the user.
UX
Because our research question is better suited to being measured from a user-centered
perspective, we want to use more UX-related principles when evaluating our study. Lindblom
and Andreasson (2016) explain two main aspects of evaluating AI from a UX perspective:
pragmatic and hedonic. It is under the pragmatic aspect that we want to study the “usability”
of the techniques used in chatbots to display their abilities and limitations. Here we will also
try to evaluate effectiveness, efficiency, satisfaction, ease of use and learnability by applying
the design techniques to chatbot prototypes and testing them on users, benchmarking them
against neutral chatbots. Our evaluation can be described as a formative evaluation, as the
design of the chatbots is an iterative process, and we will use the methods described in 5.2 as
well as the evaluation method described here.
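As a sketch of how two of these values could be quantified when benchmarking a prototype against a neutral chatbot, the example below computes effectiveness as the share of completed tasks and efficiency as the mean time spent on completed tasks. This is one common way of operationalizing the two values, not something prescribed by Lindblom and Andreasson, and the record structure and variant names are our own assumptions.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional


@dataclass
class TaskResult:
    variant: str      # hypothetical label, e.g. "neutral" or "with_strategy"
    completed: bool   # did the participant solve the task?
    seconds: float    # time spent on the task


def effectiveness(results: List[TaskResult]) -> float:
    """Share of tasks that were completed (0.0 when there are no results)."""
    if not results:
        return 0.0
    return sum(r.completed for r in results) / len(results)


def efficiency(results: List[TaskResult]) -> Optional[float]:
    """Mean time in seconds for completed tasks, or None if none were completed."""
    times = [r.seconds for r in results if r.completed]
    return mean(times) if times else None


def for_variant(results: List[TaskResult], variant: str) -> List[TaskResult]:
    """Select the results belonging to one chatbot variant."""
    return [r for r in results if r.variant == variant]
```

Satisfaction, ease of use and learnability would still have to be captured through the interviews and possible questionnaires described in 5.2.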
Reflection
We are happy with the proposed research plan, but we would have liked to have made a
working prototype implementing some of the strategies presented in “Here’s What I Can Do:
Chatbots’ Strategies to Convey Their Features to Users” (Valério et al., 2017). This would
have given us the opportunity to gather data and present some initial findings.
We would also have liked to find literature discussing different strategies used
to provide meaningful information when something goes wrong or when a chatbot cannot
answer the questions the user asks. Here we would need to do our own research into how
chatbots normally handle those situations.
9. References
Brandtzaeg, P. B. and Følstad, A. (2017) ‘Why People Use Chatbots’, in Kompatsiaris, I. et
al. (eds) Internet Science. Cham: Springer International Publishing, pp. 377–392. doi:
10.1007/978-3-319-70284-1_30.
Luger, E. and Sellen, A. (2016) ‘“Like Having a Really Bad PA”: The Gulf between User
Expectation and Experience of Conversational Agents’, in Proceedings of the 2016 CHI
Conference on Human Factors in Computing Systems - CHI ’16. the 2016 CHI Conference,
Santa Clara, California
Dale, R. (2016) ‘The return of the chatbots’, Natural Language Engineering, 22(05), pp. 811–
817. doi: 10.1017/S1351324916000243.
Hernández-Orallo, J. (2017) ‘Evaluation in artificial intelligence: from task-oriented to
ability-oriented measurement’, Artificial Intelligence Review, 48(3), pp. 397–447. doi:
10.1007/s10462-016-9505-7.
Lindblom, J. and Andreasson, R. (2016) ‘Current Challenges for UX Evaluation of Human-
Robot Interaction’, in Schlick, C. and Trzcieliński, S. (eds) Advances in Ergonomics of
Manufacturing: Managing the Enterprise of the Future. Springer International Publishing
(Advances in Intelligent Systems and Computing), pp. 267–277.
User-Centered Design Basics | Usability.gov (no date). Available at:
https://www.usability.gov/what-and-why/user-centered-design.html (Accessed: 12 October
2018).
Valério, F. A. M. et al. (2017) ‘Here’s What I Can Do: Chatbots’ Strategies to Convey Their
Features to Users’, in Proceedings of the XVI Brazilian Symposium on Human Factors in
Computing Systems - IHC 2017. Brazilian Symposium on Human Factors in Computing
Systems, Joinville, Brazil: ACM Press, pp. 1–10. doi: 10.1145/3160504.3160544.
Zamora, J. (2017) ‘I’m Sorry, Dave, I’m Afraid I Can’t Do That: Chatbot Perception and
Expectations’, in. ACM Press, pp. 253–260. doi: 10.1145/3125739.3125766.
Appendix
Appendix 1 - Conversational interaction assignment
Creating Filmbot
“Filmbot” is a chatbot created for people who struggle to decide
what movie they want to watch. Filmbot suggests movies based on the user's
preferred categories, or gives randomized suggestions if the user just wants a movie
suggestion without any preference of category.
Creating Filmbot was overall a pleasant experience, but there were some
bumps in the road connected to the use of Chatfuel. We are three members in the group, and
only two of us managed to get access to make changes to the chatbot in Chatfuel. This
was annoying, but we worked around it by creating Filmbot on a
big screen so that everyone could take part in the process.
The creation of Filmbot began with a planning process in which we
decided what kind of chatbot we wanted to create. The idea of a chatbot that could come up
with movie suggestions came quickly, and we then started working out how we wanted the
chatbot to interact and respond (“setting up the AI”). Creating Filmbot was not very
complicated, but it took some time for everybody to get used to Chatfuel and to learn how to
use it in the most efficient way. As soon as we became familiar with the tool, the creation of
Filmbot went on without any problems, and the process consisted of creating and testing
each new “feature” as it was implemented.
The main things we considered important when creating Filmbot were supplying the users
with enough options (not too many, as there is a fine line between too many choices and just
enough choices) and creating a chatbot that does exactly what it is expected to do.
Appendix 2 - Machine Learning assignment
Unfortunately none of the team members were able to set up the environment to run the
Python scripts for the movie chatbot. We therefore did not get much output on this
group assignment before the second lecture on machine learning. During the lecture there
were good and interesting examples of how configuring the different layers had an
impact on the accuracy and validity of the answers the AI gave back when communicating
with it.
It would have been interesting to try out the generative chatbot model as well. Either way, it
was a fun and interesting discussion in class on the subject of machine learning.
Appendix 3 - Problems with AI
For this assignment we selected a clip from South Park's season twenty-one premiere
episode, "White People Renovating Houses". The clip was posted on Twitter and is located
here: https://twitter.com/MoritzWittmann/status/908319633660416001
What happened/ what was the problem?
In the animated TV show, one of the main characters (Eric Cartman) gives a set of voice
commands to his Amazon Alexa, for example “tell me a joke” and “set an
alarm for 7:00 AM”, as well as adding different things to his shopping cart in the Amazon
store. These voice commands triggered multiple Amazon Alexa devices in the real world to do
the same as the device did on the TV show. It was reported that a lot of people woke up to
an alarm at 7 AM the day after the episode aired, and got a lot of weird items added
to their shopping carts.
So what went wrong?
We think the Amazon Alexa device could not tell the difference between commands coming
from the TV and commands from the actual user of the device. We don’t think Amazon expected
that someone would broadcast voice commands in a TV series.
Solution
One solution that we know is present in other Natural Language Interfaces is the ability to
differentiate voices and train the interface to only execute commands given from the actual
owner of the device.
Appendix 4 - Human-machine partnership
We envision that the human resources intelligent agent (robot) could have the following
functionality and ability to perform tasks such as:
1. Able to find suitable candidates for job openings
2. Conduct interviews with candidates
3. Conduct tests with possible candidates
4. Perform background checks on candidates
5. Make appointments with candidates for interviews and tests
6. Create drafts for employment contracts
7. Create job postings
The functionality is based on both current technology and what we envision will be possible
in the near future.
Scenario 1 - automation level 7
In the first scenario the robot generates all recommendations for the human; this goes for
all the tasks from 1 to 7. The robot generates all ideas for actions and only needs
the approval of the human before it performs the tasks, or it performs the tasks the human
inputs instead of its own proposed solutions.
Advantages and disadvantages
Advantages:
- Efficient, and cuts the time spent on tasks
- Needs less manpower
- A possibly better foundation for human resources to base decisions on
Disadvantages:
- Generated data for consideration could be overwhelming for human resources and create
human bottlenecks.
Possible problems and how to overcome them
A problem related to task 1 is whether the robot has actually selected the right people
for the job. If it scans sites like LinkedIn and the job applications sent in, and generates
options for the human to decide upon, it can be difficult to know whether the right people were
selected. Here the answer lies in either having a form the applicants fill out, so that the
input is standardized each time, or letting the human look through the recommended people
before deciding. Another possible solution is that, if the robot is uncertain of a
candidate's experience or abilities, it can ask the person, or categorize candidates in a
“Possible candidate” group or a “Needs more info” group for human resources to
investigate. If the robot is to conduct interviews, it needs a high level of semantic
understanding so that it does not misunderstand the applicant. Here we need to let the human
observe the interview, either live or after it is done.
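The grouping and the approval step in this scenario can be sketched as follows. This is only an illustration under our own assumptions: we assume the robot can attach a confidence score between 0 and 1 to each candidate, and the threshold value, function names and proposal texts are hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def triage(candidates: List[Tuple[str, float]],
           threshold: float = 0.7) -> Dict[str, List[str]]:
    """Sort candidates into the two groups by the robot's (assumed) confidence."""
    groups: Dict[str, List[str]] = {"Possible candidate": [], "Needs more info": []}
    for name, confidence in candidates:
        group = "Possible candidate" if confidence >= threshold else "Needs more info"
        groups[group].append(name)
    return groups


def perform_with_approval(proposal: str, approve: Callable[[str], bool]) -> str:
    """The robot only acts on a proposal after the human has approved it."""
    if approve(proposal):
        return f"Performed: {proposal}"
    return f"Not performed: {proposal}"
```

In practice `approve` would be a prompt shown to human resources, which is also where the bottleneck listed under disadvantages would appear.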
Scenario 2 - automation level 10
In the second scenario the robot has an automation level of 10. Here the robot ignores the
human and acts autonomously. The tasks from 1 to 7 are performed by the robot, and the whole
hiring process is decided by it. The final decision to hire the selected person is also carried
out by the robot, effectively as a task number 8.
Advantages and disadvantages
Advantages:
- Large time and cost savings
- No human errors or biased hiring
Disadvantages:
- The human resources intelligent agent may not be able to interpret people in the same
way as humans.
- No way of knowing that you actually hired the right person
- The robot is in charge, not the humans
Possible problems and how to overcome them
Here the problem is that the human has no way of controlling the decisions made by the robot.
The only solution we see is to restrict the tasks the robot can perform if it is to work
fully autonomously. Another possible problem is that if a candidate knows how the robot works
and what it is looking for, the robot can possibly be tricked into hiring the candidate.