Neuro Usability Testing (or Evaluating Interfaces Using Electroencephalography)

Jeff Escalante

Transcript of Neuro Usability Testing (or Evaluating Interfaces Using Electroencephalography)


Neuro Usability Testing (or Evaluating Interfaces Using Electroencephalography)
Jeff Escalante

Hi, I'm Jeff, and today I'm going to talk about people, computers, interfaces, and brains.

Inter-who?

History of Interfaces

In the early days of computing, people had to interact with computers through a command-line interface. Even operating a computer required in-depth programming knowledge, making the computer a tool for experts only.

In 1973, Xerox revolutionized the computing world by inventing the first personal computer (PC), the Xerox Alto, along with the first graphical user interface (GUI). This interface made operating a computer easy and accessible to anyone, without extensive programming knowledge. On the left is the Alto interface, and on the right is an early mouse used to control it.

Today, GUIs have come a long way. We interact with computers through touch and speech, and advances in gaming technology are moving us toward interacting with computers through gestures.

Measuring Usability

So how do most people tackle usability testing for their own interfaces?

Most people rely on surveys, focus groups, and the like, which yield no quantitative data, even if we include most of the methods I described on the previous slide.

I recently interviewed someone who said that the way he tested usability was to hand someone his app and watch over their shoulder to see if they got frustrated. I talked to someone else who said they hire an app developer to build their app, give it to people, and if the users don't like it, hire the developer again to change everything around. They called this "rapid prototyping." Hmm.

On top of all this, self-report measures have been shown to be inaccurate in a number of studies.

There must be a better way to scientifically test interface usability. Being a programming and neuroscience student, I went immediately to the most logical solution from my studies:

Take the information straight from the brain!

The Error Potential

Now let's talk about the study I did. I examined one brain signal I thought was particularly important: the error potential. This signal arises in electroencephalography when someone thinks they made a mistake. One study has also shown it to arise when the interface makes a mistake. In my study, I looked to confirm these findings and extend them to a wider range of interface errors.

Electroencephalography (EEG)

Sternberg Memory Task

In the first portion of my study, I tested whether we could elicit an error potential at all in our lab setting. I created a memory task that I purposely made very difficult, so that my subjects couldn't get every trial right. Here's how it went. The subject's job was to memorize a list of consonants. After the list was done (indicated by a blue +), another letter was shown in red. If this letter was in the list you just memorized, you called out "yes"; if not, "no." Let's do one quick test right now, so that you know what it's like.

[Demo slides: "Demo Time!" Memorize "M Q X L", probe "X" ("Easy, right?"); then memorize "L B P Z M C T G J Q D F", probe "G".]

The Second Part
Symbols, Links, Motion, Sound, Buttons
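For readers who want to reproduce the task, here is a minimal sketch of one Sternberg-style trial in Python. The list length, the 50/50 chance that the probe appears in the list, and the console output are my own illustrative assumptions; the actual experiment presented the stimuli on screen while EEG was recorded.

```python
import random
import string

def sternberg_trial(list_length=6, rng=random.Random()):
    """One Sternberg memory trial: memorize consonants, then judge a probe.

    Returns (memorized_list, probe_letter, probe_was_in_list).
    List length and the 50% in-list probability are illustrative assumptions.
    """
    consonants = [c for c in string.ascii_uppercase if c not in "AEIOU"]
    memorized = rng.sample(consonants, list_length)

    if rng.random() < 0.5:   # probe drawn from the memorized list
        probe = rng.choice(memorized)
    else:                    # probe drawn from the remaining consonants
        probe = rng.choice([c for c in consonants if c not in memorized])

    return memorized, probe, probe in memorized

# Example run: show the list, then the probe (a blue '+' and a red letter in the real task)
memorized, probe, in_list = sternberg_trial()
print("Memorize:", " ".join(memorized))
print("+")  # fixation marking the end of the list
print("Probe:", probe, "-> correct answer:", "yes" if in_list else "no")
```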

For the second part of the experiment, I crafted real-world interfaces directly in the browser that worked correctly most of the time; one third of the trials contained errors. I spent a lot of time researching websites that people had problems with, and found that most of the problems fell into four categories: unexpected symbols, unexpected motion, unexpected affordance behaviors, and unexpected sound.
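To make the "one third of trials contained errors" design concrete, here is a hedged sketch of how such a trial list might be generated. The condition names come from the slide above; the number of trials per condition and the seed are assumptions for illustration only.

```python
import random

CONDITIONS = ["symbols", "links", "motion", "sound", "buttons"]

def build_trials(trials_per_condition=30, error_fraction=1/3, seed=0):
    """Build a shuffled trial list where ~1/3 of trials in each condition are error trials."""
    rng = random.Random(seed)
    trials = []
    for condition in CONDITIONS:
        n_error = round(trials_per_condition * error_fraction)
        flags = [True] * n_error + [False] * (trials_per_condition - n_error)
        rng.shuffle(flags)
        trials.extend({"condition": condition, "is_error": flag} for flag in flags)
    rng.shuffle(trials)
    return trials

trials = build_trials()
print(sum(t["is_error"] for t in trials), "error trials out of", len(trials))
```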

Analysis

Ok, let's get into the nerdy stuff. It has been shown that error potentials elicit significantly higher activity along the fronto-central midline of the brain. We therefore computed a folding average across similar trials and across subjects, and compared the Cz and Fz channels for trials where we expected an error potential versus trials where we did not, looking for a voltage difference. We then ran an ANOVA at 95% confidence across these conditions.

Analysis
- 10 subjects (enough for EEG studies)
- Folding average across subjects and conditions
- Compared error-potential-expected and no-error-potential-expected conditions at the Cz and Fz sites
- ANOVA at 95% confidence

Results: First Part
- F(1,8) = 4.15, p = .0421
- Found an error potential!

So what did I find? In the first part of the experiment, there was a significant difference in activity between the error-potential and non-error-potential conditions in the Cz channel. Although we didn't find a significant difference at Fz, this is enough for us to confirm the presence of an error potential.
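For readers curious what this looks like in code, here is a minimal sketch of the folding average and the condition comparison, assuming the EEG has already been cut into per-trial epochs for a single channel such as Cz. The array shapes, the analysis window, and the use of scipy's independent-groups f_oneway are my assumptions, not the lab's actual pipeline; the reported F(1, 8) suggests the real analysis used a repeated-measures design.

```python
import numpy as np
from scipy import stats

# Assumed input: for each subject, an array of shape (n_trials, n_samples)
# containing single-channel epochs time-locked to the feedback event.

def folding_average(trial_epochs):
    """Average epochs across trials, then stack per subject (the 'folding average')."""
    per_subject = [trials.mean(axis=0) for trials in trial_epochs]  # one ERP per subject
    return np.stack(per_subject)                                    # (n_subjects, n_samples)

def compare_conditions(error_epochs, no_error_epochs, window):
    """ANOVA on mean voltage in a post-feedback window: error-expected vs. not."""
    start, stop = window
    err = folding_average(error_epochs)[:, start:stop].mean(axis=1)      # per-subject means
    noerr = folding_average(no_error_epochs)[:, start:stop].mean(axis=1)
    f_value, p_value = stats.f_oneway(err, noerr)                        # 95% confidence: p < .05
    return f_value, p_value

# Example with simulated data: 10 subjects, 10 trials each, 256 samples per epoch
rng = np.random.default_rng(0)
error_trials = [rng.normal(0.0, 1.0, (10, 256)) + 0.5 for _ in range(10)]  # shifted "error" ERP
no_error_trials = [rng.normal(0.0, 1.0, (10, 256)) for _ in range(10)]
print(compare_conditions(error_trials, no_error_trials, window=(100, 180)))
```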

Results: Second Part

We also directly found error potentials. This is a graph of the data from the symbols condition of the second part of the experiment; with as few as 10 error trials per subject, we were able to show a very clear error potential like the one I showed earlier (light blue line).

Results: Second Part
- Detected error potentials in: Symbols, Motion, Sound
- No significant error potentials in: Links, Buttons

Overall, we found error potentials for the symbols, motion, and sound conditions, but not for the links and buttons conditions. The way I interpret this is that the error potential is not effective for detecting errors on the interface's part. The motion, the sound, and the feedback system in the symbols condition can all be interpreted as negative feedback indicating that the user was wrong. In the links and buttons conditions, there was no doubt that the interface had messed up and it wasn't the user's fault.

Fail?

So the interface error potential was not exactly supported by the results. Does this mean the error potential can't be used to test usability? Absolutely not; it still has plenty of potential uses, and this study lays the groundwork for them. In this case, on the symbol, motion, and sound trials, we hypothesize that the user may have thought they were wrong because of the negative feedback. However, the user was not actually wrong in any of the trials: all of the error trials were programmed specifically against the user's expectations. Although I did it on purpose here, this kind of error happens often in computer interfaces. The computer makes the user feel as if they are the one who caused the mistake, when in reality the computer should have compensated.

Let's talk real life. In this example, the computer has failed, but it presents the failure to the user in a rather rude and accusing manner. We see a pop-up window, interrupting all progress, with a notification that something has failed: no explanation why, no instructions on how to fix it, and on top of that, the user is forced to click "OK" to dismiss the error. Is the error really OK? Probably not, but we don't have an option. When faced with an error dialog like this, users will often feel personally at fault, and this will, as my results suggest, likely elicit an error potential.

Limitations

It is also possible that this interpretation is not correct, that we should in fact have found error potentials in all conditions, and that the discrepancy is experimental error due to the study's limitations. This is certainly not out of the question. The second section had fewer trials than the first: we ran 10 subjects with 10 error trials in each section. Increasing these numbers would doubtless improve the validity of our results.

75.8% Accurate!

Naive Bayes Classifier: Results in Real Time!

With the help of the wonderful Leanne Hirschfield, I was also able to apply a machine learning algorithm to our results. In this type of analysis, rather than averaging across conditions, we randomly selected half of our data to train the algorithm and used the other half to evaluate whether, based on its training, it could detect the error potential on its own. We had great results here, as seen above, for the algorithm's accuracy. This tool is useful for detecting error potentials in as close to real time as we can get (the only delay being the time the algorithm takes to process, which is less than a second), because the EEG data can be fed straight through the algorithm as it comes out, whereas it cannot be averaged across trials until the experiment is over.
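Here is a hedged sketch of that classification approach: extract simple per-trial features, split the trials in half, train a Gaussian Naive Bayes classifier on one half, and score it on the held-out half. The feature choice (mean and peak voltage in a post-feedback window at two channels) and the simulated data are my assumptions; the actual features and toolchain used in the study are not described in this talk.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

def trial_features(epoch, window=(100, 180)):
    """Per-trial features: mean and peak voltage in a post-feedback window.

    `epoch` is (n_channels, n_samples) for one trial; the window indices are assumptions.
    """
    start, stop = window
    segment = epoch[:, start:stop]
    return np.concatenate([segment.mean(axis=1), segment.max(axis=1)])

# Simulated stand-in data: 200 trials, 2 channels (e.g. Cz, Fz), 256 samples each.
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=200)            # 1 = error trial, 0 = normal trial
epochs = rng.normal(0.0, 1.0, (200, 2, 256))
epochs[labels == 1, :, 100:180] += 0.8           # inject an "error potential" bump

X = np.array([trial_features(e) for e in epochs])
X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.5, random_state=0)

clf = GaussianNB().fit(X_train, y_train)              # train on one half of the trials
print("held-out accuracy:", clf.score(X_test, y_test))  # evaluate on the other half
```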

A Whole New World!

This study certainly will not revolutionize usability testing by itself, and we are still far from the day when small companies will test usability with EEG headsets. But it lays the groundwork for a whole new world of usability testing: a world in which interface engineering is converted from guesswork to science. The error potential could be just one signal among a range of others scanned for during an interface testing session. The HCNGUL has a range of other parameters in mind that could be tested, and someday I can imagine usability testing that simply involves one piece of open-source software, a cheap dry-sensor EEG headset, and a galvanic skin response bracelet. After running just a couple of users through your interface, the program would provide a detailed report of exactly where the problems are and how the user is feeling. Now that's science.

Thank You! Questions?

Thanks so much, everyone, for listening. If anyone has questions, feel free to ask them now or come and talk to me afterwards!