
Detecting Deception in Interrogation Settings

C.E. Lamb and D.B. Skillicorn

December 2012

External Technical Report
ISSN-0836-0227-2012-600

School of Computing
Queen's University
Kingston, Ontario, Canada K7L 3N6

Document prepared December 18, 2012
Copyright © 2012 C.E. Lamb and D.B. Skillicorn

School of Computing
Queen's University
{carolyn,skill}@cs.queensu.ca

Abstract

Bag-of-words deception detection systems outperform humans, but are still not always accurate enough to be useful. In interrogation settings, present models do not take into account potential influence of the words in a question on the words in the answer. Because of the requirements of many languages, including English, as well as the theory of verbal mimicry, such influence ought to exist. We show that it does: words in a question can "prompt" other words in the answer. However, the effect is receiver-state-dependent. Deceptive and truthful subjects respond to prompting in different ways. The accuracy of a bag-of-words deception model can be improved by training a predictor on both question words and answer words, allowing it to detect the relationships between these words. This approach should generalize to other bag-of-words models of psychological states in dialogues.

Detecting Deception in Interrogation Settings

C.E. Lamb and D.B. Skillicorn
School of Computing, Queen's University

Technical Report 2012-600

    1 Introduction

From court cases to airport security, many high-stakes real-world situations rely on the human ability to distinguish truth from deception. Sadly, humans do not do this well. Automating the detection of deception would thus bring many benefits. If an automated method outperformed humans, it could be used to augment human judgement in high-stakes situations; and it could also be used in lower-stakes situations where expert human judgement is infeasible. For instance, a filter alerting users to potential deception could be used with online job postings.

One obstacle in the way of automation is that deception works differently in different situations. One important factor mediating cues to deception is the difference between monologues and dialogues. Many deception models are based on monologues such as speeches or written statements. In many interesting real-world cases, the deceptive person is in a dialogue, and each participant responds and adjusts to the linguistic style of the other. This makes deception detection in a dialogue complicated. If one participant shows linguistic signs of deception, are they being deceptive, or are they responding to some aspect of the other participant's prompting?

We have undertaken an exploratory study of these issues by analyzing archival transcripts of real-world interrogations and applying a deception model developed by Newman et al. [43]. We found that word classes relevant to the model were vulnerable to a prompting effect: using certain words in the question increased the frequency of other words in the answer. Therefore the model cannot be applied to dialogue data in its present form, because the prompting effect is a potential confound.

Our goal, therefore, was to expand the model's applicability by removing the effect of the question from each answer. However, using this correction on real-world data revealed that our assumptions had been overly simplistic. Deceptive and non-deceptive respondents in the data responded to prompting in different ways, and removing the prompting effect also removed some of the signals of difference between these subgroups. Instead of removing the effect of the question, the relationship between question and answer words needs to be taken into account. Tests with a predictor confirmed that a model based on question and answer words, rather than just answer words, is indeed more accurate.

We conclude that, in deception, it is not the case that the respondent has an underlying mental state whose effects are distorted in a uniform way by prompting. Instead, prompting occurs, but the respondent's mental state determines its effect. Fortunately, this second-order behavior is itself detectable. We suspect that this effect will generalize to other word-based psychological models in dialogues, helping reveal other aspects of the relationship between questioner and respondent.

    2 Background

    2.1 Stop Words

In text mining it is common practice to remove certain "stop words": words like "the", "I", and "if", which are thought to be too common to be worth analyzing [31]. It is thought that these stop words have only a grammatical function, and thus are unnecessary in a model that does not take word order into account. However, stop words contain interesting meanings of their own. Aside from their grammatical function, they have contextual, social meanings. (The word "he", for example, refers to a man, but we need context in order to figure out which man.) They also serve a stylistic function. For this reason they are also called "style words" or "function words".

Function words are particularly useful for the discovery of a speaker's mental state for two reasons. First, they are more plentiful than content words. Although normal English speakers have only 500 function words in their vocabulary (out of 100,000 total English words), 55% of all the words they speak or write on a given day will be function words [55]. Second, these words are produced in a different part of the brain than content words [39] and can "leak" information about a person's mental state even without their awareness [17].

Social psychologists in the past twenty years have been analyzing people's speaking and writing styles by counting both function words and common content words. One important tool for this analysis is Pennebaker's [45] Linguistic Inquiry and Word Count program (LIWC). With this program, researchers have discovered correlations between function word use and everything from depression [49] to sexual orientation [25] and quality of a relationship [50] and, our focus here, deception [43]. Function words have also been shown to distinguish among different kinds of Islamist language [35].

    2.2 Deception Detection

Unaided humans are poor at detecting deception. Although individual proficiency varies, even groups of trained humans such as police officers rarely perform above chance [22]. Many attempts have been made to construct empirical models of deception, relying on objective cues, but this is difficult because different studies produce conflicting results. In fact, no single cue is reliably indicative of deception across studies [8]. One reason for this is that different studies address the issue of deception in different situations, and the relevant cues in these situations may also be different. For example, DePaulo et al.'s meta-analysis found that effect sizes were different depending on the deceptive person's motivation, the amount of interactivity in the setting, and whether a social transgression was involved [21]. Meanwhile, computer-mediated communication appears to function along different lines from face-to-face communication, perhaps because each person has time to rehearse what they will say [61].


2.3 The Pennebaker Deception Model

The model of deception we use comes from Newman et al. [43]. The authors asked college students to speak or write, either deceptively or truthfully, about a variety of topics (their opinion on abortion, their feelings about their friends, and a mock theft). Then they analyzed all the truthful and deceptive statements with LIWC. Four linguistic signs of deception emerged across categories:

• First person singular pronouns decrease in the deceptive condition. Deceptive people have less personal experience with their subject matter than truthful people, and are less emotionally willing to commit to what they are saying, so they focus on and refer to themselves less [43].

• Exclusive words decrease in the deceptive condition. Such words introduce increased complexity into sentences. Deception causes a higher cognitive load than truthfulness because of the increased difficulty of monitoring a deceptive performance [21], which produces an increase in visible signs of effort [62], even when the deceptive person has had time to prepare [56].

• Negative emotion words increase in the deceptive condition, consistent with DePaulo et al.'s [21] meta-analysis. Deceptive people are thought to do this as a sign of unconscious discomfort [43]. Alternatively, increased emotion can be a sign of trying too hard to persuade the listener, as in Zhou et al. [61].

• Motion verbs ("go", "run") increase in the deceptive condition. This is the result of increased cognitive load and perhaps a need to keep the story moving and discourage second thoughts on the part of the reader/hearer.

Lowered self-reference is consistent with other models (e.g. Zhou et al. [59]) and persists regardless of motivation [27], but DePaulo et al. [21] and Zuckerman et al. [62], in their meta-analyses, did not find a reliable, statistically significant decrease in self-references across studies.

DePaulo et al. [21] point out that increases in emotion should be treated with caution. Most lies in the real world are "white lies", told to smooth over social interactions, and people telling them experience very little distress [20]. Moreover, some truthful statements, such as confessions of wrongdoing, might lead to high levels of negative emotions. For similar reasons, this part of the model should not be applied to psychopaths, who experience no discomfort when breaking moral rules [47].

The effect of cognitive load should also be treated with caution: some white lies are easy to tell, and some potentially volatile truths may be as mentally effortful as a lie [21].

Since the original work, the Pennebaker model of deception has been extensively validated across a large number of populations. The words used in these criteria are summarized in Table 1.

First-person pronouns: I, me, my, mine, myself, I'd, I'll, I'm, I've
Exclusive words: but, except, without, although, besides, however, nor, or, rather, unless, whereas
Negative emotion words: hate, anger, enemy, despise, dislike, abandon, afraid, agony, anguish, bastard, bitch, boring, crazy, dumb, disappointed, disappointing, f-word, suspicious, stressed, sorry, jerk, tragedy, weak, worthless, ignorant, inadequate, inferior, jerked, lie, lied, lies, lonely, loss, terrible, hated, hates, greed, fear, devil, lame, vain, wicked
Motion verbs: walk, move, go, carry, run, lead, going, taking, action, arrive, arrives, arrived, bringing, driven, carrying, fled, flew, follow, followed, look, take, moved, goes, drive

Table 1: Words used in the LIWC deception model

The LIWC model performed significantly better than human raters in distinguishing truth from deception [43]. Gupta and Skillicorn [26] suggest that the LIWC model detects not only outright falsehood, but also "spin" or "persona deception", in which a person does not make factually false statements, but consciously projects an image of themselves that they know to be inaccurate. Skillicorn and Leuprecht have used this reasoning to apply LIWC to the speeches of politicians [51].
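To make the mechanics of the model concrete, the following Python sketch scores a statement against abbreviated versions of the Table 1 word lists and returns a rate for each category (counts divided by total words). It is an illustration only, not the LIWC software; the function name and the truncated vocabularies are ours.

    import re

    # Abbreviated stand-ins for the Table 1 word lists (not the full LIWC categories).
    CATEGORIES = {
        "first_person": {"i", "me", "my", "mine", "myself", "i'd", "i'll", "i'm", "i've"},
        "exclusive":    {"but", "except", "without", "although", "besides", "however",
                         "nor", "or", "rather", "unless", "whereas"},
        "negative":     {"hate", "anger", "enemy", "despise", "dislike", "afraid",
                         "terrible", "worthless", "lie", "lied", "lies"},
        "motion":       {"walk", "move", "go", "carry", "run", "lead", "going",
                         "take", "drive", "arrive", "follow"},
    }

    def deception_cues(text):
        """Return the rate of each category: category count / total words."""
        words = re.findall(r"[a-z']+", text.lower())
        total = max(len(words), 1)
        return {cat: sum(w in vocab for w in words) / total
                for cat, vocab in CATEGORIES.items()}

    # Under the Pennebaker model, lower first-person and exclusive rates together
    # with higher negative-emotion and motion rates are the signature of deception.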

    2.4 Fine-Tuning the LIWC Model

Keila and Skillicorn [34] applied LIWC to a dataset consisting of internal emails at Enron in the period just before its fraudulent accounting scandal. They combined LIWC with singular value decomposition (SVD). SVD is a useful tool for eliciting correlations from multidimensional data. An SVD is a factorization of a real matrix A into three other matrices:

A = U S V^T

such that U and V^T are unitary matrices and S is a diagonal matrix [53]. The matrices U and V represent the data as vectors in a many-dimensional space. These axes appear in the matrices in decreasing order of size: the first rows of V^T represent the axes of maximal variation in the data. The nonzero values of the matrix S, meanwhile, correspond to the amount of variation along each axis. One can therefore truncate a set of SVD matrices at an arbitrary number of dimensions and be confident that the truncated version is the most faithful possible representation of A in the chosen number of dimensions. The S matrix values provide an indication of how much variation is being lost [52].

Reducing the number of dimensions in this manner not only compresses the data but elucidates correlations between different parts of it, because documents or variables that vary together will be represented as closely aligned vectors in the SVD space [19].
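As a concrete illustration of this truncation, here is a minimal numpy sketch, assuming rows of A are documents and columns are word-category rates; it is not the code used in [34] or [52].

    import numpy as np

    def truncated_svd(A, k):
        """Represent each row of A in the k dimensions of greatest variation."""
        U, s, Vt = np.linalg.svd(A, full_matrices=False)   # A = U @ diag(s) @ Vt
        coords = U[:, :k] * s[:k]          # documents as points in k-dimensional space
        approx = coords @ Vt[:k, :]        # best rank-k approximation of A
        retained = s[:k].sum() / s.sum()   # rough indication of how much variation is kept
        return coords, approx, retained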

Keila and Skillicorn [34] found that LIWC performed well when combined with SVD, because words within a given category (except exclusive words) were correlated with each other and formed sensible structures in semantic space. The top ranked emails by a geometrical measure in reduced-dimension semantic space were all deceptive. However, in such a large dataset, it is difficult to tell how many false negatives there are.

Skillicorn and Little [52] repeated the SVD-based analysis on transcripts from the Gomery commission, in which former Canadian government officials were examined regarding alleged corruption. They found that in this data, deception was associated with increases – not decreases – in first-person singular pronouns and exclusive words. Altering the model to look for increases in all categories, they produced results in rough agreement with media estimates of who was being deceptive and who was not. (Although ground truth for the Gomery data was not available, the most deceptive people according to the model were people saying they did not remember basic facts about their own employment, such as who they were working for. Meanwhile, the least deceptive people were witnesses called in to explain purely technical matters.) Hence the standard Pennebaker model does not appear to be effective in dialogue settings.

If deception decreases first-person pronoun use in some situations and increases it in others, this explains DePaulo et al. [21] and Zuckerman et al.'s [62] inability to find a significant effect for self-references across many studies. However, it raises the more vexing question of what causes these words to behave in this way. Skillicorn and Little [52] suggest that first-person singular pronouns and exclusive words increased with deception in the context of the Gomery commission because it was not an emotionally charged situation. As we shall see, there is a better explanation.

    2.5 Forcing Word Use in Questions and Answers

There are two processes of interaction between the language of questions and the language of answers that alter the word use in both. The first is the technical requirements of the language itself; the second is the largely unconscious mimicry that participants in a conversation fall into.

In most languages, the function word patterns in questions force some corresponding structure into a responsive answer. For example, a question containing the phrase "Did you . . . " requires either a first-person singular or first-person plural pronoun ("Yes, we did . . . "; "No, I didn't") or a passive verb. A respondent therefore does not have the same freedom of word use in a dialogue that they have in an unforced setting.

Two people in conversation, even if they are strangers, will imitate each other in everything from facial expression and body language [9] to volume [40], pitch [33], and speech rates [58]. People speaking to each other also mimic each other's words and phrases [36]. This mimicry is generally not conscious [9]. Subjects mimic phrases they have heard even when consciously trying not to [3] and when their conscious working memory is filled with a distractor task [36].

LIWC can be used to measure verbal mimicry on the level of categories of words, where it is called Linguistic Style Matching (LSM). LSM is measured by comparing LIWC counts on all function word categories between both partners in an interaction. This comparison can be done through product-moment correlation [44] or a weighted difference score [32].
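The weighted difference form can be sketched as below; the exact weighting and smoothing constant used in [32] may differ, so this is an assumed, illustrative variant rather than the published formula.

    def lsm_score(rates_a, rates_b):
        """rates_a, rates_b: dicts of function-word category -> usage rate for the
        two partners in an interaction. Returns the mean per-category match,
        where 1 means identical rates and values near 0 mean very different rates."""
        scores = []
        for cat in rates_a:
            pa, pb = rates_a[cat], rates_b[cat]
            scores.append(1.0 - abs(pa - pb) / (pa + pb + 1e-4))
        return sum(scores) / len(scores)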


The LSM metric is internally consistent: if a dyad matches in their use of one function word category, they will probably match to the same degree with all others. It also generalizes well across different contexts, from online chat conversations [44] to letters between well-known colleagues [32] and face-to-face discussions [44].

LSM appears to a greater or lesser degree in different circumstances. Some linguistic styles are more easily matched than others, and different people will match a given style with more or less ease [32]. However, while greater LSM is associated with greater social cohesion, the matching occurs even between strangers who dislike each other [44]. We would thus expect a degree of matching to occur in any context – even a courtroom interrogation.

    3 Method and Results

The prompting effect we expect to see is a generalization of style matching because, unlike pure mimicry, we expect that some categories of question words prompt different categories of response words. This is particularly obvious in the case of pronouns. Our first hypothesis is therefore:

H1. The function words in an answer to a question are not independent of the question, but are prompted (positively or negatively) by the function words in the question. At least some of the words affected in this way will be relevant words from the Pennebaker deception model.

Our second hypothesis follows:

H2. If word frequencies from the LIWC model are affected by the words in the question, then this potentially distorts the results of the Pennebaker model. Removing the effects of the question from the answer before applying the LIWC model should result in greater accuracy.

Our plan of attack has three parts. First, we gather archival question-and-answer data and visualize any relevant changes in frequency that could be caused by H1. Second, we develop a method to correct for these changes. Third, although not every answer in our data can be labeled as entirely truthful or entirely deceptive, we use some straightforward methods to validate the corrections and to investigate whether they do, in fact, improve the separation between truthful and deceptive responses.

    3.1 Datasets

We created three datasets by taking archival transcriptions of real-life, high-stakes question-and-answer interactions.

The Republican dataset comprises transcripts of each of the Republican primary debates leading up to the American presidential election of 2012. These involved a total of ten candidates and were televised and transcribed online by various news organizations [1, 4–7, 10–16, 18, 24, 30, 42, 46, 48, 57].

We used the Republican dataset to investigate overall patterns of interaction between question and answer words. We did not rate the Republican presidential candidates as deceptive or non-deceptive, since there is no reliable estimate of ground truth about the candidates' honesty. Fact checking websites may show that specific statements are true or false, but they do not help with the issue of persona deception, and there is no reliable way to judge one candidate overall as more deceptive than another. However, we did judge the debates in general as a forum in which all candidates would be motivated towards persona deception as defined by Gupta and Skillicorn [26]. Presenting themselves and the facts in the most favorable possible light is essential to getting elected, and much of the time this favorable light does not correspond exactly with reality. The full Republican dataset contained 2118 question-answer pairs and 301,539 total words.

The Nuremberg dataset comprised selected examinations and cross-examinations of witnesses from the Nuremberg trials of 1945-1947. Most of these examinations were taken from the Trial of German Major War Criminals, transcribed at the Holocaust memorial website nizkor.org [29]. Two were instead taken from the Nuremberg Medical Trial, which is partially transcribed online at the website of the Harvard Law Library [41]. Unlike the Republican dataset, the Nuremberg dataset contained obvious subgroups with markedly different motivations towards deception. The first group, Defendants, contained two Nazi war criminals testifying in their own defense who were eventually found guilty on all counts and executed. These men were highly motivated towards deception: they were guilty, and their lives depended on convincing the tribunal that they were not. The second group, Untrustworthy witnesses, contained ten lower-ranking Nazis who were not themselves on trial. While this group was not at immediate risk of conviction, it seems reasonable to suppose that their accounts would be moderately deceptive. Most of them would be motivated to absolve themselves either by minimizing Nazi war crimes as a whole or by minimizing their own involvement. The third group, Trustworthy witnesses, contained nineteen survivors of Nazi war crimes who testified about those crimes. Fourteen of these were Holocaust survivors, while the other five were civilians who reported on more general conditions in Nazi-occupied countries. We did not consider any of these witnesses deceptive.

Some witnesses spoke at great length about their experiences, so we chose to truncate answers in the Nuremberg dataset at 500 words, matching the maximum size that occurred naturally in the Republican dataset. This affected less than 1 percent of the answers. The full Nuremberg dataset contained 4159 question-answer pairs (1355 from Defendants, 1826 from Untrustworthy witnesses, and 978 from Trustworthy witnesses). It contained a total of 311,099 words.

We also make brief use of a third dataset, Simpson. This contains depositions from the civil trial of O.J. Simpson for the wrongful death of his ex-wife, Nicole Brown, and another man. Extensive transcripts of this material were available online [54]. Simpson contains Simpson's deposition in his own defense, which we considered deceptive, as well as the depositions of family and friends of the deceased, which we considered largely truthful. (We chose the civil trial rather than the criminal trial because it was the first time Simpson testified directly in his own defense; it also had the advantage of being a trial at which he was found liable.) Since Simpson's own deposition was three times as long as the rest of the dataset, we used only every third question-answer pair from him. The Simpson dataset contained 20,810 question and answer pairs, totalling 412,385 words.

    3.2 Model Words

When processing this data, we are most interested in what occurs at the level of a single question and answer pair. Thus, we looked at "windows" in the data, which we defined as a single question followed by its answer. We recorded the length (in words) of each question and answer, along with the count of the number of words in these categories:

FPS First person singular pronouns. The Pennebaker model predicts that deceptive people should use a lower rate of first-person pronouns than truthful people. However, in the Gomery commission data, Little and Skillicorn [37] found that deceptive people used a higher rate.

but Lower rates of this and other exclusive words are expected in deception. The Pennebaker model puts "but" and "or" in the same exclusive words category, but Little and Skillicorn's results [37] suggest that the two words often have quite different properties in semantic space. We therefore counted "but" and "or" separately.

or Another exclusive word.

excl The other exclusive words from the model, for example, "unless" and "whereas".

neg Negative emotion words. Higher rates of these are expected in deception.

action Action verbs. Higher rates of these are expected in deception.

FPP First person plural pronouns. These can be substituted for singular in some circumstances (the "royal we") and a questioner using the "royal we" might encourage a respondent to do likewise. Alternatively, a respondent who is forced grammatically to use a first person pronoun might choose a plural one instead of a singular as a distancing technique. Similarly, if the questioner is referring to a group to which both they and the respondent belong, this might prompt the respondent to continue talking about this group. Focusing on a group leaves less time to focus on oneself, so we expected higher FPP rates in the question to prompt lower FPS rates in the answer.

SPP Second person pronouns. A question containing the word "you" is probably about the respondent, and the respondent is expected to respond by giving information about themselves. Thus, we expected higher SPP rates to prompt higher FPS rates in the answer.

"Wh" Who, what, when, where, why, and how. Questions containing these words prompt the respondent for a specific fact. Questions about specific facts should make it harder for the respondent to be evasive. We expected higher "wh" word rates to prompt higher rates of exclusive words due to an increase in cognitive complexity.

TPS Third person singular pronouns. These words are references to a person other than the questioner or respondent. Being asked about a third person should induce the respondent to talk about that person, and thus give them less opportunity to talk about themselves. When talking about another person – not oneself, and not the questioner – it might also be easier to make disparaging or negative remarks. We expected higher TPS rates to prompt lower FPS rates and higher negative emotion rates.

TPP Third person plural pronouns. We expected higher TPP rates to prompt lower FPS rates and higher negative emotion rates, for the same reason as above.

these An early analysis with a part-of-speech tagging program showed that rates of "these", "those", and "to" in the question were weakly correlated with rates of FPS in the answer. This early analysis did not yield other interesting results.

    3.3 Gaussian Distributions

A 500-word answer with five action words is statistically and linguistically different from a 15-word answer with five action words. Therefore, using raw word counts in our statistical analysis would be inappropriate. For all our analyses, we divided the number of words of each type in each single question or answer by the total number of words it contains, giving a rate statistic for each type of word.

We expected the shortest questions and answers to be difficult to analyze. In a five-word answer with one "but", the rate statistic for "but" would be 0.2 – unusually large. It isn't clear that the answer has all the properties associated with use of the word "but" to a greater degree than, say, a 16-word answer with one "but", which would have a rate of only 0.0625. Based on this reasoning and some early inspection of the data, we began the analysis with a minimum window size of 50 words in both question and answer.
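A minimal sketch of this windowing step is given below; count_categories is a hypothetical helper returning a {category: count} dictionary for a text, and the 50-word minimum is the threshold described above.

    def window_rates(question, answer, count_categories, min_words=50):
        """Turn one question-answer window into per-category rate statistics,
        or return None if either side is below the minimum window size."""
        q_len, a_len = len(question.split()), len(answer.split())
        if q_len < min_words or a_len < min_words:
            return None
        q_rates = {c: n / q_len for c, n in count_categories(question).items()}
        a_rates = {c: n / a_len for c, n in count_categories(answer).items()}
        return q_rates, a_rates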

We separated answers into two categories for each pair of question word and response word. The "unprompted" category contained all answers whose matching question did not contain the relevant question word. We assume that rates in such answers reflect their natural rates without prompting. The "prompted" category contained all answers whose matching question contained the relevant question word at least once.

For each pair of question word and answer word in the prompted category, we counted the rates in all windows and fitted a two-dimensional Gaussian distribution to the data, using the FitFunc toolbox for Matlab. For the unprompted category we computed a one-dimensional Gaussian fitting the rates in the answers.
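The fits themselves were done with the FitFunc toolbox for Matlab; the numpy sketch below shows the same prompted/unprompted split and the Gaussian parameters being estimated, and is only an illustrative equivalent.

    import numpy as np

    def fit_prompting(windows, q_cat, a_cat):
        """windows: iterable of (q_rates, a_rates) dicts for one dataset.
        Returns the 2-D Gaussian fit to the prompted windows and the 1-D fit
        to unprompted answer rates, for one question/answer category pair."""
        prompted   = np.array([(q[q_cat], a[a_cat]) for q, a in windows if q[q_cat] > 0])
        unprompted = np.array([a[a_cat] for q, a in windows if q[q_cat] == 0])

        mean2d = prompted.mean(axis=0)            # centre of the prompted cloud
        cov2d  = np.cov(prompted, rowvar=False)   # its orientation and spread
        mean1d, std1d = unprompted.mean(), unprompted.std()
        return mean2d, cov2d, mean1d, std1d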

These Gaussian distributions provide an indicator of the relationship between the question word and the answer word pair. A distribution with a positive slope and a mean higher than that of the unprompted data indicates that the question word prompts higher rates of the answer word. A distribution with a negative slope and a mean lower than that of the unprompted data indicates that the question word prompts lower rates of the answer word. A flat or near-spherical distribution with a mean close to that of the unprompted data suggests that there is no relationship between the two word categories.

[Figure 1 (plots not reproduced): Gaussian distributions from the Republican dataset – minimum window size of 50 words. x-axis = rates for answer words, y-axis = rates for question words; the red line parallel to the y-axis shows the one-dimensional unprompted distribution out to 1 standard deviation. Panels: (a) first-person plural pronouns prompting first-person singular pronouns; (b) second-person pronouns prompting first-person singular pronouns; (c) "wh" words prompting "or" (the other exclusive word categories looked similar); (d) third-person singular pronouns prompting first-person singular pronouns; (e) third-person plural pronouns prompting first-person singular pronouns; (f) "these", "those", and "to" prompting first-person singular pronouns; (g) first-person singular pronouns prompting "but"; (h) "but" prompting "or"; (i) third-person plural pronouns prompting action words.]

Our initial estimates of the question word categories that would have significant prompting effects had mixed success. We did not find evidence of many of the relationships we had expected (Figure 1). However, we did find that higher rates of second-person pronouns in the question, as expected, were associated with higher rates of first-person singular pronouns in the answer. So were "these", "those", and "to". With third-person plural pronouns, we found a weak effect in the opposite direction from what we expected: higher third-person plural pronoun rates prompted very slightly higher first-person singular pronoun rates. Although most pairs of question and answer word categories produced no effect, a few pairs for which we had no hypothesis also showed a prompting effect.

    3.4 Data Correction

To remove the effect of prompting words, we carried out a series of affine transformations on the points in a bivariate Gaussian model in Cartesian space – producing corrected rates for each word category.

[Figure 2 (plots not reproduced): Steps in the correction process, illustrated using second-person pronouns in the question and first-person singular pronouns in the answer. Panels: (a) the original data; (b) steps 1 and 2: translated to a mean of (0, 0) and rotated; (c) steps 3 and 4: rescaled and translated back to a mean that matches the unprompted data. The blue point represents a particular speech as it is transformed.]

The correction method is illustrated in Figure 2. For each question-answer pair, we fit a bivariate Gaussian to the prompted data and translate it so the mean is at (0,0). We then rotate the distribution using a standard rotation matrix until it is "flat" (one of its axes is parallel to y = 0). We use the smallest possible angle of rotation in either direction.

After that, we rescale the data on the y-axis so that its standard deviation in the y direction equals the standard deviation of the unprompted data, and translate it so that its mean returns to its original value in the x direction, and is equal to the mean of the unprompted data in the y direction.
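The four steps can be sketched as follows, under the reading that each prompted window is a (question-word rate, answer-word rate) point and that the answer-word coordinate is the one realigned with the unprompted distribution; this is our interpretation of the procedure above, not the authors' implementation.

    import numpy as np

    def correct_rates(prompted_xy, unprompted):
        """prompted_xy: (n, 2) array, column 0 = question-word rate, column 1 =
        answer-word rate. unprompted: 1-D array of answer-word rates from windows
        whose question lacked the prompting word. Returns corrected answer rates."""
        pts = np.asarray(prompted_xy, dtype=float)
        mean = pts.mean(axis=0)

        # Steps 1 and 2: translate the fitted Gaussian to (0, 0), then rotate by the
        # smallest angle that lays its principal axis along y = 0.
        centred = pts - mean
        cov = np.cov(centred, rowvar=False)
        theta = 0.5 * np.arctan2(2.0 * cov[0, 1], cov[0, 0] - cov[1, 1])
        c, s = np.cos(-theta), np.sin(-theta)
        flat = centred @ np.array([[c, -s], [s, c]]).T

        # Step 3: rescale the y direction to the unprompted standard deviation.
        if flat[:, 1].std() > 0:
            flat[:, 1] *= unprompted.std() / flat[:, 1].std()

        # Step 4: translate back - original mean in x, unprompted mean in y.
        flat[:, 0] += mean[0]
        flat[:, 1] += unprompted.mean()

        # Negative rates are kept as "emphatic zeroes" while corrections for other
        # prompting words are chained, and only clamped to zero at the very end.
        return flat[:, 1]

In use, the corrected rates for one prompting word would be fed back in as the answer-rate column for the next, matching the chained corrections described below.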

We have included a blue dot in Figure 2 representing an example window (in this case, the respondent is Newt Gingrich) so that it can be traced through each step of the process. In the uncorrected data, Gingrich is heavily prompted and responds with a number of first-person singular pronouns that is about average for the data as a whole, but much less than what might be expected given the general trend towards increased first-person singular pronouns with more prompting. After the data is corrected, Gingrich's first-person singular pronoun rate is low, as expected.

During this process, some windows can be assigned slightly negative values. Obviously it is impossible for a person to say fewer than zero words in response to a question. We treat the negative values as very emphatic zeroes, but do not correct them until the end of the process, since a subsequent correction for another word pair might return them to positive values.

We performed these transformations for each question-answer pair in the data, feeding the input from one transformation to the next so that, for each response word, a correction is made for each prompting word in turn. For question-answer pairs where the prompting word had negligible effect on the response word rate, the effect of this correction would also be negligible. There was a risk of these small effects producing slight noise in the data, but we accepted this risk because we did not wish to choose an arbitrary threshold separating effects that counted from effects that didn't.

Examples of the changes to rate statistics are shown in Figure 3. It is useful to see how large a change in word frequencies such a correction represents. An average absolute value of three words was added or taken away from each window in both the FPS categories – at an average window size of 188 words, about six of which on average were FPS. So about half of these FPS pronouns, according to our model, are caused by prompting. "But" and action words changed by an average of one word per window, but some windows change in a positive direction and some in a negative, so the average change is much less than one word per window. The other categories, on average, were barely changed.

To examine the sensitivity of the correction method we reran the corrections for each category of answer words after removing the 2.5% highest rates and the 2.5% lowest rates for each category (5% of the data in total). In this restricted analysis, the average absolute value of the difference after correction was still about three words for FPS and about one word for action words. However, the average change in FPS value was reduced to only two words, and the effect on "but" disappeared. This suggests that the correction method is somewhat sensitive to outliers, but that its results for FPS and action words are largely valid.

    3.5 Window Sizes

In many settings, the available window sizes are much smaller than 50 words. For example, in the Simpson depositions, the average question and answer contains fewer than 20 words in both sides of the window put together, and vanishingly few met the criterion of having 50 words or more in both the question and the answer. So we investigated ways to apply our model to smaller windows.

We experimented with the Republican dataset, using versions with smaller minimum window sizes – 30, 10, and 1. We also tried merging adjacent questions and answers, provided that they involved the same questioner and respondent, repeating this until all windows either met the minimum size for both question and answer, or could not be merged with adjacent windows because there were none. We then removed any windows that still did not meet the minimum size. We call these composite windows.
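A minimal sketch of this merging procedure is shown below; the Window container and its fields are hypothetical (a real implementation would also concatenate the underlying text), and only the word counts needed for the size test are tracked.

    from dataclasses import dataclass

    @dataclass
    class Window:
        questioner: str
        respondent: str
        q_words: int   # words in the question(s)
        a_words: int   # words in the answer(s)

    def composite_windows(windows, min_size=50):
        """Merge adjacent windows with the same questioner and respondent until
        both sides reach min_size; drop windows that still fall short."""
        merged, i = [], 0
        while i < len(windows):
            cur = windows[i]
            j = i + 1
            while ((cur.q_words < min_size or cur.a_words < min_size)
                   and j < len(windows)
                   and windows[j].questioner == cur.questioner
                   and windows[j].respondent == cur.respondent):
                nxt = windows[j]
                cur = Window(cur.questioner, cur.respondent,
                             cur.q_words + nxt.q_words, cur.a_words + nxt.a_words)
                j += 1
            if cur.q_words >= min_size and cur.a_words >= min_size:
                merged.append(cur)
            i = j
        return merged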

[Figure 3 (color maps not reproduced): Color maps showing the average change in answer rates as the result of corrections for each question-and-answer pair at each minimum window size: (a) minimum window size 50, (b) minimum window size 30, (c) minimum window size 10, (d) composites totaling size 50. Prompting words are the columns and response words the rows; bright colors indicate that question words force lower rates of answer words, and so answer word rates are corrected to increase; dark colors indicate the opposite. All maps are on the same color-based scale.]

We then applied the corrections, measured the average change induced at each stage of the correction, and compared it to the average change in the original data. We created color maps that show the average correction to each question-answer pair (Figure 3). The patterns in 50-word windows degrade as the minimum window size is lowered, particularly to below 30. The most notable degradation happened in the most promising question-answer pair – second-person pronouns prompting first-person singular pronouns. The composite windows also showed slight degradation, comparable to that in the 30-word windows.

Table 2 shows how much of each dataset is usable at each window size. For Nuremberg and Simpson, small windows predominate, and there are many stretches where one individual answered many questions in a row, making them particularly amenable to the use of composite windows. With Nuremberg and Simpson the composite window technique allowed analysis of much more data than 30-word windows. Furthermore, it caused less degradation than the small window sizes which would otherwise have been needed to cover this much data. In Republican, windows tended to be larger, so the effect was less pronounced, but the composite window technique still gave coverage and degradation comparable to that of the 30-word windows. For these reasons, we performed the rest of our analysis (in all three datasets) with composite windows.

         Republican            Nuremberg             Simpson
         Q         A           Q         A           Q         A
Full     65,480    236,411     109,551   201,548     219,262   193,123
>= 10    55,161    159,450     75,313    155,395     45,162    71,357
>= 30    43,876    107,836     31,452    51,735      3,280     4,967
>= 50    33,193    72,078      15,394    22,366      323       350
Comp     44,759    101,211     105,423   170,706     219,197   192,537

Table 2: Number of words that could be included for each minimum window size

    3.6 Validation: Nuremberg and SVD

We expected corrected data to show a sharper distinction than the original data between the truthful and the deceptive. Using composite windows, we encoded the Nuremberg data and checked that it had similar properties to the Republican data. The magnitude of correction in FPS and action words was quite similar in the two datasets, although the effect on "but" in the Nuremberg data was much smaller.

We performed singular value decomposition on the answer data before and after performing the correction on the Nuremberg dataset. This reduced the 6-category deception model to 3 dimensions. We then made a scatter plot showing each window in the resulting semantic space, expecting an increase in the distance between truthful and deceptive subgroups.
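A sketch of this validation step, assuming a rates array with one row per window over the six model categories and a parallel list of subgroup labels (both hypothetical names; the plotting details are ours, not the authors'):

    import numpy as np
    import matplotlib.pyplot as plt

    def plot_semantic_space(rates, labels):
        """Project windows into three dimensions with an SVD and scatter-plot the
        first two, colored by subgroup."""
        U, s, Vt = np.linalg.svd(np.asarray(rates, dtype=float), full_matrices=False)
        coords = U[:, :3] * s[:3]
        colors = {"Defendants": "r", "Untrustworthy": "g", "Trustworthy": "b"}
        for group, color in colors.items():
            idx = [i for i, lab in enumerate(labels) if lab == group]
            plt.scatter(coords[idx, 0], coords[idx, 1], c=color, s=8, label=group)
        plt.legend()
        plt.show()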

Figure 4 shows the result of singular value decomposition on the Nuremberg data before and after correction. The effect is the opposite of what was expected – the correction erases most of the spatial distinction that existed between the groups. This suggests that, rather than being a confounding effect that should be removed to improve the detection of deception, the prompting effect contains important information about deception.

    3.7 Validation: Subgroups in the Nuremberg Data

We analyzed each of the three Nuremberg subgroups separately, applying a correction to each. Figure 5 shows color maps of this data. The differences between Defendants and Untrustworthy witnesses (both the Nazi subgroups) are not large, but the response patterns of the Trustworthy witnesses subgroup are quite different.

    Note that these color maps are on the same scale as the earlier Republican color maps. Wechose to present them this way in order to make comparisons easy across the different colormaps. However, by using a scale that works for the Republican data, we have obscuredan important fact about the Nuremberg data – namely that the average change in first-person singular pronouns prompted by second-person pronouns, for both the Defendants

    14

  • −0.04 −0.02 0 0.02 0.04 0.06 0.08−0.06

    −0.04

    −0.02

    0

    0.02

    0.04

    0.06

    0.08

    558

    559

    560

    561562

    563

    564

    565566

    567

    568569

    570571

    572

    573574

    575

    576

    577

    578

    579

    580

    581582

    583

    584

    585586 587588589

    590

    591592

    593

    594

    595

    596597

    598

    599

    600

    601

    602

    603

    604

    605606607

    608

    609

    610

    611

    612

    613

    614

    615616

    617

    618619

    620 621622

    623

    624625626 627

    628

    629630631

    632 633

    634

    635

    636

    637638 639640

    641

    642

    643

    644645646

    647648

    649650

    651652

    653

    654

    655

    656657

    658

    659660

    661

    662

    663

    664665

    666

    667

    668

    669

    670671

    672 673

    674

    675

    676677

    678679

    680681682 683684

    685

    686687

    688

    689690 691

    692

    693694

    695

    696

    697

    698699

    700

    701702

    703

    704

    705706

    707

    708

    709

    710

    711712713714

    715

    716

    717718

    719720 721

    722

    723

    724

    725

    726

    727728

    729

    730731

    732 733734735

    736

    737738

    739740

    741

    742

    743

    744

    745746

    747

    748

    749

    750751

    752

    753

    754

    755

    756

    757

    758

    759

    760

    761 762

    763764

    765766

    767

    768

    769770

    771

    772

    773

    774775776

    777

    778779

    780781

    782

    783

    784

    785

    786

    787

    788

    789

    790791

    792

    793

    794795

    796

    797

    798

    799

    800801802

    803

    804

    805

    806

    807808809 810

    811

    812

    813814

    815

    816817

    818 819

    820821

    822

    823

    824

    825

    826

    827

    828

    829

    830

    831 832

    833

    834

    835836

    837

    838

    839

    840841

    842 843844

    845846

    847

    848

    849

    850

    851

    852

    853854

    855

    856

    857

    858859

    860861

    862

    863

    864865

    866867

    868

    869870

    871

    872

    873874

    875876

    877

    878

    879

    880

    881882

    883

    884

    885

    886

    887

    888

    889

    890 891

    892

    893894

    895896

    897

    898899

    900

    901

    902 903904905

    906

    907

    908

    909910

    911

    912

    913

    914915

    916

    917918

    919

    920921

    922

    923

    924925 926

    927928

    929

    930931

    932

    933

    934935

    936

    937938939

    940

    941942

    943

    944

    945

    946 947

    948

    949

    950 951

    952

    953954955

    956

    957958

    959

    960

    961962

    963

    964965

    966

    967

    968

    969 970

    971972

    973

    974

    975976

    977978

    979

    980

    981

    982

    983

    984

    985

    986

    987

    988

    989

    990 991

    992993

    994

    995

    996

    997

    998

    999

    1000

    1001

    1002

    10031004

    100510061007

    1008

    1009

    1010

    1011

    10121013 101410151016

    1017

    1018

    10191020

    12

    3

    4

    5

    67

    8

    9

    10

    1112 1314

    15

    16

    17

    18

    19

    20

    212223

    24

    2526

    27

    2829

    30 31

    32

    333435 36

    3738

    39

    40

    414243

    444546

    84

    8586 87888990

    9192

    93

    949596

    97

    9899

    100

    101102103

    104

    105

    106

    107

    108

    109

    110111112 113

    114

    115

    116 117

    118 119120

    121122123

    124

    125

    126

    127

    128

    129

    130

    131

    132

    133 134

    135

    136137

    138139140

    141

    142143

    144

    145

    146

    147

    148149

    150

    151

    152

    153

    154155

    156157

    158

    159160

    161

    162 163164

    165166167168

    169

    170

    171172

    173

    174175

    176177

    178179180181

    182

    183184

    185186

    187

    188

    189

    190

    191

    192

    193

    194195

    196197

    198

    199

    200201

    202203

    204

    205206

    207

    208

    209

    306

    307

    308309

    310

    311

    312

    313

    314

    315

    316

    317

    318

    319

    320

    321322

    323

    324

    325326

    327

    328

    329

    330

    331

    332333

    334335

    336

    337

    338339340 341

    342343

    344 345346

    347 348

    349

    350351

    352

    353

    354355356357

    358

    359

    360

    361 362

    363

    364

    365

    366

    367368369

    370

    371

    372373

    374

    375

    376

    377

    378

    379380

    381382

    383384

    385

    386387

    388

    389

    390

    391 392

    393394

    395396

    397 398399

    400

    401

    402

    403404

    405

    406407

    408409410

    411412

    413

    414

    415

    416417

    418419

    420

    421

    422

    423424

    425

    426427

    428

    429

    430

    431

    432

    433

    434

    435 436437438

    439

    440

    441

    442

    443444

    445

    446

    447

    448

    449

    450

    451 452

    453

    454

    455 456457

    458

    459

    460461

    462

    463

    464

    465

    466 467468

    469

    470

    471472

    473

    474

    475

    476

    477478

    479

    480481482

    483

    484

    485

    486

    487

    488

    489

    490

    491

    492

    493 494

    495

    496

    497

    498

    499

    500501

    502

    503

    504

    505

    506

    507508

    509510

    511

    512

    513514515

    516

    517

    518 519 520

    521

    522523 524

    525

    526

    527

    528

    529

    530 531

    532

    533

    534

    535

    536

    537

    538 539

    540541

    542543

    544545546

    547

    548

    549550

    551552

    553554

    555

    556

    557

    4748 495051

    52

    53

    54

    555657

    585960

    6162

    63 646566 67

    68

    69

    70

    71

    72

    73

    74

    75

    76

    77

    7879

    80

    81

    82

    83

    210211212

    213

    214

    215

    216

    217

    218

    219

    220

    221222

    223

    224225

    226

    227

    228

    229230

    231

    232

    233234

    235236237

    238239

    240

    241

    242

    243

    244 245

    246 247

    248

    249

    250

    251252253254255

    256

    257258259

    260

    261262

    263264265

    266

    267

    268269270

    271272

    273

    274

    275

    276277 278

    279

    280

    281

    282283

    284

    285

    286

    287

    288

    289290

    291292

    293

    294

    295296

    297

    298

    299

    300 301

    302

    303

    304305

    1021

    1022

    1023102410251026

    1027

    1028

    1029

    1030

    1031

    10321033

    1034

    1035

    1036

    10371038

    1039

    1040

    1041

    1042

    104310441045

    1046

    10471048

    104910501051

    10521053

    1054

    1055

    105610571058

    1059

    1060

    10611062

    1063

    1064

    1065

    1066

    1067

    10681069

    1070

    1071

    1072

    1073

    10741075

    10761077

    1078

    1079

    108010811082

    108310841085

    1086

    1087

    10881089

    1090 1091

    10921093

    1094

    10951096

    1097

    1098

    1099

    1100

    (a) Before correction

    −0.04 −0.02 0 0.02 0.04 0.06 0.08

    −0.05

    0

    0.05

    558559

    560561

    562563

    564

    565566567

    568569

    570571

    572573

    574575

    576

    577

    578

    579

    580581

    582583

    584585

    586

    587588

    589

    590

    591

    592

    593

    594

    595

    596

    597

    598599600

    601

    602

    603

    604605

    606607

    608

    609610

    611612 613

    614

    615616

    617618

    619

    620621

    622

    623

    624

    625

    626

    627628

    629

    630

    631

    632633

    634

    635

    636

    637

    638

    639640

    641

    642

    643

    644

    645646

    647

    648649

    650

    651 652

    653

    654655656657658

    659

    660

    661

    662

    663

    664

    665

    666

    667668

    669

    670

    671

    672

    673674

    675

    676

    677

    678679680

    681

    682

    683684

    685686

    687

    688

    689

    690 691

    692

    693694695

    696

    697

    698699

    700701

    702703

    704

    705706

    707

    708

    709

    710

    711

    712713

    714

    715

    716

    717718719

    720721

    722

    723

    724

    725

    726

    727

    728

    729730

    731

    732

    733734

    735

    736

    737

    738

    739

    740 741

    742

    743744

    745746

    747

    748

    749

    750751

    752

    753

    754

    755756

    757758

    759760

    761762

    763

    764

    765

    766

    767

    768

    769 770

    771

    772

    773

    774

    775

    776

    777

    778

    779

    780

    781

    782

    783

    784785

    786 787788

    789 790

    791

    792793

    794

    795

    796

    797

    798

    799

    800801

    802

    803804

    805

    806

    807808

    809810

    811812

    813

    814

    815

    816817

    818

    819

    820

    821

    822

    823

    824

    825

    826

    827

    828

    829830

    831

    832

    833

    834

    835

    836

    837

    838

    839

    840


    (b) After correction

Figure 4: Singular value decomposition of the responses in the Nuremberg dataset. Defendants are marked in red, Trustworthy witnesses in blue, and Untrustworthy witnesses in green. Before correction, the Trustworthy witnesses are concentrated on one side of the semantic space, but this is less the case after the correction.

and Untrustworthy witnesses, is literally off the scale. Both of these are more than twice the maximum amount shown by the color map; the Defendants' change is somewhat larger than that of the Untrustworthy witnesses. (Since we were using composite windows for this, the Republican average change is also slightly higher than it was in the previous colormaps, which were made with 50-word minimum windows.) Figure 6 shows this more clearly. (This is not a contradiction of our earlier claim that first-person pronoun corrections, in words per window, were similar in both the Nuremberg and Republican datasets. The Nuremberg data had smaller windows, and it contained subgroups with both much higher and much lower rate statistics, on first-person pronouns, than the Republican data.)

Seeing these large differences prompted us to look at individual distributions more closely. We overlaid the distribution from one subgroup onto the corresponding distributions for the other subgroups. This confirmed that different subgroups responded to prompting differently. They were not simply being prompted at different rates, or showing different rates of response independently of the prompt – many question-word/response-word pairs showed completely different distributions. In general, Defendants and Untrustworthy witnesses, the two deceptive groups, were similar, but the Trustworthy witnesses group was different. The strongest effects tended to involve first-person pronouns or action words. Four of the most striking sets of distributions are shown in Figure 7.
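As an illustration of this kind of overlay (a sketch only, not the report's plotting code), the snippet below fits a Gaussian curve to per-window (question-category rate, answer-category rate) points for each subgroup and draws the fitted curves on the same axes. The subgroup names and colors follow Figure 7; the per-window rate arrays here are synthetic placeholders, since the real rates come from the Nuremberg transcripts.

    # Illustrative sketch: overlay per-subgroup Gaussian fits of an answer-word
    # rate against a question-word rate (e.g. second-person pronouns in the
    # question vs. first-person singular pronouns in the answer).
    # The per-window rates below are synthetic placeholders, not real data.
    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.optimize import curve_fit

    def gaussian(x, amplitude, mean, width):
        return amplitude * np.exp(-((x - mean) ** 2) / (2.0 * width ** 2))

    rng = np.random.default_rng(1)
    subgroups = {"Defendants": "red", "Untrustworthy": "magenta", "Trustworthy": "blue"}

    x_grid = np.linspace(0.0, 0.16, 200)
    for name, color in subgroups.items():
        q_rate = rng.uniform(0.0, 0.16, 300)   # question-word rate per window (fake)
        a_rate = gaussian(q_rate, 0.1, 0.08, 0.03) + rng.normal(0.0, 0.01, 300)
        params, _ = curve_fit(gaussian, q_rate, a_rate, p0=[0.1, 0.08, 0.03])
        plt.plot(x_grid, gaussian(x_grid, *params), color=color, label=name)

    plt.xlabel("question-word rate per window")
    plt.ylabel("answer-word rate per window")
    plt.legend()
    plt.show()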

The prompting effect must be receiver-state-dependent – deceptive individuals respond to prompting one way, and truthful individuals another. One of the particularly strong differences involves second-person pronouns prompting first-person singular pronouns, which was already one of the strongest effects. Correcting for prompting reduced, rather than increased, the difference between these groups, because the prompting itself is a factor that distinguishes them from each other. Rather than having a base rate of word use (due to deception or lack thereof) which is modulated by prompting, the different subgroups actually experience different kinds of prompting – which means paying attention to the way in which prompting occurs ought to further elucidate differences between them.


[Figure 5 color maps – panels: (a) Defendants, (b) Untrustworthy, (c) Trustworthy; axes labeled FPS, but, or, excl, neg, action, FPP, SP, wh, TPS, TPP, these; color scale −6 to 6 (×10^−3).]

    Figure 5: Corrections for the three Nuremberg subgroups analyzed separately.


These results explain the success of the ad hoc coding scheme used by Little and Skillicorn. Without knowledge of the rates of prompting words in questions, deceivers appear to increase their rates of first-person singular pronouns and exclusive words relative to less deceptive respondents. In fact, similar levels of prompting in questions to both groups produce these differentiated responses.


[Figure 6 bar chart – groups: Republican, Defendants, Untrustworthy, Trustworthy; vertical scale −0.025 to 0.]

Figure 6: Average change in rates of first-person singular pronouns prompted by second-person pronouns in the Republican and Nuremberg datasets, all using composite windows.

    3.8 Validation: Random Forests

Does information in the question actually improve the detection of deception, or is that information present in the answer, but in some way not captured by the Pennebaker model? We used random forests [2] to estimate the significance of question words. A random forest not only classifies data but estimates the importance of each attribute in the data by counting how often each attribute is selected to act as the split point for an internal node of a decision tree of the forest.

We trained two random forests, one with the rates of only the six response word categories in the answers, and one with these plus the rates of all the stimulus words in the questions. For simplicity, we included only the Defendants and Trustworthy witnesses subgroups.

Because not every statement by a Defendant is a lie, we did not expect high accuracy from either of these forests. Rather, we wanted to compare them to each other to see if the question words made a difference. If both questions and answers were predictive of deception, then the model that uses both should make better predictions, and it should select the words that appear the most promising (i.e. first-person singular pronouns in answers and second-person pronouns in questions). If only the words in answers were relevant to deception, then both models should perform about the same.
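The comparison described here can be reproduced with any standard random-forest implementation. The following is a minimal sketch, not the code used for the report: it assumes hypothetical per-window arrays X_answers (rates of the six answer-word categories), X_questions (rates of the twelve question-word categories), and labels y (Defendant vs. Trustworthy witness), here filled with random values so the snippet runs on its own.

    # Sketch: compare a forest trained on answer-word rates alone with one
    # trained on answer + question rates. Data below are random placeholders.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    n_windows = 500                                  # hypothetical window count
    X_answers = rng.random((n_windows, 6))           # 6 answer-word categories
    X_questions = rng.random((n_windows, 12))        # 12 question-word categories
    y = rng.integers(0, 2, n_windows)                # 1 = Defendant, 0 = Trustworthy
    X_both = np.hstack([X_answers, X_questions])

    answers_only = RandomForestClassifier(n_estimators=500, random_state=0)
    questions_and_answers = RandomForestClassifier(n_estimators=500, random_state=0)

    # Cross-validated accuracy; with real windows, the second model is the one
    # expected to improve if question words carry usable prompting signal.
    acc_a = cross_val_score(answers_only, X_answers, y, cv=5).mean()
    acc_qa = cross_val_score(questions_and_answers, X_both, y, cv=5).mean()
    print(f"answers only: {acc_a:.3f}   questions + answers: {acc_qa:.3f}")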

Figure 8 shows the performance of both random forests. While the answer-words-only random forest performs above chance, it shows a strong bias: its accuracy with Trustworthy witness responders is much lower than its accuracy with Defendants; that is, it tends to classify windows of both kinds as Defendants. Adding the question words improves overall accuracy by more than 10 percentage points – an even more dramatic result than we expected. Moreover, it reduces bias by an even larger amount, producing an improvement on the Trustworthy witnesses without reducing the accuracy on Defendants.

    Table 3 shows the results of attribute ranking with the best-performing random forest.


[Figure 7 panels: (a) Second-person pronouns prompting first-person singular pronouns; (b) “Or” prompting first-person singular pronouns; (c) First-person plural pronouns prompting “but”; (d) “Or” prompting action words.]

Figure 7: Gaussian distributions for the Nuremberg dataset, separated by subgroup. Defendants are red, Untrustworthy witnesses are magenta, and Trustworthy witnesses are blue. Not all of the figures are on the same scale.

[Figure 8 bar chart – series: Chance, Answers only, Questions and answers; groups: Nuremberg overall, Defendants, Trustworthy; horizontal scale 0 to 1.]

Figure 8: Prediction accuracy of random forests trained on the Nuremberg data.

The most influential words in the question-and-answer random forest were as predicted: first-person singular pronouns in the answer and second-person pronouns in the question, with similar frequencies. It is likely that the interaction between these two categories drove many of the decision trees in the forest.


Q/A   Word category                      Splits
A     first-person singular pronouns      10771
Q     second-person pronouns              10563
Q     “wh” words                           9581
Q     “these”, “those”, and “to”           9277
Q     first-person singular pronouns       6839
A     “but”                                6584
A     action words                         6581
Q     “or”                                 4934
A     “or”                                 4692
Q     action words                         4165
Q     third-person plural pronouns         3973
Q     first-person plural pronouns         3641
A     misc exclusive words                 3570
Q     third-person singular pronouns       3383
Q     “but”                                3119
A     negative emotion words               2254
Q     misc exclusive words                 1538
Q     negative emotion words                871

Table 3: Word categories in the random forest trained with question-and-answer data, ranked by the number of splits in the model that use each word.

After the top two spots, the next three most important word categories were all question words, suggesting that the forest not only supplemented its reasoning with question words, but actually made more decisions based on question words than on answer words. However, a small overrepresentation of question words is to be expected given that we are counting a larger number of word categories in the question than in the answer. Also, in some cases a question word may give context to an answer word and thus needs to be considered first.
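The split-count ranking used for Table 3 can be approximated in scikit-learn by walking every fitted tree and tallying the feature chosen at each internal node. The snippet below is a sketch under that assumption (the report does not specify its implementation), again using random stand-in data and hypothetical feature names.

    # Sketch: rank word categories by the number of internal-node splits that
    # use them, mirroring the "Splits" column of Table 3.
    from collections import Counter
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    answer_cats = ["FPS", "but", "or", "excl", "neg", "action"]
    question_cats = answer_cats + ["FPP", "SP", "wh", "TPS", "TPP", "these"]
    feature_names = ["A " + c for c in answer_cats] + ["Q " + c for c in question_cats]

    rng = np.random.default_rng(0)
    X = rng.random((500, len(feature_names)))        # per-window rates (fake)
    y = rng.integers(0, 2, 500)                      # 1 = Defendant, 0 = Trustworthy

    forest = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, y)

    split_counts = Counter()
    for tree in forest.estimators_:
        for f in tree.tree_.feature:                 # leaf nodes are marked -2
            if f >= 0:
                split_counts[feature_names[f]] += 1

    for name, count in split_counts.most_common():
        print(f"{name:10s} {count}")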

    3.9 Validation: the Simpson dataset

Our final task was to see whether these results generalized. We looked at the Simpson dataset and performed the same analysis.

The SVD results from Nuremberg generalized to the Simpson data: while a modest visual distinction existed in semantic space between Simpson and the Plaintiffs, applying our correction method reduced this distinction (Figure 9). The average change in words per window for each answer word category was very similar to that of the Nuremberg data.

The distinction between subgroups did not generalize as well. While Simpson's deposition was somewhat different from the Plaintiffs' depositions (Figure 10), the two color maps did not differ in the same ways that the Nuremberg color maps differed. Looking at the overlaid subgroups for individual question-answer pairs (Figure 11) was also different: there was evidence of differences between the two groups, as there had been with Nuremberg, but not in quite the same way – they tended to be slightly weaker. Moreover, the strongest differences between subgroups in the Simpson data did not usually correspond with the strongest differences in the Nuremberg data. In particular, the relationship between second-person pronouns in the question and first-person singular pronouns in the answer appeared much weaker between subgroups – Simpson used more first-person singular pronouns but was also prompted more.



    (a) Before correction (b) After correction

Figure 9: Singular value decomposition of responses in the Simpson dataset before and after correction, with O.J. Simpson's responses in red and Plaintiffs in blue. The difference is very small.

[Figure 10 color maps – panels: (a) Simpson, (b) Plaintiffs; axes labeled FPS, but, or, excl, neg, action, FPP, SP, wh, TPS, TPP, these; color scale −6 to 6 (×10^−3).]

    Figure 10: Corrections for the two O.J. Simpson subgroups analyzed separately.



[Figure 11 panels: (a) Second-person pronouns prompting first-person singular pronouns; (b) “Or” prompting first-person singular pronouns; (c) Second-person pronouns prompting “but”; (d) Third-person singular pronouns prompting action words.]

Figure 11: Gaussian distributions for the Simpson dataset, separated by subgroup. Simpson is red and Plaintiffs are blue. Not all of the figures are on the same scale.


[Figure 12 bar chart – series: Questions and answers, Answers only, Chance; groups: Plaintiffs, Simpson, Overall; horizontal scale 0 to 0.8.]

Figure 12: Prediction accuracy of random forests trained on the Simpson data.

Finally, we constructed the same two random forests using the Simpson data. The random forest trained with both question and response words showed an increase in accuracy, but a smaller one than that for the Nuremberg data (Figure 12). In general, the Simpson data supports the view that prompting effects exist, but it is less clear where the prompting effects actually are.



    4 Discussion

At present, neither humans nor computers are particularly good at detecting deception. Models that perform at 70% accuracy (as in Mihalcea and Strapparava's LIWC-based study [38]) are considered promising. But we can hardly afford for three in ten accused criminals to be falsely classified as deceptive. We also cannot rely on our own human ability to detect deception, since that ability is quite weak. Improvements to existing deception models are desperately needed.

We have improved understanding of the processes that occur when a deceptive person is questioned. A prompting effect influences the rates of word usage in answers to questions, but the effect is receiver-state-dependent.

We developed a method to correct for prompting effects and so remove their influence. This method contains two components: a corrected representation of an utterance, and the size of the corrective change that is made. Each of these components might be useful in different situations. In our setting, the corrected representation was not useful because it removed some of the distinction between deceptive and truthful subgroups. However, tracking the size of the change was useful, as it allowed us to see at a glance the similarities and differences in prompting effects between different subgroups.
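The correction itself is described in full earlier in the report; purely as a loose illustration of the two components named here, one could estimate a linear prompting contribution of question-word rates to answer-word rates and subtract it, keeping both the corrected rates and the size of the change. The sketch below implements only that assumption and should not be read as the report's actual procedure; the per-window matrices are hypothetical.

    # Loose illustration (not the report's method): remove an estimated linear
    # prompting contribution of question-word rates from answer-word rates.
    import numpy as np

    rng = np.random.default_rng(2)
    Q = rng.random((400, 12))   # per-window rates of 12 question-word categories
    A = rng.random((400, 6))    # per-window rates of 6 answer-word categories

    # Least-squares estimate of how question rates predict answer rates.
    coeffs, *_ = np.linalg.lstsq(Q, A, rcond=None)       # shape (12, 6)

    prompted_part = Q @ coeffs          # component attributable to the question
    A_corrected = A - prompted_part     # corrected representation of the answer
    change_size = A_corrected - A       # size of the corrective change per window

    # Average change per answer-word category, comparable across subgroups.
    print(change_size.mean(axis=0))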

The way in which respondents react to the prompting effect is, in itself, a cue to deception. We support this conclusion by building random forests to classify deceptive and truthful subgroups, finding that the random forests performed substantially better when they were given both question data and answer data. In this situation, the random forests performed at 70–80% overall accuracy.

This suggests a number of avenues for ongoing research in deception. First, as the random forest results show, paying attention to both questions and answers will improve word-based models – even without a detailed understanding of what the effect of question language is. Other classification approaches to deception will almost certainly benefit from including the words of questions in the data.

Beyond this, these results suggest what we ought to look for in the questions. We know that the prompting effect is receiver-state-dependent. So tracking the nature of the prompting effect across different groups of respondents in similar situations should illuminate differences between respondents. We have made a start at showing which relationships between words are most useful in such analysis – in particular, second-person pronouns prompting first-person singular pronouns.

Deception is not the only setting in which first-person pronouns are relevant. Any variable studied using bag-of-words approaches might be assessed in dialogue settings. In any such setting, it seems probable that paying attention to both question words and answer words will improve accuracy. It may be the case that people of different personalities, sexes, or ages respond to prompting differently – which means that taking question words into account will improve our ability to profile people in dialogues, for example, if a participant's identity in a computer-mediated chat setting is uncertain. Meanwhile, in the study of relationships – useful, for example, in analysis of social networks – there may be rich and interesting effects to uncover from the association between relationship style or quality and the way the people involved respond to each other's prompting. On the other hand, there may be settings of this nature in which all the interesting subgroups respond to prompting in the same way. In that case, our method for removing the influence of the question may prove useful.



    4.1 Limitations

Our analysis is biased towards common word categories. The "average change" metric measures not only the strength of a relationship between two word categories but how frequently that relationship actually appears. We defend this choice of metric by pointing out that the more a model relies on common words, the more applicable it will be to small windows.

There are almost certainly important words that do not appear in this analysis. The Pennebaker model has been extensively validated, but other bag-of-words models have emerged. Hauch et al.'s recent meta-analysis [28] supports all the categories of the Pennebaker model but suggests several other cues to deception: increased positive and overall emotion words (as Zhou et al. found [60]), increased negations ("not", "never"), decreased third-person pronouns, and slightly decreased tentative and time-related words.

The use of bag-of-words implies the treatment of words as forms without any semantics. There are therefore potential confounds associated both with polysemy and with the use of function words in a stylized, rather than active, way (for example, whether "thank you" is considered to contain an active second-person pronoun or not).

The prompting effects in the Simpson dataset were somewhat different from those in the Nuremberg dataset and generally smaller. The random forest model also showed a smaller improvement when question data was added. Intuitively, either the Simpson dataset underrepresents the difference between truthful and deceptive testimony, or the Nuremberg dataset overrepresents it – or both. This illustrates the difficulty of comparing deceptive and truthful people across different contexts. In the Simpson dataset, it is plausible that the data underrepresents the difference between deception and truthfulness because the Plaintiffs were possibly not entirely truthful. They had a stake in the outcome, and were possibly motivated towards persona deception. Despite DePaulo et al.'s recommendations, little research has been done on the communication of people who are probably truthful, but highly motivated to "spin" the truth. Until such research is done, applying the deception models to civil suits remains problematic.

In the Nuremberg dataset, the difference between truthfulness and deception may be exaggerated because of other differences between the Nazis and the Trustworthy witnesses. These groups generally spoke different first languages and were asked different types of questions; the Trustworthy witnesses were often given perfunctory cross-examination. Care must be taken in interpreting these differences in questions. As Hancock et al. [27] discovered, deception is a process involving both the questioner and respondent. If truthful and deceptive respondents are questioned differently, it may partly be because the questioner is biased – or it may be that the questioner is responding unconsciously to deceptive speech patterns. To make matters worse, it may be both. As for demographic differences such as language and nationality, these cannot be ruled out as a potential confound, but their influence is probably small: researchers such as Fornaciari et al. [23], who tried to restrict their datasets to remove such confounds, found that doing so did not increase the accuracy of their models. Nevertheless, the testimony of many of these witnesses was translated, and this may have distorted the language patterns, and certainly distorted the timing of questions and responses.



Any deception model used in the field will be faced with this type of confound: it will be used on men and women, people from different cultures, people experiencing different emotions and confronted by different kinds of questioning, even people with mental illnesses or disabilities that affect their word choice. No deception model should be considered usable for practical purposes unless it has been tested across all these different kinds of categories.

    References

[1] American Broadcasting Company. Full transcript: ABC News Iowa Republican debate, December 11 2011. Accessed in winter 2012 at http://abcnews.go.com/Politics/full-transcript-abc-news-iowa-republican-debate/.

[2] L. Breiman. Random forests. Machine Learning, 45:5–32, 2001.

[3] A.S. Brown and D.R. Murphy. Cryptomnesia: Delineating inadvertent plagiarism. Journal of Experimental Psychology: Learning, Memory, and Cognition, 15(3):432–442, 1989.

[4] Cable News Network. Full transcript of CNN-Tea Party Republican debate, 20:00–22:00, September 12 2011. Accessed in fall 2011 at http://transcripts.cnn.com/TRANSCRIPTS/1109/12/se.06.html.

[5] Cable News Network. Republican debate, June 13 2011. Accessed in fall 2011 at http://transcripts.cnn.com/TRANSCRIPTS/1106/13/se.02.html.

[6] Cable News Network. Full transcript of CNN Arizona Republican presidential debate, February 22 2012. Accessed in winter 2012 at http://archives.cnn.com/TRANSCRIPTS/1202/22/se.05.html.

[7] Cable News Network. Full transcript of CNN Florida Republican presidential debate, January 26 2012. Accessed in winter 2012 at http://archives.cnn.com/TRANSCRIPTS/1201/26/se.05.html.

[8] J.R. Carlson, J.F. George, J.K. Burgoon, M. Adkins, and C.H. White. Deception in computer-mediated communication. Group Decision and Negotiation, 13:5–28, 2004. doi:10.1023/B:GRUP.0000011942.31158.d8.


[9] T.L. Chartrand and R. van Baaren. Human mimicry. Advances in Experimental Social Psychology, 41, 2009.

[10] The Chicago Sun-Times. CNN Republican debate, Nov. 22, 2011. Transcript. Accessed in fall 2011 at http://blogs.suntimes.com/sweet/2011/11/cnn_republican_debate_nov_22_2.html.

[11] The Chicago Sun-Times. CBS/National Journal GOP debate. Transcript, video, November 13 2011. Accessed in fall 2011 at http://blogs.suntimes.com/sweet/2011/11/_cbsnational_journal_gop_debat.html.

[12] The Chicago Sun-Times. CNBC Republican debate. Transcript, video highlights, November 9 2011. Accessed in fall 2011 at http://blogs.suntimes.com/sweet/2011/11/cnbc_republican_debate_transcr.html.

[13] The Chicago Sun-Times. Republican Las Vegas CNN debate: Transcript, October 19 2011. Accessed in fall 2011 at http://blogs.suntimes.com/sweet/2011/10/republican_las_vegas_cnn_debat.html.

[14] The Chicago Sun-Times. GOP NH ABC/Yahoo News debate: Transcript, January 8 2012. Accessed in winter 2012 at http://blogs.suntimes.com/sweet/2012/01/gop_nh_abcyahoo_news_debate_tr.html.

[15] The Chicago Sun-Times. GOP NH NBC's Meet the Press/Facebook debate: Transcript, January 8 2012. Accessed in winter 2012 at http://blogs.suntimes.com/sweet/2012/01/gop_nh_nbcs_meet_the_pressface.html.

[16] The Chicago Sun-Times. South Carolina GOP CNN debate, Jan. 19, 2012. Transcript, January 20 2012. Accessed in winter 2012 at http://blogs.suntimes.com/sweet/2012/01/south_carolina_gop_cnn_debate_.html.

[17] C. Chung and J. Pennebaker. The psychological functions of function words. In K. Fiedler, editor, Social Communication, pages 343–359. New York: Psychology Press, 2007.

[18] Council on Foreign Relations. Republican Debate Transcript, Tampa, Florida, January 2012. Accessed in winter 2012 at http://www.cfr.org/us-election-2012/republican-debate-transcript-tampa-florida-january-2012/p27180.

[19] S. Deerwester, S.T. Dumais, G.W. Furnas, and T.K. Landauer. Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41:391–407, 1990.

[20] B.M. DePaulo, D.A. Kashy, S.E. Kirkendol, M.M. Wyer, and J.A. Epstein. Lying in everyday life. Journal of Personality and Social Psychology, 70(5):979–95, 1996.


[21] B.M. DePaulo, J.J. Lindsay, B.E. Malone, L. Muhlenbruck, K. Charlton, and H. Cooper. Cues to deception. Psychological Bulletin, 129:74–118, 2003.

[22] P. Ekman and M. O'Sullivan. Who can catch a liar? American Psychologist, 46(9):913–920, September 1991.

[23] T. Fornaciari and M. Poesio. On the use of homogenous sets of subjects in deceptive language analysis. In Proceedings of the Workshop on Computational Approaches to Deception Detection, 13th Conference of the European Chapter of the Association for Computational Linguistics, pages 39–47, April 23 2012.

[24] Fox News. Complete text of the Iowa Republican debate on Fox News channel, August 12 2011. Accessed in fall 2011 at http://foxnewsinsider.com/2011/08/12/full-transcript-complete-text-o