
Data Visualisation for Supporting Linguists in the Analysis of Toxic Messages

Ecem Kavaz (1), Anna Puig (1), Inmaculada Rodriguez (1), Mariona Taule (2), and Montserrat Nofre (1)

(1) Department of Mathematics and Computer Science, University of Barcelona
(2) Department of Catalan Philology and General Linguistics, University of Barcelona

Barcelona, Spain
[email protected]

Keywords: Data visualisation, Corpus Annotation, Toxicity, Hate Speech

Abstract: The goal of this research is to provide linguists with visualisations for analysing the results of their hate speech annotation. These visualisations consist of a set of interactive graphs for analysing the global distribution of annotated messages, finding relationships between features, and detecting inconsistencies in the annotation. We used a corpus that includes 1,262 comments posted in response to different Spanish online news articles. The comments were annotated with features such as sarcasm, mockery, insult, improper language, constructivity and argumentation, as well as with a level of toxicity ('not toxic', 'mildly toxic', 'toxic' or 'very toxic'). We evaluated the selected visualisations with users to assess the graphs' comprehensibility, interpretability and attractiveness. One of the lessons learned from the study is the usefulness of mixed visualisations that include simple graphs (Bar, Heat map), to facilitate familiarisation with the results of the annotated corpus, together with more complex ones (Sankey, Spider or Chord), to explore and identify relationships between features and to find inconsistencies.

1 INTRODUCTION

Social media have become a powerful tool for many people for self-expression, as they can share their voices and opinions freely, even anonymously if desired. These platforms let people use their freedom of speech very actively and effortlessly, from the comfort of their homes and at any time (Paschalides et al., 2019). As the use of social media has increased, the multimedia data available has also increased, which provides researchers with greater opportunities to examine these data. On a daily basis, millions of messages are created and shared on different online platforms (Chen et al., 2017). The news comment section offered by some online newspapers is one of the possible spaces in which readers can express their opinions, although sometimes these opinions can be conveyed in an aggressive, offensive or inappropriate manner, especially when they are given anonymously or under a false name.

This offensive, abusive or toxic language can be labelled as hate speech. It can be simply described as a kind of speech that attacks a person or a group based on characteristics such as race, religion, ethnic origin, national origin, gender, disability, sexual orientation, or gender identity (Gagliardone et al., 2015). In this context, frameworks and tools for automatically classifying messages are becoming ever more essential for detecting the trends and spreading patterns that will help to identify anomalous behaviour and hate speech (Florio et al., 2020).

In recent years, methods for the automatic classification of hate speech messages have been widely studied in different social networks and areas (Paschalides et al., 2019), (Grosman et al., 2020). The quality of these frameworks depends greatly on the algorithms used in NLP (Natural Language Processing), but also on having access to a sufficiently large corpus of annotated messages for the training steps of these algorithms (Frénay and Verleysen, 2013). This training dataset usually consists of messages (including tweets and comments), which are manually annotated by humans. Indeed, the quality of this manual annotation is a key point to ensure the success of the whole process. Annotation involves processing a large number of messages and tends to become a difficult and time-consuming task plagued by errors and inconsistencies. Linguists usually follow a well-controlled methodology in which a single message is annotated by several annotators (preferably experts). Afterwards, agreement must be reached for all the annotations. Detecting errors, trends and inconsistencies efficiently, in both the individual and the agreed-upon annotations, can help to speed up the process and to guarantee the quality and reliability of the final annotation.

ISSN 2464-4617 (print), ISSN 2464-4625 (DVD). Computer Science Research Notes CSRN 3101, WSCG 2021 Proceedings. ISBN 978-80-86943-34-3. DOI: 10.24132/CSRN.2021.3101.7

Considering the aforementioned aspects, which make annotation a complex and challenging task, data visualisation can be a helpful method that allows linguists to analyse the results. Viewing data as mere numbers conveys little meaning, whereas data visualisation helps people to process information more easily (Knaflic, 2015). Well-designed interactive data visualisations can appeal to people effortlessly (Wu et al., 2016). However, designing the most suitable data visualisation for a particular context is not an easy task.

The goal of this research is to provide annotators with a set of visualisations for analysing the results of their hate speech annotation. We use the NewsCom-TOX corpus, which consists of comments posted in response to different Spanish online news articles, annotated with toxicity. Concretely, we contribute: (1) a set of interactive graphs that allow the annotators to see the global distribution of comments, find relationships between features and detect possible errors and inconsistencies in the annotation, and (2) the lessons learned from a preliminary user evaluation of the proposed visualisations, aimed at detecting the most useful graphs for linguists.

2 RELATED WORK

In this section we first present research works aimed at using data visualisation for the monitoring of automatic annotation systems, placing the focus on the visualisation techniques used by the authors. We then consider works aimed at supporting linguistic annotators through meaningful visualisations.

2.1 Visualisations for monitoring automatic annotation systems

Visual analytics for automatic annotation systems aims to identify valuable information in social data. Concretely, the following research works analyse anomalous user behaviour, anomalous information spread and the use of toxic language. Shi et al. (2019) carried out a survey on the visual analytics of anomalous user behaviours. The survey revealed four types of user behaviours: social interaction, travel, network communication and transactions. For each of the four types, the authors analysed trends in common data types, anomaly detection techniques, visualisation techniques and interaction methods. Our research focuses on the first type of user behaviour, social interaction, i.e. the communication of ideas and opinions between people in online newspapers.

FluxFlow (Zhao et al., 2014) is an interactive data visualisation system designed to display and evaluate anomalous information spread (rumours and misinformation) on Twitter. The novel visualisation design consisted of packed circles (retweets) arranged along a timeline, showing how an original message spreads among people over time. The size of the circles symbolised the power of a user's influence and the colour represented an anomaly score. Similarly to FluxFlow, Episogram (Cao et al., 2015) was designed to analyse retweeting behaviours on Twitter. It showed the activity of each person and every message from each person separately, in the form of single lines on a timeline. Nevertheless, this visualisation caused clutter, making it hard to understand at first sight.

RumorLens (Resnick et al., 2014) was created to help journalists detect the diffusion of rumours on online social media and, once again, Twitter was the source of data for this research. The authors used a Sankey diagram to effectively summarise the diffusion of a rumour, since it makes it very simple to follow and see people's decisions regarding posting or not posting the rumour.

Mandola (Paschalides et al., 2019) was designed with NLP and ML techniques to monitor and detect online hate speech on social media. The Mandola dashboard included the so-called Hate-Map and Hotspot Map visualisations, both showing findings on a world map. While the Hate-Map displays hate data with heat spots in certain countries, the Hotspot Map has a colour scale representing six levels of hate speech. There was also a Heat map that displays hate speech in five topic areas, separated by years. Mandola is close to our research because of its analysis of hate speech. The difference is that our goal is to use visualisations to support linguists in the annotation process, whereas Mandola aimed to use visualisations for the monitoring of an automatic annotation system.

Overall, previous research focused on novel approaches for monitoring automatic annotation systems using different visualisation techniques, such as Sankey diagrams and Heat maps. In this paper, we use these and other types of graphs to help linguists to visualise the results of their annotations.

2.2 Visualisations for supporting linguist annotators

Several tools and platforms (i.e. sets of tools) are available to support the task of annotating a corpus. All of them provide basic functionality for data annotation. Nevertheless, some of them do not support more advanced functionalities, such as the management of the inter-annotator agreement. The inter-annotator agreement is a measure of how well two (or more) annotators can make the same annotation decision for a certain feature. This measure may impact the quality and efficacy of the annotation process.
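Inter-annotator agreement is commonly quantified with a chance-corrected statistic such as Cohen's kappa; the paper does not specify which measure the cited tools compute, so the following is an illustrative sketch with hypothetical toxicity labels, not a description of any particular tool.

```python
from collections import Counter

def cohens_kappa(ann_a, ann_b):
    """Cohen's kappa for two annotators labelling the same items."""
    assert len(ann_a) == len(ann_b)
    n = len(ann_a)
    # Observed agreement: fraction of items labelled identically.
    p_o = sum(a == b for a, b in zip(ann_a, ann_b)) / n
    # Expected chance agreement, from each annotator's label marginals.
    freq_a, freq_b = Counter(ann_a), Counter(ann_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical toxicity labels from two annotators for five comments.
a = ["toxic", "not toxic", "toxic", "mildly toxic", "toxic"]
b = ["toxic", "not toxic", "not toxic", "mildly toxic", "toxic"]
print(round(cohens_kappa(a, b), 4))  # 0.6875
```

A kappa close to 1 indicates near-perfect agreement, while values near 0 indicate agreement no better than chance, which is why tools such as Eras surface it to the annotation team.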

For instance, Brat (Stenetorp et al., 2012) is a mainstream annotation tool that does not allow for several annotators, and consequently does not support inter-annotator agreement. Another tool is MAT, which supports the annotation and the management of multiple annotators through a web interface (MAT, 2020). IBM Watson Knowledge Studio, which is integrated in the well-known Watson platform, includes a well-designed and documented annotation tool for creating a training corpus (Watson, 2020).

The above-mentioned tools provide support to the annotators but they do not use data visualisations. In contrast, Eras (Grosman et al., 2020) and WebAnno (Yimam et al., 2013) use data visualisation to show, for example, the results of annotators' agreement through a Heat map. The visualisations we present in this paper are in line with these two frameworks, since we also aim to facilitate and improve annotators' work by means of new, meaningful visualisations.

3 USED DATA

In this section we describe the NewsCom-TOX corpus, the dataset used for developing the visualisations, and the annotation tagset used.

The NewsCom-TOX corpus consists of 1,262 comments posted in response to different articles extracted from Spanish online newspapers from August 2017 to May 2019, annotated with toxicity. The articles selected cover four different topics (economy, politics, religion and immigration) and the comments were selected in the same order in which they appear in the time thread on the web. Duplicated comments were removed. Table 1 shows the distribution of comments per topic and the corresponding newspaper from which they were obtained.

Topic        Comments   Newspaper
Economy      309        La Información, El País
Politics     239        Huffpost, La Vanguardia
Religion     298        Xataka Ciencia
Immigration  416        El Confidencial
Total        1,262

Table 1: Distribution of comments per topic.

In order to have a balanced representation of comments per topic, two different news articles were needed in the case of the economy- and immigration-related topics. Articles were selected that could potentially lead to controversy, with the aim of finding comments with opposing opinions and examples of toxic language. Toxicity is difficult to define, possibly because it can be expressed at different levels and in different ways (Ross et al., 2017), (Davidson et al., 2017), (Fortuna and Nunes, 2018). In order to reflect this diversity in the expression of toxicity, the proposal is to assign different levels of toxicity, indicating whether the comment is 'not toxic', 'mildly toxic', 'toxic' or 'very toxic'. With the aim of reducing the subjectivity in the annotation and, therefore, also the disagreement between annotators, we propose first annotating different linguistic features such as sarcasm, mockery, insult, improper language, constructivity and argumentation. These binary features allow us to discriminate the level of toxicity of the comments. Furthermore, some of these features can be correlated, for instance argumentation and constructivity, or insult and improper language, and these correlations are also useful when assigning the level of toxicity. Our hypothesis is that the combination of these features helps to determine the level of toxicity in a more objective way. The tagset used for the annotation of comments with toxicity is the following:

• <argumentation>: indicates that the comment gives arguments or reasoned explanations, or grounds opinions with evidence.

• <constructivity>: a comment is constructive when it is respectful and polite (regardless of whether it is in favour of or against the content of the article or of another comment)[1], when it intends to create an enriching and useful dialogue, when it contributes new knowledge, ideas and proposals, and when it offers new perspectives and insights to approach the subject.

[1] We can find two types of comments: those that comment on some specific or general aspect of the article, and those that are responses to another comment.

Figure 1: Visualisations of Global Data Distribution. Bar chart and Heat map.

• <sarcasm>: a comment is sarcastic when its content is ironic, that is, when the writer uses words that mean the opposite of what he really wants to say, and when it is accompanied by a harsh, sharp and negative criticism made in bad faith. Ironic comments without the intention to cause pain (without a negative load) are not considered toxic and are tagged as <sarcasm=no>.

• <mockery>: indicates that the comment ridicules, mocks or humiliates a person or group.

• <insult>: indicates that the comment contains one or more insults or slurs with the intention to offend a person or group.

• <improper language>: indicates that the comment contains language not considered to be proper, or that is vulgar and impolite and/or includes rude words.

• <toxicity>: a comment is toxic when it attacks, denigrates or disqualifies a person or group on the basis of certain characteristics such as race, ethnicity, nationality, religion, gender and sexual orientation, among others. This attack can be expressed in different ways, directly (through insult, mockery and inappropriate humour) or indirectly (for instance through sarcasm), and at different levels of intensity, that is, at different levels of toxicity (the most aggressive being those comments that incite hate or even physical violence).

It should be noted that all these tags have binary values (value = yes/no) except the toxicity tag, which has four values (<1= non-toxic>; <2= mildly toxic>; <3= toxic>; <4= very toxic>). The level of toxicity is determined by the presence and combination of the features presented above. In fact, these features are different ways or mechanisms of expressing toxicity and, therefore, they also help to define what is meant by toxicity. The more negative features appear in the comment, the higher the level of toxicity. For instance, we tag as 'mildly toxic' comments in which only one feature appears, the most frequent being <sarcasm>, <mockery> and <improper language>, whereas in comments tagged as 'very toxic' the combination of features is higher than two; an especially frequent combination is <improper language>, <mockery> and <insult>. This annotation allows us to establish fine-grained criteria for analysing and better defining what can be considered a comment with toxic language or hate speech.
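As an illustration of this criterion, the mapping from negative-feature counts to toxicity levels can be sketched as follows. This is only a heuristic sketch of the tendency described above, not the annotators' actual decision procedure, which is manual and also weighs which features co-occur.

```python
# Hypothetical record for one annotated comment: six binary features
# (mirroring the tagset) plus a derived 1-4 toxicity level.
FEATURES = ["argumentation", "constructivity", "sarcasm",
            "mockery", "insult", "improper_language"]
NEGATIVE = {"sarcasm", "mockery", "insult", "improper_language"}

def toxicity_level(comment):
    """Illustrative heuristic: more negative features -> higher toxicity."""
    hits = sum(comment[f] == "yes" for f in NEGATIVE)
    if hits == 0:
        return 1  # non-toxic
    if hits == 1:
        return 2  # mildly toxic
    if hits == 2:
        return 3  # toxic
    return 4      # very toxic

c = {"argumentation": "no", "constructivity": "no", "sarcasm": "no",
     "mockery": "yes", "insult": "yes", "improper_language": "yes"}
print(toxicity_level(c))  # 3 negative features -> 4 (very toxic)
```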

4 GRAPHS TO VISUALISE TOXIC MESSAGES

We have used multiple types of graphs and diagrams to visualise the annotated data, which allow these kinds of data to be analysed and measured. We used Tableau[2] (an interactive data visualisation software), DisplayR[3] (an online visualisation tool) and, lastly, SankeyMatic[4] (an online Sankey diagram builder) to create the data visualisations.

4.1 Visualisations of global distributions

All the visualisations are combined in a Tableau story, which offers an individual page for each visualisation. The first page of our Tableau project is an introduction including four icons representing the four topics (economy, politics, immigration and religion), a brief explanation that provides an overview of the dataset

[2] https://www.tableau.com
[3] https://www.displayr.com
[4] http://www.sankeymatic.com


Figure 2: Visualisations of Global Data Distribution. Spider Chart and Sankey diagram.

(such as the origin of the messages and the questions to be answered using the visualisations) and the text visualisation of the seven features used in the annotation of toxicity. Each feature is represented by size, depending on the total number of 'yes' comments it has. In the tooltips, there is an example comment for the feature and the number of comments annotated as 'yes' for that feature. This text visualisation is a summary of the upcoming visualisations.

The created graphs are then separated into three groups: (1) visualisations to analyse the global distributions of messages by topic, (2) visualisations of the relationships between features, and (3) visualisations by features.

Global data distribution visualisations aim to explore the data in terms of the topics and features of the messages (see Figures 1 and 2). The first graph in this group is a simple Bar chart. The Bar chart has four sections, represented by icons, for the four topics. Features are colour coded and displayed for each topic individually. The Bar chart displays features in terms of their total numbers. There is a filter that allows users to select values ('yes' or 'no'). Please note that every message has been annotated with the tag 'yes' or 'no' for every feature. For example, a comment tagged 'yes' for the argumentation, mockery and sarcasm features can be tagged 'no' for constructivity and insult. Secondly, a Heat map[5] is created to show the data as a whole, in a simple way without other complex design elements. The Heat map illustrates the features for each topic, including a filter to choose 'yes' or 'no'.
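The counts behind such a Bar chart or Heat map are simple per-topic, per-feature tallies of the selected value. A minimal sketch with hypothetical comments (our actual aggregation is done inside Tableau, so the data and function below are illustrative only):

```python
from collections import defaultdict

# Hypothetical annotated comments: (topic, {feature: "yes"/"no"}).
comments = [
    ("economy",  {"sarcasm": "yes", "insult": "no"}),
    ("economy",  {"sarcasm": "no",  "insult": "no"}),
    ("politics", {"sarcasm": "yes", "insult": "yes"}),
]

def counts_by_topic(comments, value="yes"):
    """Per-topic, per-feature counts of the selected value ('yes'/'no'),
    i.e. the numbers a Bar chart or Heat map cell would display."""
    table = defaultdict(lambda: defaultdict(int))
    for topic, feats in comments:
        for feat, v in feats.items():
            if v == value:
                table[topic][feat] += 1
    return {t: dict(fs) for t, fs in table.items()}

print(counts_by_topic(comments))
# {'economy': {'sarcasm': 1}, 'politics': {'sarcasm': 1, 'insult': 1}}
```

Switching `value` to `"no"` reproduces the effect of the 'yes'/'no' filter described above.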

The Spider[6] graph is created to present the global distribution of toxic messages for each topic. The Spider web has seven sides, representing the seven features that appear in the annotated messages. Topics are represented by their assigned colours. On the Spider chart, each topic has a unique shape made up of points joined by lines. Each point shows the percentage of a feature for the selected value ('yes' or 'no'), and lines combine the points to form the shape. On the page of the Spider graph, there is a toxicity symbol which displays the mean level of toxicity for each topic.

[5] A Heat map displays data values with colour, usually by intensity or hue.
[6] A Spider graph displays multivariate data, with three or more quantitative variables represented on axes starting from the same point.

Another approach for examining the global distribution of the data is to create a Sankey diagram[7]. In this case, features are presented on the left side, splitting to form areas on the right side. The Sankey diagram also allows users to choose 'yes' or 'no' values. Next to the topics, there are keys that illustrate the percentage of each feature in terms of the selected value.
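SankeyMatic accepts flows as plain "Source [Amount] Target" lines (to the best of our knowledge of that tool). Deriving feature-to-topic flows for it could look like the following sketch, with hypothetical data:

```python
# Hypothetical annotated comments: (topic, {feature: "yes"/"no"}).
comments = [
    ("economy",  {"mockery": "yes", "insult": "yes"}),
    ("politics", {"mockery": "yes", "insult": "no"}),
]

def sankey_lines(comments, value="yes"):
    """Build SankeyMatic-style 'Source [N] Target' lines, where the
    link width N is the count of comments with the selected value."""
    flows = {}
    for topic, feats in comments:
        for feat, v in feats.items():
            if v == value:
                key = (feat, topic)
                flows[key] = flows.get(key, 0) + 1
    return [f"{feat} [{n}] {topic}" for (feat, topic), n in sorted(flows.items())]

for line in sankey_lines(comments):
    print(line)
# insult [1] economy
# mockery [1] economy
# mockery [1] politics
```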

4.2 Visualisations of relationships between features

Two visualisations have been created to observe the relationships between features (see Figure 3). These two visualisations aim to show a comparison between the features annotated in the comments, which will mainly allow annotators/linguists to analyse their hypotheses. One of them is a Scatter plot with icons for topics, representing one feature on the x-axis and one on the y-axis. Each axis shows the possible values of the feature represented on that axis (yes/no). There is a filter where users can choose the desired feature for each axis to compare with each other. Thus, with this graph, users can examine the number of occurrences of each combination of values ('no-no', 'no-yes', 'yes-no', and 'yes-yes') in the selected features. For example, the combination 'no-no'

[7] A Sankey diagram is a type of flow diagram where the width of the links is proportional to the size of the data.


Figure 3: Visualisations of relationships between features. Scatter plot and Chord diagram.

for constructivity and argumentation refers to messages tagged 'no' for the constructivity feature and 'no' for the argumentation feature.

The other visualisation is a Chord diagram[8] (Reid, 2020). While the Scatter plot analyses the data by topics and features, the Chord diagram does not separate the data into topics but rather analyses them as a whole across the toxic comment features. The Chord diagram also has value selection buttons for the 'no-no', 'no-yes', 'yes-no' and 'yes-yes' combinations. It lets users select a feature and then highlights the arcs of that selected feature, which allows users to examine features individually in relation to the other features. The thickness of the arc that connects two features represents the number of messages annotated with those two features for the selected values. The total number of comments for the selected feature and the others is also displayed on a separate Bar chart, which shows the total number of comments for each feature.
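The arc widths in such a Chord diagram boil down to pairwise co-occurrence counts for the selected value combination. A minimal sketch over hypothetical comments (the data and function names are illustrative, not our Tableau/DisplayR implementation):

```python
from itertools import combinations

# Hypothetical comments: each is a {feature: "yes"/"no"} mapping.
comments = [
    {"insult": "yes", "mockery": "yes", "sarcasm": "no"},
    {"insult": "yes", "mockery": "no",  "sarcasm": "no"},
    {"insult": "no",  "mockery": "yes", "sarcasm": "yes"},
]

def cooccurrence(comments, pair_value=("yes", "yes")):
    """Count comments where a feature pair takes the selected values,
    e.g. the 'yes-yes' arcs of the Chord diagram. Note that for
    asymmetric selections like 'yes-no' the pair order matters."""
    counts = {}
    feats = sorted(comments[0])
    for f1, f2 in combinations(feats, 2):
        counts[(f1, f2)] = sum(
            c[f1] == pair_value[0] and c[f2] == pair_value[1]
            for c in comments)
    return counts

print(cooccurrence(comments)[("insult", "mockery")])  # 1
```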

4.3 Visualisations by features

One of the main objectives of this study is to measure the level of toxicity. There are four levels of toxicity, namely non-toxic (1), mildly toxic (2), toxic (3) and very toxic (4). The aims of these graphs include analysing the annotated corpus, testing hypotheses and, most importantly, identifying inconsistencies. Although the mean toxicity of each topic is illustrated on the Spider chart and the Sankey diagram, three additional graphs were created to analyse the level of toxicity per feature. Firstly, in the left part of Figure 4 there is a Bar chart in which each level of toxicity is represented with a separate bar for each feature. In this graph, the 'yes' and 'no' values are displayed separately and can be selected from the list, while a tooltip displays the total number of messages at each level of toxicity.

[8] A Chord diagram shows the connections between features. The thickness of the links represents the data size.

A Sankey diagram is proposed to show the level of toxicity for each feature (see the right-hand side of Figure 4). In this diagram, the data flow from the levels of toxicity (non-toxic to very toxic) into the features, with the thickness of the lines representing the total number of comments at each level and its connected feature. The same toxicity icon is used to show the total number of messages in the tooltip for each feature at each level of toxicity. Lastly, in Figure 4 there is a Treemap in which the level of toxicity is represented for a selected feature. The Treemap is more complex, as it has many components to explore and is highly interactive. There are four different maps, one for each topic. All levels of toxicity (1 to 4) are divided into 'yes' and 'no' and are represented by colours from blue to red. The darkest blue represents the lowest toxicity and the darkest red the highest toxicity. For example, if the chosen feature is constructivity, we can see the number of messages at toxicity level 2 separately as 'yes' or 'no' in the economy topic.

There is another, smaller Treemap which represents the total number of messages by topic and, when a topic is chosen, shows the total number of messages at each level of toxicity without dividing them into 'yes' or 'no'. There is a filter to change the range of levels. The Treemap allows users to compare the levels of toxicity in terms of features and also topics.


Figure 4: Visualisations by features. Left to right, Bar chart, Sankey diagram and Treemap.

5 EVALUATION

5.1 Methodology

The goal of the evaluation was to assess comprehensibility (how well the graph communicates the information to the user), interpretability (how well the user can extract meaning from the visualisation), attractiveness (to what extent the visualisation is visually appealing for the user), and to gather users' opinions.

The evaluation was exploratory, aimed at obtaining participants' perceptions. It was unmoderated, performed through an online questionnaire. A total of eight participants were recruited, five female and three male. The questionnaire consisted of demographic questions followed by three visualisation tasks, lasting up to 30-35 minutes. The demographic questions included gender, age, and two questions on the participants' prior degree of experience in message annotation and visual analytics. Most of the participants had prior experience in data visualisation and more than half had expertise in message annotation.

The visualisation tasks interleaved links to our Tableau visualisations with questions related to them:

• Task 1: Please follow the Tableau link for "visualisations of global distribution" (Bar chart, Heat map, Spider chart, and Sankey diagram).

– Q1-Comprehensibility: "Score from 1 (the most negative) to 5 (the most positive) how well the graph communicates the global distribution." Please justify the best and the worst scores.

– Q2-Interpretability: "Score from 1 (very difficult) to 5 (very easy) how easy it is to interpret the graph."

– Q3-Attractiveness: "Score from 1 (very bad) to 5 (very good) the visual appeal of the graph."

• Task 2: Please follow the Tableau link of "visualisations of relationships between features" (Scatter Chart and Chord Diagram). Same questions as Task 1: Q1-Q3.

• Task 3: Please follow the Tableau link of "visualisations by features" (Bar Chart, Sankey Diagram, and Treemap). Same questions as Task 1: Q1-Q3.

The questions were chosen to explore three dimensions of the visualisations: the communication of information (how well the visualisations communicate), the interpretation of graphs (how easy or difficult it is to understand the graphs), and the attractiveness (how appealing the visualisations are). The questionnaire presented nine closed questions (i.e. score 1 to 5), three of which were followed by open questions (i.e. justify your answer). Data were collected anonymously.
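The 1-5 scores from such closed questions can be tallied into the per-score distributions shown in the stacked bar charts of the results. This is a hedged sketch: the response values below are hypothetical (only the 75% share of top scores for the Bar chart matches a figure reported later), and `score_distribution` is an illustrative helper, not the study's actual analysis code.

```python
from collections import Counter

# Hypothetical 1-5 Likert answers from the eight participants,
# keyed by (graph, question); the real per-participant scores are not published.
responses = {
    ("Bar chart", "Q2"): [5, 5, 5, 5, 5, 5, 4, 3],
    ("Sankey", "Q2"):    [2, 2, 3, 3, 2, 4, 1, 3],
}

def score_distribution(scores):
    """Counts per score 1..5 plus the share of top (5, 'very easy') answers,
    the kind of summary a stacked bar chart of results is built from."""
    counts = Counter(scores)
    dist = [counts.get(s, 0) for s in range(1, 6)]
    share_top = counts.get(5, 0) / len(scores)
    return dist, share_top

dist, top = score_distribution(responses[("Bar chart", "Q2")])
print(dist, f"{top:.0%}")  # six of eight participants scored 5 -> 75%
```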

5.2 Results

For each task, we first analysed the answers to the questions (Q1-Q3); we then present the qualitative data arising from users' comments.

Task 1: visualisations of global distribution

According to the results in Figure 5, the Heat map received the highest scores in Q1-Comprehensibility, followed by the Bar chart. The Spider graph received relatively good scores in this question, whereas the Sankey diagram was the least favourite graph. The results show that the Bar chart was the easiest to understand in Q2-Interpretability, with 75% of participants scoring it as very easy, and the Sankey diagram the hardest to understand, with relatively low scores. The Spider chart and the Sankey diagram scored the highest in Q3-Attractiveness.

In the first task, the Bar chart was the most favoured graph in the dimensions of comprehensibility and attractiveness. The Bar chart was the second favourite graph in the communication of information dimension, scoring slightly lower than the Heat map.

Figure 5: Stacked bar chart displaying the results in Task 1. Q1-Comprehensibility, Q2-Interpretability and Q3-Attractiveness.

The Bar chart was described by multiple participants as very easy to understand and use. A couple of participants commented that they favoured the Bar chart as they are used to analysing this type of graph. Another positive comment was that the colours of the bars were both appealing and made the visualisation clearer. However, various participants commented that the Bar chart should have included percentages along with the actual total number of comments in the features. The Heat map received the highest scores for Q1-Comprehensibility. The results also show that the Heat map was easy to understand and communicates information well; however, visually it was not as attractive as the other three visualisations.

The Spider graph received mixed reviews from users. One participant stated that "I think the Spider gives a very clear idea of the distribution of attributes by features; with its isolated and superimposed surfaces, it reflects very clearly what has been annotated". Another participant stated that "The Spider chart is also a good way of displaying the data for a general comparison across topics. However, the comparison among features within a specific topic is less clear when only one axis contains the percentage indicator." Another participant agreed with this comment by stating that comparisons between the features were a little difficult when the values matched. The results suggest that the Spider graph was visually very appealing but slightly more difficult to understand than the Bar chart and the Heat map.

The Sankey diagram was the least favoured of all the graphs, even though it received high scores in the visual appeal section. The majority of the participants commented that the shape of the graph was confusing and difficult to understand, and that it was difficult to compare the features of messages and topics. One participant stated that "The Sankey diagram seems rather chaotic compared to the other ones". Some participants commented that they needed to pay extra attention to the Sankey diagram because of its complexity.

Multiple participants commented that, to improve communication, all of the graphs should have included both percentage values and actual total numbers. The interactivity of the graphs was an important element, as the results show that participants preferred the Bar chart because it was very interactive, while they did not like the fact that the Sankey diagram was static. The comments suggest that participants favoured graphs on which they could spend the least possible time, as they did not want to waste time trying to understand the graphs.
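The requested redundancy (absolute counts alongside percentages) is cheap to provide in any charting tool. As a minimal sketch, with hypothetical feature counts:

```python
def bar_label(count, total):
    """Format a bar label showing both the absolute count and its share,
    the redundant form of display that participants asked for."""
    return f"{count} ({count / total:.1%})"

# Hypothetical feature counts, for illustration only.
feature_counts = {"sarcasm": 180, "insult": 240, "mockery": 120}
total = sum(feature_counts.values())
labels = {f: bar_label(c, total) for f, c in feature_counts.items()}
print(labels["insult"])  # "240 (44.4%)"
```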

Task 2: visualisations of relationships between features

In Q1-Comprehensibility (see Figure 6), the Scatter plot and the Chord diagram achieved similar scores, with the Scatter plot favoured slightly more by the participants. The Scatter plot achieved higher scores in Q2-Interpretability than the Chord diagram, with a total of seven high scores, while the Chord diagram received only two high scores. For Q3-Attractiveness, the Chord diagram was considered visually more attractive than the Scatter plot and obtained notably high scores.

Figure 6: Stacked bar chart displaying the results in Task 2. Q1-Comprehensibility, Q2-Interpretability and Q3-Attractiveness.

In the second task, most of the participants agreed that the Scatter plot was easier to understand than the Chord diagram, and that the Chord diagram was visually more appealing than the Scatter plot. There were mostly positive views about both graphs in terms of their ability to communicate the information, and the Scatter plot was described as more intuitive than the Chord diagram. Participants commented that the Scatter plot provided a clear view for comparing topics and features, and that it was also very engaging thanks to its interactivity. A participant commented that "with the Scatter plot it is easier to understand the correlation between the different features, an explanatory legend appears that also includes an example, the relationship is quickly associated with the number of examples, it displays the relationships between features according to the topic. It is more intuitive". On the other hand, some participants found the Chord diagram relatively easy to understand and useful, especially in terms of the overall view of the data. One participant stated that "the Chord Diagram helps to understand the different relationships between features very well and allows interaction by focusing on various elements and their intersections, providing very valuable information."

Task 3: visualisations by features

The Bar chart received the highest scores in Q1-Comprehensibility (see Figure 7). The Sankey diagram and the Treemap obtained similar scores. In Q2-Interpretability, the Bar chart was considered the easiest and the Treemap the hardest to understand. As in Task 1, the Sankey diagram was given the highest scores in Q3-Attractiveness, followed by the Bar chart, which received the second highest scores in Task 3. The Treemap was the least preferred graph in this task.

Figure 7: Stacked bar chart displaying the results in Task 3. Q1-Comprehensibility, Q2-Interpretability and Q3-Attractiveness.

In Task 3 (see Figure 7), the most preferred graph was the Bar chart in the dimensions of comprehensibility and interpretability. This is in line with other comments, since the Bar chart is very commonly used, it is easier to interpret, and comparing the features with each other was therefore easier. Also, analysing a Bar chart does not require prior knowledge of visual analytics, which was another reason for its popularity.

The Sankey diagram had the highest score in the visual appeal section. There were mixed observations about the Sankey diagram, as some participants found it very clear while others did not. One participant stated that "The Sankey Diagram illustrates very well the distribution of features according to the level of toxicity and vice versa, it is very easy to understand and provides a lot of information, it would be useful if the information could be isolated interactively". Various participants agreed with this statement and described the Sankey diagram as rather difficult to understand at first, but clear afterwards, as the information was globally displayed. Having more interactive elements could solve the problems of interpretation, as users could filter down to explore features separately. On the other hand, some participants described it as chaotic and confusing.

Lastly, the results showed that the Treemap was the least liked graph in this task. An interesting finding was that participants described the Treemap as difficult to follow and understand even though it conveyed more information than the Sankey diagram and the Bar chart, such as comparisons between topics and comparisons of 'yes' and 'no' comments together. For example, a participant commented that "The Treemap, which at first glance seems more unpleasant, when you look at it closely gives very interesting information, which is when we find the same attribute labelled in two different ways, helping then in the finding of inconsistencies in the annotation. For example, in the Economy topic, toxicity level 2 is represented by annotated comments such as Toxic = Yes and Toxic = No". However, most of the participants commented that they did not understand the Treemap.

Overall, the results show that participants were attracted by the graphs that were easier to understand. The appeal of the graphs was also an important element, though not as important as ease of use. Many participants were not attracted to the graphs they did not understand and did not want to spend time on them, even though they liked their appearance more. Another finding suggests that participants would have benefited from greater guidance in the complex visualisations with many elements, which would have facilitated comprehension and thereby made them more attractive. One idea is to combine simple graphs with more complex ones, using simple graphs like the Bar chart to guide the reading of graphs like the Treemap.


6 CONCLUSIONS AND FUTURE WORK

This paper presents a study related to the data visualisation of hate speech (toxicity) annotations. To this end, we proposed various data visualisations, including different diagrams to assist annotators in the detection of inconsistent annotations, the analysis of the global data distribution and the discovery of relationships between features. We used a complex corpus composed of comments posted in response to different Spanish online news articles. All the comments were annotated using a new tagset that combines several features to determine the level of toxicity in a more objective way. The challenge in the proposed data visualisations was to provide annotators with a wide spectrum of diagrams that highlight trends and relationships in an easy and comprehensible way. We proposed a diversity of visualisations, from those that were well known to others that could be new or unfamiliar to annotators. We conducted a preliminary evaluation of the proposed visualisations based on qualitative and quantitative data collected from a small but representative group of annotators; that is, they had different degrees of expertise in visual analytic tools and different knowledge of corpus annotation. We evaluated several dimensions of the visualisation experience (comprehensibility, interpretability and attractiveness).

In the following, we share our lessons learned, including design recommendations useful for future visualisation studies on corpus annotation. Regarding the comprehension of visualisations, the first consideration is that the participants mostly preferred the simple graphs (such as the Bar chart or the Heat map), probably because annotators are already acquainted with them from their daily annotation work. However, the more expert the participants were in visual analysis, the more they valued visualisations that show more complex details (like the Sankey, the Spider or the Chord diagram). Moreover, we also observed that in some cases the perception of the same diagram differed depending on the task and on when it was visualised. For example, participants who rated the Sankey diagram as difficult to understand in Task 1 found it easier in Task 3. Keeping this in mind, the use of mixed visualisations that include simpler graphs (to facilitate "the landing" in the annotated corpus) together with more complex ones (to explore more complex relations) could help to enhance visualisation effectiveness. In relation to the interpretation of the visualisations, we found that participants appreciated having redundant information in the graphs (for instance, percentages and absolute values), and they also highly valued the interaction offered by some graphs, particularly those that were more complex to understand (such as the Scatter plot and the Treemap). The use of interactivity, prioritising the most important attributes (features, topics, Yes/No values, etc.) to select the data to be shown, is also an important factor to take into account in future visualisations. Last but not least, participants gave rather positive comments in the open questions on those graphs which they scored high in the attractiveness dimension (such as the Scatter plot and the Spider diagram). Making visualisations attractive and clear, using suitable colours and icons, is engaging and, more importantly, instructive and enlightening.

In the future, the first step is to design novel visualisations tailored to linguists' needs based on the findings of this paper, and to validate them with a larger group of annotators. Moreover, we plan to integrate the visualisations into earlier stages of the annotation process. Visualisation will serve as a tool to guide the annotation, letting users examine the corpus as well as find and edit inconsistent data. Additionally, we plan to introduce more valuable information related to the context in which the comments are obtained. Collecting information such as where the message is posted, who the user is, location and timestamps could also help annotators to detect the diffusion of hate speech.

ACKNOWLEDGEMENTS

To MISMIS-Language (PGC2018-096212-B-C33), CI-SUSTAIN (PID2019-104156GB-I00) and Accio COMRDI18-1-0010.

REFERENCES

Cao, N., Lin, Y.-R., Du, F., and Wang, D. (2015). Episogram: Visual summarization of egocentric social interactions. IEEE Computer Graphics and Applications, 36(5):72–81.

Chen, S., Lijing, L., and Yuan, X. (2017). Social media visual analytics. Computer Graphics Forum, 36(3):563–587.

Davidson, T., Warmsley, D., Macy, M., and Weber, I. (2017). Automated hate speech detection and the problem of offensive language. arXiv preprint arXiv:1703.04009.

Florio, K., Basile, V., Polignano, M., Basile, P., and Patti, V. (2020). Time of your hate: The challenge of time in hate speech detection on social media. Applied Sciences, 10(12):4180.


Fortuna, P. and Nunes, S. (2018). A survey on automatic detection of hate speech in text. ACM Computing Surveys (CSUR), 51(4):1–30.

Frenay, B. and Verleysen, M. (2013). Classification in the presence of label noise: a survey. IEEE Transactions on Neural Networks and Learning Systems, 25(5):845–869.

Gagliardone, I., Gal, D., Alves, T., and Martinez, G. (2015).Countering online hate speech. Unesco Publishing.

Grosman, J. S., Furtado, P. H., Rodrigues, A. M., Schardong, G. G., Barbosa, S. D., and Lopes, H. C. (2020). ERAS: Improving the quality control in the annotation process for natural language processing tasks. Information Systems, page 101553.

Knaflic, C. N. (2015). Storytelling with data: A data visualization guide for business professionals. John Wiley & Sons.

MAT (2020). The MITRE annotation toolkit. [Online; accessed 2-sep-2020].

Paschalides, D., Stephanidis, D., Andreou, A., Orphanou, K., Pallis, G., Dikaiakos, M. D., and Markatos, E. (2019). Mandola: A big-data processing and visualisation platform for monitoring and detecting online hate speech. ACM Transactions on Internet Technology, 37(4):1–21.

Reid, M. (2020). Creating a Chord Diagram with Tableau Prep and Desktop. [Online; accessed 13-sep-2020].

Resnick, P., Carton, S., Park, S., Shen, Y., and Zeffer, N. (2014). Rumorlens: A system for analyzing the impact of rumors and corrections in social media.

Ross, B., Rist, M., Carbonell, G., Cabrera, B., Kurowsky, N., and Wojatzki, M. (2017). Measuring the reliability of hate speech annotations: The case of the European refugee crisis. arXiv preprint arXiv:1701.08118.

Shi, Y., Liu, Y., Tong, H., He, J., Yan, G., and Cao, N. (2019). Visual analytics of anomalous user behaviors: a survey. arXiv preprint arXiv:1905.06720v2.

Stenetorp, P., Pyysalo, S., Topic, G., Ohta, T., Ananiadou, S., and Tsujii, J. (2012). Brat: a web-based tool for NLP-assisted text annotation.

Watson (2020). Watson Knowledge Studio. [Online; accessed 2-sep-2020].

Wu, Y., Cao, N., Gotz, D., Tan, Y.-P., and Keim, D. A. (2016). A survey on visual analytics of social media data. IEEE Transactions on Multimedia, 18(11):2135–2148.

Yimam, S. M., Gurevych, I., de Castilho, R. E., and Biemann, C. (2013). WebAnno: A flexible, web-based and visually supported system for distributed annotations.

Zhao, J., Cao, N., Wen, Z., Song, Y., Lin, Y.-R., and Collins, C. (2014). FluxFlow: Visual analysis of anomalous information spreading on social media. IEEE Transactions on Visualization and Computer Graphics, 20(12):1773–1782.

