Explicitness and implicitness of discourse relations across social media
Tatjana Scheffler
University of Potsdam
January 18, 2020
Do explicitness and implicitness of discourse relations differentiate between types of social media?
Structure of this presentation
• Coherence relations in spoken / written language
• Coherence relation marking in social media
• Dataset
• First results: qualitative sample
• (First results: quantitative)
Background
• Expression of discourse coherence differs between spoken and written media
• Some annotation efforts: Tonelli et al. (2010), Rehbein et al. (2016), Zeyrek et al. (2018)
• Discourse parsing in spoken domain: Riccardi et al. (2016)
• Conceptional work: Crible/Cuenca (2017), Zeyrek et al. (2018)
Coherence relations in speech vs. writing
Speech | Writing | Source
Fewer overall relations | More coherence relations | TRPJ2010
Explicit 2:1 Implicit | Explicit 1:1 Implicit | TRPJ2010, RSD2016
Truncated relation structures | Full structures (both args present) | CC2017
Connectives: far scope, vague, multifunctional | CC2017
Temporal and Causal relations, Epistemic cause (Many EntRels) | TRPJ2010, RSD2016
New functions: Repetition, Hypophora (Q-A) | TRPJ2010, ZMK2018
Research questions
• Do blogs exhibit more or less explicit discourse relations than tweets?
• Which types of connectives and relations vary across the two media?
• Are individual author choices relevant for explicitation or implicitation of discourse relations?
Coherence relations in social media
• Scheffler/Stede 2016: Argumentative relations in PCC news text vs. Twitter
• Scheffler/Aktas/Das/Stede 2019: Annotating shallow discourse relations in Twitter conversations
Scheffler/Stede 2016
• identify a subset of argumentative structures by text segments using two types of “pragmatic” rhetorical relations:
• adversative relations and causal relations
• compare the linguistic signalling of these relations in two types of German corpora
• PCC newspaper editorials
• political Twitter conversations
• feature set: connectives, negation, 1st person, (modals)
• RST annotations
Complexity
• newspaper text segments much longer (18.9 vs. 7.4 words)
• newspaper segments also somewhat more complex
• number of verbs:
Connectives
• adversative relations more often marked than causal
• nucleus in causal relations almost never marked in Twitter (predominance of weil ‘because’ is similar to spoken language)
Types of connectives
• paratactic connectives in adversative relations on Twitter
• causal connectives (denn vs. weil) reflect spoken/written continuum
Twitter PDTB annotation
• Explicit discourse relations occur frequently in English Twitter data. Out of 1756 tweets, over 40% contain at least one tweet-internal explicit connective.
• Different relation distribution from PDTB (more Contingency)
• Dominance of few common connectives (and, but)
• Many fragments or incomplete utterances
• Connectives used as discourse markers (e.g., and)
This work: Data
• Tweets and blog posts from the same authors
• Twitter list: “Elternbloggerkarte” (parenting theme)
• Identified blog associated with the Twitter account from the bio
• Extracted Twitter timeline (~ last 4-5 months of tweets) through API and last 5-10 blog posts via RSS feeds
• Excluded users w/ < 1000 tokens in tweets or blogs*
=> 62 users, 580 blog posts, >120,000 tweets
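The exclusion step above can be sketched roughly as follows. This is a minimal illustration, not the pipeline used for the corpus: whitespace tokenization, the `corpus` layout (a dict with `tweets` and `blogs` lists per user), and the function names are all assumptions made for the example.

```python
MIN_TOKENS = 1000  # threshold stated on the slide


def count_tokens(texts):
    """Count whitespace-separated tokens over a list of texts (assumed tokenization)."""
    return sum(len(text.split()) for text in texts)


def keep_user(tweets, blog_posts, min_tokens=MIN_TOKENS):
    """A user is kept only if BOTH media reach the threshold,
    i.e. excluded if tweets OR blogs fall below it."""
    return (count_tokens(tweets) >= min_tokens
            and count_tokens(blog_posts) >= min_tokens)


def filter_users(corpus, min_tokens=MIN_TOKENS):
    """corpus: hypothetical mapping user -> {"tweets": [...], "blogs": [...]}."""
    return {
        user: data
        for user, data in corpus.items()
        if keep_user(data["tweets"], data["blogs"], min_tokens)
    }
```

A different tokenizer would shift which users fall just under the threshold, so the resulting user count depends on that choice.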
Blogs/tweets corpus
 | Blog posts | Tweets | PCC (news)
users | 62 | 62 |
items | 580 | 120,728 |
tokens | 463,743 | 1,892,146 |
type/token ratio (avg.) | 0.28 | 0.22 | 0.54
word length (chars.) | 4.68 | 4.85 | 6.36
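For readers unfamiliar with the two lexical measures in the table, here is a small sketch of how they are commonly computed. It is an assumption that the "avg." in the table means a mean over per-item or per-user ratios; the slide does not say which, and the lowercasing step is also an assumption.

```python
def type_token_ratio(tokens):
    """Distinct word forms divided by total tokens (case-folded, an assumption)."""
    if not tokens:
        return 0.0
    lowered = [tok.lower() for tok in tokens]
    return len(set(lowered)) / len(lowered)


def mean_word_length(tokens):
    """Average token length in characters."""
    if not tokens:
        return 0.0
    return sum(len(tok) for tok in tokens) / len(tokens)


def average_ttr(token_lists):
    """Mean of per-unit type/token ratios (unit = item or user; assumed)."""
    ratios = [type_token_ratio(toks) for toks in token_lists if toks]
    return sum(ratios) / len(ratios) if ratios else 0.0
```

Type/token ratio falls as texts get longer, so averaging over units of very different sizes (short tweets vs. long blog posts) is only a rough comparison.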
Which is it?
• Connectives are more frequent in blogs:
1. Discourse relations are generally more frequent (per sentence/token).
2. Discourse relations are more frequently made explicit.
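The two readings can be told apart by normalizing in two different ways: relations per token (reading 1) and the explicit share among all relations (reading 2). A minimal sketch, assuming hand-annotated counts of explicit and implicit relations are available; the function names and the numbers in the usage note are purely illustrative, not results from the corpus.

```python
def relation_density(n_explicit, n_implicit, n_tokens, per=1000):
    """All discourse relations (explicit + implicit) per `per` tokens."""
    if n_tokens == 0:
        return 0.0
    return per * (n_explicit + n_implicit) / n_tokens


def explicit_share(n_explicit, n_implicit):
    """Proportion of relations that are signalled by a connective."""
    total = n_explicit + n_implicit
    return n_explicit / total if total else 0.0
```

With invented counts, a sample of 6000 tokens containing 30 explicit and 30 implicit relations has a density of 10 relations per 1000 tokens and an explicit share of 0.5. A higher density with the same share would point to reading 1; the same density with a higher share would point to reading 2.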
Individual connective frequency
• One author with typical distribution (#11)
• One author with similar frequency in tweets/blogs (#63)
#63 - Example tweets
(1) Ich so: Ich muss jetzt die Steuer machen! Konzentration! Wo sind die Kekse? Muss mir erst Tee kochen. Oh, die Katze will gekrault werden!
Me: I have to do the taxes! Concentration! Where are the cookies? Must make tea first. Oh, the cat wants to be petted!
(2) Ich will Steuer machen und die Katze will auf meinem Schreibtisch gekrault werden. Tja.
I want to do the taxes and the cat wants to be petted on my desk. Well.
#63 - Implicit
• Tweets: Count of relations depends on whether only intra-speaker implicit relations are allowed (see RSD 2016)
(4) @USER mein Vater hat das auf FB geteilt 🙄
@USER my father shared that on FB 🙄
(5) @USER aber ich liebe Drews Haare 😍
@USER but I love Drew’s hair 😍
#63 - NoRel
• Tweets: First tweet of a thread
• (also discounted tweets of only links, English tweets, and retweets)
• Blogs: Title, first sentence, parentheticals
(3) … (hab’ ich was verpasst?) …
… (did I miss something?) …
#63 - Implicit
• Tweets: Many answers to previous tweets, including Hypophora
(4) @USER mein Vater hat das auf FB geteilt 🙄
@USER my father shared that on FB 🙄
(5) @USER aber ich liebe Drews Haare 😍
@USER but I love Drew’s hair 😍
• Blogs: Chopped, narrative style
(6) Im Flugzeug, ich sitze am Gang. eine ältere Dame auf der anderen Seite.
On the airplane, I’m sitting in the aisle seat. an older lady across the aisle.
#63 - Explicit
Tweets | Blogs
23 und / and | 14 und / and
10 aber / but | 7 aber / but
8 wenn / if | 2 weil / because
6 weil / because, dann / then | außer, dann, denn, nachdem, nämlich, wegen / except, then, since, after, therefore, due to
2 dabei / while, um…zu / to |
als, damit, danach, deshalb, doch, nachdem, ohne…zu / when, in order to, after, therefore, however, after, without |
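Frequency lists like the one above can be approximated automatically by matching tokens against a connective lexicon. The sketch below is only a first pass under stated assumptions: the mini-lexicon is a hypothetical subset of the connectives named on the slide, tokenization is a simple regular expression, and every match is merely a candidate, since many of these words also have non-connective uses and discontinuous forms such as um…zu are not handled.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon, taken from connectives listed on the slide.
CONNECTIVES = {"und", "aber", "wenn", "weil", "dann", "denn", "nachdem"}


def tokenize(text):
    """Lowercase and split into word tokens (simple regex, an assumption)."""
    return re.findall(r"\w+", text.lower())


def count_candidates(texts, lexicon=CONNECTIVES):
    """Count lexicon matches across texts; matches are candidates only,
    not disambiguated discourse connectives."""
    counts = Counter()
    for text in texts:
        counts.update(tok for tok in tokenize(text) if tok in lexicon)
    return counts
```

Calling `count_candidates(tweets).most_common()` gives a ranked list in the same shape as the table, though the numbers would differ from manual annotation until the candidates are disambiguated.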
#63 - Explicit
• Tweets: Connective clusters:
(7) Der Mann sagt,er will erst ab April tapezieren, weil man dabei nicht lüftet und dann danach keine kalte/feuchte Luft rein soll. Helft mir :(
My husband says he doesn’t want to wallpaper till April, because one shouldn’t air the room while doing it, and then afterwards shouldn’t let cold/damp air in. Help me :(
Question: What is the argument of ‘danach’ (afterwards)?
• Blogs: Narrative ‘und’, ‘aber’
#11 - Implicit
• About as many implicit relations as explicit relations in the tweets -> Overall fewer (intra-speaker) relations
• Implicit relations mostly narrative fragments (as before):
(8) @USER Das Büro ist nicht betretbar. Alles Ordner aus den Schränken, Akten zerrissen. Geld noch da.
@USER The office is inaccessible. All the folders off the shelves, files torn. Money still there.
• Many short replies
#11 - Explicit
Tweets | Blogs
11 aber / but | 23 und / and
9 und / and | 5 als / when
8 wenn / if | 4 doch / however
2 dann / then | 3 um…zu / in order to
ansonsten, da, nachdem, oder, weil, zumal / otherwise, since, after, or, because, since | 2 aber, dann, denn, wenn / but, then, since, if
 | also, auch, da, damit, dennoch, entweder…oder, nachdem, obwohl, … / so, also, since, to, however, either…or, after, although, …
#11 - Explicit
• In explicit relations, one argument is often missing (see Cuenca/Crible for spoken language):
(9) @USER na wenn sonst nichts los ist 😂
@USER well if there’s nothing else happening 😂
Summary Qualitative Analysis
• More explicit discourse connectives in blogs than in tweets (per sentence and per token)
• In blogs, about the same number of explicit and implicit relations (similar to other written text!)
• For tweets, all depends on one’s definition of a discourse relation:
• If only intra-speaker implicit relations are considered, then there are fewer overall discourse relations
• If inter-speaker implicit relations are allowed, then there are many more (implicit) discourse relations in tweets than blogs
This contradicts previous research on speech
Concession
• Concession cannot be expressed implicitly
• Concessive connectives that occur on average more than 5 times in the data
Causal connectives
• Causal connectives are frequent in all media
• Conceptually oral/informal style of justification on Twitter
(Scheffler, 2014)
Causal connectives
• Fewer epistemic and speech-act level causes in Twitter than in spoken language
(Scheffler, Schlüter, Stede, 2016; Volodina, 2010)
Quantitative analysis
• Ongoing
• For explicits only (for now): disambiguate connectives
• Quantify both individual variation and cross-media-effects