Journalism meets Computer Science

Knowledge transfer workshop of the DFG research training group GRK 1994 “Adaptive Preparation of Information from Heterogeneous Sources” (AIPHES), 11 November 2016, Lichtenberghaus, Darmstadt.

Overview

Preface

Program

About AIPHES

Participants from Journalism

Participants from Computer Science

Research Projects

Directions

Venue

Contact

Preface

Thorough research is central to many activities and often has to be carried out under very tight deadlines. The quality of the research results has far-reaching consequences, especially in decision-making processes. At the same time, the amount of relevant information is growing exponentially, and electronic information sources are becoming increasingly complex, are highly heterogeneous, and vary widely in quality.

In April 2015, the universities of Darmstadt and Heidelberg, together with the Heidelberger Institut für Theoretische Studien (HITS), founded the research training group “Adaptive Preparation of Information from Heterogeneous Sources” (AIPHES). Its core research questions concern the investigation, structuring, aggregation, and assessment of information.

The main goal of today’s workshop is to identify and discuss possible applications of the newly researched computational methods in journalism. To this end, the members of the research training group will present their research projects and discuss them with experts and practitioners from journalism. Finding the right links between computer science and journalism is by no means easy, but we see high potential, for instance, in investigative and data journalism as well as in fact verification. We are therefore looking forward to many stimulating discussions and ideas for current and prospective research projects.

Program

11 November 2016

From 08:45 Arrival

09:00 Welcome address and introduction

09:30 Poster and demo presentation (part 1)

10:45 Coffee break

11:15 Poster and demo presentation (part 2)

12:30 Lunch break

13:45 Poster and demo presentation (part 3)

15:00 Coffee break

15:30 Plenary discussion

Until 16:30 End of the workshop

About AIPHES

The vision of the graduate program “Adaptive Preparation of Information from Heterogeneous Sources” (AIPHES) is to extract structured knowledge from heterogeneous text sources using automated means in order to create informative dossiers of stylistically homogeneous content. Within this project, we develop methods that are able to adapt to different text genres and domains, so that the results can easily be transferred to other tasks, user groups, and languages.

Adaptive information processing is a complex task which requires extensive research on integrated methods that incorporate knowledge from multiple scientific disciplines. The main research questions are the computational linguistic modeling of discourse phenomena and natural language processing methods for structuring and aggregating heterogeneous document collections, the representation and analysis of text-induced structures based on network analysis and machine learning, and the criteria and mechanisms for selecting and assessing the quality of heterogeneous sources and dossiers in information management. Multi-document summarization serves as a prototypical task that is jointly addressed by all AIPHES members and that should yield first ideas for cooperation with application experts in journalism.

In AIPHES, 11 Ph.D. students work together with 21 associated researchers and 8 principal investigators from multiple fields. The German Research Foundation (DFG) has funded the program since April 2015.

Participants from Journalism

Daniel Drepper CORRECT!V, Essen/Berlin Phone: (030) XXXX http://correctiv.org Twitter: @danieldrepper

Anne Preger Freelance science journalist, Dipl. geoecologist, Bonn Phone: (0228) XXXX http://www.hoerweiten.de Twitter: @apreger

Lars Hennemann Editor-in-chief, Echo Zeitungen, Darmstadt Phone: (06151) XXXX http://www.echo-online.de Twitter: @larsoliverhen

Kersten A. Riechers quäntchen + glück, Darmstadt Phone: (06151) XXXX http://www.quäntchen-und-glück.de Twitter: @dasKerst

Andreas Loos Data Scientist, ZEIT ONLINE GmbH, Hamburg Phone: (030) XXXX http://www.zeit.de

Tanjev Schultz Professor of Journalism, Johannes Gutenberg Universität, Mainz Phone: (06131) XXXX http://www.journalistik.uni-mainz.de

Lorenz Lorenz-Meyer Professor of Online Journalism, Hochschule Darmstadt Phone: (06071) XXXX https://oj.mediencampus.h-da.de Twitter: @lorenzlm

Jutta Witte Journalists' office Surpress GbR, Tübingen Phone: (07472) XXXX http://www.surpress.org

Stefan Michaelsen Entrepreneurship Fellow, Media Lab Bayern, Munich Phone: (089) XXXX http://www.linkedin.com/in/stefan-michaelsen-844062102

Vanessa Wormer Data journalist, Süddeutsche Zeitung, Munich Phone: (089) XXXX http://www.sueddeutsche.de Twitter: @remrow

Participants from Computer Science

Principal Investigators

Judith Eckle-Kohler UKP Lab, TU Darmstadt Phone: (06151) 16–XXXX eckle-kohler (at) ukp.informatik.tu-darmstadt.de

Anette Frank Computational Linguistics, Universität Heidelberg Phone: (06221) 54–XXXX frank (at) cl.uni-heidelberg.de

Johannes Fürnkranz Knowledge Engineering, TU Darmstadt Phone: (06151) 16–XXXX info (at) ke.tu-darmstadt.de

Iryna Gurevych (Spokesperson) UKP Lab, TU Darmstadt Phone: (06151) 16–XXXX gurevych (at) ukp.informatik.tu-darmstadt.de

Christian M. Meyer UKP Lab, TU Darmstadt Phone: (06151) 16–XXXX meyer (at) ukp.informatik.tu-darmstadt.de

Michael Strube Natural Language Processing Group, HITS Phone: (06221) 533–XXXX michael.strube (at) h-its.org

Karsten Weihe Algorithmics, TU Darmstadt Phone: (06151) 16–XXXX weihe (at) algo.informatik.tu-darmstadt.de

Doctoral Researchers

Thomas Arnold Network Analysis, TU Darmstadt Phone: (06151) 16–XXXX arnold (at) aiphes.tu-darmstadt.de

Avinesh PVS Information Management, TU Darmstadt Phone: (06151) 16–XXXX avinesh (at) aiphes.tu-darmstadt.de

Benjamin Heinzerling Computational Linguistics, HITS Phone: (06221) 533–XXXX heinzerling (at) aiphes.tu-darmstadt.de

Gerold Hintz Language Technology, TU Darmstadt Phone: (06151) 16–XXXX hintz (at) aiphes.tu-darmstadt.de

Tobias Falke Language Technology, TU Darmstadt Phone: (06151) 16–XXXX falke (at) aiphes.tu-darmstadt.de

Andreas Hanselowski Information Management, TU Darmstadt Phone: (06151) 16–XXXX hanselowski (at) aiphes.tu-darmstadt.de

Ana Marasovic Computational Linguistics, Universität Heidelberg Phone: (06221) 54–XXXX marasovic (at) aiphes.tu-darmstadt.de

Teresa Martin Machine Learning, TU Darmstadt Phone: (06151) 16–XXXX martin (at) aiphes.tu-darmstadt.de

Todor Mihaylov Computational Linguistics, Universität Heidelberg Phone: (06221) 54–XXXX mihaylov (at) aiphes.tu-darmstadt.de

Maxime Peyrard Language Technology, TU Darmstadt Phone: (06151) 16–XXXX peyrard (at) aiphes.tu-darmstadt.de

Markus Zopf Machine Learning, TU Darmstadt Phone: (06151) 16–XXXX zopf (at) aiphes.tu-darmstadt.de

Project Staff

Christopher Tauchmann Information Management, TU Darmstadt Phone: (06151) 16–XXXX tauchmann (at) aiphes.tu-darmstadt.de

Further participants: http://www.aiphes.tu-darmstadt.de

Associated Researchers

Lisa Beinborn UKP Lab, TU Darmstadt Phone: (06151) 16–XXXX beinborn (at) ukp.informatik.tu-darmstadt.de

Chris Biemann Language Technology, Universität Hamburg Phone: (040) 42883 XXXX biemann (at) informatik.uni-hamburg.de

Richard Eckart de Castilho UKP Lab, TU Darmstadt Phone: (06151) 16–XXXX eckart (at) ukp.informatik.tu-darmstadt.de

Eneldo Loza Mencía Knowledge Engineering, TU Darmstadt Phone: (06151) 16–XXXX eneldo (at) ke.tu-darmstadt.de

Margot Mieskes Research and Business Data, Hochschule Darmstadt Phone: (06151) 16–XXXX margot.mieskes (at) h-da.de

Éva Mújdricza-Maydt Computational Linguistics, Universität Heidelberg Phone: (06221) 54–XXXX mujdricz (at) cl.uni-heidelberg.de

Daniil Sorokin UKP Lab, TU Darmstadt Phone: (06151) 16–XXXX sorokin (at) ukp.informatik.tu-darmstadt.de

Christian Stab UKP Lab, TU Darmstadt Phone: (06151) 16–XXXX stab (at) ukp.informatik.tu-darmstadt.de

Research Projects

Session 1 (09:30–10:45)

1. Entity Linking: Automatically Grounding Text in a Knowledge Base (Benjamin Heinzerling)

2. Automatic Text Summarization with Concept Maps (Tobias Falke)

3. Argument Retrieval through Real-Time Analysis on Big Data (Christian Stab)

4. Motif analysis of text-based graphs (Thomas Arnold)

Session 2 (11:15–12:30)

5. Deep Learning with Sentiment Inference for Discourse-oriented Opinion Analysis (Ana Marasovic)

6. Argumentation Analysis Techniques for Fact Checking (Andreas Hanselowski)

7. Computer Assisted Multi-document Summarization and Evaluation (Avinesh PVS)

8. Machine Learning for Information Importance Estimation (Markus Zopf)

Session 3 (13:45–15:00)

9. Content Selection as an Optimization Problem (Maxime Peyrard)

10. Fact Extraction via combined SRL and RE (Teresa Martin)

11. Quantitative assessment of text quality (Christopher Tauchmann)

12. Extracting Event Structures from Text (Todor Mihaylov)

Poster #1

Entity Linking: Automatically Grounding Text in a Knowledge Base Benjamin Heinzerling

Entity linking (EL) is the task of automatically linking mentions of entities such as persons, locations, or organizations to their corresponding entry in a knowledge base (KB). EL grounds a given document in a KB and thus makes the rich data contained in the KB available for use cases such as automatic semantic indexing, text enrichment, and entity-aware search.

What is this text about?

Ilham Anas, a 40-year-old from Jakarta, Indonesia, works as Obama’s doppelgaenger.

[Figure: “Human Answer” and “Entity Linking Answer” for this sentence, with the mentions linked to knowledge base entries]

Why is this hard?

• Ambiguity of language: “Obama” may refer to Barack Obama, Michelle Obama, Malia Obama, Sasha Obama, Barack Obama Sr., Obama (Fukui, Japan), Mt. Obama, Obama burmeisteri?

• Variability of language (one entity, many surface forms, including misspellings): Barack Obama; Barack Obama II; Barack Obama Jr.; Barack Obama Junior; Barack Obama, Jr.; Barack Obama, Junior; Barack Hussein; Barack Hussein Obama; Barack Hussein Obama II; Barack Hussein Obama Jr.; Barack Hussein Obama Junior; Barack Hussein Obama, Jr.; Barack Hussein Obama, Junior; Barack Hussein obama; Barack H. Obama; Barack H. Obama II; Barack H. Obama Jr.; Barack H. Obama Junior; B. H. Obama; B. Hussein Obama; B. Obama; Pres. Obama; President Barack H. Obama; President Barack Hussein Obama II; President Barack Obama; President Obama; Sen. Obama; Senator Barack Obama; US President Barack Obama; United States President Barack Obama; 2008 Democratic Presidential Nominee; 44th President of the United States; Barack Obana; Barack Obbama; Barack Ubama; Barack OBama; Barack obma; BarackObama; Barak Obamba; Barck Obama; Barock obama; Borack Obama; Borrack Obama; Brack Obama; Brock Obama; Burack obama; Hussein Obama; Obamma; 0bama; Barack O’Bama; O’Bama; O’bama

How?

• Look at context words:
  – The Japanese city of Obama…
  – Obama’s last day in office as President
• Look at context entities:
  – Obama, located in Fukui prefecture, is a…
  – Obama’s wife Michelle…
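The cues above (words and entities in the context) can be illustrated with a deliberately small sketch, assuming a toy knowledge base: candidate entries are generated from an alias table, and the candidate whose typical context words overlap most with the sentence is chosen. The KB entries, alias sets, context word lists and the function names `tokenize`, `candidates` and `link` are hypothetical, and word overlap is a simple stand-in heuristic, not the project's actual linking model.

```python
from __future__ import annotations

import re

# Hypothetical toy knowledge base (illustrative only): each entry lists the
# surface forms (aliases) that may refer to it and words typical of its context.
KB = {
    "Barack_Obama": {
        "aliases": {"obama", "barack obama", "president obama", "b. obama"},
        "context": {"president", "office", "senator", "michelle", "white", "house"},
    },
    "Michelle_Obama": {
        "aliases": {"obama", "michelle obama"},
        "context": {"first", "lady", "barack"},
    },
    "Obama,_Fukui": {
        "aliases": {"obama"},
        "context": {"japanese", "japan", "city", "fukui", "prefecture"},
    },
}


def tokenize(text: str) -> set[str]:
    """Lowercase the text and keep alphabetic tokens only."""
    return set(re.findall(r"[a-z]+", text.lower()))


def candidates(mention: str) -> list[str]:
    """Candidate generation: all KB entries that list the mention as an alias."""
    surface = mention.lower()
    return [entity for entity, info in KB.items() if surface in info["aliases"]]


def link(mention: str, text: str) -> str | None:
    """Disambiguation: pick the candidate whose context words overlap most with the text."""
    cands = candidates(mention)
    if not cands:
        return None  # no KB entry is known for this surface form
    words = tokenize(text)
    # max() keeps the first candidate in KB order when scores tie
    return max(cands, key=lambda entity: len(KB[entity]["context"] & words))
```

For example, `link("Obama", "The Japanese city of Obama ...")` returns `"Obama,_Fukui"`, while `link("Obama", "Obama’s wife Michelle ...")` returns `"Barack_Obama"`. Practical systems replace such hand-made word lists with statistics or learned representations derived from the knowledge base.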

Use cases

• Automatic semantic indexing of documents: which persons, locations, organizations, etc. are mentioned?
• Ties rich data in the knowledge base to documents
• Automatically enrich text, display info boxes
• Entity-aware search instead of naive string matching
• Complex queries based on knowledge base relations

Get in touch

Questions or comments: please contact the author (see the list of participants for contact details).

Acknowledgements: This work has been supported by the German Research Foundation as part of the Research Training Group “Adaptive Preparation of Information from Heterogeneous Sources” (AIPHES) under grant No. GRK 1994/1.

Poster #2

Automatic Text Summarization with Concept Maps Tobias Falke

Concept maps are labeled graphs in which every node represents a concept and every edge a relationship that holds between the two connected nodes. Originally developed by psychologists for applications in the education domain, this formalism generally allows information to be represented in a structured way. In our work, we aim to automatically create concept maps for document collections such that the map is a summary containing the most important content from the collection. Methodologically, this challenge is tackled in several steps: potential labels for concepts and relations are extracted from the documents, ranked by importance, and then combined into a map. Besides their use as summaries, we see several other applications of these maps. In an interactive text exploration system, for example, generated concept maps could serve as navigation structures that allow users to explore the document collection, focus on subsets of the documents, and navigate to locations of interest in them.
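The three stages named above (extraction, ranking, map construction) can be sketched as follows, assuming the concept and relation labels have already been extracted as (concept, relation, concept) triples. Grouping by normalized label and ranking by frequency are simplifying stand-ins chosen for illustration; the function names, the `max_concepts` parameter and the toy triples used to exercise it are hypothetical and do not describe the project's actual models.

```python
from __future__ import annotations

from collections import Counter


def normalize(label: str) -> str:
    """Grouping (simplified): merge labels that differ only in case or spacing."""
    return " ".join(label.lower().split())


def build_concept_map(triples: list[tuple[str, str, str]], max_concepts: int = 3):
    """Build a small concept map summary from (concept, relation, concept) triples.

    The triples are assumed to come from an upstream extraction step (not shown).
    Returns the selected concepts and a dict {(concept, concept): relation label}.
    """
    # 1. Grouping: map every mention to a normalized concept/relation label.
    grouped = [(normalize(s), normalize(r), normalize(o)) for s, r, o in triples]

    # 2. Concept ranking: frequency across the collection as a crude importance score.
    concept_counts = Counter()
    for s, _, o in grouped:
        concept_counts[s] += 1
        concept_counts[o] += 1
    selected = {concept for concept, _ in concept_counts.most_common(max_concepts)}

    # 3. Relation ranking + map construction: keep the most frequent relation
    #    between each pair of selected concepts.
    edges: dict[tuple[str, str], str] = {}
    for (s, r, o), _ in Counter(grouped).most_common():
        if s in selected and o in selected and (s, o) not in edges:
            edges[(s, o)] = r
    return selected, edges
```

Frequency is only a crude importance signal; a learned ranking model could replace the two `Counter` steps without changing the overall structure of extraction, ranking and construction.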

Concept Map Summaries

[Figure: two example concept map summaries, one on the Munich shooting (concepts include Olympia shopping mall, Moosach district, David Sonboly, 10 people, 36 people) and one on alternative ADHD treatments (concepts include behavior modification, biofeedback, Feingold diet, medical professionals)]

Generation Approach

Concept Extraction & Grouping → Relation Extraction & Grouping → Concept Ranking → Relation Ranking → Map Construction

Application: Interactive Text Exploration

[Figure: search interface screenshot with a generated concept map, example topic “gesture recognition”]

Idea: Enhance a search engine with generated concept maps.

Possible usage:
• Navigate to occurrences of concepts in the documents
• Filter documents by selected concepts, build a new map
• Navigate through the map from concept to concept

Poster #3

Argument Retrieval through Real-Time Analysis on Big Data Christian Stab

We present a cognitive system designed to help journalists quickly get an overview of opinions on a disputed topic. The system presents a customized summary of potential arguments relevant to a user-specified query. The summary is created in real time from arbitrarily large web sources. For example, the system could help to find a balanced set of arguments on the question "What are the most important reasons for and against TTIP?". Rather than having to read through many (potentially) irrelevant documents retrieved by a common web search engine, the user is presented with the most relevant standpoints on the given topic, selected and summarized by the system. It does so by aggregating automatically preselected claims in a customizable manner (e.g. for and against a standpoint). The system makes use of recent advances in natural language processing and machine learning, abstracting meaningful insights from a small body of human-created argumentative structures.
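The aggregation step described above, grouping preselected claims for and against a standpoint and ranking them, might look roughly like this. Claim extraction and stance classification are assumed to have happened upstream; the `Claim` record, its fields and the product-review data used to exercise it are hypothetical, and ranking by number of sources is one simple choice among many.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    topic: str   # topic the claim was retrieved for
    text: str    # claim text, assumed already normalized by the extraction step
    stance: str  # must be "pro" or "con" with respect to the topic
    source: str  # document or outlet the claim was found in


def aggregate(claims: list[Claim], topic: str) -> dict[str, list[tuple[str, list[str]]]]:
    """Group the claims for one topic into pro/con lists ranked by number of hits."""
    by_stance: dict[str, dict[str, list[str]]] = {"pro": defaultdict(list), "con": defaultdict(list)}
    for claim in claims:
        if claim.topic == topic:
            by_stance[claim.stance][claim.text].append(claim.source)
    # Rank each side by how many sources mention the claim (most hits first);
    # sorted() is stable, so ties keep their first-seen order.
    return {
        stance: sorted(by_text.items(), key=lambda item: len(item[1]), reverse=True)
        for stance, by_text in by_stance.items()
    }
```

Called with claims about a product, `aggregate(claims, "phone X")` returns pro and con lists ordered by the number of sources supporting each claim, comparable to the hit counts in the poster's example output, with the sources kept for further investigation.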

Christian Stab, Johannes Daxenberger, Chris Stahlhut, Can Diehl and Iryna Gurevych

Ubiquitous Knowledge Processing Lab (UKP), Department of Computer Science, Technische Universität Darmstadt

•  High application potential

•  Change of search paradigm

•  Novel way to use big data

•  Quick and easy to use

Currently we are searching for collaborators

Contact us!

The Vision

[Figure: unstructured web data is turned into structured system output]

Benefits:
•  Arguments comprehensively summarized
•  Grouped into pro and con arguments
•  No time-consuming web searches
•  Sources are directly accessible for further investigation

Use Cases

Scenario: Journalism
•  A journalist is interested in news and arguments related to a common topic
•  Instead of searching the web, the journalist first uses this system
•  This way, the journalist gets an instant overview of the arguments about the topic
•  Real-time analysis ensures that the overview includes the latest news
•  While browsing the results, links to all web resources are available for further analysis

Scenario: Purchase Decision
•  A user is interested in purchasing a product
•  Instead of searching through numerous product reviews, the user can search for the product with the system
•  This way, the user gets an instant overview of the pros and cons of this product

Summary

Migration in Europe

•  Migration is a chance for development (15 hits)
•  Religious pluralism strengthens society (12 hits)
•  Prosperity will result (8 hits)
•  Economy needs skilled workers (4 hits)
   →  Not enough apprentices [Economist]
   →  Germany lacks workers [Huffpost]
   →  Number of students drops [Sun]
   →  Orders cannot be processed [Economist]

Arguments 5–10 of 75

•  Women face discrimination (9 hits)
•  Crime rates are rising (4 hits)
   →  Several detained in Cologne [Independent]
   →  Right-wing violence rose by 65% [Spiegel]
   →  More gun licenses issued [Telegraph]
   →  Arson attack on gymnasium [Twitter]
•  Extremist parties gaining popularity (3 hits)
•  Walls raised at the EU border (2 hits)

Arguments 1–5 of 22

Goal

•  Extract arguments from heterogeneous web content for both German and English
•  Aggregate arguments based on user-defined queries
•  Display arguments in a structured way that helps users gain an instant overview of the topic

System Architecture

User interface:
•  User requests a topic
•  System searches in annotated documents
•  Live aggregation of arguments
•  Structured results are presented to the user

Argument extraction:
•  Crawling heterogeneous sources from the web
•  System preprocesses data
•  Arguments are extracted using deep learning

[Figure: architecture diagram with the components Crawler, Preprocessing, Argument Extraction, Information Retrieval, Aggregation, and User Interface]

Poster #4

Motif analysis of text-based graphs Thomas Arnold

Motif analysis deals with recurrent patterns in graph structures and networks. We transform documents into different graph representations of nodes and edges, using language features and characteristics of the source text. This allows us to apply principles and algorithms of graph theory to search for statistically significant patterns, so-called motifs. These motifs can be used not only to classify texts, but also to learn about their hidden properties. Possible applications include text quality assessment, categorization, and the use of motif signatures to create author or genre profiles.
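A minimal sketch of the idea, assuming one of the simplest possible graph representations (words as nodes, edges between adjacent words within a sentence) and the smallest motifs (connected three-node subgraphs: open paths and triangles). The function names are hypothetical, and a real motif analysis would additionally test whether the observed counts are significant, typically by comparing them against randomized reference graphs; that step is omitted here.

```python
from __future__ import annotations

import itertools
import re
from collections import defaultdict


def text_to_graph(text: str) -> dict[str, set[str]]:
    """Undirected word-adjacency graph: words are nodes, and an edge links two
    words that appear next to each other within a sentence (one of many
    possible graph representations, chosen here for simplicity)."""
    graph = defaultdict(set)
    for sentence in re.split(r"[.!?]", text.lower()):
        words = re.findall(r"[a-z]+", sentence)
        for a, b in zip(words, words[1:]):
            if a != b:
                graph[a].add(b)
                graph[b].add(a)
    return dict(graph)


def count_three_node_motifs(graph: dict[str, set[str]]) -> dict[str, int]:
    """Count connected 3-node subgraphs: open paths (a-b-c) and triangles.

    Brute force over all node triples, so only suitable for small graphs."""
    paths = triangles = 0
    for a, b, c in itertools.combinations(sorted(graph), 3):
        edges = (b in graph[a]) + (c in graph[a]) + (c in graph[b])
        if edges == 2:
            paths += 1
        elif edges == 3:
            triangles += 1
    return {"path": paths, "triangle": triangles}
```

For the example text from the poster (“Our father in heaven. Your kingdom come. For the kingdom of our father. On earth as in heaven.”), the resulting word graph is a tree, so `count_three_node_motifs` reports 15 open paths and no triangles.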

Main Idea

Text → Graph → Motifs

Graph Representation - Example

Our father in heaven. Your kingdom come. For the kingdom of our father. On earth as in heaven.

[Figure: graph representation of this text]

Graph Motifs - Examples

Frequent subgraphs, central nodes, long paths

[Figure: example motifs in the graphs of a featured and a non-featured Wikipedia article]

Done: Predict Article Quality in Wikipedia

We identified motifs that are positively or negatively correlated with Wikipedia article quality.

Current Research Ideas

• Motif-based timeline analysis
• Motifs for sentence ordering to improve automatic summarization [figure: sentences A to E]
• Improve automatic evaluation of text quality
• Use motif signatures for author / genre profiles

Profit!

Poster #5

Deep Learning with Sentiment Inference for Discourse-oriented Opinion Analysis Ana Marasović

Fine-grained opinion analysis is important for a variety of NLP tasks, including opinion-oriented question answering and opinion summarization. We will propose a neural network-based method for the joint extraction of opinion entities, i.e. opinion expressions and their holders and targets, and of the relations among them. To address the scarcity of labelled data for languages other than English, we will exploit an adversarial framework for the proposed model and evaluate it on German data. We approach the sentiment analysis task from a discourse perspective: we apply sentiment inference beyond the sentence level, with the aim of obtaining a denser, fine-grained representation of sentiment across the entire discourse. We address sentiment analysis as a knowledge base completion task, using matrix factorization or distillation. These approaches lead to a dense, structured discourse representation, enriched with inference rules that apply within and beyond sentences. The importance of integrated dense discourse representations and sentiment inference rules will be tested in multi-document and multi-perspective scenarios for both English and German. A challenge for the applicability of inference rules is the occurrence of propositional anaphors that refer to situations or events. We will investigate the impact of propositional anaphora resolution on sentiment analysis, and vice versa. To this end, we will develop an account of abstract anaphora resolution and investigate its impact on sentiment analysis, both in a pipeline and in a joint modelling approach.

What needs to be addressed

1. Fine-grained opinion analysis: detect explicit opinion expressions and their sentiment, identify targets (entities or propositions at which thesentiment is directed), holders (entities that express the opinion) and relations among extracted entities.


2. Detect implicit sentiment via inference on explicit sentiment and on events that positively or negatively affect entities.

Example: Mexico’s president criticised U.S. gun laws, which enable weapons flow from the U.S. into the hands of Mexican drug cartels.

3. Connect pieces of text to obtain meaningful information.

Example: U.S. gun laws enable weapons flow from the U.S. into the hands of Mexican drug cartels. [...] Mexico’s president criticised this issue.
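Point 2 can be illustrated with a simplified sign-propagation rule. The sketch below assumes, as a hypothetical simplification rather than the project's rule set, that a holder's sentiment toward the entity affected by an event equals the holder's sentiment toward the event's agent multiplied by the event's effect (+1 benefits, -1 harms). The encoding of the example sentence as one explicit sentiment and one event is likewise an illustrative assumption.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    agent: str   # entity that brings the event about
    theme: str   # entity affected by the event
    effect: int  # +1 if the event benefits the theme, -1 if it harms it

def infer_implicit(explicit: dict[tuple[str, str], int],
                   events: list[Event]) -> dict[tuple[str, str], int]:
    """Propagate sentiment signs (+1/-1) from agents to the entities their events affect.

    Hypothetical rule: sign(holder -> theme) = sign(holder -> agent) * effect.
    Explicit sentiment is never overwritten; the loop stops when nothing new is inferred.
    """
    inferred = dict(explicit)
    changed = True
    while changed:
        changed = False
        for (holder, target), sign in list(inferred.items()):
            for ev in events:
                key = (holder, ev.theme)
                if ev.agent == target and key not in inferred:
                    inferred[key] = sign * ev.effect
                    changed = True
    return inferred

if __name__ == "__main__":
    explicit = {("Mexico's president", "U.S. gun laws"): -1}        # "criticised"
    events = [Event("U.S. gun laws", "Mexican drug cartels", +1)]  # laws "enable" the weapons flow
    print(infer_implicit(explicit, events))
```

Under this rule, the president's explicit criticism of the gun laws yields an inferred negative stance toward the cartels that the laws benefit.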

Methods

Deep learning methods:

- a deep bidirectional LSTM as a baseline for labelling opinion entities
- improvement: multi-task learning (MTL) with the structurally related low-level SRL task
- progressive neural networks as an MTL framework
- an adversarial framework for cross-lingual labelling
- a distillation method that transfers the structured information of logic rules into the weights of neural networks
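The BiLSTM baseline treats opinion-entity extraction as sequence labelling. Assuming a BIO tag scheme (not specified on the poster) with hypothetical labels HOLDER, EXPR and TARGET, a predicted tag sequence can be decoded into entity spans as follows; the tagger itself is out of scope here.

```python
def bio_to_spans(tokens: list[str], tags: list[str]) -> list[tuple[str, str]]:
    """Collect (label, text) spans from BIO tags such as B-HOLDER, I-HOLDER, O."""
    spans: list[tuple[str, str]] = []
    label, start = None, None
    for i, tag in enumerate(tags + ["O"]):  # a sentinel "O" closes the last open span
        continues_span = tag.startswith("I-") and tag[2:] == label
        if not continues_span:
            if label is not None:
                spans.append((label, " ".join(tokens[start:i])))
            label, start = None, None
            if tag != "O":
                # B- opens a new span; a stray I- with a new label is treated like B-
                label, start = tag[2:], i
    return spans

if __name__ == "__main__":
    tokens = ["Mexico's", "president", "criticised", "U.S.", "gun", "laws"]
    tags = ["B-HOLDER", "I-HOLDER", "B-EXPR", "B-TARGET", "I-TARGET", "I-TARGET"]
    print(bio_to_spans(tokens, tags))
```

Relation extraction between the decoded holder, expression and target spans would then operate on these spans rather than on individual tokens.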

Use cases

- highlight extracted entities to follow a text more easily
- find groups of people with similar viewpoints
- detect conflicting opinions
- opinion-oriented summarization
- non-factual question answering

Get in touch

Questions or comments: [email protected]

Acknowledgements: This work has been supported by the German Research Foundation as part of the Research Training Group “Adaptive Preparation of Information from Heterogeneous Sources” (AIPHES) under grant No. GRK 1994/1.


Poster #6

Argumentation Analysis Techniques for Fact Checking

Andreas Hanselowski

With the enormous number of articles published on the web every day, it is difficult to keep up with the amount of available information and to verify whether it is reliable enough for further use. As a result, unverified rumors spread quickly through social media and lead to disinformation of the public. To quickly assess the reliability of articles as they appear on the web, we propose a tool that identifies the argumentative structure of an article and verifies its claims automatically. This is done by identifying the claims and aggregating evidence that supports or contradicts them. On the basis of this evidence, the validity of each claim is assessed using state-of-the-art natural language processing techniques. In a subsequent step, the validity of the article's major claim is assessed, so that a confidence score for the article as a whole can be determined.
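The aggregation step can be sketched as follows, assuming (hypothetically) that a stance classifier has already labelled each evidence snippet as supporting or contradicting a claim, with some confidence. Averaging signed confidences is a stand-in chosen for simplicity, not the tool's actual scoring.

```python
from dataclasses import dataclass

@dataclass
class Evidence:
    stance: str        # "support" or "contradict" (hypothetical stance labels)
    confidence: float  # stance classifier confidence in [0, 1]

def claim_score(evidence: list[Evidence]) -> float:
    """Signed, confidence-weighted mean stance in [-1, 1]; 0.0 if no evidence was found."""
    if not evidence:
        return 0.0
    signed = []
    for e in evidence:
        if e.stance == "support":
            signed.append(e.confidence)
        elif e.stance == "contradict":
            signed.append(-e.confidence)
        else:
            raise ValueError(f"unknown stance: {e.stance!r}")
    return sum(signed) / len(signed)

def article_confidence(major_claim_evidence: list[Evidence]) -> float:
    """Map the major claim's score from [-1, 1] to a confidence in [0, 1]."""
    return (claim_score(major_claim_evidence) + 1.0) / 2.0

if __name__ == "__main__":
    ev = [Evidence("support", 0.9), Evidence("support", 0.6), Evidence("contradict", 0.3)]
    print(claim_score(ev), article_confidence(ev))
```

A claim without any retrieved evidence scores 0.0, i.e. an article confidence of 0.5, which keeps "unverified" distinct from "refuted".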


Poster #7

Computer-assisted Multi-document Summarization and Evaluation

Avinesh PVS

For the past couple of decades there has been intensive work in the area of automatic summarization, but the quality of the resulting summaries is, unfortunately, still low. In view of this problem, we propose computer-assisted summarization (CAS) systems that incorporate user feedback, as an alternative to existing fully automatic summarization systems. CAS systems have the potential to produce high-quality, human-like summaries, as they allow users to post-edit an automatically generated draft summary according to their requirements.

We propose a novel methodology that capitalizes on recent advances in machine learning, particularly in online learning and active learning. The resulting approach will make it possible to exploit user feedback in a novel summarization framework. The proposed method can serve as a journalistic writing aid in multiple ways: to ease the process of writing an article, to analyze the quality of the article, and to make the article more appealing to a broad audience.
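A minimal sketch of how user feedback might be exploited online, assuming summaries are scored through weighted concepts: a perceptron-style update raises the weight of concepts the user keeps and lowers those the user removes, and a simple uncertainty-sampling query (the concept whose weight is closest to zero) stands in for active learning. The class, its methods and the concept representation are hypothetical, not the project's framework.

```python
class FeedbackModel:
    """Concept weights updated online from user feedback (hypothetical sketch)."""

    def __init__(self, lr: float = 1.0):
        self.lr = lr
        self.w: dict[str, float] = {}

    def update(self, concept: str, accepted: bool) -> None:
        # Perceptron-style step: reward kept concepts, penalize removed ones.
        delta = self.lr if accepted else -self.lr
        self.w[concept] = self.w.get(concept, 0.0) + delta

    def score(self, sentence_concepts: set[str]) -> float:
        return sum(self.w.get(c, 0.0) for c in sentence_concepts)

    def rank(self, sentences: dict[str, set[str]]) -> list[str]:
        """Sentence ids ordered by current score, best first."""
        return sorted(sentences, key=lambda s: self.score(sentences[s]), reverse=True)

    def query(self, candidates: list[str]) -> str:
        """Uncertainty sampling: ask the user about the concept the model is least sure of."""
        return min(candidates, key=lambda c: abs(self.w.get(c, 0.0)))

if __name__ == "__main__":
    model = FeedbackModel()
    model.update("gun laws", accepted=True)
    model.update("weather", accepted=False)
    draft = {"s1": {"gun laws", "cartels"}, "s2": {"weather"}}
    print(model.rank(draft), model.query(["gun laws", "cartels", "weather"]))
```

Each round of post-editing would thus both re-rank the draft sentences and choose the next question to put to the user.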


Poster #8

Machine Learning for Information Importance Estimation

Markus Zopf

We develop machine learning methods that rank information by importance while jointly avoiding redundancy. This is crucial in all automatic summarization scenarios. Our approach to multi-document summarization (MDS) learns which information is generally considered important and applies this knowledge to summarization tasks. Prior work only analyzes the given source documents and estimates which information is most central in them, which can be misleading for non-newswire texts. Prior work on incremental update summarization (IUS) used either a pipeline, which is fast but inaccurate, or clustering, which is more accurate but slow. Our method combines the best of both worlds and detects important information quickly and accurately. Our work can help journalists detect important information in vast amounts of data and write informative articles with less effort. We discuss the Panama Papers repository as an example for MDS and the Munich shooting as an example for IUS.
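Ranking by importance while avoiding redundancy can be illustrated with Maximal Marginal Relevance (MMR), a classic greedy technique swapped in here for illustration; it is not the learned method described above. The importance scores are assumed to come from some model, and token-overlap Jaccard similarity stands in for a redundancy measure.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def mmr_select(sentences: list[str], importance: list[float], k: int, lam: float = 0.7) -> list[int]:
    """Greedy MMR: pick k sentence indices, trading importance against redundancy."""
    tokens = [set(s.lower().split()) for s in sentences]
    selected: list[int] = []
    candidates = list(range(len(sentences)))

    def mmr(i: int) -> float:
        # Redundancy = highest similarity to any sentence already selected.
        redundancy = max((jaccard(tokens[i], tokens[j]) for j in selected), default=0.0)
        return lam * importance[i] - (1 - lam) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=mmr)
        selected.append(best)
        candidates.remove(best)
    return selected

if __name__ == "__main__":
    sents = ["panama papers leak reveals offshore companies",
             "leak reveals offshore companies in panama papers",
             "journalists coordinated the investigation"]
    print(mmr_select(sents, importance=[0.9, 0.85, 0.5], k=2, lam=0.5))
```

With `lam=1.0` the selection degenerates to pure importance ranking; lowering it makes the near-duplicate second sentence lose out to the less important but novel third one.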


Poster #9

Content Selection as an Optimization Problem

Maxime Peyrard

Every day, large digital document collections are produced on almost any topic, containing nuggets of important information hidden among many pages of content. Automatically discovering the relevant and important information nuggets in such collections is an urgent but very challenging task. We approach it by casting the discovery and selection of important and relevant information nuggets as an optimization problem. However, the definition of importance and relevance for a set of information nuggets is subjective and domain-dependent. We therefore investigate several definitions of relevance and solve the corresponding optimization problems with various optimization techniques.
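One common definition of relevance, assumed here for illustration, is weighted concept coverage: each concept counts once no matter how many selected sentences mention it, and the selection must fit a length budget. For toy inputs the optimum can be found by exhaustive search; practical systems typically use an ILP solver or greedy approximations instead.

```python
from itertools import combinations

def coverage(selection, concepts: list[set[str]], weights: dict[str, float]) -> float:
    """Sum of weights of the distinct concepts covered by the selected sentences."""
    covered = set().union(*(concepts[i] for i in selection))
    return sum(weights[c] for c in covered)

def best_summary(lengths: list[int], concepts: list[set[str]],
                 weights: dict[str, float], budget: int):
    """Exhaustively search all sentence subsets within the length budget (toy sizes only)."""
    best, best_score = (), 0.0
    for r in range(1, len(lengths) + 1):
        for sel in combinations(range(len(lengths)), r):
            if sum(lengths[i] for i in sel) <= budget:
                score = coverage(sel, concepts, weights)
                if score > best_score:
                    best, best_score = sel, score
    return best, best_score

if __name__ == "__main__":
    lengths = [10, 8, 7]                        # sentence lengths in words
    concepts = [{"a", "b"}, {"a"}, {"b", "c"}]  # hypothetical concepts per sentence
    weights = {"a": 3, "b": 2, "c": 2}
    print(best_summary(lengths, concepts, weights, budget=15))
```

Here the best-scoring single sentence (index 0) is not part of the optimal pair, which is why selection is treated as a set-level optimization rather than a per-sentence ranking.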


Task:
- Start with a collection of documents.
- Extract a small set of ‘relevant’ sentences.

Optimization problem:
- Define properties that the selected set should have.
- Find the best set!

Goal:
- Represent the content of the documents.
- Extract sentences and generate text that maximizes the information span.

Semantic graph:
- Nodes are entities.
- Edges are relations.
- Encodes all the information in the source.

Poster #10

Fact Extraction via combined SRL and RE

Teresa Martin

We combine two different annotation schemas for sentences in order to obtain a richer annotation. The first is Semantic Role Labeling (SRL), which answers the question “Who does what to whom, when and where?” for a given sentence; that is, it annotates a sentence with the right frame for the predicate and the corresponding roles for its arguments. The second is Relation Extraction (RE) for the construction of Knowledge Bases (KBs): KBs store facts about entities, and RE is the process of finding such facts, in the form of entities connected via relations, in raw text. Combining these two schemas brings the advantages of both: the semantic annotation as well as the concrete, fact-focused annotation. We see applications in journalism wherever facts or information about specific topics of interest need to be extracted automatically.


Poster #11

Quantitative assessment of text quality

Christopher Tauchmann

Manually evaluating the quality of aggregated data is a laborious and time-consuming task. Besides the assessment of source documents, the evaluation and monitoring of written articles, e.g. during journalism education or traineeships, can benefit substantially from machines that support experts in their work. Within the scope of our research, we therefore offer measures from different disciplines, such as linguistics, information theory and automatic summary evaluation, that can support the work of journalists.
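Three simple measures illustrate the kind of quantitative signals meant here, one per discipline named above: type-token ratio (linguistics), unigram entropy in bits (information theory) and a simplified ROUGE-1 recall against a reference text (automatic summary evaluation, without stemming or stopword removal). The choice of measures is an assumption for illustration, not the project's measure set.

```python
import math
from collections import Counter

def type_token_ratio(tokens: list[str]) -> float:
    """Lexical diversity: distinct words divided by total words."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def unigram_entropy(tokens: list[str]) -> float:
    """Shannon entropy (bits) of the word distribution within the text."""
    n = len(tokens)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(tokens).values())

def rouge1_recall(candidate: list[str], reference: list[str]) -> float:
    """Share of reference unigrams (with counts) that also appear in the candidate."""
    cand, ref = Counter(candidate), Counter(reference)
    overlap = sum(min(cand[w], count) for w, count in ref.items())
    return overlap / sum(ref.values()) if ref else 0.0

if __name__ == "__main__":
    text = "the cat sat on the mat".split()
    print(type_token_ratio(text), unigram_entropy(text), rouge1_recall("the cat sat".split(), text))
```

All three are cheap to compute, so they could be tracked over successive drafts of an article during a traineeship.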


Poster #12

Extracting Event Structures from Text

Todor Mihaylov

A large number of events happen every day. They are usually discussed in news articles on media websites or in social media. Extracting information about events and their arguments (“Who did what to whom”) is desirable in order to keep track of them and to tell apart articles about different events. In our work, we develop a system for automatic event extraction and for classifying event-event relations (temporal, causal, subevent and reporting relations). This allows us to extract rich event descriptions and schemas that can later be used for visualization or summarization.
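Once temporal and causal relations between events have been extracted, they can be used, for example, to order events on a timeline. The sketch below assumes hypothetical relation labels BEFORE, AFTER and CAUSES, treats a cause as preceding its effect, and ignores subevent and reporting relations. It uses `graphlib.TopologicalSorter` from the Python standard library (3.9+), which raises `graphlib.CycleError` if the extracted relations contradict each other.

```python
from graphlib import TopologicalSorter

PRECEDES = {"BEFORE", "CAUSES"}  # "a REL b" means a happens first (assumed for CAUSES)

def timeline(events: list[str], relations: list[tuple[str, str, str]]) -> list[str]:
    """Return the events in an order consistent with the extracted relations."""
    sorter = TopologicalSorter({event: set() for event in events})
    for a, rel, b in relations:
        if rel in PRECEDES:
            sorter.add(b, a)  # b has predecessor a
        elif rel == "AFTER":
            sorter.add(a, b)  # a has predecessor b
        # subevent and reporting relations carry no ordering in this sketch
    return list(sorter.static_order())

if __name__ == "__main__":
    events = ["press conference", "earthquake", "evacuation"]
    relations = [("earthquake", "CAUSES", "evacuation"),
                 ("press conference", "AFTER", "evacuation"),
                 ("evacuation", "SUBEVENT", "earthquake")]
    print(timeline(events, relations))
```

When the relations only partially order the events, any consistent order may be returned, so a visualization should not read meaning into the relative position of unrelated events.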


Anfahrt // Directions

…by car: Darmstadt can be reached via the A5 motorway (from Frankfurt/M. or Heidelberg/Basel) and the A67 (from Köln/Wiesbaden or Mannheim). Leave the motorway at the Darmstadt Stadtmitte exit and follow Rheinstraße straight ahead into the Cityring tunnel. After the right-hand bend inside the tunnel, turn left into Hügelstraße at the end of the tunnel.

Then turn left again at the next large set of traffic lights into Kirchstraße. Go straight on at the next major intersection; the Darmstadt Palace (Darmstädter Schloss) will then be on your left. At the next traffic lights, turn right into Alexanderstraße, which later becomes Dieburger Straße. The Georg-Christoph-Lichtenberg-Haus is on the right-hand side of Dieburger Straße, at number 241. Parking is available.

…by train: Darmstadt is served by ICE, IC and EC trains on many north-south routes. For regional services, Darmstadt is connected to the lines between Frankfurt and Heidelberg or Mannheim, to Wiesbaden/Mainz and Aschaffenburg, and to the line between Darmstadt and Erbach or Eberbach (Odenwald).

Bus line F (towards “Oberwaldhaus”) stops directly at the Georg-Christoph-Lichtenberg-Haus; the “Fasanerie” stop is right in front of the house. From the city centre, the buses are easy to reach from Luisenplatz, Schloss and Alexanderstraße/TU. At the main station (Hauptbahnhof), please use the west exit (Europaplatz), where the F bus departs from bay 22.

Address: Dieburger Straße 241, 64287 Darmstadt


Tagungsort // Venue

Georg-Christoph-Lichtenberg-Haus

The Georg-Christoph-Lichtenberg-Haus is the TU Darmstadt guest house for international visiting researchers, doctoral candidates, postdocs and research fellows.

Built in 1898 and restored in the Art Nouveau (Jugendstil) style in 1910, the house is an example of the architectural style that became Darmstadt's hallmark and helped the city gain international renown.

Photo: Katrin Binner


Kontakt // Contact

Research Training Group AIPHES

Iryna Gurevych (Spokesperson), Phone: (06151) 16–25290, gurevych (at) ukp.informatik.tu-darmstadt.de

Christian M. Meyer (Knowledge Transfer), Phone: (06151) 16–25293, meyer (at) ukp.informatik.tu-darmstadt.de

Hochschulstraße 10, 64289 Darmstadt, http://www.aiphes.tu-darmstadt.de

Event Organization

Forum interdisziplinäre Forschung (FiF), Phone: (06151) 16–22130, fif (at) fif.tu-darmstadt.de, http://www.fif.tu-darmstadt.de

Press Offices

Technische Universität Darmstadt: Jörg Feuck, Phone: (06151) 16–20017, presse (at) tu-darmstadt.de

Universität Heidelberg: Ute Müller-Detert, Phone: (06221) 54-19017, kum (at) uni-heidelberg.de

HITS gGmbH, Heidelberg: Peter Saueressig, Phone: (06221) 533 245, peter.saueressig (at) h-its.org
