Journalismus trifft Informatikforschung - fif.tu-darmstadt.de
Journalismus trifft Informatikforschung // Journalism meets Computer Science
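Wissenstransferworkshop des DFG-Graduiertenkollegs GRK 1994 „Adaptive Informationsaufbereitung aus heterogenen Quellen“ (AIPHES), 11. November 2016, Lichtenberghaus, Darmstadt // Knowledge transfer workshop of the DFG Research Training Group GRK 1994 “Adaptive Preparation of Information from Heterogeneous Sources” (AIPHES), 11 November 2016, Lichtenberghaus, Darmstadt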
Überblick // Overview
Auftakt // Preface
Programm // Program
Über AIPHES // About AIPHES
Beteiligte: Journalismus // Participants from Journalism
Beteiligte: Informatikforschung // Participants from Computer Science
Forschungsprojekte // Research projects
Anfahrt // Directions
Tagungsort // Venue
Kontakt // Contact
Auftakt // Preface
Intensive Recherche ist für viele Tätigkeiten zentral und unterliegt sehr engen Zeitvorgaben. Insbesondere bei Entscheidungsprozessen hat die Qualität der Rechercheergebnisse weitreichende Konsequenzen. Zugleich explodiert die relevante Informationsmenge und elektronische Quellen werden immer komplexer, sind hochgradig heterogen und weisen unterschiedliche Informationsqualität auf.
Im April 2015 haben die Universitäten Darmstadt und Heidelberg sowie das Heidelberger Institut für Theoretische Studien (HITS) das Graduiertenkolleg „Adaptive Informationsaufbereitung aus heterogenen Quellen“ (AIPHES) aufgelegt, in dem Fragen der Recherche, Strukturierung, Aggregation und Bewertung von Informationen erforscht werden.
Ziel dieses Workshops ist es, mögliche Einsatzgebiete für die erforschten Informatikmethoden im Journalismus zu identifizieren und zu diskutieren. Dazu werden die Mitglieder des Kollegs ihre Forschungsvorhaben vorstellen und gemeinsam mit Expert*innen aus der journalistischen Praxis diskutieren. Die richtigen Anknüpfungspunkte zu finden, wird keine leichte Aufgabe, aber wir sehen großes Potential etwa im Bereich Investigativ- und Datenjournalismus sowie bei der Faktenverifikation. Insofern freuen wir uns auf viele anregende Diskussionen und Ideen zu aktuellen oder auch kommenden Forschungsarbeiten.
Thorough research under tight deadlines is rapidly becoming more important, and the quality of the research results has far-reaching consequences, especially for decision-making processes. At the same time, the amount of information is growing exponentially, and electronic information sources are becoming ever more complex and heterogeneous and vary widely in quality.
In April 2015, the universities in Darmstadt and Heidelberg, together with the Heidelberger Institut für Theoretische Studien (HITS), founded the research training group “Adaptive Preparation of Information from Heterogeneous Sources” (AIPHES). Core research questions in AIPHES concern the investigation, structuring, aggregation, and assessment of information.
The main goal of today’s workshop is to identify and discuss possible applications in journalism for the newly researched computational methods. To this end, the members of the research training group will present their research projects and discuss them together with experts and practitioners from the journalistic domain. Finding the best links between computer science and journalism is by no means easy, but we see great potential, for instance, in investigative and data journalism as well as in fact verification. This is why we are looking forward to many stimulating discussions and ideas for current and prospective research projects.
Programm // Program
11. November 2016
ab 08:45 Uhr: Ankunft // Arrival
09:00 Uhr: Begrüßung und Vorstellung // Welcome address and introduction
09:30 Uhr: Projektmesse (Teil 1) // Poster and demo presentation (1)
10:45 Uhr: Kaffeepause // Coffee break
11:15 Uhr: Projektmesse (Teil 2) // Poster and demo presentation (2)
12:30 Uhr: Mittagspause // Lunch break
13:45 Uhr: Projektmesse (Teil 3) // Poster and demo presentation (3)
15:00 Uhr: Kaffeepause // Coffee break
15:30 Uhr: Plenumsdiskussion // Plenary discussion
bis 16:30 Uhr: Abschluss // End of the workshop
Über AIPHES // About AIPHES
Die Vision des Graduiertenkollegs „Adaptive Informationsaufbereitung aus heterogenen Quellen“ (AIPHES) ist es, strukturiertes Wissen aus heterogenen Textquellen mit automatisierten Methoden zu extrahieren und zu einem informativen und stilistisch homogenen Dossier aufzubereiten. Es werden Methoden entwickelt, die sich an unterschiedliche Textsorten und Sachgebiete anpassen und sich so auf unterschiedliche Aufgabenstellungen, Nutzer und Sprachen übertragen lassen.
Das komplexe Problem der adaptiven Informationsaufbereitung bedarf der Erforschung integrierter Techniken auf Basis mehrerer Wissenschaften. Zentrale Forschungsfragen bestehen in der computerlinguistischen Diskursverarbeitung und in sprachtechnologischen Methoden zur Strukturierung und Aggregation von heterogenen Dokumentsammlungen, in der Repräsentation und Analyse von textinduzierten Strukturen unter Einsatz von Netzwerkanalyse und maschinellem Lernen sowie in den Kriterien und Mechanismen zur Qualitätsbewertung von heterogenen Quellen und Dossiers im Informationsmanagement. Die Multidokumentzusammenfassung dient als prototypische Aufgabenstellung und soll erste Anknüpfungspunkte zur Anwendungsdomäne Journalismus liefern.
Im Graduiertenkolleg forschen 11 Promovierende gemeinsam mit 21 assoziierten und 8 leitenden Wissenschaftler*innen. Das Programm wird von der Deutschen Forschungsgemeinschaft (DFG) seit April 2015 gefördert.
The vision of the graduate program “Adaptive Preparation of Information from Heterogeneous Sources” (AIPHES) is to extract structured knowledge from heterogeneous text sources using automated means in order to create informative dossiers of stylistically homogeneous content. Within this project, we develop methods that are able to adapt to different text genres and domains, so that the results can easily be transferred to other tasks, user groups, and languages.
Adaptive information processing is a complex task which requires extensive research on integrated methods that incorporate knowledge from multiple scientific disciplines. The main research questions are the computational linguistic modeling of discourse phenomena and natural language processing methods for structuring and aggregating heterogeneous text genres, the representation and analysis of text-induced structures based on network analysis and machine learning, and the criteria and mechanisms for selecting and assessing the quality of heterogeneous sources and dossiers in information management. Multi-document summarization serves as a prototypical task that is jointly addressed by all AIPHES members and that should yield first ideas for collaborations with application experts in journalism.
In AIPHES, 11 Ph.D. students work together with 21 associated researchers and 8 principal investigators from multiple fields. The German Research Foundation (DFG) has funded the program since April 2015.
Journalismus trifft… // Participants from Journalism
Daniel Drepper CORRECT!V, Essen/Berlin Telefon: (030) XXXX [email protected] http://correctiv.org Twitter: @danieldrepper
Anne Preger Freie Wissenschaftsjournalistin Dipl. Geoökologin, Bonn Telefon: (0228) XXXX [email protected] http://www.hoerweiten.de Twitter: @apreger
Lars Hennemann Chefredakteur Echo Zeitungen, Darmstadt Telefon: (06151) XXXX [email protected] http://www.echo-online.de Twitter: @larsoliverhen
Kersten A. Riechers quäntchen + glück, Darmstadt Telefon: (06151) XXXX [email protected] http://www.quäntchen-und-glück.de Twitter: @dasKerst
Andreas Loos Data Scientist ZEIT ONLINE GmbH, Hamburg Telefon: (030) XXXX [email protected] http://www.zeit.de
Tanjev Schultz Professur für Journalismus Johannes Gutenberg Universität, Mainz Telefon: (06131) XXXX [email protected] http://www.journalistik.uni-mainz.de
Lorenz Lorenz-Meyer Professur für Online-Journalismus Hochschule Darmstadt Telefon: (06071) XXXX [email protected] https://oj.mediencampus.h-da.de Twitter: @lorenzlm
Jutta Witte Journalistenbüro Surpress GbR, Tübingen Telefon: (07472) XXXX http://www.surpress.org
Stefan Michaelsen Entrepreneurship Fellow Media Lab Bayern, München Telefon: (089) XXXX http://www.linkedin.com/in/stefan-michaelsen-844062102
Vanessa Wormer Datenjournalistin Süddeutsche Zeitung, München Telefon: (089) XXXX [email protected] http://www.sueddeutsche.de Twitter: @remrow
Informatikforschung // Participants from Computer Science
Projektleitung // Principal Investigators
Judith Eckle-Kohler UKP Lab, TU Darmstadt Telefon: (06151) 16–XXXX eckle-kohler (ät) ukp.informatik.tu-darmstadt.de
Anette Frank Computerlinguistik, Universität Heidelberg Telefon: (06221) 54–XXXX frank (ät) cl.uni-heidelberg.de
Johannes Fürnkranz Knowledge Engineering, TU Darmstadt Telefon: (06151) 16–XXXX info (ät) ke.tu-darmstadt.de
Iryna Gurevych (Sprecherin) UKP Lab, TU Darmstadt Telefon: (06151) 16–XXXX gurevych (ät) ukp.informatik.tu-darmstadt.de
Christian M. Meyer UKP Lab, TU Darmstadt Telefon: (06151) 16–XXXX meyer (ät) ukp.informatik.tu-darmstadt.de
Michael Strube Natural Language Processing Group, HITS Telefon: (06221) 533–XXXX michael.strube (ät) h-its.org
Karsten Weihe Algorithmik, TU Darmstadt Telefon: (06151) 16–XXXX weihe (ät) algo.informatik.tu-darmstadt.de
Promovierende // Doctoral Researchers
Thomas Arnold Netzwerkanalyse, TU Darmstadt Telefon: (06151) 16–XXXX arnold (ät) aiphes.tu-darmstadt.de
Avinesh PVS Informationsmanagement, TU Darmstadt Telefon: (06151) 16–XXXX avinesh (ät) aiphes.tu-darmstadt.de
Benjamin Heinzerling Computerlinguistik, HITS Telefon: (06221) 533–XXXX heinzerling (ät) aiphes.tu-darmstadt.de
Gerold Hintz Sprachtechnologie, TU Darmstadt Telefon: (06151) 16–XXXX hintz (ät) aiphes.tu-darmstadt.de
Tobias Falke Sprachtechnologie, TU Darmstadt Telefon: (06151) 16–XXXX falke (ät) aiphes.tu-darmstadt.de
Andreas Hanselowski Informationsmanagement, TU Darmstadt Telefon: (06151) 16–XXXX hanselowski (ät) aiphes.tu-darmstadt.de
Ana Marasovic Computerlinguistik, Universität Heidelberg Telefon: (06221) 54–XXXX marasovic (ät) aiphes.tu-darmstadt.de
Teresa Martin Maschinelles Lernen, TU Darmstadt Telefon: (06151) 16–XXXX martin (ät) aiphes.tu-darmstadt.de
Todor Mihaylov Computerlinguistik, Universität Heidelberg Telefon: (06221) 54–XXXX mihaylov (ät) aiphes.tu-darmstadt.de
Maxime Peyrard Sprachtechnologie, TU Darmstadt Telefon: (06151) 16–XXXX peyrard (ät) aiphes.tu-darmstadt.de
Markus Zopf Maschinelles Lernen, TU Darmstadt Telefon: (06151) 16–XXXX zopf (ät) aiphes.tu-darmstadt.de
Projektmitarbeiter // Project staff
Christopher Tauchmann Informationsmanagement, TU Darmstadt Telefon: (06151) 16–XXXX tauchmann (ät) aiphes.tu-darmstadt.de
Weitere Beteiligte: http://www.aiphes.tu-darmstadt.de
Assoziierte // Associated Researchers
Lisa Beinborn UKP Lab, TU Darmstadt Telefon: (06151) 16–XXXX beinborn (ät) ukp.informatik.tu-darmstadt.de
Chris Biemann Sprachtechnologie, Universität Hamburg Telefon: (040) 42883 XXXX biemann (ät) informatik.uni-hamburg.de
Richard Eckart de Castilho UKP Lab, TU Darmstadt Telefon: (06151) 16–XXXX eckart (ät) ukp.informatik.tu-darmstadt.de
Eneldo Loza Mencía Knowledge Engineering, TU Darmstadt Telefon: (06151) 16–XXXX eneldo (ät) ke.tu-darmstadt.de
Margot Mieskes Forschungs- und Wirtschaftsdaten, Hochschule Darmstadt Telefon: (06151) 16–XXXX margot.mieskes (ät) h-da.de
Éva Mújdricza-Maydt Computerlinguistik, Universität Heidelberg Telefon: (06221) 54–XXXX mujdricz (ät) cl.uni-heidelberg.de
Daniil Sorokin UKP Lab, TU Darmstadt Telefon: (06151) 16–XXXX sorokin (ät) ukp.informatik.tu-darmstadt.de
Christian Stab UKP Lab, TU Darmstadt Telefon: (06151) 16–XXXX stab (ät) ukp.informatik.tu-darmstadt.de
Forschungsprojekte // Research projects
Session 1 (09:30–10:45)
1. Entity Linking: Automatically Grounding Text in a Knowledge Base (Benjamin Heinzerling)
2. Automatic Text Summarization with Concept Maps (Tobias Falke)
3. Argument Retrieval through Real-Time Analysis on Big Data (Christian Stab)
4. Motif analysis of text-based graphs (Thomas Arnold)
Session 2 (11:15–12:30)
5. Deep Learning with Sentiment Inference for Discourse-oriented Opinion Analysis (Ana Marasovic)
6. Argumentation Analysis Techniques for Fact Checking (Andreas Hanselowski)
7. Computer Assisted Multi-document Summarization and Evaluation (Avinesh PVS)
8. Machine Learning for Information Importance Estimation (Markus Zopf)
Session 3 (13:45–15:00)
9. Content Selection as an Optimization Problem (Maxime Peyrard)
10. Fact Extraction via combined SRL and RE (Teresa Martin)
11. Quantitative assessment of text quality (Christopher Tauchmann)
12. Extracting Event Structures from Text (Todor Mihaylov)
Poster #1
Entity Linking: Automatically Grounding Text in a Knowledge Base Benjamin Heinzerling
Entity linking (EL) is the task of automatically linking mentions of entities such as persons, locations, or organizations to their corresponding entries in a knowledge base (KB). EL grounds a given document to a KB and thus makes the rich data contained in the KB available for use cases such as automatic semantic indexing, text enrichment, and entity-aware search.
What is this text about?
Ilham Anas, a 40-year-old from Jakarta, Indonesia, works as Obama’s doppelgaenger.
Human Answer
Ilham Anas, a 40-year-old from Jakarta, Indonesia, works as Obama’s doppelgaenger.
Entity Linking Answer
[Figure: entity mentions in the sentence linked to knowledge base entries]
Why? Why is this hard?
• Ambiguity of language: does “Obama” mean Barack Obama, Michelle Obama, Malia Obama, Sasha Obama, Barack Obama Sr, Obama (Fukui, Japan), Mt. Obama, Obama Burmeisteri?
• Variability of language: Barack Obama; Barack Obama II; Barack Obama Jr.; Barack Obama Junior; Barack Obama, Jr.; Barack Obama, Junior; Barack Hussein; Barack Hussein Obama; Barack Hussein Obama II; Barack Hussein Obama Jr.; Barack Hussein Obama Junior; Barack Hussein Obama, Jr.; Barack Hussein Obama, Junior; Barack Hussein obama; Barack H. Obama; Barack H. Obama II; Barack H. Obama Jr.; Barack H. Obama Junior; B. H. Obama; B. Hussein Obama; B. Obama; Pres. Obama; President Barack H. Obama; President Barack Hussein Obama II; President Barack Obama; President Obama; Sen. Obama; Senator Barack Obama; US President Barack Obama; United States President Barack Obama; 2008 Democratic Presidential Nominee; 44th President of the United States; Barack Obana; Barack Obbama; Barack Ubama; Barack OBama; Barack obma; BarackObama; Barak Obamba; Barck Obama; Barock obama; Borack Obama; Borrack Obama; Brack Obama; Brock Obama; Burack obama; Hussein Obama; Obamma; 0bama; Barack O’Bama; O’Bama; O’bama
How?
• Look at context words:
  - The Japanese city of Obama...
  - Obama’s last day in office as President
• Look at context entities:
  - Obama, located in Fukui prefecture, is a...
  - Obama’s wife Michelle...
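The context-word idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the system presented on the poster: the candidate table `CANDIDATES`, its hand-picked keyword sets, and the function `link` are all hypothetical, and a real entity linker would draw candidates and context statistics from a full knowledge base.

```python
import re

# Hypothetical toy knowledge base: candidate entries for the mention "Obama",
# each with a few hand-picked context words (illustrative, not real KB data).
CANDIDATES = {
    "Barack Obama": {"president", "office", "wife", "michelle", "senator"},
    "Michelle Obama": {"first", "lady", "husband", "barack"},
    "Obama (Fukui, Japan)": {"japanese", "city", "fukui", "prefecture", "located"},
}


def link(context):
    """Return the candidate whose context words overlap most with the text.

    Ties (including zero overlap) fall back to the first candidate listed.
    """
    words = set(re.findall(r"[a-z]+", context.lower()))
    return max(CANDIDATES, key=lambda entity: len(CANDIDATES[entity] & words))


for sentence in [
    "The Japanese city of Obama...",
    "Obama's last day in office as President",
    "Obama, located in Fukui prefecture, is a...",
    "Obama's wife Michelle...",
]:
    print(sentence, "->", link(sentence))
```

Plain word overlap is the simplest possible scoring function; it is used here only to make the disambiguation step concrete.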
Use cases
• Automatic semantic indexing of documents
  - Which persons, locations, organizations, etc. are mentioned?
  - Ties rich data in knowledge base to documents
• Automatically enrich text, display info boxes
• Entity-aware search, no naive string match
  - Complex queries based on knowledge base relations
Get in touch
Questions or comments: [email protected]
Acknowledgements: This work has been supported by the German Research Foundation as part of the Research Training Group “Adaptive Preparation of Information from Heterogeneous Sources” (AIPHES) under grant No. GRK 1994/1.
Poster #2
Automatic Text Summarization with Concept Maps Tobias Falke
Concept maps are labeled graphs in which every node represents a concept and every edge a relationship that holds between the two connected nodes. Originally developed by psychologists for applications in the education domain, this formalism allows information to be represented in a structured way. In our work, we aim to automatically create concept maps for document collections such that the map is a summary containing the most important content from the collection. Methodologically, this challenge is tackled in several steps, in which potential labels for concepts and relations are extracted from the documents, ranked by importance and then combined into a map. Besides their usage as summaries, we see several other applications of these maps: in an interactive text exploration system, generated concept maps could be used as navigation structures, allowing users to explore the document collection, to focus on subsets of the documents and to navigate to locations of interest in them.
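The rank-and-combine steps can be illustrated with a small Python sketch. It rests on several assumptions that are not part of the abstract: the extraction step is skipped and replaced by a hand-written list `TRIPLES` (loosely reconstructed from the node and edge labels of the poster's Munich shooting example, with the exact pairing guessed), importance is approximated by plain frequency, and `build_concept_map` is a hypothetical helper name.

```python
from collections import Counter

# Illustrative (concept, relation, concept) triples; in the approach described
# above these would be extracted automatically from the document collection.
TRIPLES = [
    ("Munich shooting", "took place at", "Olympia shopping mall"),
    ("Olympia shopping mall", "is located at", "Moosach district"),
    ("David Sonboly", "is responsible for", "Munich shooting"),
    ("David Sonboly", "committed", "suicide"),
    ("10 people", "were killed at", "Munich shooting"),
    ("36 people", "were injured at", "Munich shooting"),
]


def build_concept_map(triples, max_concepts):
    """Rank concepts by how often they occur, keep the top ones, and
    return only the relations that connect two kept concepts."""
    counts = Counter()
    for subject, _, obj in triples:
        counts[subject] += 1
        counts[obj] += 1
    kept = {concept for concept, _ in counts.most_common(max_concepts)}
    return [t for t in triples if t[0] in kept and t[2] in kept]


for subject, relation, obj in build_concept_map(TRIPLES, max_concepts=3):
    print(f"{subject} --{relation}--> {obj}")
```

The size limit `max_concepts` plays the role of a summary length budget: the smaller it is, the more aggressively the map is condensed.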
Concept Map Summaries
[Figure: two example concept maps. Map “Munich shooting”: concepts Munich shooting, Olympia shopping mall, Moosach district, 10 people, 36 people, David Sonboly, suicide; relations “were injured at”, “were killed at”, “is located at”, “took place at”, “is responsible for”, “committed”. Map “alternative ADHD treatments”: concepts behavior modification, alt. ADHD treatments, bio feedback, multiple studies, Feingold diet, good information, medical professionals; relations “can provide”, “dismiss”, “is”, “is kind of”, “include”, “support”.]
Generation Approach
Concept Extraction & Grouping, Relation Extraction & Grouping, Concept Ranking, Relation Ranking, Map Construction
Application: Interactive Text Exploration
[Figure label: gesture recognition]
Idea: Enhance search engine with generated concept maps
Possible usage:
• Navigate to occurrences of concepts in the documents
• Filter documents by selected concepts, build new map
• Navigate through map from concept to concept
Poster #3
Argument Retrieval through Real-Time Analysis on Big Data Christian Stab
We present a cognitive system designed to help journalists quickly get an overview of opinions around a disputed topic. The system presents a customized summary of potential arguments relevant to a user-specified query. The summary is created in real time from arbitrarily large web sources. For example, the system could help to find a balanced set of options around the question "What are the most important reasons for and against TTIP?". Rather than having to read through many (potentially) non-relevant documents retrieved using a common web search engine, the system will select and summarize the most relevant standpoints with respect to the given topic. It does so by aggregating automatically preselected claims in a customizable manner (e.g. for and against a standpoint). The system makes use of recent advances in the fields of natural language processing and machine learning, by abstracting meaningful insights from a small body of human-created argumentative structures.
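The aggregation step, grouping preselected claims for and against a topic, might look roughly like the following Python sketch. Everything in it is an assumption made for illustration: the input format of `HITS`, the made-up claims and sources, and the function `aggregate`. The argument extraction itself is not shown.

```python
from collections import defaultdict

# Hypothetical extractor output: one (claim, stance, source) tuple per
# retrieved passage. Claims and sources are made up for illustration.
HITS = [
    ("Economy needs skilled workers", "pro", "Economist"),
    ("Economy needs skilled workers", "pro", "Huffpost"),
    ("Prosperity will result", "pro", "Sun"),
    ("Crime rates are rising", "con", "Independent"),
    ("Crime rates are rising", "con", "Telegraph"),
    ("Crime rates are rising", "con", "Spiegel"),
    ("Walls raised at the EU border", "con", "Twitter"),
]


def aggregate(hits):
    """Group sources by stance and claim, most frequently found claims first."""
    grouped = {"pro": defaultdict(list), "con": defaultdict(list)}
    for claim, stance, source in hits:
        grouped[stance][claim].append(source)
    return {
        stance: sorted(claims.items(), key=lambda item: len(item[1]), reverse=True)
        for stance, claims in grouped.items()
    }


for stance, claims in aggregate(HITS).items():
    print(stance.upper())
    for claim, sources in claims:
        print(f"  {claim} ({len(sources)} hits): {', '.join(sources)}")
```

Keeping the source list with every claim preserves the link back to the original documents, so each summarized standpoint can still be checked by hand.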
Christian Stab, Johannes Daxenberger, Chris Stahlhut, Can Diehl and Iryna Gurevych
Ubiquitous Knowledge Processing Lab (UKP),
Department of Computer Science, Technische Universität Darmstadt
Summary
• High application potential
• Change of search paradigm
• Novel way to use big data
• Quick and easy to use
Currently we are searching for collaborators. Contact us!
The Vision
Unstructured Web Data → Structured System Output
Benefits
• Arguments comprehensively summarized
• Grouped into pro and con arguments
• No time-consuming web searches
• Sources are directly accessible for further investigation
Use Cases
Scenario: Journalism
• A journalist is interested in news and arguments related to a common topic
• Instead of using the web, the journalist first uses this system
• This way the journalist gets an instantaneous overview of arguments about the topic
• Real-time analysis ensures that the overview includes the latest news
• Browsing the results, links to all web resources are available for further analysis
Scenario: Purchase Decision
• A user is interested in purchasing a product
• Instead of searching through numerous product reviews, the user can search for the product with the system
• This way the user gets an instantaneous overview of pros and cons of this product
Migration in Europe
⊕ Migration is a chance for development (15 Hits)
⊕ Religious pluralism strengthens society (12 Hits)
⊕ Prosperity will result (8 Hits)
Economy needs skilled workers (4 Hits)
→ Not enough apprentices [Economist]
→ Germany lacks workers [Huffpost]
→ Number of students drops [Sun]
→ Orders cannot be processed [Economist]
Argument 5-10 of 75
⊕ Women face discrimination (9 Hits)
Crime rates are rising (4 Hits)
→ Several detained in Cologne [Independent]
→ Right-wing violence raised by 65% [Spiegel]
→ More gun licenses issued [Telegraph]
→ Arson attack on gymnasium [Twitter]
⊕ Extremist parties gaining popularity (3 Hits)
⊕ Walls raised at the EU border (2 Hits)
Argument 1-5 of 22
Goal
• Extract arguments from heterogeneous web content for both German and English
• Aggregate arguments based on user-defined queries
• Display arguments in a structured way which helps to gain an instant overview of the topic
System Architecture
User interface
• User requests a topic
• System searches in annotated documents
• Live aggregation of arguments
• Structured results are presented to the user
Argument Extraction
• Crawling heterogeneous sources from the web
• System preprocesses data
• Arguments are extracted using Deep Learning
[Figure: system components Crawler, Preprocessing, Argument Extraction, User interface, Information Retrieval, Aggregation]
Poster #4
Motif analysis of text-based graphs Thomas Arnold
Motif analysis deals with recurrent patterns in graph structures and networks. We transform documents into different graph representations of nodes and edges, using language features and characteristics of the source text. This allows us to apply principles and algorithms of graph theory to search for statistically significant patterns – so-called motifs. These motifs can be used not only to classify texts, but also to learn about their hidden properties. Possible applications include quality assessment of text, categorization, or using motif signatures to create author / genre profiles.
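To make the text-to-graph-to-motif idea concrete, here is a minimal Python sketch. The graph construction is an assumption chosen for simplicity (words as nodes, an undirected edge between words that are adjacent within a sentence) and need not match the representations used in the project. The sketch only counts two three-node patterns, closed triangles and open triads; the test for statistical significance against randomized graphs is left out. The function names are hypothetical.

```python
import re
from itertools import combinations

TEXT = ("Our father in heaven. Your kingdom come. "
        "For the kingdom of our father. On earth as in heaven.")


def text_to_graph(text):
    """Build an undirected word graph: adjacent words in a sentence are linked."""
    graph = {}
    for sentence in re.split(r"[.!?]", text.lower()):
        words = re.findall(r"[a-z]+", sentence)
        for a, b in zip(words, words[1:]):
            if a != b:
                graph.setdefault(a, set()).add(b)
                graph.setdefault(b, set()).add(a)
    return graph


def count_triads(graph):
    """Count closed triangles and open triads (paths over three nodes)."""
    triangle_corners = open_triads = 0
    for neighbours in graph.values():
        for a, b in combinations(sorted(neighbours), 2):
            if b in graph[a]:
                triangle_corners += 1  # every triangle is seen from 3 corners
            else:
                open_triads += 1
    return triangle_corners // 3, open_triads


graph = text_to_graph(TEXT)
triangles, open_triads = count_triads(graph)
print(f"{len(graph)} nodes, {triangles} triangles, {open_triads} open triads")
```

Counts like these, collected for several pattern types, form a simple motif signature that could be compared across texts.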
Main Idea
Text → Graph → Motifs
Graph Representation - Example
Our father in heaven. Your kingdom come. For the kingdom of our father. On earth as in heaven.
[Figure: graph with four numbered nodes (1 to 4) derived from the example text]
Graph Motifs - Examples
Frequent Subgraphs, Central Nodes, Long Paths
Done: Predict Article Quality in Wikipedia
[Figure: example motifs for a featured article and a non-featured article]
We identified motifs that are positively or negatively correlated with Wikipedia article quality.
Current Research Ideas
• Motif-based timeline analysis.
• Motifs for sentence ordering to improve automatic summarization. [Figure: Sentences A to E]
• Improve automatic evaluation of text quality.
• Use motif signatures for author / genre profiles.
Profit!
Poster #5
Deep Learning with Sentiment Inference for Discourse-oriented Opinion Analysis Ana Marasović
Fine-grained opinion analysis is important for a variety of NLP tasks, including opinion-oriented question answering and opinion summarization. We will propose a neural network-based method for the joint extraction of opinion entities, i.e. opinion expressions and their holders and targets, and the relations among them. To combat the scarcity of labelled data for languages other than English, we will exploit an adversarial framework for the proposed model and evaluate it on German data. We will approach the sentiment analysis task from a discourse perspective. We apply sentiment inference beyond the sentence level, with the aim of obtaining a denser, fine-grained representation of sentiment across the entire discourse. We address sentiment analysis as a knowledge base completion task, using matrix factorization or distillation as a method. These approaches lead to a dense, structured discourse representation, enriched with inference rules that apply within and beyond sentences. The importance of integrated dense discourse representations and sentiment inference rules will be tested in multi-document and multi-perspective scenarios for both English and German. A challenge for the applicability of inference rules is the occurrence of propositional anaphors that refer to situations or events. We will investigate the impact of propositional anaphora resolution on sentiment analysis, and vice versa. To this end, we develop an account of abstract anaphora resolution and investigate its impact on sentiment analysis, in a pipeline and in a joint modelling approach.
What needs to be addressed
1. Fine-grained opinion analysis: detect explicit opinion expressions and their sentiment, identify targets (entities or propositions at which the sentiment is directed), holders (entities that express the opinion) and relations among extracted entities.
2. Detect implicit sentiment via inference on explicit sentiment and events that positively or negatively affect entities.
Example: Mexico’s president criticised U.S. gun laws, which enable weapons flow from the U.S. into the hands of Mexican drug cartels.
3. Connect pieces of text to obtain meaningful information.
Example: U.S. gun laws enable weapons flow from the U.S. into the hands of Mexican drug cartels. [...] Mexico’s president criticised this issue.
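The inference in step 2 can be illustrated with a minimal sketch. It assumes a single hand-written rule (sentiment towards an agent carries over to whatever the agent enables, and flips for whatever it hinders); the class names `Opinion` and `Effect` and the function `infer_implicit` are hypothetical and are not taken from the poster, which relies on learned models rather than hard-coded rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Opinion:
    holder: str
    target: str
    polarity: int  # +1 = positive, -1 = negative


@dataclass(frozen=True)
class Effect:
    agent: str
    theme: str
    polarity: int  # +1 = agent enables/benefits theme, -1 = agent hinders/harms theme


def infer_implicit(opinions, effects):
    """Derive implicit opinions from explicit ones.

    Assumed rule: if a holder has sentiment p towards an agent, and the agent
    affects a theme with polarity e, the holder has sentiment p * e towards the theme.
    """
    inferred = []
    for op in opinions:
        for ef in effects:
            if op.target == ef.agent:
                inferred.append(Opinion(op.holder, ef.theme, op.polarity * ef.polarity))
    return inferred


# Explicit sentiment and event from the example sentence on the poster.
explicit = [Opinion("Mexico's president", "U.S. gun laws", -1)]
effects = [Effect("U.S. gun laws", "weapons flow", +1)]

for opinion in infer_implicit(explicit, effects):
    print(opinion)
```

Under this rule, the president's explicit criticism of the gun laws yields an implicit negative opinion towards the weapons flow that the laws enable.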
Methods
Deep learning methods:
- a deep bi-directional LSTM as a baseline for labelling opinion entities
- improvement: multi-task learning (MTL) with a structurally related low-level SRL task
- progressive neural networks as an MTL framework
- the adversarial framework for cross-lingual labelling
- a distillation method that transfers the structured information of logic rules into the weights of neural networks
Use-case
- highlight extracted entities to follow a text more easily
- find groups of people with similar viewpoints
- detect conflicting opinions
- opinion-oriented summarization
- non-factual question answering
Get in touch
Questions or comments: [email protected]
Acknowledgements: This work has been supported by the German Research Foundation as part of the Research Training Group “Adaptive Preparation of Information from Heterogeneous Sources” (AIPHES) under grant No. GRK 1994/1.
Poster #6
Argumentation Analysis Techniques for Fact Checking Andreas Hanselowski
With the enormous number of articles published on the web every day, it is difficult to keep up with the available information and to verify whether it is reliable and fit for further use. As a result, unverified rumors spread quickly through social media and lead to disinformation of the public. To verify the reliability of articles quickly as they appear on the web, we propose a tool that identifies the argumentative structure of an article and verifies its claims automatically. This is done by identifying the claims and aggregating evidence that supports or contradicts them. On the basis of this evidence, the validity of the claims is assessed by leveraging state-of-the-art natural language processing techniques. In a subsequent step, the validity of the major claim is assessed so that a confidence score for the article as a whole can be determined.
Poster #7
Computer-assisted Multi-document Summarization and Evaluation Avinesh PVS
For the past couple of decades there has been intensive work on automatic summarization, but the quality of the resulting summaries is still low. In view of this problem, we propose computer-assisted summarization (CAS) systems that incorporate user feedback as an alternative to existing fully automatic summarization systems. CAS systems have the potential to produce high-quality, human-like summaries, as they allow the user to post-edit an automatic summary draft according to their requirements.
We propose a novel methodology that capitalizes on recent advances in machine learning, particularly in online learning and active learning. The resulting approach will enable the exploitation of user feedback in a novel summarization framework. The proposed method can be used as a journalistic writing aid in multiple ways, such as easing the process of writing an article, analyzing the quality of the article and increasing its attractiveness to a broad audience.
Poster #8
Machine Learning for Information Importance Estimation Markus Zopf
We develop machine learning methods that jointly rank information according to importance and avoid redundancy. This is crucial in all automatic summarization scenarios. Our approach for multi-document summarization (MDS) learns which information is generally considered important and applies this knowledge to summarization tasks. Prior work only analyzes the given source documents and estimates which information is most central in them, which might be misleading for non-newswire texts. Prior work on incremental update summarization (IUS) used either a pipeline, which is fast but inaccurate, or clustering, which is slow but more accurate. Our method combines the best of both worlds and detects important information quickly and accurately. Our work can help journalists to detect important information in vast amounts of data and to write informative articles with less effort. We discuss the Panama Papers repository as an example for MDS and the Munich shooting as an example for IUS.
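As a rough illustration of ranking by importance while avoiding redundancy, the following sketch uses a greedy selection in the style of maximal marginal relevance. This is a stand-in technique chosen for brevity, not the method described in the abstract: the importance scores are given by hand rather than learned, redundancy is approximated by word overlap, and the names `jaccard`, `rank` and `trade_off` are hypothetical.

```python
def jaccard(a, b):
    """Word-overlap similarity between two sentences (0.0 to 1.0)."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


def rank(sentences, importance, k=2, trade_off=0.5):
    """Greedily pick k sentences, balancing importance against redundancy.

    importance[i] is an assumed, externally provided score for sentences[i];
    in a real system it would come from a trained model.
    """
    selected = []
    candidates = list(range(len(sentences)))
    while candidates and len(selected) < k:

        def score(i):
            # Redundancy is the highest similarity to anything already selected.
            redundancy = max(
                (jaccard(sentences[i], sentences[j]) for j in selected),
                default=0.0,
            )
            return trade_off * importance[i] - (1 - trade_off) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return [sentences[i] for i in selected]


sentences = [
    "the leak names many offshore companies",
    "the leak names many offshore firms",
    "several politicians resigned after the leak",
]
importance = [0.9, 0.85, 0.6]  # toy scores, not real model output
print(rank(sentences, importance))
```

With these toy scores the second sentence is skipped although it is rated more important than the third, because it repeats almost everything the first sentence already says.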
Poster #9
Content Selection as an Optimization Problem Maxime Peyrard
Every day, large digital document collections about any particular topic are produced which contain nuggets of important information hidden among many pages of content. The automatic discovery of relevant and important information nuggets in such collections is a task of great urgency, but it is very challenging. We approach this task by casting the discovery and selection of important and relevant information nuggets as an optimization problem. However, the definition of importance and relevance for a set of information nuggets is subjective and domain-dependent. We investigate several different definitions of relevance, and solve the corresponding optimization problems with various techniques from the field of optimization.
Task:
- Start with a collection of documents.
- Extract a small set of ‘relevant’ sentences.
Optimization Problem:
- Define properties that the selected set should have.
- Find the best set!
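The two steps of the optimization view can be sketched as follows. The property used here (number of distinct words covered) is only one assumed definition of relevance, and the exhaustive search is feasible only for very small collections; the function names `objective` and `best_set` are hypothetical.

```python
from itertools import combinations


def objective(selected):
    """Assumed property of a good set: it covers many distinct words."""
    covered = set()
    for sentence in selected:
        covered |= set(sentence.lower().split())
    return len(covered)


def best_set(sentences, max_sentences):
    """Find the best set by trying every subset of up to max_sentences sentences."""
    best, best_score = (), 0
    for size in range(1, max_sentences + 1):
        for subset in combinations(sentences, size):
            score = objective(subset)
            if score > best_score:  # keep the first subset that reaches a new maximum
                best, best_score = subset, score
    return list(best), best_score


documents = ["a b c", "a b d", "e f"]
print(best_set(documents, max_sentences=2))
```

Swapping in a different `objective` changes what counts as relevant without touching the search, which is the appeal of separating the definition of the properties from the procedure that finds the best set.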
Goal:
- Representing the content of the documents.
- Extract sentences and generate text that maximizes the information span.
Semantic Graph:
- Nodes are entities.
- Edges are relations.
- Encodes all the information in the source.
Poster #10
Fact Extraction via combined SRL and RE Teresa Martin
We combine two different annotation schemas for sentences in order to obtain a richer annotation. The first is Semantic Role Labeling (SRL), which answers the question “Who does what to whom, when and where?” for a given sentence; that is, it annotates the sentence with the right frame for the predicate and the corresponding roles for the arguments. The second is Relation Extraction (RE) for the construction of knowledge bases (KBs): KBs store facts about entities, and RE is the process of finding facts, in the form of entities connected via relations, in raw text. Combining these two schemas brings the advantages of both: the semantic annotation as well as the concrete, fact-focused annotation. We see applications in journalism wherever facts or information about specific topics of interest need to be extracted automatically.
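To make the idea of a combined annotation concrete, the sketch below links an SRL-style frame to RE-style facts whenever both arguments of a fact fill roles of the frame. The data structures, the linking heuristic and all labels (`Being_born`, `Child`, `Place`, `place_of_birth`) are illustrative assumptions and are not taken from the project or from a specific annotation resource.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    """SRL-style annotation: a predicate, its frame and the role fillers."""
    predicate: str
    frame: str
    roles: dict  # role name -> text span


@dataclass
class Fact:
    """RE-style annotation: two entities connected by a relation."""
    subject: str
    relation: str
    obj: str


def combine(frame, facts):
    """Attach each fact to the frame if both of its entities fill a role."""
    role_of = {span: role for role, span in frame.roles.items()}
    linked = []
    for fact in facts:
        if fact.subject in role_of and fact.obj in role_of:
            linked.append({
                "relation": fact.relation,
                "subject_role": role_of[fact.subject],
                "object_role": role_of[fact.obj],
            })
    return {
        "predicate": frame.predicate,
        "frame": frame.frame,
        "roles": frame.roles,
        "facts": linked,
    }


# Sentence: "Marie Curie was born in Warsaw."
frame = Frame("born", "Being_born", {"Child": "Marie Curie", "Place": "Warsaw"})
facts = [Fact("Marie Curie", "place_of_birth", "Warsaw")]
combined = combine(frame, facts)
print(combined["facts"])
```

The combined record keeps the semantic roles from SRL and, next to them, the knowledge-base style fact, so that a downstream application can query either view.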
Poster #11
Quantitative assessment of text quality Christopher Tauchmann
Manually evaluating the quality of aggregated data is a laborious and time-consuming task. Besides the assessment of source documents, the evaluation and monitoring of written articles, e.g. during journalist education or in the scope of traineeships, can benefit substantially from machines supporting experts in their work. We therefore support the work of journalists with measures from different disciplines, such as linguistics, information theory and automatic summary evaluation, which we offer within the scope of our research.
Poster #12
Extracting Event Structures from Text Todor Mihaylov
A large number of events happen every day. They are usually discussed in news articles on different media websites or in social media. Extracting information about events and their arguments (“Who did what to whom”) is desirable in order to keep track of them and to tell apart articles about different events. In our work we develop a system for automatic event extraction and classify event-event relations (temporal, causal, subevent and reporting relations). This allows us to extract rich event descriptions and schemas that can later be used for visualization or summarization.
Anfahrt // Directions
…by car: Darmstadt can be reached via the A5 motorway (from Frankfurt/M. or Heidelberg/Basel) and the A67 motorway (from Köln/Wiesbaden or Mannheim). Leave the motorway at the Darmstadt Stadtmitte exit and follow Rheinstraße straight ahead into the Cityring tunnel. After the right-hand bend in the tunnel, turn left into Hügelstraße at the end of the tunnel.
Then turn left again into Kirchstraße at the next large set of traffic lights. Continue straight ahead at the next major junction; the Darmstadt Schloss (palace) will then be on your left. At the next traffic lights, turn right into Alexanderstraße, which later becomes Dieburger Straße. The Georg Christoph Lichtenberg-Haus is at Dieburger Straße 241, on the right-hand side in the direction of travel. Parking is available.
…by train: Darmstadt is served by ICE, IC and EC trains on many north-south connections. In regional traffic, Darmstadt is connected to the lines between Frankfurt and Heidelberg or Mannheim, between Wiesbaden/Mainz and Aschaffenburg, and between Darmstadt and Erbach or Eberbach (Odenwald).
Bus line F (direction “Oberwaldhaus”) goes directly to the Georg Christoph Lichtenberg-Haus. The “Fasanerie” stop is right in front of the building. From the city centre, the buses are easy to reach from Luisenplatz, Schloss and Alexanderstraße/TU. At the main station (Hauptbahnhof), please use the west exit (Europaplatz). There you will find the F bus at stop 22.
Address: Dieburger Straße 241, 64287 Darmstadt
Tagungsort // Venue
Georg-Christoph-Lichtenberg-Haus
The Georg-Christoph-Lichtenberg-Haus is the TU Darmstadt guest house for international visiting researchers, doctoral candidates, postdocs and research fellowship holders.
Built in 1898 and restored in Art Nouveau style in 1910, the house is an example of the architectural style that became the hallmark of Darmstadt and helped the city gain international prominence.
Photo: Katrin Binner
Kontakt // Contact
Research Training Group AIPHES
Iryna Gurevych (Spokesperson) Phone: (06151) 16–25290 gurevych (at) ukp.informatik.tu-darmstadt.de
Christian M. Meyer (Knowledge transfer) Phone: (06151) 16–25293 meyer (at) ukp.informatik.tu-darmstadt.de
Hochschulstraße 10 64289 Darmstadt http://www.aiphes.tu-darmstadt.de
Event organization
Forum interdisziplinäre Forschung (FiF) Phone: (06151) 16–22130 fif (at) fif.tu-darmstadt.de http://www.fif.tu-darmstadt.de
Press offices
Technische Universität Darmstadt Jörg Feuck Phone: (06151) 16–20017 presse (at) tu-darmstadt.de
Universität Heidelberg Ute Müller-Detert Phone: (06221) 54-19017 kum (at) uni-heidelberg.de
HITS gGmbH, Heidelberg Peter Saueressig Phone: (06221) 533 245 peter.saueressig (at) h-its.org