
Joint Multi-Label Attention Networks for Social Text Annotation

JMAN (Joint Multi-Label Attention Network)

Hang Dong 1,2, Wei Wang 2, Kaizhu Huang 3, Frans Coenen 1; 1. University of Liverpool; 2. Xi'an Jiaotong-Liverpool University

HangDong@liverpool.ac.uk; Wei.Wang03@xjtlu.edu.cn; Kaizhu.Huang@xjtlu.edu.cn; Coenen@liverpool.ac.uk

• Social annotation, or tagging, is a popular functionality that allows users to assign “keywords” to online resources for better semantic search and recommendation. In practice, however, only a limited number of resources are annotated with tags.

• We propose a novel deep learning architecture for automated social text annotation with cleaned user-generated tags.

Introduction

The automated social text annotation task can be formally transformed into a multi-label classification problem.
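As a rough formalization (the notation here is ours, not taken from the poster): given a tag vocabulary $L$ and a document $d$, the model learns a mapping $f : d \mapsto \hat{y} \in [0,1]^{|L|}$, and the predicted tag set is $\{\, l_j \in L : \hat{y}_j \ge 0.5 \,\}$, so that each document can receive several labels at once.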

Results: https://github.com/acadTags/Automated-Social-Annotation

Title-Guided Attention Mechanisms

Semantic-Based Loss Regularizers

Conclusions & Future Studies

Word-level attention mechanisms (for the title) [3-4]:
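(The equation itself did not survive the text extraction; below is a sketch of the standard word-level attention of [3] applied to the title words, with indexing of our own.)

$u_{wi} = \tanh(W_w h_{wi} + b_w)$,  $\alpha_i = \mathrm{softmax}_i(u_{wi}^{\top} u_w)$,  $t = \sum_i \alpha_i h_{wi}$

where $h_{wi}$ is the Bi-GRU hidden state of the $i$-th title word and $t$ is the resulting title representation.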

Title-guided sentence-level attention mechanisms:
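(Again only the prose survives. One plausible realisation, in which the title representation $t$ from above is combined with each sentence before attending; $W_{tg}$, $b_{tg}$ and the context vector $u_{tg}$ are placeholder names of ours, and the exact parameterisation on the poster may differ.)

$u_{i}^{tg} = \tanh(W_{tg}[h_{si}; t] + b_{tg})$,  $\beta_i^{tg} = \mathrm{softmax}_i\big((u_{i}^{tg})^{\top} u_{tg}\big)$,  $c_{tg} = \sum_i \beta_i^{tg} h_{si}$

where $h_{si}$ is the hidden state of the $i$-th content sentence and $c_{tg}$ is the title-guided content representation.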

The final document representation is the concatenation of the title and the content representation.
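In symbols (a shorthand of ours, assuming both content representations enter the concatenation, consistent with the “ori”/“tg” naming in the attention visualisation): $d = [\,t\,;\,c_{ori}\,;\,c_{tg}\,]$.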

The whole joint loss to optimize:
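(The formula is missing from the extracted text. Schematically, the multi-label cross-entropy loss is combined with the two semantic-based regularizers described under Semantic-Based Loss Regularizers; the weights $\lambda_{sim}$ and $\lambda_{sub}$ are placeholders of ours for whatever coefficients the model uses.)

$L = L_{ce} + \lambda_{sim} L_{sim} + \lambda_{sub} L_{sub}$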

References
[1] Mike Schuster and Kuldip K. Paliwal. 1997. Bidirectional recurrent neural networks. IEEE Transactions on Signal Processing, 45(11):2673–2681.
[2] Kyunghyun Cho, Bart van Merrienboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. 2014. Learning phrase representations using RNN encoder–decoder for statistical machine translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1724–1734.
[3] Zichao Yang, Diyi Yang, Chris Dyer, Xiaodong He, Alex Smola, and Eduard Hovy. 2016. Hierarchical attention networks for document classification. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 1480–1489.
[4] Hebatallah A. Mohamed Hassan, Giuseppe Sansonetti, Fabio Gasparetti, and Alessandro Micarelli. 2018. Semantic-based tag recommendation in scientific bookmarking systems. In Proceedings of the 12th ACM Conference on Recommender Systems (RecSys '18), pages 465–469, New York, NY, USA. ACM.

Similarly, we can obtain $s_i$ (the sentence representation) and $c_{ori}$ (the content representation based on the original sentence-level attention mechanism in [3-4]).

Research Questions

• How to model the impact of the title on social annotation? (see Title-Guided Attention Mechanisms)

• How to leverage both similarity and subsumption relations among labels in neural networks to further improve the performance of multi-label classification? (see Semantic-Based Loss Regularizers)

$L_{sim}$ constrains similar labels to have similar outputs.

$L_{sub}$ enforces each co-occurring subsumption pair to satisfy the dependency of the parent label on the child label.
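(The exact formulas are not recoverable from the extracted text. One way such regularizers can be written, using the $Sim$ and $Sub$ matrices defined below, the rounding function $r(\cdot)$ and the sigmoid output $\hat{y}_j$ for label $j$; these are illustrative forms, not the poster's verbatim equations.)

$L_{sim} = \sum_{j,k} Sim_{jk}\,(\hat{y}_j - \hat{y}_k)^2$

$L_{sub} = \sum_{j,k} Sub_{jk}\,\max\big(0,\, r(\hat{y}_k) - \hat{y}_j\big)$,  with $Sub_{jk} = 1$ meaning label $j$ is a parent of label $k$,

so that a confidently predicted child label ($r(\hat{y}_k) = 1$) pushes the parent output $\hat{y}_j$ upwards.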

$Sim \in (0,1)^{|L| \times |L|}$ is a pre-computed label similarity matrix based on embeddings pre-trained from the label sets.

$Sub \in \{0,1\}^{|L| \times |L|}$ can be obtained by grounding labels to knowledge bases (e.g. the Microsoft Concept Graph, for the Bibsonomy dataset) or from crowd-sourced relations (for the Zhihu dataset).

$r(\cdot)$ is a rounding function: $r(\hat{y}_j) = 1$ when $\hat{y}_j \ge 0.5$, otherwise $r(\hat{y}_j) = 0$.
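A minimal NumPy sketch of how the two regularizers could be computed from a batch of sigmoid outputs, following the illustrative formulas above; the function names, the batch averaging and the matrix orientation ($Sub[j, k] = 1$ meaning label $j$ is a parent of label $k$) are assumptions of ours.

import numpy as np

def rounding(y_hat):
    # r(.) from the poster: threshold sigmoid outputs at 0.5.
    return (y_hat >= 0.5).astype(y_hat.dtype)

def sim_regularizer(y_hat, Sim):
    # Illustrative L_sim: similar labels should receive similar outputs.
    # y_hat: (batch, |L|) sigmoid outputs; Sim: (|L|, |L|) similarity matrix.
    diff = y_hat[:, :, None] - y_hat[:, None, :]              # (batch, |L|, |L|)
    return np.mean(np.sum(Sim[None, :, :] * diff ** 2, axis=(1, 2)))

def sub_regularizer(y_hat, Sub):
    # Illustrative L_sub: if a child label is predicted (rounded to 1),
    # the parent label's output should not fall below it.
    child_on = rounding(y_hat)                                 # (batch, |L|)
    gap = np.maximum(0.0, child_on[:, None, :] - y_hat[:, :, None])
    return np.mean(np.sum(Sub[None, :, :] * gap, axis=(1, 2)))

# Example: 2 documents, 3 labels, label 0 a parent of label 1.
# y_hat = np.array([[0.2, 0.9, 0.1], [0.8, 0.7, 0.3]])
# Sub = np.array([[0., 1., 0.], [0., 0., 0.], [0., 0., 0.]])
# sub_regularizer(y_hat, Sub)  # the first document incurs the larger penalty
#                              # (child label on, parent output low)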

Our code and datasets are available at https://github.com/acadTags/Automated-Social-Annotation

• Experiments show the effectiveness of JMAN, with superior performance and faster training than the state-of-the-art models HAN and Bi-GRU.

• It is worth exploring other types of guided attention mechanisms and adapting the regularizers to pre-trained transferable models such as BERT.

Users tend to annotate documents collectively with tags of various semantic forms and granularities.

$h_w$ and $h_s$ denote the hidden state of a word and a sentence, respectively; $W_w$, $b_w$, $W_s$ and $b_s$ are weights to be learned in training.

$u_w$, $u_s$ and $u_{tg}$ are global context vectors, i.e. “what is the informative word [or sentence]”, to be learned.

$|D|$, document size; $|L|$, label size; $|V|$, vocabulary size; avg, average number of labels per document; $\sum Sub$, number of subsumption relations.

Attention visualization of a document in Bibsonomy with the JMAN model: purple blocks show word-level attention weights; red blocks in “ori” (original) and “tg” (title-guided) show sentence-level attention weights. Predicted labels and ground truth labels are also presented.

Baselines (tested with models from 10-fold cross-validation):
Bi-GRU: Bidirectional Gated Recurrent Unit [1-2].
HAN: Hierarchical Attention Network [3-4].
JMAN-s: without the semantic-based loss regularizers.
JMAN-s-tg: without the semantic-based loss regularizers and the title-guided sentence-level attention mechanism.
JMAN-s-att: without the semantic-based loss regularizers and the original sentence-level attention mechanism.