Rep the Set: Neural Networks for Learning Set representationsRepset architecture - Tackling the...

27
Rep the Set: Neural Networks for Learning Set representations K. Skianis, G. Nikolentzos, S. Limnios, M. Vazirgiannis Data Science and Mining group (DaSciM), Laboratoire d’Informatique (LIX), ´ Ecole Polytechnique, France http://www.lix.polytechnique.fr/dascim/ ´ Ecole Polytechnique, France Preprint available at: https://arxiv.org/abs/1904.01962 April 26, 2019 1 / 27 K. Skianis, G. Nikolentzos, S. Limnios, M. Vazirgiannis Data Science and Mining group (DaSciM), Laboratoire d’Informatique (LIX), Rep the Set: Neural Networks for Learning Set representations

Transcript of Rep the Set: Neural Networks for Learning Set representationsRepset architecture - Tackling the...

Page 1: Rep the Set: Neural Networks for Learning Set representationsRepset architecture - Tackling the complexity of the bipartite matching major weakness the computational complexity: maximum

Rep the Set: Neural Networks for Learning Setrepresentations

K. Skianis, G. Nikolentzos, S. Limnios, M. Vazirgiannis

Data Science and Mining group (DaSciM),Laboratoire d’Informatique (LIX), Ecole Polytechnique, France

http://www.lix.polytechnique.fr/dascim/ Ecole Polytechnique, FrancePreprint available at: https://arxiv.org/abs/1904.01962

April 26, 2019

1 / 27K. Skianis, G. Nikolentzos, S. Limnios, M. Vazirgiannis Data Science and Mining group (DaSciM), Laboratoire d’Informatique (LIX), Ecole Polytechnique, France http://www.lix.polytechnique.fr/dascim/ Ecole Polytechnique, France Preprint available at: https://arxiv.org/abs/1904.01962Rep the Set: Neural Networks for Learning Set representations

Page 2: Rep the Set: Neural Networks for Learning Set representationsRepset architecture - Tackling the complexity of the bipartite matching major weakness the computational complexity: maximum

Data Science and Mining @ Ecole Polytechnique, France

2 / 27K. Skianis, G. Nikolentzos, S. Limnios, M. Vazirgiannis Data Science and Mining group (DaSciM), Laboratoire d’Informatique (LIX), Ecole Polytechnique, France http://www.lix.polytechnique.fr/dascim/ Ecole Polytechnique, France Preprint available at: https://arxiv.org/abs/1904.01962Rep the Set: Neural Networks for Learning Set representations

Page 3: Rep the Set: Neural Networks for Learning Set representationsRepset architecture - Tackling the complexity of the bipartite matching major weakness the computational complexity: maximum

Data Science and Mining @ Ecole Polytechnique, France

Research Topics

Machine Learning and AI

AI and Data Science methods (degeneracy, similarity, deep learning,multi-label classification)

Applications to: Text Mining/NLP, Social nets, Web marketing/advertising,Time Series

J. Read, M. Vazirgiannis

Operations Research and Mathematical programming

Optimization for Energy apps

Distance Geometry, protein conformation

C. d’Ambrosio, L. Liberti

3 / 27K. Skianis, G. Nikolentzos, S. Limnios, M. Vazirgiannis Data Science and Mining group (DaSciM), Laboratoire d’Informatique (LIX), Ecole Polytechnique, France http://www.lix.polytechnique.fr/dascim/ Ecole Polytechnique, France Preprint available at: https://arxiv.org/abs/1904.01962Rep the Set: Neural Networks for Learning Set representations

Page 4: Rep the Set: Neural Networks for Learning Set representationsRepset architecture - Tackling the complexity of the bipartite matching major weakness the computational complexity: maximum

Data Science and Mining @ Ecole Polytechnique, France

Graph of Words: graph based text/NLP

4 / 27K. Skianis, G. Nikolentzos, S. Limnios, M. Vazirgiannis Data Science and Mining group (DaSciM), Laboratoire d’Informatique (LIX), Ecole Polytechnique, France http://www.lix.polytechnique.fr/dascim/ Ecole Polytechnique, France Preprint available at: https://arxiv.org/abs/1904.01962Rep the Set: Neural Networks for Learning Set representations

Page 5: Rep the Set: Neural Networks for Learning Set representationsRepset architecture - Tackling the complexity of the bipartite matching major weakness the computational complexity: maximum

Data Science and Mining @ Ecole Polytechnique, France

Graph of Words: graph based text/NLP

5 / 27K. Skianis, G. Nikolentzos, S. Limnios, M. Vazirgiannis Data Science and Mining group (DaSciM), Laboratoire d’Informatique (LIX), Ecole Polytechnique, France http://www.lix.polytechnique.fr/dascim/ Ecole Polytechnique, France Preprint available at: https://arxiv.org/abs/1904.01962Rep the Set: Neural Networks for Learning Set representations

Page 6: Rep the Set: Neural Networks for Learning Set representationsRepset architecture - Tackling the complexity of the bipartite matching major weakness the computational complexity: maximum

Data Science and Mining @ Ecole Polytechnique, France

Machine/Deep Learning methods for Graphs

6 / 27K. Skianis, G. Nikolentzos, S. Limnios, M. Vazirgiannis Data Science and Mining group (DaSciM), Laboratoire d’Informatique (LIX), Ecole Polytechnique, France http://www.lix.polytechnique.fr/dascim/ Ecole Polytechnique, France Preprint available at: https://arxiv.org/abs/1904.01962Rep the Set: Neural Networks for Learning Set representations

Page 7: Rep the Set: Neural Networks for Learning Set representationsRepset architecture - Tackling the complexity of the bipartite matching major weakness the computational complexity: maximum

Data Science and Mining @ Ecole Polytechnique, France

Machine/Deep Learning methods for Graphs

7 / 27K. Skianis, G. Nikolentzos, S. Limnios, M. Vazirgiannis Data Science and Mining group (DaSciM), Laboratoire d’Informatique (LIX), Ecole Polytechnique, France http://www.lix.polytechnique.fr/dascim/ Ecole Polytechnique, France Preprint available at: https://arxiv.org/abs/1904.01962Rep the Set: Neural Networks for Learning Set representations

Page 8: Rep the Set: Neural Networks for Learning Set representationsRepset architecture - Tackling the complexity of the bipartite matching major weakness the computational complexity: maximum

Data Science and Mining @ Ecole Polytechnique, France

Industrial Collaborations and Projects

8 / 27K. Skianis, G. Nikolentzos, S. Limnios, M. Vazirgiannis Data Science and Mining group (DaSciM), Laboratoire d’Informatique (LIX), Ecole Polytechnique, France http://www.lix.polytechnique.fr/dascim/ Ecole Polytechnique, France Preprint available at: https://arxiv.org/abs/1904.01962Rep the Set: Neural Networks for Learning Set representations

Page 9: Rep the Set: Neural Networks for Learning Set representationsRepset architecture - Tackling the complexity of the bipartite matching major weakness the computational complexity: maximum

Machine Learing on Sets

Typical ML algorithms (i.e. regression or classification) designed for fixeddimensionality objects.

s imalrity learning between sets should be invariant to permutation:challenging task

supervised tasks: set output label invariant or equivariant to thepermutationi its elements.

population statistics estimation, giga-scale cosmology, nano-scale quantumchemistry.

unsupervised tasks, “set” representation needs to be learned.

set expansion - assume a set of similar objects - find similar to the setextensions, i.e. extend the set {lion, tiger, leopard} with cheetah

web marketing extend a set high-value customers with similar people.

astrophysics: assuming set of interesting celestial objects, find similar ones insky surveys.

9 / 27K. Skianis, G. Nikolentzos, S. Limnios, M. Vazirgiannis Data Science and Mining group (DaSciM), Laboratoire d’Informatique (LIX), Ecole Polytechnique, France http://www.lix.polytechnique.fr/dascim/ Ecole Polytechnique, France Preprint available at: https://arxiv.org/abs/1904.01962Rep the Set: Neural Networks for Learning Set representations

Page 10: Rep the Set: Neural Networks for Learning Set representationsRepset architecture - Tackling the complexity of the bipartite matching major weakness the computational complexity: maximum

Background and State of the art

NNs for sets became very popular isnpired by computer vision problems suchas the automated classification of point clouds. Proposed architectures haveachieved state-of-the-art results on many different tasks.

Base approaches: PointNet [Qi et.al., CVPR2017] and DeepSets [Zaheeret.al., NIPS2017]

transform sets’ elements vectors using several NN layers into newrepresentations

apply some permutation-invariant function to the emerging vectors togenerate representations for the sets.

Pointnet: max pooling, DeepSets: vector sum

representation of the set is then passed on to a standard architecture (e.g.,fully connected layers,nonlinearities, etc).

Other efforts: PointNet++, SO-Net

10 / 27K. Skianis, G. Nikolentzos, S. Limnios, M. Vazirgiannis Data Science and Mining group (DaSciM), Laboratoire d’Informatique (LIX), Ecole Polytechnique, France http://www.lix.polytechnique.fr/dascim/ Ecole Polytechnique, France Preprint available at: https://arxiv.org/abs/1904.01962Rep the Set: Neural Networks for Learning Set representations

Page 11: Rep the Set: Neural Networks for Learning Set representationsRepset architecture - Tackling the complexity of the bipartite matching major weakness the computational complexity: maximum

Motivation and Contribution

Data objects decomposed into sets of simpler objects: natural to representeach object as the set of its components or parts.

Conventional ML algorithms operate on vectors / sequences. Thus unable toprocess sets as

sets may vary in cardinality

set elements lack a meaningful ordering

Challenge: Sets as input to Neural Network Architectures

Contribution: RepSet: a new neural network architecture, handlingexamples as sets of vectors.

computes the correspondences between an input set and some hidden sets bysolving a series of network flow problems.

resulting representation fed to a NN architecture to produce the output.

allows end-to-end gradient-based learning.

Experimental evaluation: favorable on classification (text, graph) tasksoutperforming satet of the art

11 / 27K. Skianis, G. Nikolentzos, S. Limnios, M. Vazirgiannis Data Science and Mining group (DaSciM), Laboratoire d’Informatique (LIX), Ecole Polytechnique, France http://www.lix.polytechnique.fr/dascim/ Ecole Polytechnique, France Preprint available at: https://arxiv.org/abs/1904.01962Rep the Set: Neural Networks for Learning Set representations

Page 12: Rep the Set: Neural Networks for Learning Set representationsRepset architecture - Tackling the complexity of the bipartite matching major weakness the computational complexity: maximum

Repset architecture - Permutation Invariance

Assume an example X represented as a set X = {v1, v2, . . . , vn} ofd-dimensional vectors, vi ∈ Rd . (i.e the embeddings of X ′s elements)Objective: design architecture whose output is invariant for all n!permutations of X elements => permutation invariant function.

propose a novel permutation invariant layercontains m “hidden sets” Y1,Y2, . . . ,Ym of d-dimensional vectors (samedim as X elements)based on bipartite graph matchingits components are trainable,elements of a hidden set Yi correspond to the columns of a trainable matrixW(i).each column of matrix W(i) is a vector u ∈ Yi .

12 / 27K. Skianis, G. Nikolentzos, S. Limnios, M. Vazirgiannis Data Science and Mining group (DaSciM), Laboratoire d’Informatique (LIX), Ecole Polytechnique, France http://www.lix.polytechnique.fr/dascim/ Ecole Polytechnique, France Preprint available at: https://arxiv.org/abs/1904.01962Rep the Set: Neural Networks for Learning Set representations

Page 13: Rep the Set: Neural Networks for Learning Set representationsRepset architecture - Tackling the complexity of the bipartite matching major weakness the computational complexity: maximum

Repset architecture - Similarity via graph matching

to measure the similarity between X and each one of the hidden sets Yi :comparing their components.

capitalize on network flow algorithms - specifically bipartite matching:compute optimal mapping between the elements of X and the elements ofeach hidden set Yi .

Each edge e connects a vertex in X to one in Yi .

Matching M: subset of edges - each node in X connects to one in Yi .

optimal solution is interpreted as similarity between node sets X and Yi .

The bipartite graph is a matrix |X | × |Y |, cell values from {0,1}13 / 27

K. Skianis, G. Nikolentzos, S. Limnios, M. Vazirgiannis Data Science and Mining group (DaSciM), Laboratoire d’Informatique (LIX), Ecole Polytechnique, France http://www.lix.polytechnique.fr/dascim/ Ecole Polytechnique, France Preprint available at: https://arxiv.org/abs/1904.01962Rep the Set: Neural Networks for Learning Set representations

Page 14: Rep the Set: Neural Networks for Learning Set representationsRepset architecture - Tackling the complexity of the bipartite matching major weakness the computational complexity: maximum

Repset architecture - bipartite matching Optimization

Assume a set of vectors, X = {v1, v2, . . . , v|X |} and a hidden setY = {u1,u2, . . . ,u|Y |}, the bipartite matching between the elements of thetwo sets is solving the optimization problem:

|X |∑i=1

|Y |∑j=1

xij f (vi ,uj)

subject to:

|X |∑i=1

xij ≤ 1 ∀j ∈ {1, . . . , |Y |}

|Y |∑j=1

xij ≤ 1 ∀i ∈ {1, . . . , |X |}

xij ≥ 0 ∀i ∈ {1, . . . , |X |},∀j ∈ {1, . . . , |Y |}

(1)

f (vi ,uj) differentiable function, and xij = 1 if component i of X assigned tocomponent j of Yi , 0 otherwise.we defined f (vi ,uj) = ReLU(v>i uj).

14 / 27K. Skianis, G. Nikolentzos, S. Limnios, M. Vazirgiannis Data Science and Mining group (DaSciM), Laboratoire d’Informatique (LIX), Ecole Polytechnique, France http://www.lix.polytechnique.fr/dascim/ Ecole Polytechnique, France Preprint available at: https://arxiv.org/abs/1904.01962Rep the Set: Neural Networks for Learning Set representations

Page 15: Rep the Set: Neural Networks for Learning Set representationsRepset architecture - Tackling the complexity of the bipartite matching major weakness the computational complexity: maximum

Repset architecture - Learning and output

Given input set X and the m hidden sets Y1,Y2, . . . ,Ym, formulate mbipartite matching problems,

solving we end up with an m-dimensional vector vX : hidden representationof set X .

This m-dimensional vector can be used as features for different machinelearning tasks such as set regression or set classification. For instance, in thecase of a set classification problem with |C| classes, the output is computedas follows:

pX = softmax(W(c) vX + b(c)) (2)

where W(c) ∈ Rm×|C| is a matrix of trainable parameters and b(c) ∈ R|C| isthe bias term. We use the negative log likelihood of the correct labels astraining loss:

L = −∑X

log pXi (3)

where i is the class label of set X . Note that we can create a deeperarchitecture by adding more fully-connected layers.

15 / 27K. Skianis, G. Nikolentzos, S. Limnios, M. Vazirgiannis Data Science and Mining group (DaSciM), Laboratoire d’Informatique (LIX), Ecole Polytechnique, France http://www.lix.polytechnique.fr/dascim/ Ecole Polytechnique, France Preprint available at: https://arxiv.org/abs/1904.01962Rep the Set: Neural Networks for Learning Set representations

Page 16: Rep the Set: Neural Networks for Learning Set representationsRepset architecture - Tackling the complexity of the bipartite matching major weakness the computational complexity: maximum

Repset architecture - Learning and output

Given input set X and the m hidden sets Y1,Y2, . . . ,Ym, formulate mbipartite matching problems,end up with an m-dimensional vector vX : hidden representation of set X .Can be used as features for different machine learning tasks such as setregression or set classification. For set classification with |C| classes, theoutput is computed as:

pX = softmax(W(c) vX + b(c)) (4)

We use the negative log likelihood of thecorrect labels as training loss:

L = −∑X

log pXi (5)

where i is the class label of set X .* The architecture supports permutationinvariance (proof in the paper)

16 / 27K. Skianis, G. Nikolentzos, S. Limnios, M. Vazirgiannis Data Science and Mining group (DaSciM), Laboratoire d’Informatique (LIX), Ecole Polytechnique, France http://www.lix.polytechnique.fr/dascim/ Ecole Polytechnique, France Preprint available at: https://arxiv.org/abs/1904.01962Rep the Set: Neural Networks for Learning Set representations

Page 17: Rep the Set: Neural Networks for Learning Set representationsRepset architecture - Tackling the complexity of the bipartite matching major weakness the computational complexity: maximum

Repset architecture - Tackling the complexity of the bipartite matching

major weakness the computational complexity: maximum cardinalitymatching in a weighted bipartite graph with n vertices and m edges takestime O(mn + n2 log n), with the classical Hungarian algorithm.Prohibitive for very large datasets.ApproxRepSet: approximation of bipartite matching problem involvingoperations that can be performed on a GPUAssuming an input set of vectors, X = {v1, v2, . . . , v|X |} and a hidden setY = {u1,u2, . . . ,u|Y |}. Assume |X | ≥ |Y |, optimization becomes:

max

|X |∑i=1

|Y |∑j=1

xij f (vi ,uj)

subject to:

|X |∑i=1

xij ≤ 1 ∀j ∈ {1, . . . , |Y |}

xij ≥ 0 ∀i ∈ {1, . . . , |X |},∀j ∈ {1, . . . , |Y |}

(6)

relaxed formulation of the problem - constraint has been removed.17 / 27

K. Skianis, G. Nikolentzos, S. Limnios, M. Vazirgiannis Data Science and Mining group (DaSciM), Laboratoire d’Informatique (LIX), Ecole Polytechnique, France http://www.lix.polytechnique.fr/dascim/ Ecole Polytechnique, France Preprint available at: https://arxiv.org/abs/1904.01962Rep the Set: Neural Networks for Learning Set representations

Page 18: Rep the Set: Neural Networks for Learning Set representationsRepset architecture - Tackling the complexity of the bipartite matching major weakness the computational complexity: maximum

Repset - Experimental Evaluation - Synthetic Data

18 / 27K. Skianis, G. Nikolentzos, S. Limnios, M. Vazirgiannis Data Science and Mining group (DaSciM), Laboratoire d’Informatique (LIX), Ecole Polytechnique, France http://www.lix.polytechnique.fr/dascim/ Ecole Polytechnique, France Preprint available at: https://arxiv.org/abs/1904.01962Rep the Set: Neural Networks for Learning Set representations

Page 19: Rep the Set: Neural Networks for Learning Set representationsRepset architecture - Tackling the complexity of the bipartite matching major weakness the computational complexity: maximum

Repset - Experimental Evaluation - Text Categorization

19 / 27K. Skianis, G. Nikolentzos, S. Limnios, M. Vazirgiannis Data Science and Mining group (DaSciM), Laboratoire d’Informatique (LIX), Ecole Polytechnique, France http://www.lix.polytechnique.fr/dascim/ Ecole Polytechnique, France Preprint available at: https://arxiv.org/abs/1904.01962Rep the Set: Neural Networks for Learning Set representations

Page 20: Rep the Set: Neural Networks for Learning Set representationsRepset architecture - Tackling the complexity of the bipartite matching major weakness the computational complexity: maximum

Repset - Experimental Evaluation - Text Categorization

Classification test error of the proposed architecture and the baselines on the 8text categorization datasets.

20 / 27K. Skianis, G. Nikolentzos, S. Limnios, M. Vazirgiannis Data Science and Mining group (DaSciM), Laboratoire d’Informatique (LIX), Ecole Polytechnique, France http://www.lix.polytechnique.fr/dascim/ Ecole Polytechnique, France Preprint available at: https://arxiv.org/abs/1904.01962Rep the Set: Neural Networks for Learning Set representations

Page 21: Rep the Set: Neural Networks for Learning Set representationsRepset architecture - Tackling the complexity of the bipartite matching major weakness the computational complexity: maximum

Repset - Experimental Evaluation - Text Set Extension

Terms of the employed pre-trained model that are most similar to the elementsand centroids of elements of5 hidden sets

21 / 27K. Skianis, G. Nikolentzos, S. Limnios, M. Vazirgiannis Data Science and Mining group (DaSciM), Laboratoire d’Informatique (LIX), Ecole Polytechnique, France http://www.lix.polytechnique.fr/dascim/ Ecole Polytechnique, France Preprint available at: https://arxiv.org/abs/1904.01962Rep the Set: Neural Networks for Learning Set representations

Page 22: Rep the Set: Neural Networks for Learning Set representationsRepset architecture - Tackling the complexity of the bipartite matching major weakness the computational complexity: maximum

Repset - Experimental Evaluation - Graph classification

22 / 27K. Skianis, G. Nikolentzos, S. Limnios, M. Vazirgiannis Data Science and Mining group (DaSciM), Laboratoire d’Informatique (LIX), Ecole Polytechnique, France http://www.lix.polytechnique.fr/dascim/ Ecole Polytechnique, France Preprint available at: https://arxiv.org/abs/1904.01962Rep the Set: Neural Networks for Learning Set representations

Page 23: Rep the Set: Neural Networks for Learning Set representationsRepset architecture - Tackling the complexity of the bipartite matching major weakness the computational complexity: maximum

Repset - Experimental Evaluation - Graph Classification

Classification accuracy (± standard deviation) of proposed architecture(s) and thebaselines. For MU-TAG, PROTEINS ( bioinformatics datasets ) the nodeembeddings that we generated do not incorporateinformation about them.

23 / 27K. Skianis, G. Nikolentzos, S. Limnios, M. Vazirgiannis Data Science and Mining group (DaSciM), Laboratoire d’Informatique (LIX), Ecole Polytechnique, France http://www.lix.polytechnique.fr/dascim/ Ecole Polytechnique, France Preprint available at: https://arxiv.org/abs/1904.01962Rep the Set: Neural Networks for Learning Set representations

Page 24: Rep the Set: Neural Networks for Learning Set representationsRepset architecture - Tackling the complexity of the bipartite matching major weakness the computational complexity: maximum

Repset - Experimental Evaluation - Runtimes

Runtimes with respect to the number of hidden setsm, the size of the hiddensets—Yi—(left)and embeddings with different dimensions (right).

24 / 27K. Skianis, G. Nikolentzos, S. Limnios, M. Vazirgiannis Data Science and Mining group (DaSciM), Laboratoire d’Informatique (LIX), Ecole Polytechnique, France http://www.lix.polytechnique.fr/dascim/ Ecole Polytechnique, France Preprint available at: https://arxiv.org/abs/1904.01962Rep the Set: Neural Networks for Learning Set representations

Page 25: Rep the Set: Neural Networks for Learning Set representationsRepset architecture - Tackling the complexity of the bipartite matching major weakness the computational complexity: maximum

Repset - Experimental Evaluation - Runtimes

Runtimes with respect to the number of input setsN(left) and the size of theinput sets—Xi—(right).

25 / 27K. Skianis, G. Nikolentzos, S. Limnios, M. Vazirgiannis Data Science and Mining group (DaSciM), Laboratoire d’Informatique (LIX), Ecole Polytechnique, France http://www.lix.polytechnique.fr/dascim/ Ecole Polytechnique, France Preprint available at: https://arxiv.org/abs/1904.01962Rep the Set: Neural Networks for Learning Set representations

Page 26: Rep the Set: Neural Networks for Learning Set representationsRepset architecture - Tackling the complexity of the bipartite matching major weakness the computational complexity: maximum

Repset - Conclusion

Machine learning with sets is increasingly important

Sets may vary in cardinality and their elements lack a meaningful ordering:

standard machine learning algorithms fail to learn high-quality representations.

We proposed RepSet, a neural network approach for learning set representations.

exhibits powerful permutation invarianceproperties.

computes mappings between input sets andsome hidden sets by solving a graphmatching/network flow problems.

Since matching/network flow algorithms aredifferentiable, we can use standardbackpropagation for learning the parameters ofthe hidden sets.

for large sets we introduced a relaxedversion(ApproxRepSet) - fast matrix operations andscales to very large datasets.

Repsets performs favorably on text/ graph

classification.

Future Work

: apply Repset on Group

Recommentation (i.e.

gaming)

26 / 27K. Skianis, G. Nikolentzos, S. Limnios, M. Vazirgiannis Data Science and Mining group (DaSciM), Laboratoire d’Informatique (LIX), Ecole Polytechnique, France http://www.lix.polytechnique.fr/dascim/ Ecole Polytechnique, France Preprint available at: https://arxiv.org/abs/1904.01962Rep the Set: Neural Networks for Learning Set representations

Page 27: Rep the Set: Neural Networks for Learning Set representationsRepset architecture - Tackling the complexity of the bipartite matching major weakness the computational complexity: maximum

THANK YOU !

AcknowledgementsDr. I. Nikolentzos, Dr. K. Skianis, Dr. P. Meladianos

http://www.lix.polytechnique.fr/dascim/

Software and data sets:http://www.lix.polytechnique.fr/dascim/software datasets/

Repset preprint available at: https://arxiv.org/abs/1904.01962

27 / 27K. Skianis, G. Nikolentzos, S. Limnios, M. Vazirgiannis Data Science and Mining group (DaSciM), Laboratoire d’Informatique (LIX), Ecole Polytechnique, France http://www.lix.polytechnique.fr/dascim/ Ecole Polytechnique, France Preprint available at: https://arxiv.org/abs/1904.01962Rep the Set: Neural Networks for Learning Set representations