
Vocabulary Alignment in Openly Specified Interactions

Paula Chocron¹,² and Marco Schorlemmer¹

¹Artificial Intelligence Research Institute (IIIA-CSIC), Bellaterra, Catalonia, Spain
²Universitat Autònoma de Barcelona, Bellaterra, Catalonia, Spain

Abstract

The problem of achieving common understanding between agents that use different vocabularies has been mainly addressed by designing techniques that explicitly negotiate mappings between their vocabularies, requiring agents to share a meta-language. In this paper we consider the case of agents that use different vocabularies and have no meta-language in common, but share the knowledge of how to perform a task, given by the specification of an interaction protocol. For this situation, we present a framework that lets agents learn a vocabulary alignment from the experience of interacting. Unlike previous work in this direction, we use open protocols that constrain possible actions instead of defining procedures, which makes our approach more general. We present two techniques that can be used either to learn an alignment from scratch or to repair an existing one, and we evaluate their performance experimentally.

1 Introduction

Addressing the problem of vocabulary heterogeneity is necessary for the common understanding of agents that use different languages, and therefore crucial for the success of multi-agent systems that act jointly by communicating. This problem has been tackled several times in the past two decades, in general from two different perspectives. Some approaches [20, 8] consider the existence of external contextual elements that all agents perceive in common, and explore how those can be used to explain the meaning of words.

[Figure 1: two finite-state automata over states 0–7 specifying the same drink-ordering interaction. In English: W: to drink?, then C: wine or C: beer; after beer, W: color?, then C: red or C: white. In Italian: W: da bere?, then C: vino or C: birra; after birra, W: tipo?, then C: rosso or C: bianco.]

Figure 1: Protocols for Waiter (W) and Customer (C)

A second group of techniques [15, 18] (and also [8], where both approaches are combined) considers the situation, reasonable for agents that communicate remotely, in which this kind of context is not available. They do so by providing explicit ways of learning or agreeing on a common vocabulary (or an alignment between vocabularies). These techniques require agents to share a common meta-language that they can use to discuss the meaning of words and their alignments. The complex question of how to communicate with heterogeneous interlocutors when neither a physical context nor a meta-language is available remains practically unexplored.

The work by Atencia and Schorlemmer [1] approaches this situation by considering a different kind of context, given by the interactions in which agents engage. Agents are assumed to share the knowledge of how to perform a task or, more concretely, the specification of an interaction, given by a finite state automaton.

For example, they consider the interaction in Figure 1, in which an English-speaking customer and an Italian waiter communicate to order drinks. The authors show how agents can progressively learn which mappings lead to successful interactions from the experience of performing the task. After several interactions, agents converge to an alignment that they can use to always succeed at ordering and delivering drinks with that particular interlocutor. The interesting aspect of this idea is that the only shared element, namely the interaction specification, is already necessary to communicate. However, using finite state automata as specifications implies that agents need to agree on the exact order in which messages should be sent, which is unnecessarily restrictive. In addition, agents learn an alignment that is only useful for one task, and how it can be extrapolated to further interactions is not explored.

In this paper we present a general version of the interaction-as-context approach to vocabulary alignment in multi-agent systems. Instead of automata, we consider agents that specify interactions with constraint-based open protocols, which define rules about what can be said instead of forcing one particular execution. In particular, we use ConDec protocols [14], which rely on linear temporal logic constraints. One ConDec protocol may not completely define the meaning of all words in a vocabulary, so we study how agents can learn mappings from performing different tasks. For this reason, the learning process is substantially different from the one in [1], where a mapping that works in one task is considered correct. If enough experiences are available, agents that use the techniques we present converge to an alignment that is useful in the general case, even for future interactions that they do not yet know. Since learning is done gradually as different interaction opportunities appear, agents can use what they learned early on, even if they do not yet know the alignment completely.

After presenting open interaction protocols, we define a framework for interacting with partners that use different vocabularies. We then present two different techniques to learn an alignment from the experience of interacting. The first relies only on analysing whether a message is allowed at a particular moment, and it can be used with any constraint-based protocol. The second technique uses the semantics of the protocols we present to improve the performance of the first one. Both methods can be used to learn alignments from scratch when there is no information, as well as to repair alignments obtained with other methods. We evaluate the techniques experimentally for both purposes, and show how different factors affect their performance.

2 Open Interaction Protocols

The question of what should be expressed by an interaction protocol has been extensively discussed in the multi-agent systems community. Traditional approaches such as finite state automata and Petri Nets are simple to design and read, but also too rigid. More flexible alternatives constrain possible actions instead of defining a fixed procedure. Constraint Declarative Protocols (commonly known as ConDec protocols) are an example of these approaches. ConDec protocols were first proposed by Pesic and van der Aalst [14] as a language to describe business protocols, and were later used as specifications for agent interactions by Montali [13], who presented an extension of ConDec and tools for its verification, and by Baldoni et al. [2, 3], who integrated ConDec constraints with commitment protocols, a framework for specifying interactions with social semantics first proposed by Singh [19]. An important advantage of ConDec protocols is that they use linear temporal logic, a well-known logic for which many tools are available.

Linear temporal logic (LTL from now on) is a natural choice to express constraints about actions that occur in time. The syntax of LTL includes that of propositional logic, plus the additional temporal operators {□, ♦, ○, U}, which can be applied to any LTL formula. LTL formulae are interpreted over paths in Kripke structures, which are sequences of states associated to a truth-valuation of propositional variables. The temporal operators are interpreted over these paths as follows: □p means that p must be true in the truth-valuation of all states, ♦p means that p must eventually be true, ○p means that p must be true in the next state, and pUq means that p must be true until q is valid. A set of LTL formulae, called a theory, is satisfiable if there exists a path for which all the formulae are true. In that case, the path is a model of the theory. The satisfiability problem in LTL is decidable, as is the model checking problem, which consists of deciding whether a given path is a model of a theory.
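To make these semantics concrete, here is a minimal sketch (ours; the implementation described in Section 5 delegates such checks to the NuSMV model checker) of evaluating the temporal operators over a path that is empty after a finite prefix, with formulae encoded as nested tuples:

```python
def holds(formula, path, i=0):
    """Evaluate an LTL formula at position i of a finite Kripke path.

    A path is a list of states, each a set of true propositions; every state
    past the end of the path is treated as empty, matching the encoding of
    interactions used later. Formulas are nested tuples, e.g.
    ("eventually", ("atom", "beer")).
    """
    op = formula[0]
    # All states from len(path) on are identical (empty), so checking up to
    # one position past the end is enough for the unbounded operators.
    horizon = max(i, len(path))
    if op == "atom":
        return i < len(path) and formula[1] in path[i]
    if op == "not":
        return not holds(formula[1], path, i)
    if op == "and":
        return holds(formula[1], path, i) and holds(formula[2], path, i)
    if op == "implies":
        return not holds(formula[1], path, i) or holds(formula[2], path, i)
    if op == "next":        # ○p: p holds at the following position
        return holds(formula[1], path, i + 1)
    if op == "eventually":  # ♦p: p holds at some position from i on
        return any(holds(formula[1], path, j) for j in range(i, horizon + 1))
    if op == "always":      # □p: p holds at every position from i on
        return all(holds(formula[1], path, j) for j in range(i, horizon + 1))
    if op == "until":       # p U q: q holds at some j >= i, p at all i <= k < j
        return any(holds(formula[2], path, j)
                   and all(holds(formula[1], path, k) for k in range(i, j))
                   for j in range(i, horizon + 1))
    raise ValueError(f"unknown operator: {op!r}")
```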

A ConDec protocol is a tuple ⟨M, C⟩, where M is a set of action names and C is a set of constraints about how actions can be performed. Constraints are LTL sentences renamed conveniently. We use a minimal version¹ of the constraints introduced originally by Pesic and van der Aalst, shown in Table 1, where n ∈ N and a, b ∈ M. These constraints are generally divided into three classes: existential constraints (existence and absence) predicate over the number of times an action can be performed, relational constraints describe binary relations between two actions, and negation constraints (preceded by a '!' sign) are relations that do not hold. Given a set M of actions, Cons(M) is the set of all possible constraints over M. In a protocol ⟨M, C⟩, necessarily C ⊆ Cons(M).

2.1 Open Protocols as Interaction Protocols

In the rest of this section we present the technical notions that are necessary to use ConDec protocols as specifications of interactions between agents that may use different vocabularies. We first define interaction protocols, which constrain the way in which agents can utter messages. We introduce the notion of bound to capture the agents' knowledge about how many steps they have to finish the interaction. If this knowledge is not available, it can be omitted and replaced by the requirement that the interaction must finish in finitely many steps.

Definition 1 Given a set A of agent IDs and a propositional vocabulary V, an interaction protocol is a tuple P = ⟨M, C, b⟩, where the set of actions M = A × V is a set of messages formed by the ID of the sender agent and the term it utters, C ⊆ Cons(M), and b ∈ N is the bound.

¹Since we are not interested in the usability of the protocols here, we do not include some constraints that work as syntactic sugar, such as exactly(n, a), which can be replaced by including existence(n, a) and absence(n, a).

Constraint            LTL meaning
existence(1, a)       ♦a
existence(n+1, a)     ♦(a ∧ ○existence(n, a))
absence(0, a)         ¬existence(1, a)
absence(n, a)         ¬existence(n+1, a)
correlation(a, b)     ♦a ⟹ ♦b
!correlation(a, b)    ♦a ⟹ ¬♦b
response(a, b)        □(a ⟹ ♦b)
!response(a, b)       □(a ⟹ ¬♦b)
before(a, b)          ¬b U a
!before(a, b)         □(♦b ⟹ ¬a)
premise(a, b)         □(○b ⟹ a)
!premise(a, b)        □(○b ⟹ ¬a)
imm after(a, b)       □(a ⟹ ○b)
!imm after(a, b)      □(a ⟹ ○¬b)

Table 1: LTL definitions of constraints
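As an illustration, the constraints of Table 1 can be built as formula trees for the evaluator sketched above (the tuple encoding and helper names are ours; the messages come from the waiter example):

```python
def atom(a):
    return ("atom", a)

def existence(n, a):
    # ♦a for n = 1; ♦(a ∧ ○existence(n-1, a)) otherwise
    f = ("eventually", atom(a))
    for _ in range(n - 1):
        f = ("eventually", ("and", atom(a), ("next", f)))
    return f

def absence(n, a):
    return ("not", existence(n + 1, a))

def before(a, b):
    return ("until", ("not", atom(b)), atom(a))  # ¬b U a

def response(a, b):
    return ("always", ("implies", atom(a), ("eventually", atom(b))))

def not_correlation(a, b):
    return ("implies", ("eventually", atom(a)), ("not", ("eventually", atom(b))))

# The waiter must ask before the customer orders:
path = [{("W", "da bere")}, {("C", "birra")}, {("W", "tipo")}]
assert holds(before(("W", "da bere"), ("C", "birra")), path)
assert not holds(before(("C", "birra"), ("W", "da bere")), path)
```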

The semantics of an interaction protocol is defined over interactions, which represent sequences of uttered messages.

Definition 2 Given a set of messages M, an interaction i ∈ M* is a finite sequence of messages m ∈ M. The length of an interaction (len(i)) and the append operation (i . m) are defined in the same way as for sequences.

An interaction i can be encoded into a Kripke structure path by including one propositional variable for each message m ∈ M and considering the following sequence of states with truth-valuations. In states with index 0 ≤ j < len(i), let the propositional variable for the j-th message in i be true, and all other propositional variables be false. In states after len(i), let all propositional variables be false. With this construction, we can define the semantics of interaction protocols.

Definition 3 An interaction i is a model of an interaction protocol P = ⟨M, C, b⟩ (noted i |= P) if len(i) ≤ b and the Kripke path that encodes i is a model of C. An interaction i′ is a partial model of P (noted i′ |=p P) if it is a prefix of a model of P.

Definition 3 implies that checking the satisfiability of an interaction protocol is equivalent to checking LTL satisfiability.
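A sketch of Definitions 2 and 3 on top of the previous snippets, representing a protocol simply as a pair of its constraint set and bound (our choice; the paper's agents discharge these checks to a model checker):

```python
def encode(interaction):
    """Kripke path for an interaction: state j makes only message j true."""
    return [{m} for m in interaction]

def models(interaction, protocol):
    """Definition 3: len(i) <= b and the encoded path satisfies every constraint."""
    constraints, bound = protocol  # protocol as a (C, b) pair, our representation
    path = encode(interaction)
    return len(interaction) <= bound and all(holds(c, path) for c in constraints)
```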

As already mentioned, we are interested in agents that use different vocabularies but share the knowledge of how to perform a task. In the rest of this section we define more precisely what that means. From now on, V1 and V2 are two possibly different vocabularies. Let us start by defining the notion of vocabulary alignment.

Definition 4 An alignment is a function between vocabularies α : V2 → V1. Given a set of agents A, α can be extended homomorphically to a function between:

- messages M1 = A × V1 and M2 = A × V2 (α : M2 → M1)

- constraints (α : Cons(M2) → Cons(M1)) and sets of constraints (α : $2^{Cons(M_2)} \to 2^{Cons(M_1)}$)

- interactions (α : $M_2^* \to M_1^*$) and sets of interactions (α : $2^{M_2^*} \to 2^{M_1^*}$)

To capture the idea of sharing the knowledge of how to perform a task, we define a notion of compatibility between protocols, which consists simply in having the same models modulo an alignment. Given a protocol P = ⟨M, C, b⟩, we define the set of its models as Int(P) = {i ∈ M* such that i |= P}.

Definition 5 Two interaction protocols P1 = ⟨M1, C1, b⟩ and P2 = ⟨M2, C2, b⟩ with sets of messages M1 = A × V1 and M2 = A × V2 are compatible if there exists an alignment α : V2 → V1 such that

Int(P1) = α(Int(P2))

If we know the alignment α for which the condition holds, we say that they are compatible under α.

In this paper, for simplicity, we restrict ourselves to bijective alignments over vocabularies of the same size.

An Example Let us extend the example of Figure 1 by considering again a waiter W and a customer C in the drink-ordering situation. The vocabulary of the customer is VC = {to drink, beer, wine, water, size, pint, half pint}, while the waiter uses VW = {da bere, birra, vino, acqua, tipo, media, piccola}. Consider the bijective alignment α : VW → VC obtained by mapping each word in VW to the word in the same position in VC. We consider the following protocols to specify the drink-ordering interactions:

PW = ⟨{W, C} × VW, {existence(1, ⟨W, da bere⟩),
    premise(⟨C, birra⟩, ⟨W, da bere⟩),
    premise(⟨C, vino⟩, ⟨W, da bere⟩),
    premise(⟨C, acqua⟩, ⟨W, da bere⟩),
    !correlation(⟨C, birra⟩, ⟨C, vino⟩),
    response(⟨C, birra⟩, ⟨W, tipo⟩),
    premise(⟨C, piccola⟩, ⟨W, tipo⟩),
    premise(⟨C, media⟩, ⟨W, tipo⟩)}, 5⟩

PC = ⟨{W, C} × VC, {existence(1, ⟨W, to drink⟩),
    premise(⟨C, beer⟩, ⟨W, to drink⟩),
    premise(⟨C, water⟩, ⟨W, to drink⟩),
    premise(⟨C, wine⟩, ⟨W, to drink⟩),
    response(⟨C, beer⟩, ⟨W, size⟩),
    premise(⟨C, half pint⟩, ⟨W, size⟩),
    premise(⟨C, pint⟩, ⟨W, size⟩)}, 5⟩

The two protocols above are not compatible under any bijective alignment, since the customer protocol has as a model an interaction in which the customer orders both wine and beer, while the waiter protocol only has models in which at most one alcoholic beverage is ordered. If !correlation(⟨C, beer⟩, ⟨C, wine⟩) is added to PC, the resulting protocols are compatible, in particular under α.

3 Communicating with Heterogeneous Partners

We focus on interactions between two agents a1 and a2 with vocabularies V1 and V2 respectively. During an interaction, an agent can send messages composed of words in its vocabulary and receive others from its interlocutor, or finish the communication if certain conditions hold. We assume messages are never lost and always arrive in order and, more strongly, that each message is received before the following one is uttered. This requires that agents agree on who speaks at each time, although we do not force them to follow any particular turn-taking pattern.

Agents interact to perform some task together, which each of them specifies with an interaction protocol. We assume a1 and a2 agree on a set of tasks that they can perform and on how those tasks are performed: their respective protocols for each task are compatible. Our agents use their vocabularies consistently throughout different tasks, i.e., the protocols for all tasks are compatible under the same alignment.

From now on, we will use the sets of messages M1 = {a1, a2} × V1 and M2 = {a1, a2} × V2. We assume there exists a bijective alignment α : V2 → V1 such that, for each task that a1 and a2 can perform, they have protocols P1 = ⟨M1, C1, b⟩ and P2 = ⟨M2, C2, b⟩ respectively, and P1, P2 are compatible under α. We assume that whenever agents interact, they use protocols that correspond to the same task.

We present a general approach to learning the alignment α from the experience of interacting to perform different tasks sequentially. The methods we present are used by one agent alone, and they do not require that the interlocutor use them as well. To explain the techniques, we adopt the perspective of agent a1.

Briefly, we propose to learn α by taking into account the coherence of messages. When a1 receives a message, it learns from analysing which interpretations are allowed by the protocol and which are not. This information, however, is not definitive. An allowed word can be an incorrect interpretation for a received message, because protocols do not completely restrict all possible meanings. Moreover, a forbidden message can still be a correct interpretation: since some rules express relations between messages, a previously misinterpreted word (by the agent or by its interlocutor) can make a correct mapping appear impossible.

To handle this uncertainty, we propose a simple probabilistic learning technique. Agents maintain an interpretation distribution over possible meanings for foreign words, and update its values with the experience of interacting. Formally, for each v2 ∈ V2 that a1 knows about (because it received it in a message at some point) and for each v1 ∈ V1, agent a1 has a weight ω(v2, v1) that represents its confidence that α(v2) = v1. Using the bijectivity, we keep $\sum_{v \in V_1} \omega(v_2, v) = 1$, but we do not require the same in the other direction, since the foreign vocabulary is unknown a priori.

The techniques we propose can incorporate existing alignments, in the same way as is done in [5] for the case in which protocols are automata. These alignments represent a priori knowledge about the foreign vocabulary that can have been obtained from previous experience, from a matching tool, or with other techniques such as a syntactic similarity measure. Since these techniques are never fully correct, previous alignments should not be trusted completely, and the techniques we propose can work as methods for repairing them. A previous alignment for agent a1 is a partial function A : V′2 × V1 → N, where V′2 is a set of terms. Note that A is not necessarily defined over V2, since the vocabulary that a2 uses may be unknown a priori. However, the information it provides can be used even if V′2 is a subset of V2 or they overlap. We always interpret the confidences positively, meaning that a mapping with low confidence still has more confidence than one that does not exist. If a1 has a previous alignment A, it initializes the interpretation distribution using that information, first assigning raw values:

$$\omega(v_2, v_1) = \begin{cases} A(v_2, v_1) & \text{if } (v_2, v_1) \in dom(A) \\ 0 & \text{otherwise} \end{cases}$$

and then normalizing the values with an exponential method such as softmax. This is necessary to start the technique with the values for all mappings in the interval (0, 1), since the alignment can be wrong. If there is no previous alignment, agents always initialize the interpretation values with a uniform distribution: ω(v2, v1) = 1/|V1| for all v1 ∈ V1.
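The initialization can be sketched as follows, with a plain dict standing in for the previous alignment A (the representation is ours):

```python
import math

def init_interpretations(v2, V1, prior=None):
    """Initial distribution omega(v2, .) over the agent's own vocabulary V1.

    `prior` is an optional dict {(v2, v1): confidence} standing in for a
    previous alignment A.
    """
    if prior is None:
        return {v1: 1.0 / len(V1) for v1 in V1}        # uniform start
    raw = {v1: prior.get((v2, v1), 0.0) for v1 in V1}  # raw confidences, 0 if absent
    z = sum(math.exp(x) for x in raw.values())
    return {v1: math.exp(x) / z for v1, x in raw.items()}  # softmax into (0, 1)
```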

We explain the learning techniques that we propose in detail in Section 4. In the rest of this section we focus on the dynamics of the interaction: how agents choose which messages to say, finish the interaction, and decide interpretations for the messages they receive.

Choosing messages to send The choice of messages to utter is internal to each agent and depends on its interests while interacting. We do not impose any restriction on this, besides respecting the constraints in the protocol. Formally, a1 can utter a message v1 only if ⟨a1, v1⟩ is a possible message as defined below.

Definition 6 Consider an agent a1 following protocol P1 = ⟨M1, C1, b⟩, and suppose interaction i has happened so far (with i |=p P1). The possible messages at that point are all messages m ∈ M1 such that the interaction remains a partial model when they are performed, that is, i . m |=p P1.
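A naive sketch of Definition 6 follows, deciding partial modelhood by enumerating all completions up to the bound; this is exponential and only usable for tiny examples, whereas the authors discharge this check to the NuSMV model checker:

```python
from itertools import product

def partial_model(interaction, protocol, M):
    """Check that the interaction is a prefix of some model of the protocol."""
    constraints, bound = protocol
    for k in range(bound - len(interaction) + 1):
        for tail in product(M, repeat=k):  # every completion of length k
            if models(list(interaction) + list(tail), protocol):
                return True
    return False

def possible_messages(interaction, protocol, M):
    # Definition 6: messages that keep the interaction a partial model
    return [m for m in M if partial_model(list(interaction) + [m], protocol, M)]
```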

Finishing the interaction An interaction can finish in three situations. First, if there is a bound, reaching it implies the automatic end of the conversation. The interaction also finishes if an agent receives a message that has no possible interpretation. Finally, an agent can finish the conversation whenever it considers the interaction successful, i.e., if it is a model of the protocol when it ends. In this case, it simply stops talking, producing a timeout that lets the other agent realize the interaction has finished.

Choosing interpretations When agent a1 receives a message v2 ∈ V2 from a2, it needs to interpret it in V1 to be able to continue the interaction. To this aim, agents use the information in the interpretation distribution. Since agents assume there exists a bijective alignment, they always choose the same interpretation for a word during one interaction, and they do not choose words that have already been chosen as interpretations.

Consider that agent a1 receives word v2 after interaction i. Let µ : V2 → V1 be the mappings-made function, where µ(v2) = v1 if and only if v1 was chosen earlier in the same interaction as a mapping for v2. Its domain dom(µ) are the v2 ∈ V2 that a1 has received since the current task started, and its image img(µ) are the v1 ∈ V1 that were chosen as mappings. The set W of possible interpretations for v2 is the set of words v1 ∈ V1 such that ⟨a2, v1⟩ is a possible message after i and v1 ∉ img(µ).

If rnd(S) is a function that chooses an element of the set S randomly, the agent will choose an interpretation as follows:

$$\mu(v_2) = \begin{cases} \mu(v_2) & \text{if } v_2 \in dom(\mu) \wedge \mu(v_2) \in W \\ rnd(\operatorname{argmax}_{v_1 \in W}\, \omega(v_2, v_1)) & \text{if } v_2 \notin dom(\mu) \end{cases}$$

If either v2 ∈ dom(µ) but µ(v2) ∉ W, or W = ∅, the interaction is in one of the failure cases, and it finishes because a1 stops talking.
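The selection rule can be sketched as follows, with µ and ω kept as plain dicts (representation ours):

```python
import random

def choose_interpretation(v2, W, omega, mu):
    """Return an interpretation for received word v2, or None on failure."""
    if v2 in mu:                       # committed earlier in this interaction
        return mu[v2] if mu[v2] in W else None
    if not W:                          # no possible interpretation left
        return None
    best = max(omega[v2][v] for v in W)
    mu[v2] = random.choice([v for v in W if omega[v2][v] == best])
    return mu[v2]
```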

4 Learning Alignments from Interactions

To learn an alignment from the experience of interacting, agents make the following well-behaviour assumptions about their interlocutor. Essentially, these assumptions imply that the dynamics of the interaction described in the previous section are respected.

1. Compliance: An agent will not utter a message that violates the constraints, or that makes it impossible to finish the interaction successfully within the steps determined by the bound (or in finitely many steps if there is no bound).

2. Non-abandonment: An agent will not intentionally finish the conversation unless the constraints are fulfilled.

Agents learn when they decide how to interpret a received word. The overall method is simple: if a1 receives v2, it updates the value of ω(v2, v1) for all words v1 in a set U ⊆ V1. Unlike the set of possible interpretations W used to decide the mapping, U can contain interpretations that are not possible messages at that particular moment, and the values in the interpretation distribution are updated accordingly.

A first decision is how to choose the set U. Since checking satisfiability is computationally expensive, this choice can considerably impact the overall performance of the methods. A first option is to update the value of all possible words (U = V1), which can be slow for large vocabularies. Another option is to use only the options that are considered until an interpretation is chosen, that is, all the words with a higher value than the first possible interpretation: if v1 is chosen as an interpretation for v2, then U = {v ∈ V1 such that ω(v2, v) > ω(v2, v1)}. A third possibility is a midpoint between these two, in which U = {v ∈ V1 such that ω(v2, v) ≥ ω(v2, v1)}. This option also updates the words that have the same value as the chosen interpretation. If the distribution is uniform when the learning starts, this updates many possible interpretations in the beginning and fewer when there is more information. This is the approach we choose, and the one used for the evaluation.

We present two methods to update the values in the interpretation distribution. The first is a general technique that only takes into account whether messages are possible. This method is not exclusively designed for protocols that use LTL, and can be used with constraints specified in any logic for which agents have a satisfiability procedure. In the second method we show how the particular semantics of the protocols we introduced can be taken into account to improve the learning process.

4.1 Simple Strategy

The first approach consists in updating the values of interpretations for a received message according to whether it is coherent with the protocol and the interaction performed so far. When a new message v2 is received, the values for all its interpretations are initialized as described in the last section. We use a simple learning method in which updates are a punishment or reward proportional to the value being updated. We chose this method because it incorporates previous alignments naturally, it is very simple to formulate, and the proportion updated can be manipulated easily to consider different situations, something that will be relevant for the second technique. However, it is not necessarily the most efficient choice, and other methods, such as one that records a history, can converge faster. Consider a reward and a punishment parameter rr, rp ∈ [0, 1]. We use the notion of possible messages in Definition 6 and update the values for v1 ∈ U as follows:

$$\omega(v_2, v_1) := \begin{cases} \omega(v_2, v_1) + r_r \cdot \omega(v_2, v_1) & \text{if } \langle a_2, v_1\rangle \text{ is possible} \\ \omega(v_2, v_1) - r_p \cdot \omega(v_2, v_1) & \text{otherwise} \end{cases}$$

After all the updates for a given message are made, the values are normalised in such a way that $\sum_{v \in V_1} \omega(v_2, v) = 1$. Either simple sum-based or softmax normalization can be used for this.
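A sketch of the whole simple-strategy update, using sum-based normalization and the parameter values rr = rp = 0.3 reported later in the evaluation:

```python
def simple_update(v2, U, possible, omega, rr=0.3, rp=0.3):
    """Reward interpretations in U that are possible, punish the rest."""
    for v1 in U:
        if v1 in possible:             # (a2, v1) is a possible message now
            omega[v2][v1] += rr * omega[v2][v1]
        else:
            omega[v2][v1] -= rp * omega[v2][v1]
    z = sum(omega[v2].values())        # renormalize so the weights sum to 1
    for v1 in omega[v2]:
        omega[v2][v1] /= z
```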

4.2 Reasoning Strategy

The simple strategy does not make use of some easily available information, namely which constraints were violated when a message is considered impossible. To illustrate the importance of this information, consider an interaction in which the Customer said water but the Waiter interpreted it as vino. If the Customer then says beer, the Waiter may think that interpreting it as birra is impossible, since ordering two alcoholic beverages is not allowed (!correlation(birra, vino) would be violated). However, this is only due to misunderstanding water in the first place. In the reasoning approach, agents make use of the semantics of violated rules to perform a more fine-grained update of the values. Before presenting the method, we need to divide constraints into two kinds, which determine when agents can learn from them.

Definition 7 A constraint c is semantically non-monotonic if there exist interactions i and i′ such that i is a prefix of i′, and i |= c but i′ ̸|= c. A constraint c is semantically monotonic if this cannot happen, i.e., if i |= c implies that all extensions of i are also models of c.

The following proposition divides the constraints in our protocols into monotonic and non-monotonic ones.

Proposition 1 The constraints existence, correlation and response are semantically monotonic. All the other constraints defined are semantically non-monotonic.

7

Page 8: Paula Chocron arXiv:1703.02367v1 [cs.MA] 7 Mar 2017 · Vocabulary Alignment in Openly Speci ed Interactions Paula Chocron1,2 and Marco Schorlemmer1 1Arti cial Intelligence Research

Violated constraint → Action

- No violated constraint → ω(v2, v1) := ω(v2, v1) − rp · ω(v2, v1)

- absence(n, ⟨a2, v1⟩) → ω(v2, v1) := 0

- !correlation(⟨a2, v1⟩, ⟨a2, v′1⟩), !response(⟨a2, v1⟩, ⟨a2, v′1⟩), !before(⟨a2, v1⟩, ⟨a2, v′1⟩), !premise(⟨a2, v1⟩, ⟨a2, v′1⟩), !imm after(⟨a2, v1⟩, ⟨a2, v′1⟩) → ω(v2, v1) := ω(v2, v1) − rp · ω(µ⁻¹(v′1), v′1) and ω(µ⁻¹(v′1), v′1) := ω(µ⁻¹(v′1), v′1) − rp · ω(v2, v1)

- premise(⟨a2, v1⟩, ⟨a2, v′1⟩), imm after(⟨a2, v′1⟩, ⟨a2, v1⟩) → ω(v2, v1) := ω(v2, v1) − rp · ω(v2, v1)

- before(⟨a2, v′1⟩, ⟨a2, v1⟩) → ω(v2, v1) := ω(v2, v1) − rp · ω(v2, v1)

- relation(⟨a1, v1⟩, ⟨a2, v′1⟩), i.e., a constraint involving a message sent by a1, with candidate interpretation v′1 → ω(v2, v′1) := ω(v2, v′1) − rp · ω(v2, v′1)

Table 2: Updates for each violated constraint when a message is received

Proof 1 To prove non-monotonicity, it is enough to show an example that violates the constraint. For instance, consider before(m, m′) and suppose an interaction i such that m ∉ i and m′ ∉ i. Then i |= before(m, m′) but i . m′ ̸|= before(m, m′). Monotonicity can be proven by contraposition, showing that for a constraint c, if i . m ̸|= c then necessarily i ̸|= c. For example, suppose i . m violates existence(n, m′), and let #(m′, i) and #(m′, i . m) be the number of occurrences of m′ in i and i . m respectively. Whether m = m′ or m ≠ m′, we have #(m′, i . m) ≥ #(m′, i), so if #(m′, i . m) < n then necessarily #(m′, i) < n.

Non-monotonic constraints can be used to learn while interacting, using the principle of compliance (if a constraint is violated, something must be wrong in the interpretations). Monotonic constraints could be used when the interaction ends, using the principle of non-abandonment (if not all constraints are satisfied, there must be an error). However, since our agents cannot communicate in any other way, they ignore why the interaction ended, and therefore they do not know whether their interlocutor considers the constraints satisfied, which makes it very difficult to decide how to update values. In this work we focus on handling non-monotonic constraints, leaving monotonic ones for future work in which an ending signal is introduced. To start, let us define formally when a non-monotonic constraint is violated.

Definition 8 Given an interaction i and an interpretation v1 for a received message, a constraint c is violated if i |= c but i . ⟨a2, v1⟩ ̸|= c.

If agent a1 decides that v1 is not a possible interpretation for v2, let Viol be the set of all c ∈ C1 that are violated by v1. It is possible that Viol = ∅. This can happen when no constraint is violated but a subset of constraints becomes unsatisfiable: for example, if a protocol includes the rules {!response(⟨C, wine⟩, ⟨W, size⟩), existence(1, ⟨W, size⟩)} and the waiter has not yet said size, the customer cannot say wine even though Viol would be empty. Another situation in which a message can be impossible without any constraint being violated is when it makes the interaction impossible to finish successfully within the number of utterances indicated by the bound, or in finite time if there is no bound.

To explain the update in the reasoning strategy, consider again a reward and a punishment parameter rr, rp ∈ [0, 1]. If an interpretation v1 is possible for a received message v2, then ω(v2, v1) := ω(v2, v1) + rr · ω(v2, v1). If it is not, it is updated as indicated in Table 2, according to which rule was violated. These actions are performed iteratively for each broken rule. After the updates, all values are normalized to obtain $\sum_{v \in V_1} \omega(v_2, v) = 1$. In this case, we need a normalization method that maintains the values that are 0, so softmax is not a possibility.

Let us explain the motivation behind each update. Violating the existential non-monotonic constraint is different from violating a relational constraint, because it expresses a condition over only one action; if the chosen interpretation violates it, it must be wrong. For this reason, the value of the mapping is set to 0.

Negation constraints express a relation between two actions; if one is broken, necessarily both actions happened in the interaction, and the violation could be due to either of them being misinterpreted. To take this into account, we propose an update inspired by Jeffrey's rule of conditioning [17], which modifies the Bayes rule of conditional probability for cases where evidence is uncertain. If Q is a probability distribution for a partition E1 . . . En of possible evidence, then Jeffrey's rule states that the posterior probability of A is computed as:

$$Q(A) = \sum_{i=1}^{n} P(A \mid E_i) \cdot Q(E_i)$$

Back to our problem, let v2 be a received message such that µ(v2) = v1. Suppose a1 receives a new message v′2, and it discovers that mapping it to v′1 violates a constraint relation(v1, v′1). This means that P(α(v2) = v1 ∧ α(v′2) = v′1) = 0, and therefore P(α(v2) = v1 | α(v′2) = v′1) = 0. Then, if Pos is the set of possible mappings, we can write:

$$\omega(v_2, v_1) := \omega(v_2, v_1) \cdot \sum_{v'_1 \in Pos} \omega(v'_2, v'_1)$$

or, equivalently, since $\sum_{v \in V_1} \omega(v'_2, v) = 1$, with Impos the set of incompatible mappings:

$$\omega(v_2, v_1) := \omega(v_2, v_1) - \sum_{v'_1 \in Impos} \omega(v'_2, v'_1)$$

We implement this by subtracting from each mapping a value that is proportional to the confidence in the other mapping. Since agents do not explore all the possible interpretations, the value is not computed exactly, but approximated considering only the interpretations that have the most confidence.
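The mutual update of Table 2 for a violated negation constraint can be sketched as follows (the clipping at zero is our addition, to keep weights non-negative):

```python
def punish_pair(v2, v1, prev_v2, prev_v1, omega, rp):
    """Punish each of two conflicting mappings in proportion to the other.

    (v2 -> v1) is the candidate interpretation and (prev_v2 -> prev_v1) the
    earlier mapping implicated in the violated constraint; this approximates
    the Jeffrey-style subtraction described above.
    """
    w_cand, w_prev = omega[v2][v1], omega[prev_v2][prev_v1]
    omega[v2][v1] = max(0.0, w_cand - rp * w_prev)
    omega[prev_v2][prev_v1] = max(0.0, w_prev - rp * w_cand)
```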

When an interpretation depends on the interpretations of many other mappings, there are too many possibilities, which makes it difficult to reason about which one may be wrong. In these cases, the agents use a default punishment rp. This occurs when the positive versions of premise and imm after are violated, when Viol = ∅, and when a violated constraint depends on a message that a1 sent, since a1 has no information about how its messages were interpreted by a2.

Lastly, let us discuss the value of rp. Choosing rp = 1/|V1|, the technique has the effect of subtracting a larger value when the agent is confident in the correctness of the previous interpretation. This is the value that we use.

5 Experimental Evaluation

In this section we present the results of experiments that show how the methods we propose perform in practice when used by agents with different vocabularies. Before presenting the results, let us discuss the generation of the experimental data.

[Figure 2: Results for a vocabulary of size 10]

Due to the lack of existing datasets, we performed all the experimentation with randomly generated data. We created protocols with different sizes of vocabulary and constraint sets. To generate a protocol from a vocabulary, we randomly chose constraints and added each one to the protocol if the protocol remained satisfiable with the new rule. In the implementation, we express constraints over words instead of over messages, so a constraint is valid for any combination of senders. We used bounds in our experimentation to limit the possible constraints and to eventually finish the interactions between agents; in all cases the bound equaled the vocabulary size. We created pairs of compatible protocols P1, P2 with vocabularies V1 and V2 respectively by building a bijective alignment α : V2 → V1 and then using it to translate the constraints in a protocol P2, obtaining P1. We used the NuSMV model checker [6] to perform all the necessary satisfiability checks.
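The generation loop can be sketched as follows; random_constraint and satisfiable are hypothetical helpers, the latter standing in for a call to NuSMV:

```python
def random_protocol(vocabulary, n_constraints, bound, random_constraint, satisfiable):
    """Grow a constraint set, keeping each new rule only if still satisfiable."""
    constraints = []
    while len(constraints) < n_constraints:
        c = random_constraint(vocabulary)           # hypothetical helper
        if satisfiable(constraints + [c], bound):   # e.g. a NuSMV wrapper
            constraints.append(c)
    return (constraints, bound)
```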

To evaluate how agents learn α from the experience of interacting, we first need to define a measure of how well an agent knows an alignment. Since agents always choose the possible mapping with highest weight, we can easily extract from the interpretation distribution an alignment that represents the first interpretation choice for each foreign word in the following interaction. This alignment, which we call αi for agent ai, maps each foreign word to one of its mappings with highest weight. Formally, for a1, the domain of α1 is the set of v2 ∈ V2 such that ω(v2, v) is defined for all v ∈ V1, and α1(v2) = rnd(argmax_{v ∈ V1} ω(v2, v)). Since there can be multiple possibilities with the same value, α1(v2) takes a random word among them.

To compare this alignment with α, we used the standard precision and recall measures and their harmonic mean, commonly known as F-score. To use the standard definitions directly, we only need to consider alignments as relations instead of functions: given an alignment α : V2 → V1, the corresponding relation is the set of pairs ⟨v2, v1⟩ such that α(v2) = v1.

Definition 9 Given two alignments β and γ expressed as relations, the precision of β with respect to γ is the fraction of the mappings in β that are also in γ:

$$precision(\beta, \gamma) = \frac{|\beta \cap \gamma|}{|\beta|}$$

while its recall is the fraction of the mappings in γ that were found by β:

$$recall(\beta, \gamma) = \frac{|\beta \cap \gamma|}{|\gamma|}$$

Given an alignment β and a reference alignment γ, the F-score of the alignment is computed as follows:

$$F\text{-}score(\beta, \gamma) = 2 \cdot \frac{precision(\beta, \gamma) \cdot recall(\beta, \gamma)}{precision(\beta, \gamma) + recall(\beta, \gamma)}$$
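Over alignments represented as sets of pairs, the measure is straightforward to compute; a small sketch:

```python
def f_score(beta, gamma):
    """Definition 9 over alignments given as sets of (v2, v1) pairs."""
    hits = len(beta & gamma)
    if hits == 0:
        return 0.0
    precision, recall = hits / len(beta), hits / len(gamma)
    return 2 * precision * recall / (precision + recall)
```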

We performed experiments parametrized by a protocol and vocabulary size. In all experiments, agents are sequentially given pairs of compatible protocols, and after each interaction we measure the F-score of their alignments with respect to α. The same protocols can appear repeatedly in the sequence, but we assume that new ones always eventually appear, which implies that at some point the meaning of all words is determined. Our agents also used the NuSMV model checker both for checking satisfiability and for finding violated constraints. We first intended to perform experiments with the same vocabulary sizes used for testing in [1], namely 5, 10, 15, 40, 80. However, the interactions for size 80 were too slow to perform a reasonable number of repetitions for each technique. This problem is intrinsic to the protocols we use (since agents have to decide if the messages they want to send are possible), and should be taken into account in future work. Since varying the vocabulary size did not provide particularly interesting results, we show here the experiments for a vocabulary of 10 words and four different protocol sizes. We used punishment and reward parameters rp, rr = 0.3 for the simple strategy, which were best in a preliminary test. Each experiment was repeated 10 times.

In a first experiment, we let agents interact 200 times and measured how they learned the alignment for different protocol sizes, as shown in Figure 2. The F-score is computed by averaging the F-scores of the agents that participate in the interaction. The curves show that the smart strategy is always better than the simple one, but the difference is more dramatic when the protocols are smaller. This is because there is less information, so using it intelligently makes a great difference. The curves are sharp in the beginning, when agents do not have much information about α and make many mistakes from which they learn fast. When agents reach a reasonable level of precision and recall, mistakes become less frequent, and therefore the pace of learning slows considerably. Although this affects the convergence to an F-score of 1.0, it also means that, after a certain number of interactions, agents have learned enough to communicate successfully most of the time. The sparse mistakes that they make in future conversations make them learn the remaining mappings more slowly. Table 3 shows how fast agents reach a reasonable F-score level, which we set at 0.8 based on [10].

Protocol size    6     8     10    12
simple           -     187   101   47
smart            139   87    57    33

Table 3: Interactions needed to reach an F-score of 0.8

In a second experiment, we studied the performance of our methods when used to repair existing alignments. To this aim, we created alignments with different values of precision and recall with respect to α, which we divided into three categories: low quality alignments had precision and recall 0.2, medium quality 0.5, and high quality 0.8. We only studied alignments with the same value of precision and recall, to reduce the number of combinations and to have a clear division into quality levels. We made agents with these alignments interact, using both the simple and the smart technique. The results can be seen in Figure 3; the number next to each agent class in the legend is the precision and recall of the alignment given to that agent. Again, the reasoning strategy performs particularly better when only little information is available; in this case, the repair is much better than with the simple strategy for alignments of low quality.

Figure 3: Results for different alignment qualities


6 Related Work

As we already mentioned, a large portion of the existing work that tackles the problem of vocabulary alignment for multi-agent communication consists of techniques that let agents discuss and agree on a common vocabulary or alignment. Some of these approaches use argumentation or negotiation techniques [15, 12], performing an offline negotiation that takes place before actual interactions can start. Other approaches, such as the three-level schema developed by van Diggelen [8], discuss the meaning of words on an on-demand basis, that is, only when it is necessary to use them to perform some task.

Other approaches use different kinds of grounding to learn alignments or the meaning of words from interactions. The well-established research by Steels [20] considers a situation in which learners share a physical environment and have the ability to point at things in order to communicate whether they are talking about the same things. Goldman et al. [11] investigated how agents can learn to communicate in a way that maximises rewards in an environment that can be modelled as a Markov Decision Process. In [4], the authors study a version of the multiagent, multiarmed bandit problem in which agents can communicate with each other in a common language, but message interpretations are not known. Closer to our approach is the work by Euzenat on cultural alignment repair [9], and particularly the already mentioned work by Atencia and Schorlemmer [1], where the authors consider agents communicating with interaction protocols represented as finite state automata, and use a shared notion of task success as grounding. Our work, instead, considers only the coherence of the interaction as grounding. To the best of our knowledge, the work presented in this paper is the first approach in which agents learn languages dynamically from temporal interaction rules, taking into account only the coherence of the utterances.

A very well studied problem consists of learning different structures from experience, such as grammars [7] or norms [16]. Our approach can be seen as the reverse of these approaches: some structure is considered shared and used to learn from.

7 Conclusions and Future Work

The techniques we propose allow agents to learn alignments between their vocabularies only by interacting, assuming that their protocols share the same set of models, without requiring any common meta-language or procedure. The assumption of sharing the models can be removed, but the learning will be slower because an extra component of uncertainty is added. Although our methods converge slowly, they use only very general information, and they can easily be combined with other methods that provide more information about possible mappings, as we showed with the integration of previous alignments. Our agents learn dynamically while interacting, and the techniques quickly achieve a reasonable level of knowledge of the alignment. Therefore, a possible way of using our techniques would be to go first through a training phase in which agents learn many mappings fast, and then start performing the real interactions while continuing to learn the remaining mappings. The technique that uses the particular semantics of the protocols, as expected, improves the learning; such techniques can be developed specifically for each kind of protocol. Agents do not need to use the same techniques or even the same logic, as long as they share the models.

There exist many possible directions of research derived from this work. First, we could relax the assumption that agents do not share any meta-language, considering agents that can in some way exchange information about the interaction. For example, assuming only that agents can communicate whether they finished a task successfully would make it possible to reason about monotonic rules when an interaction ends. Such an approach would relate our work to the one by Santos [15] on dialogues for meaning negotiation. Another possible extension consists in considering agents that prioritize language learning over performing the task. In this case, agents would have to decide which utterances are most convenient to make in order to obtain more information about the alignment. An aspect that should be improved in future work is the performance in terms of runtime per interaction, since it was very slow for larger vocabularies.

References

[1] M. Atencia and M. Schorlemmer. An interaction-based approach to semantic alignment. Journal of Web Semantics, 12-13:131–147, 2012.

[2] M. Baldoni, C. Baroglio, and E. Marengo. Behavior-oriented commitment-based protocols. In ECAI 2010 - 19th European Conference on Artificial Intelligence, Lisbon, Portugal, August 16-20, 2010, Proceedings, pages 137–142, 2010.

[3] M. Baldoni, C. Baroglio, E. Marengo, and V. Patti. Constitutive and regulative specifications of commitment protocols: A decoupled approach (extended abstract). In Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, IJCAI 2015, Buenos Aires, Argentina, July 25-31, 2015, pages 4143–4147, 2015.

[4] S. Barrett, N. Agmon, N. Hazon, S. Kraus, and P. Stone. Communicating with unknown teammates. In T. Schaub, G. Friedrich, and B. O'Sullivan, editors, ECAI 2014 - 21st European Conference on Artificial Intelligence, 18-22 August 2014, Prague, Czech Republic - Including Prestigious Applications of Intelligent Systems (PAIS 2014), volume 263 of Frontiers in Artificial Intelligence and Applications, pages 45–50. IOS Press, 2014.

[5] P. Chocron and M. Schorlemmer. Attuning ontology alignments to semantically heterogeneous multi-agent interactions. In ECAI 2016 - 22nd European Conference on Artificial Intelligence, The Hague, The Netherlands, pages 871–879, 2016.

[6] A. Cimatti, E. Clarke, E. Giunchiglia, F. Giunchiglia, M. Pistore, M. Roveri, R. Sebastiani, and A. Tacchella. NuSMV 2: An opensource tool for symbolic model checking. In International Conference on Computer Aided Verification, pages 359–364. Springer, 2002.

[7] C. de la Higuera. Grammatical Inference: Learning Automata and Grammars. Cambridge University Press, New York, NY, USA, 2010.

[8] J. van Diggelen, R. Beun, F. Dignum, R. M. van Eijk, and J. J. Meyer. Ontology negotiation: Goals, requirements and implementation. Int. J. Agent-Oriented Softw. Eng., 1(1):63–90, Apr. 2007.

[9] J. Euzenat. First experiments in cultural alignment repair. In Semantic Web: ESWC 2014 Satellite Events, volume 8798, pages 115–130, 2014.

[10] J. Euzenat, C. Meilicke, H. Stuckenschmidt, P. Shvaiko, and C. Trojahn. Ontology alignment evaluation initiative: Six years of experience. Journal on Data Semantics XV, pages 158–192, 2011.

[11] C. V. Goldman, M. Allen, and S. Zilberstein. Learning to communicate in a decentralized environment. Autonomous Agents and Multi-Agent Systems, 15(1):47–90, Aug. 2007.

[12] L. Laera, I. Blacoe, V. A. M. Tamma, T. R. Payne, J. Euzenat, and T. J. M. Bench-Capon. Argumentation over ontology correspondences in MAS. In E. H. Durfee, M. Yokoo, M. N. Huhns, and O. Shehory, editors, 6th International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS 2007), Honolulu, Hawaii, USA, May 14-18, 2007, page 228, 2007.

[13] M. Montali. Specification and Verification of Declarative Open Interaction Models: A Logic-Based Approach, volume 56 of Lecture Notes in Business Information Processing. Springer-Verlag New York Inc, 2010.

[14] M. Pesic and W. M. P. van der Aalst. A declarative approach for flexible business processes management. In Proceedings of the International Conference on Business Process Management Workshops, BPM, pages 169–180, Berlin, Heidelberg, 2006. Springer-Verlag.

[15] G. Santos, V. Tamma, T. R. Payne, and F. Grasso. A dialogue protocol to support meaning negotiation (extended abstract). In Proceedings of the 2016 International Conference on Autonomous Agents and Multiagent Systems, AAMAS '16, pages 1367–1368, Richland, SC, 2016. International Foundation for Autonomous Agents and Multiagent Systems.

[16] S. Sen and S. Airiau. Emergence of norms through social learning. In IJCAI, pages 1507–1512, 2007.

[17] G. Shafer. Jeffrey's rule of conditioning. Philosophy of Science, 48(3):337–362, 1981.

[18] N. Silva and P. Maio. An approach to ontology mapping negotiation. In Proceedings of the Workshop on Integrating Ontologies, pages 54–60, 2005.

[19] M. P. Singh. A social semantics for agent communication languages. In Issues in Agent Communication, pages 31–45, London, UK, 2000. Springer-Verlag.

[20] L. Steels. The origins of ontologies and communication conventions in multi-agent systems. Autonomous Agents and Multi-Agent Systems, 1(2):169–194, Oct. 1998.
