Link Discovery Tutorial Part II: Accuracy

81
Link Discovery Tutorial Part II: Accuracy Axel-Cyrille Ngonga Ngomo (1) , Irini Fundulaki (2) , Mohamed Ahmed Sherif (1) (1) Institute for Applied Informatics, Germany (2) FORTH, Greece October 18th, 2016 Kobe, Japan Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 1 / 54

Transcript of Link Discovery Tutorial Part II: Accuracy

Link Discovery TutorialPart II: Accuracy

Axel-Cyrille Ngonga Ngomo(1), Irini Fundulaki(2), Mohamed Ahmed Sherif(1)

(1) Institute for Applied Informatics, Germany(2) FORTH, Greece

October 18th, 2016Kobe, Japan

Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 1 / 54

Table of Contents

1 Introduction

2 Raven

3 Eagle

4 Coala

5 Summary and Conclusion

Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 2 / 54

Table of Contents

1 Introduction

2 Raven

3 Eagle

4 Coala

5 Summary and Conclusion

Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 3 / 54

IntroductionLink Discovery as Classification Task

Definition (Declarative Link Discovery)Given sets S and T of resources and relation RFind M = (s, t) ∈ S × T : R(s, t)Here, find M ′ = (s, t) ∈ S × T : δ(s, t) ≥ τ

Definition (Classification perspective)Given sets S and T of resources and relation RFind M = (s, t) ∈ S × T : C(s, t) = +1Here, C(s, t) = +1↔ σ(s, t) ≥ θ

Classical machine learning problem [Ngo+11; NL12]Dedicated techniques perform betterUnsupervised, active and unsupervised techniques possible

Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 4 / 54

IntroductionLink Discovery as Classification Task

Definition (Declarative Link Discovery)Given sets S and T of resources and relation RFind M = (s, t) ∈ S × T : R(s, t)Here, find M ′ = (s, t) ∈ S × T : δ(s, t) ≥ τ

Definition (Classification perspective)Given sets S and T of resources and relation RFind M = (s, t) ∈ S × T : C(s, t) = +1Here, C(s, t) = +1↔ σ(s, t) ≥ θ

Classical machine learning problem [Ngo+11; NL12]Dedicated techniques perform betterUnsupervised, active and unsupervised techniques possible

Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 4 / 54

IntroductionLink Discovery as Classification Task

Definition (Declarative Link Discovery)Given sets S and T of resources and relation RFind M = (s, t) ∈ S × T : R(s, t)Here, find M ′ = (s, t) ∈ S × T : δ(s, t) ≥ τ

Definition (Classification perspective)Given sets S and T of resources and relation RFind M = (s, t) ∈ S × T : C(s, t) = +1Here, C(s, t) = +1↔ σ(s, t) ≥ θ

Classical machine learning problem [Ngo+11; NL12]Dedicated techniques perform betterUnsupervised, active and unsupervised techniques possible

Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 4 / 54

IntroductionChallenge

Challenges1 Creation of labeled training data tedious2 Need automated means for automatic class and property matching3 Need for efficient execution of link specifications4 Dedicated machine learning approaches necessary

Solutions1 Use active learning approach for link discovery2 Rely on hospital/resident algorithm3 See previous section4 Topic of this section

Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 5 / 54

IntroductionChallenge

Challenges1 Creation of labeled training data tedious2 Need automated means for automatic class and property matching3 Need for efficient execution of link specifications4 Dedicated machine learning approaches necessary

Solutions1 Use active learning approach for link discovery2 Rely on hospital/resident algorithm3 See previous section4 Topic of this section

Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 5 / 54

Table of Contents

1 Introduction

2 Raven

3 Eagle

4 Coala

5 Summary and Conclusion

Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 6 / 54

RAVENApproach

Definition (Classification perspective)Given sets S and T of resources and relation RFind M = (s, t) ∈ S × T : C(s, t) = +1Here, C(s, t) = +1↔ σ(s, t) ≥ θ

Learning classifier C involves learning1 Two sets of restrictions that specify the sets S resp. T,2 the components σ1 . . . σn of a complex similarity measure σ3 a set of thresholds θ1, ..., θn for σ1, . . . , σn

AssumptionsRestrictions are class restrictionsClassifier shape is given (e.g., linear combination)

Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 7 / 54

RAVENApproach

Definition (Classification perspective)Given sets S and T of resources and relation RFind M = (s, t) ∈ S × T : C(s, t) = +1Here, C(s, t) = +1↔ σ(s, t) ≥ θ

Learning classifier C involves learning1 Two sets of restrictions that specify the sets S resp. T,2 the components σ1 . . . σn of a complex similarity measure σ3 a set of thresholds θ1, ..., θn for σ1, . . . , σn

AssumptionsRestrictions are class restrictionsClassifier shape is given (e.g., linear combination)

Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 7 / 54

RAVENApproach

Definition (Classification perspective)Given sets S and T of resources and relation RFind M = (s, t) ∈ S × T : C(s, t) = +1Here, C(s, t) = +1↔ σ(s, t) ≥ θ

Learning classifier C involves learning1 Two sets of restrictions that specify the sets S resp. T,2 the components σ1 . . . σn of a complex similarity measure σ3 a set of thresholds θ1, ..., θn for σ1, . . . , σn

AssumptionsRestrictions are class restrictionsClassifier shape is given (e.g., linear combination)

Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 7 / 54

RAVENApproach

Class and Property RestrictionsDefine class similarity functionSolve corresponding hospital-resident problemBased on extension of stable marriage problem

Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 8 / 54

RAVENApproach

Class and Property RestrictionsDefine class similarity functionSolve corresponding hospital-resident problemBased on extension of stable marriage problem

Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 8 / 54

RAVENApproach

Class and Property RestrictionsDefine class similarity functionSolve corresponding hospital-resident problemBased on extension of stable marriage problem

Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 9 / 54

RAVENApproach

Class and Property RestrictionsDefine class similarity functionSolve corresponding hospital-resident problemBased on extension of stable marriage problem

Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 10 / 54

RAVENApproach

Class and Property RestrictionsDefine class similarity functionSolve corresponding hospital-resident problemBased on extension of stable marriage problem

Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 11 / 54

RAVENApproach

Class RestrictionsSimilarity function

String similarityNumber of shared property values amongst instances. . .

Solve corresponding hospital-resident problem

Source Target S TDrugbank Disesome Targets GenesSider Diseasome Side-Effect Diseases

DBpedia Dailymed Organization OrganizationSider Dailymed Drugs Offer

Drugbank DBpedia Targets ProteinProperty mapping similarLeads to σ1 . . . σn

Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 12 / 54

RAVENApproach

Class RestrictionsSimilarity function

String similarityNumber of shared property values amongst instances. . .

Solve corresponding hospital-resident problem

Source Target S TDrugbank Disesome Targets GenesSider Diseasome Side-Effect Diseases

DBpedia Dailymed Organization OrganizationSider Dailymed Drugs Offer

Drugbank DBpedia Targets ProteinProperty mapping similarLeads to σ1 . . . σn

Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 12 / 54

RAVENApproach

Learning ThresholdActive perceptron learningBegin with educated guess, e.g., θi = 0.9Update thresholds based on most informative examples

Guess initial classifier

Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 13 / 54

RAVENApproach

Learning ThresholdActive perceptron learningBegin with educated guess, e.g., θi = 0.9Update thresholds based on most informative examples

Guess initial classifier

Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 13 / 54

RAVENApproach

Learning ThresholdActive perceptron learningBegin with educated guess, e.g., θi = 0.9Update thresholds based on most informative examples

Pick most informative examples, i.e., unclassified and closest to boundary

Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 14 / 54

RAVENApproach

Learning ThresholdActive perceptron learningBegin with educated guess, e.g., θi = 0.9Update thresholds based on most informative examples

Ask for classification from oracle

Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 15 / 54

RAVENApproach

Learning ThresholdActive perceptron learningBegin with educated guess, e.g., θi = 0.9Update thresholds based on most informative examples

Update classifier

Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 16 / 54

RAVENEvaluation

Evaluation on Diseases (Diseasome to DBpedia)Learning rate = 0.0210 questions/iterationF-measure of up to 92%

1 3 5 7 9 11 13 15 17 19 21 23 25

Number of iterations

0

10

20

30

40

50

60

70

80

90

100

P (%)R (%)F (%)

Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 17 / 54

RAVENEvaluation

Learning rate = 0.0210 questions/iteration

1 3 5 7 9 11 13 15 17 19 21 23 25

Number of iterations

10

100

1000

Run

time

(ms)

DiseasesDrugsSide Effects

Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 18 / 54

Table of Contents

1 Introduction

2 Raven

3 Eagle

4 Coala

5 Summary and Conclusion

Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 19 / 54

EagleEfficient Active Learning of Link Specifications using Genetic Programming

Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 20 / 54

EagleEfficient Active Learning of Link Specifications using Genetic Programming

EagleProvides means for automatic class and property matchingMinimizes human labeling effort through active learningAllow for learning generic specs (limitation of RAVEN)Similar approaches [NIK+12; ISE+12]

Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 21 / 54

EagleFormal Definition

Same formal setting as RAVENTwo sets of restrictions resp. that specify the sets S resp. T ,a specification of mapping properties (p1, q1), . . . , (pn, qn) for the elements ofS and T anda specification of a complex similarity measure σ as the combination of severalatomic similarity measures σ1, . . . , σn and of a set of thresholds θ1, . . . , θn suchthat θi is the threshold for σi .

Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 22 / 54

EagleLS example

Can learn generic classifier type

f (levenshtein(:title, :title), 0.53)

f (cosine(:venue, :year), 1.00)\

f (jaccard(:title, :authors), 0.43)

f (trigrams(:title, :year), 1.00)

ut

Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 23 / 54

EagleIdea & Goal

EagleIdea: Specifications are treesGoal: Learn elements of trees through genetic operations until best LS isfound

u

(m4, θ4) (m2, θ2)

p3 q3 p2 q2

Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 24 / 54

Eagle AlgorithmStep 1: Generate initial population

Random process (property pairs, thresholds)Compute fitnessFitness = F-Measure w.r.t known data

(m1, θ1)

p1 q1

(m2, θ2)

p2 q2

(m3, θ3)

p3 q3

u

(m4, θ4) (m5, θ5)

p3 q3 p2 q2

Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 25 / 54

Eagle AlgorithmStep 2: Evolve population

Tournament between two individualsTwo operators: Mutation and crossover

(m1, θ1)

p1 q1

(m3, θ3)

p2 q2

(m2, θ2)

p3 q3

u

(m4, θ4) (m5, θ5)

p3 q3 p2 q2

Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 26 / 54

Eagle AlgorithmStep 2: Evolve population

Tournament between two individualsTwo operators: Mutation and crossover

(m1, θ1)

p1 q1

(m3, θ3)

p2 q2

(m2, θ2)

p3 q3

u

(m4, θ4) (m5, θ5)

p3 q3 p2 q2

Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 26 / 54

Eagle AlgorithmStep 2: Evolve population

Tournament between two individualsTwo operators: Mutation and crossover

(m1, θ1)

p1 q1

(m3, θ3)

p2 q2

(m2, θ2)

p3 q3

u

(m4, θ4) (m5, θ5)

p3 q3 p2 q2

Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 26 / 54

Eagle AlgorithmStep 2: Evolve population

Tournament between two individualsTwo operators: Mutation and crossover

p1 q1

(m1, θ1 + α) (m3, θ3)

p2 q2

(m2, θ2)

p3 q3

u

(m4, θ4) (m5, θ5)

p3 q3 p2 q2

Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 26 / 54

Eagle AlgorithmStep 2: Evolve population

Tournament between two individualsTwo operators: Mutation and crossover

p1 q1

(m1, θ1 + α) (m3, θ3)

p2 q2

(m2, θ2)

p3 q3

u

(m4, θ4)

p3 q3

(m2, θ2)

p3 q3

Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 26 / 54

Eagle AlgorithmStep 3: Computation of most informative links

Previous approaches define amount of information of link as closeness to thedecision boundaryHere, use disagreement amongst elements of population of size n

δ((s, t)) = (n − |Mti : (s, t) ∈Mi)|)(n − |Mt

i : (s, t) /∈Mi |)

Function is maximal when n2 count (s, t) as positive and n

2 as negativeCan be modeled with other functions such as entropy

Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 27 / 54

Eagle AlgorithmStep 4: Active Learning

Compute d((s, t)) for all (s, t) returned by a LSPick k most informativeRequire labeling from userUpdate list of positive and negative examples

(m1, θ1 + α)

p1 q1

(m2, θ2)

p3 q3

(m3, θ3)

p2 q2

u

(m4, θ4) (m2, θ2)

p3 q3 p2 q2

Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 28 / 54

Eagle AlgorithmStep 5: Remove least fit elements

Fitness = F-Measure w.r.t known data

(m1, θ1 + α)

p1 q1

(m2, θ2)

p3 q3

(m3, θ3)

p2 q2

u

(m4, θ4) (m2, θ2)

p3 q3 p2 q2

If termination conditions not met, goto Step 2Else terminate and pick fittest LS

Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 29 / 54

Eagle AlgorithmStep 5: Remove least fit elements

Fitness = F-Measure w.r.t known data

(m1, θ1 + α)

p1 q1

(m2, θ2)

p3 q3

(m3, θ3)

p2 q2

u

(m4, θ4) (m2, θ2)

p3 q3 p2 q2

If termination conditions not met, goto Step 2Else terminate and pick fittest LS

Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 29 / 54

Eagle AlgorithmStep 5: Remove least fit elements

Fitness = F-Measure w.r.t known data

(m1, θ1 + α)

p1 q1

(m2, θ2)

p3 q3

(m3, θ3)

p2 q2

u

(m4, θ4) (m2, θ2)

p3 q3 p2 q2

If termination conditions not met, goto Step 2Else terminate and pick fittest LS

Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 29 / 54

Eagle AlgorithmUnsupervised learning

Measure degree of monogamy of links [NIK+12]Only works for 1-1 relations, e.g., owl:sameAs

P(M) = |s|∃t : (s, t) ∈ M|∑s|t : (s, t) ∈ M| ,R(M) = |t|∃s : (s, t) ∈ M|∑

t|s : (s, t) ∈ M| . (1)

Fβ(M) = (1 + β2) Pd(M)Rd(M)β2Pd(M) +Rd(M) (2)

s1

s2

s3

t1

t2

t3

t4

linklinklinklink

P = 3/4, R = 2/4, F = 3/5.Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 30 / 54

EagleExperiments and Results

Experimental Setup:Compared batch learning and genetic programmingUsed 3 different data sets

1 Dailymed-Drugbank (LATC)2 DBpedia-LinkedMDB (LATC)3 DBLP-ACM

Compared different sizes of population (20,100)Compared random annotation with active learningMutation and crossover rates = 0.6Maximal number of iterations = 50

Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 31 / 54

EagleExperiments and Results

Dailymed-Drugbank

Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 32 / 54

EagleExperiments and Results

DBpedia-LinkedMDB

Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 33 / 54

EagleExperiments and Results

DBLP-ACM

Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 34 / 54

EagleExperiments and Results

Larger population leads toBetter results, yetLonger runtimes

For most datasets, population size of 100 seems sufficient for most linkeddata setsEAGLE is more time-efficient than state of the art

337s for ACM-DBLP (n=100) vs.1553s for Marlin (ADTree)2196s for Marlin (SVM)4320s for Febrl (SVM)

Active learning clearly outperforms random annotation

Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 35 / 54

Table of Contents

1 Introduction

2 Raven

3 Eagle

4 Coala

5 Summary and Conclusion

Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 36 / 54

CoalaCorrelation-Aware Active Learning of Link Specifications

Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 37 / 54

CoalaLearning Complex Specifications

Supervised (mostly active, e.g., RAVEN, EAGLE, SILK)Unsupervised (e.g., KnoFuss, EUCLID, EAGLE)

Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 38 / 54

CoalaLearning Complex Specifications

Supervised (mostly active, e.g., RAVEN, EAGLE, SILK)Unsupervised (e.g., KnoFuss, EUCLID, EAGLE)

Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 38 / 54

CoalaLearning Complex Specifications

Supervised (mostly active, e.g., RAVEN, EAGLE, SILK)Unsupervised (e.g., KnoFuss, EUCLID, EAGLE)

Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 38 / 54

CoalaLearning Complex Specifications

Supervised (mostly active, e.g., RAVEN, EAGLE, SILK)Unsupervised (e.g., KnoFuss, EUCLID, EAGLE)

Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 39 / 54

CoalaLearning Complex Specifications

Supervised (mostly active, e.g., RAVEN, EAGLE, SILK)Unsupervised (e.g., KnoFuss, EUCLID, EAGLE)

Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 39 / 54

CoalaLearning Complex Specifications

Supervised (mostly active, e.g., RAVEN, EAGLE, SILK)Unsupervised (e.g., KnoFuss, EUCLID, EAGLE)

Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 39 / 54

CoalaLearning Complex Specifications

InsightChoice of right example is key for learningSo far, only use of informativeness

QuestionCan we do better by using more information?Higher F-measureOften slower

Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 40 / 54

CoalaLearning Complex Specifications

InsightChoice of right example is key for learningSo far, only use of informativeness

QuestionCan we do better by using more information?

Higher F-measureOften slower

Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 40 / 54

CoalaLearning Complex Specifications

InsightChoice of right example is key for learningSo far, only use of informativeness

QuestionCan we do better by using more information?Higher F-measureOften slower

Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 40 / 54

Coala ApproachBasic Idea

Use similarity of link candidates when selecting most informative examples

Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 41 / 54

Coala ApproachBasic Idea

Use similarity of link candidates when selecting most informative examples

Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 41 / 54

Coala ApproachBasic Idea

Use similarity of link candidates when selecting most informative examples

Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 41 / 54

CoalaSimilarity of Candidates

Link candidate x = (s, t) can be regarded as vector(σ1(x), . . . , σn(x)) ∈ [0, 1]n.Similarity of link candidates x and y :

sim(x , y) = 1

1 +√

n∑i=1

(σi(x)− σi(y))2

. (3)

Allows exploiting both intra- and inter-class similarity

Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 42 / 54

CoalaGraph Clustering

Rationale: Use intra-class similarityApproach

Cluster elements of S+ and S− independentlyChoose one element per cluster as representativePresent oracle with most informative representatives

0.8

0.9

0.8

S+

S-

0.8

0.9

0.8

0.25

0.25

0.9

0.80.8

0.8

0.25a

b

c

d

e

d

f g

hi

k

l

Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 43 / 54

CoalaBorderFlow

G = (V ,E , ω) with V = S+ or V = S−

ω(x , y) = sim(x , y)Keep best ec edges for each x ∈ V

Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 44 / 54

CoalaBorderFlow

Seed-based algorithmGoal: Maximize borderflow ratio bf (X ) = Ω(b(X),X)

Ω(b(X),n(X))

http://sourceforge.net/projects/cugar-framework/

Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 45 / 54

CoalaBorderFlow

Seed-based algorithmGoal: Maximize borderflow ratio bf (X ) = Ω(b(X),X)

Ω(b(X),n(X))

http://sourceforge.net/projects/cugar-framework/

Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 45 / 54

CoalaBorderFlow

Seed-based algorithmGoal: Maximize borderflow ratio bf (X ) = Ω(b(X),X)

Ω(b(X),n(X))

http://sourceforge.net/projects/cugar-framework/Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 45 / 54

CoalaBorderFlow

Seed-based algorithmGoal: Maximize borderflow ratio bf (X ) = Ω(b(X),X)

Ω(b(X),n(X))

http://sourceforge.net/projects/cugar-framework/

Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 46 / 54

CoalaBorderFlow

Seed-based algorithmGoal: Maximize borderflow ratio bf (X ) = Ω(b(X),X)

Ω(b(X),n(X))

http://sourceforge.net/projects/cugar-framework/Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 46 / 54

CoalaSpreading Activation

Rationale: Use both inter- and intra-class similarityApproach

M0 : mij = sim(xi , xj) with (xi , xj) ∈ (S+ ∪ S−)2

A0 : ai = ifm(xi )

At = At−1 + Mt−1At−1 (spread activation)At = At/max(At) (normalize)Mt = M r©

t−1 (weight decay)

3 iterations 3.9*10-3

0.97

0.691

0.73 1.5*10-5

3.9*10-3

1.5*10-5

3.9*10-3

S+ S-

0.8

0.80.9

0.90.25

0.5 0.5

0.25

0.5

S+ S-

Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 47 / 54

CoalaSpreading Activation

Rationale: Use both inter- and intra-class similarityApproach

M0 : mij = sim(xi , xj) with (xi , xj) ∈ (S+ ∪ S−)2

A0 : ai = ifm(xi )At = At−1 + Mt−1At−1 (spread activation)At = At/max(At) (normalize)Mt = M r©

t−1 (weight decay)

3 iterations 3.9*10-3

0.97

0.691

0.73 1.5*10-5

3.9*10-3

1.5*10-5

3.9*10-3

S+ S-

0.8

0.80.9

0.90.25

0.5 0.5

0.25

0.5

S+ S-

Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 47 / 54

CoalaSpreading Activation

Rationale: Use both inter- and intra-class similarityApproach

M0 : mij = sim(xi , xj) with (xi , xj) ∈ (S+ ∪ S−)2

A0 : ai = ifm(xi )At = At−1 + Mt−1At−1 (spread activation)At = At/max(At) (normalize)Mt = M r©

t−1 (weight decay)

3 iterations 3.9*10-3

0.97

0.691

0.73 1.5*10-5

3.9*10-3

1.5*10-5

3.9*10-3

S+ S-

0.8

0.80.9

0.90.25

0.5 0.5

0.25

0.5

S+ S-

Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 47 / 54

Coala EvaluationExperimental Setup

Used EAGLE as active learning approachMutation and crossover rate = 0.6Selection rate = 0.7Not deterministic ⇒ Ran each experiment 5 times5 queries to oracle per iteration10 iterations overall2 populations sizes: 20 and 10050 generations between iterations

Two real-world and three synthetic datasetsSingle thread of a server (JDK1.7, Ubuntu 10.0.4, AMD Opteron 2GHz,2GB/Experiment)

Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 48 / 54

Coala EvaluationParameters for WD

Ran experiments on DBLP-ACMPopulation = 20r ∈ 2, 4, 8, 16, 32

10 20 30 40 50 60 70 80 90 1000.5

0.6

0.7

0.8

0.9

1

F-sc

ore

0

200

400

600

800

1,000

runti

me in s

eco

nds

f(2) f(4) f(8) f(16) f(32)d(2) d(4) d(8) d(16) d(32)

Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 49 / 54

Coala EvaluationParameters for CL

Ran experiments on DBLP-ACMPopulation = 20ec ∈ 1, 2, 3, 4, 5

10 20 30 40 50 60 70 80 90 1000.5

0.6

0.7

0.8

0.9

1

F-sc

ore

0

500

1,000

1,500

2,000

runti

me in s

eco

nd

s

f(1) f(2) f(3) f(4) f(5)d(1) d(2) d(3) d(4) d(5)

Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 50 / 54

Coala EvaluationF-Scores

Population = 100, final valuesBetter results, yet unclear when to use WD or CL

DataSet EAGLE WD CLAbt 0.19±0.04 0.25±0.04 0.23±0.04DBLP 0.91±0.03 0.96±0.01 0.96±0.02Person1 0.86±0.02 0.89±0.01 0.81±0.18Person2 0.74±0.03 0.71±0.08 0.77±0.03Restaurant 0.89±0.0 0.86±0.02 0.89±0.0

Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 51 / 54

Table of Contents

1 Introduction

2 Raven

3 Eagle

4 Coala

5 Summary and Conclusion

Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 52 / 54

Summary and Conclusion

Large number of challenges tolearning accurate specifications

1 Reduce labeling effort⇒ Active learning

2 Learn complex specifications⇒ Genetic programming

3 Learn specifications efficienty⇒ See previous slides

Challenges include1 Determinism2 Deep learning3 Self-checking4 . . .

Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 53 / 54

Acknowledgment

This work was supported by grants from the EU H2020 Framework Programmeprovided for the project HOBBIT (GA no. 688227).

Ngonga Ngomo et al. (InfAI & FORTH) LD Tutorial: Accuracy October 17, 2016 54 / 54