Hammer & Hitzler ● KI2005 Tutorial ● Koblenz, Germany ● September 2005

Slide 1
Connectionist Knowledge Representation and Reasoning
(Part I)
Neural Networks and Structured Knowledge
SCREECH
Fodor & Pylyshyn: "What's deeply wrong with Connectionist architecture is this: Because it acknowledges neither syntactic nor semantic structure in mental representations, it perforce treats them not as a generated set but as a list." [Connectionism and Cognitive Architecture, 1988]
Our claim: state-of-the-art connectionist architectures do adequately deal with structures!
Slide 2
Tutorial Outline (Part I): Neural networks and structured knowledge
• Feedforward networks
  – The good old days: KBANN and co.
  – Useful: neurofuzzy systems, data mining pipeline
  – State of the art: structure kernels
• Recurrent networks
  – The basics: partially recurrent networks
  – Lots of theory: principled capacity and limitations
  – To do: challenges
• Recursive data structures
  – The general idea: recursive distributed representations
  – One breakthrough: recursive networks
  – Going on: towards more complex structures
Slide 4
The good old days – KBANN and co.
[Figure: a feedforward neural network computes a function f_w : ℝ^n → ℝ^o, mapping an input x to an output y. A single neuron with inputs x_1, …, x_n, weights w_1, …, w_n, and threshold θ computes σ(w^T x − θ), where σ(t) = sgd(t) = (1 + e^(−t))^(−1) is the logistic sigmoid.]
Open questions:
1. black box
2. distributed representation
3. connection to rules for symbolic I/O?
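As a concrete illustration (not taken from the slides), here is a minimal NumPy sketch of such a sigmoid neuron and a one-hidden-layer feedforward network; the layer sizes and random weights are arbitrary placeholders.

```python
import numpy as np

def sigmoid(t):
    # sgd(t) = (1 + e^(-t))^(-1), the logistic activation from the slide
    return 1.0 / (1.0 + np.exp(-t))

def neuron(x, w, theta):
    # a single neuron computes sigma(w^T x - theta)
    return sigmoid(np.dot(w, x) - theta)

def feedforward(x, W1, theta1, W2, theta2):
    # a two-layer feedforward network f_w : R^n -> R^o
    hidden = sigmoid(W1 @ x - theta1)
    return sigmoid(W2 @ hidden - theta2)

# toy example with arbitrary dimensions
rng = np.random.default_rng(0)
n, h, o = 4, 3, 2
x = rng.normal(size=n)
y = feedforward(x, rng.normal(size=(h, n)), np.zeros(h),
                rng.normal(size=(o, h)), np.zeros(o))
print(y.shape)  # (2,)
```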
Slide 5
The good old days – KBANN and co.
Knowledge Based Artificial Neural Networks [Towell/Shavlik, AI 94]:
• start with a network which represents known rules
• train using additional data
• extract a set of symbolic rules after training
Pipeline: (partial) rules → initialize network → train on data → extract (complete) rules
Slide 6
The good old days – KBANN and co.
Rule insertion: the (partial) rules are propositional acyclic Horn clauses such as
  A :- B, C, D
  E :- A
These are mapped onto an FNN with one neuron per boolean variable: the neurons B, C, D feed neuron A with weights 1, 1, 1 and threshold 2.5 (= 3 − 0.5), and neuron A feeds neuron E with weight 1 and threshold 0.5.
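A minimal sketch (my own illustration, not the original KBANN code) of how such conjunctive Horn clauses could be compiled into weights and thresholds; the clause format and function names are assumptions.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def compile_rules(rules, variables):
    """Map propositional Horn clauses {head: [body vars]} onto network weights.

    Each variable gets one neuron; a conjunctive clause with k antecedents
    gets weight 1 on each antecedent and threshold k - 0.5."""
    idx = {v: i for i, v in enumerate(variables)}
    W = np.zeros((len(variables), len(variables)))
    theta = np.zeros(len(variables))
    for head, body in rules.items():
        for b in body:
            W[idx[head], idx[b]] = 1.0
        theta[idx[head]] = len(body) - 0.5
    return W, theta, idx

# the example from the slide: A :- B, C, D and E :- A
rules = {"A": ["B", "C", "D"], "E": ["A"]}
variables = ["B", "C", "D", "A", "E"]
W, theta, idx = compile_rules(rules, variables)

# forward-propagate truth values (1.0 = true); the rules are acyclic, so a few sweeps suffice
x = np.array([1.0, 1.0, 1.0, 0.0, 0.0])   # B, C, D true
for _ in range(2):
    x = np.maximum(x, sigmoid(10 * (W @ x - theta)))  # steep sigmoid approximates boolean behavior
print(dict(zip(variables, np.round(x, 2))))  # A and E become ~1
```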
Slide 7
The good old days – KBANN and co.
Training on data: use some form of backpropagation, and add a penalty to the error, e.g. for changing the weights away from their initial rule-derived values.
1. The initial network biases the training result, but
2. there is no guarantee that the initial rules are preserved, and
3. there is no guarantee that the hidden neurons maintain their semantics.
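One way to read "penalty for changing the weights" is a quadratic regularizer towards the rule-derived initial weights; this is a hedged sketch of such a cost term, not necessarily the exact one used by KBANN.

```python
import numpy as np

def penalized_error(y_pred, y_true, w, w_init, lam=0.1):
    """Squared error plus a penalty for moving away from the initial (rule) weights."""
    data_term = np.mean((y_pred - y_true) ** 2)
    rule_term = lam * np.sum((w - w_init) ** 2)
    return data_term + rule_term
```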
Slide 8
The good old days – KBANN and co.
Rule extraction ((complete) rules):
1. There is no exact direct correspondence between a neuron and a single rule, although each neuron (and the overall mapping) can be approximated arbitrarily well by a set of rules.
2. It is NP-complete to find a minimum logical description for a trained network [Golea, AISB'96].
3. Therefore, a number of different rule extraction algorithms have been proposed, and this is still a topic of ongoing research.
Slide 9
The good old days – KBANN and co.
Two families of extraction methods:
• decompositional approach: rules are read off the individual neurons and weights of the network
• pedagogical approach: the trained network is treated as a black box mapping x to y, and rules are extracted which describe this input-output behavior
Slide 10
The good old days – KBANN and co.
Decompositional approaches:
• subset algorithm, MofN algorithm: describe single neurons by sets of active predecessors [Craven/Shavlik, 94]
• local activation functions (RBF-like) allow an approximate direct description of single neurons [Andrews/Geva, 96]
• MLP2LN biases the weights towards 0/-1/1 during training and can then extract exact rules [Duch et al., 01]
• prototype-based networks can be decomposed along relevant input dimensions by decision tree nodes [Hammer et al., 02]
Observation:
• usually some variation of if-then rules is achieved
• small rule sets are only achieved if further constraints guarantee that single weights/neurons have a meaning
• tradeoff between accuracy and size of the description
Slide 11
The good old days – KBANN and co.
Pedagogical approaches:
• extraction of conjunctive rules by extensive search [Saito/Nakano 88]
• interval propagation [Gallant 93, Thrun 95]
• extraction by minimum separation [Tickle/Diederich, 94]
• extraction of decision trees [Craven/Shavlik, 94]
• evolutionary approaches [Markovska, 05]
Observation:
• usually some variation of if-then rules is achieved
• this amounts to symbolic rule induction, with a little (or a bit more) help from a neural network
Slide 12
The good old days – KBANN and co.
What is this good for?
• Nobody uses FNNs these days
• Insertion of prior knowledge might be valuable, but efficient training algorithms allow one to substitute it by additional training data (generated via the rules)
• Validation of the network output might be valuable, but there exist alternative (good) guarantees from statistical learning theory
• If-then rules are not very interesting, since there exist good symbolic learners for propositional classification rules
However:
• Propositional rule insertion/extraction is often an essential part of more complex rule insertion/extraction mechanisms
• It demonstrates a key problem, the different modes of representation, in a very nice way
• Some people, e.g. in the medical domain, also want an explanation for a classification
• There are at least two application domains where if-then rules are very interesting and not so easy to learn: fuzzy control and unsupervised data mining
Slide 14
Useful: neurofuzzy systems
[Figure: a control loop in which the controller observes the process and feeds a control signal back as process input.]
Fuzzy control:
if (observation ∈ FM_I) then (control ∈ FM_O)
where FM_I and FM_O are fuzzy sets over the observation space and the control space, respectively.
Slide 15
Useful: neurofuzzy systems
Fuzzy control:
if (observation ∈ FM_I) then (control ∈ FM_O)
Neurofuzzy control: a neural network maps observations to controls and implements the fuzzy if-then rules.
Benefit: the form of the fuzzy rules (i.e. the neural architecture) and the shape of the fuzzy sets (i.e. the neural weights) can be learned from data!
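To make the correspondence concrete, here is a small sketch (my own construction, not NEFCON or ANFIS) of a zero-order Takagi-Sugeno style controller: Gaussian membership functions play the role of the weights, and the rule list fixes the architecture; both could be fitted from data. All names and numbers are illustrative assumptions.

```python
import numpy as np

def gaussian_mf(x, center, width):
    # fuzzy membership for "observation is approximately `center`"
    return np.exp(-((x - center) ** 2) / (2 * width ** 2))

def fuzzy_control(observation, rules):
    """rules: list of (center, width, control_value) triples.
    Output: membership-weighted average of the rule consequents."""
    memberships = np.array([gaussian_mf(observation, c, w) for c, w, _ in rules])
    consequents = np.array([out for _, _, out in rules])
    return np.sum(memberships * consequents) / (np.sum(memberships) + 1e-12)

# e.g. "if temperature is low then heating is high", etc. (made-up numbers)
rules = [(15.0, 3.0, 1.0), (20.0, 3.0, 0.5), (25.0, 3.0, 0.0)]
print(fuzzy_control(18.0, rules))
```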
Slide 16
Useful: neurofuzzy systems
• NEFCON implements Mamdani control [Nauck/Klawonn/Kruse, 94]
• ANFIS implements Takagi-Sugeno control [Jang, 93]
• and many others …
• Learning
  – of rules: evolutionary or clustering
  – of fuzzy set parameters: reinforcement learning or some form of Hebbian learning
Slide 17
Useful: data mining pipeline
Task: describe given inputs (no class information) by if-then rules
• Data mining with emergent SOMs, clustering, and rule extraction [Ultsch, 91]
[Figure: pipeline from data through SOM and clustering to extracted rules.]
Slide 19
State of the art: structure kernels
[Figure: data (sets, sequences, tree structures, graph structures) is fed through a kernel k(x,x′) into an SVM.]
Idea: just compute pairwise kernel values (similarities) for this complex data using the structure information.
Slide 20
State of the art: structure kernels
• Closure properties of kernels [Haussler, Watkins]
• Principled problems for complex structures: computing informative graph kernels is at least as hard as deciding graph isomorphism [Gärtner]
• Several promising proposals; taxonomy [Gärtner], ranging from syntax to semantics:
  – count common substructures
  – derived from a probabilistic model
  – derived from local transformations
Slide 21
State of the art: structure kernels
Count common substructures, e.g. shared 2-mers of two sequences:

          GA  AG  AT
GAGAGA     3   2   0
GAT        1   0   1

kernel value = 3·1 + 2·0 + 0·1 = 3

Efficient computation: dynamic programming, suffix trees
• locality improved kernel [Sonnenburg et al.], bag of words [Joachims]
• string kernel [Lodhi et al.], spectrum kernel [Leslie et al.], word-sequence kernel [Cancedda et al.]
• convolution kernels for language [Collins/Duffy, Kashima/Koyanagi, Suzuki et al.]
• kernels for relational learning [Zelenko et al., Cumby/Roth, Gärtner et al.]
• graph kernels based on paths or subtrees [Gärtner et al., Kashima et al.]
• kernels for prolog trees based on similar symbols [Passerini/Frasconi/deRaedt]
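A minimal spectrum-kernel style sketch (my own illustration) that reproduces the 2-mer counts above by counting shared substrings:

```python
from collections import Counter

def kmer_counts(s, k=2):
    # count all overlapping substrings of length k
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))

def spectrum_kernel(s1, s2, k=2):
    # inner product of the k-mer count vectors
    c1, c2 = kmer_counts(s1, k), kmer_counts(s2, k)
    return sum(c1[m] * c2[m] for m in c1)

print(kmer_counts("GAGAGA"))             # {'GA': 3, 'AG': 2}
print(kmer_counts("GAT"))                # {'GA': 1, 'AT': 1}
print(spectrum_kernel("GAGAGA", "GAT"))  # 3
```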
Slide 22
State of the art: structure kernels
Derived from a probabilistic model: describe the data by a probabilistic model P(x), then compare characteristics of P(x).
• Fisher kernel [Jaakkola et al., Karchin et al., Pavlidis et al., Smith/Gales, Sonnenburg et al., Siolas et al.]
• tangent vector of log odds [Tsuda et al.]
• marginalized kernels [Tsuda et al., Kashima et al.]
• kernel of Gaussian models [Moreno et al., Kondor/Jebara]
Slide 23
State of the art: structure kernels
Derived from local transformations: a local neighborhood relation ("x is similar to x′") defines a generator H, which is expanded to a global kernel.
• diffusion kernel [Kondor/Lafferty, Lafferty/Lebanon, Vert/Kanehisa]
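For instance, the diffusion kernel expands a local generator H (here assumed to be the negative graph Laplacian) into a global kernel K = exp(βH) via the matrix exponential; a small sketch under that assumption:

```python
import numpy as np

def diffusion_kernel(adjacency, beta=0.5):
    """K = exp(beta * H) with H = -(D - A), the negative graph Laplacian."""
    A = np.asarray(adjacency, dtype=float)
    H = -(np.diag(A.sum(axis=1)) - A)
    # H is symmetric, so the matrix exponential can be computed via eigendecomposition
    eigvals, eigvecs = np.linalg.eigh(H)
    return eigvecs @ np.diag(np.exp(beta * eigvals)) @ eigvecs.T

# tiny path graph 0 - 1 - 2
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
K = diffusion_kernel(A)
print(np.round(K, 3))  # a symmetric, positive definite similarity matrix
```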
Slide 24
State of the art: structure kernels
• Intelligent preprocessing (kernel extraction) allows an adequate integration of semantic/syntactic structure information
• This can be combined with state-of-the-art neural methods such as the SVM
• Very promising results for
  – classification of documents and text [Duffy, Leslie, Lodhi, …]
  – detecting remote homologies of genomic sequences and further problems in genome analysis [Haussler, Sonnenburg, Vert, …]
  – quantitative structure-activity relationships in chemistry [Baldi et al.]
Slide 25
Conclusions: feedforward networks
• propositional rule insertion and extraction are possible (to some extent)
• useful for neurofuzzy systems and data mining
• structure-based kernel extraction followed by learning with an SVM yields state-of-the-art results
  but: this is a sequential rather than a fully integrated neuro-symbolic approach
• FNNs themselves are restricted to flat data which can be processed in one shot; there is no recurrence
Slide 27
The basics: partially recurrent networks
x_{t+1} = f(x_t, I_t)
[Elman, Finding structure in time, CogSci 90]: a very natural architecture for processing speech, temporal signals, control, robotics
1. can process time series of arbitrary length
2. interesting for speech processing [see e.g. Kremer, 02]
3. training uses a variation of backpropagation [see e.g. Pearlmutter, 95]
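A minimal sketch (my own, with arbitrary dimensions and random weights) of such an Elman-style recurrent step, where the context units hold the previous hidden state:

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def elman_step(x_prev, inp, W_ctx, W_in, W_out):
    """One step of x_{t+1} = f(x_t, I_t): context + input -> new state -> output."""
    x_next = np.tanh(W_ctx @ x_prev + W_in @ inp)
    y = sigmoid(W_out @ x_next)
    return x_next, y

# unroll over a sequence of arbitrary length
rng = np.random.default_rng(1)
state_dim, in_dim, out_dim = 5, 3, 2
W_ctx, W_in, W_out = (rng.normal(scale=0.3, size=s) for s in
                      [(state_dim, state_dim), (state_dim, in_dim), (out_dim, state_dim)])
x = np.zeros(state_dim)
for inp in rng.normal(size=(7, in_dim)):   # a time series of length 7
    x, y = elman_step(x, inp, W_ctx, W_in, W_out)
print(y)
```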
Slide 29
Lots of theory: principled capacity and limitations
RNNs and finite automata [Omlin/Giles, 96]
[Figure: an RNN with input, state, and output units mirrors the dynamics of the transition function of a DFA.]
Slide 30
Lots of theory: principled capacity and limitations
DFA → RNN:
• unary (one-hot) input
• unary (one-hot) state representation
• implement (approximate) the boolean formula corresponding to the state transition within a two-layer network
RNNs can exactly simulate finite automata.
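A hedged sketch of that construction (my own toy code, using hard thresholds instead of steep sigmoids; the `delta` dictionary format and the parity example are assumptions): a first layer of neurons detects each (state, symbol) pair, and a second layer ORs together the pairs that lead to the same successor state.

```python
import numpy as np

def step(state_onehot, sym_onehot, delta, n_states, n_syms):
    """One DFA transition, computed by a two-layer threshold network.

    delta[(q, a)] = q' is the transition function."""
    # layer 1: one neuron per (state, symbol) pair, active iff both inputs are active (AND)
    pair = np.zeros((n_states, n_syms))
    for q in range(n_states):
        for a in range(n_syms):
            pair[q, a] = 1.0 if state_onehot[q] + sym_onehot[a] - 1.5 > 0 else 0.0
    # layer 2: one neuron per successor state, active iff some pair maps to it (OR)
    new_state = np.zeros(n_states)
    for (q, a), q_next in delta.items():
        new_state[q_next] = max(new_state[q_next], pair[q, a])
    return new_state

# toy DFA over {0, 1} that tracks the parity of 1s (2 states)
delta = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
state = np.array([1.0, 0.0])                   # start in state 0
for bit in [1, 1, 0, 1]:
    sym = np.eye(2)[bit]
    state = step(state, sym, delta, 2, 2)
print(state)  # one-hot encoding of the DFA state after reading 1101: state 1
```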
Slide 31
Lots of theory: principled capacity and limitations
RNN → DFA:
• unary input
• in general: distributed state representation
• cluster the state space into disjoint subsets corresponding to automaton states and observe their behavior, yielding an approximate description
Approximate extraction of automaton rules is possible.
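A rough sketch of that extraction idea (my own simplification: k-means discretizes the hidden states and the observed transitions between clusters are recorded; actual extraction algorithms are more careful, e.g. resolving conflicting observations by majority vote):

```python
import numpy as np
from sklearn.cluster import KMeans

def extract_automaton(hidden_states, symbols, n_clusters=4):
    """hidden_states[t] is the RNN state after reading symbols[t].

    Returns a (cluster, symbol) -> cluster transition table observed in the run."""
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(np.asarray(hidden_states))
    transitions = {}
    for t in range(1, len(labels)):
        transitions[(labels[t - 1], symbols[t])] = labels[t]
    return transitions
```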
Slide 32
Lots of theory: principled capacity and limitations
The principled capacity of RNNs can be characterized exactly:
• RNNs with small weights or Gaussian noise = finite memory models [Hammer/Tino, Maass/Sontag]
• RNNs with limited noise = finite state automata [Omlin/Giles, Maass/Orponen]
• RNNs with rational weights = Turing machines [Siegelmann/Sontag]
• RNNs with arbitrary weights = non-uniform Boolean circuits (super-Turing capability) [Siegelmann/Sontag]
Slide 33
Lots of theory: principled capacity and limitations
However, learning might be difficult:
• gradient-based learning schemes face the problem of long-term dependencies [Bengio/Frasconi]
• RNNs are not PAC-learnable (infinite VC-dimension); only distribution-dependent bounds can be derived [Hammer]
• there exist only few general guarantees for the long-term behavior of RNNs, e.g. stability [Suykens, Steil, …]
[Figure: error curves for long repeated input sequences ("tatata…ta").]
Slide 34
Lots of theory: principled capacity and limitations
RNNs
• naturally process time series
• incorporate plausible regularization such as a bias towards finite memory models
• have sufficient power for interesting dynamics (context-free, context-sensitive; arbitrary attractors and chaotic behavior)
but:
• training is difficult
• only limited guarantees for the long-term behavior and generalization ability
Symbolic description/knowledge can provide solutions.
Slide 35
Lots of theory: principled capacity and limitations
RNN versus recurrent symbolic system:
• RNN: real numbers; iterated function systems give rise to fractals/attractors/chaos; implicit memory
• recurrent symbolic system: discrete states; crisp boolean function on the states plus explicit memory
Correspondence? E.g. attractor/repellor dynamics for counting a^n b^n c^n.
Slide 37
To do: challenges
Training RNNs:
• search for appropriate regularizations inspired by a focus on specific functionalities: architecture (e.g. local), weights (e.g. bounded), activation function (e.g. linear), cost term (e.g. additional penalties) [Hochreiter, Boden, Steil, Kremer, …]
• insertion of prior knowledge: finite automata and beyond (e.g. context-free/context-sensitive languages, specific dynamical patterns/attractors) [Omlin, Croog, …]
Long-term behavior:
• enforce appropriate constraints while training
• investigate the dynamics of RNNs: rule extraction, investigation of attractors, relating dynamics and symbolic processing [Omlin, Pasemann, Haschke, Rodriguez, Tino, …]
Slide 38
To do: challenges
Some further issues:
• processing spatial data, e.g. sequences x_1 … x_10 [bicausal networks, Pollastri et al.; contextual RCC, Micheli et al.]
• unsupervised processing of sequences x_1, x_2, x_3, x_4, … [TKM, Chappell/Taylor; RecSOM, Voegtlin; SOMSD, Sperduti et al.; MSOM, Hammer et al.; general formulation, Hammer et al.]
Slide 39
Conclusions: recurrent networks
• the capacity of RNNs is well understood and promising, e.g. for natural language processing, control, …
• the recurrence of symbolic systems has a natural counterpart in the recurrence of RNNs
• training and generalization face problems which could be solved by hybrid systems
• discrete dynamics with explicit memory versus real-valued iterated function systems
• sequences are nice, but not enough
Slide 41
The general idea: recursive distributed representations
How to turn tree structures/acyclic graphs into a connectionist representation, i.e. a vector?
Slide 42
The general idea: recursive distributed representations
Turn a tree into a vector by recursion: a network f : ℝ^i × ℝ^c × ℝ^c → ℝ^c maps a node label together with the context codes of the two subtrees to a new context code. This yields the encoding f_enc, where
  f_enc(ξ) = 0
  f_enc(a(l,r)) = f(a, f_enc(l), f_enc(r))
(ξ denotes the empty tree.)
Slide 43
The general idea: recursive distributed representations
Encoding, given f : ℝ^{n+2c} → ℝ^c (label, left context, right context → context) and g : ℝ^c → ℝ^o:
  f_enc : (ℝ^n)^{2*} → ℝ^c
  f_enc(ξ) = 0
  f_enc(a(l,r)) = f(a, f_enc(l), f_enc(r))
Decoding, given h : ℝ^o → ℝ^{n+2o} with components h_0 (label), h_1 (left), h_2 (right):
  h_dec : ℝ^o → (ℝ^n)^{2*}
  h_dec(0) = ξ
  h_dec(x) = h_0(x) ( h_dec(h_1(x)), h_dec(h_2(x)) )
Here (ℝ^n)^{2*} denotes the set of binary trees with labels in ℝ^n.
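A minimal sketch of the encoding part (my own illustration with a single tanh layer standing in for f; class names and dimensions are arbitrary):

```python
import numpy as np

class Node:
    def __init__(self, label, left=None, right=None):
        self.label, self.left, self.right = label, left, right

def f_enc(node, W, b, code_dim):
    """f_enc(xi) = 0 for the empty tree, f_enc(a(l, r)) = f(a, f_enc(l), f_enc(r))."""
    if node is None:                       # empty tree xi
        return np.zeros(code_dim)
    left = f_enc(node.left, W, b, code_dim)
    right = f_enc(node.right, W, b, code_dim)
    x = np.concatenate([node.label, left, right])
    return np.tanh(W @ x + b)              # f : R^(n+2c) -> R^c

# toy example: labels in R^2, codes in R^4
n, c = 2, 4
rng = np.random.default_rng(0)
W, b = rng.normal(scale=0.5, size=(c, n + 2 * c)), np.zeros(c)
tree = Node(np.array([1.0, 0.0]),
            Node(np.array([0.0, 1.0])),
            Node(np.array([1.0, 1.0])))
print(f_enc(tree, W, b, c))   # a fixed-size vector code for the whole tree
```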
Slide 44
The general idea: recursive distributed representations
• recursive distributed description [Hinton, 90]
  – general idea without concrete implementation
• tensor construction [Smolensky, 90]
  – encoding/decoding given by (a,b,c) ↦ a⊗b⊗c
  – increasing dimensionality
• holographic reduced representation [Plate, 95]
  – circular correlation/convolution (see the sketch after this list)
  – fixed encoding/decoding with fixed dimensionality (but potential loss of information)
  – necessity of chunking or clean-up for decoding
• binary spatter codes [Kanerva, 96]
  – binary operations, fixed dimensionality, potential loss
  – necessity of chunking or clean-up for decoding
• RAAM [Pollack, 90], LRAAM [Sperduti, 94]
  – trainable networks, trained for the identity, fixed dimensionality
  – encoding optimized for the given training set
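As one concrete instance, a holographic reduced representation binds two vectors by circular convolution and approximately unbinds by circular correlation; a small sketch of that operation (my own, using the FFT for the convolution; dimensions and seeds are arbitrary):

```python
import numpy as np

def bind(a, b):
    # circular convolution, computed via the FFT
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

def unbind(c, a):
    # circular correlation with a (the approximate inverse of binding with a)
    return np.real(np.fft.ifft(np.fft.fft(c) * np.conj(np.fft.fft(a))))

d = 512
rng = np.random.default_rng(0)
a, b = rng.normal(0, 1 / np.sqrt(d), size=(2, d))   # random HRR vectors
c = bind(a, b)
b_hat = unbind(c, a)
# clearly above 0, but below 1: b is recovered only approximately, hence the need for clean-up
print(np.dot(b_hat, b) / (np.linalg.norm(b_hat) * np.linalg.norm(b)))
```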
Slide 45
The general idea: recursive distributed representations
Nevertheless: the results are not promising.
Theorem [Hammer]:
• There exists a fixed-size neural network which can uniquely encode tree structures of arbitrary depth with discrete labels.
• For every code, decoding of all trees up to height T requires Ω(2^T) neurons for sigmoidal networks.
Encoding seems possible, but no fixed-size architecture exists for decoding.
Slide 47
One breakthrough: recursive networks
Recursive networks [Goller/Küchler, 96]:
• do not use decoding
• combine encoding and mapping
• train this combination directly for the given task with backpropagation through structure
An efficient, data- and problem-adapted encoding is learned.
[Figure: the recursive encoding network is composed with a transformation network that maps the tree code to an output y.]
Slide 48
One breakthrough: recursive networks
Applications:
• term classification [Goller, Küchler, 1996]
• automated theorem proving [Goller, 1997]
• learning tree automata [Küchler, 1998]
• QSAR/QSPR problems [Schmitt, Goller, 1998; Bianucci, Micheli, Sperduti, Starita, 2000; Vullo, Frasconi, 2003]
• logo recognition, image processing [Costa, Frasconi, Soda, 1999; Bianchini et al., 2005]
• natural language parsing [Costa, Frasconi, Sturt, Lombardo, Soda, 2000, 2005]
• document classification [Diligenti, Frasconi, Gori, 2001]
• fingerprint classification [Yao, Marcialis, Roli, Frasconi, Pontil, 2001]
• prediction of contact maps [Baldi, Frasconi, Pollastri, Vullo, 2002]
• protein secondary structure prediction [Frasconi et al., 2005]
• …
Slide 49
One breakthrough: recursive networks
Desired: approximation completeness, i.e. for every (reasonable) function f and every ε > 0 there exists a RecNN which approximates f up to ε (with an appropriate distance measure).
Approximation properties can be measured in several ways: given f, ε, a probability P, and data points x_i, find f_w such that
• P(x : |f(x) − f_w(x)| > ε) is small (L1 norm), or
• |f(x) − f_w(x)| < ε for all x (max norm), or
• f(x_i) = f_w(x_i) for all x_i (interpolation of points).
Slide 50
One breakthrough: recursive networks
Approximation properties for RecNNs and tree-structured data. RecNNs are …
• … capable of approximating every continuous function in the max-norm for restricted height, and every measurable function in the L1-norm (σ: squashing) [Hammer]
• … capable of interpolating every set {f(x_1), …, f(x_m)} with O(m²) neurons (σ: squashing, C² in a neighborhood of some t with σ''(t) ≠ 0) [Hammer]
• … able to approximate every tree automaton for arbitrarily large inputs [Küchler]
• … not able to approximate every f : {1}^{2*} → {0,1} (for realistic σ) [Hammer]
Fairly good results: 3:1.
Slide 52
Going on: towards more complex structures
More general trees: an arbitrary number of unpositioned children.
  f_enc = f( 1/|ch| · Σ_ch w · f_enc(ch) · label_edge, label )
i.e. the children's codes, weighted by the edge labels, are averaged and combined with the node label.
Approximation complete for appropriate edge labels [Bianchini et al., 2005].
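A hedged sketch of such an encoding for unordered children (my own reading of the formula above: children codes are scaled by their edge labels, averaged, and combined with the node label through a tanh layer; all names are illustrative):

```python
import numpy as np

def f_enc(label, children, W, b):
    """children: list of (edge_label, child_code) pairs; returns the node code."""
    if children:
        agg = np.mean([edge_label * code for edge_label, code in children], axis=0)
    else:
        agg = np.zeros(W.shape[0])     # leaves contribute the zero context
    return np.tanh(W @ np.concatenate([agg, label]) + b)
```

Because the children are aggregated by an average, the encoding is invariant to their order, which is exactly the point of this extension.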
Slide 53
Going on: towards more complex structures
Planar graphs [Baldi, Frasconi, …, 2002]
Slide 54
Going on: towards more complex structures
Acyclic graphs:
[Figure: contextual processing combining predecessor contexts (q^-1) with successor context (q^+1).]
Contextual cascade correlation [Micheli, Sperduti, 03]. Approximation complete (under a mild structural restriction), even for structural transduction [Hammer, Micheli, Sperduti, 05].
Slide 55
Going on: towards more complex structures
Cyclic graphs: each vertex is encoded from the codes of its neighbors [Micheli, 05]
Slide 56
Conclusions: recursive networks
• Very promising neural architectures for direct processing of tree structures
• Successful applications and mathematical background
• Connections to symbolic mechanisms (tree automata)
• Extensions to more complex structures (graphs) are under development
• Only a few approaches achieve structured outputs
Slide 58
Conclusions (Part I)
Overview literature:
• FNNs and rules: Duch, Setiono, Zurada, Computational intelligence methods for understanding of data, Proceedings of the IEEE 92(5):771-805, 2004
• Structure kernels: Gärtner, Lloyd, Flach, Kernels and distances for structured data, Machine Learning 57, 2004 (a new overview is forthcoming)
• RNNs: Hammer, Steil, Perspectives on learning with recurrent networks, in: Verleysen (ed.), ESANN'2002, D-side publications, 357-368, 2002
• RNNs and rules: Jacobsson, Rule extraction from recurrent neural networks: a taxonomy and review, Neural Computation 17:1223-1263, 2005
• Recursive representations: Hammer, Perspectives on learning symbolic data with connectionistic systems, in: Kühn, Menzel, Menzel, Ratsch, Richter, Stamatescu (eds.), Adaptivity and Learning, 141-160, Springer, 2003
• Recursive networks: Frasconi, Gori, Sperduti, A general framework for adaptive processing of data structures, IEEE Transactions on Neural Networks 9(5):768-786, 1998
• Neural networks and structures: Hammer, Jain, Neural methods for non-standard data, in: Verleysen (ed.), ESANN'2004, D-side publications, 281-292, 2004
Slide 59
Conclusions (Part I)
• There exist networks which can directly and successfully deal with structures (sequences, trees, graphs): kernel machines, recurrent and recursive networks
• Efficient training algorithms and theoretical foundations exist
• (Loose) connections to symbolic processing have been established and indicate benefits
Now: towards strong connections. PART II: Logic and neural networks