Petacat: Applying ideas from Copycat to image understanding.

Date posted: 19-Dec-2015

How Streetscenes Works (Bileschi, 2006)

1. Densely tile the image with windows of different sizes.

2. HMAX C2 features are computed in each window.

3. The features in each window are given as input to each of five trained support vector machines (“pedestrian”, “car”, “bicycle”, “building”, “tree”).

4. If any SVM returns a classification with a score above a learned threshold, that object is said to be “detected”.
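The four steps above amount to an exhaustive sliding-window search. A minimal Python sketch of that loop follows. The names `tile_windows`, `detect`, `featurize`, and `stride_frac` are hypothetical, the feature function is only a stand-in for HMAX C2 features, and the classifiers are plain scoring functions rather than trained SVMs.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Window = Tuple[int, int, int]  # (x, y, size) of a square window


def tile_windows(width: int, height: int, sizes: Sequence[int],
                 stride_frac: float = 0.5) -> List[Window]:
    """Step 1: densely tile the image with windows of different sizes."""
    windows: List[Window] = []
    for size in sizes:
        # Assumed overlap rule: step by a fixed fraction of the window size.
        stride = max(1, int(size * stride_frac))
        for y in range(0, height - size + 1, stride):
            for x in range(0, width - size + 1, stride):
                windows.append((x, y, size))
    return windows


def detect(windows: Sequence[Window],
           featurize: Callable[[Window], List[float]],
           classifiers: Dict[str, Callable[[List[float]], float]],
           thresholds: Dict[str, float]) -> List[Tuple[str, Window, float]]:
    """Steps 2-4: featurize every window, score it with every classifier,
    and keep each score that exceeds that classifier's threshold."""
    detections = []
    for window in windows:
        features = featurize(window)  # placeholder for HMAX C2 features
        for label, score_fn in classifiers.items():
            score = score_fn(features)
            if score > thresholds[label]:
                detections.append((label, window, score))
    return detections
```

Every window is scored by every classifier, so the cost grows with the number of windows times the number of categories. This is the exhaustive search the talk goes on to criticize.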

Object detection (here, “car”) with HMAX model (Bileschi, 2006)

Limitations of Streetscenes approach for “image understanding”

• Exhaustive search – not scalable

• Does not recognize spatial and abstract relationships among objects for whole scene understanding

• Has no prior knowledge about object categories and their place in “conceptual space”

• HMAX model is completely feed-forward; no feedback to allow context to aid in scene understanding.

  – Where should feedback come in?

Representation of High-Level Knowledge: A Simple Semantic Network (or “Ontology”)

[Figure: semantic network for “Dog walking”. Nodes: Person, Dog, leash, walking. Link labels: holds, attached to, action (twice).]

But...

[Figure sequence: repeated “But...” slides, each followed by a “Modified Ontology” for “Dog walking”. The successive versions add: a “Dog Group” node and a “running” action; the caption “Allowing ‘conceptual slippage’”; the nodes Cat and Iguana alongside Dog; and the nodes Bicycle, Car, and Helicopter.]

But...

[Figure: a cloud of concepts: Person, Dog, Leash, Outside, Ground, Walking, Running, Standing, Tree, Inside, Stick, Close to, Far from, Beach, Sidewalk, Attached to, Grass, Lawn mower, Gasoline, Runway, Airplane, Helicopter, Above, Left of, Holding, Dog walking, Dog grooming, Car, Sky, Army, Track, Fanny pack, Backpack.]

Need dynamical process of constructing representation.

Information gained during the unfolding of perception feeds back to guide the directions the perceptual process takes.

– Ongoing perception of “context” brings in appropriate concepts and conceptual slippages, and avoids exhaustive search

– Prior, higher-level knowledge interacts with lower-level vision in both directions (bottom-up and top-down).

– Concepts are “fluid”, allowed to “slip” in certain contexts.

• This allows perception of essential similarity in the face of superficial differences—i.e., analogy-making.

Active Symbol Architecture (Hofstadter et al., 1995)

• Basis for

– Copycat (analogy-making), Hofstadter & Mitchell

– Tabletop (analogy-making), Hofstadter & French

– Metacat (analogy-making and self-awareness), Hofstadter & Marshall

and many others…

[Figure: architecture diagram with the labels Semantic network, Workspace, Perceptual agents (codelets), and Temperature.]

Petacat:

(Descendant of Copycat)

Integration of Active Symbol Architecture and HMAX

Initial task:

Decide if image is an instance of “taking a dog for a walk”, and if so, how good an instance it is.

Semantic Network

[Figure: semantic network around the situation node “taking a dog for a walk”.
Node types: Object, Action, Spatial Relation.
Objects and locations: person, dog, cat, horse, leash, rope, belt, string, sidewalk, road, beach, trail, grass, outdoors, indoors.
Actions: walks, runs, flies, drives, swims, stands, sits.
Spatial relations: is on, is touching, is in front of, is behind, is next to.
Link labels: has location, has action, has component.
Legend: property links, slip links.]
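One way to picture the two kinds of links in this network is a small Python sketch, given below. It is illustrative only: the class and method names are hypothetical, the node kind "Location" is an added label, and the specific slip links (dog/cat, leash/rope) are assumptions, since the transcript does not say which nodes the slip links connect.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Node:
    name: str
    kind: str                 # e.g. "Situation", "Object", "Action"
    activation: float = 0.0   # not used here; codelets would raise or lower it
    properties: List[Tuple[str, str]] = field(default_factory=list)  # (label, target)
    slips: Set[str] = field(default_factory=set)  # concepts this one may slip to


class SemanticNetwork:
    def __init__(self) -> None:
        self.nodes: Dict[str, Node] = {}

    def add(self, name: str, kind: str) -> Node:
        return self.nodes.setdefault(name, Node(name, kind))

    def add_property(self, source: str, label: str, target: str) -> None:
        """Directed, labeled property link (e.g. 'has component')."""
        self.nodes[source].properties.append((label, target))

    def add_slip(self, a: str, b: str) -> None:
        """Symmetric slip link: either concept may stand in for the other."""
        self.nodes[a].slips.add(b)
        self.nodes[b].slips.add(a)

    def can_slip(self, a: str, b: str) -> bool:
        return b in self.nodes[a].slips


def build_dog_walking_network() -> SemanticNetwork:
    net = SemanticNetwork()
    situation = "taking a dog for a walk"
    net.add(situation, "Situation")
    for name in ("person", "dog", "cat", "leash", "rope"):
        net.add(name, "Object")
    net.add("outdoors", "Location")  # assumed node kind
    for part in ("person", "dog", "leash"):
        net.add_property(situation, "has component", part)
    net.add_property(situation, "has location", "outdoors")
    net.add_slip("dog", "cat")      # assumed slip link
    net.add_slip("leash", "rope")   # assumed slip link
    return net
```

Keeping slip links separate from property links lets a matcher ask "may a cat stand in for the dog here?" without changing what the prototype situation requires.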

Semantic Network: properties of nodes

[Figure: the same “taking a dog for a walk” network, captioned “Properties of nodes”.]

Workspace

Semantic network

Perceptual Agents (Codelets)

Codelets as active symbols

Illustration of what we plan to have happen (not a real run of Petacat)

[Figure sequence: candidate windows appear on the image one after another, labeled Dog?, Dog?, Person?, Sidewalk?, Dog?, Outdoors?]

Scout codelets: Send C1 features in the window to the corresponding SVM. If the result is positive, post a builder codelet with urgency equal to the SVM’s confidence.

[Figure: scout results on the image: Dog? negative; Dog? negative; Sidewalk? positive: 0.4; Person? negative; Outdoors? positive: 0.7; Dog? positive: 0.8]
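The scout step described above can be sketched in a few lines of Python. This is a hypothetical illustration, not Petacat's code: `BuilderCodelet`, `run_scout`, and the convention that an SVM returns a `(positive, confidence)` pair are all assumptions, and the coderack is just a list.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Window = Tuple[int, int, int, int]  # (x, y, width, height)


@dataclass
class BuilderCodelet:
    category: str
    window: Window
    urgency: float


def run_scout(category: str,
              window: Window,
              c1_features: Sequence[float],
              svm: Callable[[Sequence[float]], Tuple[bool, float]],
              coderack: List[BuilderCodelet]) -> bool:
    """Send the window's C1 features to the category's SVM.

    On a positive result, post a builder codelet whose urgency equals
    the SVM's confidence. Returns whether the result was positive.
    """
    positive, confidence = svm(c1_features)
    if positive:
        coderack.append(BuilderCodelet(category, window, urgency=confidence))
    return positive
```

With the values shown on the slide, a “Dog?” scout returning positive with confidence 0.8 would post a dog builder with urgency 0.8, while a negative “Person?” scout would post nothing.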

Builder codelets: Ask HMAX to compute C2 features using prototypes specific to the object (or scene), and send them to the corresponding SVM. If the result is positive, decide to build a structure with probability equal to the SVM confidence. Break competing structures if necessary.

[Figure: the scout labels are replaced by the structures built so far: Outdoors, Dog.]
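A builder codelet's probabilistic decision might look like the following Python sketch. The names are hypothetical, and the rule for competition is an assumption made only for illustration: two structures compete when they claim the same window, and a new structure breaks rivals only if it is stronger than all of them.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Window = Tuple[int, int, int, int]  # (x, y, width, height)


@dataclass
class Structure:
    category: str
    window: Window
    strength: float


def run_builder(category: str,
                window: Window,
                c2_svm: Callable[[str, Window], Tuple[bool, float]],
                workspace: List[Structure],
                rng: random.Random) -> Optional[Structure]:
    """Try to build a structure for `category` in `window`."""
    # Stand-in for: compute category-specific C2 features, then classify.
    positive, confidence = c2_svm(category, window)
    if not positive:
        return None
    # Build with probability equal to the SVM confidence.
    if rng.random() >= confidence:
        return None
    # Assumed competition rule: same window means rival structures.
    rivals = [s for s in workspace if s.window == window]
    if any(r.strength >= confidence for r in rivals):
        return None  # an existing structure is at least as strong
    for rival in rivals:
        workspace.remove(rival)  # break the weaker competitors
    built = Structure(category, window, strength=confidence)
    workspace.append(built)
    return built
```

Because building is a random decision weighted by confidence, a weak candidate can still occasionally be built and later broken by a stronger one.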

[Figure sequence (illustration, not a real run of Petacat): with Dog and Outdoors built, further scout windows appear, labeled Dog?, Leash?, Leash?, Sidewalk?, Person?, Person?. Two candidate Person structures are shown with strengths 0.75 and 0.6. The workspace then holds Dog, Person, Outdoors, Sidewalk.]

[Figure sequence (illustration, not a real run of Petacat): new scout windows appear, labeled Leash?, Leash?, Dog?, Sidewalk?, Dog?, Rope?. A Leash is built, and additional Dog structures marked “(weak)” and “(strong)” appear before a second Dog is shown as built. The workspace then holds Dog, Dog, Person, Outdoors, Sidewalk, Leash.]

Once objects begin to be built, relation and grouping codelets can run on them.

[Figure sequence (illustration, not a real run of Petacat): the workspace holds Dog, Dog, Person, Outdoors, Sidewalk, Leash. Relation links labeled “is next to” and “is in front of” are proposed between objects, the two Dogs are combined into a “Dog group”, and “is next to” links remain in the final frame.]

How codelets decide where to look

The system starts out with a weak segmentation (e.g., the “normalized cuts” algorithm).

The system creates “heat maps” for the location and scale of objects in general (at each pixel, the probability of finding an object at this location and at a particular height/width of bounding box).

Object scout codelets choose location and scale probabilistically from these heat maps.

When codelets look for individual object categories (e.g., dog), object-specific heat maps are created.

As codelets build structure, heat maps are continually updated to reflect prior (learned) expectations about location and scale as a function of the location and scale of “built” objects (as well as the original weak segmentation).

[Figure: image with a built Dog and a Person heat map; candidate windows labeled Person?, Person?]
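Choosing a window probabilistically from a heat map can be sketched in Python as below. The data layout (one grid of weights per scale), the function names, and the update rule in `boost_near` (multiplying weights near a built object by a constant factor) are assumptions for illustration. The talk says the real update would reflect learned expectations.

```python
import random
from typing import Dict, List, Tuple

# Assumed layout: scale -> rows of per-pixel weights (need not sum to 1).
HeatMap = Dict[int, List[List[float]]]


def sample_window(heat: HeatMap, rng: random.Random) -> Tuple[int, int, int]:
    """Pick (x, y, scale) with probability proportional to its weight."""
    cells, weights = [], []
    for scale, rows in heat.items():
        for y, row in enumerate(rows):
            for x, weight in enumerate(row):
                cells.append((x, y, scale))
                weights.append(weight)
    # random.choices raises ValueError if all weights are zero.
    return rng.choices(cells, weights=weights, k=1)[0]


def boost_near(heat: HeatMap, cx: int, cy: int,
               radius: int, factor: float) -> None:
    """Assumed update: after an object is built at (cx, cy), scale the
    weights in the surrounding square so nearby windows become more
    (factor > 1) or less (factor < 1) likely to be examined."""
    for rows in heat.values():
        for y, row in enumerate(rows):
            for x in range(len(row)):
                if abs(x - cx) <= radius and abs(y - cy) <= radius:
                    row[x] *= factor
```

For example, after a Dog is built, a Person heat map could be boosted near the Dog so that Person scouts tend to look there first.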

How Petacat makes a final decision

[Figure (illustration, not a real run of Petacat): a temperature gauge and the situation label “taking a dog for a walk” shown with the workspace structures Dog, Dog, Dog group, Person, Leash, Sidewalk, Outdoors, linked by “is next to”.]

The “situation” codelet is more likely to run when temperature is low.

The situation codelet tries to match the prototypical situation with existing workspace structures, possibly allowing slippages.

[Figure: the prototype “taking a dog for a walk” (has component: person, leash, dog; has location: outdoors; relations “is next to” and “is in front of”) drawn next to the workspace structures, including the Dog group.]

If the resulting temperature is low enough, classify the scene as positive.

If the situation codelet fails enough times or does not run for a long time, the program has an increasing chance of ending with a negative classification.

If Petacat classifies the picture as positive, the temperature at the end of the run gives a measure of how good an instance the picture is (e.g., of the “dog walking” situation).
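The decision logic above can be made concrete with a small Python sketch. Everything numeric here is an assumption: a temperature scale of 0 to 100, a linear run probability, a positive threshold of 30, and a geometric rule for the chance of a negative ending. The function names are hypothetical as well.

```python
from typing import Dict, Optional, Set


def match_situation(required: Set[str],
                    workspace: Set[str],
                    slips: Dict[str, Set[str]]) -> Optional[int]:
    """Match the prototype's required concepts against built structures.

    Returns the number of slippages used, or None if some required
    concept has neither a direct match nor an allowed substitute.
    """
    slippages = 0
    for concept in required:
        if concept in workspace:
            continue
        if slips.get(concept, set()) & workspace:
            slippages += 1
        else:
            return None
    return slippages


def situation_run_probability(temperature: float) -> float:
    """Assumed rule: the situation codelet is likelier to run when cold."""
    clamped = max(0.0, min(100.0, temperature))
    return 1.0 - clamped / 100.0


def is_positive(temperature: float, threshold: float = 30.0) -> bool:
    """After a successful match, accept if the temperature is low enough."""
    return temperature <= threshold


def negative_end_probability(setbacks: int, rate: float = 0.1) -> float:
    """Assumed rule: each failed or overdue attempt raises the chance
    that the run ends with a negative classification."""
    return 1.0 - (1.0 - rate) ** setbacks
```

In this sketch a workspace containing a dog group rather than a single dog still matches the prototype, at the cost of one slippage. The final temperature of a positive run would then be reported as the goodness measure.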

Summary: How does Petacat avoid exhaustive search?

Recall the Streetscenes system, which, given an image, does exhaustive search over:

• Window size and location in the image

In Petacat, codelets choose window size and location based on learned expectations and perceived context, with probabilities continually changing as more information is obtained.

• C1, C2 features in windows

Codelets request C2 features only in “relevant” windows, and request only C2 features that are relevant to what the codelet is looking for.

• Object categories (e.g., car, pedestrian, tree, etc.)

Codelets look for object categories that are activated by context, based on prior expectations and currently perceived information.

• Petacat effects a parallel terraced scan (Hofstadter, 1995):

Codelets build structures at a rate (urgency) based on their perceived promise, which is continually updated as new information is perceived.

Temperature allows this (continually changing) rate to depend on the global state of the system.
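A minimal Python sketch of urgency-based codelet selection modulated by temperature is given below. The sharpening exponent is an assumption chosen for illustration, with temperature on an assumed 0 to 100 scale, and the function names are hypothetical.

```python
import random
from typing import List, Sequence


def selection_weights(urgencies: Sequence[float],
                      temperature: float) -> List[float]:
    """Turn urgencies into selection weights.

    Assumed rule: at high temperature the exponent is small, so choices
    are closer to uniform (more exploration); at low temperature the
    exponent is large, so high-urgency codelets dominate.
    """
    exponent = (100.0 - temperature) / 30.0 + 0.5
    return [u ** exponent for u in urgencies]


def choose_codelet(urgencies: Sequence[float],
                   temperature: float,
                   rng: random.Random) -> int:
    """Return the index of the next codelet to run."""
    weights = selection_weights(urgencies, temperature)
    return rng.choices(range(len(urgencies)), weights=weights, k=1)[0]
```

Many lines of exploration are thus pursued in interleaved fashion, each at a rate tied to its current promise, rather than one at a time or all exhaustively.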

Relation to neuroscience/psychophysics

– Gilbert & Sigman (2007): Emphasis on the role of top-down processing in vision.

• “V1 and V2 may work as ‘active blackboards’ that integrate and sustain the result of computations performed in higher areas.”

– Kahneman, Treisman, and Gibbs (1992): Notion of “object files”: temporary and modifiable perceptual structures, created on the fly in working memory, which interact with a permanent network of concepts.

– Churchland, Ramachandran, and Sejnowski: Theory of “interactive vision”

– Treisman and colleagues: Shift between parallel, random, “pre-attentive” bottom-up processing and more deterministic, focused, serial, “attentive” top-down processing.

Does Petacat understand pictures?

Understanding (MM’s definition):

- Ability to appropriately use one’s knowledge and make appropriate conceptual slippages in a wide variety of environments/contexts.

- Ability to use one’s existing concepts to learn new concepts