PRINCIPLES OF APPETITIVE CONDITIONING
Chapter 6

Early Contributors
Thorndike's Contribution
Emphasized laws of behavior
Demonstrated trial-by-trial (S-R) learning

Skinner's Contribution
Emphasized contingency: a specified relationship between behavior and reinforcement in a given situation
The environment "sets" the contingencies: S(R→O)
A "Faux" Distinction
Instrumental conditioning: a conditioning procedure in which the environment constrains the opportunity for reward (discrete trials)
Operant conditioning: when a specific response produces reinforcement, and the frequency of the response determines the amount of reinforcement obtained (continuous responding, schedules of reinforcement)
Thorndike's Law of Effect
S-R associations are "stamped in" by reward (satisfiers)

Thorndike: "What is learned?"
[Diagram: Stimulus (S) → Response (R); reinforcement "stamps in" this S-R connection]
Habit Learning
[Diagram: S → R → O, with the S-O link labeled "Pavlovian Association" and the R-O link labeled "Instrumental Association"]
Is that it?
"O" Matters
The Importance of Past Experience
Depression/Negative Contrast: the effect in which a shift from high to low reward magnitude produces a lower level of responding than if the reward magnitude had always been low.
Elation/Positive Contrast: the effect in which a shift from low to high reward magnitude produces a greater level of responding than if the reward magnitude had always been high.
Negative and Positive Contrast
Logic of Devaluation Experiment
R-O or Goal-Directed: Responding controlled by the current value of the reinforcer, and so it should be reduced to zero after devaluation.
S-R or Habit: Responding that is not controlled by the current value of the reward, and so it is insensitive to reinforcer devaluation.
[Figure: level of responding (Max to Min) under Normal vs. Devalued conditions]
R-O Association (aka the Instrumental Association)

Phase 1: Push Left → Pellet; Push Right → Sucrose
Devaluation: Pellet + LiCl (or Sucrose + LiCl)
Test: with Pellet devalued, push Right? With Sucrose devalued, push Left?

[Figure: number of Left vs. Right pushes when the Pellet is devalued and when the Sucrose is devalued]
Summary of Devaluation
Neutered male rats lower, but do not eliminate, their responding previously associated with access to a "ripe" female rat.
Rats satiated on reward #1 preferentially lower responding for reward #1 more than for reward #2.
Goal-devaluation effects tend to shrink with continued training, and goal-directed responding is replaced by habit learning.
S-O Association (aka the Pavlovian Association)

Stage 1: Right → Pellet; Left → Sucrose
Stage 2: Tone → Pellet; Light → Sucrose
Test: during the Tone, press Left or Right? During the Light, press Left or Right?

[Figure: number of Left vs. Right presses during the Tone and during the Light]
Skinner's Contributions
Automatic, easy measurements that can be compared across species
The Three-Term Contingency
Three terms define the contingency:
Discriminative stimulus (S+ or S-)
Operant (R)
Consequence (O)
Operant Strengthened
[Diagram: in a Skinner box with the light on (S+), the rat may bite, groom, lick, rear, or push the lever (R); only the lever press produces the reinforcer (O), so that operant is strengthened]
Techniques and Concepts
Shaping (successive approximations): require closer and closer approximations to the target behaviour
Secondary reinforcers: stimuli accompanying reinforcer delivery
Marking: feedback that a response has occurred
Shaping
Shaping (or successive-approximation procedure): select a frequently occurring operant behavior, then slowly change the contingency until the desired behavior is learned
Training a Rat to Bar Press
Step 1: reinforce eating out of the food dispenser
Step 2: reinforce moving away from the food dispenser
Step 3: reinforce moving in the direction of the bar
Step 4: reinforce pressing the bar
Appetitive Reinforcers
Primary reinforcer: an activity whose reinforcing properties are innate
Secondary reinforcer: an event that has developed its reinforcing properties through its association with primary reinforcers
Primary Reward Magnitude
The acquisition of an instrumental or operant response: the greater the magnitude of the reward, the faster the task is learned
The differences in performance may reflect motivational differences
[Figure: acquisition as a function of reward magnitude]
Primary Reward and Degraded Contingency
[Figure: event records of bar presses and food deliveries. Perfect contingency → strong responding; degraded contingency → weak responding]
Strength of Secondary Reinforcers
Several variables affect the strength of secondary reinforcers:
The magnitude of the primary reinforcer
The greater the number of primary-secondary pairings, the stronger the reinforcing power of the secondary reinforcer
The time elapsing between the presentation of the secondary reinforcer and the primary reinforcer affects the strength of the secondary reinforcer
[Figure: secondary reinforcer strength as a function of the number of primary-secondary pairings]
Schedules of Reinforcement
Schedule of reinforcement: a contingency that specifies how often or when we must act to receive reinforcement
Schedules of Reinforcement
Fixed Ratio (FR): reinforcement is given after a fixed number of responses; short pauses
Variable Ratio (VR): reinforcement after a varying number of responses
Schedules of Reinforcement
Fixed Interval (FI): the first response after a given interval is rewarded; produces the FI scallop
Variable Interval (VI): like FI, but the interval varies around a given average; the scallop disappears
Fixed Interval Schedule
Fixed interval schedule: reinforcement is available only after a specified period of time, and the first response emitted after the interval has elapsed is reinforced
Scallop effect: the ability to withhold the response until close to the end of the interval increases with experience
The pause is longer with longer FI schedules
Variable Interval Schedules
Variable interval schedule: an average interval of time between available reinforcers, but the interval varies from one reinforcement to the next
Characterized by steady rates of responding
The longer the interval, the lower the response rate
The scallop effect does not occur on VI schedules
Encourages S-R habit learning
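The four basic schedules can be summarized by when they make reinforcement available. The sketch below is my own illustration of that logic (the function and variable names are not from the chapter), not an implementation from the slides:

```python
import random

# Illustrative sketch: when does each basic schedule deliver reinforcement?
# (Names and signatures are my own, chosen for clarity.)

def fixed_ratio(n, response_count):
    """FR-n: reinforce every n-th response."""
    return response_count % n == 0

def variable_ratio(n, rng):
    """VR-n: each response is reinforced with probability 1/n, so
    reinforcement comes after a varying number of responses averaging n."""
    return rng.random() < 1.0 / n

def fixed_interval(t, now, last_reinforcement):
    """FI-t: the first response emitted after t seconds have elapsed
    since the last reinforcement is reinforced."""
    return now - last_reinforcement >= t

def variable_interval(required_wait, now, last_reinforcement):
    """VI: like FI, but required_wait is redrawn around an average
    after each reinforcement, so the scallop disappears."""
    return now - last_reinforcement >= required_wait

rng = random.Random(0)
print(fixed_ratio(5, 10))          # 10th response on FR-5 is reinforced
print(fixed_interval(30, 45, 0))   # response at 45 s on FI-30 is reinforced
```

Note how the ratio schedules depend only on counts while the interval schedules depend only on elapsed time; that difference is what produces the FI scallop and the steady VI response rates described above.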
Some Other Schedules
DRL: differential reinforcement of low rates of responding
DRH: differential reinforcement of high rates of responding
DRO: differential reinforcement of other behavior (anything but the target behavior)
Compound Schedules
Compound schedule: a complex contingency in which two or more schedules of reinforcement are combined
Example to schedule: $5 today vs. $50 if you wait (e.g., VI-30 vs. VI-60)
Concurrent schedules permit the subject to alternate between different schedules, or to repeatedly choose between working on different schedules (A vs. B)
Matching Law
B1/(B1+B2) = R1/(R1+R2)
B stands for the number of occurrences of a certain behavior
R stands for the number of reinforcers earned
Sniffy the Rat

Schedule ("1" vs "2")    B1/(B1+B2)    R1/(R1+R2)
VI-30 vs VI-10           25%           25%
VI-10 vs VI-30           75%           75%
VI-10 vs VI-50           83.3%         83.3%
VI-50 vs VI-10           16.7%         16.7%
VI-30 vs VI-30           50%           50%
VI-10 vs VI-10           50%           50%
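The Sniffy percentages follow directly from the matching law: on a VI-t schedule the programmed reinforcement rate is roughly proportional to 1/t, so strict matching predicts B1/(B1+B2) = (1/t1) / (1/t1 + 1/t2). A minimal sketch (the function name is my own):

```python
# Matching-law predictions for concurrent VI schedules.
# On VI-t, reinforcement rate is ~1/t, so under strict matching
# the behavior allocated to schedule 1 is (1/t1) / (1/t1 + 1/t2).

def predicted_allocation(t1, t2):
    """Fraction of behavior allocated to schedule 1 under strict matching."""
    r1, r2 = 1.0 / t1, 1.0 / t2
    return r1 / (r1 + r2)

# Reproduces the Sniffy table:
print(round(predicted_allocation(30, 10), 3))  # VI-30 vs VI-10 -> 0.25
print(round(predicted_allocation(10, 50), 3))  # VI-10 vs VI-50 -> 0.833
print(round(predicted_allocation(30, 30), 3))  # VI-30 vs VI-30 -> 0.5
```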
Typical Result
[Figure: typical matching result]
Deviations From Matching
Bias: a preference for responding on one response more than the other that has nothing to do with the schedules programmed
One pigeon key requires more force to close its contact than the other, so the pigeon has to peck harder
One food hopper delivers food more quickly than another
Sensitivity
Overmatching: the relative rate of responding is more extreme than predicted by matching. The subject appears to be "too sensitive" to the schedule differences.
Undermatching: the relative rate of responding on a key is less extreme than predicted by matching. The subject appears to be "insensitive" to the schedule differences.
Overmatching

Poor Self-Control
[Diagram: direct choice (concurrent schedule) between alternative A (small reward) and alternative B (LARGE reward)]
Self-Control and Overmatching
Concurrent choice: humans and nonhumans often choose an immediate small reward over a larger delayed reward (delayed rewards are "discounted")
Another Example of Impulsivity
"Free" reinforcers are given every 20 s (at 20 s, 40 s, and 60 s)
A lever press advances delivery of the first pellet and deletes the second pellet
So if you press at 2 seconds, you get a pellet immediately, but no other pellets until the 60-second pellet is available
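The payoff arithmetic of this procedure makes the impulsivity point concrete. A small worked example (the cycle length and pellet times are taken from the description above; the function is my own illustration):

```python
# Worked example of the free-pellet procedure: pressing the lever
# trades an earlier first pellet for a smaller total payoff.

def pellets_earned(press_time=None):
    """Pellets received in one 60 s cycle.

    Without a press, free pellets arrive at 20 s, 40 s, and 60 s (3 total).
    A press delivers the first pellet immediately, deletes the second,
    and leaves only the 60 s pellet (2 total).
    """
    if press_time is None:
        return 3
    return 2

print(pellets_earned())    # waiting earns 3 pellets per cycle
print(pellets_earned(2))   # an impulsive press at 2 s earns only 2
```

Pressing is "impulsive" precisely because the immediate pellet costs one pellet per cycle relative to simply waiting.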
Delay of Reinforcement
Delayed reinforcers are steeply discounted
Loss of self-control and impulsivity
[Figure: reinforcer potency (0-100) as a function of delay, for a small immediate reward and a large delayed reward]
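The steep discounting curve described above is commonly modeled with a hyperbolic function, V = A / (1 + kD). This is a standard model from the discounting literature, not a formula given in the slides, and the amounts, delays, and k value below are invented for illustration:

```python
# Hyperbolic discounting sketch (V = A / (1 + k*D)): a standard model
# of steep delay discounting. All numbers here are made up to
# illustrate the small-immediate vs. large-delayed preference.

def discounted_value(amount, delay, k=0.2):
    """Present value of a reward of size `amount` after `delay` units."""
    return amount / (1.0 + k * delay)

small_now   = discounted_value(20, delay=0)     # small, immediate
large_later = discounted_value(100, delay=30)   # large, delayed

# With these numbers the small immediate reward has the higher present
# value (20 vs. ~14.3) -- the impulsive choice the slides describe.
print(small_now, large_later)
```

Because the hyperbola falls fastest at short delays, the large reward can regain the higher value when both delays are lengthened equally, which is why pre-commitment (choosing early, as in the concurrent-chain procedure below) supports self-control.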
Concurrent Chain (Pre-commitment)
[Diagram: an initial choice between links A and B leads to terminal links delivering the small vs. LARGE rewards]
Behavioral Methods for Self-Control
Pre-commitment: self-exclusion, contracts
Distraction
Modeling
Shaping waiting
Reduce delay for small
Increase delay for large
The Discontinuance of Reinforcement
Extinction: the elimination or suppression of a response caused by the discontinuation of reinforcement or the removal of the unconditioned stimulus
When reinforcement is first discontinued, the rate of responding remains high; under some conditions, it even increases

Extinction Paradox
Stronger learning ≠ slower extinction: the Partial Reinforcement Extinction Effect, or PREE
Importance of Consistency of Reward
Extinction is slower following partial rather than continuous reinforcement
Partial reinforcement extinction effect (PREE): the greater resistance to extinction of an instrumental or operant response following intermittent rather than continuous reinforcement during acquisition
One of the most reliable phenomena in psychology
Acquisition with Differing Percentages
[Figure: speed across acquisition days for the 100% group vs. the 80/50/30% groups]
Extinction with Differing Percentages
[Figure: speed across extinction days for the 100%, 80%, 50%, and 30% groups]
Explanations
Mowrer-Bitterman Discrimination Hypothesis
Amsel's Frustration Theory (emotional)
Capaldi's Sequential Theory (cognitive)
Theios Experiment (not just discrimination)

       PHASE 1    PHASE 2    EXT
G1     100%       -          0%
G2     100%       100%       0%
G3     50%        100%       0%
G4     50%        -          0%
[Figure: speed across extinction trials. G1 and G2 (100% in Phase 1) extinguish faster than G3 and G4 (50% in Phase 1), even though G2 and G3 both ended training on 100% reinforcement]
Amsel's Frustration Theory
Nonreward when reward is expected produces frustration. Partially reinforced animals learn to keep responding in the face of frustration, so extinction is slow; continuously reinforced animals never learn this, so extinction is rapid.
[Diagram: 100% reinforcement group]
[Diagram: 50% reinforcement group]
Amsel (Percentage Reinforcement)
[Figure: speed across extinction trials for the 100% and 50% groups]
Amsel's Frustration Theory

BETWEEN SUBJECT                      EXT
GROUP 1: T→F (100%)                  T-
GROUP 2: N→F (50%)                   N-
→ PREE

WITHIN SUBJECT                       EXT
TRIALS 1,3,6…: T→F (100%)            T-
TRIALS 2,4,5…: N→F (50%)             N-
→ Reversed PREE
Influence of Reward Magnitude
The influence of reward magnitude on resistance to extinction depends upon the amount of acquisition training.
With extended acquisition, a small consistent reward may produce more resistance to extinction than a large reward (the absence of the large reward is more frustrating).

Reward Magnitude and Percentage
[Figure: resistance to extinction as a function of reward magnitude and reinforcement percentage]
Sequential Theory
Sequential theory: if reward follows nonreward, the animal will associate the memory of the nonrewarded experience with the operant or instrumental response
During extinction, the only memory present after the first nonrewarded experience is that of the nonrewarded experience
Animals receiving continuous reward do not experience nonrewarded responses, and so they do not associate nonrewarded responses with later reward
Thus, the memory of receiving a reward after persistence in the face of nonreward becomes a cue for continued responding
Key variables: number of N-R transitions, N-length, variability of N-length
What is the significance of the PRE?
It encourages organisms to persist even though every behavior is not reinforced
In the natural environment, not every attempt to attain a desired goal is successful
The PRE is adaptive because it motivates animals not to give up too easily