The Process by Which a Stimulus Increases the Chances of a Preceding Behavior Occurring Again
Chapter 8. Learning
8.2 Changing Behaviour through Reinforcement and Punishment: Operant Conditioning
Learning Objectives
- Outline the principles of operant conditioning.
- Explain how learning can be shaped through the use of reinforcement schedules and secondary reinforcers.
In classical conditioning the organism learns to associate new stimuli with natural biological responses such as salivation or fear. The organism does not learn something new but rather begins to perform an existing behaviour in the presence of a new signal. Operant conditioning, on the other hand, is learning that occurs based on the consequences of behaviour and can involve the learning of new actions. Operant conditioning occurs when a dog rolls over on command because it has been praised for doing so in the past, when a classroom bully threatens his classmates because doing so allows him to get his way, and when a child gets good grades because her parents threaten to punish her if she doesn't. In operant conditioning the organism learns from the consequences of its own actions.
How Reinforcement and Punishment Influence Behaviour: The Research of Thorndike and Skinner
Psychologist Edward L. Thorndike (1874-1949) was the first scientist to systematically study operant conditioning. In his research Thorndike (1898) observed cats who had been placed in a "puzzle box" from which they tried to escape ("Video Clip: Thorndike's Puzzle Box"). At first the cats scratched, bit, and swatted haphazardly, without any idea of how to get out. But eventually, and accidentally, they pressed the lever that opened the door and exited to their prize, a scrap of fish. The next time the cat was constrained within the box, it attempted fewer of the ineffective responses before carrying out the successful escape, and after several trials the cat learned to almost immediately make the correct response.
Observing these changes in the cats' behaviour led Thorndike to develop his law of effect, the principle that responses that create a typically pleasant outcome in a particular situation are more likely to occur again in a similar situation, whereas responses that produce a typically unpleasant outcome are less likely to occur again in the situation (Thorndike, 1911). The essence of the law of effect is that successful responses, because they are pleasurable, are "stamped in" by experience and thus occur more frequently. Unsuccessful responses, which produce unpleasant experiences, are "stamped out" and subsequently occur less frequently.
When Thorndike placed his cats in a puzzle box, he found that they learned to engage in the important escape behaviour faster after each trial. Thorndike described the learning that follows reinforcement in terms of the law of effect.
Watch: "Thorndike's Puzzle Box" [YouTube]: http://www.youtube.com/watch?v=BDujDOLre-8
The influential behavioural psychologist B. F. Skinner (1904-1990) expanded on Thorndike's ideas to develop a more complete set of principles to explain operant conditioning. Skinner created specially designed environments known as operant chambers (usually called Skinner boxes) to systematically study learning. A Skinner box (operant chamber) is a structure that is big enough to fit a rodent or bird and that contains a bar or key that the organism can press or peck to release food or water. It also contains a device to record the animal's responses (Figure 8.5).
The most basic of Skinner's experiments was quite similar to Thorndike's research with cats. A rat placed in the chamber reacted as one might expect, scurrying about the box and sniffing and clawing at the floor and walls. Eventually the rat chanced upon a lever, which it pressed to release pellets of food. The next time around, the rat took a little less time to press the lever, and on successive trials, the time it took to press the lever became shorter and shorter. Soon the rat was pressing the lever as fast as it could eat the food that appeared. As predicted by the law of effect, the rat had learned to repeat the action that brought about the food and cease the actions that did not.
Skinner studied, in detail, how animals changed their behaviour through reinforcement and punishment, and he developed terms that explained the processes of operant learning (Table 8.1, "How Positive and Negative Reinforcement and Punishment Influence Behaviour"). Skinner used the term reinforcer to refer to any event that strengthens or increases the likelihood of a behaviour, and the term punisher to refer to any event that weakens or decreases the likelihood of a behaviour. And he used the terms positive and negative to refer to whether a reinforcement was presented or removed, respectively. Thus, positive reinforcement strengthens a response by presenting something pleasant after the response, and negative reinforcement strengthens a response by reducing or removing something unpleasant. For example, giving a child praise for completing his homework represents positive reinforcement, whereas taking Aspirin to reduce the pain of a headache represents negative reinforcement. In both cases, the reinforcement makes it more likely that the behaviour will occur again in the future.
| Operant conditioning term | Description | Outcome | Example |
|---|---|---|---|
| Positive reinforcement | Add or increase a pleasant stimulus | Behaviour is strengthened | Giving a student a prize after he or she gets an A on a test |
| Negative reinforcement | Reduce or remove an unpleasant stimulus | Behaviour is strengthened | Taking painkillers that eliminate pain increases the likelihood that you will take painkillers again |
| Positive punishment | Present or add an unpleasant stimulus | Behaviour is weakened | Giving a student extra homework after he or she misbehaves in class |
| Negative punishment | Reduce or remove a pleasant stimulus | Behaviour is weakened | Taking away a teen's computer after he or she misses curfew |
Reinforcement, either positive or negative, works by increasing the likelihood of a behaviour. Punishment, on the other hand, refers to any event that weakens or reduces the likelihood of a behaviour. Positive punishment weakens a response by presenting something unpleasant after the response, whereas negative punishment weakens a response by reducing or removing something pleasant. A child who is grounded after fighting with a sibling (positive punishment) or who loses out on the opportunity to go to recess after getting a poor grade (negative punishment) is less likely to repeat these behaviours.
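Skinner's four terms form a simple two-by-two grid: was a stimulus added or removed, and was it pleasant or unpleasant? As an illustrative aside (not part of the original text; the function and labels are our own invention), the grid can be written as a small lookup table in Python:

```python
# Hypothetical sketch of Skinner's 2x2 taxonomy (Table 8.1) as a lookup.
# "change" is whether the stimulus is presented or taken away; "kind" is
# how the organism experiences it.
def classify(change, kind):
    """Return the operant conditioning term and its effect on behaviour."""
    table = {
        ("added",   "pleasant"):   ("positive reinforcement", "strengthened"),
        ("removed", "unpleasant"): ("negative reinforcement", "strengthened"),
        ("added",   "unpleasant"): ("positive punishment",    "weakened"),
        ("removed", "pleasant"):   ("negative punishment",    "weakened"),
    }
    return table[(change, kind)]

# Praise after homework: a pleasant stimulus is added.
print(classify("added", "pleasant"))      # ('positive reinforcement', 'strengthened')
# Aspirin relieving a headache: an unpleasant stimulus is removed.
print(classify("removed", "unpleasant"))  # ('negative reinforcement', 'strengthened')
```

Reading the grid this way makes the symmetry clear: "positive/negative" tracks only the stimulus change, while "reinforcement/punishment" tracks only the effect on behaviour.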
Although the distinction between reinforcement (which increases behaviour) and punishment (which decreases it) is usually clear, in some cases it is difficult to determine whether a reinforcer is positive or negative. On a hot day a cool breeze could be seen as a positive reinforcer (because it brings in cool air) or a negative reinforcer (because it removes hot air). In other cases, reinforcement can be both positive and negative. One may smoke a cigarette both because it brings pleasure (positive reinforcement) and because it eliminates the craving for nicotine (negative reinforcement).
It is also important to note that reinforcement and punishment are not simply opposites. The use of positive reinforcement in changing behaviour is almost always more effective than using punishment. This is because positive reinforcement makes the person or animal feel better, helping create a positive relationship with the person providing the reinforcement. Types of positive reinforcement that are effective in everyday life include verbal praise or approval, the awarding of status or prestige, and direct financial payment. Punishment, on the other hand, is more likely to create only temporary changes in behaviour because it is based on coercion and typically creates a negative and adversarial relationship with the person providing the reinforcement. When the person who provides the punishment leaves the situation, the unwanted behaviour is likely to return.
Creating Complex Behaviours through Operant Conditioning
Perhaps you remember watching a movie or being at a show in which an animal — maybe a dog, a horse, or a dolphin — did some pretty amazing things. The trainer gave a command and the dolphin swam to the bottom of the pool, picked up a ring on its nose, jumped out of the water through a hoop in the air, dived again to the bottom of the pool, picked up another ring, and then took both of the rings to the trainer at the edge of the pool. The animal was trained to do the trick, and the principles of operant conditioning were used to train it. But these complex behaviours are a far cry from the simple stimulus-response relationships that we have considered thus far. How can reinforcement be used to create complex behaviours such as these?
One way to expand the use of operant learning is to modify the schedule on which the reinforcement is applied. To this point we have only discussed a continuous reinforcement schedule, in which the desired response is reinforced every time it occurs; whenever the dog rolls over, for instance, it gets a biscuit. Continuous reinforcement results in relatively fast learning but also rapid extinction of the desired behaviour once the reinforcer disappears. The problem is that because the organism is used to receiving the reinforcement after every behaviour, the responder may give up quickly when it doesn't appear.
Most real-world reinforcers are not continuous; they occur on a partial (or intermittent) reinforcement schedule — a schedule in which the responses are sometimes reinforced and sometimes not. In comparison to continuous reinforcement, partial reinforcement schedules lead to slower initial learning, but they also lead to greater resistance to extinction. Because the reinforcement does not appear after every behaviour, it takes longer for the learner to determine that the reward is no longer coming, and thus extinction is slower. The four types of partial reinforcement schedules are summarized in Table 8.2, "Reinforcement Schedules."
| Reinforcement schedule | Explanation | Real-world example |
|---|---|---|
| Fixed-ratio | Behaviour is reinforced after a specific number of responses. | Factory workers who are paid according to the number of products they produce |
| Variable-ratio | Behaviour is reinforced after an average, but unpredictable, number of responses. | Payoffs from slot machines and other games of chance |
| Fixed-interval | Behaviour is reinforced for the first response after a specific amount of time has passed. | People who earn a monthly salary |
| Variable-interval | Behaviour is reinforced for the first response after an average, but unpredictable, amount of time has passed. | Person who checks email for messages |
Partial reinforcement schedules are determined by whether the reinforcement is presented on the basis of the time that elapses between reinforcements (interval) or on the basis of the number of responses that the organism engages in (ratio), and by whether the reinforcement occurs on a regular (fixed) or unpredictable (variable) schedule. In a fixed-interval schedule, reinforcement occurs for the first response made after a specific amount of time has passed. For example, on a one-minute fixed-interval schedule the animal receives a reinforcement every minute, assuming it engages in the behaviour at least once during the minute. As you can see in Figure 8.6, "Examples of Response Patterns by Animals Trained under Different Partial Reinforcement Schedules," animals under fixed-interval schedules tend to slow down their responding immediately after the reinforcement but then increase the behaviour again as the time of the next reinforcement gets closer. (Many students study for exams the same way.) In a variable-interval schedule, the reinforcers appear on an interval schedule, but the timing is varied around the average interval, making the actual appearance of the reinforcer unpredictable. An example might be checking your email: you are reinforced by receiving messages that come, on average, say, every 30 minutes, but the reinforcement occurs only at random times. Interval reinforcement schedules tend to produce slow and steady rates of responding.
In a fixed-ratio schedule, a behaviour is reinforced after a specific number of responses. For instance, a rat's behaviour may be reinforced after it has pressed a key 20 times, or a salesperson may receive a bonus after he or she has sold 10 products. As you can see in Figure 8.6, "Examples of Response Patterns by Animals Trained under Different Partial Reinforcement Schedules," once the organism has learned to act in accordance with the fixed-ratio schedule, it will pause only briefly when reinforcement occurs before returning to a high level of responsiveness. A variable-ratio schedule provides reinforcers after a specific but average number of responses. Winning money from slot machines or on a lottery ticket is an example of reinforcement that occurs on a variable-ratio schedule. For instance, a slot machine (see Figure 8.7, "Slot Machine") may be programmed to provide a win every 20 times the user pulls the handle, on average. Ratio schedules tend to produce high rates of responding because reinforcement increases as the number of responses increases.
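To make the ratio/interval distinction concrete, the schedules can be sketched as small decision rules. This Python sketch is illustrative only and not from the text; the function names and parameters are our own choices. Each schedule decides whether a given response earns a reinforcer:

```python
import random

def fixed_ratio(n):
    """Reinforce every n-th response (e.g. a bonus after 10 sales)."""
    count = 0
    def respond():
        nonlocal count
        count += 1
        if count == n:
            count = 0
            return True
        return False
    return respond

def variable_ratio(mean_n, seed=0):
    """Reinforce after an unpredictable number of responses averaging
    mean_n, like a slot machine paying off every 20 pulls on average."""
    rng = random.Random(seed)
    count, target = 0, rng.randint(1, 2 * mean_n - 1)
    def respond():
        nonlocal count, target
        count += 1
        if count >= target:
            count, target = 0, rng.randint(1, 2 * mean_n - 1)
            return True
        return False
    return respond

def fixed_interval(seconds):
    """Reinforce the first response after `seconds` have elapsed; a
    variable-interval schedule would randomize `seconds` around a mean."""
    last = 0.0
    def respond(now):
        nonlocal last
        if now - last >= seconds:
            last = now
            return True
        return False
    return respond

# A rat on a fixed-ratio-3 schedule: every third lever press pays off.
fr3 = fixed_ratio(3)
print([fr3() for _ in range(6)])  # [False, False, True, False, False, True]
```

Note how the ratio schedules count responses while the interval schedule only looks at elapsed time, which is exactly why ratio schedules reward fast responding and interval schedules do not.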
Complex behaviours are also created through shaping, the process of guiding an organism's behaviour to the desired outcome through the use of successive approximation to a final desired behaviour. Skinner made extensive use of this procedure in his boxes. For instance, he could train a rat to press a bar two times to receive food, by first providing food when the animal moved near the bar. When that behaviour had been learned, Skinner would begin to provide food only when the rat touched the bar. Further shaping limited the reinforcement to only when the rat pressed the bar, to when it pressed the bar and touched it a second time, and finally to only when it pressed the bar twice. Although it can take a long time, in this way operant conditioning can create chains of behaviours that are reinforced only when they are completed.
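The bar-press training just described can be caricatured in a few lines of Python. This is a deliberately simplified sketch (a real shaping procedure reinforces each approximation many times before tightening the criterion), and the stage names and function are invented for the example:

```python
# Successive approximations toward the final behaviour, in order.
STAGES = [
    "moves near the bar",
    "touches the bar",
    "presses the bar once",
    "presses the bar twice",
]

def shape(behaviours, stages=STAGES):
    """Reinforce only behaviour meeting the current criterion, then tighten
    it; return how many reinforcers were delivered. (Simplified: each stage
    is mastered after a single reinforced trial.)"""
    stage, reinforced = 0, 0
    for b in behaviours:
        if stage < len(stages) and b == stages[stage]:
            reinforced += 1  # deliver food for the current approximation
            stage += 1       # demand a closer approximation next time
    return reinforced

trials = ["sniffs the corner", "moves near the bar", "touches the bar",
          "moves near the bar", "presses the bar once", "presses the bar twice"]
print(shape(trials))  # 4 reinforcers: one per successive approximation
```

Notice that the fourth trial ("moves near the bar") goes unreinforced: once the criterion has tightened, earlier approximations no longer pay off, which is what pushes the animal toward the final behaviour.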
Reinforcing animals if they correctly discriminate between similar stimuli allows scientists to test the animals' ability to learn, and the discriminations that they can make are sometimes remarkable. Pigeons have been trained to distinguish between images of Charlie Brown and the other Peanuts characters (Cerella, 1980), and between different styles of music and art (Porter & Neuringer, 1984; Watanabe, Sakamoto & Wakita, 1995).
Behaviours can also be trained through the use of secondary reinforcers. Whereas a primary reinforcer includes stimuli that are naturally preferred or enjoyed by the organism, such as food, water, and relief from pain, a secondary reinforcer (sometimes called conditioned reinforcer) is a neutral event that has become associated with a primary reinforcer through classical conditioning. An example of a secondary reinforcer would be the whistle given by an animal trainer, which has been associated over time with the primary reinforcer, food. An example of an everyday secondary reinforcer is money. We enjoy having money, not so much for the stimulus itself, but rather for the primary reinforcers (the things that money can purchase) with which it is associated.
Key Takeaways
- Edward Thorndike developed the law of effect: the principle that responses that create a typically pleasant outcome in a particular situation are more likely to occur again in a similar situation, whereas responses that produce a typically unpleasant outcome are less likely to occur again in the situation.
- B. F. Skinner expanded on Thorndike's ideas to develop a set of principles to explain operant conditioning.
- Positive reinforcement strengthens a response by presenting something that is typically pleasant after the response, whereas negative reinforcement strengthens a response by reducing or removing something that is typically unpleasant.
- Positive punishment weakens a response by presenting something typically unpleasant after the response, whereas negative punishment weakens a response by reducing or removing something that is typically pleasant.
- Reinforcement may be either partial or continuous. Partial reinforcement schedules are determined by whether the reinforcement is presented on the basis of the time that elapses between reinforcements (interval) or on the basis of the number of responses that the organism engages in (ratio), and by whether the reinforcement occurs on a regular (fixed) or unpredictable (variable) schedule.
- Complex behaviours may be created through shaping, the process of guiding an organism's behaviour to the desired outcome through the use of successive approximation to a final desired behaviour.
Exercises and Critical Thinking
- Give an example from daily life of each of the following: positive reinforcement, negative reinforcement, positive punishment, negative punishment.
- Consider the reinforcement techniques that you might use to train a dog to catch and retrieve a Frisbee that you throw to it.
- Watch the following two videos from current television shows. Can you determine which learning procedures are being demonstrated?
- The Office: http://www.break.com/usercontent/2009/11/the-office-altoid-experiment-1499823
- The Big Bang Theory [YouTube]: http://www.youtube.com/watch?v=JA96Fba-WHk
References
Cerella, J. (1980). The pigeon's analysis of pictures. Pattern Recognition, 12, 1–6.
Kassin, S. (2003). Essentials of psychology. Upper Saddle River, NJ: Prentice Hall. Retrieved from Essentials of Psychology Prentice Hall Companion Website: http://wps.prenhall.com/hss_kassin_essentials_1/15/3933/1006917.cw/index.html
Porter, D., & Neuringer, A. (1984). Music discriminations by pigeons. Journal of Experimental Psychology: Animal Behavior Processes, 10(2), 138–148.
Thorndike, E. L. (1898). Animal intelligence: An experimental study of the associative processes in animals. Washington, DC: American Psychological Association.
Thorndike, E. L. (1911). Animal intelligence: Experimental studies. New York, NY: Macmillan. Retrieved from http://www.archive.org/details/animalintelligen00thor
Watanabe, S., Sakamoto, J., & Wakita, M. (1995). Pigeons' discrimination of paintings by Monet and Picasso. Journal of the Experimental Analysis of Behavior, 63(2), 165–174.
Image Attributions
Figure 8.5: "Skinner box" (http://en.wikipedia.org/wiki/File:Skinner_box_photo_02.jpg) is licensed under the CC BY-SA 3.0 license (http://creativecommons.org/licenses/by-sa/3.0/deed.en). "Skinner box scheme" by Andreas1 (http://en.wikipedia.org/wiki/File:Skinner_box_scheme_01.png) is licensed under the CC BY-SA 3.0 license (http://creativecommons.org/licenses/by-sa/3.0/deed.en)
Figure 8.6: Adapted from Kassin (2003).
Figure 8.7: "Slot Machines in the Hard Rock Casino" by Ted Murphy (http://commons.wikimedia.org/wiki/File:HardRockCasinoSlotMachines.jpg) is licensed under the CC BY 2.0 license (http://creativecommons.org/licenses/by/2.0/deed.en).
Source: https://opentextbc.ca/introductiontopsychology/chapter/7-2-changing-behavior-through-reinforcement-and-punishment-operant-conditioning/