Building on the behaviorism of Watson, B.F. Skinner (1904-1990) emphasized the importance of what happens after a response: not S-R, but S-R-C (stimulus-response-consequence). He expanded Thorndike’s law of effect into an entire system of reinforcement. He is best known for his schedules of reinforcement, token economies, programmed learning and teaching pigeons to play table tennis.
Skinner’s approach was both inductive and atheoretical. He rejected statistical analyses and built a body of knowledge on replication. Using single-subject designs (N=1), Skinner manipulated when a reward was received. Skinner believed that behavior is emitted, not elicited. Where classical conditioning held that a stimulus elicits a response, Skinner’s operant conditioning model maintained that behavior is emitted by the organism, a consequence occurs, and the organism adapts its behavior accordingly. His focus was not S-R (stimulus-response) but S-R-C (stimulus-response-consequence).
In fact, Skinner’s focus was broader than a single response. Rather than an individual behavior, an operant is a class of behavior. Consequently, in operant conditioning, rewards affect an entire class of behavior. You have an operant for the way you answer the phone. When you answer the phone, you might say your name, say “hi” or give a statement of greeting (e.g., good morning). Phone answering is not a specific behavior so much as a class of behavior: an operant.
Consequences can be classified on two dimensions: give-take and good-bad. Giving is better described by the verb “to posit,” which means to place, affirm or put forward. Consequently, Skinner referred to the process of giving as positive, indicating the direction of action. Similarly, to take something away is to negate (invalidate, deny), and its direction is therefore referred to as negative. The good-bad dimension is reinforcement and punishment, respectively. To reinforce is to strengthen or increase, and to punish is to penalize.
From these two dimensions (positive-negative and reinforcement-punishment), Skinner identified four conditions: positive reinforcement, positive punishment, negative reinforcement and negative punishment. Rewards and punishments that are posited (given) are called positive reinforcement and positive punishment. Rewards and punishments that involve removing something are negative.
Notice that there is no suggestion that positive punishment is good. Positive punishment is a situation where you are given a punisher (stern look, electric shock, etc.). Negative punishment is also unpleasant; it is having something desirable taken away (car keys, paycheck, etc.). Similarly, positive reinforcement is being given a reinforcer (food, praise, etc.) and negative reinforcement is having something unpleasant taken away (credit card debt, angry frown, etc.).
According to Skinner, reinforcement (positive and negative) increases the likelihood of an operant reappearing, and punishment (positive and negative) decreases the likelihood of an operant reappearing. The entire operant is affected. If you were given money for answering the phone, you would have received a positive reinforcer (given is positive and money is a reinforcer), and the entire class of phone answering behavior would be more likely to occur. Although it would not be possible to predict which way you would answer the phone, operant conditioning would predict that your phone answering behaviors would likely increase.
Similarly, if you were positively punished for answering the phone (given an electric shock when you lifted the receiver), your phone answering operant (the entire class of behavior) is less likely to occur.
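The two dimensions described above can be sketched as a small lookup. This is only an illustrative sketch, not anything Skinner wrote: the names `Direction`, `Effect` and `classify` are hypothetical, and the sketch assumes a consequence is labeled by whether something was given or removed and by whether the operant then became more or less likely.

```python
from enum import Enum


class Direction(Enum):
    """Hypothetical label for the give-take dimension."""
    GIVEN = "positive"      # something is posited (presented)
    REMOVED = "negative"    # something is taken away


class Effect(Enum):
    """Hypothetical label for the effect on the operant."""
    INCREASES = "reinforcement"  # the operant becomes more likely
    DECREASES = "punishment"     # the operant becomes less likely


def classify(direction: Direction, effect: Effect) -> str:
    """Combine the two dimensions into one of the four conditions."""
    return f"{direction.value} {effect.value}"


# The phone examples from the text:
print(classify(Direction.GIVEN, Effect.INCREASES))    # money for answering
print(classify(Direction.GIVEN, Effect.DECREASES))    # shock for answering
# The remaining two cells of the grid:
print(classify(Direction.REMOVED, Effect.INCREASES))  # a frown goes away
print(classify(Direction.REMOVED, Effect.DECREASES))  # car keys taken away
```

The four calls print the four conditions in turn: positive reinforcement, positive punishment, negative reinforcement and negative punishment.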
Positive reinforcement and positive punishment are easy to understand. Negative punishment is also familiar to children who have had their car keys taken away or been “grounded” by their parents. Taking away a reward is a negative punishment.
Surprisingly, much of human behavior can be explained by negative reinforcement. People exhibit long chains of behavior in order to escape punishment. When a police car appears in your rearview mirror or your boss heads toward your cubicle, you might well adjust in your chair, cough several times, shuffle your feet and swallow frequently. And when the police officer or boss turns away without even an acknowledgement of you, the sense of overwhelming relief is very rewarding.
When an instructor calls on you in class and you don’t know the answer, you might well begin a long string of behaviors that have been negatively reinforced. These behaviors need not be tied to the situation; they might be purely superstitious, but they are likely to recur if you escape impending doom.
Notice that what is impending doom and what is rewarding is very personal. Some people might enjoy being called on in class and others may abhor it. Some children may hate being sent to their rooms and others may find it very rewarding. Although it is impossible to tell ahead of time what is individually rewarding, Skinner relied on a functional analysis of the situation. Rewards are not inherent in objects; anything that functions as a reinforcer is a reinforcer. Complimenting a child on cleaning their room can be positive punishment if as a result the child stops cleaning. Reinforcement is not in the intent but in the effect.
According to Skinner, rewards should be given differentially. Parents should reward behaviors they want and ignore (extinguish) behaviors they don’t want. Giving attention to a child (such as when giving a punishment) actually rewards the child with your presence and sends a mixed message. Behavior can be shaped by rewarding successive approximations, but practice without reinforcement doesn’t improve performance.
Skinner relied on operational definitions for his experiments. Instead of inferring internal states (such as hunger), he defined hunger in terms of the number of hours since having last eaten. Skinner insisted on clear definitions that are not open to interpretation. He did not hypothesize drive, insight or any internal process. Although he didn’t deny their existence, he thought them to be unknowable. For Skinner, like Watson, if it didn’t affect behavior, whatever went on in the black box of the mind was unimportant.
Basing his findings on animal research (mostly rats and pigeons), Skinner identified five schedules of reinforcement: continuous reinforcement, fixed interval (FI), fixed ratio (FR), variable interval (VI) and variable ratio (VR). Continuous reinforcement is used to shape (refine) a behavior. Every time the subject performs the desired behavior, it is rewarded. Continuous reinforcement leads to quick learning and (after the reinforcement is stopped) quick extinction.
FI describes the condition where a certain amount of time must pass before a correct response is rewarded (e.g., getting paid every two weeks). FI produces a “scalloped” pattern (the closer it gets to payday, the more often the proper response is given).
Fixed ratio (FR) requires a certain number of responses to be made before a behavior is rewarded (e.g., 10 widgets must be made before you are paid). In VI and VR schedules of reinforcement, the required amount of time or the number of responses varies. These partial reinforcement schedules (never quite sure when you’ll be rewarded) are quite resistant to extinction.
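The five schedules can be sketched as simple rules that decide whether a given response earns a reward. This is a minimal illustration under stated assumptions, not a model of Skinner’s apparatus: all function names are hypothetical, time is measured in arbitrary units, and the variable schedules simply draw their next requirement uniformly around a mean, which is one of many possible choices.

```python
import random


def fixed_ratio(n):
    """FR: reward every n-th response."""
    count = 0

    def respond(t):
        nonlocal count
        count += 1
        if count >= n:
            count = 0
            return True
        return False
    return respond


def continuous():
    """Continuous reinforcement: every response is rewarded (FR with n=1)."""
    return fixed_ratio(1)


def fixed_interval(interval):
    """FI: reward the first response made at least `interval` after the last reward."""
    last = 0.0

    def respond(t):
        nonlocal last
        if t - last >= interval:
            last = t
            return True
        return False
    return respond


def variable_ratio(mean_n, rng):
    """VR: like FR, but the required count varies (assumed uniform on 1..2*mean_n-1)."""
    count = 0
    target = rng.randint(1, 2 * mean_n - 1)

    def respond(t):
        nonlocal count, target
        count += 1
        if count >= target:
            count = 0
            target = rng.randint(1, 2 * mean_n - 1)
            return True
        return False
    return respond


def variable_interval(mean_interval, rng):
    """VI: like FI, but the required wait varies (assumed uniform on 0..2*mean_interval)."""
    last = 0.0
    wait = rng.uniform(0, 2 * mean_interval)

    def respond(t):
        nonlocal last, wait
        if t - last >= wait:
            last = t
            wait = rng.uniform(0, 2 * mean_interval)
            return True
        return False
    return respond


# One response per time unit, for 20 time units.
fr10 = fixed_ratio(10)
print([t for t in range(1, 21) if fr10(t)])  # [10, 20]

fi5 = fixed_interval(5)
print([t for t in range(1, 21) if fi5(t)])   # [5, 10, 15, 20]

vr5 = variable_ratio(5, random.Random(0))
print([t for t in range(1, 21) if vr5(t)])   # rewarded responses are unevenly spaced
```

On the fixed schedules the rewarded responses are perfectly predictable, whereas on the variable schedules the subject cannot tell which response will pay off. The sketch only decides when rewards are delivered; it does not simulate how the subject’s response rate changes, so it does not reproduce the scalloped FI pattern or resistance to extinction.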
In an attempt to apply his research to practical problems, Skinner adapted his operant conditioning chamber (he hated the popular title of “Skinner box”) to child rearing. His “Baby Tender” crib was an air-conditioned glass box which he used for his own daughter for two and a half years. Although commercially available, it was not a popular success. Another theoretically successful but practically unaccepted application of operant conditioning occurred during WWII. Skinner designed a missile guidance system using pigeons as “navigators.” Although his system was feasible, the Army rejected it out of hand. The PR problems of pigeon bombers must have been extensive.
Skinner also originated programmed instruction. Using a teaching machine (or books with small quizzes that lead to different material), small bits of information are presented in an ordered sequence. Each frame or bit of information must be learned before one is allowed to proceed to the next section. Proceeding to the next section is thought to be rewarding.
Born in Susquehanna, Pennsylvania, Skinner was an English major in college (Hamilton College) and then pursued psychology (at Harvard). He earned his PhD from Harvard in 1931 and then taught at the University of Minnesota (Minneapolis) and Indiana University (Bloomington). He returned to Harvard in 1948 and remained until his retirement in 1974.