Random Variables and Probability Distributions

Contents

1. If \(x_1, x_2, \ldots, x_n\) are random variables and \(a_1, a_2, \ldots, a_n\) are numerical constants, the random variable \(y\) defined as \(y = a_1x_1 + a_2x_2 + \cdots + a_nx_n\) is a linear combination of the \(x_i\)'s. For example, \(y = 10x_1 - 5x_2 + 8x_3\) is a linear combination of \(x_1\), \(x_2\), and \(x_3\) with \(a_1 = 10\), \(a_2 = -5\), and \(a_3 = 8\). It is easy to compute the mean of a linear combination of the \(x_i\)'s if the individual means \(\mu_1, \mu_2, \ldots, \mu_n\) are known. The variance and standard deviation of a linear combination of the \(x_i\)'s are also easily computed if the \(x_i\)'s are independent. Two random variables \(x_i\) and \(x_j\) are independent if any event defined solely by \(x_i\) is independent of any event defined solely by \(x_j\). When the \(x_i\)'s are not independent, computation of the variance and standard deviation of a linear combination of the \(x_i\)'s is more complicated; this case is not considered here. Example 7.14 Freeway Traffic. Three different roads feed into a particular freeway entrance. Suppose that during a fixed time period, the number of cars coming from each road onto the freeway is a random variable with mean values as follows: Road 1, 800; Road 2, 1000; Road 3, 600. With \(x_i\) representing the number of cars entering from road \(i\), we can define \(y = x_1 + x_2 + x_3\), the total number of cars entering the freeway. The mean value of \(y\) is \(\mu_y = \mu_{x_1 + x_2 + x_3} = \mu_{x_1} + \mu_{x_2} + \mu_{x_3} = 800 + 1000 + 600 = 2400\). Example 7.15 Combining Exam Subscores
2. d. Find a number \(c\) such that 95% of all cars of this model have efficiencies exceeding \(c\) (i.e., \(P(x > c) = .95\)). 7.109 The amount of time spent by a statistical consultant with a client at their first meeting is a random variable having a normal distribution with a mean value of 60 min and a standard deviation of 10 min. a. What is the probability that more than 45 min is spent at the first meeting? b. What amount of time is exceeded by only 10% of all clients at a first meeting? c. If the consultant assesses a fixed charge of $10 (for overhead) and then charges $50 per hour, what is the mean revenue from a client's first meeting? 7.110 The lifetime of a certain brand of battery is normally distributed with a mean value of 6 hr and a standard deviation of 0.8 hr when it is used in a particular cassette player. Suppose that two new batteries are independently selected and put into the player. The player ceases to function as soon as one of the batteries fails. a. What is the probability that the player functions for at least 4 hr? b. What is the probability that the cassette player works for at most 7 hr? c. Find a number \(z\) such that only 5% of all cassette players will function without battery replacement for more than \(z\) hr. [Bold exercises answered in back. Data set available online but not required.] 7.111 A machine producing vitamin E capsules operates so that the actual amount of vitamin E in each capsule is norm
3. Binomial Distributions. Suppose that we decide to record the gender of each of the next 25 newborn children at a particular hospital. What is the chance that at least 15 are female? What is the chance that between 10 and 15 are female? How many among the 25 can we expect to be female? These and other similar questions can be answered by studying the binomial probability distribution. This distribution arises when the experiment of interest is a binomial experiment, that is, an experiment having the characteristics listed in the following box. Properties of a Binomial Experiment: A binomial experiment consists of a sequence of trials with the following conditions: (1) There are a fixed number of observations, called trials. (2) Each trial can result in one of only two mutually exclusive outcomes, labeled success (S) and failure (F). (3) Outcomes of different trials are independent. (4) The probability that a trial results in S is the same for each trial. The binomial random variable \(x\) is defined as \(x\) = number of successes observed when a binomial experiment is performed. The probability distribution of \(x\) is called the binomial probability distribution. The term success here does not necessarily have any of its usual connotations. Which of the two possible outcomes is labeled "success" is determined by the ran
4. Summary of Key Terms and Concepts
Term or Formula: random variable (discrete or continuous); probability distribution \(p(x)\) of a discrete random variable \(x\); probability distribution of a continuous random variable \(x\); \(\mu_x\) and \(\sigma_x\); \(\mu_x = \sum x\,p(x)\); \(\sigma_x^2 = \sum (x - \mu_x)^2 p(x)\) and \(\sigma_x = \sqrt{\sigma_x^2}\); binomial probability distribution \(p(x) = \frac{n!}{x!(n-x)!}\pi^x(1-\pi)^{n-x}\); \(\mu_x = n\pi\) and \(\sigma_x = \sqrt{n\pi(1-\pi)}\); normal distribution; standard normal distribution; \(z\) critical value.
Comment:
Random variable (discrete or continuous): A numerical variable with a value determined by the outcome of a chance experiment. It is discrete if its possible values are isolated points along the number line and continuous if its possible values form an entire interval on the number line.
Probability distribution \(p(x)\) of a discrete random variable \(x\): A formula, table, or graph that gives the probability associated with each \(x\) value. Conditions on \(p(x)\) are (1) \(p(x) \ge 0\) and (2) \(\sum p(x) = 1\), where the sum is over all possible \(x\) values.
Probability distribution of a continuous random variable \(x\): Specified by a smooth (density) curve for which the total area under the curve is 1. The probability \(P(a < x < b)\) is the area under the curve and above the interval from \(a\) to \(b\); this is also \(P(a \le x \le b)\).
\(\mu_x\) and \(\sigma_x\): The mean and standard deviation, respectively, of a random variable \(x\). These quantities describe the center and extent of spread about the center of the variable's probability distribution.
\(\mu_x = \sum x\,p(x)\): The mean value of a discrete random variable \(x\); it locates the center of the variable's p
5. 6 4 4 The probability that three or fewer students must be stopped is \(P(x \le 3) = p(1) + p(2) + p(3) = (.6)^0(.4) + (.6)^1(.4) + (.6)^2(.4) = .4 + .24 + .144 = .784\). Exercises 7.45–7.63. 7.45 Consider two binomial experiments. a. The first binomial experiment consists of six trials. How many outcomes have exactly one success, and what are these outcomes? b. The second binomial experiment consists of 20 trials. How many outcomes have exactly 10 successes? exactly 15 successes? exactly 5 successes? 7.46 Suppose that in a certain metropolitan area, 9 out of 10 households have a VCR. Let \(x\) denote the number among four randomly selected households that have a VCR, so \(x\) is a binomial random variable with \(n = 4\) and \(\pi = .9\). a. Calculate \(p(2) = P(x = 2)\), and interpret this probability. b. Calculate \(p(4)\), the probability that all four selected households have a VCR. c. Determine \(P(x \le 3)\). 7.47 The Los Angeles Times (December 13, 1992) reported that what airline passengers like to do most on long flights is rest or sleep; in a survey of 3697 passengers, almost 80% did so. Suppose that for a particular route the actual percentage is exactly 80%, and consider randomly selecting six passengers. Then \(x\), the number among the selected six who rested or slept, is a binomial random variable with \(n = 6\) and \(\pi = .8\). a. Calculate \(p(4)\), and interpret this probability. b. Calculate \(p(6)\), the probability that all six selected passengers reste
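The "three or fewer students" calculation above can be checked numerically. The following is a minimal Python sketch, not part of the text: the function names `geometric_pmf` and `geometric_cdf` are illustrative, and it assumes the geometric form \(p(x) = (1-\pi)^{x-1}\pi\) with \(\pi = .4\), which is what the terms .4, .24, and .144 imply.

```python
def geometric_pmf(x: int, pi: float) -> float:
    """Probability that the first success occurs on trial x (x = 1, 2, 3, ...)."""
    return (1 - pi) ** (x - 1) * pi


def geometric_cdf(x: int, pi: float) -> float:
    """Probability that the first success occurs on or before trial x."""
    return sum(geometric_pmf(k, pi) for k in range(1, x + 1))


# Success probability .4, as implied by the worked terms .4 + .24 + .144.
terms = [geometric_pmf(k, 0.4) for k in (1, 2, 3)]
p_at_most_3 = geometric_cdf(3, 0.4)
print(terms, p_at_most_3)
```

The same value follows from the complement: the first success comes within three trials unless the first three trials all fail, so \(P(x \le 3) = 1 - (.6)^3\).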
6. Suppose that it is known that the variable \(x\) = weight of baggage checked by a randomly selected passenger has a mean and standard deviation of 42 and 16, respectively. Consider a flight on which 10 passengers, all traveling alone, are flying. If we use \(x_i\) to denote the baggage weight for passenger \(i\) (for \(i\) ranging from 1 to 10), the total weight of checked baggage, \(y\), is then \(y = x_1 + x_2 + \cdots + x_{10}\). Note that \(y\) is a linear combination of the \(x_i\)'s. The mean value of \(y\) is \(\mu_y = \mu_{x_1} + \mu_{x_2} + \cdots + \mu_{x_{10}} = 42 + 42 + \cdots + 42 = 420\). Since the ten passengers are all traveling alone, it is reasonable to think that the ten baggage weights are unrelated and that the \(x_i\)'s are independent. (This would not be a reasonable assumption if the 10 passengers were not traveling alone.) Then the variance of \(y\) is \(\sigma_y^2 = \sigma_{x_1}^2 + \sigma_{x_2}^2 + \cdots + \sigma_{x_{10}}^2 = 16^2 + 16^2 + \cdots + 16^2 = 2560\), and the standard deviation of \(y\) is \(\sigma_y = \sqrt{2560} \approx 50.60\). Exercises 7.27–7.44. 7.27 An express mail service charges a special rate for any package that weighs less than 1 lb. Let \(x\) denote the weight of a randomly selected parcel that qualifies for this special rate. The probability distribution of \(x\) is specified by the following density curve. [Figure: density curve for \(x\) over the interval from 0 to 1; vertical axis: Density.] Use the fact that area of a trapezoid = (base)(average of two side lengths) to answer each of the following questions. a. What is the probability that a randomly selected package of this typ
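The baggage-weight calculation can be reproduced with a few lines of Python. This is a sketch under the assumption stated in the example, namely that the ten weights are independent; the function names are illustrative and do not come from the text. The variance rule used here is not valid for dependent variables.

```python
import math


def linear_combination_mean(coeffs, means):
    """Mean of a1*x1 + ... + an*xn; holds whether or not the x's are independent."""
    return sum(a * m for a, m in zip(coeffs, means))


def linear_combination_variance(coeffs, sds):
    """Variance of a1*x1 + ... + an*xn; valid ONLY when the x's are independent."""
    return sum((a * s) ** 2 for a, s in zip(coeffs, sds))


# Ten passengers, each with mean baggage weight 42 and standard deviation 16.
coeffs = [1] * 10
mean_y = linear_combination_mean(coeffs, [42] * 10)
var_y = linear_combination_variance(coeffs, [16] * 10)
sd_y = math.sqrt(var_y)
print(mean_y, var_y, round(sd_y, 2))
```

Variances add under independence, but standard deviations do not: the standard deviation of the total is the square root of 2560, not 10 times 16.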
7. .0475. [Figure 7.30: \(P(x > 110)\) and corresponding z curve area for the height problem of Example 7.25; normal curve with \(\mu = 100\), \(\sigma = 6\), shaded area = .0475 to the right of 110; z curve, shaded area = .0475 to the right of 1.67.] Example 7.26 IQ Scores. Although there is some controversy regarding the appropriateness of IQ scores as a measure of intelligence, IQ scores are commonly used for a variety of purposes. One commonly used IQ scale has a mean of 100 and a standard deviation of 15, and IQ scores are approximately normally distributed. (IQ score is actually a discrete variable because it is based on the number of correct responses on a test, but its population distribution closely resembles a normal curve.) If we define the random variable \(x\) = IQ score of a randomly selected individual, then \(x\) has approximately a normal distribution with \(\mu = 100\) and \(\sigma = 15\). One way to become eligible for membership in Mensa, an organization purportedly for those of high intelligence, is to have a Stanford-Binet IQ score above 130. What proportion of the population would qualify for Mensa membership? An answer to this question requires evaluating \(P(x > 130)\). This probability is shown in Figure 7.31. With \(a = 130\), \(a^* = \frac{a - \mu}{\sigma} = \frac{130 - 100}{15} = 2.00\). So (see Figure 7.32) \(P(x > 130) = P(z > 2.00)\) = z curve area to the right of 2.00 = 1 − (z curve area to the left of 2.00) = 1 − .9772 = .0228. Only 2.28% of the population would qualify for Mensa members
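The standardization step in the IQ example can be mirrored in software instead of a printed table. The sketch below uses `statistics.NormalDist` from the Python standard library; the variable names are illustrative. Because it does not round to four table digits, the result agrees with the table value .0228 only to about four decimal places.

```python
from statistics import NormalDist

mu, sigma = 100, 15          # IQ scale from the example
cutoff = 130                 # Mensa threshold from the example

# Standardize the cutoff, then take the z curve area to its right.
z = (cutoff - mu) / sigma
p_from_z = 1 - NormalDist().cdf(z)

# Equivalent computation without standardizing by hand.
p_direct = 1 - NormalDist(mu, sigma).cdf(cutoff)

print(z, round(p_from_z, 4), round(p_direct, 4))
```

Both routes give the same area, which is the point of standardizing: any normal probability reduces to a z curve area.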
8. Do we mean to include 4 and 7, or do we mean only the probability of getting a 5 or 6? In this case we wish to be inclusive. To evaluate the probability desired, we will use the cumulative distribution function for the binomial, binomcdf. (Your calculator function may have a different name.) The logic is elementary, as Sherlock Holmes would say: The probability that the binomial random variable will assume a value between 4 and 7, inclusive, is equal to the probability of assuming a value less than or equal to 7 minus the probability of assuming a value less than or equal to 3 (not 4). In symbols, \(P(4 \le x \le 7) = P(x \le 7) - P(x \le 3)\), which is found as follows: binomcdf(12, .6, 7) − binomcdf(12, .6, 3) gives us 0.5465545. Our last binomial random variable calculation problem is finding the probability of a value greater than a given value. What, for instance, is the probability of more than 7 monitors having flat panel displays? Using a fundamental property of probability, we know that \(P(7 < x) + P(x \le 7) = 1\), and thus \(P(7 < x) = 1 - P(x \le 7)\), which we translate into 1 − binomcdf(12, .6, 7), giving us 0.438178222. We have gone into some detail to explain how these binomial probability problems can be solved using the calculator. This detail is justified not only because of the importance of the binomial distribution but also because these same calculator procedures will be used for finding probabilities involving the geometric and normal
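For readers without a graphing calculator, the two binomcdf computations can be imitated in Python. This is a sketch only: `binom_cdf` is a hypothetical helper written here to play the role of the calculator's binomcdf(n, π, x), built from `math.comb` in the standard library.

```python
from math import comb


def binom_pmf(x: int, n: int, pi: float) -> float:
    """P(exactly x successes in n independent trials with success probability pi)."""
    return comb(n, x) * pi ** x * (1 - pi) ** (n - x)


def binom_cdf(x: int, n: int, pi: float) -> float:
    """P(at most x successes); stands in for the calculator's binomcdf(n, pi, x)."""
    return sum(binom_pmf(k, n, pi) for k in range(0, x + 1))


n, pi = 12, 0.6

# Between 4 and 7 inclusive: subtract the cumulative probability at 3, not 4.
p_4_to_7 = binom_cdf(7, n, pi) - binom_cdf(3, n, pi)

# More than 7: complement of "at most 7".
p_more_than_7 = 1 - binom_cdf(7, n, pi)

print(p_4_to_7, p_more_than_7)
```

Subtracting the cumulative probability at 3 rather than 4 is what keeps the value 4 inside the interval.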
9. \(P(z < 1.50 \text{ or } z > 2.50)\) 7.68 Let \(z\) denote a variable that has a standard normal distribution. Determine the value \(z^*\) to satisfy the following conditions: a. \(P(z < z^*) = .025\) b. \(P(z < z^*) = .01\) c. \(P(z < z^*) = .05\) d. \(P(z > z^*) = .02\) e. \(P(z > z^*) = .01\) f. \(P(z > z^* \text{ or } z < -z^*) = .20\) 7.69 Determine the value \(z^*\) that a. Separates the largest 3% of all z values from the others b. Separates the largest 1% of all z values from the others c. Separates the smallest 4% of all z values from the others d. Separates the smallest 10% of all z values from the others 7.70 Determine the value of \(z^*\) such that a. \(-z^*\) and \(z^*\) separate the middle 95% of all z values from the most extreme 5% b. \(-z^*\) and \(z^*\) separate the middle 90% of all z values from the most extreme 10% c. \(-z^*\) and \(z^*\) separate the middle 98% of all z values from the most extreme 2% d. \(-z^*\) and \(z^*\) separate the middle 92% of all z values from the most extreme 8% 7.71 Because \(P(z < .44) = .67\), 67% of all z values are less than .44, and .44 is the 67th percentile of the standard normal distribution. Determine the value of each of the following percentiles for the standard normal distribution. (Hint: If the cumulative area that you must look for does not appear in the z table, use the closest entry.) a. The 91st percentile (Hint: Look for area .9100.)
10. largest 2.5% and the smallest 2.5%. Let's begin by looking at how to accomplish these tasks when the distribution of interest is the standard normal distribution. The standard normal (or z) curve is shown in Figure 7.21(a). It is centered at \(\mu = 0\), and the standard deviation, \(\sigma = 1\), is a measure of the extent to which it spreads out about its mean (in this case, 0). [Figure 7.21: (a) the z curve; (b) cumulative area = z curve area to the left of a particular z value; horizontal axes run from −3 to 3.] Note that this picture is consistent with the Empirical Rule of Chapter 4: About 95% of the area (probability) is associated with values that are within 2 standard deviations of the mean (between −2 and 2), and almost all of the area is associated with values that are within 3 standard deviations of the mean (between −3 and 3). Appendix Table 2 tabulates cumulative z curve areas of the sort shown in Figure 7.21(b) for many different values of z. The smallest value for which the cumulative area is given is −3.89, a value far out in the lower tail of the z curve. The next smallest value for which the area appears is −3.88, then −3.87, then −3.86, and so on in increments of 0.01, terminating with the cumulative area to the left of 3.89. Using the Table of Standard Normal Curve Areas: For any number \(z\) between −3.89 and 3.89 and rounded to two decimal places, Ap
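A table of cumulative z curve areas of the kind described above can be regenerated in a few lines. The sketch below is illustrative rather than a reproduction of Appendix Table 2: the helper `z_table_row` is hypothetical, handles only nonnegative rows, and rounds to four decimals on the assumption that the printed table does the same.

```python
from statistics import NormalDist


def z_table_row(row_value: float) -> list:
    """Cumulative areas for z = row_value + .00, .01, ..., .09 (nonnegative rows only)."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [
        round(std_normal.cdf(round(row_value + col / 100, 2)), 4)
        for col in range(10)
    ]


row_2_3 = z_table_row(2.3)   # entries for z = 2.30 through 2.39
print(row_2_3)

# Empirical Rule check: area within 2 and within 3 standard deviations of 0.
within_2 = NormalDist().cdf(2) - NormalDist().cdf(-2)
within_3 = NormalDist().cdf(3) - NormalDist().cdf(-3)
print(round(within_2, 4), round(within_3, 4))
```

Each entry is the area to the left of one z value, so areas between two values come from subtracting two entries.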
11. to play, which game would you choose and why? 7.5 Binomial and Geometric Distributions. In this section, we introduce two of the more commonly encountered discrete probability distributions: the binomial distribution and the geometric distribution. These distributions arise when the experiment of interest consists of making a sequence of dichotomous observations (two possible values for each observation). The process of making a single such observation is called a trial. For example, one characteristic of blood type is Rh factor, which can be either positive or negative. We can think of an experiment that consists of noting the Rh factor for each of 25 blood donors as a sequence of 25 dichotomous trials, where each trial consists of observing the Rh factor (positive or negative) of a single donor. We could also conduct a different experiment that consists of observing the Rh factor of blood donors until a donor who is Rh negative is encountered. This second experiment can also be viewed as a sequence of dichotomous trials, but the total number of trials in this experiment is not predetermined, as it was in the previous example, where we knew in advance that there would be 25 trials. Experiments of the two types just described are characteristic of those leading to the binomial and the geometric probability distributions, respectively.
12. 0.499 in. and standard deviation 0.002 in. What percentage of the bearings produced will not be acceptable? 7.79 Suppose that the distribution of net typing rate in words per minute (wpm) for experienced typists can be approximated by a normal curve with mean 60 wpm and standard deviation 15 wpm ("Effects of Age and Skill in Typing," Journal of Experimental Psychology [1984]: 345–371). a. What is the probability that a randomly selected typist's net rate is at most 60 wpm? less than 60 wpm? b. What is the probability that a randomly selected typist's net rate is between 45 and 90 wpm? c. Would you be surprised to find a typist in this population whose net rate exceeded 105 wpm? (Note: The largest net rate in a sample described in the paper is 104 wpm.) d. Suppose that two typists are independently selected. What is the probability that both their typing rates exceed 75 wpm? e. Suppose that special training is to be made available to the slowest 20% of the typists. What typing speeds would qualify individuals for this training? 7.80 Consider the variable \(x\) = time required for a college student to complete a standardized exam. Suppose that for the population of students at a particular university, the distribution of \(x\) is well approximated by a normal curve with mean 45 min and standard deviation 5 min. a. If 50 min is allowed for the exam, what proportion of students at this university would be unable to finish in the allot
13. 10 have oxide emission levels greater than 2 7.64 Determine the following standard normal (z) curve areas: a. The area under the z curve to the left of 1.75 b. The area under the z curve to the left of 0.68 c. The area under the z curve to the right of 1.20 d. The area under the z curve to the right of 2.82 e. The area under the z curve between −2.22 and 0.53 f. The area under the z curve between −1 and 1 g. The area under the z curve between −4 and 4 7.65 Determine each of the following areas under the standard normal (z) curve: a. To the left of 1.28 b. To the right of 1.28 c. Between 1 and 2 d. To the right of 0 e. To the right of 5 f. Between 1.6 and 2.5 g. To the left of 0.23 7.66 Let \(z\) denote a random variable that has a standard normal distribution. Determine each of the following probabilities: a. \(P(z < 2.36)\) b. \(P(z \le 2.36)\) c. \(P(z < 1.23)\) d. \(P(1.14 < z < 3.35)\) e. \(P(-0.77 \le z \le -0.55)\) f. \(P(z > 2)\) g. \(P(z \ge 3.38)\) h. \(P(z < 4.98)\) 7.67 Let \(z\) denote a random variable having a normal distribution with \(\mu = 0\) and \(\sigma = 1\). Determine each of the following probabilities: a. \(P(z < 0.10)\) b. \(P(z < -0.10)\) c. \(P(0.40 < z < 0.85)\) d. \(P(-0.85 < z < -0.40)\) e. \(P(-0.40 < z < 0.85)\) f. \(P(z > 1.25)\) g.
14. 262.6 422.6 Construct a normal probability plot, and comment on the plausibility of a normal distribution as a model for component lifetime. 7.84 The paper "The Load-Life Relationship for M50 Bearings with Silicon Nitride Ceramic Balls" (Lubrication Engineering [1984]: 153–159) reported the following data on bearing load-life (in millions of revolutions); the corresponding normal scores are also given:
x        Normal Score    x        Normal Score
47.1     -1.867          240.0    0.062
68.1     -1.408          240.0    0.187
68.1     -1.131          278.0    0.315
90.8     -0.921          278.0    0.448
103.6    -0.745          289.0    0.590
106.0    -0.590          289.0    0.745
115.0    -0.448          367.0    0.921
126.0    -0.315          385.9    1.131
146.6    -0.187          392.0    1.408
229.0    -0.062          395.0    1.867
Construct a normal probability plot. Is normality plausible? 7.85 The following observations are DDT concentrations in the blood of 20 people: 24 26 30 35 35 38 39 40 40 41 42 42 52 56 58 61 75 79 88 102. Use the normal scores from Exercise 7.84 to construct a normal probability plot, and comment on the appropriateness of a normal probability model. 7.86 Consider the following sample of 25 observations on the diameter \(x\) (in centimeters) of a disk used in a certain system: 16.01 16.08 16.13 15.94 16.05 16.27 15.89 15.84 15.95 16.10 15.92 16.04 15.82 16.15 16.06 15.66 15.78 15.99 16.29 16.15 16.19 16.22 16.07 16.13 16.11
15. 7.9. What is the probability that a randomly selected child's score will be within 2 standard deviations of the mean score? As Figure 7.13 shows, values of \(x\) within 2 standard deviations of the mean are those for which \(\mu - 2\sigma < x < \mu + 2\sigma\). [Figure 7.13: Values within 2 standard deviations of the mean; axis marked at \(\mu - 2\sigma\), \(\mu\), and \(\mu + 2\sigma\).] From Example 7.9 we already have \(\mu = 7.16\). The variance is \(\sigma^2 = \sum (x - \mu)^2 p(x) = \sum (x - 7.16)^2 p(x) = (0 - 7.16)^2(.002) + (1 - 7.16)^2(.001) + \cdots + (10 - 7.16)^2(.01) = 1.5684\), and the standard deviation is \(\sigma = \sqrt{1.5684} = 1.25\). This gives (using the probabilities given in Example 7.9) \(P(\mu - 2\sigma < x < \mu + 2\sigma) = P(7.16 - 2.50 < x < 7.16 + 2.50) = P(4.66 < x < 9.66) = p(5) + \cdots + p(9) = .96\). Mean and Standard Deviation When x Is Continuous. Figure 7.14 illustrates how the density curve for a continuous random variable can be approximated by a probability histogram of a discrete random variable. Computing the mean value and the standard deviation using this discrete distribution gives approximate values of \(\mu\) and \(\sigma\) for the continuous random variable \(x\). If an even more accurate approximating probability histogram is used (narrower rectangles), better approximations of \(\mu\) and \(\sigma\) result. In practice, such an approximation method is often unnecessary. Instead, \(\mu\) and \(\sigma\) can be defined and computed using methods from calculus. The details need not concern us; what is important is tha
16.
x      95    96    97    98    99    100   101   102
p(x)   .05   .10   .12   .14   .24   .17   .06   .04
x      103   104   105   106   107   108   109   110
p(x)   .03   .02   .01   .005  .005  .005  .0037 .0013
a. What is the probability that the airline can accommodate everyone who shows up for the flight? b. What is the probability that not all passengers can be accommodated? c. If you are trying to get a seat on such a flight and you are number 1 on the standby list, what is the probability that you will be able to take the flight? What if you are number 3? 7.12 Suppose that a computer manufacturer receives computer boards in lots of five. Two boards are selected from each lot for inspection. We can represent possible outcomes of the selection process by pairs. For example, the pair (1, 2) represents the selection of Boards 1 and 2 for inspection. a. List the 10 different possible outcomes. b. Suppose that Boards 1 and 2 are the only defective boards in a lot of five. Two boards are to be chosen at random. Define \(x\) to be the number of defective boards observed among those inspected. Find the probability distribution of \(x\). 7.13 Simulate the chance experiment described in Exercise 7.12 using five slips of paper, with two marked defective and three marked nondefective. Place the slips in a box, mix them well, and draw out two. Record the number of defective boards. Replace the slips and repeat until you have 50 obser
17. A nationwide standardized exam consists of a multiple-choice section and a free response section. For each section, the mean and standard deviation are reported to be as follows: Multiple Choice, mean 38 and standard deviation 6; Free Response, mean 30 and standard deviation 7. Let's define \(x_1\) and \(x_2\) as the multiple-choice score and the free-response score, respectively, of a student selected at random from those taking this exam. We are also interested in the variable \(y\) = total score. Suppose that the total score is computed as \(y = x_1 + 2x_2\). What are the mean and standard deviation of \(y\)? Because \(y = x_1 + 2x_2\) is a linear combination of \(x_1\) and \(x_2\), the mean of \(y\) is \(\mu_y = \mu_{x_1 + 2x_2} = \mu_{x_1} + 2\mu_{x_2} = 38 + 2(30) = 98\). What about the variance and standard deviation of \(y\)? To use Rule 2 in the preceding box, \(x_1\) and \(x_2\) must be independent. It is unlikely that the value of \(x_1\) (a student's multiple-choice score) would be unrelated to the value of \(x_2\) (the same student's free-response score), because it seems probable that students who score well on one section of the exam will also tend to score well on the other section. Therefore, it would not be appropriate to calculate the variance and standard deviation from the given information. Example 7.16 Luggage Weights. A commuter airline flies small planes between San Luis Obispo and San Francisco. For small planes, the baggage weight is a concern, especially on foggy mornings, because the weight of the plane has an effect on how quickly the plane can ascend.
18. Because the sale price is so low, only one can of balls will be sold to each customer. If 40% of all customers buy Brand W, 35% buy Brand P, and 25% buy Brand D and if \(x\) is the number among three randomly selected customers who buy Brand W, what is the probability distribution of \(x\)? 7.120 Suppose that your statistics professor tells you that the scores on a midterm exam were approximately normally distributed with a mean of 78 and a standard deviation of 7. The top 15% of all scores have been designated A's. Your score is 89. Did you receive an A? Explain. 7.121 Suppose that the pH of soil samples taken from a certain geographic region is normally distributed with a mean pH of 6.00 and a standard deviation of 0.10. If the pH of a randomly selected soil sample from this region is determined, answer the following questions about it: a. What is the probability that the resulting pH is between 5.90 and 6.15? b. What is the probability that the resulting pH exceeds 6.10? c. What is the probability that the resulting pH is at most 5.95? d. What value will be exceeded by only 5% of all such pH values? 7.122 The lightbulbs used to provide exterior lighting for a large office building have an average lifetime of 700 hr. If length of life is approximately normally distributed with a standard deviation of 50 hr, how often shou
19. DEFINITION value pairs. A substantial linear pattern in a normal probability plot suggests that population normality is plausible. On the other hand, a systematic departure from a straight-line pattern (such as curvature in the plot) casts doubt on the legitimacy of assuming a normal population distribution. Window Widths. The following 10 observations are widths of contact windows in integrated circuit chips: 3.21 2.49 2.94 4.38 4.02 3.62 3.30 2.85 3.34 3.81. The 10 pairs for the normal probability plot are then (−1.539, 2.49), (−1.001, 2.85), (−0.656, 2.94), (−0.376, 3.21), (−0.123, 3.30), (0.123, 3.34), (0.376, 3.62), (0.656, 3.81), (1.001, 4.02), (1.539, 4.38). The normal probability plot is shown in Figure 7.36. The linearity of the plot supports the assumption that the window width distribution from which these observations were drawn is normal. [Figure 7.36: A normal probability plot for Example 7.29; axes: Observation versus Normal score.] [Figure 7.37: Plots suggesting nonnormality: (a) indication that the population distribution is skewed; (b) indication that the population distribution has heavier tails than a normal curve; (c) presence of an outlier.] The decision as to whether a plot shows a substantial linear pattern is somewhat subjective. Particularly when n is small, normality should not be ruled out unless the departure from linearity is clea
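The pairing of normal scores with ordered observations can be sketched in Python. One caveat is important: the normal scores in the text come from a table, whereas this sketch substitutes a common approximation (Blom's formula, \(\Phi^{-1}\big((i - 0.375)/(n + 0.25)\big)\)), so its scores agree with the tabled ones only to about two decimal places. The function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist, mean


def approx_normal_scores(n: int) -> list:
    """Approximate normal scores for a sample of size n (Blom's approximation)."""
    std_normal = NormalDist()
    return [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]


def pearson_r(xs, ys) -> float:
    """Sample correlation coefficient, used as a rough measure of linearity."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Window widths from the example.
widths = [3.21, 2.49, 2.94, 4.38, 4.02, 3.62, 3.30, 2.85, 3.34, 3.81]

scores = approx_normal_scores(len(widths))
ordered = sorted(widths)
pairs = list(zip(scores, ordered))   # (normal score, observation) pairs to plot
r = pearson_r(scores, ordered)

for score, obs in pairs:
    print(f"({score:.3f}, {obs})")
print("correlation of the pairs:", r)
```

A correlation close to 1 corresponds to the strong linear pattern described above, although the visual check of the plot remains the primary tool, especially for small n.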
20. among the 15 who want a diet drink. For which possible values of \(x\) is everyone satisfied? 7.104 A mail-order computer software business has six telephone lines. Let \(x\) denote the number of lines in use at a specified time. The probability distribution of \(x\) is as follows:
x      0    1    2    3    4    5    6
p(x)   .10  .15  .20  .25  .20  .06  .04
Write each of the following events in terms of \(x\), and then calculate the probability of each one: a. At most three lines are in use b. Fewer than three lines are in use c. At least three lines are in use d. Between two and five lines (inclusive) are in use e. Between two and four lines (inclusive) are not in use f. At least four lines are not in use 7.105 Refer to the probability distribution of Exercise 7.104. a. Calculate the mean value and standard deviation of \(x\). b. What is the probability that the number of lines in use is farther than 3 standard deviations from the mean value? 7.106 A new battery's voltage may be acceptable (A) or unacceptable (U). A certain flashlight requires two batteries, so batteries will be independently selected and tested until two acceptable ones have been found. Suppose that 80% of all batteries have acceptable voltages, and let \(y\) denote the number of batteries that must be tested. a. What is \(p(2)\), that is, \(P(y = 2)\)? b. What is \(p(3)\)? (Hint: There are two different outcomes that result in \(y = 3\).) c. In order to
21. are sometimes used in place of \(\frac{n!}{x!(n-x)!}\). Both are read as "n choose x" and represent the number of ways of choosing \(x\) items from a set of \(n\). The binomial probability function can then be written as \(p(x) = \binom{n}{x}\pi^x(1-\pi)^{n-x}\), \(x = 0, 1, 2, \ldots, n\), or \(p(x) = {}_nC_x\,\pi^x(1-\pi)^{n-x}\), \(x = 0, 1, 2, \ldots, n\). Some sources use \(p\) to represent the probability of success rather than \(\pi\). We prefer the use of Greek letters for characteristics of a population or probability distribution, thus the use of \(\pi\). Notice that the probability distribution is specified using a formula that allows calculation of the various probabilities rather than by giving a table or a probability histogram. Example 7.17 Computer Monitors. Sixty percent of all computer monitors sold by a large computer retailer have a flat panel display and 40% have a CRT display. The type of monitor purchased by each of the next 12 customers will be noted. Define a random variable \(x\) by \(x\) = number of monitors among these 12 that have a flat panel display. Because \(x\) counts the number of flat panel displays, we use S to denote the sale of a flat panel monitor. Then \(x\) is a binomial random variable with \(n = 12\) and \(\pi = P(S) = .60\). The probability distribution of \(x\) is given by \(p(x) = \frac{12!}{x!(12-x)!}(.6)^x(.4)^{12-x}\), \(x = 0, 1, 2, \ldots, 12\). The probability that exactly four monitors are flat panel displays is \(p(4) = P(x = 4) = \frac{12!}{4!\,8!}(.6)^4(.4)^8 = (495)(.6)^4(.4)^8 = .042\).
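The binomial probability function for the computer monitor example can be tabulated directly. The sketch below is one possible implementation, using `math.comb` from the Python standard library for "n choose x"; the function name `binomial_pmf` is illustrative.

```python
from math import comb


def binomial_pmf(x: int, n: int, pi: float) -> float:
    """p(x) = (n choose x) * pi^x * (1 - pi)^(n - x) for x = 0, 1, ..., n."""
    return comb(n, x) * pi ** x * (1 - pi) ** (n - x)


n, pi = 12, 0.60                     # 12 customers, 60% flat panel
ways = comb(12, 4)                   # number of ways to choose which 4 of 12 are successes
p4 = binomial_pmf(4, n, pi)          # probability of exactly four flat panel displays

distribution = {x: binomial_pmf(x, n, pi) for x in range(n + 1)}

print(ways, round(p4, 3))
for x, prob in distribution.items():
    print(x, round(prob, 4))
```

Summing the tabulated probabilities gives 1, which is a quick check that the formula defines a legitimate probability distribution.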
22. attempting to register. To determine the amount of time that should be allowed before disconnecting a student, we need to describe the largest 1% of the distribution of time to register. These are the individuals who will be mistakenly disconnected. This is illustrated in Figure 7.35(a). To determine the value of \(x^*\), we first solve the analogous problem for the standard normal distribution, as shown in Figure 7.35(b). [Figure 7.35: (a) normal curve with \(\mu = 12\), \(\sigma = 2\), largest 1% shaded (area .01); (b) z curve, largest 1% shaded (area .01).] By looking at Appendix Table 2 for a cumulative area of .99, we find the closest entry (.9901) in the 2.3 row and the .03 column, from which \(z^* = 2.33\). For the standard normal distribution, the largest 1% of the distribution is made up of those values greater than 2.33. An equivalent statement is that the largest 1% are those with z scores greater than 2.33. This implies that in the distribution of time to register \(x\) (or any other normal distribution), the largest 1% are those values with z scores greater than 2.33 or, equivalently, those \(x\) values more than 2.33 standard deviations above the mean. Here the standard deviation is 2, so 2.33 standard deviations is 2.33(2), and it follows that \(x^* = 12 + 2.33(2) = 12 + 4.66 = 16.66\). The largest 1% of the distribution for time to register is made up of values that are greater than 16.66 min. If the university system was set to disconnect students after 16.66 min, only 1% of the students regis
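The two-step procedure above (find \(z^*\) on the z curve, then convert to the original scale) can be followed in Python with `statistics.NormalDist`. This is a sketch with illustrative variable names. Because software does not round \(z^*\) to the nearest table entry, it gives roughly 2.326 rather than 2.33, and a cutoff of about 16.65 min rather than 16.66.

```python
from statistics import NormalDist

mu, sigma = 12, 2            # time to register, from the example

# Step 1: value on the z curve with cumulative area .99 (largest 1% above it).
z_star = NormalDist().inv_cdf(0.99)

# Step 2: convert back to the x scale: x* = mu + z* * sigma.
x_star = mu + z_star * sigma

# Same answer in one step, working directly with the normal curve for x.
x_star_direct = NormalDist(mu, sigma).inv_cdf(0.99)

print(round(z_star, 3), round(x_star, 2), round(x_star_direct, 2))
```

The small difference from the text's 16.66 comes entirely from rounding \(z^*\) to two decimal places when reading the table.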
23. changed to 14 7.60 Suppose that 90% of all registered California voters favor banning the release of information from exit polls in presidential elections until after the polls in California close. A random sample of 25 California voters is to be selected. a. What is the probability that more than 20 voters favor the ban? b. What is the probability that at least 20 voters favor the ban? c. What are the mean value and standard deviation of the number of voters who favor the ban? d. If fewer than 20 voters in the sample favor the ban, is this at odds with the assertion that (at least) 90% of the populace favors the ban? (Hint: Consider \(P(x < 20)\) when \(\pi = .9\).) 7.61 Sophie is a dog that loves to play catch. Unfortunately, she isn't very good, and the probability that she catches a ball is only .1. Let \(x\) be the number of tosses required until Sophie catches a ball. a. Does \(x\) have a binomial or a geometric distribution? b. What is the probability that it will take exactly two tosses for Sophie to catch a ball? c. What is the probability that more than three tosses will be required? 7.62 Suppose that 5% of cereal boxes contain a prize and the other 95% contain the message, "Sorry, try again." Consider the random variable \(x\), where \(x\) = number of boxes purchased until a prize is found. a. What is the probability that at most two boxes must be purchased? b. What is the probability that exactly fo
24. correct responses and \(x_2\) = number of incorrect responses. (Calculating a total score by subtracting a term based on the number of incorrect responses is known as a correction for guessing and is designed to discourage test takers from choosing answers at random.) a. It can be shown that if a totally unprepared student answers all 50 questions by just selecting one of the five answers at random, then \(\mu_{x_1} = 10\) and \(\mu_{x_2} = 40\). What is the mean value of the total score, \(y\)? Does this surprise you? Explain. b. Explain why it is unreasonable to use the formulas given in this section to compute the variance or standard deviation of \(y\). 7.43 Consider a large ferry that can accommodate cars and buses. The toll for cars is $3, and the toll for buses is $10. Let \(x\) and \(y\) denote the number of cars and buses, respectively, carried on a single trip. Cars and buses are accommodated on different levels of the ferry, so the number of buses accommodated on any trip is independent of the number of cars on the trip. Suppose that \(x\) and \(y\) have the following probability distributions:
x      0    1    2    3    4    5
p(x)   .05  .10  .25  .30  .20  .10
y      0    1    2
p(y)   .50  .30  .20
a. Compute the mean and standard deviation of \(x\). b. Compute the mean and standard deviation of \(y\). c. Compute the mean and variance of the total amount of money collected in tolls from cars. d. Compute the mean and vari
25. from 01 to 0 5. Now we turn our attention to calculating the mean and standard deviation of the random variable. The data are already entered, and we begin by recalling the definition of the mean of a discrete random variable: μx = Σ x·p(x), where the sum is over all possible x values. Because we have stored exactly what is needed in Lists 1 and 2, we can virtually duplicate this definition using the language of lists and list operations for our calculator: μx = Σ (List1 · List2), again summed over all possible x values. The strategy for finding the mean of the Apgar random variable, translated from math symbols to English, is the following: calculate the products of numbers in our Lists 1 and 2, store the results in List3, and then find the sum of all the numbers in List3. Multiplying to obtain the product is fairly easy as we appeal again to the language of lists: List1 * List2 → List3. Now we need to find the sum of the numbers in List3. Exactly how this is done will vary from calculator to calculator. The most likely scenario is for you to calculate the 1-variable statistics for List3. Calculators will usually report the sum; look for this symbol: Σx. Be careful as you scan for the right choice in your calculator window. Don't be misled by the symbol for the mean; we want the sum. You should get the value 7.16 for the Apgar mean. We use a similar strategy to compute the standard deviation. We will find the variance first, then t
26. have y = 5, what must be true of the fifth battery selected? List the four outcomes for which y = 5, and then determine p(5). d. Use the pattern in your answers for Parts (a)-(c) to obtain a general formula for p(y). 7.107 A pizza company advertises that it puts 0.5 lb of real mozzarella cheese on its medium pizzas. In fact, the amount of cheese on a randomly selected medium pizza is normally distributed with a mean value of 0.5 lb and a standard deviation of 0.025 lb. a. What is the probability that the amount of cheese on a medium pizza is between 0.525 and 0.550 lb? b. What is the probability that the amount of cheese on a medium pizza exceeds the mean value by more than 2 standard deviations? c. What is the probability that three randomly selected medium pizzas all have at least 0.475 lb of cheese? 7.108 Suppose that fuel efficiency for a particular model car under specified conditions is normally distributed with a mean value of 30.0 mpg and a standard deviation of 1.2 mpg. a. What is the probability that the fuel efficiency for a randomly selected car of this type is between 29 and 31 mpg? b. Would it surprise you to find that the efficiency of a randomly selected car of this model is less than 25 mpg? c. If three cars of this model are randomly selected, what is the probability that all three have efficiencies exceeding 32 mpg?
27. normal curve with this μ and σ on the corresponding probability histogram. A normal curve fits the probability histogram well in the first case (Figure 7.44(a)). When this happens, binomial probabilities can be accurately approximated by areas under the normal curve. Because of this, statisticians say that both x (the number of successes) and x/n (the proportion of successes) are approximately normally distributed. In the second case (Figure 7.44(b)), the normal curve does not give a good approximation because the probability histogram is skewed, whereas the normal curve is symmetric. Figure 7.44 Normal approximations to binomial distributions. Let x be a binomial random variable based on n trials and success probability π, so that μ = nπ and σ = √(nπ(1 − π)). If n and π are such that nπ ≥ 10 and n(1 − π) ≥ 10, then x has approximately a normal distribution. Combining this result with the continuity correction implies that P(a ≤ x ≤ b) ≈ P((a − 1/2 − μ)/σ ≤ z ≤ (b + 1/2 − μ)/σ). That is, the probability that x is between a and b inclusive is approximately the area under the approximating normal curve between a − 1/2 and b + 1/2. Similarly, P(x ≤ b) ≈ P(z ≤ (b + 1/2 − μ)/σ) and P(a ≤ x) ≈ P((a − 1/2 − μ)/σ ≤ z). When either nπ < 10 or n(1 − π) < 10, the binomial distribution is too skewed for the normal approximation to give accurate results. Example 7.34 P
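The continuity-corrected approximation in the box above can be sketched in Python as follows. This is only an illustration: the function names `phi` and `normal_approx_between` and the example values n = 100, π = .5 are assumptions made here, not part of the text, and the standard normal cumulative area is computed with `math.erf` rather than read from a table.

```python
import math

def phi(z):
    """Cumulative area under the standard normal (z) curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_approx_between(n, pi, a, b):
    """Approximate P(a <= x <= b) for a binomial x using the continuity correction.

    Raises ValueError when the rule of thumb (n*pi >= 10 and n*(1 - pi) >= 10)
    is not met, since the text says the approximation is then unreliable.
    """
    if n * pi < 10 or n * (1 - pi) < 10:
        raise ValueError("need n*pi >= 10 and n*(1 - pi) >= 10")
    mu = n * pi
    sigma = math.sqrt(n * pi * (1 - pi))
    # Area under the approximating normal curve between a - 1/2 and b + 1/2.
    return phi((b + 0.5 - mu) / sigma) - phi((a - 0.5 - mu) / sigma)

# Hypothetical example values (not from the text): n = 100, pi = .5.
approx = normal_approx_between(100, 0.5, 45, 55)
print(round(approx, 4))
```

Raising an error when the rule of thumb fails is a design choice made for this sketch; one could instead return the approximation with a warning.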
28. of x is easily obtained from this information. Consider the smallest possible x value, 0. The only outcome for which x = 0 is GGGG, so p(0) = P(x = 0) = P(GGGG) = .1296. Table 7.1 Outcomes and Probabilities for Example 7.5 (outcome, probability, x value): GGGG, .1296, 0; EGGG, .0864, 1; GEGG, .0864, 1; GGEG, .0864, 1; GGGE, .0864, 1; EEGG, .0576, 2; EGEG, .0576, 2; EGGE, .0576, 2; GEEG, .0576, 2; GEGE, .0576, 2; GGEE, .0576, 2; GEEE, .0384, 3; EGEE, .0384, 3; EEGE, .0384, 3; EEEG, .0384, 3; EEEE, .0256, 4. There are four different outcomes for which x = 1, so p(1) results from summing the four corresponding probabilities: p(1) = P(x = 1) = P(EGGG or GEGG or GGEG or GGGE) = P(EGGG) + P(GEGG) + P(GGEG) + P(GGGE) = .0864 + .0864 + .0864 + .0864 = 4(.0864) = .3456. Similarly, p(2) = P(EEGG) + ··· + P(GGEE) = 6(.0576) = .3456, p(3) = 4(.0384) = .1536, and p(4) = .0256. The probability distribution of x is summarized in the following table: x value: 0, 1, 2, 3, 4; p(x) (probability of value): .1296, .3456, .3456, .1536, .0256. To interpret p(3) = .1536, think of performing the chance experiment repeatedly, each time with a new group of four customers. In the long run, 15.36% of these groups will have exactly three customers purchasing an electric hot tub. The probability distribution can be used to determine probabilities of various events involving x. For example, the probability that at least two of the fou
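The tabulation above (list every outcome, attach its probability, then sum within each x value) can be reproduced with a short Python sketch. The per-customer probabilities .4 (electric) and .6 (gas) are not stated in this excerpt; they are inferred here from P(EEEE) = .0256 and P(GGGG) = .1296, and the function name `distribution` is illustrative.

```python
from itertools import product

# Assumption: inferred from the table, since .4**4 = .0256 and .6**4 = .1296.
P_ELECTRIC = 0.4
P_GAS = 0.6

def distribution(n_customers=4):
    """Return {x: p(x)}, where x = number of customers choosing electric (E)."""
    probs = {x: 0.0 for x in range(n_customers + 1)}
    for outcome in product("EG", repeat=n_customers):
        p = 1.0
        for choice in outcome:  # customers are treated as independent
            p *= P_ELECTRIC if choice == "E" else P_GAS
        probs[outcome.count("E")] += p
    return probs

dist = distribution()
for x, p in dist.items():
    print(x, round(p, 4))
```

Summing the entries for x = 2, 3, and 4 then gives the probability of an event such as "at least two".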
29. or small values in a distribution. Consider describing the values that make up the most extreme 5% of the standard normal distribution. That is, we want to separate the middle 95% from the extreme 5%. This is illustrated in Figure 7.27. Because the standard normal distribution is symmetric, the most extreme 5% is equally divided between the high side and the low side of the distribution, resulting in an area of .025 for each of the tails of the z curve. Symmetry about 0 implies that if z* denotes the value that separates the largest 2.5%, the value that separates the smallest 2.5% is simply −z*. Figure 7.27 The most extreme 5% of the standard normal distribution. To find z*, first determine the cumulative area for z*, which is area to the left of z* = .95 + .025 = .975. The cumulative area .9750 appears in the 1.9 row and .06 column of Appendix Table 2, so z* = 1.96. For the standard normal distribution, 95% of the variable values fall between −1.96 and 1.96; the most extreme 5% are those values that are either greater than 1.96 or less than −1.96. Other Normal Distributions We now show how z curve areas can be used to calculate probabilities and to describe values for any normal distribution. Remember that the letter z is reserved for those variables that have a standard normal distribution; the letter x is used m
30. random variables yet to come. Because the discussions in this exploration have been detailed, the discussions in those cases will be less so. In Exploration 7.1 we discussed how to graph a discrete distribution. When we graphed the probability density function for the Apgar scores, we manually entered the outcomes and their associated probabilities. Anticipating that you may wish to consider binomial chance experiments with many potential successes, we will streamline the data entry process using some commands and functions we have already discussed in previous calculator explorations. Figure 7.46 (a) Binomial probabilities; (b) histogram of binomial distribution. Graphing a binomial distribution will involve three steps: 1. Construct the list of possible values in List1 using the seq command or your calculator's equivalent. 2. Construct the probabilities in List2 using the binompdf function or your calculator's equivalent. 3. Draw the graph in the form of a histogram of the probability distribution. Consider the binomial probability distribution for n = 20 and π = .20. Carrying out the steps below puts the integers 0 to 20 in List1 and p(x) for x values from 0 to 20 in List2. 1. seq(x, x, 0, 20) → List1 puts a sequence of 21 integers into List1. Remember to verify your calculator syntax and the order of the information to be entered for your ca
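For readers working without a graphing calculator, the three steps can be mirrored in Python. This is a rough equivalent, not the calculator procedure itself: the list names `values` and `probs` stand in for List1 and List2, the binomial formula replaces binompdf, and a crude text histogram replaces the calculator plot. The value π = .20 is read from the garbled original and should be treated as an assumption.

```python
import math

n, pi = 20, 0.20  # assumption: pi = .20 as read from the text

# Step 1: the possible values 0, 1, ..., n (the role of seq and List1).
values = list(range(n + 1))

# Step 2: the binomial probability p(x) for each value (the role of binompdf and List2).
probs = [math.comb(n, x) * pi**x * (1 - pi) ** (n - x) for x in values]

# Step 3: a text histogram, one '#' per percentage point of probability.
for x, p in zip(values, probs):
    print(f"{x:2d} {p:.4f} " + "#" * round(p * 100))
```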
31. service: 315.9, 318.9, 329.9, 339.9, 344.9, and 359.7. Let y denote the price per gallon paid by a randomly selected customer. a. Is y a discrete random variable? Explain. b. Suppose that the probability distribution of y is as follows: y: 315.9, 318.9, 329.9, 339.9, 344.9, 359.7 with p(y): .36, .24, .10, .16, .08, .06. What is the probability that a randomly selected customer has paid more than $3.20 per gallon? Less than $3.40 per gallon? c. Refer to Part (b), and calculate the mean value and standard deviation of y. Interpret these values. 7.38 A chemical supply company currently has in stock 100 lb of a certain chemical, which it sells to customers in 5-lb lots. Let x = the number of lots ordered by a randomly chosen customer. The probability distribution of x is as follows: x: 1, 2, 3, 4 with p(x): .2, .4, .3, .1. a. Calculate the mean value of x. b. Calculate the variance and standard deviation of x. 7.39 Return to Exercise 7.38, and let y denote the amount of material (in pounds) left after the next customer's order is shipped. Find the mean and variance of y. (Hint: y is a linear function of x.) 7.40 An appliance dealer sells three different models of upright freezers having 13.5, 15.9, and 19.1 cubic feet of storage space. Let x = the amount of storage space purchased by the next customer to buy a freezer. Suppose that x has the following probability dis
32. that w is not between 10 and 30. 7.4 Mean and Standard Deviation of a Random Variable We study a random variable x, such as the number of insurance claims made by a homeowner (a discrete variable) or the birth weight of a baby (a continuous variable), to learn something about how its values are distributed along the measurement scale. The sample mean x̄ and sample standard deviation s summarize center and spread for the values in a sample. Similarly, the mean value and standard deviation of a random variable describe where the variable's probability distribution is centered and the extent to which it spreads out about the center. The mean value of a random variable x, denoted by μx, describes where the probability distribution of x is centered. The standard deviation of a random variable x, denoted by σx, describes variability in the probability distribution. When σx is small, observed values of x will tend to be close to the mean value (little variability). When the value of σx is large, there will be more variability in observed x values. Figure 7.10(a) shows two discrete probability distributions with the same standard deviation (spread) but different means (center). One distribution has a mean of μx = 6, and the other has μx = 10. Which is which? Figure 7.10(b) shows two continuous probability distributions that have the same mean but diff
33. that x is farther than 1 standard deviation from its mean value is P(x < μx − σx or x > μx + σx) = P(x < 5.21 or x > 9.79) = P(x ≤ 5) + P(x ≥ 10) = p(0) + ··· + p(5) + p(10) + ··· + p(25) = .382 (using Appendix Table 9). The value of σx is 0 when π = 0 or π = 1. In these two cases, there is no uncertainty in x: We are sure to observe x = 0 when π = 0 and x = n when π = 1. It is also easily verified that π(1 − π) is largest when π = .5. Thus the binomial distribution spreads out the most when sampling from a 50-50 population. The farther π is from .5, the less spread out and the more skewed the distribution. Geometric Distributions A binomial random variable is defined as the number of successes in n independent trials, where each trial can result in either a success or a failure and the probability of success is the same for each trial. Suppose, however, that we are not interested in the number of successes in a fixed number of trials but rather in the number of trials that must be carried out before a success occurs. Two examples are counting the number of boxes of cereal that must be purchased before finding one with a rare toy and counting the number of games that a professional bowler must play before achieving a score over 250. The variable x = number of trials to first success is called a geometri
34. to the right of a value (a right-tail area) is 1 minus the corresponding cumulative area. This is illustrated in Figure 7.23: P(z > c) = 1 − P(z ≤ c). Figure 7.23 The relationship between an upper-tail area and a cumulative area. Similarly, the probability that z falls in the interval between a lower limit a and an upper limit b is P(a < z < b) = area under the z curve and above the interval from a to b = P(z < b) − P(z < a). That is, P(a < z < b) is the difference between two cumulative areas, as illustrated in Figure 7.24. Figure 7.24 P(a < z < b) as the difference between the two cumulative areas. Example 7.22 More About Standard Normal Curve Areas The probability that z is between −1.76 and 0.58 is P(−1.76 < z < 0.58) = P(z < 0.58) − P(z < −1.76) = .7190 − .0392 = .6798, as shown in the following figure. [Figure: z curve with shaded area = .6798 between −1.76 and 0.58.] The probability that z is between −2 and 2 (within 2 standard deviations of its mean, since μ = 0 and σ = 1) is P(−2.00 < z < 2.00) = P(z < 2.00) − P(z < −2.00) = .9772 − .0228 = .9544 ≈ .95, as shown in the following figure. [Figure: z curve with shaded area = .9544 between −2.00 and 2.00.] This last probability is the basis for one part of the Empirical Rule, which states that when a histogram is well approximated by a n
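The two rules above (right-tail area = 1 − cumulative area; interval area = difference of two cumulative areas) can be checked numerically. The sketch below computes cumulative z curve areas with `math.erf` instead of a table lookup, so its results can differ from the tabled four-decimal values in the last digit. The function names are illustrative.

```python
import math

def phi(z):
    """Cumulative area under the z curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def right_tail(c):
    """P(z > c) = 1 - P(z <= c)."""
    return 1.0 - phi(c)

def between(a, b):
    """P(a < z < b) = P(z < b) - P(z < a)."""
    return phi(b) - phi(a)

print(round(between(-1.76, 0.58), 4))  # compare with .6798 in Example 7.22
print(round(between(-2.00, 2.00), 4))  # compare with .9544 from the table values
```

For a continuous variable such as z, it makes no difference whether the endpoints are included, so the same functions serve for ≤ and <.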
35. values of this score are 0, 1, 2, ..., 9, 10. A child's score is determined by five factors: muscle tone, skin color, respiratory effort, strength of heartbeat, and reflex, with a high score indicating a healthy infant. Let the random variable x denote the Apgar score (at 1 min) of a randomly selected newborn infant at a particular hospital, and suppose that x has the following probability distribution: x: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 with p(x): .002, .001, .002, .005, .02, .04, .17, .38, .25, .12, .01. The mean value of x is μx = (0)p(0) + (1)p(1) + ··· + (9)p(9) + (10)p(10) = (0)(.002) + (1)(.001) + ··· + (9)(.12) + (10)(.01) = 7.16. The average Apgar score for a sample of newborn children born at this hospital may be x̄ = 7.05, x̄ = 8.30, or any one of a number of other possible values between 0 and 10. However, as child after child is born and rated, the average score will approach the value 7.16. This value can be interpreted as the mean Apgar score for the population of all babies born at this hospital. Standard Deviation of a Discrete Random Variable (Figure 7.12 Probability distribution for the number of defective components in Example 7.10: (a) Supplier 1; (b) Supplier 2.) The mean value μx provides only a partial summary of a probability distribution. Two different distributions can have the same value of μx, yet a long sequence of sample values from one distribution might ex
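Both claims in the Apgar example, that the weighted sum equals 7.16 and that a long-run average of observed scores approaches it, can be illustrated with a small Python sketch. The simulation is an addition made here for illustration; the seed, sample size, and function names are arbitrary choices, not part of the text.

```python
import random

# Probability distribution of the Apgar score x, as given in the example.
apgar = {0: .002, 1: .001, 2: .002, 3: .005, 4: .02, 5: .04,
         6: .17, 7: .38, 8: .25, 9: .12, 10: .01}

def mean(dist):
    """Mean value of a discrete random variable: sum of x * p(x)."""
    return sum(x * p for x, p in dist.items())

def simulate_average(dist, n, seed=0):
    """Average of n simulated scores drawn according to dist (seeded for repeatability)."""
    rng = random.Random(seed)
    draws = rng.choices(list(dist), weights=list(dist.values()), k=n)
    return sum(draws) / n

print(round(mean(apgar), 2))
print(simulate_average(apgar, 100_000))  # should land close to the mean value
```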
36. w = amount awarded. Determine the probability distribution of w. (Hint: Think of the slips as numbered 1, 2, 3, 4, and 5, so that an outcome of the experiment consists of two of these numbers.) 7.17 Components coming off an assembly line are either free of defects (S, for success) or defective (F, for failure). Suppose that 70% of all such components are defect-free. Components are independently selected and tested one by one. Let y denote the number of components that must be tested until a defect-free component is obtained. a. What is the smallest possible y value, and what experimental outcome gives this y value? What is the second smallest y value, and what outcome gives rise to it? b. What is the set of all possible y values? c. Determine the probability of each of the five smallest y values. You should see a pattern that leads to a simple formula for p(y), the probability distribution of y. 7.18 A contractor is required by a county planning department to submit anywhere from one to five forms (depending on the nature of the project) in applying for a building permit. Let y be the number of forms required of the next applicant. The probability that y forms are required is known to be proportional to y; that is, p(y) = ky for y = 1, ..., 5. a. What is the value of k? (Hint: Σ p(y) = 1.) b. What is the probability that at most
37. when it is actually fair? b. What is the probability of judging the coin fair when P(H) = .9, so that there is a substantial bias? Repeat for P(H) = .1. c. What is the probability of judging the coin fair when P(H) = .6? when P(H) = .4? Why are the probabilities so large compared to the probabilities in Part (b)? d. What happens to the error probabilities of Parts (a) and (b) if the decision rule is changed so that the coin is judged fair if 7 ≤ x ≤ 18 and unfair otherwise? Is this a better rule than the one first proposed? Explain. 7.59 A city ordinance requires that a smoke detector be installed in all residential housing. There is concern that too many residences are still without detectors, so a costly inspection program is being contemplated. Let π be the proportion of all residences that have a detector. A random sample of 25 residences is selected. If the sample strongly suggests that π < .80 (less than 80% have detectors), as opposed to π ≥ .80, the program will be implemented. Let x be the number of residences among the 25 that have a detector, and consider the following decision rule: Reject the claim that π ≥ .8 and implement the program if x ≤ 15. a. What is the probability that the program is implemented when π = .80? b. What is the probability that the program is not implemented if π = .70? if π = .60? c. How do the error probabilities of Parts (a) and (b) change if the value 15 in the decision rule is
38. 0, 1, 2, ..., 12. If group after group of 12 purchases is examined, the long-run percentage of those with exactly four flat panel monitors will be 4.2%. According to this calculation, 495 of the possible outcomes (there are 2¹² = 4096) have x = 4. The probability that between four and seven (inclusive) are flat panel displays is P(4 ≤ x ≤ 7) = P(x = 4 or x = 5 or x = 6 or x = 7). Since these outcomes are disjoint, this is equal to P(4 ≤ x ≤ 7) = p(4) + p(5) + p(6) + p(7) = .042 + .101 + .177 + .227 = .547. Notice that P(4 < x < 7) = P(x = 5 or x = 6) = p(5) + p(6) = .278, so the probability depends on whether < or ≤ appears. This is typical of discrete random variables. The binomial distribution formula can be tedious to use unless n is small. Appendix Table 9 gives binomial probabilities for selected n in combination with various values of π. Appendix Table 9 should help you practice using the binomial distribution without getting bogged down in arithmetic. Using Appendix Table 9 To find p(x) for any particular value of x: 1. Locate the part of the table corresponding to your value of n (5, 10, 15, 20, or 25). 2. Move down to the row labeled with your value of x. 3. Go across to the column headed by the specified value of π. The desired probability is at the intersection of the designated x row and π column. For example
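When a table is not at hand, the same probabilities can be computed directly from the binomial formula. In the sketch below, n = 12 comes from the text, but the success probability π = .6 is an inference: it is the value for which the formula reproduces the quoted p(4) = .042, p(5) = .101, p(6) = .177, and p(7) = .227. Treat it as an assumption about the example, not a quotation.

```python
import math

def binom_pmf(n, pi, x):
    """p(x) = [n! / (x!(n - x)!)] * pi**x * (1 - pi)**(n - x)."""
    return math.comb(n, x) * pi**x * (1 - pi) ** (n - x)

n, pi = 12, 0.6  # pi = .6 is inferred from the quoted probabilities

inclusive = sum(binom_pmf(n, pi, x) for x in range(4, 8))  # P(4 <= x <= 7)
strict = sum(binom_pmf(n, pi, x) for x in range(5, 7))     # P(4 < x < 7)

print(math.comb(12, 4))  # number of outcomes with x = 4
print(round(inclusive, 3), round(strict, 3))
```

Because the text adds probabilities that were already rounded to three decimals, its .278 for P(4 < x < 7) can differ in the last digit from the unrounded sum computed here.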
39. 1 1 probability that x 2 2 z 9 3 1 2 2 mean value of x Notice that the expression for x̄ is a weighted average of possible x values; the weight of each value is the observed relative frequency. Similarly, the mean value of the random variable x is a weighted average, but now the weights are the probabilities from the probability distribution, as given in the definition in the following box. Example 7.8 Exam Attempts Individuals applying for a certain license are allowed up to four attempts to pass the licensing exam. Let x denote the number of attempts made by a randomly selected applicant. The probability distribution of x is as follows: x: 1, 2, 3, 4 with p(x): .10, .20, .30, .40. Then x has mean value μx = Σ x·p(x), summed over x = 1, 2, 3, 4, = (1)p(1) + (2)p(2) + (3)p(3) + (4)p(4) = (1)(.10) + (2)(.20) + (3)(.30) + (4)(.40) = .10 + .40 + .90 + 1.60 = 3.00. It is no accident that the symbol μ for the mean value is the same symbol used previously for a population mean. When the probability distribution describes how x values are distributed among the members of a population (and therefore the probabilities are population relative frequencies), the mean value of x is exactly the average value of x in the population. Example 7.9 Apgar Scores At 1 min after birth and again at 5 min, each newborn child is given a numerical rating called an Apgar score. Possible
40. 1 c PX lt 4 1 1 d r lt x lt 1 (Hint: Use the results of Parts (a)-(c).) e. The probability that gravel sold exceeds 5 ton f. The probability that gravel sold is at least ton 7.25 Let x be the amount of time (in minutes) that a particular San Francisco commuter must wait for a BART train. Suppose that the density curve is as pictured (a uniform distribution). [Figure: uniform density curve of height 0.05 over the interval from 0 to 20 minutes.] a. What is the probability that x is less than 10 min? more than 15 min? b. What is the probability that x is between 7 and 12 min? c. Find the value c for which P(x < c) = .9. 7.26 Referring to Exercise 7.25, let x and y be waiting times on two independently selected days. Define a new random variable w by w = x + y, the sum of the two waiting times. The set of possible values for w is the interval from 0 to 40 (because both x and y can range from 0 to 20). It can be shown that the density curve of w is as pictured (this curve is called a triangular distribution, for obvious reasons). [Figure: triangular density curve for w over the interval from 0 to 40 minutes, peaking at height 0.05 at w = 20.] a. Verify that the total area under the density curve is equal to 1. (Hint: The area of a triangle is (1/2)(base)(height).) b. What is the probability that w is less than 20? less than 10? greater than 30? c. What is the probability that w is between 10 and 30? Hint: It might be easier first to find the probability
41. 5 Probability distribution for birth weight: (a) weight measured to the nearest pound; (b) weight measured to the nearest tenth of a pound; (c) limiting curve as measurement accuracy increases; shaded area = P(6 < weight < 8). x falls in an interval such as 6 < x < 8 is the area under the curve and above that interval. Many probability calculations for continuous random variables involve the following three events: 1. a < x < b, the event that the random variable x assumes a value between two given numbers, a and b. 2. x < a, the event that the random variable x assumes a value less than a given number a. 3. b < x, the event that the random variable x assumes a value greater than a given number b (this can also be written as x > b). Figure 7.6 illustrates how the probabilities of these events are identified with areas under a density curve (panels: P(a < x < b), P(x < a), P(b < x)). Example 7.7 Application Processing Times Define a continuous random variable x by x = amount of time (in minutes) taken by a clerk to process a certain type of application form. Suppose that x has a probability distribution with density function f(x) = .5 for 4 < x < 6 and f(x) = 0 otherwise. The graph of f(x), the density curve, is shown in Figure 7.7(a) (Figure 7.7 The uniform distribution for Example 7.7). It is especially easy to use this density curve to calc
42. 5 ≤ x ≤ 30) = binomcdf(250, .1, 30) − binomcdf(250, .1, 14) = 0.8753286537 − 0.00931244487 = 0.8660162088. To evaluate this probability using the normal curve approximation, we will use the machine accuracy of the calculator, with the mean μ = 25 and σ = 4.74341649: normalcdf(lower bound, upper bound, μ, σ) = normalcdf(14.5, 30.5, 25, 4.74341649) = 0.8634457937. The difference between the two probabilities, to machine accuracy, is 0.0025704151. This does not seem to be a large difference, but it is a difference. According to the rule of thumb, this approximation meets the test, but the investigator, in the context of his or her situation, must evaluate the practical importance of the difference. Now let's redo the calculations, not with a sample size of 250 but with a sample size of only 50. Keeping the results proportionally the same by dividing by 5, we will consider approximating the probability of getting between 3 and 6 preemies, inclusive, from a random sample of 50 babies. In this case nπ = 50(.10) = 5 < 10 and n(1 − π) = 50(1 − .10) = 45 ≥ 10. Figure 7.52 Binomial distributions: (a) n = 20, π = .05; (b) n = 20, π = .10; (c) n = 20, π = .25; (d) n = 20, π = .50. Since nπ < 10, our rule of thumb would regard the binomial distribution too skewed for the normal curve approximation to give accurate r
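The calculator comparison above can be repeated in Python. The helper names `binom_cdf` and `normal_cdf_between` are stand-ins chosen here to echo the calculator's binomcdf and normalcdf; they are not calculator or library functions. Results should agree with the quoted calculator values to several decimal places, though not necessarily to every digit.

```python
import math

def binom_cdf(n, pi, k):
    """P(x <= k) for a binomial variable, summing the binomial formula directly."""
    return sum(math.comb(n, j) * pi**j * (1 - pi) ** (n - j) for j in range(k + 1))

def normal_cdf_between(lower, upper, mu, sigma):
    """Area under the normal curve with mean mu and standard deviation sigma."""
    def phi(z):
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return phi((upper - mu) / sigma) - phi((lower - mu) / sigma)

n, pi = 250, 0.1
mu = n * pi
sigma = math.sqrt(n * pi * (1 - pi))

exact = binom_cdf(n, pi, 30) - binom_cdf(n, pi, 14)   # P(15 <= x <= 30)
approx = normal_cdf_between(14.5, 30.5, mu, sigma)    # with continuity correction

print(round(exact, 4), round(approx, 4), round(exact - approx, 4))
```

Changing `n` to 50 and the bounds to 3 and 6 gives the small-sample comparison discussed next, where nπ = 5 falls short of the rule of thumb.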
43. 6 98 1 51 1 23 28 1 89 1 37 2 10 1 45 29 90 95 52 72 30 2 05 1 43 Figure 7.38 Histograms of the precipitation data used in Example 7.31: (a) untransformed data; (b) square-root transformed data. Logarithmic transformations are also common, and, as with bivariate data, either the natural logarithm or the base 10 logarithm can be used. A logarithmic transformation is usually applied to data that are positively skewed (a long upper tail). This affects values in the upper tail substantially more than values in the lower tail, yielding a more symmetric, and often more nearly normal, distribution. Example 7.32 Beryllium Exposure Exposure to beryllium is known to produce adverse effects on lungs as well as on other tissues and organs in both laboratory animals and humans. The article "Time Lapse Cinematographic Analysis of Beryllium-Lung Fibroblast Interactions" (Environmental Research [1983]: 34-43) reported the results of experiments designed to study the behavior of certain individual cells that had been exposed to beryllium. An important characteristic of such an individual cell is its interdivision time (IDT). IDTs were determined for a large number of cells under both exposed (treatment) and unexposed (control
44. Chapter 7 Random Variables and Probability Distributions (Walter Bibikow/Getty Images) This chapter is the first of two chapters that together link the basic ideas of probability explored in Chapter 6 with the techniques of statistical inference. Chapter 6 used probability to describe the long-run relative frequency of occurrence of various types of outcomes. In this chapter we introduce probability models that can be used to describe the distribution of values of a variable. In Chapter 8, we will see how these same probability models can be used to describe the behavior of sample statistics. Such models are essential if we are to reach conclusions based on a sample from the population of interest. In a chance experiment, we often focus on some numerical aspect of the outcome. An environmental scientist who obtains an air sample from a specified location might be especially concerned with the concentration of ozone (a major constituent of smog). A quality control inspector who must decide whether to accept a large shipment of components may base the decision on the number of defective components in a group of 20 components randomly selected from the shipment. Before selection of the air sample, the value of the ozone concentration is uncertain. Similarly, the number of defective components among the 20 selected might be any whole number between 0 and 20. Because the value of a variable quantity such as ozone concentration or number of de
45. P(3 < x < 7) = p(4) + p(5) + p(6). However, if x is a continuous random variable, such as task completion time, then P(3 ≤ x ≤ 7) = P(3 < x < 7) because the area under a density curve and above a single value such as 3 or 7 is 0. Geometrically, we can think of finding the area above a single point as finding the area of a rectangle with width = 0. The area above an interval of values therefore does not depend on whether either endpoint is included. For any two numbers a and b with a < b, P(a ≤ x ≤ b) = P(a < x ≤ b) = P(a ≤ x < b) = P(a < x < b) when x is a continuous random variable. Probabilities for continuous random variables are often calculated using cumulative areas. A cumulative area is all of the area under the density curve to the left of a particular value. Figure 7.8 illustrates the cumulative area to the left of .5, which is P(x < .5). The probability that x is in any particular interval, P(a < x < b), is the difference between two cumulative areas. Figure 7.8 A cumulative area under a density curve. The probability that a continuous random variable x lies between a lower limit a and an upper limit b is P(a < x < b) = (cumulative area to the left of b) − (cumulative area to the left of a) = P(x < b) − P(x < a). The foregoing property is illustrated in Figure 7.9 for the case of a = .25 and b = .75. We will use this result extensively in Section 7.6 when we calculate probabilities using the norma
46. The sample space for this experiment consists of all the different possible random samples of size 50 that might result (there is a very large number of these), and for simple random sampling, each of these outcomes is equally likely. Let x = number of successes in the sample, where a success in this instance is defined as a student who plans to attend college. Then x is a random variable, because it associates a numerical value with each of the possible outcomes (random samples) that might occur. Possible values of x are 0, 1, 2, ..., 50, and x is a discrete random variable. Exercises 7.1 State whether each of the following random variables is discrete or continuous: a. The number of defective tires on a car b. The body temperature of a hospital patient c. The number of pages in a book d. The number of draws (with replacement) from a deck of cards until a heart is selected e. The lifetime of a lightbulb 7.2 Classify each of the following random variables as either discrete or continuous: a. The fuel efficiency (mpg) of an automobile b. The amount of rainfall at a particular location during the next year c. The distance that a person throws a baseball d. The number of questions asked during a 1-hr lecture e. The tension (in pounds per sq
47. The 13 largest normal scores for a sample of size 25 are 1.965, 1.524, 1.263, 1.067, 0.905, 0.764, 0.637, 0.519, 0.409, 0.303, 0.200, 0.100, and 0. The 12 smallest scores result from placing a negative sign in front of each of the given nonzero scores. Construct a normal probability plot. Does it appear plausible that disk diameter is normally distributed? Explain. 7.87 Example 7.31 examined rainfall data for Minneapolis-St. Paul. The square-root transformation was used to obtain a distribution of values that was more symmetric than the distribution of the original data. Another power transformation that has been suggested by meteorologists is the cube root: transformed value = (original value)^(1/3). The original values and their cube roots (the transformed values) are given in the following table (original, transformed): (0.32, 0.68), (0.47, 0.78), (0.52, 0.80), (0.59, 0.84), (0.77, 0.92), (0.81, 0.93), (0.81, 0.93), (0.90, 0.97), (0.96, 0.99), (1.18, 1.06), (1.20, 1.06), (1.20, 1.06), (1.31, 1.09), (1.35, 1.11), (1.43, 1.13), (1.51, 1.15), (1.62, 1.17), (1.74, 1.20), (1.87, 1.23), (1.89, 1.24), (1.95, 1.25), (2.05, 1.27), (2.10, 1.28), (2.20, 1.30), (2.48, 1.35), (2.81, 1.41), (3.00, 1.44), (3.09, 1.46), (3.37, 1.50), (4.75, 1.68). Construct a histogram of the transformed data. Compare your histogram to those given in Figure 7.38. Which of the cube-root and square-root transformations appear to result in the more symmetr
48. and the variance of x is σx² = Σ (x − μx)²·p(x) = (0 − μx)²p(0) + (1 − μx)²p(1) + ··· + (n − μx)²p(n). These expressions appear to be very tedious to evaluate for any particular values of n and π. Fortunately, algebraic manipulation results in considerable simplification, making summation unnecessary. The mean value and the standard deviation of a binomial random variable are, respectively, μx = nπ and σx = √(nπ(1 − π)). Example 7.19 Credit Cards Paid in Full Newsweek (December 2, 1991) reported that one-third of all credit card users pay their bills in full each month. This figure is, of course, an average across different cards and issuers. Suppose that 30% of all individuals holding Visa cards issued by a certain bank pay in full each month. A random sample of n = 25 cardholders is to be selected. The bank is interested in the variable x = number in the sample who pay in full each month. Even though sampling is done without replacement, the sample size n = 25 is most likely very small compared to the total number of credit card holders, so we can approximate the probability distribution of x using a binomial distribution with n = 25 and π = .3. We have defined "paid in full" as a success because this is the outcome counted by the random variable x. The mean value of x is then μx = nπ = 25(.30) = 7.5, and the standard deviation is σx = √(nπ(1 − π)) = √(25(.30)(.70)) = √5.25 = 2.29. The probability
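A brief Python check shows that the shortcut formulas agree with the defining sums for the credit card example (n = 25, π = .30). The function names are illustrative; the summation version is included only to confirm that the "tedious" definition and the simplified formulas give the same numbers.

```python
import math

def binomial_mean_sd(n, pi):
    """Shortcut formulas: mean n*pi and standard deviation sqrt(n*pi*(1 - pi))."""
    return n * pi, math.sqrt(n * pi * (1 - pi))

def mean_sd_by_summation(n, pi):
    """The same quantities from the definitions, summing over x = 0, ..., n."""
    pmf = [math.comb(n, x) * pi**x * (1 - pi) ** (n - x) for x in range(n + 1)]
    mu = sum(x * p for x, p in enumerate(pmf))
    var = sum((x - mu) ** 2 * p for x, p in enumerate(pmf))
    return mu, math.sqrt(var)

mu, sigma = binomial_mean_sd(25, 0.30)
mu_sum, sigma_sum = mean_sd_by_summation(25, 0.30)
print(round(mu, 2), round(sigma, 2))
print(round(mu_sum, 2), round(sigma_sum, 2))
```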
49. able even though there is no upper limit to the number of possible values.

Example 7.3 Stress. In an engineering stress test, pressure is applied to a thin 1-ft-long bar until the bar snaps. The precise location where the bar will snap is uncertain. Let x be the distance from the left end of the bar to the break. Then x = 0.25 is one possibility, x = 0.9 is another, and in fact any number between 0 and 1 is a possible value of x. (Figure 7.2 shows the case of the outcome x = 0.6.) This set of possible values is an entire interval on the number line, so x is a continuous random variable. Even though in practice we may be able to measure the distance only to the nearest tenth of an inch or hundredth of an inch, the actual distance could be any number between 0 and 1. So, even though the recorded values might be rounded because of the accuracy of the measuring instrument, the variable is still continuous.

[Figure 7.2: The bar for Example 7.3 and the outcome x = 0.6.]

In data analysis, random variables often arise in the context of summarizing sample data when a sample is selected from some population. This is illustrated in Example 7.4.

Example 7.4 College Plans. Suppose that a counselor plans to select a random sample of 50 seniors at a large high school and to ask each student in the sample whether he or she plans to attend college after graduation. The process of sampling is a chance experiment
50. age value and variability. Figure 7.15 displays density curves that are consistent with this information.

[Figure 7.15: density curves for the x distribution (\(\sigma = 200\)) and the y distribution (\(\sigma = 275\)); horizontal axis from 4300 to 4900, with means marked at \(\mu = 4650\) and \(\mu = 4500\).]

Mean and Variance of Linear Functions and Linear Combinations

We have seen how the mean and standard deviation of one or more random variables provide useful information about the variables' long-run behavior, but we might also be interested in the behavior of some function of these variables. For example, consider the experiment in which a customer of a propane gas company is randomly selected. Suppose that the mean and standard deviation of the random variable x = number of gallons required to fill a customer's propane tank are known to be 318 gal and 42 gal, respectively. The company is considering two different pricing models: Model 1: $3 per gal. Model 2: service charge of $50 + $2.80 per gal. The company is interested in the variable y = amount billed. For each of the two models, y can be expressed as a function of the random variable x: Model 1: \(y_{\text{model 1}} = 3x\). Model 2: \(y_{\text{model 2}} = 50 + 2.8x\). Both of these equations are examples of a linear function of x. The mean and standard deviation of a linear function of x can be computed from the mean and standard deviation of x, as described in the following box. unction i
51. ake the square root. The formula for the variance, \(\sigma_x^2 = \sum (x - \mu_x)^2 p(x)\) (summed over all possible x values), also easily translates into the language of lists. The list language is only slightly more complicated than for the mean: (List1 - 7.16)^2 × List2 → List3. After this calculation, the sum of the numbers in List3 is the variance of the random variable; the square root is the standard deviation. Performing these calculations, you should get a variance of 1.5684 from \(\sum x\) for List3 (again, don't be misled and choose the \(\sigma\) or s). The standard deviation is then found by taking the square root of 1.5684, resulting in 1.2524.

Exploration 7.2 Binomial Probability Calculations

Most calculations having to do with random variables are of one of three types. They are (1) the probability the variable will assume a value between two given numbers, (2) the probability the variable will assume a value less than a given number, or (3) the probability the random variable will assume a value greater than a given number. Because these calculations are so common in statistics, your calculator may have a built-in capability for finding these probabilities. In the case of a discrete random variable, such as the binomial distribution, there is a special case of (1) above: the probability that the random variable will actually assume a particular value. For the binomial d
52. al as long as certain conditions are satisfied. We hope that working with the approximation to the binomial has given you an appreciation for the uses of the normal distribution to approximate these other distributions.
53. ally distributed with a mean of 5 mg and a standard deviation of 0.05 mg. What is the probability that a randomly selected capsule contains less than 4.9 mg of vitamin E? at least 5.2 mg? 7.112 Accurate labeling of packaged meat is difficult because of weight decrease resulting from moisture loss (defined as a percentage of the package's original net weight). Suppose that moisture loss for a package of chicken breasts is normally distributed with mean value 4.0% and standard deviation 1.0%. (This model is suggested in the paper "Drained Weight Labeling for Meat and Poultry: An Economic Analysis of a Regulatory Proposal," Journal of Consumer Affairs [1980]: 307-325.) Let x denote the moisture loss for a randomly selected package. a. What is the probability that x is between 3.0% and 5.0%? b. What is the probability that x is at most 4.0%? c. What is the probability that x is at least 7.0%? d. Find a number z such that 90% of all packages have moisture losses below z%. e. What is the probability that moisture loss differs from the mean value by at least 1%? 7.113 The Wall Street Journal (February 15, 1972) reported that General Electric was sued in Texas for sex discrimination over a minimum height requirement of 5 ft 7 in. The suit claimed that this restriction eliminated more than 94% of adult females from consideration. Let x represent the height of a randomly selected adult woman. Suppose that x is approximately
54. andom variable is one whose value is typically obtained by measurement (temperature in a freezer compartment, weight of a pineapple, amount of time spent in the store, etc.). Because there is a limit to the accuracy of any measuring instrument, such as a watch or a scale, it may seem that any variable should be regarded as discrete. However, when there is a large number of closely spaced values, the variable's behavior is most easily studied by conceptualizing it as continuous. (Doing so allows the use of calculus to solve some types of probability problems.) (Footnote: In some books, uppercase letters are used to name random variables, with lowercase letters representing a particular value that the variable might assume. We have opted to use a simpler and less formal notation.)

Example 7.1 Car Sales. Consider an experiment in which the type of car, new (N) or used (U), chosen by each of three successive customers at a discount car dealership is noted. Define a random variable x by x = number of customers purchasing a new car. The experimental outcome in which the first and third customers purchase a new car and the second customer purchases a used car can be abbreviated NUN. The associated x value is 2, because two of the three customers selected a new car. Similarly, the x value for the outcome NNN (all three purchase a new car) is 3. We display each of the eight possible experimental outcome
55. an or equal to 7 minus the probability of observing a value less than or equal to 3 (not 4). In symbols, \(P(4 \le x \le 7) = P(x \le 7) - P(x \le 3)\), which is found using geomcdf(.4, 7) - geomcdf(.4, 3), giving 0.1880064. What is the probability of more than 7 stops before we get jumper cables? 1 - geomcdf(.4, 7) gives us 0.0279936. Graphing an entire geometric probability distribution is not possible, since there is an infinite number of possible values (1, 2, 3, ...). Nevertheless, we can graph parts of the distribution. The method for graphing is similar to that for the binomial random variable: Use the seq function to create a list of integers in List1, then use the geompdf function to find the corresponding probabilities and store them in List2, and finally plot the distribution as you would a histogram (and as we have previously done with the binomial). These steps for the geometric distribution of Example 7.20 are summarized below.
1. seq(x, x, 1, 20) → List1
2. geompdf(.4, List1) → List2
3. Graph a histogram with the domain List1 and probabilities in List2.
The data editing window is shown in Figure 7.47(a), and a graph of this geometric distribution appears in Figure 7.47(b). We are really calculating probabilities for only part of the distribution, since the number of possible values is infinite. The graph should tail to the right in a gradual manner, not suddenly drop out of sight. You may notice a sudden plummeting in yo
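For readers without a graphing calculator, the same geometric calculations can be sketched in Python. The names `geom_pdf` and `geom_cdf` below are hypothetical stand-ins for the calculator's geompdf and geomcdf, written from the definition of a geometric random variable with success probability .4; they are not a library API.

```python
def geom_pdf(p, k):
    """P(x = k): the first success occurs on trial k (k = 1, 2, 3, ...)."""
    return (1 - p) ** (k - 1) * p

def geom_cdf(p, k):
    """P(x <= k): the first success occurs on or before trial k."""
    return 1 - (1 - p) ** k

p = 0.4  # probability that a stopped student has jumper cables

# P(4 <= x <= 7) = P(x <= 7) - P(x <= 3)
prob_4_to_7 = geom_cdf(p, 7) - geom_cdf(p, 3)

# P(x > 7) = 1 - P(x <= 7)
prob_more_than_7 = 1 - geom_cdf(p, 7)

# Analog of steps 1 and 2: integers 1..20 and their probabilities
list1 = list(range(1, 21))
list2 = [geom_pdf(p, k) for k in list1]

print(round(prob_4_to_7, 7), round(prob_more_than_7, 7))
```

Because only the first 20 values are tabulated, the probabilities in `list2` sum to slightly less than 1, which mirrors the remark that only part of the distribution is being calculated.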
56. an the critical r for n = 40 (critical r = .960). The correlation coefficient using the transformed data is .998, which is much larger than the critical r, supporting the assertion that \(\log_{10}(\text{IDT})\) has approximately a normal distribution. Figure 7.40 displays MINITAB normal probability plots for the original data and for the transformed data. The plot for the transformed data is clearly more linear in appearance than the plot for the original data.

[Figure 7.40: normal probability plots; (a) IDT versus normal score, (b) \(\log_{10}(\text{IDT})\) versus normal score.]

Selecting a Transformation

Occasionally, a particular transformation can be dictated by some theoretical argument, but often this is not the case and you may wish to try several different transformations to find one that is satisfactory. Figure 7.41, from the article "Distribution of Sperm Counts in Suspected Infertile Men" (Journal of Reproduction and Fertility [1983]: 91-96), shows what can result from such a search. Other investigators in this field had previously used all three of the transformations illustrated.

[Figure 7.41: panels (a) through (d), number of cases versus sperm concentration.]
57. ance of the total amount of money collected in tolls from buses. e. Compute the mean and variance of z = total number of vehicles (cars and buses) on the ferry. f. Compute the mean and variance of w = total amount of money collected in tolls. 7.44 Consider a game in which a red die and a blue die are rolled. Let \(x_R\) denote the value showing on the uppermost face of the red die, and define \(x_B\) similarly for the blue die. a. The probability distribution of \(x_R\) is

x_R      1    2    3    4    5    6
p(x_R)   1/6  1/6  1/6  1/6  1/6  1/6

Find the mean, variance, and standard deviation of \(x_R\). b. What are the values of the mean, variance, and standard deviation of \(x_B\)? (You should be able to answer this question without doing any additional calculations.) c. Suppose that you are offered a choice of the following two games: Game 1: Costs $7 to play, and you win \(y_1\) dollars, where \(y_1 = x_R + x_B\). Game 2: Doesn't cost anything to play initially, but you "win" \(3y_2\) dollars, where \(y_2 = x_R - x_B\). If \(y_2\) is negative, you must pay that amount; if it is positive, you receive that amount. For Game 1, the net amount won in a game is \(w_1 = y_1 - 7 = x_R + x_B - 7\). What are the mean and standard deviation of \(w_1\)? d. For Game 2, the net amount won in a game is \(w_2 = 3y_2 = 3(x_R - x_B)\). What are the mean and standard deviation of \(w_2\)? e. Based on your answers to Parts (c) and (d), if you had
58. bability distribution of x is given by the following table:

x      15   30   60
p(x)   .1   .3   .6

a. Find the average length for commercials appearing on this station. b. If a 15-sec spot sells for $500, a 30-sec spot for $800, and a 60-sec spot for $1000, find the average amount paid for commercials appearing on this station. (Hint: Consider a new variable, y = cost, and then find the probability distribution and mean value of y.) 7.35 An author has written a book and submitted it to a publisher. The publisher offers to print the book and gives the author the choice between a flat payment of $10,000 and a royalty plan. Under the royalty plan the author would receive $1 for each copy of the book sold. The author thinks that the following table gives the probability distribution of the variable x = the number of books that will be sold:

x      1000   5000   10,000   20,000
p(x)   .05    .30    .40      .25

Which payment plan should the author choose? Why? 7.36 A grocery store has an express line for customers purchasing at most five items. Let x be the number of items purchased by a randomly selected customer using this line. Give examples of two different assignments of probabilities such that the resulting distributions have the same mean but quite different standard deviations. 7.37 A gas station sells gasoline at the following prices (in cents per gallon, depending on the type of gas and
59. bability that z will be below a specific value. Suppose we want to find the probability that z is less than -1.76. Remembering that the set of possible values for a standard normal random variable is the entire real line, you might think to enter the following: normalcdf(-∞, -1.76). If so, your thinking is right on target except for one thing: there is no -∞ on your calculator. Some calculators will have a special symbol for -∞, which the calculator translates internally to its equivalent of a very large negative number. You should check your manual for this number and how to find it. The representation will probably be something like -1E99 or -1e999, which is calculator-speak for -1 times 10 raised to the highest power the calculator can handle. In the case of the standard normal curve, it may be just as easy to enter a different but still very large negative number in place of -∞, perhaps normalcdf(-10, -1.76). On our calculator, .0392038577 is returned, which agrees with the tabled answer. If you are squeamish about -10, use -50; using -50 we get .0392038577 also. Finding the probability that z is greater than a particular value is also easy. For example, we find the area to the right of z = 1.42 as follows: 1 - normalcdf(-∞, 1.42). Using -10 for the lower bound, we get 0.0778038883. The last type of pro
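The calculator entries above have a direct counterpart in Python's standard library. The sketch below uses `statistics.NormalDist`; the helper name `normalcdf` is defined here only to mimic the calculator syntax and is not a library function.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def normalcdf(lower, upper, dist=std_normal):
    """Area under the normal curve between lower and upper."""
    return dist.cdf(upper) - dist.cdf(lower)

# P(z < -1.76), using -10 in place of negative infinity
p_below = normalcdf(-10, -1.76)

# P(z > 1.42) = 1 - P(z <= 1.42)
p_above = 1 - normalcdf(-10, 1.42)

# No stand-in is needed in Python: cdf() already integrates from -infinity
p_below_exact = std_normal.cdf(-1.76)

print(round(p_below, 6), round(p_above, 6))
```

The results agree with the calculator values quoted above to about six decimal places; the difference between using -10 and a true lower limit of negative infinity is far smaller than that.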
60. be approximated using a normal curve. The most important case of this is the approximation of binomial probabilities.

The Normal Curve and Discrete Variables

The probability distribution of a discrete random variable x is represented pictorially by a probability histogram. The probability of a particular value is the area of the rectangle centered at that value. Possible values of x are isolated points on the number line, usually whole numbers. For example, if x = the IQ of a randomly selected 8-year-old child, then x is a discrete random variable, because an IQ score must be a whole number. Often, a probability histogram can be well approximated by a normal curve, as illustrated in Figure 7.42. In such cases, it is customary to say that x has approximately a normal distribution. The normal distribution can then be used to calculate approximate probabilities of events involving x.

[Figure 7.42: A normal curve approximation to a probability histogram.]

Example 7.33 Express Mail Packages. The number of express mail packages mailed at a certain post office on a randomly selected day is approximately normally distributed with mean 18 and standard deviation 6. Let's first calculate the approximate probability that x = 20. Figure 7.43(a) shows a portion of the probability histogram for x with the approximating normal curve superimposed. The area of the shaded rectangle is P(x = 20). The l
61. ble might be the number of items x purchased by the customer. Possible values of this variable are 0 (a frustrated customer), 1, 2, 3, and so on. Until a customer is selected and the number of items counted, the value of x is uncertain. Another variable of potential interest might be the time y (minutes) spent in a checkout line. One possible value of y is 3.0 min and another is 4.0 min, but any other number between 3.0 and 4.0 is also a possibility. Whereas possible values of x are isolated points on the number line, possible y values form an entire interval (a continuum) on the number line.

DEFINITION. A numerical variable whose value depends on the outcome of a chance experiment is called a random variable. A random variable associates a numerical value with each outcome of a chance experiment. A random variable is discrete if its set of possible values is a collection of isolated points on the number line. The variable is continuous if its set of possible values includes an entire interval on the number line.

We use lowercase letters, such as x and y, to represent random variables. Figure 7.1 shows a set of possible values for each type of random variable. In practice, a discrete random variable almost always arises in connection with counting (e.g., the number of items purchased, the number of gas pumps in use, or the number of broken eggs in a carton). A continuous r
62. blem examined will be the identification of extreme values. The easiest way to do this is with a built-in function, typically called InvNormal, which stands for "inverse normal." The InvNormal function (or whatever it is named on your calculator) will be the reverse of finding the probability that z is less than a specified value. Earlier in our discussion, we found the probability that z is less than -1.76 to be 0.0392038577. The InvNormal function returns a z value when given the probability. Thus, InvNormal(.0392038577) equals -1.76. Except for the difference in function name, the syntax for this function should be the same as for normalcdf: InvNormal(cumulative probability [, μ, σ]). On our calculator, InvNormal(0.0392038577) returns -1.760000538.

Exploration 7.5 The Normal Approximation to the Binomial Distribution

In an earlier Exploration we showed you how to use your calculator to find probabilities associated with the binomial and normal distributions, using built-in calculator functions. We generically used the terms binompdf, binomcdf, normalpdf, and normalcdf to refer to these functions. In this Exploration we would like to focus on the normal approximation to the binomial distribution. Whenever a continuous distribution is used to approximate a discrete distribution, the question naturally occurs, "How good is the approximation?" The answer usually given by statistics instructors is "it depe
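The inverse-normal lookup described above can be reproduced with `NormalDist.inv_cdf` from Python's standard library, which plays the role the text assigns to InvNormal. The sketch below simply round-trips the probability found earlier; the variable names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal distribution

# Forward direction: probability that z is less than -1.76
prob = std_normal.cdf(-1.76)

# Reverse direction: the z value with that cumulative probability
z_from_prob = std_normal.inv_cdf(prob)

# Same lookup using the probability as printed in the text
z_from_text = std_normal.inv_cdf(0.0392038577)

print(round(z_from_prob, 4), round(z_from_text, 4))
```

Both lookups return approximately -1.76, confirming that the inverse function undoes the cumulative-probability calculation.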
63. c random variable, and the probability distribution that describes its behavior is called a geometric probability distribution. Suppose an experiment consists of a sequence of trials with the following conditions: 1. The trials are independent. 2. Each trial can result in one of two possible outcomes, success and failure. 3. The probability of success is the same for all trials. A geometric random variable is defined as x = number of trials until the first success is observed (including the success trial). The probability distribution of x is called the geometric probability distribution. For example, suppose that 40% of the students who drive to campus at your university carry jumper cables. Your car has a dead battery and you don't have jumper cables, so you decide to stop students who are headed to the parking lot and ask them whether they have a pair of jumper cables. You might be interested in the number of students you would have to stop before finding one who has jumper cables. If we define success as a student with jumper cables, a trial would consist of asking an individual student for help. The random variable x = number of students who must be stopped before finding one with jumper cables is an example of a geometric random variable, because it can be viewed as the number of trials to the first success in a sequence of independent trials. The probability distribution of a geometric random variable is easy to construct. We use \(\pi\) to d
64. cal constants. We can use the results in the preceding box to compute the mean and standard deviation of the billing amount variable for the propane gas example, as follows.

For Model 1:
\(\mu_{\text{model 1}} = \mu_{3x} = 3\mu_x = 3(318) = 954\)
\(\sigma^2_{\text{model 1}} = \sigma^2_{3x} = 3^2\sigma_x^2 = 9(42)^2 = 15{,}876\)
\(\sigma_{\text{model 1}} = \sqrt{15{,}876} = 126 = 3(42)\)

For Model 2:
\(\mu_{\text{model 2}} = \mu_{50+2.8x} = 50 + 2.8\mu_x = 50 + 2.8(318) = 940.40\)
\(\sigma^2_{\text{model 2}} = \sigma^2_{50+2.8x} = 2.8^2\sigma_x^2 = (2.8)^2(42)^2 = 13{,}829.76\)
\(\sigma_{\text{model 2}} = \sqrt{13{,}829.76} = 117.60 = 2.8(42)\)

The mean billing amount for Model 1 is a bit higher than for Model 2, as is the variability in billing amounts. Model 2 results in slightly more consistency from bill to bill in the amount charged.

Now let's consider a different type of problem. Suppose that you have three tasks that you plan to complete on the way home from school: stop at the public library to return an overdue book for which you must pay a fine, deposit your most recent paycheck at the bank, and stop by the office supply store to purchase paper for your computer printer. Define the following variables: \(x_1\) = time required to return book and pay fine, \(x_2\) = time required to deposit paycheck, \(x_3\) = time required to buy printer paper. We can then define a new variable, y, to represent the total amount of time to complete these tasks: \(y = x_1 + x_2 + x_3\). Defined in this way, y is an example of a linear combination of random variables. If \(x_1, x_2\)
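The propane calculations follow one rule: for y = a + bx, the mean is a + b times the mean of x, and the standard deviation is |b| times the standard deviation of x. A short Python sketch of that rule follows; the function name `linear_function_mean_sd` is an illustrative choice, not something from the text.

```python
def linear_function_mean_sd(a, b, mean_x, sd_x):
    """Mean and standard deviation of y = a + b*x,
    given the mean and standard deviation of x."""
    mean_y = a + b * mean_x
    sd_y = abs(b) * sd_x  # the constant a shifts y but adds no variability
    return mean_y, sd_y

mean_x, sd_x = 318, 42  # gallons needed to fill a tank

# Model 1: y = 3x
mean_1, sd_1 = linear_function_mean_sd(0, 3, mean_x, sd_x)

# Model 2: y = 50 + 2.8x
mean_2, sd_2 = linear_function_mean_sd(50, 2.8, mean_x, sd_x)

print(round(mean_1, 2), round(sd_1, 2))
print(round(mean_2, 2), round(sd_2, 2))
```

Note that the $50 service charge in Model 2 changes the mean but not the standard deviation, which is why Model 2 is the less variable of the two.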
65. conditions. The authors of the article stated, "The IDT distributions are seen to be skewed, but the natural logs do have an approximate normal distribution." The same property holds for \(\log_{10}\)-transformed data. We give representative IDT data in Table 7.4 and the resulting histograms in Figure 7.39, which are in agreement with the authors' statement.

Table 7.4 Original and \(\log_{10}(\text{IDT})\) Values

IDT    log10(IDT)   IDT    log10(IDT)   IDT    log10(IDT)
28.1   1.45         31.2   1.49         13.7   1.14
46.0   1.66         25.8   1.41         16.8   1.23
34.8   1.54         62.3   1.79         28.0   1.45
17.9   1.25         19.5   1.29         21.1   1.32
31.9   1.50         28.9   1.46         60.1   1.78
23.7   1.37         18.6   1.27         21.4   1.33
26.6   1.42         26.2   1.42         32.0   1.51
43.5   1.64         17.4   1.24         38.8   1.59
30.6   1.49         55.6   1.75         235 3  1.41
52.1   1.72         21.0   1.32         22.3   1.35
15.3   1.19         36.3   1.56         19.1   1.28
38.4   1.58         72.8   1.86         48.9   1.69
21.4   1.33         20.7   1.32         57.3   1.76
40.9   1.61

[Figure 7.39: Histograms of the IDT data used in Example 7.32; (a) untransformed data, (b) \(\log_{10}\)-transformed data.]

[Figure 7.40: MINITAB-generated normal probability plots for Example 7.32; (a) original IDT data, (b) log-transformed IDT data.]

The sample size for the IDT data is n = 40. The correlation coefficient for the (normal score, original untransformed data) pairs is .950, which is less th
66. d or slept c Determine P x 4 7 48 Refer to Exercise 7 47 and suppose that 10 rather than 6 passengers are selected n 10 7 8 so that Appendix Table 9 can be used a What is p 8 b Calculate P x lt 7 c Calculate the probability that more than half of the se lected passengers rested or slept 7 49 Twenty five percent of the customers entering a grocery store between 5 P M and 7 P M use an express checkout Consider five randomly selected customers and let x denote the number among the five who use the ex press checkout a What is p 2 that is P x 2 b What is P x 1 c What is P 2 x Hint Make use of your computa tion in Part b d What is P x 2 Bold exercises answered in back Data set available online but not required 7 5 a Binomial and Geometric Distributions 395 7 50 A breeder of show dogs is interested in the number of female puppies in a litter If a birth is equally likely to result in a male or a female puppy give the probability distribution of the variable x number of female puppies in a litter of size 5 7 51 The article FBI Says Fewer than 25 Failed Polygraph Test San Luis Obispo Tribune July 29 2001 states that false positives in polygraph tests i e tests in which an in dividual fails even though he or she is telling the truth are relatively common and occur about 15 of the time Sup pose that such a test is given to 10 trustworthy individ
67. different from the binomial and geometric distributions because the normal distribution is continuous. Consequently, it will not be graphed as a histogram; normal curves are graphed just as any other function is graphed. The normal curve, however, is not particularly simple. Fortunately, your calculator, if it has statistical functions, will have normal already in it somewhere. It might be something like this: normalpdf(x [, μ, σ]). If you are a glutton for punishment, or your calculator does not have a built-in normalpdf function, here is the formula for a normal curve with mean \(\mu\) and standard deviation \(\sigma\): \(f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}\). Here are the keystrokes for the formula: y1 = (1/(σ√(2π))) exp(-(x - μ)^2/(2σ^2)). Assuming you are smiling because of your foresight in purchasing a calculator with a built-in normalpdf function, let's put it to good use. The syntax above for the normalpdf function might seem complicated, but actual use is simple once you get used to it. You should check your calculator manual for two very important pieces of information. First, make sure you know the required order for the information you must provide. Second, look closely at the sigma, wherever it is in your calculator's syntax. Make sure you check whether you must enter the standard deviation or the variance. Now let's tackle the notation. First of all, if your calculator's syntax has those square brackets, [, μ, σ], remember that they indicate numbers that are opti
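To make the normal curve formula concrete, here is a hedged Python sketch that types the formula in by hand and compares it with the built-in density in `statistics.NormalDist`. The function name `normal_pdf` and the test values (mean 50, standard deviation 5, x = 47) are illustrative choices, not taken from the text.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Height of the normal curve with mean mu and standard deviation sigma at x.
    Note: sigma is the standard deviation, not the variance."""
    exponent = -((x - mu) ** 2) / (2 * sigma ** 2)
    return math.exp(exponent) / (sigma * math.sqrt(2 * math.pi))

# Hand-typed formula versus the library's built-in density
by_formula = normal_pdf(47, mu=50, sigma=5)
by_library = NormalDist(mu=50, sigma=5).pdf(47)

# Peak of the standard normal curve, 1 / sqrt(2*pi)
peak_height = normal_pdf(0)

print(round(by_formula, 6), round(by_library, 6), round(peak_height, 6))
```

The optional arguments default to the standard normal values, which parallels the role of the optional bracketed entries in the calculator syntax.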
68. distribution with \(\mu = 50\) and \(\sigma = 5\). This is why the translation using z scores results in an equivalent problem involving the standard normal distribution.

Describing Extreme Values in a Normal Distribution

To describe the extreme values for a normal distribution with mean \(\mu\) and standard deviation \(\sigma\), we first solve the corresponding problem for the standard normal distribution and then translate our answer into one for the normal distribution of interest. This process is illustrated in Example 7.27.

Example 7.27 Registration Times. Data on the length of time required to complete registration for classes using a telephone registration system suggest that the distribution of the variable x = time to register for students at a particular university can be well approximated by a normal distribution with mean \(\mu = 12\) min and standard deviation \(\sigma = 2\) min. (The normal distribution might not be an appropriate model for x = time to register at another university. Many factors influence the shape, center, and spread of such a distribution.) Because some students do not sign off properly, the university would like to disconnect students automatically after some amount of time has elapsed. It is decided to choose this time such that only 1% of the students are disconnected while they are still

[Figure 7.35: Capturing the largest 1% in a normal distribution for the problem in Example 7.27.]
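The two-step process described above (solve the standard normal problem, then translate) can be sketched in Python. The excerpt breaks off before the example's answer, so the cutoff computed here is an illustration under the stated model (mean 12 min, standard deviation 2 min, largest 1% cut off), not a quotation of the book's result.

```python
from statistics import NormalDist

mu, sigma = 12, 2  # registration time model from Example 7.27

# Step 1: z value that separates the largest 1% from the rest
z_star = NormalDist().inv_cdf(0.99)

# Step 2: translate back to the x scale with x = mu + z * sigma
cutoff = mu + z_star * sigma

# Cross-check: ask the N(12, 2) distribution for its 99th percentile directly
cutoff_direct = NormalDist(mu, sigma).inv_cdf(0.99)

print(round(z_star, 2), round(cutoff, 2))
```

Under these assumptions the cutoff is a little under 17 minutes; a table lookup with z rounded to 2.33 gives nearly the same value.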
69. dom variable of interest. For example, if the variable counts the number of female births among the next 25 births at a particular hospital, then a female birth would be labeled a success (because this is what the variable counts). If male births were counted instead, a male birth would be labeled a success and a female birth a failure. One illustration of a binomial probability distribution was given in Example 7.5. There we considered x = number among four customers who selected an electric (as opposed to gas) hot tub. This is a binomial experiment with four trials and P(success) = P(E) = .4. The 16 possible outcomes, along with their probabilities, were displayed in Table 7.1. Consider now the case of five customers, a binomial experiment with five trials. Here the binomial distribution tells us the probability associated with each of the possible x values 0, 1, 2, 3, 4, and 5. There are 32 possible outcomes, and 5 of them yield x = 1: SFFFF, FSFFF, FFSFF, FFFSF, and FFFFS. By independence, the first of these outcomes has probability \(P(SFFFF) = P(S)P(F)P(F)P(F)P(F) = (.4)(.6)(.6)(.6)(.6) = (.4)(.6)^4 = .05184\). The probability calculation will be the same for any outcome with only one success (x = 1). It does not matter where in the sequence the single success occurs. Thus \(p(1) = P(x = 1) = P(SFFFF \text{ or } FSFFF \text{ or } FFSFF \text{ or } FFFSF \text{ or } FFFFS) = .05184 + .05184 + .05184 + .05184 + .05184 = (5)(.05184) = .25920\). Similarly, there are ten outcomes for
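The counting argument above can be mirrored in Python by listing all 32 outcomes and picking out those with exactly one success. This is a small illustrative sketch; the variable names are chosen here and are not part of the original text.

```python
from itertools import product
from math import comb

n, pi = 5, 0.4  # five customers, P(success) = .4

# All 2^5 = 32 sequences of S (success) and F (failure)
outcomes = ["".join(seq) for seq in product("SF", repeat=n)]

# The outcomes with exactly one success: SFFFF, FSFFF, ...
one_success = [o for o in outcomes if o.count("S") == 1]

# Every such outcome has the same probability, (.4)(.6)^4
prob_single_outcome = pi * (1 - pi) ** (n - 1)
p1_by_counting = len(one_success) * prob_single_outcome

# The same value from the binomial formula: C(n, 1) * pi^1 * (1 - pi)^(n - 1)
p1_by_formula = comb(n, 1) * pi ** 1 * (1 - pi) ** (n - 1)

print(len(outcomes), one_success, round(p1_by_counting, 5))
```

Changing the `1` in `o.count("S") == 1` to another value reproduces the counts for the other possible x values, for example ten outcomes with two successes.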
70. e hypergeometric distribution are very close in value. They are so close, in fact, that statisticians often ignore the difference and use the binomial probabilities in place of the hypergeometric probabilities. Most statisticians recommend the following guideline for determining whether the binomial probability distribution is appropriate when sampling without replacement: Let x denote the number of S's in a sample of size n selected without replacement from a population consisting of N individuals or objects. If \(n/N \le 0.05\) (i.e., at most 5% of the population is sampled), then the binomial distribution gives a good approximation to the probability distribution of x.

Example 7.18 Security Systems. A Los Angeles Times poll (November 10, 1991) reported that almost 20% of Southern California homeowners questioned had installed a home security system. Suppose that exactly 20% of all such homeowners have a system. Consider a random sample of n = 20 homeowners (much less than 5% of the population). Then x, the number of homeowners in the sample who have a security system, has (approximately) a binomial distribution with n = 20 and \(\pi = .20\). The probability that five of those sampled have a system is \(P(x = 5) = p(5)\) = entry in the x = 5 row and \(\pi = .20\) column in Appendix Table 9 (n = 20) = .175. The probabili

[Figure 7.16: The binomial probability histogram when n = 20 and \(\pi = .20\).]
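To see how close the two distributions are when at most 5% of the population is sampled, the sketch below compares the exact hypergeometric probability with the binomial approximation for Example 7.18. The population size of 10,000 homeowners is a hypothetical figure chosen only for illustration; the text does not give one.

```python
from math import comb

def binomial_pmf(k, n, pi):
    """P(x = k) for a binomial random variable."""
    return comb(n, k) * pi ** k * (1 - pi) ** (n - k)

def hypergeometric_pmf(k, n, successes, population):
    """P(x = k) when n items are drawn without replacement from a
    population containing the given number of successes."""
    return (comb(successes, k) * comb(population - successes, n - k)
            / comb(population, n))

n, pi = 20, 0.20
population = 10_000               # hypothetical population size (assumption)
successes = int(pi * population)  # exactly 20% have a security system

p5_binomial = binomial_pmf(5, n, pi)
p5_hypergeometric = hypergeometric_pmf(5, n, successes, population)

print(round(p5_binomial, 3), round(p5_hypergeometric, 3), n / population <= 0.05)
```

With n/N = .002, the two probabilities agree to about three decimal places, which is the sense in which the binomial "gives a good approximation."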
71. e weighs at most 0.5 lb? between 0.25 and 0.5 lb? at least 0.75 lb? V2650 50 596 Oy b. It can be shown that u and o 4 What is the probability that the value of x is more than 1 standard deviation from the mean value? 7.28 The probability distribution of x, the number of defective tires on a randomly selected automobile checked at a certain inspection station, is given in the following table:

x      0     1     2     3     4
p(x)   .54   .16   .06   .04   .20

a. Calculate the mean value of x. b. What is the probability that x exceeds its mean value? 7.29 Exercise 7.9 introduced the following probability distribution for y = the number of broken eggs in a carton:

y      0     1     2     3     4
p(y)   .65   .20   .10   .04   .01

a. Calculate and interpret \(\mu_y\). b. In the long run, for what percentage of cartons is the number of broken eggs less than \(\mu_y\)? Does this surprise you? c. Why doesn't \(\mu_y = (0 + 1 + 2 + 3 + 4)/5 = 2.0\)? Explain. 7.30 Referring to Exercise 7.29, use the result of Part (a) along with the fact that a carton contains 12 eggs to determine the mean value of z = the number of unbroken eggs. (Hint: z can be written as a linear function of x.) 7.31 The mean value of x = the number of defective tires, whose distribution appears in Exercise 7.28, is \(\mu_x = 1.2\). Calculate \(\sigma_x^2\) and \(\sigma_x\). 7.32 E
72. e would you recommend? (Hint: Which machine would produce fewer defective corks?) 7.76 A gasoline tank for a certain car is designed to hold 15 gal of gas. Suppose that the variable x = actual capacity of a randomly selected tank has a distribution that is well approximated by a normal curve with mean 15.0 gal and standard deviation 0.1 gal. a. What is the probability that a randomly selected tank will hold at most 14.8 gal? b. What is the probability that a randomly selected tank will hold between 14.7 and 15.1 gal? c. If two such tanks are independently selected, what is the probability that both hold at most 15 gal? 7.77 The time that it takes a randomly selected job applicant to perform a certain task has a distribution that can be approximated by a normal distribution with a mean value of 120 sec and a standard deviation of 20 sec. The fastest 10% are to be given advanced training. What task times qualify individuals for such training? 7.78 A machine that produces ball bearings has initially been set so that the true average diameter of the bearings it produces is 0.500 in. A bearing is acceptable if its diameter is within 0.004 in. of this target value. Suppose, however, that the setting has changed during the course of production, so that the distribution of the diameters produced is well approximated by a normal distribution with mean
73. efore birth. What is the probability that the duration of pregnancy is at least 310 days? Does this probability make you a bit skeptical of the claim? e. Some insurance companies will pay the medical expenses associated with childbirth only if the insurance has been in effect for more than 9 months (275 days). This restriction is designed to ensure that the insurance company pays benefits for only those pregnancies for which conception occurred during coverage. Suppose that conception occurred 2 weeks after coverage began. What is the probability that the insurance company will refuse to pay benefits because of the 275-day insurance requirement?

Graphing Calculator Explorations

Exploration 7.1 Discrete Probability Distributions

[Figure 7.45: (a) Apgar score probability distribution; (b) probability histogram for Apgar score.]

The calculator is at its finest when used with random variables, transforming minutes of mindless calculation into seconds of easy button pushing. In our calculator presentation of random variables, we will capitalize extensively on the list capabilities of your calculator. We will show you not only how to gra
74. eft edge of this rectangle is at 19.5 on the horizontal scale, and the right edge is at 20.5. Therefore, the desired probability is approximately the area under the normal curve between 19.5 and 20.5. Standardizing these limits gives $\frac{20.5 - 18}{6} = .42$ and $\frac{19.5 - 18}{6} = .25$, from which we get $P(x = 20) \approx P(.25 < z < .42) = .6628 - .5987 = .0641$. In a similar fashion, Figure 7.43(b) shows that $P(x \le 10)$ is approximately the area under the normal curve to the left of 10.5. Then $P(x \le 10) \approx P\left(z \le \frac{10.5 - 18}{6}\right) = P(z \le -1.25) = .1056$.

[Figure 7.43: The normal approximation for Example 7.33: (a) shaded area approximates $P(x = 20)$; (b) shaded area approximates $P(x \le 10)$.]

The calculation of probabilities in Example 7.33 illustrates the use of what is known as a continuity correction. Because the rectangle for x = 10 extends to 10.5 on the right, we use the normal curve area to the left of 10.5 rather than 10. In general, if possible x values are consecutive whole numbers, then $P(a \le x \le b)$ will be approximately the normal curve area between limits $a - \frac{1}{2}$ and $b + \frac{1}{2}$.

Normal Approximation to a Binomial Distribution

Figure 7.44 shows the probability histograms for two binomial distributions, one with $n = 25$, $\pi = .4$ and the other with $n = 25$, $\pi = .1$. For each distribution we computed $\mu = n\pi$ and $\sigma = \sqrt{n\pi(1 - \pi)}$, and then we superimposed a
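The continuity correction described above can be sketched in a few lines of Python. This is only an illustration: the mean of 18 and standard deviation of 6 are read off the standardized limits in the text, the helper names `prob_equal` and `prob_at_most` are hypothetical, and the standard library's `statistics.NormalDist` stands in for Appendix Table 2.

```python
from statistics import NormalDist

# Assumption: mean 18 and standard deviation 6, as implied by the
# standardized limits (20.5 - 18)/6 and (19.5 - 18)/6 in the text.
approx = NormalDist(mu=18, sigma=6)

def prob_equal(k, dist):
    """Approximate P(x = k) by the normal area between k - 0.5 and k + 0.5."""
    return dist.cdf(k + 0.5) - dist.cdf(k - 0.5)

def prob_at_most(k, dist):
    """Approximate P(x <= k) by the normal area to the left of k + 0.5."""
    return dist.cdf(k + 0.5)

p_eq_20 = prob_equal(20, approx)
p_le_10 = prob_at_most(10, approx)
print(round(p_eq_20, 4), round(p_le_10, 4))
```

Because the code does not round the z limits to two decimal places, its answers can differ slightly in the third decimal place from the table-based values .0641 and .1056.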
75. en 3 and 5 min elapse before dismissal 7 22 Refer to the probability distribution given in Ex ercise 7 21 Put the following probabilities in order Bold exercises answered in back Data set available online but not required from smallest to largest P 2 lt x lt 3 P 2 lt x S 3 P x lt 2 P x gt 7 Explain your reasoning 7 23 The article Modeling Sediment and Water Column Interactions for Hydrophobic Pollutants Water Research 1984 1169 1174 suggests the uniform distribution on the interval from 7 5 to 20 as a model for x depth in centimeters of the bioturbation layer in sediment for a certain region a Draw the density curve for x b What is the height of the density curve c What is the probability that x is at most 12 d What is the probability that x is between 10 and 15 Between 12 and 17 Why are these two probabilities equal 7 24 Let x denote the amount of gravel sold in tons dur ing a randomly selected week at a particular sales facility Suppose that the density curve has height f x above the value x where fla o b2xe1 0 otherwise The density curve the graph of f x is shown in the fol lowing figure Density x 0 1 Tons Use the fact that the area of a triangle 5 base height to calculate each of the following probabilities a Pl x lt r 2 Y Video solution available 372 Chapter 7 a Random Variables and Probability Distributions 1 b Pl x Sz 2
76. en examples in which the probability distribution of a discrete random variable has been given as a table or as a probability relative frequency histogram It is also possible to give a formula that allows calculation of the probability for each possible value of the random variable Examples of this approach are given in Section 7 5 El Exercises 7 8 7 19 0 7 8 Let x be the number of courses for which a randomly selected student at a certain university is registered The probability distribution of x appears in the following table x 1 2 3 4 5 6 7 px 02 03 09 25 40 16 05 a What is P x 4 b What is P x 4 c What is the probability that the selected student is tak ing at most five courses d What is the probability that the selected student is taking at least five courses more than five courses e Calculate P 3 S x lt 6 and P 3 lt x lt 6 Explain in words why these two probabilities are different Bold exercises answered in back Data set available online but not required 7 9 V Let y denote the number of broken eggs in a ran domly selected carton of one dozen eggs Suppose that the probability distribution of y is as follows y 0 1 2 3 4 pO 65 20 10 04 a Only y values of 0 1 2 3 and 4 have positive proba bilities What is p 4 b How would you interpret p 1 20 c Calculate P y 2 the probability that the carton contains at most two broken eggs and interpret this pr
77. enote the probability of success on any given trial. Possible outcomes can be denoted as follows:

Outcome     x = Number of Trials to First Success
S           1
FS          2
FFS         3
...
FFFFFFS     7
...

Each possible outcome consists of 0 or more failures followed by a single success. So $p(x) = P(x \text{ trials to first success}) = P(FF \ldots FS)$, that is, $x - 1$ failures followed by a success on trial x. Because the probability of success is $\pi$ for each trial, the probability of failure for each trial is $1 - \pi$. Because the trials are independent,

$p(x) = P(x \text{ trials to first success}) = P(FF \ldots FS) = P(F)P(F) \cdots P(F)P(S) = (1 - \pi)(1 - \pi) \cdots (1 - \pi)\pi = (1 - \pi)^{x-1}\pi$

This leads us to the formula for the geometric probability distribution.

Geometric Probability Distribution: If x is a geometric random variable with probability of success = $\pi$ for each trial, then $p(x) = (1 - \pi)^{x-1}\pi$, $x = 1, 2, 3, \ldots$

Jumper Cables. Consider the jumper cable problem described previously. For this problem, $\pi = .4$, because 40% of the students who drive to campus carry jumper cables. The probability distribution of x = number of students who must be stopped before finding a student with jumper cables is $p(x) = (.6)^{x-1}(.4)$, $x = 1, 2, 3, \ldots$ The probability distribution can now be used to compute various probabilities. For example, the probability that the first student stopped has jumper cables (i.e., x = 1) is $p(1) = (.6)^{1-1}(.4)$
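The geometric formula above translates directly into code. The following is a minimal sketch, with the function name `geometric_pmf` chosen here purely for illustration; it evaluates the jumper cable distribution with $\pi = .4$.

```python
def geometric_pmf(x, pi):
    """Return p(x) = (1 - pi)**(x - 1) * pi for x = 1, 2, 3, ..."""
    if x < 1:
        raise ValueError("x must be a positive whole number")
    return (1 - pi) ** (x - 1) * pi

# Jumper cable problem: 40% of students carry jumper cables.
pi = 0.4
for x in range(1, 6):
    print(x, round(geometric_pmf(x, pi), 4))
```

Summing `geometric_pmf(x, pi)` over a long run of x values gives a total very close to 1, which is a quick check that the formula defines a legitimate probability distribution.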
78. equired 7 6 a Normal Distributions 413 b The 77th percentile c The 50th percentile d The 9th percentile e What is the relationship between the 70th z percentile and the 30th z percentile 7 72 Consider the population of all 1 gal cans of dusty rose paint manufactured by a particular paint company Suppose that a normal distribution with mean u 5 ml and standard deviation o 0 2 ml is a reasonable model for the distribution of the variable x amount of red dye in the paint mixture Use the normal distribution model to calculate the following probabilities a P x lt 5 0 d P 4 6 lt x lt 5 2 b P x lt 5 4 e P x gt 4 5 c P x 5 4 f P x gt 4 0 7 73 Consider babies born in the normal range of 37 43 weeks gestational age Extensive data support the as sumption that for such babies born in the United States birth weight is normally distributed with mean 3432 g and standard deviation 482 g Are Babies Normal The American Statistician 1999 298 302 a What is the probability that the birth weight of a ran domly selected baby of this type exceeds 4000 g is be tween 3000 and 4000 g b What is the probability that the birth weight of a ran domly selected baby of this type is either less than 2000 g or greater than 5000 g c What is the probability that the birth weight of a ran domly selected baby of this type exceeds 7 1b Hint 1 lb 453 59 g d How would you characterize
79. erent standard deviations. Which distribution, (i) or (ii), has the larger standard deviation? Finally, Figure 7.10(c) shows three continuous distributions with different means and standard deviations. Which of the three distributions has the largest mean? Which has a mean of about 5? Which distribution has the smallest standard deviation? (The correct answers to our questions are the following: Figure 7.10(a)(ii) has a mean of 6, and Figure 7.10(a)(i) has a mean of 10; Figure 7.10(b)(ii) has the larger standard deviation; Figure 7.10(c)(iii) has the largest mean, Figure 7.10(c)(ii) has a mean of about 5, and Figure 7.10(c)(ii) has the smallest standard deviation.)

[Figure 7.10: Some probability distributions: (a) different values of $\mu$ with the same value of $\sigma$; (b) different values of $\sigma$ with the same value of $\mu$; (c) different values of $\mu$ and $\sigma$.]

It is customary to use the terms mean of the random variable x and mean of the probability distribution of x interchangeably. Similarly, the standard deviation of the random variable x and the standard deviation of the probability distribution of x refer to the same thing. Alt
80. ers for evaluating the probability that z is between two values, a and b. It is possible your calculator has a table for you to fill in the values, as in Figure 7.51. For a calculator utilizing this strategy, you would have to fill in the lower bound, upper bound, standard deviation ($\sigma$), and mean ($\mu$). Other calculators ask for the mean and standard deviation as parameters of the function. If your calculator uses this strategy, your built-in cumulative distribution function will have syntax something like this: normalcdf(lower bound, upper bound[, $\mu$, $\sigma$]). For a calculator using this syntax, you would fill in the lower bound and upper bound with the appropriate values for z and ignore the optional parameters, since z will have a standard normal distribution. (You will specify other values for $\mu$ and $\sigma$ when performing calculations that are not already in terms of z scores.) After navigating your calculator's menus, you will enter something like this: normalcdf(z lower, z upper). Let's find the probability that z is between -1.76 and 0.58. We enter the function as normalcdf(-1.76, 0.58), and the calculator will return a value of 0.6798388789. We would not suggest writing all those digits; rounding off to .6798 is perfectly fine, as you may have surmised from considering Appendix Table 2. As you might guess from its name, the normalcdf function can also be used for calculation of the cumulative distribution function, that is, finding the pro
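For readers working on a computer rather than a calculator, the same calculation can be sketched in Python. The function below only mimics the calculator syntax described above; it is a stand-in written for this illustration, built on the standard library's `statistics.NormalDist`, and is not the calculator's own implementation.

```python
from statistics import NormalDist

def normalcdf(lower, upper, mu=0.0, sigma=1.0):
    """Area under the normal curve between lower and upper.

    mu and sigma are optional, mirroring the calculator syntax;
    the defaults give the standard normal (z) distribution.
    """
    dist = NormalDist(mu, sigma)
    return dist.cdf(upper) - dist.cdf(lower)

area = normalcdf(-1.76, 0.58)
print(round(area, 4))
```

Passing `mu` and `sigma` explicitly, for example `normalcdf(75, 125, 100, 15)`, handles problems that are not already stated in terms of z scores.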
81. es. What assumption must be made to justify these probability calculations? Do you think this is reasonable or not? Explain.

3. Suppose that a carton of one dozen eggs does happen to have exactly three eggs that carry salmonella and that the manager does as he proposes: selects three eggs at random and throws them out, then uses the remaining nine eggs in four-egg quiches. Let x = number of eggs that carry salmonella among four eggs selected at random from the remaining nine. Working with a partner, conduct a simulation to approximate the distribution of x by carrying out the following sequence of steps:
a. Take 12 identical slips of paper and write "Good" on 9 of them and "Bad" on the remaining 3. Place the slips of paper in a paper bag or some other container.
b. Mix the slips and then select three at random and remove them from the bag.
c. Mix the remaining slips and select four "eggs" from the bag.
d. Note the number of bad eggs among the four selected. (This is an observed x value.)
e. Replace all slips, so that the bag now contains all 12 "eggs."
f. Repeat Steps (b)-(d) at least 10 times, each time recording the observed x value.

4. Combine the observations from your group with those from the other groups. Use the resulting data to approximate the distribution of x. Comment on the resulting distribution in the context of the risk of salmonella exposure if the manager's proposed procedure is used.
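The paper-slip procedure in Steps (a) through (f) can also be imitated on a computer, which makes it easy to run many more than 10 repetitions. The sketch below is one possible way to do this, not part of the activity itself; the function names and the choice of 10,000 trials are assumptions made for illustration.

```python
import random
from collections import Counter

def simulate_once(rng):
    """One pass through Steps (a)-(d): returns the observed x value."""
    carton = ["Bad"] * 3 + ["Good"] * 9   # Step (a): 12 slips, 3 marked Bad
    rng.shuffle(carton)                   # Step (b): mix the slips ...
    remaining = carton[3:]                # ... and remove three at random
    chosen = rng.sample(remaining, 4)     # Step (c): select four of the nine
    return chosen.count("Bad")            # Step (d): number of bad eggs

def approximate_distribution(trials, seed=0):
    """Relative frequency of each x value (0 to 3) over many repetitions."""
    rng = random.Random(seed)
    counts = Counter(simulate_once(rng) for _ in range(trials))
    return {x: counts[x] / trials for x in range(4)}

dist = approximate_distribution(10_000)
for x, rel_freq in dist.items():
    print(x, rel_freq)
```

Each call to `simulate_once` starts from a full carton, which plays the role of Step (e). The relative frequencies printed are simulation estimates and will vary with the seed and the number of trials.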
82. espects.

Window Widths Continued. The sample size for the contact window width data of Example 7.29 is n = 10. The critical r from Table 7.2 is then .880. The correlation coefficient calculated using the (normal score, observed value) pairs is r = .995. Because r is larger than the critical r for a sample of size 10, it is plausible that the population distribution of window widths from which this sample was drawn is approximately normal.

Transforming Data to Obtain a Distribution That Is Approximately Normal

Many of the most frequently used statistical methods are valid only when the sample is selected at random from a population whose distribution is at least approximately normal. When a sample histogram shows a distinctly nonnormal shape, it is common to use a transformation or reexpression of the data. By transforming data, we mean applying some specified mathematical function (such as the square root, logarithm, or reciprocal) to each data value to produce a set of transformed data. We can then study and summarize the distribution of these transformed values using methods that require normality. We saw in Chapter 5 that, with bivariate data, one or both of the variables can be transformed in an attempt to find two variables that are linearly related. With univariate data, a transformat
83. esults. Let's see what happens: $P(3 \le x \le 6)$ = binomcdf(50, .1, 6) - binomcdf(50, .1, 2) = 0.7702268435 - 0.1117287563 = 0.6584980872. To evaluate this probability using the normal curve approximation, we will use the machine accuracy of the calculator, with $\mu = 50(.1) = 5$ and $\sigma = \sqrt{50(.1)(.9)} = 2.121320344$: normalcdf(lower bound, upper bound, $\mu$, $\sigma$) = normalcdf(2.5, 6.5, 5, 2.121320344) = 0.6409535402. The difference between the binomial and the normal approximation in this case is 0.017544547. It is interesting to note that using a rule of thumb with 5 instead of 10 would call this difference acceptable. We do not argue with this rule of thumb in principle but once again point out that the individual judgment by the investigator on site must be used in evaluating the goodness of the approximation. Finally, we will superimpose the appropriate normal distribution over the binomial distribution to get a visual sense of the approximation. It is entirely possible that a given approximation will do a better job for different choices of values of the endpoints of the interval, and the graphs may give us an overall sense of when a normal approximation might be acceptable. Graphing a binomial distribution and a normal distribution at the same time involves skills we have seen in previous Explorations. You may want to refer back to the calculator explorations about the binomial and normal distributions to refresh your memory. We will g
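The comparison above between the exact binomial probability and its normal approximation can be reproduced in Python. This is a sketch under stated assumptions: the function `binomcdf` is written here to imitate the calculator function of the same name (it is not a library call), and the normal curve areas come from `statistics.NormalDist`.

```python
from math import comb, sqrt
from statistics import NormalDist

def binomcdf(n, p, k):
    """P(x <= k) for a binomial random variable with n trials, success prob p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

n, p = 50, 0.1

# Exact binomial probability P(3 <= x <= 6).
exact = binomcdf(n, p, 6) - binomcdf(n, p, 2)

# Normal approximation with continuity correction: area from 2.5 to 6.5.
mu = n * p
sigma = sqrt(n * p * (1 - p))
curve = NormalDist(mu, sigma)
approx = curve.cdf(6.5) - curve.cdf(2.5)

print(round(exact, 4), round(approx, 4), round(exact - approx, 4))
```

Changing `n` and `p`, or the endpoints, gives a quick way to explore when the approximation is close enough for the purpose at hand.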
84. et x denote the number of false alarms in a random sample of 100 alarms. Give approximations to the following probabilities: a. $P(20 \le x \le 30)$ b. $P(20 < x < 30)$ c. $P(35 \le x)$ d. The probability that x is farther than 2 standard deviations from its mean value. 7.99 Suppose that 65% of all registered voters in a certain area favor a 7-day waiting period before purchase of a handgun. Among 225 randomly selected voters, what is the probability that a. At least 150 favor such a waiting period? b. More than 150 favor such a waiting period? c. Fewer than 125 favor such a waiting period? 7.100 Flash bulbs manufactured by a certain company are sometimes defective. a. If 5% of all such bulbs are defective, could the techniques of this section be used to approximate the probability that at least 5 of the bulbs in a random sample of size 50 are defective? If so, calculate this probability; if not, explain why not. b. Reconsider the question posed in Part (a) for the probability that at least 20 bulbs in a random sample of size 500 are defective. 7.101 A company that manufactures mufflers for cars offers a lifetime warranty on its products, provided that ownership of the car does not change. Suppose that only 20% of its mufflers are replaced under this wa
85. f time the first person has to wait. a. What is the probability distribution of w? b. How much time do you expect to elapse between the two arrivals? 7.116 Four people (a, b, c, and d) are waiting to give blood. Of these four, a and b have type AB blood, whereas c and d do not. An emergency call has just come in for some type AB blood. If blood samples are taken one by one from the four people in random order for blood typing and x is the number of samples taken to obtain an AB individual (so possible x values are 1, 2, and 3), what is the probability distribution of x? 7.117 Bob and Lygia are going to play a series of Trivial Pursuit games. The first person to win four games will be declared the winner. Suppose that outcomes of successive games are independent and that the probability of Lygia winning any particular game is .6. Define a random variable x as the number of games played in the series. a. What is p(4)? (Hint: Either Bob or Lygia could win four straight games.) b. What is p(5)? (Hint: For Lygia to win in exactly five games, what has to happen in the first four games and in Game 5?) c. Determine the probability distribution of x. d. How many games can you expect the series to last? 7.118 Refer to Exercise 7.117, and let y be the number of games won by the series loser. Determine the probability distribution of y. 7.119 A sporting goods store has a special sale on three brands of tennis balls, call them D, P, and W
86. fective components is subject to uncertainty; such variables are called random variables.

[Figure 7.1: Two different types of random variables.]

In this chapter we begin by distinguishing between discrete and continuous numerical variables. We show how variation in both discrete and continuous numerical variables can be described by a probability distribution; this distribution can then be used to make probability statements about values of the random variable. Special emphasis is given to three commonly encountered probability distributions: the binomial, geometric, and normal distributions.

In most chance experiments, an investigator focuses attention on one or more variable quantities. For example, consider a management consultant who is studying the operation of a supermarket. The chance experiment might involve randomly selecting a customer leaving the store. One interesting numerical varia
87. gure 7.41 Histograms of sperm concentrations for 1711 suspected infertile men: (a) untransformed data (highly skewed); (b) log transformed data (reasonably symmetric); (c) square root transformed data; (d) cube root transformed data.

Exercises 7.81-7.92

7.81 Ten measurements of the steam rate (in pounds per hour) of a distillation tower were used to construct the following normal probability plot ("A Self-Descaling Distillation Tower," Chemical Engineering Process [1968]: 79-84). Based on the plot, do you think it is reasonable to assume that the normal distribution provides an adequate description of the steam rate distribution? Explain.

[Normal probability plot: Observation versus Normal score.]

7.82 The following normal probability plot was constructed using part of the data appearing in the paper "Trace Metals in Sea Scallops" (Environmental Concentration and Toxicology 19: 1326-1334).

[Normal probability plot: Observation versus Normal score.]

The variable under study was the amount of cadmium in North Atlantic scallops. Do the sample data suggest that the cadmium concentration distribution is not normal? Explain.

7.83 Consider the following 10 observations on the lifetime (in hours) for a certain type of component: 152.7 172.0 172.5 173.3 193.0 204.7 216.5 234.9
88. have scores over 125, and the other half have scores below 75.

[Figure 7.34: $P(75 < x < 125)$ and corresponding z curve area for the IQ problem of Example 7.26. Shaded area = .9050, between z = -1.67 and z = 1.67.]

When we translate from a problem involving a normal distribution with mean $\mu$ and standard deviation $\sigma$ to a problem involving the standard normal distribution, we convert to z scores: $z = \frac{x - \mu}{\sigma}$. Because a z score can be interpreted as giving the distance of an x value from the mean in units of the standard deviation, a z score of 1.4 corresponds to an x value that is 1.4 standard deviations above the mean, and a z score of -2.1 corresponds to an x value that is 2.1 standard deviations below the mean. Suppose that we are trying to evaluate $P(x < 60)$ for a variable whose distribution is normal with $\mu = 50$ and $\sigma = 5$. Converting the endpoint 60 to a z score gives $z = \frac{60 - 50}{5} = 2$, which tells us that the value 60 is 2 standard deviations above the mean. We then have $P(x < 60) = P(z < 2)$, where z is a standard normal variable. Notice that for the standard normal distribution, the value 2 is 2 standard deviations above the mean, because the mean is 0 and the standard deviation is 1. The value z = 2 is located the same distance (measured in standard deviations) from the mean of the standard normal distribution as is the value x = 60 from the mean in the normal
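The equivalence between the original problem and its standardized version can be checked numerically. The short sketch below uses the values from the passage above ($\mu = 50$, $\sigma = 5$, endpoint 60); the helper name `z_score` is an assumption made for illustration.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Distance of x from the mean, in units of the standard deviation."""
    return (x - mu) / sigma

z = z_score(60, mu=50, sigma=5)

# The same probability computed two ways.
p_from_x = NormalDist(mu=50, sigma=5).cdf(60)   # P(x < 60) on the original scale
p_from_z = NormalDist().cdf(z)                  # P(z < 2) on the standard normal scale

print(z, round(p_from_x, 4), round(p_from_z, 4))
```

The two probabilities agree, which is exactly why a single table for the standard normal distribution is enough for any normal distribution.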
89. he closest cumulative area in the table is .0202, in the -2.0 row and .05 column; we will use $z^* = -2.05$, the best approximation from the table. Variable values less than -2.05 make up the smallest 2% of the standard normal distribution. Now suppose that we had been interested in the largest 5% of all z values. We would then be trying to find a value of $z^*$ for which $P(z > z^*) = .05$, as illustrated in Figure 7.26. Because Appendix Table 2 always works with cumulative area (area to the left), the first step is to determine (area to the left of $z^*$) $= 1 - .05 = .95$. Looking for the cumulative area closest to .95 in Appendix Table 2, we find that .95 falls exactly halfway between .9495 (corresponding to a z value of 1.64) and .9505 (corresponding to a z value of 1.65). Because .9500 is exactly halfway between the two areas, we use a z value that is halfway between 1.64 and 1.65. (If one value had been closer to .9500 than the other, we would just use the z value corresponding to the closest area.) This gives $z^* = \frac{1.64 + 1.65}{2} = 1.645$. Values greater than 1.645 make up the largest 5% of the standard normal distribution. By symmetry, -1.645 separates the smallest 5% of all z values from the others.

[Figure 7.26: The largest 5% of the standard normal distribution.]

Example 7.24 More Extremes. Sometimes we are interested in identifying the most extreme (unusually large
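Working backward from an area to a z value, as done above with Appendix Table 2, corresponds to an inverse cumulative distribution function. A minimal sketch in Python, using `NormalDist.inv_cdf` from the standard library in place of the table (the variable names are illustrative only):

```python
from statistics import NormalDist

z_curve = NormalDist()  # standard normal: mean 0, standard deviation 1

# Boundary for the smallest 2%: cumulative area .02 to the left.
smallest_2_percent = z_curve.inv_cdf(0.02)

# Boundary for the largest 5%: cumulative area 1 - .05 = .95 to the left.
largest_5_percent = z_curve.inv_cdf(1 - 0.05)

print(round(smallest_2_percent, 3), round(largest_5_percent, 3))
```

The results agree with the table-based values of -2.05 and 1.645 to within the precision that a two-decimal z table allows.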
90. hibit considerably more variability than a long sequence of values from the other distribution.

Example 7.10 Defective Components. A television manufacturer receives certain components in lots of four from two different suppliers. Let x and y denote the number of defective components in randomly selected lots from the first and second suppliers, respectively. The probability distributions for x and y are as follows:

x      0    1    2    3    4        y      0    1    2    3    4
p(x)   .4   .3   .2   .1   0        p(y)   .2   .6   .2   0    0

Probability histograms for x and y are given in Figure 7.12. It is easy to verify that the mean values of both x and y are 1, so for either supplier the long-run average number of defective components per lot is 1. However, the two probability histograms show that the probability distribution for the second supplier is concentrated closer to the mean value than is the first supplier's distribution. The greater spread of the first distribution implies that there will be more variability in a long sequence of observed x values than in an observed sequence of y values. For example, the y sequence will contain no 3's, whereas in the long run, 10% of the observed x values will be 3. As with $s^2$ and $s$, the variance and standard deviation of x involve squared deviations from the mean. A value far from the mean results in a large squared deviation. However, such a value contributes substant
91. hip. Suppose that we are interested in the proportion of the population with IQ scores below 80, that is, $P(x < 80)$. With $b = 80$, $b^* = \frac{80 - 100}{15} = -1.33$. So $P(x < 80) = P(z < -1.33)$ = z curve area to the left of -1.33 = .0918, as shown in Figure 7.33. This probability (.0918) tells us that just a little over 9% of the population has an IQ score below 80.

[Figure 7.31: Normal distribution and desired proportion for Example 7.26.]
[Figure 7.32: $P(x > 130)$ and corresponding z curve area for the IQ problem of Example 7.26. Normal curve for $\mu = 100$, $\sigma = 15$; shaded area = .0228.]
[Figure 7.33: $P(x < 80)$ and corresponding z curve area for the IQ problem of Example 7.26. Shaded area = .0918.]

Now consider the proportion of the population with IQs between 75 and 125. Using $a = 75$ and $b = 125$, we obtain $a^* = \frac{75 - 100}{15} = -1.67$ and $b^* = \frac{125 - 100}{15} = 1.67$, so $P(75 < x < 125) = P(-1.67 < z < 1.67)$ = (z curve area to the left of 1.67) - (z curve area to the left of -1.67) = .9525 - .0475 = .9050. This is illustrated in Figure 7.34. The calculation tells us that 90.5% of the population has an IQ score between 75 and 125. Of the 9.5% whose IQ score is not between 75 and 125, half of them (4.75%)
92. hough the mean and standard deviation are computed differently for discrete and continuous random variables, the interpretation is the same in both cases.

Mean Value of a Discrete Random Variable

Consider an experiment consisting of the random selection of an automobile licensed in a particular state. Let the discrete random variable x be the number of low-beam headlights on the selected car that need adjustment. Possible x values are 0, 1, and 2, and the probability distribution of x might be as follows:

x value       0    1    2
Probability   .5   .3   .2

The corresponding probability histogram appears in Figure 7.11. In a sample of 100 cars, the sample relative frequencies might differ somewhat from the given probabilities (which are the limiting relative frequencies). We might see

x value     0    1    2
Frequency   46   33   21

The sample average value of x for these 100 observations is then the sum of 46 zeros, 33 ones, and 21 twos, all divided by 100:

$\bar{x} = \frac{46(0) + 33(1) + 21(2)}{100} = \left(\frac{46}{100}\right)(0) + \left(\frac{33}{100}\right)(1) + \left(\frac{21}{100}\right)(2) = (\text{rel. freq. of } 0)(0) + (\text{rel. freq. of } 1)(1) + (\text{rel. freq. of } 2)(2) = .75$

[Figure 7.11: Probability histogram for the distribution of the number of headlights needing adjustment.]

As the sample size increases, each relative frequency approaches the corresponding probability. In a very long sequence of experiments, the value of $\bar{x}$ approaches (probability that x = 0)(0) + (probability that x
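The arithmetic in the headlight example can be laid out in code to make the parallel explicit: the sample average weights each x value by its relative frequency, and the long-run value weights each x value by its probability. A small sketch, with variable names chosen here for illustration:

```python
values = [0, 1, 2]
probabilities = [0.5, 0.3, 0.2]   # probability distribution of x from the text
frequencies = [46, 33, 21]        # observed counts in the sample of 100 cars

n = sum(frequencies)

# Sample average: each value weighted by its relative frequency.
sample_mean = sum(x * (f / n) for x, f in zip(values, frequencies))

# Long-run value: each value weighted by its probability.
long_run_mean = sum(x * p for x, p in zip(values, probabilities))

print(round(sample_mean, 2), round(long_run_mean, 2))
```

The sample average is close to, but not equal to, the probability-weighted value, which reflects the difference between relative frequencies in one sample of 100 cars and the limiting relative frequencies.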
93. ially to variability in x only if the probability associated with that value is not too small. For example, if $\mu = 1$ and $x = 25$ is a possible value, then the squared deviation is $(25 - 1)^2 = 576$. If, however, $P(x = 25) = .000001$, the value 25 will hardly ever be observed, so it won't contribute much to variability in a long sequence of observations. This is why each squared deviation is multiplied by the probability associated with the value to obtain a measure of variability. When the probability distribution describes how x values are distributed among members of a population (so that the probabilities are population relative frequencies), $\sigma_x^2$ and $\sigma_x$ are the population variance and standard deviation of x, respectively.

Example 7.11 Defective Components Revised. For x = number of defective components in a lot from the first supplier in Example 7.10,

$\sigma_x^2 = (0 - 1)^2 p(0) + (1 - 1)^2 p(1) + (2 - 1)^2 p(2) + (3 - 1)^2 p(3) = (1)(.4) + (0)(.3) + (1)(.2) + (4)(.1) = 1.0$

Therefore, $\sigma_x = 1.0$. For y = the number of defectives in a lot from the second supplier,

$\sigma_y^2 = (0 - 1)^2(.2) + (1 - 1)^2(.6) + (2 - 1)^2(.2) = .4$

Then $\sigma_y = \sqrt{.4} = .632$. The fact that $\sigma_x > \sigma_y$ confirms the impression conveyed by Figure 7.12 concerning the variability of x and y.

Example 7.12 More on Apgar Scores. Reconsider the distribution of Apgar scores for children born at a certain hospital, introduced in Example
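The variance calculations for the two suppliers can be sketched in Python as a check on the arithmetic. The probabilities below are those of the defective components example (.4, .3, .2, .1, 0 for the first supplier and .2, .6, .2, 0, 0 for the second); the function names are hypothetical helpers written for this illustration.

```python
from math import sqrt

def mean(dist):
    """Mean value: each x weighted by its probability."""
    return sum(x * p for x, p in dist.items())

def variance(dist):
    """Variance: each squared deviation from the mean weighted by its probability."""
    mu = mean(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())

# Probability distributions for the number of defectives in a lot of four.
first_supplier = {0: 0.4, 1: 0.3, 2: 0.2, 3: 0.1, 4: 0.0}
second_supplier = {0: 0.2, 1: 0.6, 2: 0.2, 3: 0.0, 4: 0.0}

for name, dist in [("x", first_supplier), ("y", second_supplier)]:
    var = variance(dist)
    print(name, round(mean(dist), 3), round(var, 3), round(sqrt(var), 3))
```

Both distributions have mean 1, so the difference in standard deviations isolates the difference in spread that the probability histograms suggest.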
94. iation of the number that pass inspection. d. What is the probability that among 25 randomly selected cars, the number that pass is within 1 standard deviation of the mean value? 7.56 You are to take a multiple-choice exam consisting of 100 questions with 5 possible responses to each question. Suppose that you have not studied and so must guess (select one of the five answers in a completely random fashion) on each question. Let x represent the number of correct responses on the test. a. What kind of probability distribution does x have? b. What is your expected score on the exam? (Hint: Your expected score is the mean value of the x distribution.) c. Compute the variance and standard deviation of x. d. Based on your answers to Parts (b) and (c), is it likely that you would score over 50 on this exam? Explain the reasoning behind your answer. 7.57 Suppose that 20% of the 10,000 signatures on a certain recall petition are invalid. Would the number of invalid signatures in a sample of size 1000 have (approximately) a binomial distribution? Explain. 7.58 A coin is spun 25 times. Let x be the number of spins that result in heads (H). Consider the following rule for deciding whether or not the coin is fair: Judge the coin fair if $8 \le x \le 17$. Judge the coin biased if either $x \le 7$ or $x \ge 18$. a. What is the probability of judging the coin biased
95. ic histogram s 7 88 The following data are a sample of survival times days from diagnosis for patients suffering from chronic leukemia of a certain type Statistical Methodology for Survival Time Studies Bethesda MD National Cancer Institute 1986 7 47 58 74 177 232 273 285 317 429 440 445 455 468 495 497 532 571 579 581 650 702 715 719 881 900 930 968 1077 1109 1314 1334 1367 1534 1712 1784 1877 1886 2045 2056 2260 2429 2509 a Construct a relative frequency distribution for this data set and draw the corresponding histogram b Would you describe this histogram as having a positive or a negative skew c Would you recommend transforming the data Explain 7 89 In a study of warp breakage during the weaving of fabric Technometrics 1982 63 100 pieces of yarn were tested The number of cycles of strain to breakage Bold exercises answered in back Data set available online but not required was recorded for each yarn sample The resulting data are given in the following table 86 146 251 653 98 249 400 292 131 176 76 264 15 364 195 262 88 264 42 321 180 198 38 20 61 121 282 180 325 250 196 90 229 166 38 337 341 40 40 135 597 246 211 180 93 571 124 279 81 186 497 182 423 185 338 290 398 71 246 185 188 568 55 244 20 284 93 396 203 829 239 236 277 143 198 264 105 203 124 137 135 169 157 224 65 315 229 55 286 350 193 175 220 149 151 353 400 61 194 188 a Construct a frequency distribution using the class in
96. ion is usually chosen to yield a distribution of transformed values that is more symmetric and more closely approximated by a normal curve than was the original distribution.

Example 7.31 Rainfall Data. Data that have been used by several investigators to introduce the concept of transformation (e.g., "Exploratory Methods for Choosing Power Transformations," Journal of the American Statistical Association [1982]: 103-108) consist of values of March precipitation for Minneapolis-St. Paul over a period of 30 years. These values are given in Table 7.3, along with the square root of each value. Histograms of both the original and the transformed data appear in Figure 7.38. The distribution of the original data is clearly skewed, with a long upper tail. The square-root transformation results in a substantially more symmetric distribution, with a typical (i.e., central) value near the 1.25 boundary between the third and fourth class intervals.

Table 7.3 Original and Square-Root-Transformed Values of March Precipitation in Minneapolis-St. Paul over a 30-Year Period

Year   Precipitation   sqrt(Precipitation)      Year   Precipitation   sqrt(Precipitation)
1      .77             .88                      16     1.62            1.27
2      1.74            1.32                     17     1.31            1.14
3      .81             .90                      18     .32             .57
4      1.20            1.10                     19     .59             .77
5      1.95            1.40                     20     .81             .90
6      1.20            1.10                     21     2.81            1.68
7      .47             .69                      22     1.87            1.37
8      1.43            1.20                     23     1.18            1.09
9      3.37            1.84                     24     1.35            1.16
10     2.20            1.48                     25     4.75            2.18
11     3.00            1.73                     26     2.48            1.57
12     3.09            1.76                     27     9
97. istribution, we will illustrate these calculations using an example. Suppose that 60% of all computer monitors have a flat panel display and 40% have a CRT display. Suppose further that the next 12 purchases are monitored and the random variable x is defined as the number of flat panel monitors in the next 12 purchases. As an example, the probability that exactly four monitors would be flat panel displays is p(4) = C(12, 4)(.6)^4(.4)^8 ≈ .042. Even if your calculator does not have special binomial functions, it is likely to have a key for combinations, nCr (possibly cleverly hidden in the math or probability menu). The calculator keystrokes might look like this (don't forget that to perform the nCr calculation above, you will have to press n, then the nCr key, and then r): 12 nCr 4 × .6^4 × .4^8. If your calculator has built-in binomial capabilities, you will have fewer keystrokes. Let's consider these problems one at a time, starting with the function names. If your calculator has a built-in function for binomial calculations, it probably has two: a function for finding the probability that x is equal to a given value and a function for finding the probability that x is less than or equal to a given value. The first function is known to statisticians as a density function and is commonly abbreviated pdf, for probability density function. The second is known as a cumulative distribution function and is c
98. l distribution For some continuous distributions cumulative areas can be calculated using meth ods from the branch of mathematics called integral calculus However because we are not assuming knowledge of calculus we will rely on tables that have been constructed for the commonly encountered continuous probability distributions Figure 7 9 Calculation P 25 lt x lt 75 P x lt 75 of P a lt x lt b using cu mulative areas P x lt 25 7 3 a Probability Distributions for Continuous Random Variables 371 FS Exercises 7 20 7 26 aaa 7 20 Let x denote the lifetime in thousands of hours of a certain type of fan used in diesel engines The density curve of x is as pictured OD T T 0 25 50 Shade the area under the curve corresponding to each of the following probabilities draw a new curve for each part a P 10 lt x lt 25 P 10 x 25 P x lt 30 The probability that the lifetime is at least 25 000 hr The probability that the lifetime exceeds 25 000 hr oan eo 7 21 A particular professor never dismisses class early Let x denote the amount of time past the hour minutes that elapses before the professor dismisses class Suppose that x has a uniform distribution on the interval from 0 to 10 min The density curve is shown in the following figure Density Time 0 10 minutes a What is the probability that at most 5 min elapse be fore dismissal b What is the probability that betwe
99. l will give us a better understanding of the issues involved when we encounter similar tradeoffs in statistics courses yet to come. At least you'll be more tolerant of those rules of thumb! We shall reacquaint ourselves with some syntax and warm up with a distribution of the number of express mail packages mailed at a certain post office in a day. The number is approximately normally distributed with μ = 18 and σ = 6. Suppose we wish to find the probability that 20 express mail packages are mailed in a given day. We calculate the probability that, in a normal distribution with μ = 18 and σ = 6, the event x = 20 would happen. Because x = 20 corresponds to the interval from 19.5 to 20.5 on the continuous scale, and remembering the syntax from our earlier discussion, normalcdf(lower bound, upper bound[, μ, σ]), we enter normalcdf(19.5, 20.5, 18, 6), and our calculator returns 0.062832569. We will now compare binomial calculations with the normal approximations. It is reported that 10% of live births in the United States are premature. Suppose we randomly select 250 live births and define the random variable x to be the number of these that are premature. We wish to calculate the probability that x is between 15 and 30, inclusive. To find the binomial probability, we recall that we must use the built-in function we called binomcdf. This function includes the rightmost value indicated; therefore we subtract the probability of getting x less than or equal to 14 from the probability of getting x less than or equal to 30: P(1
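The two calculations in this passage can be mirrored in Python. This is a sketch, not the calculator's own implementation: the function names `normalcdf` and `binomcdf` simply copy the calculator syntax, and the use of 14.5 and 30.5 as continuity-corrected endpoints for the normal approximation is our assumption.

```python
from math import comb
from statistics import NormalDist

def normalcdf(lower, upper, mu=0.0, sigma=1.0):
    """Area under the normal curve between lower and upper."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(upper) - dist.cdf(lower)

def binomcdf(n, p, x):
    """P(X <= x) for a binomial variable with n trials and success probability p."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))

# Express mail example: the event x = 20 treated as the interval 19.5 to 20.5.
print(normalcdf(19.5, 20.5, 18, 6))

# Premature births: exact binomial P(15 <= x <= 30) with n = 250, p = 0.10.
exact = binomcdf(250, 0.10, 30) - binomcdf(250, 0.10, 14)

# Normal approximation with the binomial mean and standard deviation
# (continuity-corrected endpoints are an assumption of this sketch).
mu = 250 * 0.10
sigma = (250 * 0.10 * 0.90) ** 0.5
approx = normalcdf(14.5, 30.5, mu, sigma)
print(exact, approx)
```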
100. lculator. 2. binompdf(20, .2) → List2. (Remember to verify!) 3. Now graph the probability distribution, where List1 contains the possible data values and List2 contains the probabilities. To check your work, partial calculator screen output for this problem is given in Figure 7.46(a), and the graph of the distribution is shown in Figure 7.46(b). [Figure 7.46: (a) partial calculator screen output; (b) graph of the distribution.] Exploration 7.3 Geometric Probability Calculations Our calculator exploration of geometric random variables will be an echo of the binomial random variables we have already discussed in Exploration 7.2. We again consider (1) the probability that the variable will assume a value between two given numbers, (2) the probability that the variable will assume a value less than a given number, and (3) the probability that the random variable will assume a value greater than a given number. Our example here will be about jumper cables. Suppose that 40% of students who drive to campus carry jumper cables. If your car has a dead battery and you aren't one of the forward-thinking 40%, how many students will you have to ask before you find one with jumper cables? Consider the first problem, the probability of a particular number. The probability that the first student stopped has jumper cables is p(1) = (1 − π)^(1−1) π = (1 − .4)^(1−1)(.4) = .4. The corresponding keystrokes for finding this probability will be something like (1 − 0.4)^(1 − 1) × 0.4. Now le
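The geometric calculation for the jumper cable example can be sketched in Python as follows. The function names `geometpdf` and `geometcdf` are modeled on common calculator naming and are an assumption here, not something the text specifies.

```python
def geometpdf(p, x):
    """P(the first success occurs on trial x) when each trial succeeds with probability p."""
    return (1 - p) ** (x - 1) * p

def geometcdf(p, x):
    """P(the first success occurs on or before trial x)."""
    return 1 - (1 - p) ** x

p = 0.4  # proportion of students who carry jumper cables

print(geometpdf(p, 1))  # the first student asked has cables
print(geometpdf(p, 3))  # the third student asked is the first with cables
print(geometcdf(p, 3))  # cables are found within the first three students
```

The cumulative function is just the sum of the individual probabilities up to x, which gives a quick way to check one function against the other.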
101. ld all the bulbs be replaced so that no more than 20 of the bulbs will have already burned out 7 123 Suppose that 16 of all drivers in a certain city are uninsured Consider a random sample of 200 drivers a What is the mean value of the number who are unin sured and what is the standard deviation of the number who are uninsured b What is the approximate probability that between 25 and 40 inclusive drivers in the sample were uninsured c If you learned that more than 50 among the 200 drivers were uninsured would you doubt the 16 figure Explain 7 124 Let x denote the duration of a randomly selected pregnancy the time elapsed between conception and birth Accepted values for the mean value and standard deviation of x are 266 days and 16 days respectively Suppose that the probability distribution of x is approximately normal a What is the probability that the duration of pregnancy is between 250 and 300 days Y Video solution available 434 Chapter 7 a Random Variables and Probability Distributions b What is the probability that the duration of pregnancy is at most 240 days c What is the probability that the duration of pregnancy is within 16 days of the mean duration d A Dear Abby column dated January 20 1973 con tained a letter from a woman who stated that the duration of her pregnancy was exactly 310 days She wrote that the last visit with her husband who was in the navy oc curred 310 days b
102. mallest observation and so on Extensive tabulations of normal scores for many different sample sizes are avail able Alternatively many software packages such as MINITAB and SAS and some graphing calculators can compute these scores on request and then construct a normal probability plot Not all calculators and software packages use the same algorithm to compute normal scores However this does not change the overall character of a nor mal probability plot so either the tabulated values or those given by the computer or calculator can be used After the sample observations are ordered from smallest to largest the smallest normal score is paired with the smallest observation the second smallest normal score with the second smallest observation and so on The first number in a pair is the nor mal score and the second number in the pair is the observed data value A normal probability plot is just a scatterplot of the normal score observed value pairs If the sample has been selected from a standard normal distribution the second number in each pair should be reasonably close to the first number ordered observa tion corresponding mean value Then the n plotted points will fall near a line with slope equal to 1 a 45 line passing through 0 0 When the sample has been obtained from some normal population distribution but not necessarily the standard normal distribution the plotted points should be close to some straight line
103. many different normal distributions and they are distinguished from one another by their mean yp and standard deviation The mean u of Figure 7 17 A normal distribution a normal distribution describes where the corresponding curve is centered and the standard deviation describes how much the curve spreads out around that center As with all continuous probability distributions the total area under any normal curve is equal to 1 Three normal distributions are shown in Figure 7 18 Notice that the smaller the standard deviation the taller and narrower the corresponding curve Recall that areas under a continuous probability distribution curve represent probabilities so when the standard deviation is small a larger area is concentrated near the center of the curve and the chance of observing a value near the mean is much greater because m is at the center The value of u is the number on the measurement axis lying directly below the top of the bell The value of o can also be ascertained from a picture of the curve Consider the normal curve in Figure 7 19 Starting at the top of the bell above u 100 and mov ing to the right the curve turns downward until it is above the value 110 After that point it continues to decrease in height but is turning upward rather than downward Similarly to the left of u 100 the curve turns downward until it reaches 90 and then begins to turn upward The curve changes from turning downward to tu
104. mpdf(12, .6, 4). Our calculator gives us 0.042042, which is the correct answer. This is a good sign! Now try this on your calculator. Remember, you must navigate to the function in the manner presented in your manual, and you have to pay attention to the syntax. While you are learning how to use this function, or any calculator function, it is a very good idea to use examples with known answers and check the results. Now suppose you wish to find the probability of getting 4 or fewer flat panel monitors out of 12. The appropriate function here is the cumulative distribution function, or cdf: binomcdf(numtrials, p[, x]). Does this look disturbingly familiar? Except for the c instead of the p, they look exactly alike. The good news is that we already understand the syntax; the bad news is that if we aren't careful, we might get the wrong function in haste. Be careful! The function binomcdf(12, .6, 4) gives the answer .0573099213. If we are not convinced of our prowess with binomcdf, we can use binompdf to check the result: binompdf(12, .6, 0) + binompdf(12, .6, 1) + ... + binompdf(12, .6, 4) = .000016777 + .0003019898 + ... + .042042 = .05731. Now let's move on to another of the common calculations with random variables: What is the probability that the random variable will assume a value between 4 and 7? One common source of confusion here is that the word between is disturbingly ambiguous
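The check described here, summing individual binomial probabilities to confirm the cumulative value, can be reproduced in Python. The function names below copy the calculator syntax for readability; they are our own definitions, not a library API.

```python
from math import comb

def binompdf(numtrials, p, x):
    """P(X = x) for a binomial variable."""
    return comb(numtrials, x) * p**x * (1 - p) ** (numtrials - x)

def binomcdf(numtrials, p, x):
    """P(X <= x), built by summing binompdf from 0 through x."""
    return sum(binompdf(numtrials, p, k) for k in range(x + 1))

# Exactly 4 flat panel monitors among 12 purchases.
print(round(binompdf(12, 0.6, 4), 6))

# 4 or fewer flat panel monitors among 12 purchases.
print(round(binomcdf(12, 0.6, 4), 5))
```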
105. ms of these observations and construct a histogram Is the log transformation successful in produc ing a more symmetric distribution 1 and V original value construct a histogram of the transformed data Does it appear to resemble a normal curve d Consider transformed value 7 92 The following figure appeared in the paper EDTA Extractable Copper Zinc and Manganese in Soils of the Canterbury Plains New Zealand Journal of Agricultural Research 1984 207 217 A large number of topsoil samples were analyzed for manganese Mn zinc Zn and copper Cu and the resulting data were summarized using histograms The investigators transformed each data set using logarithms in an effort to obtain more symmetric distributions of values Do you think the transformations were successful Explain Log transformed data 0 6 0 0 Y Video solution available 7 8 a Using the Normal Distribution to Approximate a Discrete Distribution 425 Untransformed data Log transformed data 2 30 Mn amp 3 w D oO g 3 zZ 15 30 45 60 1 0 2 0 ug g log o Ug g Bold exercises answered in back Data set available online but not required Y Video solution available Using the Normal Distribution to Approximate a Discrete Distribution The distribution of many random variables can be approximated by a carefully chosen normal distribution In this section we show how probabilities for some discrete ran dom variables can
106. n and from a person diagnosed as psychotic and asked to identify the psychotic s handwriting The graphologist made correct identifications in 6 of the 10 trials data taken from Statis tics in the Real World by R J Larsen and D F Stroup New York Macmillan 1976 Does this evidence indi Y Video solution available 396 Chapter 7 a Random Variables and Probability Distributions cate that the graphologist has an ability to distinguish the handwriting of psychotics Hint What is the probability of correctly guessing 6 or more times out of 10 Your an swer should depend on whether this probability is rela tively small or relatively large 7 54 Suppose that the probability is 1 that any given citrus tree will show measurable damage when the temperature falls to 30 F If the temperature does drop to 30 F what is the expected number of citrus trees showing damage in or chards of 2000 trees What is the standard deviation of the number of trees that show damage 7 55 Thirty percent of all automobiles undergoing an emissions inspection at a certain inspection station fail the inspection a Among 15 randomly selected cars what is the proba bility that at most 5 fail the inspection b Among 15 randomly selected cars what is the probabil ity that between 5 and 10 inclusive fail to pass inspection c Among 25 randomly selected cars what is the mean value of the number that pass inspection and what is the standard dev
107. nds. In the case of the normal approximation to the binomial, the goodness of fit depends on the two quantities that define the binomial distribution, n and π. Most statisticians have a simple rule of thumb they apply for approximating the binomial with a normal distribution, such as: When either nπ < 10 or n(1 − π) < 10, the binomial distribution is too skewed for the normal approximation to give accurate results. Different statisticians have different rules of thumb, some feeling comfortable with the accuracy provided by using 5 instead of 10 in the rule of thumb above. In days of yore (that is, the precalculator days), students would have to accept the rule of thumb as one of the mysteries of statistics. In more modern times, a statistics student armed with her calculator can not only understand what the rules of thumb are all about, but evaluate the various rules of thumb for a particular n and π pair. It might be argued that using the normal distribution to approximate a distribution that we can evaluate exactly seems a little foolish. There is something to this argument, but remember, we will not always be able to find exact probabilities in other situations in statistics and must rely on approximations. Using an approximation involves a fundamental tradeoff between ease of calculation and exactness of answer. An understanding of this with the normal approximation to the binomia
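The rule of thumb can be written as a small check. This Python sketch uses a hypothetical helper name, `normal_approx_ok`, with the cutoff exposed as a parameter so that the 10 and 5 versions of the rule can be compared for a given n and π pair.

```python
def normal_approx_ok(n, pi, threshold=10):
    """Return True when both n*pi and n*(1 - pi) reach the rule-of-thumb threshold."""
    return n * pi >= threshold and n * (1 - pi) >= threshold

# Compare the two versions of the rule for a few (n, pi) pairs.
for n, pi in [(20, 0.05), (20, 0.25), (20, 0.5), (250, 0.10)]:
    print(n, pi, normal_approx_ok(n, pi), normal_approx_ok(n, pi, threshold=5))
```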
108. normally distributed with mean 66 in 5 ft 6 in and standard deviation 2 in a Is the claim that 94 of all women are shorter than 5 ft 7 in correct b What proportion of adult women would be excluded from employment as a result of the height restriction 7 114 The longest run of S s in the sequence SSFSSSSFFS has length 4 corresponding to the S s on the fourth fifth sixth and seventh trials Consider a binomial experiment with n 4 and let y be the length number of trials in the longest run of S s Y Video solution available a When m 5 the 16 possible outcomes are equally likely Determine the probability distribution of y in this case first list all outcomes and the y value for each one Then calculate u b Repeat Part a for the case 7 6 c Let z denote the longest run of either S s or F s Deter mine the probability distribution of z when m 5 7 115 Two sisters Allison and Teri have agreed to meet between 1 and 6 P M on a particular day In fact Allison is equally likely to arrive at exactly 1 P M 2 P M 3 P M 4 P M 5 P M or 6 P M Teri is also equally likely to arrive at each of these six times and Allison s and Teri s arrival times are independent of one another Thus there are 36 equally likely Allison Teri arrival time pairs for ex ample 2 3 or 6 1 Suppose that the first person to ar rive waits until the second person arrives let w be the amount o
109. obability d Calculate P y lt 2 the probability that the carton contains fewer than two broken eggs Why is this smaller than the probability in Part c Y Video solution available 366 Chapter 7 a Random Variables and Probability Distributions e What is the probability that the carton contains exactly 10 unbroken eggs f What is the probability that at least 10 eggs are un broken 7 10 A restaurant has four bottles of a certain wine in stock Unbeknownst to the wine steward two of these bot tles Bottles 1 and 2 are bad Suppose that two bottles are ordered and let x be the number of good bottles among these two a One possible experimental outcome is 1 2 Bottles 1 and 2 are the ones selected and another is 2 4 List all possible outcomes b Assuming that the two bottles are randomly selected from among the four what is the probability of each out come in Part a c The value of x for the 1 2 outcome is 0 neither se lected bottle is good and x 1 for the outcome 2 4 Determine the x value for each possible outcome Then use the probabilities in Part b to determine the probabil ity distribution of x 7 11 Airlines sometimes overbook flights Suppose that for a plane with 100 seats an airline takes 110 reservations Define the variable x as the number of people who actually show up for a sold out flight From past experience the probability distribution of x is given in the following table x
110. If one possible value of x is 2, we often write p(2) in place of P(x = 2). Similarly, p(5) denotes the probability that x = 5, and so on. Example 7.5 Hot Tub Models Suppose that each of four randomly selected customers purchasing a hot tub at a certain store chooses either an electric (E) or a gas (G) model. Assume that these customers make their choices independently of one another and that 40% of all customers select an electric model. This implies that for any particular one of the four customers, P(E) = .4 and P(G) = .6. One possible experimental outcome is EGGE, where the first and fourth customers select electric models and the other two choose gas models. Because the customers make their choices independently, the multiplication rule for independent events implies that P(EGGE) = P(1st chooses E and 2nd chooses G and 3rd chooses G and 4th chooses E) = P(E)P(G)P(G)P(E) = (.4)(.6)(.6)(.4) = .0576. Similarly, P(EGEG) = P(E)P(G)P(E)P(G) = (.4)(.6)(.4)(.6) = .0576 (identical to P(EGGE)) and P(GGGE) = (.6)(.6)(.6)(.4) = .0864. The number among the four customers who purchase an electric hot tub is a random variable. Let x = the number of electric hot tubs purchased by the four customers. Table 7.1 displays the 16 possible experimental outcomes, the probability of each outcome, and the value of the random variable x that is associated with each outcome. The probability distribution
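The enumeration behind Table 7.1 can be sketched in Python: list all 16 outcomes, multiply the individual probabilities (the choices are independent), and accumulate the results by the number of electric models. The variable names are ours.

```python
from itertools import product

choice_prob = {"E": 0.4, "G": 0.6}  # P(E) and P(G) for any one customer

# dist[x] accumulates P(x electric models among the four customers).
dist = {x: 0.0 for x in range(5)}

for outcome in product("EG", repeat=4):  # all 16 outcomes, e.g. ('E', 'G', 'G', 'E')
    prob = 1.0
    for choice in outcome:
        prob *= choice_prob[choice]  # multiplication rule for independent events
    dist[outcome.count("E")] += prob

for x, p in dist.items():
    print(x, round(p, 4))
```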
111. ommonly labeled cdf. These two functions on your calculator will, in all likelihood, mirror these abbreviations. The second problem you will face is that to find the binomial probabilities, the calculator will need more than just one number, and the order in which you enter the numbers does make a difference. Look in your calculator manual for something that looks like binomial, especially with a pdf somewhere. The function could be obvious, like binompdf, or it may be a little more cryptic, like binpdf. Your manual will be very careful to specify both what the needed function parameters are and the order in which you should enter them. As an example, one type of calculator has the following: binompdf(numtrials, p[, x]). The manual informs us that numtrials is the number of trials, p is the probability of success, and x can be either an integer or a list of integers. This information collectively explains what is known as the syntax of the function. It is your responsibility to get the numbers right and get them in the right order. The square brackets [ ] are a standard notation in the calculator world. They indicate that the bracketed quantity is either optional or defaults to a preselected option if you do not enter a number in that space. For our example, the number of trials is 12 and the probability of success is 0.6. Since the probability of exactly 4 flat panel monitors is desired, we enter bino
112. onal If you leave them out the normalpdf function will simply default to the stan dard normal curve with mean 0 and standard deviation or variance 1 Let s graph the three normal curves The first has a mean and standard deviation of 10 and 5 respectively The second has a mean and standard deviation of 40 and 2 5 and the third a mean and standard deviation of 70 and 10 Navigate your calculator s menu system to find the normal curve function and paste this function into the func tion definition window where you usually define simpler functions Using the syntax above you should see your calculator s equivalent of the following a u yl normalpdf x 10 5 y2 normalpdf x 40 2 5 y3 normalpdf x 70 10 a b Figure 7 49 a Window settings b normal curves 7 50 The standard normal distribution Normal C D Lower 0 Upper 0 o 0 u 0 Execute 7 51 Setup for normal calculations a Graphing Calculator Explorations 441 Graphing these functions using the window setting in Figure 7 49 a we see the graphs in Figure 7 49 b Now let s graph the standard normal distribution If your calculator syntax indi cates that it defaults to a standard normal you will only have to enter your calculator equivalent of yl normalpdf x It is also possible that your calculator does not default to standard normal in which case you would have to specify the mean and s
113. op ulation distribution If the plot is reasonably straight this assumption is reasonable When both nz 10 and n 1 m 10 binomial proba bilities are well approximated by corresponding areas un der a normal curve with u nm ando Vn7 1 r Chapter Review Exercises 7 102 7 124 CENGAGENOW Know exactly what to study Take a pre test and receive your Personalized Learning Plan 7 102 An article in the Los Angeles Times December 8 1991 reported that there are 40 000 travel agencies na tionwide of which 11 000 are members of the American Society of Travel Agents booking a tour through an ASTA member increases the likelihood of a refund in the event of cancellation a If x is the number of ASTA members among 5000 ran domly selected agencies could you use the methods of Section 7 8 to approximate P 1200 lt x lt 1400 Why or why not b Ina random sample of 100 agencies what are the mean value and standard deviation of the number of ASTA members c If the sample size in Part b is doubled does the stan dard deviation double Explain 7 103 A soft drink machine dispenses only regular Coke and Diet Coke Sixty percent of all purchases from this machine are diet drinks The machine currently has 10 cans of each type If 15 customers want to purchase drinks before the machine is restocked what is the probability that each of the 15 is able to purchase the type of drink desired Hint Let x denote the number
114. ore generally, for any variable whose distribution is described by a normal curve with mean μ and standard deviation σ. Suppose that we want to compute P(a < x < b), the probability that the variable x lies in a particular range. This probability corresponds to an area under a normal curve and above the interval from a to b, as shown in Figure 7.28(a). [Figure 7.28: Equality of nonstandard and standard normal curve areas: (a) P(a < x < b); (b) P(a* < z < b*).] Our strategy for obtaining this probability is to find an equivalent problem involving the standard normal distribution. Finding an equivalent problem means determining an interval (a*, b*) that has the same probability for z (same area under the z curve) as does the interval (a, b) in our original normal distribution (Figure 7.28(b)). The asterisk is used to distinguish a and b, the values from the original normal distribution with mean μ and standard deviation σ, from a* and b*, the values from the z curve. To find a* and b*, we simply calculate z scores for the endpoints of the interval for which a probability is desired. This process is called standardizing the endpoints. For example, suppose that the variable x has a normal distribution with mean μ = 100 and standard deviation σ = 5. To calculate P(98 < x < 107), we first translate this problem into an equivalen
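Standardizing the endpoints for this example can be sketched in Python with the standard library's `NormalDist`. The variable names `a_star` and `b_star` stand for a* and b*; the comparison against a direct calculation on the original scale is our addition, included only to show that the two areas agree.

```python
from statistics import NormalDist

mu, sigma = 100, 5
a, b = 98, 107

# z scores for the endpoints (standardizing).
a_star = (a - mu) / sigma
b_star = (b - mu) / sigma

# Area under the standard normal (z) curve between a* and b*.
z = NormalDist()  # mean 0, standard deviation 1
p_standard = z.cdf(b_star) - z.cdf(a_star)

# The same probability computed directly on the original scale.
x = NormalDist(mu, sigma)
p_direct = x.cdf(b) - x.cdf(a)

print(a_star, b_star)
print(round(p_standard, 4), round(p_direct, 4))
```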
115. ormal curve, roughly 95% of the values are within 2 standard deviations of the mean. The probability that the value of z exceeds 1.96 is P(z > 1.96) = 1 − P(z ≤ 1.96) = 1 − .9750 = .0250, as shown in the following figure. [Figure: z curve with shaded area = .0250 to the right of 1.96.] That is, 2.5% of the area under the z curve lies to the right of 1.96 in the upper tail. Similarly, P(z > −1.28) = area to the right of −1.28 = 1 − P(z ≤ −1.28) = 1 − .1003 = .8997 ≈ .90. Identifying Extreme Values Suppose that we want to describe the values included in the smallest 2% of a distribution or the values making up the most extreme 5% (which includes the largest 2.5% and the smallest 2.5%). Let's see how we can identify extreme values in the distribution by working through Examples 7.23 and 7.24. Example 7.23 Identifying Extreme Values Suppose that we want to describe the values that make up the smallest 2% of the standard normal distribution. Symbolically, we are trying to find a value (call it z*) such that P(z < z*) = .02. This is illustrated in Figure 7.25, which shows that the cumulative area for z* is .02. [Figure 7.25: The smallest 2% of the standard normal distribution.] Therefore, we look for a cumulative area of .0200 in the body of Appendix Table 2. T
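Finding the value that cuts off the smallest 2% is the reverse of a table lookup: a cumulative area goes in and a z value comes out. A minimal Python sketch, using the standard library's `NormalDist.inv_cdf` in place of searching the body of the table:

```python
from statistics import NormalDist

z = NormalDist()  # standard normal distribution

# Upper-tail area beyond 1.96, computed as 1 minus the cumulative area.
upper_tail = 1 - z.cdf(1.96)

# Value z* with cumulative area .02 (the boundary of the smallest 2%).
z_star = z.inv_cdf(0.02)

print(round(upper_tail, 4))
print(round(z_star, 2))
```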
116. ound Then possible x values are whole numbers such as 4 or 9 The probability distribution can be pictured as a probability histogram in which the area of each rectangle is the prob ability of the corresponding weight value The total area of all the rectangles is 1 and the probability that a weight to the nearest pound is between two values such as 6 and 8 is the sum of the corresponding rectangular areas Figure 7 5 a illustrates this Now suppose that weight is measured to the nearest tenth of a pound There are many more possible weight values than before such as 5 0 5 1 5 7 7 3 and 8 9 As shown in Figure 7 5 b the rectangles in the probability histogram are much narrower and this histogram has a much smoother appearance than the first one Again this his togram can be drawn so that the area of each rectangle equals the corresponding prob ability and the total area of all the rectangles is 1 Figure 7 5 c shows what happens as weight is measured to a greater and greater degree of accuracy The sequence of probability histograms approaches a smooth curve The curve cannot go below the horizontal measurement scale and the total area under the curve is 1 because this is true of every probability histogram The probability that 368 Chapter 7 s Random Variables and Probability Distributions Figure 7 6 Probabilities as areas under a probability density curve 4 5 6 7 8 9 10 5 6 7 8 9 4 5 6 7 8 9 10 a b Figure 7
117. pendix Table 2 gives (area under z curve to the left of z*) = P(z < z*) = P(z ≤ z*), where the letter z is used to represent a random variable whose distribution is the standard normal distribution. To find this probability, locate the following: 1. The row labeled with the sign of z* and the digit to either side of the decimal point (for example, −1.7 or 0.5). 2. The column identified with the second digit to the right of the decimal point in z* (for example, .06 if z* = −1.76). The number at the intersection of this row and column is the desired probability, P(z < z*). A portion of the table of standard normal curve areas appears in Figure 7.22. To find the area under the z curve to the left of 1.42, look in the row labeled 1.4 and the column labeled .02 (the highlighted row and column in Figure 7.22). From the table, the corresponding cumulative area is .9222. So z curve area to the left of 1.42 = .9222. We can also use the table to find the area to the right of a particular value. Because the total area under the z curve is 1, it follows that z curve area to the right of 1.42 = 1 − (z curve area to the left of 1.42) = 1 − .9222 = .0778. These probabilities can be interpreted to mean that in a long sequence of observations, roughly 92.22% of the observed z values will be smaller than 1.42 and 7.78% will be larger than 1.42. Figure 7.22 Portion of the table of
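The row-and-column lookup can be imitated by rebuilding one row of the table. This Python sketch computes the row labeled 1.4 across the columns .00 through .09; the layout of the printed output is ours and is not meant to match Appendix Table 2 exactly.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal distribution

row = 1.4  # row label: sign and the digits on either side of the decimal point

# Cumulative areas for z* = 1.40, 1.41, ..., 1.49 (columns .00 through .09).
areas = [round(z.cdf(row + col / 100), 4) for col in range(10)]

for col, area in enumerate(areas):
    print(f"column .{col:02d}: {area:.4f}")

# Area to the right of 1.42 follows from the total area being 1.
right_of_142 = round(1 - z.cdf(1.42), 4)
print(right_of_142)
```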
118. ph a discrete probability distribution but also how to find the mean and standard deviation of a discrete random variable. First, recall that we have encountered a similar problem before, when we considered the problem of graphing a relative frequency histogram of a frequency distribution. At that time frequencies were converted into relative frequencies for plotting; now these relative frequencies have morphed into probabilities. You begin by entering the possible values of the random variable in your calculator's equivalent of List1 and the corresponding probabilities of these values in List2. For our example we will use a numerical rating of newborn children called an Apgar score. The Apgar score has eleven possible values, 0, 1, ..., 10, based on factors such as muscle tone, skin color, etc. Suppose that the scores have the following probability distribution: x: 0 1 2 3 4 5 6 7 8 9 10; p(x): .002 .001 .002 .005 .02 .04 .17 .38 .25 .12 .01. In Figure 7.45(a) a portion of the calculator screen after data entry is shown. After you enter the data, you can graph the probability distribution by supplying the proper lists in the histogram command, as was done for the relative frequency histogram. The graph for the Apgar probability distribution is shown in Figure 7.45(b). The window is set so that the horizontal and vertical axes would show in the screen to give a more informative display. The horizontal axis runs from 5 to 12 and the vertical axis runs
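The mean and standard deviation of a discrete random variable follow directly from the two lists. In this Python sketch, `list1` and `list2` stand in for the calculator's List1 and List2; the probabilities are our reading of the Apgar table above (eleven values that sum to 1), so treat them as an assumption if your copy of the table differs.

```python
from math import sqrt

# Possible Apgar scores and their probabilities (assumed reading of the table).
list1 = list(range(11))
list2 = [0.002, 0.001, 0.002, 0.005, 0.02, 0.04, 0.17, 0.38, 0.25, 0.12, 0.01]

# Mean: sum of each value times its probability.
mean = sum(x * p for x, p in zip(list1, list2))

# Variance: sum of squared deviations from the mean, weighted by probability.
variance = sum((x - mean) ** 2 * p for x, p in zip(list1, list2))
sd = sqrt(variance)

print(round(mean, 2), round(sd, 2))
```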
119. r customers choose electric models is P x 2 2orx 30rx 4 P x p 2 p 3 p 4 5248 Thus in the long run 52 48 of the time a group of four hot tub purchasers will in clude at least two who select electric models A probability distribution table for a discrete variable shows the possible x values and also p x for each possible x value Because p x is a probability it must be a num ber between 0 and 1 and because the probability distribution lists all possible x val 364 Chapter 7 a Random Variables and Probability Distributions ues the sum of all the p x values must equal 1 These properties of discrete proba bility distributions are summarized in the following box Properties of Discrete Probability Distributions 1 For every possible x value 0 p x 1 De l ee p x A pictorial representation of a discrete probability distribution is called a probability histogram The picture has a rectangle centered above each pos sible value of x and the area of each rectangle is the probability of the corre sponding value Figure 7 3 displays the probability histogram for the proba bility distribution of Example 7 5 In Example 7 5 the probability distribution was derived by starting with a simple experimental situation and applying basic probability rules When a derivation from fundamental probabilities is not possible because of the com plexity of the experimental situation an investigato
120. r cut Figure 7 37 displays several plots that suggest a nonnormal population distribution a Using the Correlation Coefficient to Check Normality 0 cccee eee The correlation coefficient r was introduced in Chapter 5 as a quantitative measure of the extent to which the points in a scatterplot fall close to a straight line Consider the n normal score observed value pairs smallest normal score smallest observation largest normal score largest observation Then the correlation coefficient can be computed as discussed in Chapter 5 The nor mal probability plot always slopes upward because it is based on values ordered from If 7 7 a Checking for Normality and Normalizing Transformations 417 smallest to largest so r will be a positive number A value of r quite close to 1 indicates a very strong linear relationship in the normal probability plot If ris too much smaller than 1 normality of the underlying distribution is questionable How far below 1 does r have to be before we begin to seriously doubt the plausi bility of normality The answer depends on the sample size n If n is small an r value somewhat below 1 is not surprising even when the distribution is normal but if n is large only an r value very close to 1 supports the assumption of normality For se lected values of n Table 7 2 gives critical values to which r can be compared to check for normality If your sample size is in between two tabulated
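The correlation check can be sketched in Python. Two things here are assumptions rather than the text's method: the normal scores use a common approximation, the inverse normal cdf evaluated at (i − .375)/(n + .25), whereas tabulated or software scores may differ slightly; and the sample is a small hypothetical data set used only to make the sketch runnable.

```python
from statistics import NormalDist, mean

def normal_scores(n):
    """Approximate normal scores for a sample of size n (one common algorithm)."""
    z = NormalDist()
    return [z.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]

def correlation(xs, ys):
    """Pearson correlation coefficient r for paired values."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical sample, for illustration only.
sample = [12.1, 9.8, 11.4, 10.2, 13.5, 10.9, 11.0, 9.1, 12.6, 11.7]

# Pair the smallest normal score with the smallest observation, and so on.
observations = sorted(sample)
scores = normal_scores(len(observations))
r = correlation(scores, observations)
print(round(r, 3))
```

The resulting r would then be compared with the critical value for the sample size in Table 7.2.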
121. r often conjectures a probability distribution consistent with empirical evidence and prior knowledge. It must also be consistent with rules of probability. Specifically, 1. $p(x) \ge 0$ for every x value. 2. $\sum_{\text{all } x \text{ values}} p(x) = 1$. Figure 7.3 Probability histogram for the distribution of Example 7.5. Example 7.6 Automobile Defects. A consumer organization that evaluates new automobiles customarily reports the number of major defects on each car examined. Let x denote the number of major defects on a randomly selected car of a certain type. A large number of automobiles were evaluated, and a probability distribution consistent with these observations is
x:     0     1     2     3     4     5     6     7     8     9     10
p(x):  .041  .130  .209  .223  .178  .114  .061  .028  .011  .004  .001
The corresponding probability histogram appears in Figure 7.4. The probabilities in this distribution reflect the organization's experience. For example, p(3) = .223 indicates that 22.3% of new automobiles had 3 major defects. The probability that the number of major defects is between 2 and 5 inclusive is $P(2 \le x \le 5) = p(2) + p(3) + p(4) + p(5) = .724$. If car after car of this type were examined, in the long run 72.4% would have 2, 3, 4, or 5 major defects. Figure 7.4 Probability histogram for the distribution of the number of major defects on a randomly selected car. We have se
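The two properties and the interval probability in Example 7.6 can be checked mechanically. The sketch below is illustrative only: the names `defect_probs`, `is_valid_distribution`, and `prob_between` are hypothetical, and p(1) is taken to be .130 on the assumption that the tabled probabilities must sum to 1.

```python
# Probability distribution of x = number of major defects (Example 7.6).
# Assumption: p(1) = .130, the value that makes the probabilities sum to 1.
defect_probs = {
    0: 0.041, 1: 0.130, 2: 0.209, 3: 0.223, 4: 0.178, 5: 0.114,
    6: 0.061, 7: 0.028, 8: 0.011, 9: 0.004, 10: 0.001,
}


def is_valid_distribution(probs, tol=1e-9):
    """Check that every p(x) lies in [0, 1] and that the p(x) values sum to 1."""
    in_range = all(0.0 <= p <= 1.0 for p in probs.values())
    return in_range and abs(sum(probs.values()) - 1.0) <= tol


def prob_between(probs, low, high):
    """Return P(low <= x <= high), with both endpoints included."""
    return sum(p for x, p in probs.items() if low <= x <= high)


print(is_valid_distribution(defect_probs))
print(round(prob_between(defect_probs, 2, 5), 3))
```

Keeping the distribution in a dictionary keyed by x makes the "inclusive" condition explicit in the comparison, which is where discrete interval probabilities most often go wrong.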
122. raph the binomial and normal distributions for four distributions, each with sample size 20 but with probabilities of success of .05, .1, .25, and .5. We will change the windows to make the graphs fill the windows, but this should not affect any interpretations of the goodness of fit to the binomial by the normal distribution. As a reminder, our binomial preparations for the first graph are: 1. seq(x, x, 0, 20) → List1. 2. binompdf(20, .05) → List2. 3. Specify that we want a histogram with the values in List1 and the corresponding binomial probabilities in List2. For the normal curve plot, define the graphing function by supplying the mean and standard deviation of the binomial as parameters for the normalpdf function: Y1 = normalpdf(x, 1, 0.97468). The four plots appear in Figure 7.52. As can be seen from a comparison of the plots, the normal approximation gets closer and closer to the binomial as $\pi$ gets closer and closer to 0.5. For $\pi = .25$ the rule of thumb is satisfied for n = 20, and for $\pi = .5$ the rule of thumb is satisfied using n = 10. It is a bit difficult to judge whether or not the normal approximation to the binomial is adequate for a particular situation by just looking at the plots. Modern technology makes it possible to do binomial calculations quickly, so the normal approximation to the binomial is not as widely used as it once was. However, there are other distributions in statistics that are approximately norm
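Because judging the fit from plots alone is difficult, a numerical comparison may help. The following is a minimal sketch, not the calculator procedure itself: `binom_pmf`, `normal_pdf`, and `max_gap` are hypothetical helper names, and the "gap" measure (largest absolute difference between the binomial probability and the normal density at each whole number) is just one simple way to quantify closeness.

```python
import math


def binom_pmf(n, pi, x):
    """Binomial probability of exactly x successes in n trials."""
    return math.comb(n, x) * pi**x * (1 - pi) ** (n - x)


def normal_pdf(x, mu, sigma):
    """Height of the normal density curve at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def max_gap(n, pi):
    """Largest |binomial p(x) - normal density at x| over x = 0, 1, ..., n."""
    mu = n * pi
    sigma = math.sqrt(n * pi * (1 - pi))
    return max(abs(binom_pmf(n, pi, x) - normal_pdf(x, mu, sigma)) for x in range(n + 1))


# Same four success probabilities as the calculator exploration, with n = 20.
gaps = {pi: max_gap(20, pi) for pi in (0.05, 0.1, 0.25, 0.5)}
for pi, gap in gaps.items():
    print(pi, round(gap, 4))
```

For the first graph the normal curve uses mean 20(.05) = 1 and standard deviation sqrt(20(.05)(.95)), which matches the parameters passed to normalpdf in the text.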
123. remature Babies. Premature babies are those born more than 3 weeks early. Newsweek (May 16, 1988) reported that 10% of the live births in the United States are premature. Suppose that 250 live births are randomly selected and that the number x of preemies is determined. Because $n\pi = 250(.1) = 25 \ge 10$ and $n(1 - \pi) = 250(.9) = 225 \ge 10$, x has approximately a normal distribution with $\mu = 250(.1) = 25$ and $\sigma = \sqrt{250(.1)(.9)} = 4.743$. The probability that x is between 15 and 30 (inclusive) is $P(15 \le x \le 30) = P\left(\frac{14.5 - 25}{4.743} < z < \frac{30.5 - 25}{4.743}\right) = P(-2.21 < z < 1.16) = .8770 - .0136 = .8634$, as shown in the following figure. Exercises 7.93-7.101. 7.93 Let x denote the IQ for an individual selected at random from a certain population. The value of x must be a whole number. Suppose that the distribution of x can be approximated by a normal distribution with mean value 100 and standard deviation 15. Approximate the following probabilities: a. P x 100 b. P x 110 c. $P(x < 110)$ (Hint: x < 110 is the same as $x \le 109$.) d. P 75 x lt 125 7.94 Suppose that the distribution of the number of items x produced by an assembly line during an 8-hr shift can be approximated by a normal distribution with mean value 150 and standard deviation 10. a. What is the probability that the number of items produced is at most 120? b. What is
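The premature-births calculation can be reproduced, and compared with the exact binomial probability, in a few lines. This is a sketch under the example's stated values (n = 250, success probability .10); the variable names are hypothetical, and `statistics.NormalDist` from the Python standard library stands in for the z table.

```python
import math
from statistics import NormalDist

n, pi = 250, 0.10
mu = n * pi                           # 25
sigma = math.sqrt(n * pi * (1 - pi))  # about 4.743

# Normal approximation with the continuity correction: 15 <= x <= 30
# becomes the interval from 14.5 to 30.5 on the continuous scale.
normal = NormalDist(mu, sigma)
approx = normal.cdf(30.5) - normal.cdf(14.5)

# Exact binomial probability, summed term by term for comparison.
exact = sum(math.comb(n, x) * pi**x * (1 - pi) ** (n - x) for x in range(15, 31))

print(round(approx, 4), round(exact, 4))
```

The approximation differs slightly from the table-based .8634 only because the table rounds the z values to two decimal places.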
124. rning upward at a distance of 10 on either side of $\mu$, so $\sigma = 10$. In general, $\sigma$ is the distance to either side of $\mu$ at which a normal curve changes from turning downward to turning upward. Figure 7.18 Three normal distributions. Figure 7.19 Mean $\mu$ and standard deviation $\sigma$ for a normal curve. If a particular normal distribution is to be used to describe the behavior of a random variable, a mean and a standard deviation must be specified. For example, a normal distribution with mean 7 and standard deviation 1 might be used as a model for the distribution of x = birth weight. If this model is a reasonable description of the probability distribution, we could use areas under the normal curve with $\mu = 7$ and $\sigma = 1$ to approximate various probabilities related to birth weight. The probability that a birth weight is over 8 lb (expressed symbolically as $P(x > 8)$) corresponds to the shaded area in Figure 7.20(a). The shaded area in Figure 7.20(b) is the approximate probability $P(6.5 < x < 8)$ of a birth weight falling between 6.5 and 8 lb. Figure 7.20 Normal distribution for birth weight: (a) shaded area = $P(x > 8)$; (b) shaded area = $P(6.5 < x < 8)$. Unfor
125. robability distribution. The variance and standard deviation, respectively, of a discrete random variable; these are measures of the extent to which the variable's distribution spreads out about the mean $\mu$. This formula gives the probability of observing x successes (x = 0, 1, ..., n) among n trials of a binomial experiment. The mean and standard deviation of a binomial random variable. A continuous probability distribution that has a bell-shaped density curve. A particular normal distribution is determined by specifying values of $\mu$ and $\sigma$. This is the normal distribution with $\mu = 0$ and $\sigma = 1$. The density curve is called the z curve, and z is the letter commonly used to denote a variable having this distribution. Areas under the z curve to the left of various values are given in Appendix Table 2. A number on the z measurement scale that captures a specified tail area or central area. z is obtained by standardizing: subtracting the mean and then dividing by the standard deviation. When x has a normal distribution, z has a standard normal distribution. This fact implies that probabilities involving any normal random variable (any $\mu$ or $\sigma$) can be obtained from z curve areas. Term or Formula: Normal probability plot; Normal approximation to the binomial distribution. Comment (normal probability plot): A picture used to judge the plausibility of the assumption that a sample has been selected from a normal p
126. rranty. a. In a random sample of 400 purchases, what is the approximate probability that between 75 and 100 (inclusive) mufflers are replaced under warranty? b. Among 400 randomly selected purchases, what is the probability that at most 70 mufflers are ultimately replaced under warranty? c. If you were told that fewer than 50 among 400 randomly selected purchases were ever replaced under warranty, would you question the 20% figure? Explain. Background: The Salt Lake Tribune (October 11, 2002) printed the following account of an exchange between a restaurant manager and a health inspector: The recipe calls for four fresh eggs for each quiche. A Salt Lake County Health Department inspector paid a visit recently and pointed out that research by the Food and Drug Administration indicates that one in four eggs carries salmonella bacterium, so restaurants should never use more than three eggs when preparing quiche. The manager on duty wondered aloud if simply throwing out three eggs from each dozen and using the remaining nine in four-egg quiches would serve the same purpose. 1. Working in a group or as a class, discuss the folly of the above statement. 2. Suppose the following argument is made for three-egg quiches rather than four-egg quiches: Let x = number of eggs that carry salmonella. Then $p(0) = P(x = 0) = (0.75)^3 = .422$ for three-egg quiches and $p(0) = P(x = 0) = (0.75)^4 = .316$ for four-egg quich
127. rrent Anthropology (1992): 343-370, suggest that a reasonable model for the probability distribution of the continuous numerical variable x = height of a randomly selected 5-year-old child is a normal distribution with a mean of $\mu = 100$ cm and standard deviation $\sigma = 6$ cm. What proportion of the heights is between 94 and 112 cm? To answer this question, we must find $P(94 < x < 112)$. First, we translate the interval endpoints to equivalent endpoints for the standard normal distribution: $a^* = \frac{a - \mu}{\sigma} = \frac{94 - 100}{6} = -1.00$ and $b^* = \frac{b - \mu}{\sigma} = \frac{112 - 100}{6} = 2.00$. Then $P(94 < x < 112) = P(-1.00 < z < 2.00)$ = (z curve area to the left of 2.00) - (z curve area to the left of -1.00) = .9772 - .1587 = .8185. The probabilities for x and z are shown in Figure 7.29. If height were observed for many children from this population, about 82% of them would fall between 94 and 112 cm. What is the probability that a randomly chosen child will be taller than 110 cm? To evaluate $P(x > 110)$, we first compute $a^* = \frac{a - \mu}{\sigma} = \frac{110 - 100}{6} = 1.67$. Then (see Figure 7.30) $P(x > 110) = P(z > 1.67)$ = z curve area to the right of 1.67 = 1 - (z curve area to the left of 1.67) = 1 - .9525
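The standardize-then-look-up procedure for the height example can be sketched in code. The names `height`, `z_score`, `p_between`, and `p_taller` are hypothetical, and `statistics.NormalDist` is used here in place of Appendix Table 2, so the results agree with the table values only up to rounding of z to two decimals.

```python
from statistics import NormalDist

# Model from the example: heights in cm, mean 100 and standard deviation 6.
height = NormalDist(mu=100, sigma=6)


def z_score(x, dist):
    """Standardize x: subtract the mean, then divide by the standard deviation."""
    return (x - dist.mean) / dist.stdev


# P(94 < x < 112) as a difference of two cumulative areas.
p_between = height.cdf(112) - height.cdf(94)

# P(x > 110) as 1 minus the cumulative area to the left of 110.
p_taller = 1 - height.cdf(110)

print(z_score(94, height), z_score(112, height))
print(round(p_between, 4), round(p_taller, 4))
```

Working with `cdf` on the original scale and standardizing first give the same area; the z scores are printed only to show the correspondence with the table lookup.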
128. s and the corresponding value of x in the following table:
Outcome:  UUU  NUU  UNU  UUN  NNU  NUN  UNN  NNN
x value:  0    1    1    1    2    2    2    3
There are only four possible x values (0, 1, 2, and 3), and these are isolated points on the number line. Thus, x is a discrete random variable. In some situations, the random variable of interest is discrete, but the number of possible values is not finite. This is illustrated in Example 7.2. This Could Be a Long Game: Two friends agree to play a game that consists of a sequence of trials. The game continues until one player wins two trials in a row. One random variable of interest might be x = number of trials required to complete the game. Let A denote a win for Player 1 and B denote a win for Player 2. The simplest possible experimental outcomes are AA (the case in which Player 1 wins the first two trials and the game ends) and BB (the case in which Player 2 wins the first two trials). With either of these two outcomes, x = 2. There are also two outcomes for which x = 3: ABB and BAA. Some other possible outcomes and associated x values are
Outcomes                  x value
AA, BB                    2
BAA, ABB                  3
ABAA, BABB                4
ABABB, BABAA              5
...
ABABABABAA, BABABABABB    10
and so on. Any positive integer that is at least 2 is a possible value. Because the values 2, 3, 4, ... are isolated points on the number line (x is determined by counting), x is a discrete random vari
129. standard normal curve areas. Example 7.21 Finding Standard Normal Curve Areas. The probability $P(z < -1.76)$ is found at the intersection of the -1.7 row and the .06 column of the z table. The result is $P(z < -1.76) = .0392$, as shown in the following figure (z curve; shaded area = .0392). In other words, in a long sequence of observations, roughly 3.9% of the observed z values will be smaller than -1.76. Similarly, $P(z \le 0.58)$ = entry in 0.5 row and .08 column of Table 2 = .7190, as shown in the following figure (z curve; shaded area = .7190). Now consider $P(z < -4.12)$. This probability does not appear in Appendix Table 2; there is no -4.1 row. However, it must be less than $P(z < -3.89)$, the smallest z value in the table, because -4.12 is farther out in the lower tail of the z curve. Since $P(z < -3.89) = .0000$ (that is, zero to four decimal places), it follows that $P(z < -4.12) \approx 0$. Similarly, $P(z < 4.18) > P(z < 3.89) = 1.0000$, from which we conclude that $P(z < 4.18) \approx 1$. As illustrated in Example 7.21, we can use the cumulative areas tabulated in Appendix Table 2 to calculate other probabilities involving z. The probability that z is larger than a value c is $P(z > c)$ = area under the z curve to the right of c = $1 - P(z \le c)$. In other words, the area
130. t's consider problems 2 and 3. If your calculator has density and cumulative density functions for the geometric distribution, the functions are probably named something like geompdf and geomcdf, similar to the names for the binomial functions. Figure 7.47 (a) Geometric probability distribution; (b) histogram of geometric probability distribution. Figure 7.48 (a) Incorrect display; (b) correct display. The calculator syntax for the probability density function will probably look something like geompdf(p, x), where p is the probability of success and x is, in this example, the number of students you would ask until success. We want the probability of jumper cables on the very first stop. We enter geompdf(.4, 1) and the function returns .4. Now suppose you wish to find the probability of jumper cables after 4 or fewer stops. Using the cumulative density function, geomcdf, which has the same parameters as the geompdf function, we enter geomcdf(.4, 4), which returns 0.8704. As with the binomial, we can check this by summing: geompdf(.4, 1) + geompdf(.4, 2) + geompdf(.4, 3) + geompdf(.4, 4) = 0.4 + 0.24 + 0.144 + 0.0864 = 0.8704. The probability that a geometric random variable will assume a value between 4 and 7, inclusive, is equal to the probability of observing a value less th
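For readers without a calculator that offers these functions, the same quantities follow from the standard geometric formulas. The sketch below defines Python stand-ins named `geompdf` and `geomcdf` after the calculator functions described above; they are illustrations, not the calculator's own implementation, and `geom_between` is a hypothetical helper.

```python
def geompdf(p, x):
    """P(first success occurs on trial x), for x = 1, 2, 3, ..."""
    return (1 - p) ** (x - 1) * p


def geomcdf(p, x):
    """P(first success occurs on or before trial x)."""
    return 1 - (1 - p) ** x


def geom_between(p, low, high):
    """P(low <= x <= high): cumulative area through high minus area through low - 1."""
    return geomcdf(p, high) - geomcdf(p, low - 1)


print(geompdf(0.4, 1))
print(round(geomcdf(0.4, 4), 4))
print(round(sum(geompdf(0.4, x) for x in range(1, 5)), 4))
print(round(geom_between(0.4, 4, 7), 4))
```

The `low - 1` in `geom_between` is the discrete counterpart of the endpoint care needed whenever an interval is inclusive.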
131. t problem for the standard normal distribution. Recall from Chapter 4 that a z score, or standardized score, tells how many standard deviations away from the mean a value lies; the z score is calculated by first subtracting the mean and then dividing by the standard deviation. Converting the lower endpoint a = 98 to a z score gives $a^* = \frac{98 - 100}{5} = \frac{-2}{5} = -.40$, and converting the upper endpoint yields $b^* = \frac{107 - 100}{5} = \frac{7}{5} = 1.40$. Then $P(98 < x < 107) = P(-.40 < z < 1.40)$. The probability $P(-.40 < z < 1.40)$ can now be evaluated using Appendix Table 2. Finding Probabilities: To calculate probabilities for any normal distribution, standardize the relevant values and then use the table of z curve areas. More specifically, if x is a variable whose behavior is described by a normal distribution with mean $\mu$ and standard deviation $\sigma$, then $P(x < b) = P(z < b^*)$, $P(a < x) = P(a^* < z)$ [equivalently, $P(x > a) = P(z > a^*)$], and $P(a < x < b) = P(a^* < z < b^*)$, where z is a variable whose distribution is standard normal and $a^* = \frac{a - \mu}{\sigma}$, $b^* = \frac{b - \mu}{\sigma}$. Figure 7.29 $P(94 < x < 112)$ and corresponding z curve area for the height problem of Example 7.25. Example 7.25 Children's Heights. Data from the article "The Osteological Paradox: Problems in Inferring Prehistoric Health from Skeletal Samples," Cu
132. t $\mu$ and $\sigma$ play exactly the same role here as they did in the discrete case. The mean value $\mu$ locates the center of the continuous distribution and gives the approximate long-run average of many observed x values. The standard deviation $\sigma$ measures the extent that the continuous distribution (density curve) spreads out about $\mu$ and gives information about the amount of variability that can be expected in a long sequence of observed x values. Figure 7.14 Approximating a density curve by a probability histogram. Example 7.13 A Concrete Example. A company receives concrete of a certain type from two different suppliers. Define random variables x and y as follows: x = compressive strength of a randomly selected batch from Supplier 1; y = compressive strength of a randomly selected batch from Supplier 2. Figure 7.15 Density curves for Example 7.13. Suppose that $\mu_x = 4650$ lb/in.$^2$, $\sigma_x = 200$ lb/in.$^2$, $\mu_y = 4500$ lb/in.$^2$, and $\sigma_y = 275$ lb/in.$^2$. The long-run average strength per batch for many, many batches from Supplier 1 will be roughly 4650 lb/in.$^2$. This is 150 lb/in.$^2$ greater than the long-run average for batches from Supplier 2. In addition, a long sequence of batches from Supplier 1 will exhibit substantially less variability in compressive strength values than will a similar sequence from Supplier 2. The first supplier is preferred to the second both in terms of aver
133. tandard deviation as 0 and 1, something like Y1 = normalpdf(x, 0, 1). Set your graphing window with x values running from about -3.5 to 3.5 and the y values running up to 0.40. These values should be fine for the standard normal distribution. If you don't see a distribution filling the screen as in Figure 7.50, something is amiss, and you need to verify your keystrokes and check your calculator's manual. Since the normal probability distribution is a continuous distribution, the probability that x would be equal to a specific value is, of course, 0. For continuous distributions we are usually interested in finding (1) the area under the curve between two specific values and (2) the area in the extremes, or tails, of the distribution. The function that we will use to find these values will be symbolized with the notation normalcdf, which stands for the normal cumulative distribution function. This actually is a misnomer, because the functions calling themselves cdf functions on many calculators actually calculate the probability that the standard normal variable is between two values. Calculators seem to get it right for the discrete probability density functions but, for some reason, have elected to use similar names for very different kinds of calculation when they get to the continuous probability density functions; don't let this minor inconvenience confuse you. Two strategies are used by calculator manufactur
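The "between two values" behavior described above can be mimicked in a short function. This is a sketch only: the Python function below borrows the name `normalcdf` for illustration, its argument order (lower, upper, mean, standard deviation) is an assumption rather than any particular calculator's syntax, and it relies on `statistics.NormalDist` from the standard library.

```python
import math
from statistics import NormalDist


def normalcdf(lower, upper, mu=0.0, sigma=1.0):
    """Area under the normal curve between lower and upper.

    Infinite bounds are handled explicitly so that tail areas can be
    requested with -math.inf or math.inf.
    """
    dist = NormalDist(mu, sigma)
    left = 0.0 if lower == -math.inf else dist.cdf(lower)
    right = 1.0 if upper == math.inf else dist.cdf(upper)
    return right - left


print(round(normalcdf(-1, 1), 4))        # area between two values
print(round(normalcdf(2, math.inf), 4))  # upper tail area
print(normalcdf(-math.inf, 0))           # lower tail up to the mean
```

A true cumulative distribution function corresponds to the special case in which the lower bound is negative infinity.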
134. te the number of coin tosses. What are possible values of y? Is y discrete or continuous? 7.7 A box contains four slips of paper marked 1, 2, 3, and 4. Two slips are selected without replacement. List the possible values for each of the following random variables: a. x = sum of the two numbers b. y = difference between the first and second numbers c. z = number of slips selected that show an even number d. w = number of slips selected that show a 4. The probability distribution for a random variable is a model that describes the long-run behavior of the variable. For example, suppose that the Department of Animal Regulation in a particular county is interested in studying the variable x = number of licensed dogs or cats for a household. County regulations prohibit more than five dogs or cats per household. If we consider the chance experiment of randomly selecting a household in this county, then x is a discrete random variable because it associates a numerical value (0, 1, 2, 3, 4, or 5) with each of the possible outcomes (households) in the sample space. Although we know what the possible values for x are, it would also be useful to know how this variable behaves in repeated observation. What would be the most common value? What proportion of the time would x = 5 be observed? x = 3? A probability distribution provides this type of information about the long-run behavior of a random variable.
135. ted time? b. How much time should be allowed for the exam if we wanted 90% of the students taking the test to be able to finish in the allotted time? c. How much time is required for the fastest 25% of all students to complete the exam? 7.7 Checking for Normality and Normalizing Transformations. Some of the most frequently used statistical methods are valid only when a sample has come from a population distribution that is at least approximately normal. One way to see whether an assumption of population normality is plausible is to construct a normal probability plot of the data. One version of this plot uses quantities called normal scores. The values of the normal scores depend on the sample size n. For example, the normal scores when n = 10 are as follows: -1.539, -1.001, -.656, -.376, -.123, .123, .376, .656, 1.001, 1.539. To interpret these numbers, think of selecting sample after sample from a standard normal distribution, each one consisting of n = 10 observations. Then -1.539 is the long-run average of the smallest observation from each sample, -1.001 is the long-run average of the second smallest observation from each sample, and so on. In other words, -1.539 is the mean value of the smallest observation in a sample of size 10 from the z distribution, -1.001 is the mean value of the second s
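The "sample after sample" interpretation of normal scores can be tried directly by simulation. The sketch below is an illustration under stated assumptions: `simulated_normal_scores` is a hypothetical name, the number of repetitions and the seed are arbitrary choices, and the simulated averages only approximate the tabled normal scores.

```python
import random


def simulated_normal_scores(n=10, reps=20000, seed=1):
    """Average the ordered values of many standard normal samples of size n."""
    rng = random.Random(seed)
    totals = [0.0] * n
    for _ in range(reps):
        # Draw one sample of size n and sort it from smallest to largest.
        sample = sorted(rng.gauss(0.0, 1.0) for _ in range(n))
        for i, value in enumerate(sample):
            totals[i] += value
    # Long-run average of the smallest, second smallest, ..., largest value.
    return [total / reps for total in totals]


scores = simulated_normal_scores()
print([round(s, 2) for s in scores])
```

With more repetitions the averages settle closer to the listed values, which is what "long-run average" means in the passage above.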
136. tervals 0 to <100, 100 to <200, and so on. b. Draw the histogram corresponding to the frequency distribution in Part (a). How would you describe the shape of this histogram? c. Find a transformation for these data that results in a more symmetric histogram than what you obtained in Part (b). 7.90 The article "The Distribution of Buying Frequency Rates" (Journal of Marketing Research [1980]: 210-216) reported the results of a 35 year study of dentifrice purchases. The investigators conducted their research using a national sample of 2071 households and recorded the number of toothpaste purchases for each household participating in the study. The results are given in the following frequency distribution:
Number of Purchases    Number of Households (Frequency)
10 to <20              904
20 to <30              500
30 to <40              258
40 to <50              167
50 to <60              94
60 to <70              56
70 to <80              26
80 to <90              20
90 to <100             13
100 to <110            9
110 to <120            7
120 to <130            6
130 to <140            6
140 to <150            3
150 to <160            0
160 to <170            2
a. Draw a histogram for this frequency distribution. Would you describe the histogram as positively or negatively skewed? b. Does the square root transformation result in a histogram that is more symmetric than that of the original data? (Be careful! This one is a bit tricky, because
137. tering would be disconnected before completing their registration. A general formula for converting a z score back to an x value results from solving $z = \frac{x - \mu}{\sigma}$ for x, as shown in the accompanying box. To convert a z score z back to an x value, use $x = \mu + z\sigma$. Example 7.28 Motor Vehicle Emissions. Data from the article "Determining Statistical Characteristics of a Vehicle Emissions Audit Procedure" (Technometrics [1980]: 483-493) suggest that the emissions of nitrogen oxides, which are major constituents of smog, can be plausibly modeled using a normal distribution. Let x denote the amount of this pollutant emitted by a randomly selected vehicle. The distribution of x can be described by a normal distribution with $\mu = 1.6$ and $\sigma = 0.4$. Suppose that the EPA wants to offer some sort of incentive to get the worst polluters off the road. What emission levels constitute the worst 10% of the vehicles? The worst 10% would be the 10% with the highest emissions level, as shown in the illustration in the margin. For the standard normal distribution, the largest 10% are those with z values greater than z = 1.28 (from Appendix Table 2, based on a cumulative area of .90). Then $x = \mu + z\sigma = 1.6 + 1.28(.4) = 1.6 + .512 = 2.112$. In the population of vehicles of the type considered, about 10% would
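The back-conversion from a z value to an x value in the emissions example can be sketched as follows. The helper name `x_from_z` is hypothetical, and `NormalDist().inv_cdf` from the Python standard library is used here as a stand-in for reading the z value that captures a cumulative area of .90 from Appendix Table 2.

```python
from statistics import NormalDist


def x_from_z(z, mu, sigma):
    """Convert a z score back to the original scale: x = mu + z * sigma."""
    return mu + z * sigma


# z value with cumulative area .90 to its left (the table gives about 1.28).
z_90 = NormalDist().inv_cdf(0.90)

# Emission level separating the worst 10% of vehicles, using the text's model.
cutoff_table = x_from_z(1.28, mu=1.6, sigma=0.4)  # table-rounded z
cutoff_exact = x_from_z(z_90, mu=1.6, sigma=0.4)  # unrounded z

print(round(z_90, 4), round(cutoff_table, 3), round(cutoff_exact, 3))
```

The two cutoffs differ only in the third decimal place, which reflects the rounding of z to 1.28 in the table.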
138. the most extreme 0.1% of all birth weights? e. If x is a random variable with a normal distribution and a is a numerical constant ($a \ne 0$), then y = ax also has a normal distribution. Use this formula to determine the distribution of birth weight expressed in pounds (shape, mean, and standard deviation), and then recalculate the probability from Part (c). How does this compare to your previous answer? 7.74 A machine that cuts corks for wine bottles operates in such a way that the distribution of the diameter of the corks produced is well approximated by a normal distribution with mean 3 cm and standard deviation 0.1 cm. The specifications call for corks with diameters between 2.9 and 3.1 cm. A cork not meeting the specifications is considered defective. (A cork that is too small leaks and causes the wine to deteriorate; a cork that is too large doesn't fit in the bottle.) What proportion of corks produced by this machine are defective? 7.75 Refer to Exercise 7.74. Suppose that there are two machines available for cutting corks. The machine described in the preceding problem produces corks with diameters that are approximately normally distributed with mean 3 cm and standard deviation 0.1 cm. The second machine produces corks with diameters that are approximately normally distributed with mean 3.05 cm and standard deviation 0.01 cm. Which machin
139. the probability that at least 125 items are produced? c. What is the probability that between 135 and 160 (inclusive) items are produced? 7.95 The number of vehicles leaving a turnpike at a certain exit during a particular time period has approximately a normal distribution with mean value 500 and standard deviation 75. What is the probability that the number of cars exiting during this period is a. At least 650? b. Strictly between 400 and 550? (Strictly means that the values 400 and 550 are not included.) c. Between 400 and 550 (inclusive)? 7.96 Let x have a binomial distribution with n = 50 and $\pi = .6$, so that $\mu = n\pi = 30$ and $\sigma = \sqrt{n\pi(1 - \pi)} = 3.4641$. Calculate the following probabilities using the normal approximation with the continuity correction: a. P x 30 b. P x 25 c. $P(x \le 25)$ d. P 25 x 40 e. $P(25 < x < 40)$ (Hint: 25 < x < 40 is the same as $26 \le x \le 39$.) 7.97 Seventy percent of the bicycles sold by a certain store are mountain bikes. Among 100 randomly selected bike purchases, what is the approximate probability that a. At most 75 are mountain bikes? b. Between 60 and 75 (inclusive) are mountain bikes? c. More than 80 are mountain bikes? d. At most 30 are not mountain bikes? 7.98 Suppose that 25% of the fire alarms in a large city are false alarms. L
140. three forms are required? c. What is the probability that between two and four forms (inclusive) are required? d. Could p y y 50 for y = 1, 2, 3, 4, 5 be the probability distribution of y? Explain. 7.19 A library subscribes to two different weekly news magazines, each of which is supposed to arrive in Wednesday's mail. In actuality, each one could arrive on Wednesday (W), Thursday (T), Friday (F), or Saturday (S). Suppose that the two magazines arrive independently of one another and that for each magazine P(W) = .4, P(T) = .3, P(F) = .2, and P(S) = .1. Define a random variable y by y = the number of days beyond Wednesday that it takes for both magazines to arrive. For example, if the first magazine arrives on Friday and the second magazine arrives on Wednesday, then y = 2, whereas y = 1 if both magazines arrive on Thursday. Obtain the probability distribution of y. (Hint: Draw a tree diagram with two generations of branches, the first labeled with arrival days for Magazine 1 and the second for Magazine 2.) 7.3 Probability Distributions for Continuous Random Variables. A continuous random variable is one that has as its set of possible values an entire interval on the number line. An example is the weight x (in pounds) of a newborn child. Suppose for the moment that weight is recorded only to the nearest p
141. tribution x 13 5 15 9 19 1 D x 2 5 3 a Calculate the mean and standard deviation of x b If the price of the freezer depends on the size of the storage space x such that Price 25x 8 5 what is the mean value of the variable Price paid by the next customer c What is the standard deviation of the price paid 7 41 Y To assemble a piece of furniture a wood peg must be inserted into a predrilled hole Suppose that the diame ter of a randomly selected peg is a random variable with mean 0 25 in and standard deviation 0 006 in and that the diameter of a randomly selected hole is a random variable with mean 0 253 in and standard deviation 0 002 in Let x peg diameter and let x denote hole diameter a Why would the random variable y defined as y X2 x be of interest to the furniture manufacturer b What is the mean value of the random variable y c Assuming that x and x are independent what is the standard deviation of y d Is it reasonable to think that x and x are independent Explain e Based on your answers to Parts b and c do you think that finding a peg that is too big to fit in the pre drilled hole would be a relatively common or a relatively rare occurrence Explain 7 42 A multiple choice exam consists of 50 questions Each question has five choices of which only one is correct Suppose that the total score on the exam is com puted as 1 y X 7 4 where x number of
142. tunately, direct computation of such probabilities (areas under a normal curve) is not simple. To overcome this difficulty, we rely on a table of areas for a reference normal distribution, called the standard normal distribution. The standard normal distribution is the normal distribution with $\mu = 0$ and $\sigma = 1$. Its density curve is called the standard normal curve, and z is the letter customarily used for a variable whose distribution is standard normal; the term z curve is often used in place of standard normal curve. Few naturally occurring variables have distributions that are well described by the standard normal distribution, but this distribution is important because it is also used in probability calculations for other normal distributions. When we are interested in finding a probability based on some other normal curve, we first translate our problem into an equivalent problem that involves finding an area under the standard normal curve. A table for the standard normal distribution is then used to find the desired area. To be able to do this, we must first learn to work with the standard normal distribution. The Standard Normal Distribution. Figure 7.21 (a) A standard normal (z) curve and (b) a cumulative area. In working with normal distributions, we need two general skills: 1. We must be able to use the normal distribution to compute probabilities, which are areas under a normal curve and above given intervals. 2. We must be able to characterize extreme values in the distribution, such as the largest 5%, the smallest 1%, and the most extreme 5% (which would include the
143. ty that at least 40% of those in the sample (that is, eight or more) have a system is $P(x \ge 8) = P(x = 8, 9, 10, \ldots, 19, \text{ or } 20) = p(8) + p(9) + \cdots + p(20) = .022 + .007 + .002 + .000 + \cdots + .000 = .031$. If, in fact, $\pi = .20$, only about 3% of all samples of size 20 would result in at least 8 homeowners having a security system. Because $P(x \ge 8)$ is so small when $\pi = .20$, if $x \ge 8$ were actually observed, we would have to wonder whether the reported value of $\pi = .20$ is correct. Although it is possible that we could observe $x \ge 8$ when $\pi = .20$ (this would happen about 3% of the time in the long run), it might also be the case that $\pi$ is actually greater than .20. In Chapter 10, we show how hypothesis-testing methods can be used to decide which of two contradictory claims about a population (e.g., $\pi = .20$ or $\pi > .20$) is more plausible. The binomial formula or tables can be used to compute each of the 21 probabilities p(0), p(1), ..., p(20). Figure 7.16 shows the probability histogram for the binomial distribution with n = 20 and $\pi = .20$. Notice that the distribution is skewed to the right. (The binomial distribution is symmetric only when $\pi = .5$.) Mean and Standard Deviation of a Binomial Random Variable. A binomial random variable x based on n trials has possible values 0, 1, 2, ..., n, so the mean value is $\mu_x = \sum x\,p(x) = (0)p(0) + (1)p(1) + \cdots + (n)p(n)$
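Both the tail probability and the mean-value sum in this passage can be computed term by term. The sketch below assumes the example's values (n = 20, success probability .20); `binom_pmf`, `binom_upper_tail`, and `binom_mean` are hypothetical helper names built on the standard binomial formula.

```python
import math


def binom_pmf(n, pi, x):
    """Binomial probability of exactly x successes in n trials."""
    return math.comb(n, x) * pi**x * (1 - pi) ** (n - x)


def binom_upper_tail(n, pi, k):
    """P(x >= k): sum of p(k) + p(k + 1) + ... + p(n)."""
    return sum(binom_pmf(n, pi, x) for x in range(k, n + 1))


def binom_mean(n, pi):
    """Mean value as the sum of x * p(x) over x = 0, 1, ..., n."""
    return sum(x * binom_pmf(n, pi, x) for x in range(n + 1))


print(round(binom_upper_tail(20, 0.20, 8), 3))
print(round(binom_mean(20, 0.20), 6))
```

The summed mean agrees with the product of n and the success probability (here 20 times .20 = 4), the well-known shortcut for the binomial mean.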
144. uals. a. What is the probability that all 10 pass? b. What is the probability that more than 2 fail, even though all are trustworthy? c. The article indicated that 500 FBI agents were required to take a polygraph test. Consider the random variable x = number of the 500 tested who fail. If all 500 agents tested are trustworthy, what are the mean and standard deviation of x? d. The headline indicates that fewer than 25 of the 500 agents tested failed the test. Is this a surprising result if all 500 are trustworthy? Answer based on the values of the mean and standard deviation from Part (c). 7.52 Industrial quality control programs often include inspection of incoming materials from suppliers. If parts are purchased in large lots, a typical plan might be to select 20 parts at random from a lot and inspect them. A lot might be judged acceptable if one or fewer defective parts are found among those inspected. Otherwise, the lot is rejected and returned to the supplier. Use Appendix Table 9 to find the probability of accepting lots that have each of the following (Hint: Identify success with a defective part.): a. 5% defective parts b. 10% defective parts c. 20% defective parts. 7.53 An experiment was conducted to investigate whether a graphologist (a handwriting analyst) could distinguish a normal person's handwriting from that of a psychotic. A well-known expert was given 10 files, each containing handwriting samples from a normal perso
145. uare inch at which a tennis racket is strung f. The amount of water used by a household during a given month g. The number of traffic citations issued by the highway patrol in a particular county on a given day 7.3 Starting at a particular time, each car entering an intersection is observed to see whether it turns left (L) or right (R) or goes straight ahead (S). The experiment terminates as soon as a car is observed to go straight. Let y denote the number of cars observed. What are possible y values? List five different outcomes and their associated y values. 7.4 A point is randomly selected from the interior of a square, as pictured. [Figure: square with labeled corners and side length 1 ft.] Let x denote the distance from the lower left-hand corner A of the square to the selected point. What are possible values of x? Is x a discrete or a continuous variable? 7.5 A point is randomly selected on the surface of a lake that has a maximum depth of 100 ft. Let y be the depth of the lake at the randomly chosen point. What are possible values of y? Is y discrete or continuous? 7.6 A person stands at the corner marked A of the square pictured in Exercise 7.4 and tosses a coin. If it lands heads up, the person moves one corner clockwise, to B. If the coin lands tails up, the person moves one corner counterclockwise, to D. This process is then repeated until the person arrives back at A. Let y deno
146. ulate probabilities because it just requires finding the area of rectangles using the formula area = (base)(height). [Figure 7.7, panels (a), (b), (c): density versus minutes for x from 4 to 6; panel (b) shades $P(4.5 < x < 5.5)$ and panel (c) shades $P(5.5 < x)$.] The curve has positive height, 0.5, only between $x = 4$ and $x = 6$. The total area under the curve is just the area of the rectangle with base extending from 4 to 6 and with height 0.5. This gives area $= (6 - 4)(0.5) = 1$, as required. When the density is constant over an interval (resulting in a horizontal density curve), the probability distribution is called a uniform distribution. As illustrated in Figure 7.7(b), the probability that x is between 4.5 and 5.5 is $P(4.5 < x < 5.5)$ = area of shaded rectangle = (base width)(height) $= (5.5 - 4.5)(.5) = .5$. Similarly (see Figure 7.7(c)), because in this context $x > 5.5$ is equivalent to $5.5 < x \le 6$, we have $P(5.5 < x) = (6 - 5.5)(.5) = .25$. According to this model, in the long run, 25% of all forms that are processed will have processing times that exceed 5.5 min. The probability that a discrete random variable x lies in the interval between two limits a and b depends on whether either limit is included in the interval. Suppose, for example, that x is the number of major defects on a new automobile. Then $P(3 \le x \le 7) = p(3) + p(4) + p(5) + p(6) + p(7)$ whereas
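The rectangle-area reasoning in item 146 can be written as a short calculation. This is only a sketch, assuming the uniform density on the interval from 4 to 6 minutes described in the excerpt; the helper name `uniform_probability` is hypothetical.

```python
def uniform_probability(lo, hi, a=4.0, b=6.0):
    """Area under a uniform density on [a, b] between lo and hi.

    The density has constant height 1 / (b - a), so the probability is
    (base width) * (height), with the base clipped to [a, b].
    """
    height = 1.0 / (b - a)
    base = max(0.0, min(hi, b) - max(lo, a))
    return base * height

print(uniform_probability(4.5, 5.5))          # P(4.5 < x < 5.5)
print(uniform_probability(5.5, float("inf")))  # P(x > 5.5)
print(uniform_probability(4.0, 6.0))          # total area under the curve
```

Clipping the interval to the support is what makes "x > 5.5" and "5.5 < x ≤ 6" give the same area, as the text notes.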
147. ur graph, but it could be that there are more significant probabilities to the right. As an example, suppose we consider the chance experiment of flipping a coin until a head appears. The distribution of x = number of tosses is geometric with success probability .5. If the distribution is plotted using the previous steps, but only using a sequence of integers from 1 to 4, the results are shown in Figure 7.48(a). Clearly, there are values with probabilities different from zero that are not represented in the graph. The solution is to construct the sequence of integers over a larger range of values, say, to 16. At some point, of course, the geometric probabilities become very close to zero, as in Figure 7.48(b). If your graph looks similar to the one on the right, tailing off gradually, you can be fairly certain you have captured the essential behavior of the particular geometric distribution. Exploration 7.4 Normal Curves and the Normal Probability Distribution. The normal distribution is arguably the most famous distribution in all of statistics. As we have learned, the normal distribution is really a family of distributions with the same shape but different means and standard deviations. The standard normal distribution is the normal probability distribution with $\mu = 0$ and $\sigma = 1.0$. From the calculator perspective, working with the normal distribution is slightly
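The point in item 147 about choosing a wide enough range of integers can be made concrete by measuring how much probability each range captures. The sketch below assumes the geometric distribution with success probability .5 from the excerpt and uses the standard geometric formula $p(x) = (1 - \pi)^{x-1}\pi$; the variable names are illustrative.

```python
def geometric_pmf(x, pi):
    """P(first success occurs on trial x), for x = 1, 2, 3, ..."""
    return (1 - pi) ** (x - 1) * pi

pi = 0.5  # coin flipped until a head appears

# Total probability shown when the plot covers x = 1..4 versus x = 1..16.
covered_to_4 = sum(geometric_pmf(x, pi) for x in range(1, 5))
covered_to_16 = sum(geometric_pmf(x, pi) for x in range(1, 17))

print(f"x = 1..4  covers {covered_to_4:.4f} of the probability")
print(f"x = 1..16 covers {covered_to_16:.6f} of the probability")
```

A plot over 1 to 4 leaves out a visible share of the distribution, while a plot over 1 to 16 leaves out almost nothing, which matches the "tailing off gradually" check described above.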
148. ur boxes must be purchased? c. What is the probability that more than four boxes must be purchased? 7.63 The article on polygraph testing of FBI agents referenced in Exercise 7.51 indicated that the probability of a false positive (a trustworthy person who nonetheless fails the test) is .15. Let x be the number of trustworthy FBI agents tested until someone fails the test. a. What is the probability distribution of x? b. What is the probability that the first false positive will occur when the third person is tested? c. What is the probability that fewer than four are tested before the first false positive occurs? d. What is the probability that more than three agents are tested before the first false positive occurs? 7.6 Normal Distributions. Normal distributions formalize the notion of mound-shaped histograms introduced in Chapter 4. Normal distributions are widely used for two reasons. First, they provide a reasonable approximation to the distribution of many different variables. They also play a central role in many of the inferential procedures that will be discussed in later chapters. Normal distributions are continuous probability distributions that are bell shaped and symmetric, as shown in Figure 7.17. Normal distributions are sometimes referred to as normal curves. There are
149. values of n, use the critical value for the larger sample size. (For example, if n = 46, use the value .966 for sample size 50.)

Table 7.2 Values to Which r Can Be Compared to Check for Normality
n:          5     10    15    20    25    30    40    50    60    75
Critical r: .832  .880  .911  .929  .941  .949  .960  .966  .971  .976
Source: MINITAB User's Manual.

If r < critical r for the corresponding n, considerable doubt is cast on the assumption of population normality. Example 7.30 How were the critical values in Table 7.2 obtained? Consider the critical value .941 for n = 25. Suppose that the underlying distribution is actually normal. Consider obtaining a large number of different samples, each one consisting of 25 observations, and computing the value of r for each one. Then it can be shown that only 1% of the samples result in an r value less than the critical value .941. That is, .941 was chosen to guarantee a 1% error rate: In only 1% of all cases will we judge normality implausible when the distribution is really normal. The other critical values are also chosen to yield a 1% error rate for the corresponding sample sizes. It might have occurred to you that another type of error is possible: obtaining a large value of r and concluding that normality is a reasonable assumption when the distribution is actually nonnormal. This type of error is more difficult to control than the type mentioned previously, but the procedure we have described generally does a good job in both r
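The repeated-sampling argument in item 149 can be imitated with a small simulation. This is a rough sketch under stated assumptions, not the procedure used to build Table 7.2: the normal scores are computed with plotting positions (i - 3/8)/(n + 1/4), which is an assumption of this example, and the function names, seed, and number of samples are arbitrary. The 1% point of the simulated r values should land near the tabled .941 for n = 25, though it will vary with the seed.

```python
import random
from statistics import NormalDist

def normal_scores(n):
    # Assumption: plotting positions (i - 3/8) / (n + 1/4); the excerpt does
    # not say how the normal scores behind Table 7.2 were computed.
    std_normal = NormalDist()
    return [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]

def correlation(xs, ys):
    """Pearson correlation coefficient r for paired lists xs and ys."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def simulated_critical_r(n, num_samples=2000, error_rate=0.01, seed=1):
    """Estimate the value that r falls below in about error_rate of normal samples."""
    rng = random.Random(seed)
    scores = normal_scores(n)
    r_values = sorted(
        correlation(scores, sorted(rng.gauss(0, 1) for _ in range(n)))
        for _ in range(num_samples)
    )
    return r_values[int(error_rate * num_samples)]

estimate = simulated_critical_r(25)
print(f"simulated 1% critical r for n = 25: {estimate:.3f}")
```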
150. vations on the variable x. Construct a relative frequency distribution for the 50 observations, and compare this with the probability distribution obtained in Exercise 7.12. 7.14 Of all airline flight requests received by a certain discount ticket broker, 70% are for domestic travel (D) and 30% are for international flights (I). Let x be the number of requests among the next three requests received that are for domestic flights. Assuming independence of successive requests, determine the probability distribution of x. (Hint: One possible outcome is DID, with the probability $(.7)(.3)(.7) = .147$.) 7.15 Suppose that 20% of all homeowners in an earthquake-prone area of California are insured against earthquake damage. Four homeowners are selected at random; let x denote the number among the four who have earthquake insurance. a. Find the probability distribution of x. (Hint: Let S denote a homeowner who has insurance and F one who does not. Then one possible outcome is SFSS, with probability $(.2)(.8)(.2)(.2)$ and associated x value of 3. There are 15 other outcomes.) b. What is the most likely value of x? c. What is the probability that at least two of the four selected homeowners have earthquake insurance? 7.16 A box contains five slips of paper, marked \$1, \$1, \$1, \$10, and \$25. The winner of a contest selects two slips of paper at random and then gets the larger of the dollar amounts on the two slips. Define a random variable w by
151. when $n = 20$ and $\pi = .8$, $p(15) = P(x = 15)$ = (entry at intersection of x = 15 row and $\pi = .8$ column) = .175. Although p(x) is positive for every possible x value, many probabilities are zero to three decimal places, so they appear as .000 in the table. More extensive binomial tables are available. Alternatively, most statistics software packages and graphing calculators are programmed to calculate these probabilities. Sampling Without Replacement. Usually, sampling is carried out without replacement; that is, once an element has been selected for the sample, it is not a candidate for future selection. If the sampling was accomplished by selecting an element from the population, observing whether it is a success or a failure, and then returning it to the population before the next selection is made, the variable x = number of successes observed in the sample would fit all the requirements of a binomial random variable. When sampling is done without replacement, the trials (individual selections) are not independent. In this case, the number of successes observed in the sample does not have a binomial distribution but rather a different type of distribution called a hypergeometric distribution. The probability calculations for this distribution are even more tedious than for the binomial distribution. Fortunately, when the sample size n is much smaller than N, the population size, probabilities calculated using the binomial distribution and th
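The closeness of binomial and hypergeometric probabilities when n is much smaller than N, mentioned in item 151, can be seen numerically. The sketch below uses the standard hypergeometric formula; the population size, number of successes, and sample size are hypothetical values chosen for illustration and do not come from the source.

```python
from math import comb

def hypergeometric_pmf(x, n, successes, population):
    """P(x successes in a sample of n drawn without replacement)."""
    return (
        comb(successes, x)
        * comb(population - successes, n - x)
        / comb(population, n)
    )

def binomial_pmf(x, n, pi):
    """P(x successes in n independent trials with success probability pi)."""
    return comb(n, x) * pi**x * (1 - pi) ** (n - x)

# Hypothetical setting: N = 10,000 with 2,000 successes, sample of n = 20.
population, successes, n = 10_000, 2_000, 20
pi = successes / population

largest_gap = max(
    abs(hypergeometric_pmf(x, n, successes, population) - binomial_pmf(x, n, pi))
    for x in range(n + 1)
)
print(f"largest difference between the two distributions: {largest_gap:.6f}")
```

With the sample making up only 0.2% of the population, the two sets of probabilities agree to about three decimal places, which is the practical reason the binomial is used as a stand-in.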
152. which $x = 2$, because there are 10 ways to select two from among the five trials to be the S's: SSFFF, SFSFF, ..., and FFFSS. The probability of each results from multiplying together .4 two times and .6 three times. For example, $P(\text{SSFFF}) = (.4)(.4)(.6)(.6)(.6) = (.4)^2(.6)^3 = .03456$, and so $p(2) = P(x = 2) = P(\text{SSFFF}) + \cdots + P(\text{FFFSS}) = (10)(.4)^2(.6)^3 = .34560$. The general form of the distribution here is $p(x) = P(x \text{ S's among the five trials})$ = (no. of outcomes with x S's) · (probability of any particular outcome with x S's) = (no. of outcomes with x S's) · $(.4)^x(.6)^{5-x}$. This form was seen previously, where $p(2) = 10(.4)^2(.6)^3$. Let n denote the number of trials in the experiment. Then the number of outcomes with x S's is the number of ways of selecting x from among the n trials to be the success trials. A simple expression for this quantity is number of outcomes with x successes $= \dfrac{n!}{x!\,(n-x)!}$, where, for any positive whole number m, the symbol $m!$ (read "m factorial") is defined by $m! = m(m-1)(m-2)\cdots(2)(1)$ and $0! = 1$. The Binomial Distribution. Let n = number of independent trials in a binomial experiment and $\pi$ = constant probability that any particular trial results in a success. Then $p(x) = P(x \text{ successes among } n \text{ trials}) = \dfrac{n!}{x!\,(n-x)!}\,\pi^x(1-\pi)^{n-x}$, $x = 0, 1, 2, \dots, n$. The expressions $\binom{n}{x}$ or C
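The counting argument in item 152 can be checked by brute force: list every sequence of five S/F trials, keep those with exactly two S's, and multiply the count by the probability of one such sequence. This is a sketch assuming π = .4 and five trials as in the excerpt; the variable names are illustrative.

```python
from itertools import product
from math import factorial

pi, n, x = 0.4, 5, 2  # values from the excerpt

# Enumerate all 2**5 outcomes and keep those with exactly two successes.
outcomes_with_x = [seq for seq in product("SF", repeat=n) if seq.count("S") == x]

# Every such outcome has the same probability: (.4)^2 (.6)^3.
prob_each = pi**x * (1 - pi) ** (n - x)
p_x = len(outcomes_with_x) * prob_each

# The same count from the factorial expression n! / (x! (n - x)!).
count_by_formula = factorial(n) // (factorial(x) * factorial(n - x))

print(len(outcomes_with_x), count_by_formula)
print(f"P(SSFFF) = {prob_each:.5f}, p(2) = {p_x:.5f}")
```

Enumeration is only practical for small n, which is why the factorial expression is the form used in the general binomial formula.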
153. xercise 7.8 gave the following probability distribution for x = the number of courses for which a randomly selected student at a certain university is registered:

x:    1    2    3    4    5    6    7
p(x): .02  .03  .09  .25  .40  .16  .05

It can be easily verified that $\mu = 4.66$ and $\sigma = 1.20$. a. Because $\mu - \sigma = 3.46$, the x values 1, 2, and 3 are more than 1 standard deviation below the mean. What is the probability that x is more than 1 standard deviation below its mean? b. What x values are more than 2 standard deviations away from the mean value (i.e., either less than $\mu - 2\sigma$ or greater than $\mu + 2\sigma$)? What is the probability that x is more than 2 standard deviations away from its mean value? 7.33 Suppose that for a given computer salesperson, the probability distribution of x = the number of systems sold in one month is given by the following table:

x:    1    2    3    4    5    6    7    8
p(x): .05  .10  .12  .30  .30  .11  .01  .01

a. Find the mean value of x (the mean number of systems sold). b. Find the variance and standard deviation of x. How would you interpret these values? c. What is the probability that the number of systems sold is within 1 standard deviation of its mean value? d. What is the probability that the number of systems sold is more than 2 standard deviations from the mean? 7.34 A local television station sells 15-sec, 30-sec, and 60-sec advertising spots. Let x denote the length of a randomly selected commercial appearing on this station, and suppose that the pro
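The statement in item 153 that μ = 4.66 and σ = 1.20 for the course-registration distribution can be verified from the definitions $\mu = \sum x\,p(x)$ and $\sigma^2 = \sum (x - \mu)^2 p(x)$. The sketch below uses the probabilities from the first table in the excerpt; the variable names are illustrative.

```python
values = [1, 2, 3, 4, 5, 6, 7]
probabilities = [0.02, 0.03, 0.09, 0.25, 0.40, 0.16, 0.05]

# Mean: weighted average of the possible values.
mean = sum(x * p for x, p in zip(values, probabilities))

# Variance: weighted average squared deviation from the mean.
variance = sum((x - mean) ** 2 * p for x, p in zip(values, probabilities))
std_dev = variance ** 0.5

print(f"mean = {mean:.2f}, standard deviation = {std_dev:.2f}")
```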
154. you don't have the raw data, transforming the endpoints of the class intervals will result in class intervals that are not necessarily of equal widths, so the histogram of the transformed values will have to be drawn with this in mind. 7.91 The paper "Temperature and the Northern Distributions of Wintering Birds" (Ecology [1991]: 2274-2285) gave the following body masses (in grams) for 50 different bird species:

7.7 10.1 21.6 8.6 12.0 11.4 16.6 9.4 11.5 9.0
8.2 20.2 48.5 21.6 26.1 6.2 19.1 21.0 28.1 10.6
31.6 6.7 5.0 68.8 23.9 19.8 20.1 6.0 99.6 19.8
16.5 9.0 448.0 21.3 17.4 36.9 34.0 41.0 15.9 12.5
10.2 31.0 21.5 11.9 32.5 9.8 93.9 10.9 19.6 14.5

a. Construct a stem-and-leaf display in which 448.0 is listed separately beside the display as an outlier on the high side, the stem of an observation is the tens digit, the leaf is the ones digit, and the tenths digit is suppressed (e.g., 21.5 has stem 2 and leaf 1). What do you perceive as the most prominent feature of the display? b. Draw a histogram based on class intervals 5 to <10, 10 to <15, 15 to <20, 20 to <25, 25 to <30, 30 to <40, 40 to <50, 50 to <100, and 100 to <500. Is a transformation of the data desirable? Explain. c. Use a calculator or statistical computer package to calculate logarith
