2.1 Boxplots (P.15-16) - School of Mathematics and Statistics
Contents
1. $= 3$. Write down $\sum_{i=1}^{5} x_i$ and evaluate it.

Solution: $\sum_{i=1}^{5} x_i = x_1 + x_2 + x_3 + x_4 + x_5 = 4 + 7 + 9 + 5 + 3 = 28$.

Example: Evaluate the following summation expressions for the values $x_1 = 3$, $x_2 = 4$, $x_3 = 5$, $x_4 = 1$:
\[ \sum_{i=1}^{4} x_i, \qquad \sum_{i=2}^{3} x_i, \qquad \sum_{i=1}^{4} (2x_i \pm 3) \quad \text{and} \quad \sum_{i=1}^{4} x_i^2 . \]
[The sign between $2x_i$ and $3$ in the third expression is illegible in the source.]

SydU MATH1015 2015 First semester. THE UNIVERSITY OF SYDNEY, MATH1015 Biostatistics, Week 2 3

Solution:
\[ \sum_{i=1}^{4} x_i = x_1 + x_2 + x_3 + x_4 = 3 + 4 + 5 + 1 = 13 \]
\[ \sum_{i=2}^{3} x_i = x_2 + x_3 = 4 + 5 = 9 \]
\[ \sum_{i=1}^{4} (2x_i \pm 3) = (2x_1 \pm 3) + (2x_2 \pm 3) + (2x_3 \pm 3) + (2x_4 \pm 3) \]
\[ \sum_{i=1}^{4} x_i^2 = x_1^2 + x_2^2 + x_3^2 + x_4^2 = 9 + 16 + 25 + 1 = 51 \]

2.2.1 The Sample Mean (p9)

The sample mean is the simple arithmetic mean, or average, of the observations. For $n$ observations $x_1, x_2, \ldots, x_n$ it is denoted by $\bar{x}$ (called "x bar") and is given by
\[ \bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n} \sum_{i=1}^{n} x_i . \]

Example: The mean of the sample of 4 values from the previous example is $\bar{x} = \frac{3 + 4 + 5 + 1}{4} = 3.25$.

Exercise: Look at your calculator now. Change the mode of your calculator to "stat" or "sd" (or as per the calculator instructions). Check the above answer using your calculator.

Note: The mean is very sensitive to large or small outliers in the sample. In such cases it is better to use the median as a measure of the centre of the data.

Use of R: R can be used to find the mean of a sample. Practice this example:

    > x <- c(3, 4, 5, 1)
    > mean(x)
    [1] 3.25

Exercise: Find the media
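The sigma-notation sums and the sample mean for the four values 3, 4, 5, 1 can also be checked programmatically. The following is a minimal sketch in Python rather than R; the variable names are illustrative and not part of the notes.

```python
# Sample from the example: x_1, ..., x_4.
# Python lists are 0-indexed, so x_i in the notes is x[i - 1] here.
x = [3, 4, 5, 1]

total = sum(x)                      # sum over i = 1..4 of x_i
middle = sum(x[1:3])                # sum over i = 2..3 of x_i, i.e. x_2 + x_3
squares = sum(v ** 2 for v in x)    # sum over i = 1..4 of x_i^2
mean = total / len(x)               # sample mean, x-bar

print(total, middle, squares, mean)
```

The slice `x[1:3]` picks out the second and third observations, which is the one place where the 0-based indexing differs visibly from the subscripts used in the notes.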
2. 1 the deviations from the mean are
\[ 2 - 16 = -14, \quad 5 - 16 = -11, \quad 15 - 16 = -1, \quad 20 - 16 = 4, \quad 38 - 16 = 22 . \]
The sum of squared deviations divided by $n - 1 = 4$ is considered a good measure of the spread and is known as the sample variance. For sample 1 the variance is
\[ \frac{1}{4}\left[ (-14)^2 + (-11)^2 + (-1)^2 + 4^2 + 22^2 \right] = \frac{818}{4} = 204.5 . \]
Similarly, for sample 2 the variance is 15. As seen from the data, sample 1 has more variability than sample 2.

Calculation of the Sample Variance

For a set of $n$ observations $x_1, x_2, \ldots, x_n$, the sample variance $s^2$ is given by
\[ s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 . \]

Note: It is easier to use the following calculation formula in practice. Expanding the squared term $(x_i - \bar{x})^2$ and rearranging shows that the above is equivalent to
\[ s^2 = \frac{1}{n-1} \left[ \sum_{i=1}^{n} x_i^2 - \frac{1}{n} \Big( \sum_{i=1}^{n} x_i \Big)^2 \right] = \frac{1}{n-1} \left[ \sum_{i=1}^{n} x_i^2 - n \bar{x}^2 \right] . \]

Note: You do not need to memorise this formula, as it is provided on a formula sheet available from the course web site.

Note: The above value is in squared units.

Example: Find the mean and variance of the sample

    55 48 59 64 65 57 58 41 57 59 64 62

Solution: $n = 12$. First calculate
\[ \sum_{i=1}^{12} x_i = 689 \qquad \text{and} \qquad \sum_{i=1}^{12} x_i^2 = 40095 . \]
• Mean $\bar{x}$
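To see that the definition of the sample variance and the calculation formula agree, here is a small Python sketch. The two helper functions are hypothetical names introduced only for this illustration; the data are the two five-value samples and the twelve-value sample from the notes.

```python
def variance_definition(xs):
    """Sample variance as the sum of squared deviations over n - 1."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((v - mean) ** 2 for v in xs) / (n - 1)


def variance_shortcut(xs):
    """Same quantity via sum(x^2) - (sum x)^2 / n, divided by n - 1."""
    n = len(xs)
    return (sum(v * v for v in xs) - sum(xs) ** 2 / n) / (n - 1)


sample_1 = [2, 5, 15, 20, 38]
sample_2 = [12, 13, 15, 19, 21]
sample_12 = [55, 48, 59, 64, 65, 57, 58, 41, 57, 59, 64, 62]

for name, data in (("sample 1", sample_1), ("sample 2", sample_2), ("12 values", sample_12)):
    print(name, variance_definition(data), variance_shortcut(data))
```

Both functions should report 204.5 for sample 1 and 15 for sample 2. For the twelve-value sample the two routes agree up to floating-point rounding, at roughly 48.63.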
3. $Q_3 = 1.28$
   IQR $= Q_3 - Q_1 = 1.28 - 0.80 = 0.48$
   LT $= Q_1 - 1.5 \times \text{IQR} = 0.80 - 1.5 \times 0.48 = 0.08$
   UT $= Q_3 + 1.5 \times \text{IQR} = 1.28 + 1.5 \times 0.48 = 2.00$
   Since the max $= 2.11$ lies outside (LT, UT) $= (0.08, 2.00)$, it is an outlier.

3. The CVs are 0.258 and 0.472 respectively.

4. [Figure: side-by-side boxplots of the two samples, marked with the values below.]
   Sample 2 (y): LT = 0.08, min = 0.49, Q1 = 0.80, Q2 = 0.96, Q3 = 1.28, 2nd max = 1.54, UT = 2.00, max = 2.11
   Sample 1 (x): LT = 0.51, min = 1.03, Q1 = 1.33, Q2 = 1.645, Q3 = 1.88, max = 2.34, UT = 2.69

R commands:

    mean(x)
    sd(x)
    sort(x)
    median(x)
    sd(x)/mean(x)    # cv
    fivenum(x)
    boxplot(x, y)    # 2 boxplots side by side

where x and y are vectors of measurements.

In order to develop further concepts and applications of biostatistics, it is convenient to understand the basic theory of probability. Now we look at this topic.

3 An Introduction to Probability Theory and Applications (P29)

This chapter considers the following topics:
• Basic terminology
• Theory of sets and Venn diagrams
• Probability axioms and counting methods
• Conditional probability and independence

Preliminaries
• The word "fair" or "unbiased" is regularly used in many life science situations. It means that all possible outcomes of an experiment have the same chance of occurring.
• Any experiment to collect information is called a random experiment if we are not certain of, or cannot predict, its outcome(s). It is clear that in a random experi
4. Example: List the event A of observing a number less than 3 in experiment 1 above.
Ans: A = {1, 2}

Example: A card is selected at random from a box containing 10 cards numbered 1 to 10. List the events A of observing an even number and B of observing a number divisible by 4.
Ans: A = {2, 4, 6, 8, 10}, B = {4, 8}

3.1 Probability of equally likely outcomes (events)

First consider the concept of equally likely outcomes.

Equally Likely Outcomes: The outcomes of a random experiment (or in a sample space) are called equally likely if all of them have the same chance of occurrence.

Historically, probability was regarded as the chance of an event occurring, expressing the strength of one's belief. This was therefore known as subjective probability. The idea was later developed with a number of common concepts, including equally likely outcomes, which leads to the following definition.

Definition: The probability of an event A is the relative frequency of its set of outcomes over an indefinitely large number of repeated trials under identical conditions. This is denoted by P(A).

Calculating Probabilities

Suppose we have a random experiment which has exactly $n$ possible equally likely outcomes. Let A be an event of interest within this sample space containing $m$ simple outcomes. Then the probability as
5. $/57 = 158.2632$ and exact variance $= \left(1447141 - 9021^2/57\right)/56 = 347.3045$ (sd $= 18.63611$).

Additional worked example

Consider the two samples:

    Sample 1 (x): 1.76 1.45 1.03 1.53 2.34 1.96 1.79 1.21
    Sample 2 (y): 0.49 0.85 1.00 1.54 1.01 0.75 2.11 0.92

For each of the two samples:
1. calculate the mean and the standard deviation;
2. find Q1, Q2, Q3, LT and UT;
3. find the CV;
4. draw both boxplots on the same page.

Solution:

1. We have $n = 8$ ($n$ is even) and
\[ \sum_{i=1}^{8} x_i = 13.07, \quad \sum_{i=1}^{8} x_i^2 = 22.5873, \quad \sum_{i=1}^{8} y_i = 8.67, \quad \sum_{i=1}^{8} y_i^2 = 11.2153 . \]
Sample 1:
   The mean: $\bar{x} = \frac{13.07}{8} = 1.63$
   The sd: $s_x = \sqrt{\frac{1}{7}\left[ 22.5873 - \frac{13.07^2}{8} \right]} = 0.42$
Sample 2:
   The mean: $\bar{y} = \frac{8.67}{8} = 1.08$
   The sd: $s_y = \sqrt{\frac{1}{7}\left[ 11.2153 - \frac{8.67^2}{8} \right]} = 0.51$

2. In ascending order:

    Sample 1 (x): 1.03 1.21 1.45 1.53 1.76 1.79 1.96 2.34
    Sample 2 (y): 0.49 0.75 0.85 0.92 1.00 1.01 1.54 2.11

Sample 1: $Q_1 = 1.330$, $Q_2 = 1.645$, $Q_3 = 1.875$
   IQR $= Q_3 - Q_1 = 1.875 - 1.330 = 0.545$
   LT $= Q_1 - 1.5 \times \text{IQR} = 1.330 - 1.5 \times 0.545 = 0.5125$
   UT $= Q_3 + 1.5 \times \text{IQR} = 1.875 + 1.5 \times 0.545 = 2.6925$
   There is no outlier.
Sample 2: $Q_1 = 0.80$, $Q_2 = 0.96$
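The means, standard deviations and coefficients of variation of the two eight-value samples can be reproduced with Python's standard `statistics` module. This is only a sketch under the assumption that the sd uses the divisor n - 1, as in the notes; `summarise` is a hypothetical helper name.

```python
import statistics

x = [1.76, 1.45, 1.03, 1.53, 2.34, 1.96, 1.79, 1.21]
y = [0.49, 0.85, 1.00, 1.54, 1.01, 0.75, 2.11, 0.92]


def summarise(sample):
    """Return (mean, sample sd, coefficient of variation)."""
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)   # divisor n - 1
    return mean, sd, sd / mean


for name, sample in (("x", x), ("y", y)):
    mean, sd, cv = summarise(sample)
    print(f"{name}: mean={mean:.2f} sd={sd:.2f} cv={cv:.2f}")
```

To two decimals this gives means 1.63 and 1.08, sds 0.42 and 0.51, and CVs of about 0.26 and 0.47. The third decimal of a CV depends on whether it is computed from the rounded or the unrounded mean and sd.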
6. 2.1 Boxplots (P.15-16)

Recall that observations outside the interval (LT, UT) are called outliers or abnormal observations, where

    Lower threshold value: LT = lower quartile − 1.5 × IQR
    Upper threshold value: UT = upper quartile + 1.5 × IQR

A popular box-type graphical representation of the following information from a data set is known as a boxplot:
• Quartiles $Q_1$, $Q_2$ and $Q_3$: draw a rectangular box from $Q_1$ to $Q_3$ and mark $Q_2$ within this box.
• Smallest and largest observations within (LT, UT).
• Outliers, if they exist.

Diagram: Suppose that a data set contains three values below the LT (left outliers) and two values above the UT (right outliers). This information is shown in the diagram below.

[Diagram: boxplot with three left outliers and two right outliers.]

Boxplots show the shape of the distribution of data very clearly and are helpful in representing any outlying or extreme values of a data set.

Example: Consider the following data set of 13 observations (x) from the previous example:

    4 6 6 7 7 9 10 11 13 15 22 24 30

1. Find LT and UT for this sample.
2. Identify any outliers, if they exist.
3. Draw a boxplot for this sample, following the steps:
   (a) Draw a rectangle (horizontal or vertical) of arbitrary width from $Q_1$ to $Q_3$.
   (b) Draw a dotted line across the rectangle at $Q_2$.
   (c) Draw two l
7. ating the mean and variance from such a frequency table.

2.3.1 The Mean of Grouped Data

Suppose that we only have the information provided by a grouped frequency table for a data set. That is, we only have access to the published report and not the original data set. Let $k$ be the number of bins (groups or intervals), and let $u_1, u_2, \ldots, u_k$ be the centres of the intervals, with corresponding frequencies $f_1, f_2, \ldots, f_k$. Then an approximate sample mean is given by
\[ \bar{x} \approx \frac{1}{n} \sum_{i=1}^{k} f_i u_i , \qquad n = \sum_{i=1}^{k} f_i . \]

Example: Consider the data on weight in pounds (recorded to the nearest pound) of 35 female students from week 1.

    Females:
    140 120 130 138 121 125 116 145 150 112 125 130
    120 130 131 120 118 125 135 125 118 122 115 102
    115 150 110 116 108  95 125 133 110 150 108

We have the frequency distribution from last week:

    CLASS INTERVAL   CLASS CENTER   FREQUENCY
     94 - 104              99            2
    104 - 114             109            5
    114 - 124             119           11
    124 - 134             129           10
    134 - 144             139            3
    144 - 154             149            4
    TOTAL                               35

Find the grouped mean.

Solution: $n = 35$ (the number of values).
\[ \sum_{i=1}^{6} f_i u_i = 2(99) + 5(109) + 11(119) + 10(129) + 3(139) + 4(149) = 4355 \]
\[ \bar{x} \approx \frac{1}{n} \sum_{i=1}^{6} f_i u_i = \frac{4355}{35} \approx 124.43 \]

Exercise: Find the exact mean of the data and compare it to the above approximation.

Answer: Using the complete data (check with your calculator and R), the sum of all 35 values is 4333 and hence
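The grouped-mean calculation for the 35 female weights can be sketched in Python as a frequency-weighted average of the class centres. The class centres and frequencies are taken from the table above; the exact total of 4333 is the figure quoted in the notes.

```python
# Class centres u_i and frequencies f_i from the frequency table.
centres = [99, 109, 119, 129, 139, 149]
freqs = [2, 5, 11, 10, 3, 4]

n = sum(freqs)                                        # number of observations
weighted_sum = sum(f * u for f, u in zip(freqs, centres))
grouped_mean = weighted_sum / n                       # approximation from the table

exact_mean = 4333 / 35                                # from the raw-data total in the notes

print(n, weighted_sum, round(grouped_mean, 2), round(exact_mean, 2))
```

The gap between the two means reflects the information lost when each observation is replaced by the centre of its class.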
8. he left. Right skewed: the boxplot is stretched to the right.

Now we look at a number of additional summaries of a data set.

2.2 Measures of Location and Spread (P.9-11)

Measures of Location

We have seen that the median is a measure of the centre of a data set. Another popular measure of the centre of a data set is the mean. Recall from your high school work that the mean of 4, 7, 9, 5, 3 is $\frac{4 + 7 + 9 + 5 + 3}{5} = 5.6$. Use your calculator to check this answer.

Now we develop this concept to handle common problems in statistics, using the following notation.

A Notation

Suppose that we have $n$ observations from an experiment. This collection (or set) of $n$ values is called a sample. Let $x_1$ be the first sample point (or observation), $x_2$ be the second sample point (or observation), etc., and $x_n$ be the $n$th sample point (or observation).

Example: Suppose that we have a sample of five observations: 4, 7, 9, 5, 3. For this sample the first observed value is 4, and therefore we write $x_1 = 4$ to identify it. Similarly, $x_2 = 7$, $x_3 = 9$, $x_4 = 5$, $x_5 = 3$.

Summation Notation

For simplicity, the sum of these $n$ values $x_1, x_2, \ldots, x_n$ is abbreviated by the sigma notation as follows:
\[ \sum_{i=1}^{n} x_i = x_1 + x_2 + \cdots + x_n . \]

Note: Many calculators use this notation. Please check your calculator now.

Example: Consider the sample $x_1 = 4$, $x_2 = 7$, $x_3 = 9$, $x_4 = 5$, $x_5$
9. hildren; (ii) find the probability that there are (a) at most one boy and (b) at least one boy in a family of three children.

Solution (i): Tree diagram for the distribution of gender of three children. Each branch carries P(B) = 0.6 or P(G) = 0.4, and the probability of a path is the product of the probabilities along its branches:

    P(BBB) = 0.6 × 0.6 × 0.6
    P(BBG) = 0.6 × 0.6 × 0.4
    P(BGB) = 0.6 × 0.4 × 0.6
    P(BGG) = 0.6 × 0.4 × 0.4
    P(GBB) = 0.4 × 0.6 × 0.6
    P(GBG) = 0.4 × 0.6 × 0.4
    P(GGB) = 0.4 × 0.4 × 0.6
    P(GGG) = 0.4 × 0.4 × 0.4

Solution (ii):
(a) P(at most 1 boy) = P(GGG) + P(BGG) + P(GBG) + P(GGB) = 0.064 + 3 × 0.096 = 0.352
(b) P(at least 1 boy) = 1 − P(GGG) = 1 − 0.064 = 0.936
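The eight paths of the tree and the two requested probabilities can be enumerated directly. The sketch below assumes, as the example does, that births are independent with P(boy) = 0.6; the dictionary and variable names are illustrative only.

```python
from itertools import product

p = {"B": 0.6, "G": 0.4}   # probability of a boy / girl at each birth

# One entry per root-to-leaf path of the tree, e.g. "BGB" -> 0.6 * 0.4 * 0.6.
paths = {}
for kids in product("BG", repeat=3):
    prob = 1.0
    for child in kids:
        prob *= p[child]
    paths["".join(kids)] = prob

at_most_one_boy = sum(pr for k, pr in paths.items() if k.count("B") <= 1)
at_least_one_boy = 1 - paths["GGG"]   # complement of "no boys"

print(round(at_most_one_boy, 3), round(at_least_one_boy, 3))
```

Using the complement for "at least one boy" avoids summing seven paths and is the same shortcut one would use by hand.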
10. ines (called whiskers) from the above rectangle to the smallest and largest observations within (LT, UT).
   (d) Mark any identified outliers by "o".

Solution:

1. From the previous example we have calculated: median $Q_2 = 10$, lower quartile $Q_1 = 7$, upper quartile $Q_3 = 15$. Hence IQR $= 8$, LT $= 7 - 1.5 \times 8 = -5$ and UT $= 15 + 1.5 \times 8 = 27$.

2. All observations in the interval $(-5, 27)$ are considered legitimate. Clearly there is only one data point outside this interval. Therefore the last observation, 30, is considered abnormally high. This is an outlier.

3. The following boxplot summarises the above information as a graph, indicating the outlier by "o".

Boxplot in R: R can be used to draw a boxplot. Let x contain the data:

    > x <- c(4, 6, 6, 7, 7, 9, 10, 11, 13, 15, 22, 24, 30)
    > boxplot(x)

[Figure: boxplot of x, marked with LT = −5, min = 4, Q1 = 7, Q2 = 10, Q3 = 15, 2nd max = 24, UT = 27, max = 30.]

Notes:
• Boxplots are useful for comparing a continuous variable (e.g. length, weight, etc.) across the categories of a nominal variable (e.g. treatment).
• In R, a whisker by default extends to the most extreme observation lying no more than 1.5 × IQR from the box.
• Boxplots give a simple visual display and hence a quick impression of the shape of the data set. Symmetrical: left and right tails are similar. Left skewed: the boxplot is stretched to t
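The thresholds and the outlier for the 13 observations can be checked with a short Python sketch. It assumes the quartiles are taken as the medians of the lower and upper halves of the sorted data (with the middle value shared when n is odd), which reproduces the values Q1 = 7 and Q3 = 15 used here; other quartile conventions can give slightly different numbers. The function name `quartiles` is hypothetical.

```python
import statistics


def quartiles(sample):
    """Q1, Q2, Q3 as medians of the lower half, the whole sample and the upper half."""
    s = sorted(sample)
    half = (len(s) + 1) // 2          # the halves share the median when n is odd
    return statistics.median(s[:half]), statistics.median(s), statistics.median(s[-half:])


x = [4, 6, 6, 7, 7, 9, 10, 11, 13, 15, 22, 24, 30]

q1, q2, q3 = quartiles(x)
iqr = q3 - q1
lt = q1 - 1.5 * iqr                   # lower threshold
ut = q3 + 1.5 * iqr                   # upper threshold

outliers = [v for v in x if v < lt or v > ut]
inside = [v for v in x if lt <= v <= ut]
whisker_ends = (min(inside), max(inside))

print(q1, q2, q3, iqr, lt, ut, outliers, whisker_ends)
```

The whiskers end at 4 and 24, the smallest and largest observations inside the thresholds, and 30 is the only value flagged as an outlier.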
11. • Variance: $s^2 = \frac{1}{11}\left[ \sum_{i=1}^{12} x_i^2 - \frac{1}{12}\Big( \sum_{i=1}^{12} x_i \Big)^2 \right] = \frac{1}{11}\left[ 40095 - \frac{689^2}{12} \right] \approx 48.63$

Standard Deviation of a Sample

It is clear that the sample variance has squared units. Therefore its square root will provide a value in the original units. This square root is known as the sample standard deviation.

Example: Find the standard deviation of the above sample.

Solution: Simply take the square root of the variance. Thus the standard deviation is $s = \sqrt{48.63} \approx 6.97$.

Notes:
• Many scientific calculators and computer packages, including R, can be used to find the standard deviation of a given dataset.
• Look at your calculator now. Change the mode of your calculator to "STAT" (or similar, depending on your calculator). Look for buttons $\bar{x}$, $s$ or $\sigma$. Many calculators have an $s_{n-1}$ or $\sigma_{n-1}$ button for the sample sd. Check the user manual for details.
• It can be proved that after a change in origin of a data set, the variance and standard deviation remain the same. If the sample points change in scale by a factor $c$, then the variance changes by a factor of $c^2$ and the sd changes by a factor of $|c|$.

Exercise: Consider the data set

    110 96 118 128 130 114 116 82 114 118 128 124

Show that the mean, variance and sd respectively are approximately 114.83, 194.52 and 13.95.

Note: the second data set is twice the first, and hence the second mean is twice the first mean, second varia
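The effect of a change of origin or scale on the variance and sd can be illustrated numerically. The following Python sketch doubles the twelve-value sample to obtain the exercise data set and also shifts it by an arbitrary constant; the shift of 10 is an assumption made only for this illustration.

```python
import statistics

first = [55, 48, 59, 64, 65, 57, 58, 41, 57, 59, 64, 62]
second = [2 * v for v in first]        # change of scale by c = 2 (the exercise data)
shifted = [v + 10 for v in first]      # change of origin by an arbitrary constant

var_first = statistics.variance(first)     # divisor n - 1
var_second = statistics.variance(second)
var_shifted = statistics.variance(shifted)

print(round(statistics.mean(second), 2),
      round(var_second, 2),
      round(statistics.stdev(second), 2))
print(var_second / var_first, var_shifted - var_first)
```

The ratio of the two variances comes out as 4 (that is, c squared with c = 2), while shifting the data leaves the variance unchanged.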
12. ment one cannot state before the experiment what a particular outcome will be.

Note: In contrast, a deterministic experiment yields known or predictable outcomes when repeated under the same conditions.

For example, consider the following experiments:
1. Toss a fair six-sided die once and observe the number that shows on top.
2. Take a marble from a bag containing 2 red, 1 black and 1 white marbles and observe its colour.

It is clear that in these random experiments one cannot state before the experiment what a particular outcome will be at each trial. However, we can make a list of all possible outcomes. For example:
1. In (1) we observe one of 1 or 2 or 3 or 4 or 5 or 6.
2. In (2) we observe one colour from red or black or white.

Now we provide the following definition for later reference.

Definition: The collection (or the set) of all possible outcomes of a random experiment is called the sample space. This is denoted by $S$ or $\Omega$ and is written as $S = \{\ldots\}$.

For example:
1. in experiment 1 above, $S = \{1, 2, 3, 4, 5, 6\}$;
2. in experiment 2 above, $S = \{\text{red}, \text{black}, \text{white}\}$.

The following terminology will be useful in many applications.

Definition: An event of a random experiment is a collection of outcomes with specified features of interest.
13. n, mean and mode for the data set

    13.3 10.7 11.0 11.1 12.9 11.8 11.9 12.2 10.8 12.2 11.6 11.8

Solution: Order the data (x) to find the median:

    10.7 10.8 11.0 11.1 11.6 11.8 11.8 11.9 12.2 12.2 12.9 13.3

Ans: mean = 11.775, median = 11.8, mode = 11.8 and 12.2. In this case the mode is not unique. Such datasets are also called bimodal.

Exercise: Check the mean of this sample using your calculator now, changing the mode to "stat".

Exercise: Check the answers using R.

2.2.2 Sample Variance and Standard Deviation (p12)

In order to motivate this topic, consider the following two sets of observations:

    Sample 1: 2 5 15 20 38
    Sample 2: 12 13 15 19 21

It is easy to verify that both sets have the same centre, or mean, at $\bar{x} = 16$. However, the two samples visually appear radically different. This difference lies in the greater spread (or variability, or dispersion) of the first dataset compared with the second. Therefore we need a universal measure that indicates the amount of variation a data set exhibits. We will now describe the most popular measure of spread used in practice, known as the sample variance, based on $n$ observations.

The Sample Variance

The difference between an observation and the sample mean is known as the deviation of the observation from the sample mean. For example, in sample
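For the twelve-value data set in the mean, median and mode exercise, the answers can be checked in Python as well as in R. This sketch uses the standard `statistics` module; `multimode` (available from Python 3.8) returns every value tied for the highest count, which suits a bimodal data set.

```python
import statistics

data = [13.3, 10.7, 11.0, 11.1, 12.9, 11.8, 11.9, 12.2, 10.8, 12.2, 11.6, 11.8]

mean = statistics.mean(data)
median = statistics.median(data)      # average of the two middle values, since n is even
modes = statistics.multimode(data)    # all most-frequent values, in order of first appearance

print(round(mean, 3), median, modes)
```

Both modes are reported because 11.8 and 12.2 each occur twice and no value occurs more often.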
14. nce is four times the first variance, and second sd is twice the first sd.

2.2.3 The Coefficient of Variation

The coefficient of variation, denoted CV, is the ratio of the standard deviation to the mean. For a dataset with $\bar{x} \neq 0$ we define
\[ \text{CV} = \frac{s}{\bar{x}} . \]
This ratio of the standard deviation to the mean is a useful statistic for comparing the degree of variation from one data series to another, even if the means are drastically different from each other.

Example: The CV for the previous dataset is $\text{CV} = \frac{s}{\bar{x}} \approx 0.12$; that is, the s.d. accounts for 12% of the mean.

Note: It is clear that the CV is dimensionless, as it is a proportion. For example, it is not affected by multiplicative changes of scale. Therefore the CV is a useful measure for comparing the dispersions of two or more variables that are measured on different scales.

The next section considers the corresponding results for grouped data.

2.3 Grouped Data (P.16-17)

Recall that large datasets can be summarised with a suitable frequency distribution table with $k$ groups (or intervals, or bins), like this:

    Group   Class interval          Class center               Frequency   Relative frequency
    1       y_1 < x <= y_2          u_1 = (y_1 + y_2)/2        f_1         f_1/n
    2       y_2 < x <= y_3          u_2 = (y_2 + y_3)/2        f_2         f_2/n
    ...
    k       y_k < x <= y_{k+1}      u_k = (y_k + y_{k+1})/2    f_k         f_k/n
    TOTAL                                                      n           1.000

Now we look at the problem of calcul
15. signed to A, P(A), is given by
\[ P(A) = \frac{m}{n} = \frac{\text{number of outcomes in } A}{\text{total number of outcomes}} . \]

Examples:

1. Throw a fair six-sided die. There are 6 equally likely possible outcomes. The sample space of this experiment is $S = \{1, 2, 3, 4, 5, 6\}$. If A denotes the event of observing an even number, then A = {2, 4, 6} and
   Prob(an even number) = P(A) = 3/6 = 1/2.

2. Toss a fair coin 3 times. There are 8 possible equally likely outcomes, and the sample space is
   S = {HHH, HHT, HTH, THH, HTT, THT, TTH, TTT}.
   • Let A be the event of observing exactly two heads in this experiment. Then A = {HHT, HTH, THH} and the probability of observing exactly two heads is P(A) = 3/8.
   • Let B be the event of observing at least one head. Then the event is B = {HHH, HHT, HTH, THH, HTT, THT, TTH}. Hence the probability of observing at least one head is P(B) = 7/8.

3.2 Probability using tree diagrams (p33)

Probability trees, or tree diagrams, can be used to visualise the events and to calculate simple probabilities.

Example: Draw a suitable tree diagram for the experiment of tossing a fair coin two times. Hence list the sample space: S = {HH, HT, TH, TT}.

Exercise: Draw a tree diagram for the experiment of tossing a fair coin three times.

Example: A certain country reports that it has a higher rate of male births, with the probability of a boy being 0.6. Assuming the births are random, (i) draw a tree diagram to represent the distribution of children in families with three c
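The counting rule P(A) = m/n for equally likely outcomes lends itself to a direct enumeration. Below is a minimal Python sketch for the three-coin experiment; the helper `prob` is a hypothetical name, and exact fractions are used so that the results print as 3/8 and 7/8.

```python
from fractions import Fraction
from itertools import product

# All 2^3 equally likely outcomes of tossing a fair coin three times.
sample_space = ["".join(t) for t in product("HT", repeat=3)]


def prob(event):
    """P(event) = (number of outcomes in the event) / (total number of outcomes)."""
    return Fraction(len(event), len(sample_space))


exactly_two_heads = [s for s in sample_space if s.count("H") == 2]
at_least_one_head = [s for s in sample_space if "H" in s]

print(sample_space)
print(prob(exactly_two_heads), prob(at_least_one_head))
```

This approach only applies when every outcome in the sample space has the same chance; it would not be valid for the births example, where P(boy) = 0.6.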
16. the exact mean $\bar{x} = \frac{4333}{35} = 123.8$.

Note: The grouped mean and the exact mean are close to each other.

2.3.2 The Variance of Grouped Data

For data from a frequency table, the grouped sample variance is
\[ s^2 \approx \frac{1}{n-1} \sum_{i=1}^{k} f_i (u_i - \bar{x})^2 , \]
or equivalently
\[ s^2 \approx \frac{1}{n-1} \left[ \sum_{i=1}^{k} f_i u_i^2 - \frac{1}{n} \Big( \sum_{i=1}^{k} f_i u_i \Big)^2 \right] \quad \text{or} \quad \frac{1}{n-1} \left[ \sum_{i=1}^{k} f_i u_i^2 - n \bar{x}^2 \right] , \]
where $\bar{x}$ is the grouped mean.

Example: Find the sample variance from the previous frequency distribution table of 35 female students.

Solution:
\[ \sum_{i=1}^{6} f_i u_i^2 = \underline{\qquad} \quad \Rightarrow \quad s^2 = \underline{\qquad} \quad \Rightarrow \quad s = \underline{\qquad} \]

Example: Find the exact sample sd and compare it with the grouped sd (13.35581).

Solution: Check with your calculator and R the following: $\sum x = 4333$, $\sum x^2 = 542505$. Thus
\[ s^2 = \frac{542505 - 4333^2/35}{34} = 178.8118 \quad \text{and} \quad \text{sd} = 13.37205 . \]
Notice that these two values are also close to each other.

Exercise: Using the following frequency table for 57 male students from week 1 (p14), compute the grouped mean and sd using your calculator and R. Compare them with the exact values.

    CLASS INTERVAL   CLASS CENTER   FREQUENCY
    122 - 136             129            6
    136 - 150             143           17
    150 - 164             157           17
    164 - 178             171            7
    178 - 192             185            8
    192 - 206             199            1
    206 - 220             213            1
    TOTAL                               57

Answer: Grouped mean = 157.2456 and grouped variance = 367.4431 (sd = 19.16881). Exact mean = 9021
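The grouped mean and grouped variance for the table of 57 male students can be computed with the calculation formula. The following Python sketch is one way to do it; the variable names are illustrative, and the results should land close to the quoted answers, with small differences in later decimals depending on how intermediate values are rounded.

```python
import math

# Class centres u_i and frequencies f_i from the table of 57 male students.
centres = [129, 143, 157, 171, 185, 199, 213]
freqs = [6, 17, 17, 7, 8, 1, 1]

n = sum(freqs)
sum_fu = sum(f * u for f, u in zip(freqs, centres))        # sum of f_i * u_i
sum_fu2 = sum(f * u * u for f, u in zip(freqs, centres))   # sum of f_i * u_i^2

grouped_mean = sum_fu / n
grouped_var = (sum_fu2 - sum_fu ** 2 / n) / (n - 1)        # calculation formula
grouped_sd = math.sqrt(grouped_var)

print(n, round(grouped_mean, 4), round(grouped_var, 2), round(grouped_sd, 2))
```

Rounded to two decimals this reproduces a grouped variance of about 367.44 and a grouped sd of about 19.17.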