User Guide - PQStat Spatial Analysis

1. [...] we compute the sum of the squared differences $\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}(x_i - x_j)^2$. As a result, Geary's autocorrelation coefficient is expressed with the formula:

$$c = \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}(x_i - x_j)^2}{2 S_0\, sd^2}$$

where:
$n$ is the number of spatial objects (the number of points or polygons),
$x_i$, $x_j$ are the values of the variable for the compared objects,
$w_{ij}$ are the elements of the spatial weights matrix (a row-standardized weights matrix),
$S_0 = \sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}$,
$sd^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}$ is the variance,
$\bar{x}$ is the mean value of the variable for all objects.

The interpretation of Geary's coefficient:
• $c < 1$ and $c \approx 0$ means the occurrence of clusters with similar values (a positive autocorrelation);
• $c > 1$ means the occurrence of the so-called hot spots, i.e. distinctly different values in neighboring areas (a negative autocorrelation);
• $c \approx 1$ means a random spatial distribution of the studied variable (a lack of autocorrelation).

Note: When the variance of the studied feature varies greatly, it is desirable to stabilize that variability. The basic information about smoothing variables is given in Chapter 1.7 SPATIAL SMOOTHING.

Significance of Geary's autocorrelation coefficient
A test for checking the significance of Geary's autocorrelation coefficient serves the purpose of [...]

Copyright 2010-2014 PQStat Software. All rights reserved.
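A minimal Python sketch of the coefficient described above may help to check small examples by hand. It assumes the variable is given as a plain list `x` and the weights matrix as a list of lists `w`; the function name `gearys_c` and the sample data are illustrative and are not part of PQStat.

```python
def gearys_c(x, w):
    """Geary's c for values x and a square weights matrix w (list of lists)."""
    n = len(x)
    mean = sum(x) / n
    # S0: sum of all weights
    s0 = sum(sum(row) for row in w)
    # sd^2: variance with the n - 1 denominator
    sd2 = sum((v - mean) ** 2 for v in x) / (n - 1)
    # weighted sum of squared differences between compared objects
    num = sum(w[i][j] * (x[i] - x[j]) ** 2
              for i in range(n) for j in range(n))
    return num / (2 * s0 * sd2)


# Four objects in a chain: 1-2, 2-3, 3-4 are neighbors (binary neighbor matrix).
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
smooth_values = [1, 2, 3, 4]       # similar neighbors, c below 1
alternating_values = [1, 4, 1, 4]  # contrasting neighbors, c above 1
print(gearys_c(smooth_values, chain), gearys_c(alternating_values, chain))
```

With the gradually changing values the coefficient falls below 1, and with the alternating values it exceeds 1, in line with the interpretation given above.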
2. [...] $var(I)$ is the variance.

The way the variance is calculated depends on the assumption made about the distribution of the population from which the sample has been taken (Cliff and Ord 1981 [4], Goodchild 1986 [7]).

If it is a normal distribution, then:

$$var(I) = \frac{n^2 S_1 - n S_2 + 3 S_0^2}{(n^2 - 1) S_0^2} - E(I)^2$$

where:
$S_1 = \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}(w_{ij} + w_{ji})^2$,
$S_2 = \sum_{i=1}^{n}\left(\sum_{j=1}^{n} w_{ij} + \sum_{j=1}^{n} w_{ji}\right)^2$.

If it is a random distribution, then:

$$var(I) = \frac{n\left[(n^2 - 3n + 3) S_1 - n S_2 + 3 S_0^2\right] - b_2\left[(n^2 - n) S_1 - 2 n S_2 + 6 S_0^2\right]}{(n-1)(n-2)(n-3) S_0^2} - E(I)^2$$

where:
$b_2 = \frac{n\sum_{i=1}^{n}(x_i - \bar{x})^4}{\left(\sum_{i=1}^{n}(x_i - \bar{x})^2\right)^2}$.

The statistic $Z$ has, asymptotically (for large sample sizes), the normal distribution.

On the basis of the test statistic, the p-value is estimated and then compared with the significance level $\alpha$:
if $p \le \alpha$ ⟹ we reject $H_0$ and accept $H_1$,
if $p > \alpha$ ⟹ there is no reason to reject $H_0$.

The window with settings for Moran's analysis is accessed via the menu Spatial statistics → Tools → Moran's global statistic (Global Moran's I statistic).

EXAMPLE 5.1 (catalog: leukemia, file: leukemia.pqs)
The analysis will concern the data gathered and analyzed by L. A. Waller and others in 1992 [11] and 1994 [12], described on 281 objects in 2004 [13]. The map leukemia contains information about the locati
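The test under the normality assumption can be sketched in Python as follows. This is only an illustration: the formula for Moran's I itself is not part of this excerpt, so the sketch assumes the standard definition $I = \frac{n}{S_0}\cdot\frac{\sum_i\sum_j w_{ij}(x_i-\bar{x})(x_j-\bar{x})}{\sum_i (x_i-\bar{x})^2}$ with $E(I) = -\frac{1}{n-1}$, and a two-sided p-value. The function name `morans_i_test` is hypothetical.

```python
import math


def morans_i_test(x, w):
    """Return (I, Z, p) for Moran's I with the variance under normality."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    s0 = sum(sum(row) for row in w)
    # Moran's I (standard definition, assumed here)
    num = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    i_stat = (n / s0) * num / sum(d * d for d in dev)
    expected = -1.0 / (n - 1)
    # S1 and S2 as defined for the variance formulas
    s1 = 0.5 * sum((w[i][j] + w[j][i]) ** 2
                   for i in range(n) for j in range(n))
    s2 = sum((sum(w[i]) + sum(w[j][i] for j in range(n))) ** 2
             for i in range(n))
    var = (n * n * s1 - n * s2 + 3 * s0 ** 2) / ((n * n - 1) * s0 ** 2) \
        - expected ** 2
    z = (i_stat - expected) / math.sqrt(var)
    # two-sided p-value from the standard normal distribution (assumption)
    p = math.erfc(abs(z) / math.sqrt(2))
    return i_stat, z, p


chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
i_stat, z, p = morans_i_test([1, 2, 3, 4], chain)
print(i_stat, z, p)
```

The four-object example is far too small for the asymptotic normal approximation to be meaningful; it only serves to make the arithmetic easy to follow.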
3. The window with settings for weights matrices is accessed via the menu Spatial analysis → Tools → Spatial weights matrix.

[Screenshot: Spatial weights matrix window, with the options: Select the source, Weights matrix (According to distance / According to contiguity), Type of contiguity, Neighborhood 0/1, Row standardization]

1.6.1 Weights matrix according to distance
To create a weights matrix based on the distances between points, we need data from a map which contains objects such as points, multipoints or polygons. For polygons, calculations are based on centroids; for multipoints, they are based on the centers of objects. A description can be found in the User Guide - PQStat, in the section on the similarity matrix.

1.6.2 Weights matrix according to contiguity
To create a weights matrix based on the proximity of objects (contiguity), we need data from a map which contains objects such as multipoints or polygons.
Type of contiguity
Contiguity is usually understood either as a common section of non-zero length, i.e. a section longer than 1 point (the Rook type neighborhood), or as any common section, including one of zero length, i.e. a point (the Queen type neighborhood). Weights matrix according to co
4. Competition among species influences changes in the distribution of particular species of plants and their density. Competition within a species is usually stronger than competition among different species, as members of the same species have almost identical demands and compete for the same resources. The intensity of competition within a species increases with the growth of the population. To check the influence of competition on a certain species of balsamic poplar, a wooded area not regulated by man was studied. Locations of young trees and of old ones were studied.
Map T poplar contains fictitious information about the locations of 121 points (old balsamic poplars) in a rectangular wooded area.
Map S poplar contains fictitious information about the locations of 326 points (young balsamic poplars) in a rectangular wooded area.

[Figure: map of the locations of young and old poplars]

On the map, young poplars are marked in red and old poplars in blue. On the basis of the nearest neighbor indices the structure of poplar density was compared in the area defined by a
5. [...] 5, 115-45.
[7] Goodchild M. F. (1986), Spatial Autocorrelation. CATMOG 47, Geobooks, Norwich, UK.
[8] Moran P. A. P. (1947), The Interpretation of Statistical Maps. Journal of the Royal Statistical Society, B10, 243-51.
[9] O'Rourke J. (1998), Computational Geometry in C, 2nd ed. Massachusetts: Smith College.
[10] De Smith M. J., Goodchild M. F., Longley P. A. (2007), Geospatial Analysis: A Comprehensive Guide to Principles, Techniques and Software Tools, 2nd ed. Matador.
[11] Waller L. A., Turnbull B. W., Clark L. C., Nasca P. (1992), Chronic disease surveillance and testing of clustering of disease and exposure: Application to leukemia incidence and TCE-contaminated dumpsites in upstate New York. Environmetrics, 3, 281-300.
[12] Waller L. A., Turnbull B. W., Clark L. C., Nasca P. (1994), Spatial pattern analyses to detect rare disease clusters. In: Case Studies in Biometry, N. Lange et al. (Editors). John Wiley and Sons, New York, 3-23.
[13] Waller L. A., Gotway C. A. (2004), Applied Spatial Statistics for Public Health Data. New York: John Wiley and Sons.
[14] Yamamoto J. K. (1997), A Pascal program for determining the convex hull for planar sets. Computers and Geosciences, 23, n. 7, 725-738.
6. [Figure: map of clusters of high values (High-High)]

We were able to localize 3 clusters (6 census tracts in the analysis of the $G_i$ coefficient and 4 tracts in the analysis of the $G_i^*$ coefficient) in which the prevalence of leukemia is significantly higher. They are the centers of clusters with high values of leukemia, marked in red on the map.

The obtained results can be additionally illustrated by coloring the map so as to present the values of the local Getis and Ord's coefficient, the values of the test statistic, or the p-values. One just has to copy the appropriate columns from the report and paste them into a datasheet. In this example we will use the values of the $Z_G$ test statistic for coloring. Having pasted them into an empty column of a datasheet, in the map manager we color the base map according to the values of that column, selecting the standard deviation with the coefficient 3 as the way of grading colors.

Positive and high values of the $Z$ statistic point to a concentration of objects with high values, negative and low values point to a concentration of objects with low values, and values near zero point to a random spatial distribution of the studied variable. [...] a map can be added with means and confidence intervals
7. [...] a spatial weight matrix. In Moran's analysis window we can choose a matrix generated previously by using the menu Spatial analysis → Tools → Spatial weights matrix, or indicate the neighbor matrix according to contiguity (Queen, row standardized) that is proposed by the program.

The basic statistics for the analysis of the nearest neighbors are:
• $d_i(NN)$: the distance of each point from its nearest neighbor;
• $\overline{NN}$: the mean nearest neighbor distance, $\overline{NN} = \frac{\sum_{i=1}^{n} d_i(NN)}{n}$;
• $SD_{NN}$: the standard deviation of the nearest neighbor distances;
• $\overline{NN}_{ran}$: the mean random nearest neighbor distance.

Nearest Neighbor Index
The Nearest Neighbor Index (NNI) is based on a method described by the botanists Clark and Evans (1954 [3]). NNI compares the distances observed between the nearest points with the distances which would appear for a random distribution of points:

$$NNI = \frac{\overline{NN}}{\overline{NN}_{ran}}$$

When the compared distances are the same, NNI = 1. When the observed distances between the nearest points are smaller than expected, the points are nearer to one another than in a random distribution and NNI < 1. In such a case clusters occur. In the reverse situation NNI > 1, which points to the effect of uniform distribution, i.e. points are distributed more regularly than in the case of a random distribution.

Significance of the Nearest Neighbor Index
The test for checking the significance of the Nearest Neighbor Index (NNI) serves the purpos
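The index can be sketched in a few lines of Python. The excerpt above does not give the formula for the mean random nearest neighbor distance, so the sketch assumes the classical Clark and Evans value $0.5\sqrt{A/n}$, where $A$ is the size of the studied area; all function names are illustrative and the code is not PQStat's implementation.

```python
import math


def nearest_neighbor_distances(points):
    """Euclidean distance from each point to its nearest neighbor."""
    return [min(math.dist(p, q) for j, q in enumerate(points) if j != i)
            for i, p in enumerate(points)]


def expected_random_nn_distance(area, n):
    """Mean NN distance expected for n random points (Clark and Evans, assumed)."""
    return 0.5 * math.sqrt(area / n)


def nearest_neighbor_index(points, area):
    """NNI: observed mean NN distance divided by the expected random one."""
    d = nearest_neighbor_distances(points)
    observed_mean = sum(d) / len(d)
    return observed_mean / expected_random_nn_distance(area, len(points))


# Four points on a regular grid inside a 2 x 2 area: a uniform pattern.
grid = [(0, 0), (1, 0), (0, 1), (1, 1)]
print(nearest_neighbor_index(grid, area=4.0))
```

For the regular grid the index is greater than 1, which corresponds to the uniform distribution case described above.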
8. [...] a very similar result when we return to the analysis (using the toolbar button) and choose the convex hull as the bounding: NNI = 1.382828, p < 0.000001.

[Report excerpt]
Analysed variables: SHP_X, SHP_[...]
Significance level: 0.05
Spatial weights matrix: Euclidean (all elements)
Bounding types (points): Convex Hull
Number of points: 379
Area: 306696110047.008
Density: [...]
The nearest neighbor (NN)
Mean Distance (NN): 19668.564923
Standard Dev. Distance (NN): 9102.847069
Expected Mean Distance (NN): 14223.436336
Nearest Neighbor Index (NNI): 1.382828
Standard error: 381.906196
Z statistic: 14.257764
p-value: < 0.000001

We add the boundaries defined by the convex hull by pressing the MAP button and choosing the layer of bounding.

Correcting for the effect of a boundary defined in this way lowers the value of NNI to 1.340503, but leaves the general tendency of the subsequent nearest neighbor indices unchanged.

[Figure: Nearest Neighbor Index (NNI) for subsequent orders of neighbors]

In each of the analyses described above the subsequent neighbor indices are greater than 1 and, although they initially approach 1, from order 5 onwards they stabilize at a level of about 1.1. The result thus confirms the uniform distribution of Polish powiats.

EXAMPLE 4.2 (directory: poplar, SHP files: T poplar, S poplar)
9. [...] are usually treated as weights in spatial analyses, and in this way make it possible to use the information from the map.

The simplest form of a weights matrix is a neighbor matrix. A neighbor matrix is a square table with zeros on the main diagonal, in which the neighborhood of objects is marked with a binary value:
1 for neighboring objects,
0 for non-neighboring objects.

object   1  2  3  4  5  6  7  8
1        0  1  1  1  0  0  0  0
2        1  0  1  0  0  1  1  0
3        1  1  0  1  1  1  0  0
4        1  0  1  0  1  0  0  0
5-8      [rows illegible in the source]

Table 1: Example of a neighbor matrix

In statistical analyses the most commonly used matrices are row-standardized matrices, in which the values in each row sum to one. Row standardization means that each weight is divided by the sum of its row (the sum of the weights of all neighboring elements). As a result, the obtained weights are in the range from 0 to 1. In analyses based on a weights matrix standardized in this way, the influence of objects with varying numbers of neighbors is balanced.

object   1    2    3    4    5    6    7    8
1        0    1/3  1/3  1/3  0    0    0    0
2        1/4  0    1/4  0    0    1/4  1/4  0
3        1/5  1/5  0    1/5  1/5  1/5  0    0
4        1/3  0    1/3  0    1/3  0    0    0
5-7      [rows illegible in the source]
8        0    0    0    0    1/3  1/3  1/3  0

The selected weights matrix should reflect the spatial relationships which connect the analyzed objects. The more realistically the matrix reflects the mutual influence of objects in space, the more exact the results that will be obtained
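Row standardization as described above can be sketched in Python as follows. The function name `row_standardize` is illustrative, and the treatment of an object without neighbors (its row is left as zeros) is an assumption, since the text does not cover that case.

```python
def row_standardize(w):
    """Divide every weight by the sum of its row; all-zero rows stay zero."""
    standardized = []
    for row in w:
        total = sum(row)
        standardized.append([v / total if total else 0.0 for v in row])
    return standardized


# Row 4 of the neighbor matrix in Table 1: neighbors are objects 1, 3 and 5.
row_4 = [1, 0, 1, 0, 1, 0, 0, 0]
print(row_standardize([row_4])[0])
```

Each of the three neighbors receives the weight 1/3, which agrees with the corresponding row of the standardized table.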
10. [...] placement of NNI in reference to the line which shows the random point structure, and so as to check whether a growing or falling trend has been achieved for the indices.

Edge Effect
Objects placed near the bounding tend to be further away from their nearest neighbors than other objects within the analysed area. The reason is the simple fact that the nearest neighbors of objects near the border can be objects outside the studied area. In such a situation we can conduct an analysis with an adjustment for the edge effect. In that case the distance of a point from its nearest neighbor, $d(NN)$, is calculated as the minimum of the distance of the point from its neighbors and its distance from the boundary. Thus, if the distance of the point from the boundary is smaller than the distance from its neighbors, the distance from the boundary is taken as $d(NN)$. However, such a calculation of the nearest neighbor requires the assumption that there will always be a neighboring point on the border.

The window with settings for Nearest Neighbor Analysis is accessed via the menu Spatial analysis → Spatial Statistics → Nearest Neighbor Analysis.

[Screenshot: Nearest Neighbor Analysis window, with the options: Test options, Bounding types, Edge effect correction, Order of neighbour, Add map layers]

EXAMPLE 4.1 (directory: districts, SHP files: districts)
The administrative division of Poland into po
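The edge effect adjustment described above can be sketched in Python. For simplicity the sketch assumes that the bounding is an axis-aligned rectangle given by its corner coordinates and that all points lie inside it; other bounding types (convex hull, circle) would need a different distance-to-boundary function. All names are illustrative.

```python
import math


def nn_distances(points):
    """Euclidean distance from each point to its nearest neighbor."""
    return [min(math.dist(p, q) for j, q in enumerate(points) if j != i)
            for i, p in enumerate(points)]


def distance_to_rectangle_edge(p, xmin, ymin, xmax, ymax):
    """Distance from an interior point p to the nearest side of the rectangle."""
    x, y = p
    return min(x - xmin, xmax - x, y - ymin, ymax - y)


def edge_corrected_nn_distances(points, xmin, ymin, xmax, ymax):
    """d(NN) taken as the smaller of the neighbor distance and the boundary distance."""
    return [min(d, distance_to_rectangle_edge(p, xmin, ymin, xmax, ymax))
            for p, d in zip(points, nn_distances(points))]


pts = [(1, 5), (5, 5), (9, 5)]
print(nn_distances(pts))
print(edge_corrected_nn_distances(pts, 0, 0, 10, 10))
```

In this example the two outer points lie closer to the boundary than to their neighbors, so their corrected distances shrink, while the central point is unaffected.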
11. [...] point distribution: the weighted mean of the coordinates of the X axis and the Y axis:

$$\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}, \qquad \bar{y}_w = \frac{\sum_{i=1}^{n} w_i y_i}{\sum_{i=1}^{n} w_i}$$

where $w_i$ are the weights representing the value of a feature in the $i$-th object.

• Weighted circle
The radius of the circle is $wsdd$, the weighted standard distance from the center, expressed with the formula:

$$wsdd = \sqrt{\frac{\sum_{i=1}^{n} w_i d_{x_i}^2 + \sum_{i=1}^{n} w_i d_{y_i}^2}{\sum_{i=1}^{n} w_i - 2}}$$

where $d_{x_i} = x_i - \bar{x}_w$ and $d_{y_i} = y_i - \bar{y}_w$.

Note: In the formulas concerning the length of the radius of a circle and of a semiaxis of an ellipse, the denominator was decreased by the value 2 (Buliung 2008 [2], Smith 2007 [10]).

The window with settings for Descriptive statistics is accessed via the menu Spatial analysis → Spatial descriptive statistics.

[Screenshot: Spatial descriptive statistics window, with the options: Test options, Bounding types (points): Convex Hull, Add analysed data, Spatial Weights Matrix, Add graph, Add map layers]

EXAMPLE 3.1 (directory: snow, SHP files: deaths, pumps, streets)
The data for the analysis are probably the best known classical example of the use of cartography in epidemiology. They present the epidemic of cholera in London in 1854. The map which presents the range of the epidemic was made by John Snow, a doctor and the discoverer of the cause of the epidemic, considered to be one of the fo
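The weighted center and the weighted standard distance can be sketched in Python as below. The function names and sample data are illustrative; the sketch follows the denominator "sum of weights minus 2" mentioned in the note above and therefore assumes that the weights sum to more than 2.

```python
import math


def weighted_mean_center(xs, ys, ws):
    """Weighted mean of the X and Y coordinates."""
    total = sum(ws)
    xw = sum(w * x for w, x in zip(ws, xs)) / total
    yw = sum(w * y for w, y in zip(ws, ys)) / total
    return xw, yw


def weighted_standard_distance(xs, ys, ws):
    """Radius of the weighted circle (wsdd), denominator decreased by 2."""
    xw, yw = weighted_mean_center(xs, ys, ws)
    num = sum(w * ((x - xw) ** 2 + (y - yw) ** 2)
              for w, x, y in zip(ws, xs, ys))
    return math.sqrt(num / (sum(ws) - 2))


# Corners of a 2 x 2 square with equal weights.
xs, ys, ws = [0, 2, 0, 2], [0, 0, 2, 2], [1, 1, 1, 1]
print(weighted_mean_center(xs, ys, ws), weighted_standard_distance(xs, ys, ws))
```

With unequal weights the center moves toward the objects with larger weights, for example two points at x = 0 and x = 4 with weights 3 and 1 give a center at x = 1.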
12. [...] the information from neighboring areas, or on using a larger amount of information from the studied region.

1.7.1 Locally weighted averages
The method is based on transforming the value of the studied variable $X$, with elements $x_1, x_2, ..., x_n$, into a new smoothed variable $smooth\_X$, with elements $smooth\_x_1, smooth\_x_2, ..., smooth\_x_n$. The transformation consists of calculating the arithmetic mean of the values of the variable $X$ for the studied object and its neighboring objects, in accordance with the given weights matrix. The zero elements of the main diagonal of the weights matrix are replaced with the value 1.

Locally weighted averages (simple)
The value $x_i$ is transformed into the smoothed value $smooth\_x_i$ according to the formula:

$$smooth\_x_i = \frac{\sum_{j=1}^{n} w_{ij} x_j}{\sum_{j=1}^{n} w_{ij}}$$

where:
$n$ is the number of spatial objects (the number of points or polygons),
$x_i$, $x_j$ are the values of the variable for the compared objects,
$w_{ij}$ are the elements of the spatial weights matrix.

If the smoothing is made with the use of a neighbor matrix, which only carries information about the presence (1) or absence (0) of neighborhood, then $smooth\_x_i$ is simply the mean value for the studied object and its neighboring objects. However, if a weights matrix is used, e.g. according to the inverse Euclidean matrix inside a circle with the radius $d$, and it carries the in
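A minimal Python sketch of the simple locally weighted average, assuming the variable is a list `x` and the weights matrix a list of lists `w`; the function name is illustrative. As stated above, zero elements of the main diagonal are replaced with 1 so that the object itself enters the average.

```python
def locally_weighted_average(x, w):
    """Smooth x with weights w; zero diagonal elements are replaced with 1."""
    n = len(x)
    smoothed = []
    for i in range(n):
        row = list(w[i])  # copy, so the caller's matrix is not modified
        if row[i] == 0:
            row[i] = 1
        smoothed.append(sum(row[j] * x[j] for j in range(n)) / sum(row))
    return smoothed


# Three objects in a chain (binary neighbor matrix).
neighbors = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
print(locally_weighted_average([1, 2, 6], neighbors))
```

With a binary neighbor matrix each smoothed value is the plain mean of the object and its neighbors, e.g. (1 + 2 + 6) / 3 = 3 for the middle object.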
13. [Screenshot: layer list with the entries Descriptive statistics: Center, Descriptive statistics: Standard de[...], Preview, Data: streets.shp]

The list also allows switching the visibility of a layer on and off, changing the order in which layers are added, editing the layers, and deleting them. If the source of a layer (a report or a map linked to the layer) is removed, then such a layer is automatically removed from the layer list.

1.3.4 Map Style Edition
We can edit map layers by choosing the edit button in the map list. The manner of editing depends on the type of objects presented on a given map (points, multipoints, lines, polygons). It is possible to select the style of lines, the color fill and the level of its transparency. By default the coloring uses one color only. For layers representing the base map there are a number of coloring methods.

Coloring Methods
Full color: when this method is used, all objects are colored in the same way, with one color only (the button Fill).
Color gradation: when this method is used, objects are colored according to the value assigned to them in a selected datasheet variable (the button Color gradation). For example, when coloring
14. [Map legend: graduated colors with class breaks at 1.84194427, 8.241888 and 18.344333, and an upper limit of 140.1622378]

We have at our disposal several ways of coloring a map; we choose coloring in accordance with the values of the variable prev, dividing it into quartiles.

Dark colors on the map present the places with a higher frequency coefficient of leukemia, whereas light places signify a lower frequency coefficient. In order to learn whether their geographic distribution is random or whether they form clusters, we will calculate Moran's coefficient. Before calculating that coefficient we should determine the manner of defining the neighborhood of regions, and it is advisable to create an appropriate weights matrix. In Moran's analysis window we can choose any matrix generated previously by using the menu Spatial analysis → Tools → Spatial weights matrix, or indicate the neighbor matrix according to contiguity (Queen, row standardized) that is proposed by the program.

[Screenshot: Spatial weights matrix window for the map leukemia, with the options: Select the source, Weights matrix (According to distance / According to contiguity), Queen, Neighborhood 0/1, Row standardization, Replace empty cells]

Having generated the weights matrix, we select the file
15. [...]IAL AUTOCORRELATION

To conduct a spatial autocorrelation analysis on the basis of map data, we need a point, multipoint or polygonal file. In the case of an analysis of a polygonal file based on the calculation of distances between objects, calculations are based on the centroids of polygons, and in the case of a multipoint file they are based on the centers of objects.

An analysis of the phenomenon of autocorrelation is based on values assigned to spatial objects. Spatial autocorrelation means that the values of geographically near objects are more similar to one another than those of remote objects. The phenomenon causes the creation of spatial clusters with similar values. Spatial autocorrelation may not occur; we then speak of spatial randomness: the obtained spatial distribution is as probable as any other distribution. When the neighboring values are similar to one another, we can speak about positive autocorrelation. Negative autocorrelation occurs when the values of neighboring areas are more varied than in the case of a random distribution.

[Figure: negative autocorrelation, a lack of autocorrelation, positive autocorrelation]

When analyzing autocorrelation we can consider a dichotomous variable, i.e. the presence or absence of a given feature, or a variable with many categories, pointing to the degree of intensity of the analyzed feature. For a dichotomous variable the analysis of positive autocorrelation consists of searching for clusters with
16. PQStat Software - Statistical Computational Software
User Guide - PQStat - Spatial Analysis
Barbara Wieckowska
COPYRIGHT 2010-2014 PQSTAT SOFTWARE. All rights reserved.
Version 1.4.8
www.pqstat.pl

Contents
1 SPATIAL ANALYSIS
1.1 BASIC DEFINITIONS
1.2 MAP OPENING
1.3 MAP MANAGER
1.3.1 Map Viewing Tools
1.3.2 Selection Area Tools
1.3.3 [title illegible in the source]
1.3.4 Map Style Edition
1.4 HOW TO REDUCE A WORKSPACE
1.5 GEOMETRIC CALCULATIONS
1.6 SPATIAL WEIGHTS MATRIX
1.6.1 Weights matrix according to distance
1.6.2 Weights matrix according to contiguity
1.7 SPATIAL SMOOTHING
1.7.1 Locally weighted averages
2 TESTING HYPOTHESES
3 DESCRIPTIVE STATISTICS
4 ANALYSIS OF THE RANDOMNESS OF POINT DISTRIBUTION
4.1 Nearest Neighbor Analysis
5 SPATIAL AUTOCORRELATION
5.1 Global Moran's I statistic
5.2 Global Geary's C statistic
6 LOCAL ESTIMATE OF SPATIAL CLUSTERING
6.1 Local Moran's I statistic [...]
17. Dark colors on the map present the places with a higher prevalence of leukemia, whereas light places signify a lower prevalence. Geary's correlation coefficient obtained in the analysis equals 0.884986.

[Report excerpt: results under the normality assumption (variance, Z statistic); values illegible in the source]

The result obtained assuming a random distribution of data differs from the result obtained with the assumption of a normal distribution. That can indicate instability of the results and point to the need for further analyses based on smoothed variables.

6 LOCAL ESTIMATE OF SPATIAL CLUSTERING
In the local analysis we try to define clusters according to their placement, size and intensity. A cluster is understood as a limited gathering of objects of a certain intensity, placed in space and/or time, the accidental appearance of which is highly improbable. If we identify such a gathering, which is not accidental but is a statistically significant cluster, we can infer the reasons for its occurrence.

6.1 Local Moran's I statistic
Local Moran's I statistic is the most popular of the analyses defined as LISA (Local Indicators of Spatial Association; Luc Anselin 1995 [1]). In cont
18. [...] verifying the hypothesis about a lack of spatial autocorrelation.

Hypotheses:
$H_0: c = 1$,
$H_1: c \ne 1$.

The test statistic has the form presented below:

$$Z = \frac{c - E(c)}{\sqrt{var(c)}}$$

where:
$E(c) = 1$ is the expected value,
$var(c)$ is the variance.

The way the variance is calculated depends on the assumption made about the distribution of the population from which the sample has been taken (Cliff and Ord 1981 [4], Goodchild 1986 [7]).

If it is a normal distribution, then:

$$var(c) = \frac{(2 S_1 + S_2)(n - 1) - 4 S_0^2}{2 (n + 1) S_0^2}$$

where $S_1$ and $S_2$ are defined as for Moran's analysis.

If it is a random distribution, then:

$$var(c) = \frac{(n-1) S_1\left[n^2 - 3n + 3 - (n-1) b_2\right]}{n (n-2)(n-3) S_0^2} - \frac{(n-1) S_2\left[n^2 + 3n - 6 - (n^2 - n + 2) b_2\right]}{4 n (n-2)(n-3) S_0^2} + \frac{S_0^2\left[n^2 - 3 - (n-1)^2 b_2\right]}{n (n-2)(n-3) S_0^2}$$

where:
$b_2 = \frac{n\sum_{i=1}^{n}(x_i - \bar{x})^4}{\left(\sum_{i=1}^{n}(x_i - \bar{x})^2\right)^2}$.

The statistic $Z$ has, asymptotically (for large sample sizes), the normal distribution.

On the basis of the test statistic, the p-value is estimated and then compared with the significance level $\alpha$:
if $p \le \alpha$ ⟹ we reject $H_0$ and accept $H_1$,
if $p > \alpha$ ⟹ there is no reason to reject $H_0$.

The window with settings for Geary's analysis is accessed via the menu Spatial analysis → Spatial statistics → Global Geary's C statistic.

[Screenshot: Global Geary's C statistic window, with the options: Statistical analysis, Spatial Weights Matrix, Report options, Add an[...]
19. [...] a map which shows altitude, the color shade for points lying higher will be different from that for points lying lower. The variable according to which we do the coloring should only contain numerical values. If that is not the case, then an object for which there is no numerical value is not colored according to the coloring method chosen for that variable, but has the default color for the map.

Methods for variable breaks used in color gradation:
• Natural Breaks (Jenks): a method in which a variable is broken into classes such that the variance within classes is minimized and the variance among classes is maximized.
• Quantile Breaks: a method in which a variable is broken into classes with an equal number of units.

1.4 HOW TO REDUCE A WORKSPACE
The workspace is limited in order to indicate only those objects which will be subjected to the analysis. Such objects are indicated in the program by activating or deactivating them. Inactive objects are not subjected to statistical analyses.

Manual activation/deactivation of objects:
• Indicating the row in the datasheet which describes the appropriate object and selecting the option Activate/Deactivate from the context menu on its name.
• Indicating an object on the map and selecting from the context menu the option Activate/Deactivate or Identify → Activate/Deactivate object.

Automatic activation/deact
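The idea behind Quantile Breaks (classes with an equal number of units) can be sketched in Python as follows. This rank-based sketch is only one possible way of forming such classes and is not necessarily how PQStat computes them; in particular, tied values may end up in different classes. The function name is illustrative.

```python
def quantile_classes(values, k):
    """Assign each value a class 0..k-1 so that classes have (nearly) equal counts."""
    n = len(values)
    # indices of the values, ordered from the smallest to the largest value
    order = sorted(range(n), key=lambda i: values[i])
    classes = [0] * n
    for rank, i in enumerate(order):
        classes[i] = rank * k // n
    return classes


# Eight values divided into quartiles (k = 4): two values per class.
data = [5, 1, 9, 3, 7, 2, 8, 4]
print(quantile_classes(data, 4))
```

Each class index could then be mapped to one shade of the color gradation, with darker shades for higher classes.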
20. [...] a point with coordinates of the X axis and the Y axis calculated as the mean of the coordinates of the points constituting the polygon vertices.
Centers can be drawn on the basis of calculations made on the map (in such a case we choose the option Draw and calculate based on map data) or on the basis of existing points, the coordinates of which are in the datasheet (we then choose the option Draw based on the datasheet).

• Layer → Label for an object: a layer of the text type.
A label is any text or number concerning objects presented on a map. Objects can be described by choosing from the datasheet a variable which contains proper labels.

• Layer → Bounding types: a layer of the polygonal type.
Min. Bounding (convex hull): the smallest convex polygon in which the analysed objects are enclosed (Yamamoto J. K. 1997 [14]).
Min. Bounding (rectangle): the smallest rectangle in which the analysed objects are enclosed.
Min. Bounding (circle): the smallest circle in which the analysed objects are enclosed.
Rectangle from map bounding: a rectangle in which the analysed objects are enclosed, with the coordinates of the lower left-hand vertex (min X, min Y) and of the upper right-hand vertex (max X, max Y).

Layer List: the layer list makes it possible to check how many visible layers constitute the received image (a button on the tool bar).

[Screenshot: layer list with the entries Base map and Descriptive statistics: Selected bo[...]
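Two of the simpler constructions above, the center computed as the mean of the polygon vertices and the rectangle from map bounding, can be sketched in Python. The function names are illustrative; the convex hull and the smallest circle require more involved algorithms and are not shown.

```python
def mean_center(vertices):
    """Point whose X and Y are the means of the vertex coordinates."""
    n = len(vertices)
    return (sum(x for x, _ in vertices) / n,
            sum(y for _, y in vertices) / n)


def bounding_rectangle(points):
    """Lower left (min X, min Y) and upper right (max X, max Y) vertices."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (min(xs), min(ys)), (max(xs), max(ys))


square = [(0, 0), (2, 0), (2, 2), (0, 2)]
scattered = [(1, 2), (4, 0), (3, 5)]
print(mean_center(square))
print(bounding_rectangle(scattered))
```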
21. [...] values. The light red color is used for the census tract with high values of the coefficient describing the prevalence of leukemia. The region contrasts with the neighboring census tracts, which are characterized by a relatively low coefficient.

The obtained results can be further illustrated when the map is colored with the values of the local Moran's $I$ coefficient, the values of a test statistic, or p-values. One just has to copy the appropriate columns from the report into a datasheet. In this example we will use the values of the $Z_I$ test statistic for coloring. Having pasted them into an empty column of a datasheet, in the map manager we color the base map according to the values of that column, selecting the standard deviation with the coefficient 3 as the way of grading colors.

Positive and high values of the $Z$ statistic point to the occurrence of clusters of similar values, while negative and low values of that statistic point to the occurrence of the so-called hot spots. Values close to 0 point to a random distribution of the studied variable in space.

By analyzing the smoothed variable prev we strengthen the clusterization effect. We obtain a similar result, but this time we localize 3 clusters (19 census tracts which are cluster centers).

[Report excerpt: Analysed variables, Significance level, Corrected significan[...]
22. [...]alysed data, Add graph]

EXAMPLE 5.1, cont. (catalog: leukemia, file: leukemia)
We will analyze the data concerning leukemia.
The map leukemia contains information about the location of 281 polygons (census tracts) in the northern part of the state of New York.

Data for the map leukemia:
Column CASES: the number of cases of leukemia in the years 1978-1982, ascribed to particular objects (census tracts). The value should be an integer; however, in agreement with Waller's (1994) description, some cases which could not be objectively ascribed to a particular region have been divided proportionately. Hence the counts of cases ascribed to the 281 objects are not integers.
Column POP: the population size in particular objects.
Column prev: the frequency coefficient of leukemia per 100000 people for each object in one year: prev = (CASES / POP) · 100000 / 5.

Global Moran's analysis has pointed to a lack of spatial autocorrelation. This time, in order to check whether it is possible to localize clusters of leukemia in the studied area of the northern part of the state of New York, we will compute the global Geary's C statistic.

We start from the presentation of the geographic distribution of the prevalence coefficient (prev) on the map, according to the values of the prev variable, dividing it into quartiles.
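The prev column can be reproduced with a one-line calculation. The sketch below assumes, as in the description above, that CASES covers a five-year period; the function name and the sample numbers are illustrative and are not taken from the leukemia file.

```python
def prevalence_per_100k(cases, population, years=5):
    """Yearly frequency coefficient per 100000 people: CASES / POP * 100000 / years."""
    return cases / population * 100000 / years


# A hypothetical census tract: 2.5 cases over five years among 5000 people.
print(prevalence_per_100k(2.5, 5000))
```

Fractional case counts are accepted, which matches the note above that proportionately divided cases need not be integers.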
23. [...] map.

A map can be opened with the help of the Map Manager:
• menu File → Open file, if we open a map from a shapefile (SHP);
• menu File → Project maps, or the button on the tool bar, if we open a map from the PQStat program;
• in the Navigation tree of the PQStat program: the context menu Map Manager on the name of the datasheet linked to the map.

The image which presents a map can be exported to a file in the BMP, PNG or JPG format by choosing, in the Map Manager window, the menu File → Export view.

1.3.1 Map Viewing Tools
Zoom in: shows the map at a larger scale so that its details can be seen.
Zoom out: shows the map at a smaller scale so that all its parts can be seen.
Adjust to the window: displays the whole image in the window.
Select: makes it possible to choose a rectangular part of a map, which will be enlarged and adjusted to the window size.
Grabber: makes it possible to move the image in the browser window so as to place a given part of the image in a chosen position.

As we browse the map, we also get a tooltip with the ID and the name of the object we point to with the cursor. The name is loaded from the datasheet: it is the variable indicated as active in the Map Manager. By default, during import the first variable of the text type is set as active. We can get more information about th
24. [...] shapefile (.shp).

[Screenshot: Import data from SHP/SHX/DBF file window; file deaths.shp from the snow directory, attribute file encoding Windows-1250, shape type Point, number of objects 578, with a data preview]

In the import window we can preview the imported map and its attributes saved in a DBF file. If the directory from which we import contains all the files necessary for loading the map, then the correct reading of the appropriate files is confirmed in yellow by the proper controls. Attributes ascribed to a shapefile in the form of a DBF database are not required for proper loading of a map. An attribute table can be completed after a map file has been loaded, by filling in the proper cells of the datasheet linked with the map.

1.3 MAP MANAGER
The Map Manager is a tool for managing a map and the layers ascribed to the map. It is possible to view both maps imported to the PQStat program and maps opened directly from an SHP file.

[Screenshot: Map Manager window with the menus File, Tools, Feature, Layers, Help]

The Map Manager is activated through:
• the menu Spatial analysis → Map Manager;
• the button on the tool bar;
• the context menu Map Manager on the name of a datasheet linked to the m
25. are based on centroids of polygons, and in the case of a multipoint file they are based on centers of objects.

The effect of uniform dispersion appears when points are distributed more regularly than would result from a random distribution. If the spatial distribution is as probable as any other distribution, we speak of spatial randomness. When the points come in groups, we speak of a clustered distribution.

[Figure: three point patterns: uniform distribution, random distribution, clustered distribution]

4.1 Nearest Neighbor Analysis

In the Nearest Neighbor Analysis, the boundaries of the area enclosing the analysed points have a crucial influence on the result. The example below shows regularly distributed points which appear clustered once they are bounded by a large rectangle. Depending on the needs, the bounding can be defined with the help of a convex hull, the smallest rectangle, a rectangle from the bounding layer, or the smallest circle. The studied area can also be defined solely by the size of its area.

The distance between the points is measured with the Euclidean metric. The first stage of the nearest neighbor analysis is calculating the distances between all points. Next, for each point we search for the nearest point, i.e. the nearest neighbor (NN).

Note: The distances between all points are defined by
26. as for data which describe map geometry (geometric / geographic functions). Data for transformation are chosen from a shapefile (SHP).

[Screenshot: the Data transformations window, with the option "Transformation only in the selected area", the list of input columns, the list of transformation functions, and the transformation results options "Insert to existing fields" and "Insert new fields"]

Available formulas:
meanCenter poly: gives center coordinates for polygons,
centroid poly: gives centroid coordinates for polygons,
area poly: gives polygon areas,
perimeter poly: gives polygon perimeters.

• Formulas for data visible in a datasheet (creating maps)

Available formulas:
map points: gives a vector map presenting points together with the assigned datasheet.

1.6 SPATIAL WEIGHTS MATRIX

Spatial relations among objects presented on a map can be organized in the form of a matrix. Such matrices are called weights matrices. Because of their large size and great amount of detail, weights matrices are not presented directly in the results of a study; instead, they form the basis of further analysis. The data included in the matrices
27. boring objects:

$$L(x_i) = \sum_{j=1}^{n} w_{ij}\,\mathrm{stand}(x_j)$$

A graphic presentation of spatial autocorrelation is Moran's scatter plot.

Points in the first quadrant (HH) and in the third quadrant (LL) are objects surrounded by similar neighbors:
HH (high-high): objects with high values surrounded by objects with high values,
LL (low-low): objects with low values surrounded by objects with low values.

Points in the second quadrant (LH) and in the fourth quadrant (HL) are objects surrounded by neighbors not similar to them:
LH (low-high): objects with low values surrounded by objects with high values,
HL (high-low): objects with high values surrounded by objects with low values.

The distribution of points among the four quadrants of Moran's diagram indicates the type of autocorrelation. If the points lie mainly in the second (LH) and fourth (HL) quadrants, it is a sign of negative autocorrelation; if they lie mainly in the first (HH) and third (LL) quadrants, it is a sign of positive autocorrelation. If the points are distributed evenly over all four quadrants, spatial autocorrelation does not exist.

Moran's diagram also contains a regression line, the direction of which allows us to interpret Moran's coefficient $I$:
• $I > 0$ indicates the presence of clusters of similar values (positive autocorrelation), i.e. measurement points lie near the straight line and the increase of the variable stand(X) is reflected in the increase o
28. ce level

[Report table, values not shown: Corrected significance level (Bonferroni); Average number of neighbors; Spatial weights matrix; Number of objects; Variable smoothing; Mean Ii; Standard deviation Ii; Frequency High-High (1); Frequency Low-Low (3); Frequency Low-High (2); Frequency High-Low (4)]

6.2 Local Getis-Ord's G statistic

Getis and Ord's $G_i$ statistic (Getis and Ord, 1992; Ord and Getis, 1995) allows the detection of a local concentration of high and low values in neighboring objects and studies the statistical significance of that dependence. Getis and Ord have also defined a $G_i^*$ statistic, very similar to the $G_i$ statistic. The only difference between them is that in the case of $G_i^*$ the object for which the study is made also takes part in the analysis. In the weights matrix the so-called potential is then defined for that object, i.e. its neighborhood with itself (the values on the main diagonal are greater than 0).

Getis-Ord's G coefficient

The local form of Getis and Ord's G coefficient for the $i$-th observation is defined with the formula:

$$G_i = \frac{\sum_{j=1}^{n} w_{ij}\,x_j}{\sum_{j=1}^{n} x_j}$$

The $G_i^*$ coefficient is defined with the same formula, but the computations are also made for the studied object, that is the object for which the i and the j in
29. dard deviation: 92.562385
Median: 160.616634
Lower quartile: 102.745253
Upper quartile: 229.246456
Minimum: 0
Maximum: 662.896352

The area containing the points, as defined by the convex hull, is 0.257531 km². We can draw the hull on the map by pressing the button of object bounding and selecting the layer. There are on average over 2 points per 1000 m² (density 0.002244 points per m²). The analysis of the point distance matrix allows a more exact evaluation of their density.

Some points are in the same place, because the smallest distance is 0 m. There are also points at a far greater distance from each other: the greatest distance is 662.896352 m. We can also find information about the average distance and the standard deviation of the distances here.

The most interesting information in the analysis of the deaths map is offered by the localized Center of point distribution (703.79, 631.65), together with the areas of standard deviations, which describe the degree of concentration and the direction of dispersion (circle, ellipse, rectangle).

[Table: parameters of the standard deviation areas. Circle: center (703.79, 631.65), radius r = sdd = 138.17912, area 59983.908. Rectangle: center, sides a = 2sd_X = 210.94659 and b = 2sd_Y = 176.16345, area. Ellipse: center, angle, semiaxes, area.]

The ellipse o
30. dexes are equal. For $G_i$ the sums run over $j \neq i$.

As the coefficient is based on a quotient of two sums of the values of the objects, in order to interpret the coefficient correctly it is important that the analyzed phenomenon is described with positive numbers. The interpretation of Getis and Ord's local coefficient, similarly to the local Moran's coefficient, depends to a great degree on the selected weights matrix (row standardization of the matrix is recommended). High values of the $G_i$ or $G_i^*$ coefficients point to a concentration of objects with high values of the analyzed phenomenon, whereas low values point to a clustering of objects with low values. When the values are close to the expected value, the spatial distribution of the studied variable is random.

The expected value is defined with the formula:

$$E(G_i) = \frac{\sum_{j=1}^{n} w_{ij}}{n-1} \quad (\text{where } i \neq j), \qquad E(G_i^*) = \frac{\sum_{j=1}^{n} w_{ij}}{n}$$

The significance of Getis and Ord's coefficient

By testing the statistical significance of the relationship among the neighboring objects, the following hypotheses are studied:

$H_0: G_i = E(G_i)$, $\quad H_0: G_i^* = E(G_i^*)$
$H_1: G_i \neq E(G_i)$, $\quad H_1: G_i^* \neq E(G_i^*)$

The test statistic has the form presented below:

$Z(G_i) = \ldots$

where $\bar{x}$ is the mean of the variable X and $s^2$ is the variance of the variable X.

The Z statistics has asymptot
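The definitions of the local G coefficient and its expected value can be sketched in a few lines of Python. This is only an illustration: the function name `local_g`, the `star` flag, and the four-object chain map are assumptions made for the example, not part of PQStat, and the test statistic Z is deliberately left out.

```python
def local_g(x, w, i, star=False):
    """Return (G_i, E(G_i)) for object i.

    x    : list of positive values of the studied variable
    w    : spatial weights matrix as a list of rows
    star : False -> G_i (object i excluded), True -> G_i* (object i included)
    """
    n = len(x)
    # For G_i the studied object itself is skipped (j != i).
    idx = [j for j in range(n) if star or j != i]
    numerator = sum(w[i][j] * x[j] for j in idx)
    denominator = sum(x[j] for j in idx)
    expected = sum(w[i][j] for j in idx) / (n if star else n - 1)
    return numerator / denominator, expected

# Hypothetical map: four objects in a row, each touching only its neighbors.
w = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
x = [1.0, 1.0, 10.0, 10.0]

g_low, e_low = local_g(x, w, 0)    # object next to a low value
g_high, e_high = local_g(x, w, 3)  # object next to a high value
print(g_low, e_low, g_high, e_high)
```

In this toy map the last object, whose only neighbor has a high value, gets a coefficient above its expected value, while the first object gets one below it.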
31. e object pointed at by choosing the option Identify from the context menu. In the identification window it is also possible to Activate/Deactivate the object.

1.3.2 Selection Area Tools

Creating and saving a selection area allows us to choose parts of a map which can later be subjected to a separate analysis.

Creating a Selection Area
To select and save an area, we choose Tools > Create selection area. Then, using the mouse or filling in the fields in the upper part of the Map Manager window, we select the chosen part of the map (elliptical or rectangular shape). The selection is saved with the button Save.

Editing a Selection Area
The placement of each saved selection area can be changed. In the editing window we can also delete a selection area.

[Screenshot: the Map Manager window showing the polygon map districts.shp with selection area coordinates]

That window is opened with the menu Tools > Edit selection area and closed with the button Close.

Deleting All Selection Areas
All selection areas can be deleted with the menu Tools > Delete all selection areas.

1.3.3 Layers

Both a map and elements added
32. e of verifying the hypothesis that the distances observed between the nearest points are the same as the expected distances which would appear in a random distribution of points.

Hypotheses:

$H_0: NNI = 1$
$H_1: NNI \neq 1$

The test statistic has the form presented below:

$$Z = \frac{\bar{d}_{NN} - \bar{d}_{ran}}{SE_{ran}}$$

where $SE_{ran} = \dfrac{0.26136}{\sqrt{n^2/A}}$ is the standard error of the mean random nearest neighbor distance, $n$ is the number of points, and $A$ is the size of the studied area.

The Z statistic asymptotically (for a large sample size) has the normal distribution. The p value, determined on the basis of the test statistic, is compared with the significance level $\alpha$:

if $p \le \alpha \Longrightarrow$ we reject $H_0$ and accept $H_1$,
if $p > \alpha \Longrightarrow$ there is no reason to reject $H_0$.

Analysis of Subsequent Nearest Neighbors

To analyse subsequent nearest neighbors, one takes into account the distance to the second nearest neighbor, the third nearest neighbor, and so on, up to the k-order nearest neighbor. For the neighborhood of each order, from the nearest neighbor to the k-order neighbor, subsequent Nearest Neighbor Indexes (k-ordered NNI) are calculated:

$$k\text{-ordered } NNI = \frac{\bar{d}_{k\,NN}}{\bar{d}_{k\,ran}}$$

where $\bar{d}_{k\,NN}$ is the mean distance from neighbors of k order, and

$$\bar{d}_{k\,ran} = \frac{k\,(2k)!}{(2^k\,k!)^2\,\sqrt{n/A}}$$

is the mean random distance from neighbors of k order.

The results of the point density analysis conducted for subsequent neighbors can be presented on a graph so as to illustrate the
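A minimal sketch of the first-order nearest neighbor test, assuming the usual constants for a random pattern (mean random distance $0.5\sqrt{A/n}$ and standard error $0.26136/\sqrt{n^2/A}$). The function name and the 3 x 3 grid are hypothetical; the study area is passed in directly rather than derived from a convex hull or bounding rectangle.

```python
import math

def nearest_neighbor_index(points, area):
    """Return (NNI, Z) for a list of (x, y) points and a study area size."""
    n = len(points)
    # Distance from each point to its nearest neighbor (Euclidean metric).
    nn = [
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]
    d_obs = sum(nn) / n                      # observed mean NN distance
    d_ran = 0.5 * math.sqrt(area / n)        # mean NN distance under randomness
    se = 0.26136 / math.sqrt(n * n / area)   # standard error of d_ran
    return d_obs / d_ran, (d_obs - d_ran) / se

# Hypothetical data: a regular 3 x 3 grid with unit spacing in a 3 x 3 area.
grid = [(x + 0.5, y + 0.5) for x in range(3) for y in range(3)]
nni, z = nearest_neighbor_index(grid, area=9.0)
print(round(nni, 3), round(z, 3))
```

An index above 1 with a large positive Z, as for this grid, points to a pattern more regular than random; an index below 1 points to clustering.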
33. e studied variable.

The expected value is defined with the formula:

$$E(I_i) = -\frac{\sum_{j=1}^{n} w_{ij}}{n-1}$$

The significance of Moran's autocorrelation coefficient

By testing the statistical significance of the relationship among the neighboring objects, the following hypotheses are studied:

$H_0: I_i = E(I_i)$
$H_1: I_i \neq E(I_i)$

The test statistic has the form presented below:

$$Z = \frac{I_i - E(I_i)}{\sqrt{var(I_i)}}$$

where

$$var(I_i) = \frac{w_{i(2)}\,(n - b_2)}{n-1} + \frac{2w_{i(kh)}\,(2b_2 - n)}{(n-1)(n-2)} - \frac{w_i^2}{(n-1)^2}$$

is the variance in a random distribution,
$w_{i(2)}$ is the sum of the squares of weights for the $i$-th row,
$2w_{i(kh)}$ is the sum of possible products of weights for the $i$-th row, after the exclusion of products with the same indexes.

The Z statistic asymptotically (for large sample sizes) has the normal distribution. On the basis of the test statistic, the p value is estimated and then compared with the significance level $\alpha$:

if $p \le \alpha \Longrightarrow$ we reject $H_0$ and accept $H_1$,
if $p > \alpha \Longrightarrow$ there is no reason to reject $H_0$.

Because the coefficients computed for neighboring objects are not independent, it is suggested to use a corrected significance level. The suggested corrections are the Bonferroni correction, $\alpha_k = \alpha / k$, or the Šidák correction, $\alpha_k = 1 - (1 - \alpha)^{1/k}$, where k is the ar
34. 6.2 Local Getis-Ord's G statistic

1 SPATIAL ANALYSIS

Statistical spatial analysis is defined as a set of techniques for studying data which are located in space, viewed in relationship with the surface of the Earth. Particular techniques of spatial analysis are used in diverse areas of science, from medicine (epidemiology), through logistics and physics, to economics (finding the best locations for plants, shops, etc.).

The development of the methods of spatial distribution analysis and of the analysis of interrelationships among objects has been, and is, to a large extent determined by the development of information technology. Computers with ever increasing computing power, together with GIS systems, allow the processing of large amounts of geographic data.

1.1 BASIC DEFINITIONS

Geographic Information System (GIS) is a system for entering, storing, processing, and visualizing geographic data. From the technical point of view, GIS is a tool which allows the analysis of interrelated:
• information about spatial locations of objects, represented by means of a map;
• descriptive characteristics of the objects presented on a map, represented by means of a database.

Objects represented by means of a map are:
• Points: their location in 2D is defined with the help of two coordinates (x, y).
• Multipoints: they are points g
35. f standard deviations and the Center are drawn again by moving on to the Map Manager; on the layer list we uncheck the bounding.

[Figure: the deaths map with the Center and the areas of standard deviations]

As a result of conversations with local people, Snow suspected that water could have been the source of the epidemic. When the three maps are joined, we can identify the water pump, the water from which turned out to be the cause of the epidemic. To find it, we should first display the streets map in the Map Manager, and next overlay the deaths map and the pumps map onto it by pressing the button.

[Figure: the streets map with the deaths and pumps layers overlaid]

The source of the epidemic turned out to be the water pump on Broad Street (we can display its label in the Map Manager). It is the only pump located in the selected elliptical area, and its location (678.85, 633.27) is very close to the middle of the ellipse (703.79, 631.65), i.e. the place around which the deaths centered.

4 ANALYSIS OF THE RANDOMNESS OF POINT DISTRIBUTION

To conduct an analysis of the randomness of point distribution on the basis of Map data, we should have at our disposal a point, multipoint, or polygonal file. In the case of an analysis of a polygonal file calculations
36. f the variable L(X);
• $I < 0$ indicates the presence of the so-called hot spots, i.e. decidedly different values in neighboring areas (negative autocorrelation), i.e. measurement points lie near the straight line but the increase of the variable stand(X) is accompanied by a decrease of the variable L(X);
• $I \approx 0$ indicates a random distribution of the studied value in space (a lack of autocorrelation), i.e. the obtained spatial distribution is as probable as any other distribution.

The square of Moran's coefficient, $I^2$, informs about the degree (as a percentage) to which the value of the variable in the object $i$ is explained by the value of that variable in neighboring objects.

Note: When the values of a studied feature are characterized by a great variability of variance, it is desirable to stabilize that variability. Basic information about smoothing variables is given in Chapter 1.7 SPATIAL SMOOTHING.

Significance of Moran's autocorrelation coefficient

A test for checking the significance of Moran's autocorrelation coefficient serves the purpose of verifying the hypothesis about a lack of correlation between stand(X) and the spatial lag L(X).

Hypotheses:

$H_0: I = 0$
$H_1: I \neq 0$

The test statistic has the form presented below:

$$Z = \frac{I - E(I)}{\sqrt{var(I)}}$$

where $E(I) = -\dfrac{1}{n-1}$ is the expected value
37. formation about the real distance of objects, then the nearer objects will have a greater impact on the result than farther objects, and the influence of objects outside of the circle will be zero.

2 TESTING HYPOTHESES

Verification of statistical hypotheses is the checking of certain assumptions, formulated for parameters of a general population, on the basis of results from a sample.

Formulation of hypotheses which will be verified with the help of statistical tests

Each statistical test gives the general form of a null hypothesis $H_0$ and of an alternative hypothesis $H_1$:

$H_0$: in the studied population THERE IS NOT a statistically significant (e.g. dependence, e.g. difference) between (e.g. spatial distribution, e.g. presence of particular values in the analysed area).
$H_1$: in the studied population THERE IS a statistically significant (e.g. dependence, e.g. difference) between (e.g. spatial distribution, e.g. presence of particular values in the analysed area).

Example:
$H_0$: THERE IS NOT a statistically significant dependence between the spatial distribution of chemist's shops in Wielkopolska (we assume that their distribution in the studied area is random).

If we do not know whether the distribution of the shops is more regular than a random distribution or, the other way round, more clustered than a random distribution, then the alterna
38. he $G_i$ and $G_i^*$ coefficients.

The analysis will be conducted with the prev variable and the neighborhood matrix Queen, row standardized according to contiguity, which is suggested by the program. In order to use a different matrix, one has to generate it first (see chapter Spatial weights matrix). We also select one of the corrections of the significance level.

[Report table: Analysis time; Analysed variables; Significance level; Corrected significance level (Bonferroni): 0.009147; Average number of neighbors: 5.466192; Spatial weights matrix; Number of objects; Potential value; Mean Gi: 0.007063; Standard deviation Gi: 0.004523; Frequency Low-Low (1); Frequency High-High (2)]

The obtained report presents the values of local coefficients, the values of test statistics, and the corresponding values of test probability. We will also find there the information about the number of regions defining the spatial regimes: High-High, Low-Low.

A result which we can draw on the map with the button is also ascribed to the analysis: the spatial regimes, defined in the report with the use of the color column.
39. he look of the view is only possible by editing the particular layers which form that view in their real location, i.e. in the other datasheet to which they are linked.

As long as there is only one datasheet related to a map in the project, the option window of another map is empty. When there are several datasheets, the option window contains a list of layers. The names of the layers on the list comprise the number and name of the datasheet linked to the map and the name of the file from which the map was imported.

If the map is appended with a map view from another datasheet in such a way that a circular reference is made (e.g. map 2 is assigned a view of map 1, and map 1 is assigned a view of map 2), a message about the circular reference is displayed. The reference will be managed, but circular references are not advised.

• Layer Centroid of a polygon: a layer of the point type. A centroid of a polygon is a point lying within a polygon and representing the center of mass (O'Rourke J., 1998 [9]). Centroids can be drawn on the basis of calculations made on the map (in such a case we choose the option Draw and calculate based on map data) or on the basis of existing points, the coordinates of which are in the datasheet (we then choose the option Draw based on the datasheet).

• Layer Center of a polygon: a layer of the point type. A center is
40. ically (for large sample sizes) normal distribution. On the basis of the test statistic, the p value is estimated and then compared with the significance level $\alpha$:

if $p \le \alpha \Longrightarrow$ we reject $H_0$ and accept $H_1$,
if $p > \alpha \Longrightarrow$ there is no reason to reject $H_0$.

Because the coefficients computed for neighboring objects are not independent, it is suggested to use a corrected significance level. The suggested corrections are the Bonferroni correction, $\alpha_k = \alpha / k$, or the Šidák correction, $\alpha_k = 1 - (1 - \alpha)^{1/k}$, where k is the arithmetic mean number of the neighbors.

Map layers

The combination of the information from the value of the Z statistic and its significance presents the so-called spatial regimes on the map:
• Statistically significant objects with high values of the Z statistic are marked as High-High (objects with high values surrounded by objects with high values) and marked in red on the map.
• Statistically significant objects with low values of the Z statistic are marked as Low-Low (objects with low values surrounded by objects with low values) and marked in blue on the map.

The window with settings for Local Getis and Ord's analysis is accessed via the menu Spatial analysis > Spatial statistics > Getis-Ord G statistic.

[Screenshot: the Local Getis-Ord Gi statistic window] Statistical analys
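The two corrections of the significance level are simple enough to check by hand. The sketch below assumes $\alpha = 0.05$ and takes k = 5.466192, the average number of neighbors printed in the report of the leukemia example; the function names are hypothetical.

```python
def bonferroni(alpha, k):
    """Bonferroni correction: alpha_k = alpha / k."""
    return alpha / k

def sidak(alpha, k):
    """Sidak correction: alpha_k = 1 - (1 - alpha)^(1/k)."""
    return 1.0 - (1.0 - alpha) ** (1.0 / k)

alpha = 0.05
k = 5.466192  # arithmetic mean number of neighbors (leukemia example)

alpha_bonf = bonferroni(alpha, k)
alpha_sidak = sidak(alpha, k)
print(round(alpha_bonf, 6), round(alpha_sidak, 6))
```

The Bonferroni value agrees with the corrected significance level 0.009147 shown in the report; the Šidák value is slightly larger, so it is the less conservative of the two.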
41. ignificance level.

[Report table: Analysis time; Analysed variables; Significance level; Corrected significance level (Bonferroni); Average number of neighbors; Spatial weights matrix: Queen, immediate neighbors; Number of objects: 281; Mean Ii: 0.048404; Standard deviation Ii: 0.380885; Frequency High-High (1); Frequency Low-Low (3); Frequency Low-High (2); Frequency High-Low (4)]

The obtained report presents the values of local coefficients, the values of test statistics, and the corresponding values of test probability. We will also find here the information about the number of regions defining the spatial regimes: High-High, Low-Low, Low-High, High-Low.

[Figure: Moran's scatter plot with a regression line; x axis: Standard prev, y axis: Lag Standard prev]

A result which we can draw on the map with the button is also ascribed to the analysis: the spatial regimes, described in the report with the use of the color column.

We have been able to localize small but significant clusters in which the prevalence of leukemia is higher. The red color is used for the 2 clusters (4 census regions) lying in smaller and more populated regions; they are the centers of the clusters with high leukemi
42. ins information about the location of 281 polygons (census tracts) in the northern part of the state of New York.

Data for the map leukemia:
Column CASES: the number of cases of leukemia in the years 1978-1982, ascribed to particular objects (census tracts). The value should be an integer; however, in agreement with Waller's (1994) description, some cases which could not be objectively ascribed to a particular region have been divided proportionately. Hence the counts of cases ascribed to the 281 objects are not integers.
Column POP: population size in particular objects.
Column prev: the frequency coefficient of leukemia per 100000 people for each object, in one year: prev = (CASES / POP) · 100000 / 5.

The global analysis has not yielded an unambiguous answer as to the occurrence of spatial autocorrelation. We will therefore check whether we can find regions in which the prevalence of leukemia is significantly higher.

In order to localize clusters of leukemia, and regions which contrast with their surroundings with respect to the prevalence of that disease, we will compute the local Moran's coefficient. For the analysis we will use the prev variable and the neighborhood matrix Queen, row standardized according to contiguity, which is suggested by the program. In order to use a different matrix, one has to generate it first (see chapter Spatial weights matrix). We also select one of the corrections of the s
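The prev column can be reproduced directly from CASES and POP. The sketch below follows the formula prev = (CASES / POP) · 100000 / 5; the function name and the sample numbers are made up for illustration and are not taken from the leukemia file.

```python
def prevalence(cases, pop, years=5, per=100_000):
    """Yearly frequency coefficient per `per` people, averaged over `years`."""
    return cases / pop * per / years

# Hypothetical census tract: 2.5 cases (cases may be fractional, because
# unassignable cases were divided proportionately) among 5000 people.
prev = prevalence(2.5, 5000)
print(prev)
```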
43. is

[Screenshot: the Local Getis-Ord Gi statistic window, with the test options (variable smoothing: locally weighted average; G statistic) and the report options (Add analysed data, Add graph, Add map layers)]

EXAMPLE 5.1 cont. (catalog: leukemia, file: leukemia)

We will analyze the data concerning leukemia. The map leukemia contains information about the location of 281 polygons (census tracts) in the northern part of the state of New York.

Data for the map leukemia:
Column CASES: the number of cases of leukemia in the years 1978-1982, ascribed to particular objects (census tracts). The value should be an integer; however, in agreement with Waller's (1994) description, some cases which could not be objectively ascribed to a particular region have been divided proportionately. Hence the counts of cases ascribed to the 281 objects are not integers.
Column POP: population size in particular objects.
Column prev: the frequency coefficient of leukemia per 100000 people for each object, in one year: prev = (CASES / POP) · 100000 / 5.

The global analysis has not yielded an unambiguous answer as to the occurrence of spatial autocorrelation. We will therefore check whether we can find regions in which the prevalence of leukemia is significantly higher. In order to localize leukemia clusters we will compute t
44. ithmetic mean number of the neighbors.

Map layers

The combination of information from Moran's scatter plot (the division of objects into High-High, Low-Low, Low-High, High-Low) and from the significance of the local Moran's statistics presents the so-called spatial regimes on a map:
• Statistically significant High-High objects (objects with high values surrounded by objects with high values) are marked in red on the map.
• Statistically significant Low-Low objects (objects with low values surrounded by objects with low values) are marked in blue on the map.
• Statistically significant Low-High objects (objects with low values surrounded by objects with high values) are marked in light blue on the map.
• Statistically significant High-Low objects (objects with high values surrounded by objects with low values) are marked in light red on the map.

The window with the settings of the local Moran's analysis is accessed via the menu Spatial analysis > Spatial statistics > Local Moran's statistic.

[Screenshot: the Local Moran's I statistic window, with the spatial weights matrix (Queen, row standardization) and the test options (variable smoothing: locally weighted average)]

EXAMPLE 5.1 cont. (catalog: leukemia, file: leukemia)

We will analyze data about leukemia. The map leukemia conta
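The assignment of objects to spatial regimes can be sketched as a small decision rule. The function below is a hypothetical illustration, not PQStat code: it assumes that the standardized value of an object and its spatial lag are already known, that "high" means above zero on the standardized scale, and that an object is significant when its p value does not exceed the corrected significance level.

```python
def spatial_regime(stand_x, lag_x, p_value, alpha_corrected):
    """Return (regime, map color) for one object of the local Moran's analysis."""
    if p_value > alpha_corrected:
        return ("not significant", None)
    high_value = stand_x > 0   # the object itself lies above the mean
    high_lag = lag_x > 0       # its neighbors lie above the mean on average
    if high_value and high_lag:
        return ("High-High", "red")
    if not high_value and not high_lag:
        return ("Low-Low", "blue")
    if not high_value and high_lag:
        return ("Low-High", "light blue")
    return ("High-Low", "light red")

# Hypothetical objects: (standardized value, spatial lag, p value).
objects = [(2.1, 1.4, 0.001), (-0.8, -0.6, 0.004), (-1.0, 1.2, 0.002), (0.3, 0.1, 0.40)]
regimes = [spatial_regime(s, l, p, 0.009147) for s, l, p in objects]
print(regimes)
```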
45. ivation of objects

• Selecting objects on the basis of the datasheet: for example, one can indicate as active only those shops which are groceries with an area not larger than 1000 m². In such a case the conditions for selecting objects are set in the window Activation/Deactivation, available after selecting the Edit > Activate/Deactivate (filter) menu. A detailed description of this manner of selection can be found in the User Manual PQStat, Chapter How to Reduce Data Sheet Workspace.

• Selecting objects on the basis of a map: for example, one could distinguish only those shops which are within a rectangular or elliptical area marked on a map. We select the area with the selection area tools (see Chapter 1.3.2) and later activate or deactivate the objects in the window Activate/Deactivate in the selection, available after selecting the Tools > Activate/Deactivate in the selection menu in the window of the Map Manager.

In order to activate all objects, one should select the Tools > Activate all menu in the window of the Map Manager, or the Edit > Activate all menu in the window of PQStat.

1.5 GEOMETRIC CALCULATIONS

Geometric calculations are formulas (see the User Manual PQStat, Chapter Formulas). The formulas can pertain to data which describe map geometry and to data visible in a datasheet.

• Formul
46. l, standard deviation, median, quartiles, minimum, and maximum. The analysis also gives a graph pertaining to the distance matrix and layers which can be drawn on the surface of a map. The layers pertain to centrographic measures: the measure of central tendency and the measure of dispersion.

• The center of point distribution: the mean of the coordinates on the X axis and the Y axis, $(\bar{x}, \bar{y})$.

• The area of standard deviations built around the center, defined by:

Circle
The radius of the circle is sdd, the standard distance from the center (standard distance deviation), expressed with the formula:

$$sdd = \sqrt{\frac{\sum_{i=1}^{n} (x_i')^2 + \sum_{i=1}^{n} (y_i')^2}{n-2}}$$

where $x_i' = x_i - \bar{x}$, $y_i' = y_i - \bar{y}$.

Ellipse
The angle of inclination of the ellipse axis Y with respect to the OY axis of the coordinate system is expressed with the formula:

$\theta = \arctan(\ldots)$

The lengths of the semiaxes of the ellipse are computed from the sums, over $i = 1, \ldots, n$, of the rotated deviations built from $x_i' \cos\theta$, $y_i' \sin\theta$ and from $x_i' \sin\theta$, $y_i' \cos\theta$, respectively.

Rectangle
The lengths of the rectangle sides are $a = 2sd_x$ and $b = 2sd_y$, where $sd_x$ and $sd_y$ are the standard deviations of the coordinates on the X and Y axes.

After the weights for particular objects have been defined, we calculate the weighted center of point distribution and the weighted circle representing the standard deviation area.

• The weighted center of
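The center of point distribution and the radius of the standard deviation circle can be sketched as follows. The sketch assumes the standard distance is the square root of the summed squared deviations in x and y divided by n - 2; the function names and the four sample points are hypothetical, and the ellipse and the weighted variants are not covered.

```python
import math

def mean_center(points):
    """Mean of the x and y coordinates of a list of (x, y) points."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def standard_distance(points):
    """Radius sdd of the standard deviation circle (denominator n - 2)."""
    n = len(points)
    cx, cy = mean_center(points)
    ss = sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points)
    return math.sqrt(ss / (n - 2))

# Hypothetical data: the four corners of a 2 x 2 square.
pts = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
center = mean_center(pts)
sdd = standard_distance(pts)
circle_area = math.pi * sdd ** 2
print(center, sdd, circle_area)
```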
47. leukemia and start Moran's analysis by selecting the menu Spatial analysis > Spatial statistics > Global Moran's statistic. In the analysis window we select the variable prev and the neighbor matrix Queen, and select the option Add graph.

Moran's correlation coefficient obtained in the analysis is small and has the value 0.048577.

[Report table: Spatial weights matrix: Queen, immediate neighbors; Number of objects: 281; Moran's I: 0.048577; Expected I: -0.003571.
Under normality assumption: Variance I: 0.001395; Z statistic: 1.3962; p value: 0.162654.
Under randomness assumption: Variance I: 0.001241; Z statistic: 1.480333; p value: 0.138754]

When we test the significance of Moran's coefficient, we study the randomness of the distribution of the frequency coefficient of leukemia in the studied region. We check whether similar shades on the map are located close to one another or not. In other words, we check whether the odds of having leukemia in the studied population depend on geographic location or not.

The p value calculated under the assumption of randomness, as in the case of the assumption of normality, is greater than the standard significance level of 0.05, which means that there is no evidence for autocorrelation. Thus, we assume that the distribution of the variable prev is a random distributi
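The reported Z statistic and p value under the normality assumption can be reproduced from the other numbers in the report. The sketch assumes a two-sided test based on the standard normal distribution and an expected value of -1/(n - 1) with n = 281 objects; small differences in the last digits come from the rounded inputs.

```python
import math

def two_sided_p(z):
    """Two-sided p value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))

n = 281
moran_i = 0.048577
expected_i = -1.0 / (n - 1)        # about -0.003571
variance_normality = 0.001395      # variance reported under normality

z_norm = (moran_i - expected_i) / math.sqrt(variance_normality)
p_norm = two_sided_p(z_norm)
print(round(z_norm, 4), round(p_norm, 4))
```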
48. mation from weights matrices must be used:

$$\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}\,(x_i - \bar{x})(x_j - \bar{x})$$

In this way non-neighboring objects obtain the weight value 0, for which reason the values of those objects are not added. Further operations which change the formula obtained in this manner are made with a view to making the obtained coefficient $I$ independent of the number of analyzed objects, and to standardizing it so that its values are limited to the interval $\langle -1, 1 \rangle$. As a result, Moran's autocorrelation coefficient is expressed with the formula:

$$I = \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}\,(x_i - \bar{x})(x_j - \bar{x})}{S_0\,\sigma^2}$$

where:
$n$ is the number of spatial objects (the number of points or polygons),
$x_i$, $x_j$ are the values of the variable for the compared objects,
$\bar{x}$ is the mean value of the variable for all objects,
$w_{ij}$ are the elements of the spatial weights matrix (a row-standardized weights matrix),
$S_0 = \sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}$,
$\sigma^2 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}$ is the variance.

[Figure 1: Moran's diagram; x axis: stand(X), the standardized value of the studied variable; y axis: L(X), the spatial lag of the studied variable]

Moran's linear autocorrelation coefficient studies the strength of the linear relationship between the standardized variable X, stand(x), and the spatial lag of the variable X, L(x). Spatial lag is the weighted mean from the standardized values of neigh
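A minimal sketch of the global Moran's coefficient, assuming the form I = ΣΣ w_ij (x_i - x̄)(x_j - x̄) / (S_0 σ²) with σ² computed with the denominator n. The function name and the four-object chain map are hypothetical; a plain 0/1 contiguity matrix is used to keep the arithmetic easy to follow, although a row-standardized matrix can be passed in the same way.

```python
def morans_i(x, w):
    """Global Moran's I for values x and a weights matrix w (list of rows)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    s0 = sum(sum(row) for row in w)               # sum of all weights
    variance = sum(d * d for d in dev) / n        # variance with denominator n
    cross = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    return cross / (s0 * variance)

# Hypothetical map: four objects in a row, each touching only its neighbors.
w = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]

i_trend = morans_i([1.0, 2.0, 3.0, 4.0], w)        # similar values side by side
i_checker = morans_i([1.0, -1.0, 1.0, -1.0], w)    # alternating values
print(i_trend, i_checker)
```

The smoothly increasing values give a positive coefficient, and the alternating values give the most negative one, which matches the interpretation of positive and negative autocorrelation.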
49. ntiguity
• Immediate neighbors: a square symmetric matrix with zeros on the main diagonal; the elements outside the diagonal are:
  $w_{ij} = 1$ if the objects are connected along a common border,
  $w_{ij} = 0$ in the opposite case.
• Neighbors, order of contiguity ≤ k: a square symmetric matrix with zeros on the main diagonal; the elements outside the diagonal are:
  $w_{ij} = 1$ if the objects are direct neighbors (they are connected along a common border),
  $w_{ij} = 2$ if the objects are the second nearest neighbors (the second degree of neighborhood, i.e. the so-called neighbor's neighbor),
  $w_{ij} = k$ if the objects are the k-th neighbors (k-th degree of neighborhood),
  $w_{ij} = 0$ if the neighborhood is farther than the k-th degree.
• Neighbors, order of contiguity = k: a square symmetric matrix with zeros on the main diagonal; the elements outside the diagonal are:
  $w_{ij} = 1$ if the objects are the k-th neighbors (k-th degree of neighborhood),
  $w_{ij} = 0$ in the opposite case.

Weights matrices can be row standardized; some statistical analyses based on those matrices recommend it.

1.7 SPATIAL SMOOTHING
The idea of spatial smoothing is to obtain a better, more stable, and less noisy value of the variable. The most common methods of building such a variable are based on borrowing
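Row standardization, mentioned above, divides each row of the weights matrix by its row sum so that the weights of every object's neighbors add up to 1. The sketch below is a minimal illustration; the function name is hypothetical, and leaving rows without neighbors as zeros is an assumption rather than documented PQStat behavior.

```python
def row_standardize(weights):
    """Return a copy of the matrix in which every non-empty row sums to 1."""
    result = []
    for row in weights:
        total = sum(row)
        if total == 0:
            # Object without neighbors: keep the row of zeros (assumption).
            result.append([0.0 for _ in row])
        else:
            result.append([w / total for w in row])
    return result

# Hypothetical immediate-neighbor matrix: object 0 borders objects 1 and 2.
binary = [
    [0, 1, 1],
    [1, 0, 0],
    [1, 0, 0],
]
standardized = row_standardize(binary)
print(standardized)
```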
50. on. Moran's diagram confirms that assumption.

[Figure: Moran's diagram for the variable prev with a fitted linear trend; horizontal axis: Standard prev]

The existence of positive autocorrelation, in which we are the most interested, would result in the points of Moran's diagram being concentrated in quarters I and III. Here, however, we see that the points are as frequent in quarters I and III as in II and IV.

5.2 Global Geary's C statistic
Similarly to Moran's analysis, global Geary's statistic studies the degree of intensity of a given feature in spatial objects.

Note: It is not recommended to conduct Geary's analysis for objects without a neighborhood (objects described in a weight matrix only with the value 0). Such objects can be excluded from the analysis by deactivating them, or the analysis can be made with a different manner of defining neighborhood (a different weight matrix).

Geary's autocorrelation coefficient was introduced by Geary in 1954 [6]. It is one of the possible alternatives to the global Moran's statistic. Similarly to Moran's analysis, Geary's statistic studies the degree of intensity of a given feature $x$ in spatial objects described with the use of a weight matrix with elements $w$. This time instead of computing the sum of products
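As a rough illustration of the coefficient introduced here, the sketch below uses the common textbook form of Geary's C, $c = \sum_i \sum_j w_{ij}(x_i - x_j)^2 / (2 S_0\, sd^2)$, based on squared differences between neighbors. Taking $sd^2$ with the divisor $n - 1$ is an assumption of this sketch, and the function name and example data are hypothetical.

```python
def gearys_c(values, weights):
    """Global Geary's C based on squared differences between objects."""
    n = len(values)
    mean = sum(values) / n
    s0 = sum(sum(row) for row in weights)                  # S_0
    sd2 = sum((v - mean) ** 2 for v in values) / (n - 1)   # variance, divisor n-1
    diff = sum(weights[i][j] * (values[i] - values[j]) ** 2
               for i in range(n) for j in range(n))
    return diff / (2.0 * s0 * sd2)

# Hypothetical example: four objects in a row, each adjacent to the next.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
c_value = gearys_c([1.0, 2.0, 3.0, 4.0], chain)
print(c_value)
```

A value below 1, as here, corresponds to neighbors with similar values, i.e. positive autocorrelation.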
51. on of 281 polygons (census tracts) in the northern part of the state of New York. The map is prepared in the flat rectangular coordinate system UTM 18N and is based on the data of the BNA Boundary File available on the CIESIN server (ftp.ciesin.columbia.edu).

Data for the map leukemia:
• Column CASES: the number of cases of leukemia in the years 1978-1982 ascribed to particular objects (census tracts). The value should be an integer; however, in agreement with Waller's 1994 description, some cases which could not be objectively ascribed to a particular region have been divided proportionately. Hence the numbers of cases ascribed to the 281 objects are not integers.
• Column POP: population size in particular objects.
• Column prev: the frequency coefficient of leukemia per 100000 people for each object in one year: prev = (CASES / POP) · 100000 / 5.

Epidemiologically interesting are the regions in which the prevalence of leukemia is higher, as their grouping could indicate the existence, within their boundaries, of environmental teratogens causing an increased frequency of occurrence of leukemia.

We start by presenting the geographic distribution of the frequency coefficient prev on the map. For that purpose we draw a map in the Map Manager and edit the layer, choosing
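The definition of the prev column can be written out as a one-line calculation. The sketch below assumes the division by 5 reflects the five years 1978-1982; the function name and the sample tract are made up for illustration.

```python
def prevalence_per_year(cases, population, years=5, per=100000):
    """Yearly frequency coefficient per `per` people, as in the prev column."""
    return cases / population * per / years

# Hypothetical tract: 2.5 cases (fractional counts are allowed) in 5000 people.
prev = prevalence_per_year(2.5, 5000)
print(prev)
```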
52. ortions (distortions of angles, areas, lengths), the choice of a proper system depends on the aim for which the map is to be used. Coordinate systems used in cartography are classified as:
• geographic coordinate systems (they define geographic latitude and longitude),
• Cartesian coordinate systems,
• polar coordinate systems.

For a map to be loaded correctly, the program PQStat requires a vector map saved in a SHAPEFILE (.shp) type of file and defined in a proper Cartesian coordinate system with a linear scale. The program tries to automatically detect maps with geographic coordinates. If, while importing the map, the program detects a geographic coordinate system, it suggests converting the coordinates into a UTM (Universal Transverse Mercator) system on the basis of the WGS 84 reference system. As the conversion might be incorrect, because many geographic coordinate systems are in use and the applied system is not known with certainty, it is recommended that properly prepared maps in a Cartesian coordinate system be used.

1.2 MAP OPENING
A map with the attribute file assigned to it can be loaded via:
• import of a shapefile (SHP) into the datasheet,
• loading the PQS/PQX file which contains data from shapefiles (SHP).

Import of a Shapefile (SHP)
Import is made by choosing the menu option File → Import data → (SHP/SHX/DBF) ESRI Sh
53. rast to Global Moran's statistic, it defines the local spatial autocorrelation, i.e. it defines the similarity of a spatial unit to its neighbors and studies the statistical significance of that dependence.

Local Moran's coefficient
The local form of Moran's $I$ coefficient for the $i$-th observation is defined with the formula:

$$I_i = \frac{(x_i - \bar{x})\sum_{j=1}^{n} w_{ij}(x_j - \bar{x})}{\sigma^2}$$

where:
• $n$: the number of spatial objects (the number of points or polygons),
• $x_i$, $x_j$: the values of the variable for the compared objects,
• $\bar{x}$: the mean value of the variable for all objects,
• $w_{ij}$: elements of the spatial weights matrix (it is recommended that the matrix is row standardized),
• $\sigma^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n}$: variance.

The interpretation of the local Moran's coefficient is analogous to its global counterpart; however, it largely depends on the selected weight matrix. Most often non-zero weights are ascribed only to neighboring objects. As a result, the local coefficient only describes the similarity of objects within the neighborhood zone. Row standardization makes it easier to compare the values of coefficients obtained for various objects, as the expected value for each coefficient is then the same. High values of a coefficient point to the occurrence of clusters with similar values, low values point to the occurrence of the so-called hot spots, and values near the expected value $E(I_i)$ point to the random distribution in space of th
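The local coefficient can be sketched in the same way as the global one: for each object, its own deviation from the mean is multiplied by the weighted sum of its neighbors' deviations. The code below is illustrative only; it assumes the variance uses the divisor $n$, and the function name and data are hypothetical.

```python
def local_morans_i(values, weights):
    """Local Moran's I_i for every object (variance with divisor n assumed)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    sigma2 = sum(d * d for d in dev) / n
    return [
        dev[i] * sum(weights[i][j] * dev[j] for j in range(n)) / sigma2
        for i in range(n)
    ]

# Hypothetical row-standardized matrix for four objects in a row.
w = [
    [0.0, 1.0, 0.0, 0.0],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.5, 0.0, 0.5],
    [0.0, 0.0, 1.0, 0.0],
]
local_values = local_morans_i([1.0, 2.0, 3.0, 4.0], w)
print(local_values)
```

All four coefficients are positive in this example, because every object sits on the same side of the mean as the average of its neighbors.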
54. rectangle of layer bounding.

Report:
Analysed variables: SHP_X, SHP_
Significance level: 0.05
Bounding type: Bounding Rectangle
Number of points: 326
Area: 13714634.9575652
Density: 0.00002377
The nearest neighbor (NN):
Mean Distance (NN): 103.116479003
Standard Dev. Distance (NN): 57.129039778
Expected Mean Distance (NN): 102.55417152
Nearest Neighbor Index (NNI): 1.005483029
SE: 2.960041 6
Z statistic: 0.189390223
p-value: 0.849786986

Report:
Analysed variables: SHP_X, SHP_
Significance level: 0.05
Bounding type: Bounding Rectangle
Number of points: 171
Area: 13663254.256036
Density: 0.000006643
The nearest neighbor (NN):
Mean Distance (NN): 282.270974905
Standard Dev. Distance (NN): 50.395872027
Expected Mean Distance (NN): 168.140254422
Nearest Neighbor Index (NNI): 1.678782846
SE: 7.990073817
Z statistic: 14.284063339
p-value: < 0.000000001

Young poplars have a greater density than old ones. Their mean nearest neighbor distance is 103.12 m, whereas for old poplars the value is 282.27 m. Due to competition in the development of the forest stand structure, the spatial pattern for old trees is more regular (NNI = 1.68, p < 0.000001) than the one for young poplars (NNI = 1.01, p = 0.8498).

5 SPAT
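The quantities in these reports are related by simple formulas. The sketch below uses the classical Clark and Evans [3] expressions, expected mean distance $= 1/(2\sqrt{D})$ and $SE = 0.26136/\sqrt{nD}$, which is an assumption: the program may apply its own corrections, so the result need not match every reported digit. The function name is hypothetical; the inputs are the mean distance, number of points, and density reported for the first set of points.

```python
import math

def nearest_neighbor_index(mean_nn_distance, n_points, density):
    """Clark-Evans nearest neighbor index with its z statistic."""
    expected = 0.5 / math.sqrt(density)             # expected mean NN distance
    se = 0.26136 / math.sqrt(n_points * density)    # standard error
    nni = mean_nn_distance / expected
    z = (mean_nn_distance - expected) / se
    return nni, expected, se, z

nni, expected, se, z = nearest_neighbor_index(103.116479003, 326, 0.00002377)
print(round(nni, 4), round(expected, 2), round(z, 3))
```

An index close to 1 with a small z statistic corresponds to a pattern that cannot be distinguished from a random one.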
55. rouped in sets.

[Figure: An example of a multipoint in which each point is defined as belonging to one of 3 groups]

• Lines: they are created by linking subsequent points in proper order; lines can intersect.
• Polygons: they are closed spaces bounded by external rings, i.e. closed lines which do not intersect and which go through at least 3 points in an appropriate order. Polygons can also contain internal rings constituting their internal boundaries. External rings are defined clockwise and the internal ones the other way round.

[Figure: 1. An example of a polygon which only has an external boundary, with no internal rings. 2. An example of a polygon which has both an external boundary and internal boundaries; areas defined by the internal rings constitute a part of an external area, i.e. they do not belong to the polygon.]

Object attributes are entered into the base in the form of:
• numbers (e.g. area, temperature),
• texts (e.g. names of objects).

Map projection is a mathematical method of mapping the surface of the Earth onto a map surface. There are a number of methods for such mapping. The mappings can be based on a spheroid, on the surface of a sphere, or on a part of either of them. Each mapping forms the basis for defining an appropriate coordinate system. Because each projection of a surface entails certain dist
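The ring orientation rule described for polygons (external rings clockwise, internal rings the other way round) can be checked with the signed-area (shoelace) formula. The sketch below is a generic illustration that assumes ordinary Cartesian axes with y pointing up; the function names are hypothetical.

```python
def signed_area(ring):
    """Shoelace formula: positive for counterclockwise rings, negative for clockwise."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return total / 2.0

def is_clockwise(ring):
    """True when the ring vertices are listed in clockwise order."""
    return signed_area(ring) < 0

# Hypothetical unit square listed clockwise (an external ring) ...
outer = [(0, 0), (0, 1), (1, 1), (1, 0)]
# ... and the same vertices reversed (the orientation of an internal ring).
inner = list(reversed(outer))
print(is_clockwise(outer), is_clockwise(inner))
```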
56. s matrix, or indicate the neighbor matrix according to contiguity (Queen, row standardized) that is proposed by the program.

Note: It is not recommended to conduct Moran's analysis for objects without a neighborhood (objects described in the weight matrix only with the value 0). Such objects can be excluded from the analysis by deactivating them, or the analysis can be made with a different manner of defining neighborhood (a different weight matrix).

Moran's coefficient was introduced by Moran in 1948 [8]. In order to check if the selected objects are characterized by similar values of the variable, one can use the multiplying rule, which says that multiplying 2 positive numbers gives a positive result and multiplying 2 different numbers (1 positive and 1 negative) gives a negative result. With the use of this rule we calculate:

$$\sum_{i}\sum_{j} x_i x_j$$

Unfortunately, as the results of that rule are only obtained when there are both positive and negative values, the simple rule must be modified so as to ensure the presence of different signs. The values of the variable are therefore replaced in the earlier formula with the differences between the values of the variable and its mean value. In this way the objects with values smaller than the mean will be negative and those with values greater than the mean will be positive:

$$\sum_{i}\sum_{j} (x_i - \bar{x})(x_j - \bar{x})$$

Obviously, the summation should concern neighboring objects, which means that at this point infor
57. the same value. Usually objects in which the studied phenomenon occurs are marked in black on the map and the ones in which the phenomenon does not occur are marked in white. Clusters of objects of the same color (the so-called black-black, white-white) are looked for.

For a variable which describes the degree of intensity of a studied feature, the analysis of positive autocorrelation consists of searching for clusters with similar values. Usually objects on the map are colored in accordance with the degree of intensity of the studied phenomenon, from the lightest (low values) to the darkest (high values). Clusters of objects with a similar shade are looked for.

5.1 Global Moran's statistic
It is an analysis of the degree of intensity of a given feature in spatial objects.

We use two pieces of information for the construction of a coefficient which will allow us to check if the neighboring objects form clusters with similar values of the variable:
1. information about the values of a variable for particular objects ($x$),
2. information about which objects are neighbors (weights matrix with elements $w$).

Note: The objects' neighborhood is defined by a spatial weight matrix. In Moran's analysis window we can choose any matrix generated previously by using the menu Spatial analysis → Tools → Spatial weight
58. tive hypothesis should be two-sided, i.e. we do not presume a particular direction:

$H_1$: THERE IS a statistically significant dependence in the spatial distribution of chemist's shops in Wielkopolska (we assume that their distribution in the given area is not random, i.e. we presume the presence of 2 directions: a distribution which is more regular than a random distribution and a distribution which is more clustered than a random distribution).

It may happen, in very rare cases, that we are certain that we know the direction in the alternative hypothesis. We can then use a one-sided alternative hypothesis.

Hypothesis Verification
To check which of the hypotheses, $H_0$ or $H_1$, is more probable, we select a proper statistical test. The test statistic of the chosen test, calculated according to its formula, follows the theoretical distribution appropriate for that statistic.

[Figure: distribution of the test statistic, with the central area $1 - \alpha$, the two tails $\alpha/2$, and the value of the test statistic marked]

The program calculates the value of the test statistic and the p-value for that statistic, that is, the part of the area under the curve which corresponds to the value of the test statistic. The p-value allows us to choose which hypothesis, the null hypothesis or the alternative hypothesis, is more probable. The truth of the null hypothesis is always presumed, and the evidence gathered in the data is to provide a sufficient number of argumen
59. to it form layers. Layers are organized so that they only contain information about objects of one type. The layered organization enables easy modification of only selected objects. The basic layer is the base layer containing a map. We can add new elements to it by creating new layers.

Adding Layers
To draw objects on subsequent layers, choose the menu Feature → Layers → Add Layer.

• Layer: Result of statistical analysis. It is a layer created together with a report on the statistical spatial analysis. It presents the result of the statistical analysis appended to the report. The information about the layers one can draw on the map can be found at the bottom of the report (the MAP button). The layer can also be added with the corresponding button in the Map Manager window. As long as there are no reports on statistical spatial analysis, the option window for analysis results is empty. When there are such reports, the option window contains a list of layers. The names of the layers on the list consist of the name of the report from which a layer comes, the date and hour the report was created, and a description of the type of objects drawn there.
• Layer: View of another map. It is a layer which presents a map related to another datasheet (the corresponding button in the Map Manager window). The map view can be a single layer or it can consist of a number of layers. It cannot be edited directly. The change of t
60. ts against that hypothesis:
if $p \le \alpha$ ⇒ reject $H_0$ and accept $H_1$,
if $p > \alpha$ ⇒ there is no reason to reject $H_0$.

Usually the significance level $\alpha = 0.05$ is chosen, with the acceptance of the premise that in 5% of situations the null hypothesis will be rejected although it is true. In special cases a different significance level, e.g. 0.01 or 0.001, can be set.

3 DESCRIPTIVE STATISTICS
To conduct Descriptive Statistics on the basis of map data, we should have at our disposal a point, multipoint, or polygon file. In the case of a polygon file, calculations are based on centroids of polygons, and in the case of a multipoint file they are based on centers of objects. The boundaries of the area in which the analysed points are enclosed can be defined, depending on a particular need, with the help of a convex hull, the smallest rectangle, a rectangle from layer bounding, or the smallest circle. The studied area can also be defined only by the size of its area. The distance between the points is measured with the Euclidean metric.

The basic statistics made for point analysis:
• $A$: the area of a studied region,
• $n$: the size of a sample, i.e. the number of points lying within the studied region,
• $D = n / A$: density,
• descriptive statistics of the distance matrix between points: arithmetic mean with confidence interva
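The first of these statistics are easy to reproduce. The sketch below computes the density $D = n/A$ and the arithmetic mean of the Euclidean distances between all pairs of points; counting each pair once is an assumption of this sketch, and the function names and sample points are hypothetical.

```python
import math
from itertools import combinations

def point_density(n_points, area):
    """Density D = n / A."""
    return n_points / area

def mean_pairwise_distance(points):
    """Arithmetic mean of Euclidean distances over all pairs of points."""
    distances = [math.dist(p, q) for p, q in combinations(points, 2)]
    return sum(distances) / len(distances)

# Hypothetical points forming a 3-4-5 right triangle.
triangle = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]
mean_distance = mean_pairwise_distance(triangle)

# Density for 578 points in an area of 257531.649115 square units.
density = point_density(578, 257531.649115)
print(mean_distance, round(density, 6))
```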
61. By analyzing the smoothed variable prev we strengthen the clusterization effect. We obtain a similar result, i.e. 3 clusters (15 census tracts in the analysis of one G coefficient and 9 tracts in the analysis of the other), which are cluster centers.

References
[1] Anselin L. (1995), Local Indicators of Spatial Association: LISA. Geographical Analysis, 27(2), 93-115.
[2] Buliung R.N., Remmel T.K. (2008), Open source, spatial analysis, and activity-travel behaviour research: capabilities of the aspace package. Journal of Geographical Systems, 10, 191-216.
[3] Clark P.J., Evans F.C. (1954), Distance to nearest neighbour as a measure of spatial relationships in populations. Ecology, 35, 445-453.
[4] Cliff A.D., Ord J.K. (1981), Spatial Processes: Models and Applications. Pion, London.
[5] Fisher R.A. (1936), The use of multiple measurements in taxonomic problems. Annals of Eugenics, 7, 179-188.
[6] Geary R.C. (1954), The Contiguity Ratio and Statistical Mapping. The Incorporated Statistician
62. unders of epidemiology. The coordinates of the points which constituted the basis for drawing the maps come from John Snow's original map, which was digitized by Rusty Dodson from the US National Center for Geographic Information and Analysis (http://ncgia.ucsb.edu/Publications/Software/cholera) and later presented in meters.

• The map deaths contains information about the location of 578 points (deaths due to cholera in Soho, a London district).
• The map pumps contains information about the location of 13 points (water pumps in Soho).
• The map streets contains information about the location of lines (streets in Soho).

After importing the above shapefiles (SHP) we can view and edit each of them in the Map Manager. To conduct an analysis we select the deaths map and perform the Spatial descriptive statistics. Because we will use the map coordinates as data for the analysis, in the descriptive statistics window we select the option Use points from map coordinates, and as the bounding type we select the Convex Hull.

Report:
Analysed variables: SHP_X, SHP_
Significance level: 0.05
Bounding type: Convex Hull
Number of points: 578
Area: 257531.649115
Density: 0.002244
Descriptive statistics of the distance matrix:
Arithmetic mean: 171.909909
95% CI for the group mean (lower): 171.465637
95% CI for the group mean (upper): 172.354181
Stan
63. wiats should by definition be uniform. With the use of NNI we will check if that is the case. The districts map contains information about the locations of polygons (Polish powiats). The nearest neighbor analysis will be based on centroids representing powiats. We can draw them (add the centroid layer to the map of powiats) with the use of the Map Manager.

The nearest neighbor analysis will be made with the use of information about the size of the area of Poland: it is 311888000000 m². Apart from the nearest neighbor index we will also calculate the indices of subsequent neighbors, up to 15.

Report:
Analysed variables: SHP_, SHP_
Significance level: 0.05
Bounding type: Area specified
The nearest neighbor (NN):
Mean Distance (NN): 19668.564923
Standard Dev. Distance (NN): 9102.84769
Expected Mean Distance (NN): 14343.321467
Nearest Neighbor Index (NNI): 1.37127
SE: 385.125171
Z statistic: 13.827306
p-value: < 0.000001

After entering the size of the area in the analysis window, we obtained a nearest neighbor index of 1.37127, which is greater than 1 and statistically significant (p < 0.000001). The mean distance between the nearest neighboring centroids is 19668.564923 m and the standard deviation is 9102.84769 m. We will receive
