SaTScan User Guide
         Contents
1. Additional Output Files

In addition to the standard results file that is automatically shown at the completion of the calculations, it is possible to request the following additional output files with different types of information:

• Cluster Information, with each row containing summary information for each cluster.
• Cluster Cases Information, with data set and ordinal category specific information for each cluster, concerning observed and expected cases, their ratio and the relative risk. This file is primarily used for the ordinal model or when there are multiple data sets. For other analyses this file is redundant, as it contains a subset of the information already in the Cluster Information file.
• Location Information, with each row containing information about a particular location and its cluster membership.
• Risk Estimates for Each Location
• Simulated Log Likelihood Ratios

You must manually open all these files after the run is completed. They are provided in either ASCII or dBase format so that they can be easily imported into spreadsheets, geographical information systems or other database software.

Related Topics: Output Tab, Results of Analysis, Cluster Information File, Location Information File, Risk Estimates for Each Location, Simulated Log Likelihood Ratios.

SaTScan User Guide v7.0

Advanced Features

While most SaTScan analyses can be performed using the features on the three basic tabs for input, analysis and output
2. Miscellaneous

New Versions

To check whether there is a later version than the one you are currently using, simply click on the update button on the tool bar. If a newer version exists, you will be asked whether you want to automatically download and install it. At any given time, it is also possible to download the latest version of SaTScan from the World Wide Web at http://www.satscan.org

Related Topics: Download and Installation.

Analysis History File

In the analysis history file, SaTScan automatically maintains a log of all the SaTScan analyses conducted. Included in the log is an assigned analysis number together with information about the time of the analysis, parameter settings, a very brief summary of the results, as well as the name of the standard results file created.

The analysis history is in a dBase file with the name AnalysisHistory.dbf, located in the same directory as the SaTScan executable. It can be opened and read using most database and spreadsheet software, including Excel. You can erase the file at any time. A new file will then be created the next time you run SaTScan, starting the list of analyses from scratch.

Related Topics: Running SaTScan, Results of Analysis.

Random Number Generator

The choice of random number generator is critical for any software creating simulated data. SaTScan uses a Lehmer random number generator with modulus 2^31 - 1 = 2147483647 and multiplier 48271, which is known
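The Lehmer generator named in the Random Number Generator excerpt can be illustrated with a minimal Python sketch. Only the modulus and multiplier come from the guide; the function names, the seed handling and the scaling to the unit interval are assumptions made for illustration and are not taken from SaTScan's source code.

```python
MODULUS = 2**31 - 1   # 2147483647, the modulus stated in the guide
MULTIPLIER = 48271    # the multiplier stated in the guide


def lehmer_states(seed, count):
    """Return `count` successive states x_{n+1} = MULTIPLIER * x_n mod MODULUS.

    The seed must lie strictly between 0 and MODULUS; how SaTScan chooses
    its seed is not described in the excerpt, so it is a free parameter here.
    """
    if not 0 < seed < MODULUS:
        raise ValueError("seed must be between 1 and MODULUS - 1")
    states = []
    state = seed
    for _ in range(count):
        state = (MULTIPLIER * state) % MODULUS
        states.append(state)
    return states


def to_uniform(state):
    """Scale an integer state to a float strictly between 0 and 1 (assumed scaling)."""
    return state / MODULUS
```

For example, `lehmer_states(seed=1, count=2)` yields the first two states of the sequence, and `to_uniform` turns each state into a number that could drive a Monte Carlo replication.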
3. Related Topics: Advanced Features, Analysis Tab, Temporal Window Tab, Maximum Spatial Cluster Size, Include Purely Temporal Clusters.

Maximum Spatial Cluster Size

The program will scan for clusters of geographic size between zero and some upper limit defined by the user. The upper limit can be specified either as a percent of the population used in the analysis, as a percent of some other population defined in a max circle size file, or in terms of geographical size using the circle radius. The maximum can also be defined using a combination of these three criteria.

The recommended choice is to specify the upper limit as a percent of the population at risk, and to use 50% as the value. It is possible to specify a maximum that is less than 50%, but not more than 50%. A cluster of larger size would indicate areas of exceptionally low rates outside the circle rather than an area of exceptionally high rate within the circle (or vice versa when looking for clusters of low rates). When in doubt, choose a high percentage, since SaTScan will then look for clusters of both small and large sizes without any pre-selection bias in terms of the cluster size. When calculating the percentage, SaTScan uses the population defined by the cases and controls for the Bernoulli model, the covariate-adjusted population at risk from the population file for the Poisson model, the cases for the space-time permutation, ordinal, exponential and nor
4. Sometimes it is interesting to simultaneously search for and evaluate clusters in more than one data set. For example, one may be interested in spatial clusters with excess incidence of leukemia only, of lymphoma only or of both simultaneously. As another example, one may be interested in detecting a gastrointestinal disease outbreak that affects children only, adults only or both simultaneously. If SaTScan is used to analyze one single combined data set, one may miss a cluster that is only present in one of the subgroups. On the other hand, if two SaTScan analyses are performed, one for each data set, there is a loss of power if the true cluster is about equally strong in both data sets. A SaTScan analysis with multiple data sets and the multivariate scan option solves this problem.

The multivariate scan statistic with multiple data sets works as follows (when searching for clusters with high rates):

1. For each window location and size, the log likelihood ratio is calculated for each data set.

2. The log likelihood ratios for the data sets with more than the expected number of cases are summed, and this sum is the likelihood for that particular window.

3. The maximum of all the summed log likelihood ratios, taken over all the window locations and sizes, constitutes the most likely cluster, and this is evaluated in the same way as for a single data set.

When searching for clusters with low rates, the same procedure is performed, except that we inst
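The three steps of the multivariate scan for high rates can be sketched in Python as follows. This is only an illustration under stated assumptions: the per-data-set log likelihood ratios are taken as precomputed inputs, and the function names and the tuple layout `(observed, expected, llr)` are hypothetical, not part of SaTScan.

```python
def window_score(data_sets):
    """Sum the log likelihood ratios of data sets with more cases than expected.

    data_sets: list of (observed, expected, llr) tuples, one per data set,
    where llr is the log likelihood ratio already computed for this window.
    """
    return sum(llr for observed, expected, llr in data_sets if observed > expected)


def most_likely_cluster(windows):
    """Return (window_id, score) for the window with the largest summed score.

    windows: dict mapping a window identifier (location and size) to the
    list of per-data-set tuples for that window.
    """
    best = max(windows, key=lambda w: window_score(windows[w]))
    return best, window_score(windows[best])
```

With two data sets per window, a window whose second data set has fewer cases than expected contributes only its first log likelihood ratio to the sum, so evidence can come from one data set or from several combined.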
6. Tab
   Spatial Window Tab
   Temporal Window Tab
   Spatial and Temporal Adjustments Tab
   Inference Tab
   Clusters Reported Tab
Running SaTScan
   Specifying Analysis and Data Options
   Launching the Analysis
   Status Messages
   Warnings and Errors
   Saving Analysis Parameters
   Parallel Processors
   Batch Mode
   Computing Time
   Memory Requirements
Results of Analysis
   Standard Results File
   Cluster Information File
7. population in some areas grows faster than in others. This is typically not a problem if the total study period is less than a year. However, the user is advised to be very careful when using this method for data spanning several years. If the background population increases or decreases faster in some areas than in others, there is risk for population shift bias, which may produce biased p-values when the study period is longer than a few years. For example, if a new large neighborhood is developed, there will be an increase in cases there simply because the population increases, and using only case data, the space-time permutation model cannot distinguish an increase due to a local population increase from an increase in the disease risk. As with all space-time interaction methods, this is mainly a concern when the study period is longer than a few years. If the population increase (or decrease) is the same across the study region, that is okay, and will not lead to biased results.

Related Topics: Likelihood Ratio Test, Analysis Tab, Probability Model Comparison, Methodological Papers.

Ordinal Model

With the ordinal model, each observation is a case, and each case belongs to one of several ordinal categories. If there are only two categories, the ordinal model is identical to the Bernoulli model, where one category represents the cases and the other category represents the controls in the Bernoulli model. The case
8. this option allows one to keep the original maximum on the circle size for the statistical inference, and at the same time limit the size of the clusters reported. This will typically result in SaTScan reporting many clusters that were not reported from the original analysis, due to the fact that they shared the centroid of a more likely cluster.

The unit by which to define the maximum size of the reported cluster is the same as the unit used to define the maximum cluster size for inference purposes, as defined on the Spatial Window Tab.

Related Topics: Advanced Features, Inference Tab, Results of Analysis, Criteria for Reporting Secondary Clusters, Maximum Spatial Cluster Size, Log Likelihood Ratio, Standard Results File.

Running SaTScan

Specifying Analysis and Data Options

The SaTScan program requires that you specify parameters defining input, analysis and output options for the analysis you wish to conduct. A tabbed dialog is provided for this purpose. To access the parameter tab dialog, either press the button or select the File > New menu item. Specify the parameters for your session on the following tabs:

• Input Tab
• Analysis Tab
• Output Tab

See the section on Basic SaTScan Features for instructions on how to fill in these tabs.

Most analyses can be performed using only these three tabs. For each tab, there are additional features that can be selected by first clicking
9. Grid File

The optional grid file defines the centroids of the circles used by the scan statistic. If no grid file is specified, the coordinates given in the coordinates file are used for this purpose. Each line in the file represents one circle centroid. There should be at least two variables representing Cartesian (standard) x,y-coordinates or exactly two variables representing latitude and longitude. The choice between Cartesian and latitude/longitude must coincide with the coordinates file, as must the number of dimensions.

Related Topics: Input Tab, Grid File Name, Coordinates, Cartesian Coordinates, Latitude and Longitude, Coordinates File, SaTScan Import Wizard, SaTScan ASCII File Format, Computing Time.

Neighbors File

This is an optional file. It cannot be defined using the SaTScan Import Wizard, but has to be specified using the ASCII file format. With this option, the coordinates and grid files are not needed, and are ignored if provided.

With the standard parameter settings, SaTScan uses the coordinates file to determine which locations are closest to the center of each circle constructed. This is done using Euclidean distances. In essence, for each centroid SaTScan finds the closest neighbor, the second closest, and so on, until it reaches the maximum window size. With the neighbors file, it is possible for the user to specify these neighbor relations in any way without being constrained to Euclidean distances. For example, the neigh
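The default neighbor ordering described in the Neighbors File excerpt (closest location first, by Euclidean distance from a centroid) can be sketched as below. The function name, the dictionary layout and the optional `max_radius` cutoff are assumptions for illustration; SaTScan's maximum window size can also be defined in other ways, such as a percent of the population.

```python
import math


def neighbors_by_distance(centroid, locations, max_radius=None):
    """Order location IDs from closest to farthest from `centroid`.

    centroid: tuple of Cartesian coordinates.
    locations: dict mapping a location ID to its coordinate tuple.
    max_radius: optional cutoff (hypothetical); locations farther away
    than this are left out of the ordering.
    """
    distances = {loc_id: math.dist(centroid, xy) for loc_id, xy in locations.items()}
    ordered = sorted(distances, key=distances.get)
    if max_radius is not None:
        ordered = [loc_id for loc_id in ordered if distances[loc_id] <= max_radius]
    return ordered
```

A neighbors file replaces this computed ordering with one supplied by the user, which is why the coordinates and grid files are then not needed.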
11. Bernoulli model, there are cases and non-cases represented by a 0/1 variable. These variables may represent people with or without a disease, or people with different types of disease such as early and late stage breast cancer. They may reflect cases and controls from a larger population, or they may together constitute the population as a whole. Whatever the situation may be, these variables will be denoted as cases and controls throughout the user guide, and their total will be denoted as the population. Bernoulli data can be analyzed with the purely temporal, the purely spatial or the space-time scan statistics.

Example: For the Bernoulli model, cases may be newborns with a certain birth defect while controls are all newborns without that birth defect.

The Bernoulli model requires information about the location of a set of cases and controls. Separate locations may be specified for each case and each control, or the data may be aggregated for states, provinces, counties, parishes, census tracts, postal code areas, school districts, households, etc., with multiple cases and controls at each data location. To do a temporal or space-time analysis, it is necessary to have a time for each case and each control as well.

Related Topics: Likelihood Ratio Test, Analysis Tab, Probability Model Comparison, Methodological Papers.

Poisson Model

With the Poisson model, the number of cases in each location is Poisson distribu
12. Brown BW, Holly EA. A test to detect clusters of disease. Biometrika, 114:631-635, 1987.
13. Coordinates File, Grid File.

Coordinates

Specify the type of coordinates used by the coordinates file and the grid file, as either Cartesian or latitude/longitude. Cartesian is the mathematical name for the regular x,y-coordinate system taught in high school.

Related Topics: Cartesian Coordinates, Latitude/Longitude, Coordinates File, Grid File.

Analysis Tab

[Figure: Analysis Tab Dialog Box, with panels for Type of Analysis, Probability Model, Scan for Areas with, Time Aggregation and Monte Carlo Replications]

The Analysis Tab is used to set various analysis options. Additional features are available by clicking on the Advanced button in the lower right corner.

Related Topics: Basic SaTScan Features, Statistical Methodology, Spatial Window Tab, Temporal Window Tab, Spatial and Temporal Adjustments Tab, Inference Tab.

Type of Analysis

SaTScan may be used for purely spatial, purely temporal or space-time analyses. A purely spatial analysis ignores the time of cases, even when such data are provided. A purely tempo
14. Data

For temporal and space-time data, there is an additional difference among the probability models, in the way that the temporal data is handled. With the Poisson model, population data may be specified at one or several time points, such as census years. The population is then assumed to exist between such time points as well, estimated through linear interpolation between census years. With the Bernoulli, space-time permutation, ordinal, exponential and normal models, a time needs to be specified for each case and, for the Bernoulli model, for each control as well.

Related Topics: Bernoulli Model, Poisson Model, Space-Time Permutation Model, Likelihood Ratio Test, Methodological Papers.

Spatial, Temporal and Space-Time Scan Statistics

Spatial Scan Statistic

The standard purely spatial scan statistic imposes a circular window on the map. The window is in turn centered on each of several possible grid points positioned throughout the study region. For each grid point, the radius of the window varies continuously in size from zero to some upper limit specified by the user. In this way, the circular window is flexible both in location and size. In total, the method creates an infinite number of distinct geographical circles with different sets of neighboring data locations within them. Each circle is a possible candidate cluster.

The user defines the set of grid points used through a grid file. If no grid file is specified, the grid points are
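The linear interpolation of Poisson-model population between census years, mentioned in the excerpt above, can be illustrated with a short Python sketch. The function name and the use of plain numeric years are assumptions; the excerpt does not say how times outside the range of census points are treated, so this sketch simply rejects them rather than guess.

```python
def interpolate_population(census, time):
    """Linearly interpolate a population between known census time points.

    census: dict mapping a time (for example a year) to a population count.
    time: the time at which a population estimate is wanted; it must lie
    within the range of the census times (behaviour outside that range is
    not covered by this sketch).
    """
    points = sorted(census.items())
    if not points[0][0] <= time <= points[-1][0]:
        raise ValueError("time lies outside the range of census points")
    for (t0, p0), (t1, p1) in zip(points, points[1:]):
        if t0 <= time <= t1:
            return p0 + (p1 - p0) * (time - t0) / (t1 - t0)
    return float(points[0][1])  # only one census point, and time equals it
```

With census counts of 1000 in 1990 and 2000 in 2000, the estimate for 1995 lies halfway between the two.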
15. Data sets are added by first clicking on the "Add" button, and then entering the file names by either typing them in the text box, by using the browser button or through the SaTScan Import Wizard button. Remove a data set by selecting it and clicking on the "Remove" button.

Multiple data sets can be used for two different purposes. One purpose is when there are different types of data, and we want to know if there is a cluster in either one or more of the data sets. The evidence for a cluster could then come exclusively from one data set or it may use the combined evidence from two or more data sets. The other purpose is to adjust for covariates. In this case the evidence of a cluster is based on all data sets. The difference is discussed in more detail in the statistical methodology section.

Warning: The computing time is considerably longer when analyzing multiple data sets as compared to a single data set. Hence, it is not recommended to use multiple data sets when there are many locations in the coordinates file.

Related Topics: Advanced Features, Input Tab, Multivariate Scan with Multiple Data Sets, Covariate Adjustments Using Multiple Data Sets, Computing Time, Case File, Control File, Population File.

Data Checking Tab

[Figure: Advanced Input Features dialog box, with tabs for Multiple Data Sets, Data Checking and Non-Euclidean Neighbors]

Study Period Check: Check to ensure that cases and controls ar
16. SaTScan can evaluate more different cluster locations and sizes without restrictions imposed by administrative geographical boundaries, minimizing assumptions about the geographical cluster location and size.

4. If my data is sparse, won't the rates be statistically unstable?

The stability of rates does not depend on the geographical resolution of the input data, but on the population size of the circles constructed by SaTScan.

5. What is the minimum number of spatial locations needed to run SaTScan?

The purely temporal scan statistic can be run with only one geographical location. The space-time scan statistic needs at least two locations. With only two locations, the space-time scan statistic will look for temporal clusters in either or both of the locations. Technically, the purely spatial scan statistic can also be run using only two geographical locations, providing correct inference. There is no point in using a purely spatial scan statistic for such data though, for which a regular chi-square statistic can be used instead, as there is no multiple testing to adjust for. With three locations or more, the fundamental scan statistic concept of including different combinations of locations into the potential clusters is being utilized. In most practical applications though, the spatial and space-time scan statistics are used for data sets with hundreds or thousands of geographical locations. If there is a choice, less sp
17. a percent, then for the Bernoulli and Poisson models, it can be at most 90 percent, and for the space-time permutation model, at most 50 percent. The recommended value is 50 percent.

Related Topics: Temporal Window Tab, Maximum Spatial Cluster Size, Include Purely Spatial Clusters, Flexible Temporal Window Definition, Time Aggregation.

Include Purely Spatial Clusters

In addition to the maximum temporal cluster size, it is also possible to allow clusters to contain the whole time period under study. In this way, purely spatial clusters are included among the evaluated windows. The purpose of specifying a maximum temporal size, but still including purely spatial clusters, is to eliminate clusters containing the whole study period except a small time period at the very beginning or at the very end of the study period.

Note: When adjusting for purely spatial clusters using stratified randomization, all purely spatial clusters are adjusted away, and this parameter has no effect on the analysis.

Related Topics: Temporal Window Tab, Maximum Temporal Cluster Size, Include Purely Temporal Clusters, Spatial Adjustment.

Flexible Temporal Window Definition

For retrospective analyses, SaTScan will evaluate all temporal windows less than the specified maximum, and for prospective analyses the same is true with the added restriction that the end of the window is identical to the study period end date. When needed, SaTScan can be more flexible than that, and
18. adjusted for by using multiple data sets, which limits the number of covariate categories that can be defined, or through a pre-processing regression analysis done before running SaTScan.

All probability models can be used for either individual locations or aggregated data.

With the Poisson model, population data is only needed at selected time points and the numbers are interpolated in between. A population time must be specified even for purely spatial analyses. Regardless of the model used, the time of a case or control need only be specified for purely temporal and space-time analyses.

The space-time permutation model automatically adjusts for purely spatial and purely temporal clusters. For the Poisson model, purely temporal and purely spatial clusters can be adjusted for in a number of different ways. For the Bernoulli, ordinal, exponential and normal models, spatial and temporal adjustments can be done using multiple data sets, but this is limited by the number of different data sets allowed, and it is also much more computer intensive.

Few Cases Compared to Controls

In a purely spatial analysis where there are few cases compared to controls, say less than 10 percent, the Poisson model is a very good approximation to the Bernoulli model. The former can then be used also for 0/1 Bernoulli type data, and may be preferable as it has more options for various types of adjustments, including the ability to adjust for covariates specified in the case and
19. and Newell, Cuzick and Edwards, Diggle and Chetwynd, Grimson, Moran, Ranta, Tango, Walter, and Whittemore et al. These methods test for clustering throughout the study region without the ability to pinpoint the location of specific clusters. As such, these tests and the spatial scan statistic complement each other, since they are useful for different purposes.

Global Space-Time Interaction Tests

Knox, Mantel, Diggle et al., Jacquez, Baker, and Kulldorff and Hjalmars have proposed different tests for space-time interaction. Like the space-time permutation version of the space-time scan statistic, these methods are designed to evaluate whether cases that are close in space are also close in time and vice versa, adjusting for any purely spatial or purely temporal clustering. Being global in nature, these other tests are useful when testing to see if there is clustering throughout the study region and time period, and are the preferred method when, for example, trying to determine whether a disease is infectious. Unlike the space-time permutation based scan statistic though, they are unable to detect the location and size of clusters and to test the significance of those clusters.

Related Topics: Likelihood Ratio Test, SaTScan Methodology Papers.

Input Data

Data Requirements

Required Files: The input data should be provided in at least two different files, a case file and a coordi
20. and Suggestions

Feedback from users is greatly appreciated. Very valuable suggestions concerning the SaTScan software have been received from many individuals, including:

Allyson Abrams, Harvard Medical School & Harvard Pilgrim Health Care
Frank Boscoe, New York State Health Department
Eric Feuer, National Cancer Institute
Laurence Freedman, National Cancer Institute
David Gregorio, University of Connecticut
Goran Gustafsson, Karolinska Institute, Sweden
Jessica Hartman, New York Academy of Medicine
Richard Heffernan, New York City Department of Health
Kevin Henry, New Jersey Department of Health
Ulf Hjalmars, Ostersund Hospital, Sweden
Richard Hoskins, Washington State Department of Health
Lan Huang, National Cancer Institute
Ahmedin Jemal, American Cancer Society
Inkyung Jung, Harvard Medical School & Harvard Pilgrim Health Care
Ann Klassen, Johns Hopkins University
Ken Kleinman, Harvard Medical School & Harvard Pilgrim Health Care
Kristina Metzger, New York City Department of Health
Barry Miller, National Cancer Institute
Farzad Mostashari, New York City Department of Health
Karen Olson, Children's Hospital, Boston
Linda Pickle, National Cancer Institute
Tom Richards, Centers for Disease Control and Prevention
Gerhard Rushton, University of Iowa
Joeseph Sheehan, University of Connecticut
Tom Talbot, New York State Health Department
Toshiro Tango, National Institute of Public
21. between them.

Related Topics: Covariate Adjustment Using the Input Files, Covariate Adjustment using Statistical Regression Software, Covariate Adjustment Using Multiple Data Sets, Methodological Papers.

Covariate Adjustment Using the Input Files

With the Poisson and space-time permutation models, it is possible to adjust for multiple categorical covariates by specifying the covariates in the input files. To do so, simply enter the covariates as extra columns in the case file (all models) and the population file (Poisson model). There is no need to enter any information on any of the window tabs.

For the Poisson model, the expected number of cases in each area under the null hypothesis is calculated using indirect standardization. Without covariate adjustment, the expected number of cases in a location is (spatial analysis):

$E[c] = p \cdot C / P$

where $c$ is the observed number of cases and $p$ the population in the location of interest, while $C$ and $P$ are the total number of cases and population respectively. Let $c_i$, $p_i$, $C_i$ and $P_i$ be defined in the same way, but for covariate category $i$. The indirectly standardized covariate-adjusted expected number of cases (spatial analysis) is:

$E[c] = \sum_i E[c_i] = \sum_i p_i \cdot C_i / P_i$

The same principle is used when calculating the covariate-adjusted number of cases for the space-time scan statistic, although the formula is more complex due to the added time dimension.

Since the space time permuta
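The two expected-count formulas for the purely spatial Poisson case can be checked numerically with a small Python sketch. The function names, the dictionary layout keyed by covariate category and the example numbers are all hypothetical and chosen only to illustrate the arithmetic.

```python
def expected_cases(p, C, P):
    """Unadjusted expected cases in a location: E[c] = p * C / P."""
    return p * C / P


def adjusted_expected_cases(local_pop, total_cases, total_pop):
    """Covariate-adjusted expected cases: sum over categories of p_i * C_i / P_i.

    Each argument is a dict keyed by covariate category i:
    local_pop[i]   = p_i, population of category i in the location,
    total_cases[i] = C_i, total cases of category i in the study region,
    total_pop[i]   = P_i, total population of category i in the study region.
    """
    return sum(
        local_pop[i] * total_cases[i] / total_pop[i]
        for i in local_pop
    )
```

As a made-up example, take two age categories with totals C = {young: 10, old: 40} and P = {young: 1000, old: 500}, and a location with p = {young: 200, old: 25}. The adjusted expectation is 200·10/1000 + 25·40/500 = 4 cases, whereas the unadjusted formula gives 225·50/1500 = 7.5 cases, because it ignores that the location is mostly young and therefore at lower risk.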
22. cluster detection was proposed by Rushton and Lolonis [66], who used p-value contour maps to depict the clusters rather than overlapping circles. As with GAM, it does not adjust for the multiple testing inherent in the many potential cluster locations evaluated.

Cluster Detection Tests

The spatial scan statistic is a cluster detection test. A cluster detection test is able to both detect the location of clusters and evaluate their statistical significance without problems with multiple testing. In 1990, Turnbull et al. proposed the first such test using overlapping circles with fixed population size, assigning the circle with the most cases as the detected cluster.

The spatial scan statistic was in part inspired by the work of Openshaw et al. and Turnbull et al. By applying a likelihood ratio test, it was possible to evaluate clusters of different sizes (as Openshaw et al. did) while at the same time adjusting for the multiple testing (as Turnbull et al. did). In a power comparison, it was shown that Turnbull's method has higher power if the true cluster size is within about 20 percent of what is specified by that method, while the spatial scan statistic has higher power otherwise. Note that the cluster size in Turnbull's method must be specified before looking at the data, or the procedure is invalid.

Focused Cluster Tests

Focused tests should be used when there is a priori knowledge about the locatio
23. clustering. Statistical Methods in Medical Research, 4:124-136, 1995.
153. Glaz J, Balakrishnan N (editors). Scan Statistics and Applications. Birkhauser, Boston, 1999.
154. Glaz J, Naus JI, Wallenstein S. Scan Statistics. Springer Verlag, New York, 2001.
155. Grimson RC. A versatile test for clustering and a proximity analysis of neurons. Methods of Information in Medicine, 30:299-303, 1991.
156. Jacquez GM. A k nearest neighbor test for space-time interaction. Statistics in Medicine, 15:1935-1949, 1996.
157. Knox G. The detection of space-time interactions. Applied Statistics, 13:25-29, 1964.
158. Kulldorff M. Statistical Methods for Spatial Epidemiology: Tests for Randomness. In: GIS and Health in Europe, Löytönen M and Gatrell A (eds.). London: Taylor & Francis, 1998.
159. Kulldorff M, Hjalmars U. The Knox method and other tests for space-time interaction. Biometrics, 9:621-630, 1999.
160. Lawson AB. On the analysis of mortality events associated with a pre-specified fixed point. Journal of the Royal Statistical Society, Series A, 156:363-377, 1993.
161. Mantel N. The detection of disease clustering and a generalized regression approach. Cancer Research, 27:209-220, 1967.
162. Moran PAP. Notes on continuous stochastic phenomena. Biometrika, 37:17-23, 1950.
163. Naus J. The distribution of the size of maximum cluster of points on the line. Journal of the American Statistical Association, 60:532-538, 1965.
164. Openshaw S, Cha
24.  compared to the baseline. Setting a value of one is equivalent to not making any adjustment. A value greater than one adjusts for an increased risk, and a value less than one adjusts for a lower risk. A relative risk of zero is used to adjust for missing data for that particular time and location.

start time: Optional. The start of the time period to be adjusted using this relative risk.
end time: Optional. The end of the time period to be adjusted using this relative risk.

If no start and end times are given, the whole study period will be adjusted for that location. If "All" is selected instead of a location ID, but no start or end times are given, that has the same effect as when no adjustments are done.

The name of the adjustments file is specified on the Analysis Tab > Advanced Features > Risk Adjustments.

Note: Assigning a relative risk of x to half the locations is equivalent to assigning a relative risk of 1/x to the other half. Assigning the same relative risk to all locations and time periods has the same effect as not adjusting at all.

Note: It is permissible to adjust the same location and time periods multiple times, through different rows with different relative risks. SaTScan will simply multiply the relative risks. For example, if you adjust location A with a relative risk of 2 for all time periods, and you adjust 1990 with a relative risk of 2 for all locations, then the 1990 entry for location A will be ad
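
The multiplication rule in the note above can be illustrated with a short sketch. This is not SaTScan's own code or file parser: the function name `combined_relative_risk`, the in-memory tuple layout, and the use of whole years as time periods are all assumptions made for illustration only.

```python
def combined_relative_risk(rows, location, year):
    """Multiply the relative risks of every row that covers (location, year).

    rows: iterable of (location_id, relative_risk, start_year, end_year).
    start_year / end_year may be None, meaning the whole study period.
    This tuple layout is hypothetical, not the SaTScan file format itself.
    """
    combined = 1.0  # a relative risk of one means no adjustment
    for loc, risk, start, end in rows:
        covers_location = loc == "All" or loc == location
        covers_time = (start is None or start <= year) and (end is None or year <= end)
        if covers_location and covers_time:
            combined *= risk
    return combined


rows = [
    ("A", 2.0, None, None),    # location A, all time periods
    ("All", 2.0, 1990, 1990),  # all locations, 1990 only
]
print(combined_relative_risk(rows, "A", 1990))  # 4.0
print(combined_relative_risk(rows, "A", 1991))  # 2.0
print(combined_relative_risk(rows, "B", 1991))  # 1.0
```

Under these assumptions, rows that overlap in both location and time contribute one factor each, and a location and year covered by no row keeps a relative risk of one.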
25.  on the Advanced button in the lower right corner of the tab. These additional features may be useful in special circumstances.

The available choices for some features may depend on what was selected in other places. For example, if a purely spatial analysis is chosen, the space-time permutation model is not available, and vice versa.

Related Topics: Basic SaTScan Features, Input Tab, Analysis Tab, Output Tab, Advanced Features, Launching the Analysis.

Launching the Analysis

Once the data input files have been created, and the parameters defining the input, analysis and output options have been specified, select the Execute button to launch the analysis and produce the results file. A special job status window will appear containing status, warning and/or error messages. Once the analysis has been completed, the standard results file will appear in the job status window.

Multiple parameter session windows may be opened simultaneously for data entry, and multiple analyses may be run concurrently. If you are running multiple analyses concurrently, please verify that the output files have different names.

Related Topics: Input Data, Data Requirements, Specifying Analysis and Data Options, Status Messages, Warnings and Errors, Computing Time, Batch Mode.

Status Messages

Status messages are displayed as the program executes the analysis, as the data is read and at each step of the analysis. Normal status messag
26.  population files. As an approximation for Bernoulli-type data, the Poisson model produces slightly conservative p-values.

Bernoulli versus Ordinal Model

The Bernoulli model is mathematically a special case of the ordinal model, when there are only two categories. The Bernoulli model runs faster, making it the preferred model to use when there are only two categories.

Normal versus Exponential Model

Both the normal and exponential models are meant for continuous data. The exponential model is primarily designed for survival time data but can be used for any data where all observations are positive. It is especially suitable for data with a heavy right tail. The normal model can be used for continuous data that takes both positive and negative values. While still formally valid, results from the normal model are sensitive to extreme outliers.

Normal versus Ordinal Model

The normal model can be used for categorical data when there are very many categories. As such, it is sometimes a computationally faster alternative to the ordinal model. There is an important difference, though. With the ordinal model, only the order of the observed values matters. For example, the results are the same for ordered values "1 - 2 - 3 - 4" and "1 - 10 - 100 - 1000". With the normal model, the results will be different, as they depend on the relative distance between the values used to define the categories.

Temporal
27.  sets.

This option is not available for the Bernoulli, ordinal, exponential or space-time permutation models, in the latter case because the method automatically adjusts for any purely spatial clusters.

Note: It is not possible to simultaneously adjust for spatial clusters and purely temporal clusters using stratified randomization. If both types of adjustments are desired, the space-time permutation model should be used instead.

Related Topics: Spatial and Temporal Adjustments Tab, Poisson Model, Adjusting for Temporal Trends.

Adjusting for Known Relative Risks

Sometimes it is known a priori that a particular location and/or time has a higher or lower risk of known magnitude, and we want to detect clusters above and beyond this. In other words, we want to adjust for this known excess/lower risk. One way to do this is to simply change the population at risk numbers in the population file. A simpler way is to use the adjustments file. In this file, a relative risk is specified for any location and time period combination. The expected counts are then multiplied by this relative risk for that location and time. For example, if it is known from historical data that a particular location typically has 50 percent more cases during the summer months June to August, then for each year one would specify a relative risk of 1.5 for this location and these months. A summer cluster will then only appear in this location if the excess risk is more than 50 p
28.  south distance from the equator, and locations south of the equator should be entered as negative numbers. Longitude represents the east/west distance from the Prime Meridian (Greenwich, England), and locations west of the Prime Meridian should be entered as negative numbers. For example, the National Institutes of Health in Bethesda, Maryland, which is located at 39.00 degrees north and 77.10 degrees west, should be reported as 39.00 and -77.10 respectively.

For the purpose of this program, latitudes and longitudes cannot be specified in degrees, minutes and seconds. Such latitudes and longitudes can easily be converted into decimal number of degrees (DND) by the simple formula: DND = degrees + minutes/60 + seconds/3600.

If latitude/longitude coordinates are used, the coordinates file should contain the following information:

location id: Any numerical value or string of characters. Empty spaces may not form part of the id.

latitude: Latitude in decimal number of degrees.

longitude: Longitude in decimal number of degrees.

Note: When coordinates are specified in latitudes and longitudes, SaTScan does not perform a projection of these coordinates onto a planar space. Rather, SaTScan draws perfect circles on the surface of the spherical earth.

Related Topics: Input Tab, Coordinates File, Coordinates, Cartesian Coordinates, Latitude and Longitude, Grid File, SaTScan Import Wizard, SaTScan ASCII File Format, Computing Time.
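
As a minimal sketch of the DND conversion formula given in this section, the following Python function turns a degrees/minutes/seconds value into a decimal number of degrees. The function name `dms_to_decimal` and the `negative` flag are assumptions introduced here for illustration; they are not part of SaTScan. The 77 degrees 6 minutes west value in the usage lines is likewise an illustrative input, not taken from the guide.

```python
def dms_to_decimal(degrees, minutes=0.0, seconds=0.0, negative=False):
    """Convert degrees/minutes/seconds to a decimal number of degrees (DND).

    Pass magnitudes as non-negative numbers. Set negative=True for
    latitudes south of the equator or longitudes west of the Prime Meridian.
    """
    if degrees < 0 or minutes < 0 or seconds < 0:
        raise ValueError("pass magnitudes and use negative=True for south/west")
    # DND = degrees + minutes/60 + seconds/3600
    value = degrees + minutes / 60.0 + seconds / 3600.0
    return -value if negative else value


latitude = dms_to_decimal(39, 0, 0)                   # 39 degrees north
longitude = dms_to_decimal(77, 6, 0, negative=True)   # 77 degrees 6 minutes west
print(round(latitude, 2), round(longitude, 2))        # 39.0 -77.1
```

Keeping the sign in a separate flag avoids the common mistake of adding positive minutes and seconds to a negative degree value, which would move the point in the wrong direction.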
29.  tests whether it is.

17. If I am interested in whether there is spatial auto-correlation in the data, why should I use the spatial scan statistic rather than a traditional spatial auto-correlation test?

If you are only interested in whether there is spatial auto-correlation or not, but don't care about cluster locations, there are tests for spatial auto-correlation / global clustering that have higher power than the spatial scan statistic and should be used instead. The spatial scan statistic should be used when you are interested in the detection and statistical significance of local clusters.

18. In spatial statistics, is it not always important to adjust for spatial auto-correlation? This cannot be done in SaTScan.

Whether to adjust for spatial auto-correlation depends on the question being asked of the data. As an example, let's assume that we have geographical data on people who get sick due to food poisoning. In such data there is clearly spatial auto-correlation, since bad food sold at restaurants or grocery stores is often sold to multiple customers, many of whom will live in the same neighborhood.

If we are doing spatial regression trying to determine what neighborhood characteristics, such as mean income, house values, educational levels or ethnic origin, contribute to a higher risk for food poisoning, it is critical to adjust for the spatial auto-correlation in the data. If not, the confidence in the risk relationships will be overestima
30.  there is no need for any modification of the input data.

Poisson Model

To adjust the Poisson model for missing data, use the adjustments file to define the location and time combinations for which the data is missing, and assign a relative risk of zero to those location/time combinations.

Space-Time Permutation Model

It is a little more complex to adjust for missing data in the space-time permutation model, but still possible. First add day-of-week as a covariate in the analysis file. When a particular location/time period is missing, then for that location, remove all data for the days of the week for which any data is missing. For example, if data from Thursday 10/23 and Friday 10/24 are missing for zip code area A and data from Saturday 10/25 are missing from area B, remove data from all Thursdays and Fridays for area A and data from all Saturdays from area B, while retaining all data from Saturdays through Wednesdays for area A and all data except Saturdays from area B. For all other zip code areas, retain all data for all days. Note that, in addition to adjusting for the missing data, this approach will also adjust for any day-of-week by spatial interaction effects.

The same approach can be used with other categorizations of the data, as long as the categorization is in some periodic time unit that occurs several times and is evenly spread out over the study period. For example, it is okay to categorize into months if the study period span
31.  time.

time: Optional. Time may be specified in years, months or days. All control times must fall within the study period as specified on the Analysis tab. The format of the times must be the same as in the case file.

Note: Multiple lines may be used for different controls with the same location, time and attributes. SaTScan will automatically add them.

Related Topics: Input Tab, Control File Name, Multiple Data Sets Tab, SaTScan Import Wizard, SaTScan ASCII File Format.

Population File

The population file is used for the Poisson model, providing information about the background population at risk. This may be the actual population count from a census, or it could be, for example, covariate-adjusted expected counts from a statistical regression model. It should contain the following information:

location id: Any numerical value or string of characters. Empty spaces may not form part of the id.

time: The time to which the population size refers. May be specified in years, months or days. If the population time is unknown but identical for all population numbers, then a dummy year must be given; the choice does not affect the results.

population: Population size for a particular location, year and covariate combination. If the population size is zero for a particular location, year and set of covariates, then it should be included in the population file specified as zero. The population can be specified as a decimal number t
32.  to find. Communications of the ACM, 31:1192-1201, 1988.

Visualization and Mapping

22. Boscoe FP, McLaughlin C, Schymura MJ, Kielb CL. Visualization of the spatial scan statistic using nested circles. Health and Place, 9:273-277, 2003.

Methods Evaluations and Comparisons

23. Kulldorff M, Tango T, Park P. Power comparisons for disease clustering tests. Computational Statistics and Data Analysis, 42:665-684, 2003.

24. Song C, Kulldorff M. Power evaluation of disease clustering tests. International Journal of Health Geographics, 2:9, 2003. (online)

25. Kulldorff M, Zhang Z, Hartman J, Heffernan R, Huang L, Mostashari F. Evaluating disease outbreak detection methods: Benchmark data and power calculations. Morbidity and Mortality Weekly Report, 53:144-151, 2004. (online)

26. Nordin J, Goodman M, Kulldorff M, Ritzwoller D, Abrams A, Kleinman K, Levitt MJ, Donahue J, Platt R. Using modeled anthrax attacks on the Mall of America to assess sensitivity of syndromic surveillance. Emerging Infectious Diseases, 11:1394-1398, 2005. (online)

27. Ozdenerol E, Williams BL, Kang SY, Magsumbol MS. Comparison of spatial scan statistic and spatial filtering in estimating low birth weight clusters. International Journal of Health Geographics, 4:19, 2005. (online)

28. Costa MA, Assunção RM. A fair comparison between the spatial scan and Besag-Newell disease clustering test
33.  to perform well.

Related Topics: Monte Carlo Replications.

Contact Us

Please direct technical questions about installation and running the program, as well as the web site, to techsupport@satscan.org. Please direct substantive questions about the statistical methods and suggestions about new features to:

Martin Kulldorff, Associate Professor, Biostatistician
Department of Ambulatory Care and Prevention
Harvard Medical School and Harvard Pilgrim Health Care
133 Brookline Avenue, 6th Floor, Boston, MA 02215, USA
Email: kulldorff@satscan.org

Acknowledgements

Financial Support

National Cancer Institute, Division of Cancer Prevention, Biometry Branch [SaTScan v1.0, 2.0, 2.1]
National Cancer Institute, Division of Cancer Control and Population Sciences, Statistical Research and Applications Branch [SaTScan v3.0 (part), v6.1 (part)]
Alfred P. Sloan Foundation, through a grant to the New York Academy of Medicine (Farzad Mostashari, PI) [SaTScan v3.0 (part), 3.1, 4.0, 5.0, 5.1]
Centers for Disease Control and Prevention, through Association of American Medical Colleges Cooperative Agreement award number MM-0870 [SaTScan v6.0, v6.1 (part)]
National Institute of Child Health and Development, through grant RO1HD048852 [7.0]

Their financial support is greatly appreciated. The contents of SaTScan are the responsibility of the developer and do not necessarily reflect the official views of funders.

Comments
34.  whether the case file and the control file (when applicable) contain information about the time of each case (and control), and if so, whether the precision should be read as days, months or years. If the time precision is specified to be days but the precision in the case or control file is in months or years, then there will be an error. If the time precision is specified as years, but the case or control file includes some dates specified in terms of the month or day, then the month or day will be ignored.

For a purely spatial analysis, the case and control files need not contain any times. If they do, it has to be specified that they do contain this information, but the information is ignored.

Note: The choice defines only the precision for the times in the case and control files. The precision of the times in the population file can be different.

Related Topics: Input Tab, Case File, Control File, Study Period, Time Aggregation.

Study Period

Specify the start and end date of the time period under study. This must be done even for a purely spatial analysis in order to calculate the expected number of cases correctly. Allowable years are those between 1753 and 9999.

All times in the case and control files must fall on or between the start and end date of the study period. Dates in the population file are allowed to be outside the start and end date of the study period.

Start Date: The earliest date to be included in the study period.
End Date: Th
35. 0     Hayran M  Analyzing factors associated with cancer occurrence  A geographical systems approach   Turkish Journal of Cancer  34 67 70  2004   online     Sheehan TJ  DeChello LM  A space time analysis of the proportion of late stage breast cancer in  Massachusetts  1988 to 1997  International Journal of Health Geographics  4 15  2005   online     Fukuda Y  Umezaki M  Nakamura K  Takano T  Variations in societal characteristics of spatial  disease clusters  examples of colon  lung and breast cancer in Japan  International Journal of Health  Geographics  4 16  2005   online     Ozonoff A  Webster T  Vieira V  Weinberg J  Ozonoff D  Aschengrau A  Cluster detection methods  applied to the Upper Cape Cod cancer data  Environmental Health  A Global Access Science  Source  4 19  2005   online     Klassen A  Curriero F  Kulldorff M  Alberg AJ  Platz EA  Neloms ST  Missing stage and grade in  Maryland prostate cancer surveillance data  1992 1997  American Journal of Preventive Medicine   30 877 87  2006   online     Pollack LA  Gotway CA  Bates JH  Parikh Patel A  Richards TB  Seeff LC  Hodges H  Kassim S   Use of the spatial scan statistic to identify geographic variations in late stage colorectal cancer in  California  United States   Cancer Causes and Control  17 449   457  2006     Cardiology    81     Kuehl KS  Loffredo CA  A cluster of hypoplastic left heart malformation in Baltimore  Maryland  Pediatric Cardiology  27 25 31  2006     Rheumatology   Auto Immune Diseases    8
36. 1996 to 2000  Canadian Journal of  Veterinary Research  69 19 25  2005   online     Guerin MT  Martin SW  Darlington GA  Rajic A  A temporal study of Salmonella serovars in  animals in Alberta between 1990 and 2001  Canadian Journal of Veterinary Research  69 88 89   2005   online     SaTScan User Guide v7 0 88    Veterinary Medicine  Wildlife    126     127     128     129     130     131     132     133     Smith KL  DeVos V  Bryden H  Price LB  Hugh Jones ME  Keim P  Bacillus anthracis diversity in  Kruger National Park  Journal of Clinical Microbiology  38 3780 3784  2000   online        Berke O  von Keyserlingk M  Broll S  Kreienbrock L  On the distribution of Echinococcus  multilocularis in red foxes in Lower Saxony  identification of a high risk area by spatial  epidemiological cluster analysis  Berliner und Munchener Tierarztliche Wochenschrift  115 428   434  2002     Miller MA  Gardner IA  Kreuder C  Paradies DM  Worcester KR  Jessup DA  Dodd E  Harris MD   Ames JA  Packham AE  Conrad PA  Coastal freshwater runoff is a risk factor for Toxoplasma  gondii infection of southern sea otters  Enhydra lutris nereis   International Journal for  Parasitology  32 997 1006  2002     Hoar BR  Chomel BB  Rolfe DL  Chang CC  Fritz CL  Sacks BN  Carpenter TE  Spatial analysis of  Yersinia pestis and Bartonella vinsonii subsp berkhoffii seroprevalence in California coyotes  Canis  latrans   Preventive Veterinary Medicine  56 299 311  2003     Olea Popelka FJ  Griffin JM  Collins JD
37. 2     83     84     85     Walsh SJ  Fenster JR  Geographical clustering of mortality from systemic sclerosis in the  Southeastern United States  1981 90  Journal of Rheumatology  24 2348 2352  1997     Walsh SJ  DeChello LM  Geographical variation in mortality from systemic lupus erythematosus in  the United States  Lupus  10 637 646  2001     L  pez Abente G  Morales Piga A  Bachiller Corral FJ  Illera Mart  n O  Mart  n Domenech R   Abraira V  Identification of possible areas of high prevalence of Paget s disease of bone in Spain   Clinical and Experimental Rheumatology  21 635 368  2003     Donnan PT  Parratt JDE  Wilson SV  Forbes RB  O Riordan JI  Swingler RJ  Multiple sclerosis in  Tayside  Scotland  detection of clusters using a spatial scan statistic  Multiple Sclerosis  11 403   408  2005     Neurological Diseases    86     Sabel CE  Boyle PJ  L  yt  nen M  Gatrell AC  Jokelainen M  Flowerdew R  Maasilta P  Spatial  clustering of amyotrophic lateral sclerosis in Finland at place of birth and place of death  American  Journal of Epidemiology  157  898 905  2003     SaTScan User Guide v7 0 85    Liver Diseases    87  Ala A  Stanca CM  Bu Ghanim M  Ahmado I  Branch AD  Schiano TD  Odin JA  Bach N  Increased  prevalence of primary biliary cirrhosis near superfund toxic waste sites  Hepatology  43 525 531   2006    Diabetes   88  Green C  Hoppa RD  Young TK  Blanchard JF  Geographic analysis of diabetes prevalence in an    urban area  Social Science and Medicine  57 551 
38. 560  2003     Pediatrics  see also cancer  cardiology     89     90     9      92     93     94     95     96     97     98     99     Kharrazi M  et al  Pregnancy outcomes around the B K K  landfill  West Covina  California  An  analysis by address  California Department of Health Services  1998     Sankoh OA  Ye Y  Sauerborn R  Muller O  Becher H  Clustering of childhood mortality in rural  Burkina Faso  International Journal of Epidemiology  30 485 492  2001   online     George M  Wiklund L  Aastrup M  Pousette J  Thunholm B  Saldeen T  Wernroth L  Zaren B   Holmberg L  Incidence and geographical distribution of sudden infant death syndrome in relation to  content of nitrate in drinking water and groundwater levels  European Journal of Clinical  Investigation  31  1083 1094  2001     Bell S  Spatial Analysis of Disease   Applications  In Beam C  ed   Biostatistical Applications in  Cancer Research  Boston  Kluwer p151 182  2002   online     Forand SP  Talbot TO  Druschel C  Cross PK  Data quality and the spatial analysis of disease rates   congenital malformations in New York State  Health and Place  8 191 199  2002     Colorado Department of Public Health and Environment  Analysis of birth defect data in the  vicinity of the Redfield plume area in southeastern Denver county  1989 1999  Colorado  Department of Public Health and the Environment  2002   online     Boyle E  Johnson H  Kelly A  McDonnell R  Congenital anomalies and proximity to landfill sites   Irish Medical J
39. 8  2001     Chaput EK  Meek JI  Heimer R  Spatial analysis of human granulocytic ehrlichiosis near Lyme   Connecticut  Emerging Infectious Diseases  8 943 948  2002   online     Huillard d Aignaux J  Cousens SN  Delasnerie Laupretre N  Brandel JP  Salomon D  Laplanche JL   Hauw JJ  Alperovitch A  Analysis of the geographical distribution of sporadic Creutzfeldt Jakob  disease in France between 1992 and 1998  International Journal of Epidemiology  31  490 495   2002   online     Mostashari F  Kulldorff M  Hartman JJ  Miller JR  Kulasekera V  Dead bird clustering  A potential  early warning system for West Nile virus activity  Emerging Infectious Diseases  9 641 646  2003    online     Ghebreyesus TA  Byass P  Witten KH  Getachew A  Haile M  Yohannes M  Lindsay SW   Appropriate Tools and Methods for Tropical Microepidemiology  a Case study of Malaria  Clustering in Ethiopia  Ethiopian Journal of Health Development  17 1 8  2003     Sauders BD  Fortes ED  Morse DL  Dumas N  Kiehlbauch JA  Schukken Y  Hibbs JR  Wiedmann  M  Molecular subtyping to detect human listeriosis clusters  Emerging Infectious Diseases  9 672   680  2003   online     Brooker S  Clarke S  Njagi JK  Polack S  Mugo B  Estambale B  Muchiri E  Magnussen P  Cox J   Spatial clustering of malaria and associated risk factors during an epidemic in a highland area of  western Kenya  Tropical Medicine and International Health  9  757 766  2004     Washington CH  Radday J  Streit TG  Boyd HA  Beach MJ  Addiss DG  Lovin
40. Ds have exactly the same coordinates, then the data for the two are combined and treated as a single location.

A coordinates file is not needed for purely temporal analyses.

Related Topics: Input Tab, Coordinates File Name, Coordinates, Cartesian Coordinates, Latitude and Longitude, Grid File.

Cartesian Coordinates

Cartesian is the mathematical name for the regular planar x,y-coordinate system taught in high school. These may be specified in two, three or any number of dimensions. The SaTScan program will automatically read the number of dimensions, which must be the same for all coordinates. If Cartesian coordinates are used, the coordinates file should contain the following information:

location id: Any numerical value or string of characters. Empty spaces may not form part of the id.

coordinates: The coordinates must all be specified in the same units. There is no upper limit on the number of dimensions.

x and y coordinates: Required
z1-zN coordinates: Optional

Note: If you have more than 10 dimensions you cannot use the SaTScan Import Wizard for the coordinates and grid files, but must specify them using the SaTScan ASCII file format.

Related Topics: Input Tab, Coordinates, Latitude and Longitude, Coordinates File, Grid File, SaTScan Import Wizard, SaTScan ASCII File Format.

Latitude and Longitude

Latitudes and longitudes should be entered as decimal number of degrees. Latitude represents the north
41. Health, Japan
Jean Francois Viel, Université de Franche-Comté, France

Frequently Asked Questions

Input Data

1. I tried running SaTScan using one of the sample data sets, and all went well, but when I try it on my own data there is an error. What should I do?

SaTScan checks that the input files are compatible with each other and with the options specified on the windows interface. For example, it complains if there is a location ID in the case file that is not present in the coordinates file, as it must know where to localize those cases. For most data sets there is some need for data cleaning, and SaTScan is designed to help with this process by spotting and pointing out any inconsistencies found.

2. I have constructed the ASCII input files exactly according to the description in the SaTScan User Guide, but SaTScan complains that they are not in the correct format. What is wrong?

The most likely explanation is that the files are in UNICODE rather than ASCII format. Just convert to ASCII and it should work.

3. In my data, there are zero or only one case in most locations. Can I use SaTScan for such sparse data?

Yes, you certainly can. One of the main reasons for using SaTScan is to avoid arbitrary geographical aggregation of the data, letting the scan statistic consider different smaller or larger aggregations through its continuously moving window. With finer geographical resolution of the input data 
42. M, Heffernan R, Hartman J, Assunção RM, Mostashari F. A space-time permutation scan statistic for the early detection of disease outbreaks. PLoS Medicine, 2:216-224, 2005. (online)

Ordinal Model

6. Jung I, Kulldorff M, Klassen A. A spatial scan statistic for ordinal data. Statistics in Medicine, 2006, in press. (online)

Exponential Model

7. Huang L, Kulldorff M, Gregorio D. A spatial scan statistic for survival data. Biometrics, 2006, in press. (online)

Normal Model

8. Kulldorff M, et al. 2006, manuscript in preparation.

Multivariate Scan Statistic

9. Kulldorff M, Mostashari F, Duczmal L, Yih K, Kleinman K, Platt R. Multivariate spatial scan statistics for disease surveillance. Statistics in Medicine, 2006, in press. (online)

Elliptic Scanning Window

10. Kulldorff M, Huang L, Pickle L, Duczmal L. An elliptic spatial scan statistic. Statistics in Medicine, 2006, epub ahead of print.

Monte Carlo Hypothesis Testing

11. Dwass M. Modified randomization tests for nonparametric hypotheses. Annals of Mathematical Statistics, 28:181-187, 1957.

Recurrence Intervals

12. Kleinman K, Lazarus R, Platt R. A generalized linear mixed models approach for detecting incident clusters of disease in small areas, with an application to biological terrorism. American Journal of Epidemiology, 159:217-24, 2004.

Adjustments

Adjusting for Covariates

13. Kulldorff M. A spatial scan statistic. Communications in Stat
43. N coordinate>

Neighbors File Format: .nbr

<location ID> <location ID of closest neighbor> <location ID of 2nd closest neighbor> etc.

Special Max Circle Size File Format: .max

<location ID> <population>

Adjustment File Format: .adj

<location ID> <relative risk> <start time> <end time>

Time Formats

Times must be entered in a specific format. The valid formats are:

2003
2003/10
2003/10/24
2003-10
2003-10-24
10/2003
10/24/2003
10-2003
10-24-2003

Single-digit days and months may be specified with one or two digits. For example, September 9, 2002, can be written as 2002/9/9, 2002/09/09, 2002/09/9, 2002/9/09, 2002-9-9, etc.

Note: SaTScan v7.0 also supports a couple of other time formats used in earlier versions, but they are no longer recommended.

Related Topics: Input Tab, Case File, Control File, Population File, Coordinates File, Grid File, Max Circle Size File, Neighbors File, Adjustments File, SaTScan Import Wizard.

Basic SaTScan Features

Most SaTScan analyses can be performed using the basic analysis and data features. The users specify these on three different window tabs for input, analysis and output options respectively. These contain all required specifications for a SaTScan analysis as well as a few optional ones. Additional features, all optional, can be specified on the advanced features tabs.

Related Topics: Statistical Methodology, Input Tab, Analy
44. Field (dBase variable name):

P-value (P_VALUE)
Observed Cases (OBSERVED)
Expected Cases (EXPECTED)
Relative Risk (REL_RISK)
Mean Inside (MEAN_IN)
Mean Outside (MEAN_OUT)
Unexplained Variance (VARIANCE)
Standard deviation (DEVIATION)

Table 3: Content of the cluster information output file, with dBase variable names and examples of column ordering for a few different types of analyses.

Related Topics: Cluster Cases Information File, Location Information File, Output Tab, Results of Analysis, Standard Results File.

Cluster Cases Information File (cci)

In the cluster cases information file, there is one line for each ordinal category, in each data set, for each cluster. For each cluster/category/data set combination, there is one column each for the observed number of cases, the expected number of cases, observed divided by expected, and the relative risk. If neither the ordinal model nor multiple data sets are used, then there is only one line for each cluster, and there is no information in this file that is not also in the Cluster Information File.

The file will have the same name as t
45. SaTScan User Guide for version 7.0

By Martin Kulldorff

August 2006

http://www.satscan.org/

Contents

The SaTScan Software
Download and Installation
Test Run
Bernoulli Model
Poisson Model
Space-Time Permutation Model
Ordinal Model
Exponential Model
Normal Model
Probability Model Comparison
Spatial, Temporal and Space-Time Scan Statistics
Likelihood Ratio Test
Secondary Clusters
Adjusting for More Likely Clusters
Covariate Adjustments
Spatial and Temporal Adjustments
Missing Data
46. a. In the latter case, the percent increase or decrease must be calculated using standard statistical regression software such as SAS or S-Plus, and then inserted on the Risk Adjustments Tab.

For space-time analyses, it is also possible to adjust for a temporal trend non-parametrically. This adjusts the expected count separately for each aggregated time interval, removing all purely temporal clusters. The randomization is then stratified by time interval to ensure that each time interval has the same number of events in the real and random data sets.

The ability to adjust for temporal trends is much more limited for the Bernoulli, ordinal and exponential models, as none of the above features can be used. Instead, the time must be divided into discrete time periods, with the cases and controls in each period corresponding to a separate data set with separate case and control files. The analysis is then done using multiple data sets.

Related Topics: Spatial and Temporal Adjustments Tab, Time Aggregation, Poisson Model.

Adjusting for Purely Spatial Clusters

In a space-time analysis with the Poisson model, it is also possible to adjust for purely spatial clusters in a non-parametric fashion. This adjusts the expected count separately for each location, removing all purely spatial clusters. The randomization is then stratified by location ID to ensure that each location has the same number of events in the real and random data
47. a that is not true. Does this mean that the null hypothesis is wrong?

When accepting the notion of statistical hypothesis testing, one must also accept the fact that the null hypothesis is never true. For example, when comparing the efficacy of two different surgical procedures in a clinical trial, we know for sure that their efficacy cannot be equal, but we still use equality as the null hypothesis since we are interested in finding out whether one is better than the other. Likewise, with geographical data we know that disease risk is not the same everywhere, but we still use it as the null hypothesis since we are interested in finding locations with excess risk. Hence, the null hypothesis is wrong in the sense that we know it is not true, but it is not wrong in the sense that we should not use it.

16. Does SaTScan assume that there is no spatial auto-correlation in the data? (Note: Spatial auto-correlation means that the location of disease cases is dependent on the location of other disease cases, such as with an infectious disease where an infected individual is likely to infect those living close by.)

No. SaTScan does not assume that there is no spatial auto-correlation in the data. Rather, it is a test of whether there is spatial auto-correlation or other divergences from the null hypothesis. In this sense it is equivalent to a statistical test for normality, which does not assume that the data is normally distributed but
48. anagement Services, Inc. SaTScan™ v7.0: Software for the spatial and space-time scan statistics. http://www.satscan.org/, 2006.

Users of SaTScan should in any reference to the software note that: "SaTScan™ is a trademark of Martin Kulldorff. The SaTScan™ software was developed under the joint auspices of (i) Martin Kulldorff, (ii) the National Cancer Institute, and (iii) Farzad Mostashari of the New York City Department of Health and Mental Hygiene."

Related Topics: SaTScan Bibliography, Methodological Papers.

SaTScan Methodology Papers

Statistical Methodology

General Statistical Theory, Bernoulli and Poisson Models

1. Kulldorff M. A spatial scan statistic. Communications in Statistics: Theory and Methods, 26:1481-1496, 1997. [online]

Spatial Scan Statistic, Bernoulli Model

2. Kulldorff M, Nagarwalla N. Spatial disease clusters: Detection and Inference. Statistics in Medicine, 14:799-810, 1995. [online]

Retrospective Space-Time Scan Statistic

3. Kulldorff M, Athas W, Feuer E, Miller B, Key C. Evaluating cluster alarms: A space-time scan statistic and brain cancer in Los Alamos. American Journal of Public Health, 88:1377-1380, 1998. [online]

Prospective Space-Time Scan Statistic

4. Kulldorff M. Prospective time-periodic geographical disease surveillance using a scan statistic. Journal of the Royal Statistical Society, A164:61-72, 2001. [online]

Space-Time Permutation Model

5. Kulldorff
49. ar window plus five different elliptic shapes where the ratio of the longest to the shortest axis of the ellipse is 1.5, 2, 3, 4 or 5. For each shape, a different number of angles of the ellipse are used, equal to 4, 6, 9, 12 and 15 respectively. The north-south axis is always one of the angles included, and the remainder are equally spaced around the circle. For each shape and angle, all possible sizes of the ellipses are used, up to an upper limit specified by the user in the same way as for the circular window.

When using an elliptic window shape, it is possible to request a non-compactness (eccentricity) penalty, which will favor more compact over less compact ellipses even when they have slightly lower likelihood ratios, but favor the less compact ellipses when the difference is larger. The formula for the penalty is $\left[4s/(s+1)^2\right]^a$, where $s$ is the elliptic window shape defined as the ratio of the length of the longest to the shortest axis of the ellipse. With a strong penalty $a=1$, with a medium penalty $a=1/2$ and with no penalty $a=0$.

Note: In batch mode, it is possible to request SaTScan to use any other collection of ellipses to define the scanning window and any value of the eccentricity penalty parameter greater than zero.

Note: The elliptic window option can only be used when regular two-dimensional Cartesian coordinates are used, but not when they are specified as latitude/longitude. If you have the latter, you must first do a planar map projection from th
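
The eccentricity penalty described above can be illustrated with a short Python sketch. It takes the penalty as (4s/(s+1)^2)^a, with s the ratio of the longest to the shortest axis; the function name and the printed table are illustrative assumptions, and the sketch only evaluates the penalty factor itself, not how SaTScan applies it internally.

```python
def eccentricity_penalty(s: float, a: float) -> float:
    """Penalty factor (4s / (s + 1)^2)^a for an ellipse with axis ratio s.

    s: ratio of the longest to the shortest axis (s = 1 is a circle).
    a: penalty strength (1 = strong, 0.5 = medium, 0 = no penalty).
    """
    if s < 1:
        raise ValueError("s is longest/shortest axis, so it must be >= 1")
    if a < 0:
        raise ValueError("the penalty parameter a must be non-negative")
    return (4.0 * s / (s + 1.0) ** 2) ** a


# The five default shapes and their number of angles, as listed in the text.
SHAPES = [1.5, 2, 3, 4, 5]
ANGLES = [4, 6, 9, 12, 15]

for shape, n_angles in zip(SHAPES, ANGLES):
    strong = eccentricity_penalty(shape, 1.0)
    medium = eccentricity_penalty(shape, 0.5)
    print(f"shape={shape:<4} angles={n_angles:<3} strong={strong:.3f} medium={medium:.3f}")
```

A circle (s = 1) always gets a factor of 1, and the factor shrinks as the ellipse becomes more elongated, which is why a stronger penalty favors more compact windows.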
50. ars, months or days. All case times must fall within the study period as specified on the Input Tab.

attribute: A variable describing some characteristic of the case. These may be covariates (Poisson and space-time permutation models), category (ordinal model), survival time (exponential model), censored (exponential model) or a continuous variable value (normal model). The covariates are optional variables, and any number of categorical covariates may be specified as either numbers or through characters. The categories for the ordinal model can be specified as any positive or negative numerical value. The survival times must be positive numbers. Censored is a 0/1 variable with censored=1 and uncensored=0.

Example: If on April 1, 2004 there were 17 male and 12 female cases in New York, the following information would be provided:

NewYork 12 2004/4/1 Female
NewYork 17 2004/4/1 Male

Note: Multiple lines may be used for different cases with the same location, time and attributes. SaTScan will automatically add them.

Related Topics: Input Tab, Case File Name, Multiple Data Sets Tab, Covariate Adjustment Using Input Files, SaTScan Import Wizard, SaTScan ASCII File Format.

Control File

The control file is only used with the Bernoulli model. It should contain the following information:

location id: Any numerical value or string of characters. Empty spaces may not form part of the id.

controls: The number of controls for the specified location and
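
The note on the case file, that several lines sharing the same location, time and attributes are added together, can be sketched in Python as follows. This is an illustration only, not SaTScan's own reader: it assumes whitespace-separated fields in the order location id, number of cases, date, attributes, with the date written as a single year/month/day token, and the function name `aggregate_cases` is hypothetical.

```python
from collections import Counter


def aggregate_cases(lines):
    """Sum case counts over lines with identical location, date and attributes."""
    totals = Counter()
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        location, count, date, *attributes = fields
        totals[(location, date, tuple(attributes))] += int(count)
    return totals


# Hypothetical input: the 17 male cases are split over two lines.
sample = [
    "NewYork 12 2004/4/1 Female",
    "NewYork 10 2004/4/1 Male",
    "NewYork 7 2004/4/1 Male",
]

totals = aggregate_cases(sample)
for (location, date, attributes), cases in sorted(totals.items()):
    print(location, cases, date, *attributes)
```

Keying the counter on the full (location, date, attributes) tuple keeps cases with different covariate values separate, which matches the example of male and female cases being listed on separate lines.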
51. ata, an exponential model for survival time data with or without censored variables, or a normal model for other types of continuous data. The data may be either aggregated at the census tract, zip code, county or other geographical level, or there may be unique coordinates for each observation. SaTScan adjusts for the underlying spatial inhomogeneity of a background population. It can also adjust for any number of categorical covariates provided by the user, as well as for temporal trends, known space-time clusters and missing data. It is possible to scan multiple data sets simultaneously to look for clusters that occur in one or more of them.

Developers and Funders

The SaTScan™ software was developed by Martin Kulldorff together with Information Management Services Inc. Financial support for SaTScan has been received from the following institutions:

• National Cancer Institute, Division of Cancer Prevention, Biometry Branch [v1.0, 2.0, 2.1]
• National Cancer Institute, Division of Cancer Control and Population Sciences, Statistical Research and Applications Branch [v3.0 (part), v6.1 (part)]
• Alfred P. Sloan Foundation, through a grant to the New York Academy of Medicine (Farzad Mostashari, PI) [v3.0 (part), 3.1, 4.0, 5.0, 5.1]
• Centers for Disease Control and Prevention, through Association of American Medical Colleges Cooperative Agreement award number MM-0870 [v6.0, 6.1 (part)]
• National Institute of Child Health and Development [7.0]

T
52. atial aggregation of the data is typically better, which means more geographical locations.

Analysis

6. With latitude/longitude coordinates, what planar projection is used?

No projection is used. SaTScan draws perfect circles on the spherical surface of the earth.

7. When should I use the Bernoulli versus the Poisson model?

Use the Bernoulli model when you have binary data, such as cases and controls, late and early stage cancer, or people with and without a disease. Use the Poisson model when you have cases and a background population at risk, such as population numbers from the census.

8. SaTScan adjusts for categorical covariates, but I want to adjust for a continuous variable. Is that possible?

One way to do this is to categorize the continuous variable. A better approach is to (i) calculate the adjustment using a regular statistical software package such as SAS, (ii) use the result from that analysis to calculate the covariate-adjusted expected number of cases at each location, and (iii) use these expected values instead of the population in the population file. With this approach, there should not be any covariates in either the case or the population files.

9. What should I use as the maximum geographical cluster size? Is that an arbitrary choice?

If you don't want to be arbitrary, choose 50% of the population as the maximum geographical cluster size. SaTScan will then evaluate very small and very large clusters, and everything in betwe
53. bors may be sorted according to distance along a subway network or a water distribution system.

The first column of this file contains the location IDs defining the centroids of the scanning window. The subsequent entries on each row are then the centroid's neighbors in order of closeness. The scanning window will expand in size until there are no more neighbors provided for that row. That means that this file also defines the maximum window size. It is allowed to have multiple rows for the same location ID centroid, each with a different set of closest neighbors.

Related Topics: Coordinates File, Input Tab, Non-Euclidean Neighbors Tab, SaTScan ASCII File Format.

Max Circle Size File

This optional file is used to determine the maximum circle size of the scanning window, when the maximum is defined as a percentage of the "population". Normally, the percentage is based on the population in the population file, but by using the max circle size file, a different "population" can be specified for this purpose. One important reason for using the max circle size file is for prospective space-time analyses, where the regular population file may change over time, but one wants to evaluate the same set of geographical circles each time. This is critical in order to properly adjust the prospective space-time scan statistic for earlier analyses. It can also be used for other purposes.

The file should contain one line for each location, with the following inf
54. cations specified on the Analysis Tab.

Note: With this option, the p-values obtained after an early termination will be slightly conservative. The interpretation does not change for p-values obtained from a full run.

Related Topics: Inference Tab, Monte Carlo Replications, Results of Analysis, Computing Time.

Adjust for Earlier Analyses in Prospective Surveillance

When doing prospective purely temporal or prospective space-time analyses repeatedly in a time-periodic fashion, it is possible to adjust the statistical inference (p-values) for the multiple testing inherent in the repeated analyses done. To do this, simply mark the "adjust for earlier analyses" box, and specify the date for which you want to adjust for all subsequent analyses. This date must be greater than or equal to the study period start date and less than or equal to the study period end date, as specified on the Input Tab.

For the adjustment to be correct, it is important that the scanning spatial window is the same for each analysis that is performed over time. This means that the grid points defining the circle centroids must remain the same. If the location IDs in the coordinates file remain the same in each time-periodic analysis, then there is no problem. On the other hand, if new IDs are added to the coordinates file over time, then you must use a special grid file and retain this file through all the analyses. Also, when you adjust for earlier anal
55. ce R  Lovegrove MC   Lafontant JG  Lammie PJ  Hightower AW  Spatial clustering of filarial transmission before and  after a Mass Drug Administration in a setting of low infection prevalence  Filaria Journal  3 3  2004    online     Dreesman J  Scharlach H  Spatial statistical analysis of infectious disease notification data in Lower  Saxony  Gesundheitswesen  66  783 789  2004     Bakker MI  Hatta M  Kwenang A  Faber WR  van Beers SM  Klatser PR  Oskam L  Population  survey to determine risk factors for Mycobacterium leprae transmission and infection  International  Journal of Epidemiology  33  1329 1336  2004     Jennings JM  Curriero FC  Celentano D  Ellen JM  Geographic identification of high gonorrhea  transmission areas in Baltimore  Maryland  American Journal of Epidemiology  161  73 80  2005     Polack SR  Solomon AW  Alexander NDE  Massae PA  Safari S  Shao JF  Foster A  Mabey DC   The household distribution of trachoma in a Tanzanian village  an application of GIS to the study of  trachoma  Transactions of the Royal Society of Tropical Medicine and Hygiene  99  218 225  2005     Wylie JL  Cabral T  Jolly AM  Identification of networks of sexually transmitted infection  a  molecular  geographic  and social network analysis  J Infect Diseases  191 899 906  2005     SaTScan User Guide v7 0 82    4T     48     49     50     51     52     53     Moore GE  Ward MP  Kulldorff M  Caldanaro RJ  Guptill LF  Lewis HB  Glickman LT   Identification of a space time cluster of cani
56. cts a certain risk mass such as total person-years lived in an area. The cases are then included as part of the population count.

Bernoulli Model: The Bernoulli model should be used when the data set contains individuals who may or may not have a disease and for other 0/1 type variables. Those who have the disease are cases and should be listed in the case file. Those without the disease are "controls", listed in the control file. The controls could be a random set of controls from the population, or better, the total population except for the cases. The Bernoulli model is a special case of the ordinal model when there are only two categories.

Space-Time Permutation Model: The space-time permutation model should be used when only case data is available, and when one wants to adjust for purely spatial and purely temporal clusters.

Ordinal Model: The ordinal model is used when individuals belong to one of three or more categories, and when there is an ordinal relationship between those categories such as small, medium and large. When there are only two categories, the Bernoulli model should be used instead.

Exponential Model: The exponential model is used for survival time data, to search for spatial and/or temporal clusters of exceptionally short or long survival. The survival time is a positive continuous variable. Censored survival times are allowed for some but not all individuals.

Normal Model: The normal model is used for continuous data. Observ
57. Statistical Methodology

Scan statistics are used to detect and evaluate clusters in a temporal, spatial or space-time setting. This is done by gradually scanning a window across time and/or space, noting the number of observed and expected observations inside the window at each location. In the SaTScan software, the scanning window is either an interval (in time), a circle or an ellipse (in space) or a cylinder with a circular or elliptic base (in space-time). Multiple different window sizes are used. The window with the maximum likelihood is the most likely cluster, that is, the cluster least likely to be due to chance. A p-value is assigned to this cluster.

The general statistical theory behind the spatial and space-time scan statistics used in the SaTScan software is described in detail by Kulldorff (1997) for the Bernoulli and Poisson models, by Kulldorff et al. (2005) for the space-time permutation model, by Jung et al. (2006) for the ordinal model, by Huang et al. (2006) for the exponential model and by Kulldorff et al. (2006) for the normal model. Here we give a brief non-mathematical description. For all probability models, the scan statistic adjusts for the uneven geographical density of a background population, and the analyses are conditioned on the total number of cases observed.

Related Topics: The SaTScan Software, Basic SaTScan Features, Advanced Features, Analysis Tab, Methodological Papers.

Bernoulli Model

With the
58. e is that it gives some information about the alternatives for which the test can be expected to have good power.

21. For the exponential model, it is assumed that the survival times follow an exponential distribution. Are the results biased if the survival times follow a different distribution?

No matter which distribution generated the survival times, the p-values from the statistical inference are still valid and unbiased. This is because, rather than generating the random data from an exponential distribution, each random data set is a spatial permutation of the survival times. A greatly misspecified distribution may lead to a loss in power though. For example, if the data is Bernoulli distributed, the exponential model has less power to detect a cluster than the Bernoulli model. For continuous distributions such as gamma and lognormal, the exponential model has been shown to work well.

Operating Systems

22. Is SaTScan available for Linux machines?

A Linux version of SaTScan is available. It can be downloaded from the www.satscan.org web site.

23. Is SaTScan available for Unix machines?

There is a Unix version of SaTScan available for Solaris. It has not been thoroughly tested, so it may not work on all computers. Anyone interested in trying this version should send an email to kulldorff@satscan.org.

SaTScan Bibliography

Different SaTScan analysis options were developed a
59. e latest date to be included in the study period.

Note: The start and end dates cannot be specified to a higher precision than the precision of the times in the case and control files.

If the user does not specify month, then by default it will be set to January for the start date and to December for the end date. Likewise, if day is not specified, then by default it will be set to the first of the month for the start date and the last of the month for the end date.

Related Topics: Input Tab, Case File, Control File, Time Precision, Time Aggregation.

Population File Name

Specify the name of the input file with population data. This file is only used for analyses using the Poisson probability model.

Related Topics: Input Tab, Population File.

Coordinates File Name

Specify the name of the input file with geographical coordinates of all the locations with data on the number of cases, controls and/or population. When multiple data sets are used, the coordinates file must include the coordinates for all locations found in any of the data sets.

Related Topics: Input Tab, Coordinates, Coordinates File.

Grid File Name

Specify the name of the optional grid file with the coordinates of the circle centroids used by the spatial and space-time scan statistics. If no special grid file is specified, then the coordinates in the coordinates file are used for this purpose.

Related Topics: Input Tab, Coordinates
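
The default completion of partially specified study period dates described earlier in this section (January and the first of the month for the start date, December and the last of the month for the end date) can be sketched in Python as follows. The function names are hypothetical and the sketch only mirrors the rule as stated in the text; it is not SaTScan's own date handling.

```python
import calendar
from datetime import date


def complete_start_date(year, month=None, day=None):
    """Fill a missing month with January and a missing day with the 1st."""
    return date(year, month or 1, day or 1)


def complete_end_date(year, month=None, day=None):
    """Fill a missing month with December and a missing day with the last of the month."""
    month = month or 12
    last_day = calendar.monthrange(year, month)[1]  # number of days in that month
    return date(year, month, day or last_day)


print(complete_start_date(2004))       # year only
print(complete_end_date(2004))         # year only
print(complete_end_date(2004, 2))      # year and month, leap year
```

Using `calendar.monthrange` rather than a fixed table of month lengths keeps the end date correct in leap years.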
60. e latitude/longitude coordinates, of which there are many different ones proposed in the geography literature.

Related Topics: Advanced Features, Computing Time, Include Purely Spatial Clusters, Likelihood Ratio Test, Maximum Spatial Cluster Size, Spatial, Temporal and Space-Time Scan Statistics, Spatial Window Tab.

Temporal Window Tab

Figure: Temporal Window Tab Dialog Box

Use the Temporal Window Tab to define the exact nature of the scanning window with respect to time.

Related Topics: Advanced Features, Analysis Tab, Spatial Window Tab, Maximum Temporal Cluster Size, Include Purely Spatial Clusters, Flexible Temporal Window Definition.

Maximum Temporal Cluster Size

For purely temporal and space-time analyses, the maximum temporal cluster size can be specified in terms of a percentage of the study period as a whole or as a certain number of days, months or years. The maximum must be at least as large as the length of the aggregated time interval. If specified as
61. e no reported cluster with its center contained in a subsequently reported less likely cluster.

No Pairs of Centers Both in Each Other's Clusters: Secondary clusters are not centered in a previously reported cluster that contains the center of a previously reported cluster. This means that there will be no pair of reported clusters each of which contains the center of the other.

No Restrictions = Most Likely Cluster for Each Grid Point: The most extensive option is to present all clusters in the list, with no restrictions. This option reports the most likely cluster for each grid point. This means that the number of clusters reported is identical to the number of grid points.

Note: The criteria for determining overlap are based only on geography, ignoring time.

Warning: "No Restrictions" may create output files that are huge in size.

Related Topics: Advanced Features, Inference Tab, Results of Analysis, Maximum Spatial Cluster Size, Report Only Small Clusters.

Report Only Small Clusters

When the most likely clusters are very large in size, it is sometimes of interest to know whether they contain smaller clusters that are statistically significant on their own strength. One way to find such clusters is to experiment with the maximum circle size parameter, but that leads to incorrect statistical inference, as the maximum circle size is then chosen based on the results of the analysis, leading to pre-selection bias. To avoid this problem
62. e of the population, then the special max circle size file must be used. This is to ensure that the evaluated geographical circles do not change over time.

Related Topics: Advanced Features, Spatial Window Tab, Max Circle Size File, Include Purely Temporal Clusters, Computing Time.

Include Purely Temporal Clusters

A purely temporal cluster is one that includes the whole geographic area but only a limited time period. When doing a space-time analysis, it is possible to allow potential clusters to contain the whole geographical area under study, as an exception to the maximum spatial cluster size chosen. In this way, purely temporal clusters are included among the collection of windows evaluated.

Note: This option is not available for the space-time permutation model, as that model automatically adjusts for purely temporal clusters. When adjusting for purely temporal clusters using stratified randomization, all purely temporal clusters are adjusted away, and this parameter has no effect on the analysis.

Related Topics: Advanced Features, Spatial Window Tab, Maximum Spatial Cluster Size, Include Purely Spatial Clusters, Temporal Trend Adjustment, Computing Time.

Elliptic Scanning Window

As an advanced option, it is possible to use a scanning window that consists not only of circles but also of ellipses of different shapes and angles. When the elliptic spatial scan statistic is requested, SaTScan uses the circul
63. e scan statistic with an  application to syndromic surveillance  Epidemiology and Infection  2005  133 409 419     110  Nordin JD  Goodman MJ  Kulldorff M  Ritzwoller DP  Abrams AM  Kleinman K  Levitt MJ     Donahue J  Platt R  Simulated anthrax attacks and syndromic surveillance  Emerging Infectious  Diseases  2005  11 1394 98   online     SaTScan User Guide v7 0 87    111     112     Yih K  Abrams A  Kleinman K  Kulldorff M  Nordin J  Platt R  Ambulatory care diagnoses as  potential indicators of outbreaks of gastrointestinal illness     Minnesota  Morbidity and Mortality  Weekly Report  54 Suppl 157 62  2005   online     Besculides M  Heffernan R  Mostashari F  Weiss D  Evaluation of school absenteeism data for early  outbreak detection  New York City  BMC Public Health  2006  5 105   online    Veterinary Medicine  Domestic Animals    113     114     115     116     117     118     119     120     121     122     123     124     125     Norstr  m M  Pfeiffer DU  Jarp J  A space time cluster investigation of an outbreak of acute  respiratory disease in Norwegian cattle herds  Preventive Veterinary Medicine  47  107 119  2000     Ward MP  Blowfly strike in sheep flocks as an example of the use of a time space scan statistic to  control confounding  Preventive Veterinary Medicine  49  61 69  2001     United States Department of Agriculture  West Nile virus in equids in the Northeastern United  States in 2000  USDA  APHIS  Veterinary Services  2001   online     Doherr MG  Hett
64. Figure: Data Checking Tab Dialog Box

Study Period Check

By default, SaTScan will check that all the cases and all the controls are within the specified temporal study period. On this tab, it is possible to turn this off. Cases and controls outside the study period will then be ignored. This may be used if, for example, you only want to analyze a temporal subset of the data in the case and control input files.

Geographical Coordinates Check

By default, SaTScan will check that all the cases, controls and population numbers are within one of the locations specified in the coordinates file. On this tab, it is possible to turn this off. Data in other locations not present in the coordinates file are then ignored. This may be used if, for example, you only want to analyze a geographical subset of the data, in which case only the geographical coordinates file has to be modified while the other files can be used as they are.

Related Topics: Advanced Features, Case File, Input Tab, Study Time Period.

Non-Euclidean Neighbors Tab

Adva
65. ead sum up the log likelihood ratios of the data sets with fewer than expected number of cases within the window in question. When searching for both high and low clusters, both sums are calculated, and the maximum of the two is used to represent the log likelihood ratio for that window.

Note: All data sets must use the same probability model and the same geographical coordinates file.

Related Topics: Multiple Data Sets Tab, Covariate Adjustment Using Multiple Data Sets, Coordinates File.

Comparison with Other Methods

Scan Statistics

Scan statistics were first studied in detail by Joseph Naus. A major challenge with scan statistics is to find analytical results concerning the probabilities of observing a cluster of a specific magnitude, and there is a beautiful collection of subsequent mathematical theory obtaining approximations and bounds for these probabilities under a variety of settings. Excellent reviews of this theory have been provided by Glaz and Balakrishnan and by Glaz, Naus and Wallenstein. Two common features for most of this work are: (i) they use a fixed size scanning window, and (ii) they deal with count data where, under the null hypothesis, the observed number of cases follows a uniform distribution in either a continuous or discrete setting, so that the expected number of cases in an area is proportional to the size of that area.

In disease surveillance, neither of these assumptions is met, si
66. Selected SaTScan Applications by Field of Study
Other References in the User Guide

Introduction

The SaTScan Software

Purpose

SaTScan is free software that analyzes spatial, temporal and space-time data using the spatial, temporal, or space-time scan statistics. It is designed for any of the following interrelated purposes:

• Perform geographical surveillance of disease, to detect spatial or space-time disease clusters, and to see if they are statistically significant.
• Test whether a disease is randomly distributed over space, over time or over space and time.
• Evaluate the statistical significance of disease cluster alarms.
• Perform repeated time-periodic disease surveillance for early detection of disease outbreaks.

The software may also be used for similar problems in other fields such as archaeology, astronomy, botany, criminology, ecology, economics, engineering, forestry, genetics, geography, geology, history, neurology or zoology.

Data Types and Methods

SaTScan uses either a Poisson-based model, where the number of events in a geographical area is Poisson-distributed, according to a known underlying population at risk; a Bernoulli model, with 0/1 event data such as cases and controls; a space-time permutation model, using only case data; an ordinal model, for ordered categorical d
67. eis IA  Almeida MC  Homicide clusters  and drug traffic in Belo Horizonte  Minas Gerais  Brazil from 1995 to 1999  Cadernos de Sa  de  Publica  17 1163 1171  2001   online     145  Ceccato V  Haining R  Crime in border regions  The Scandinavian case of Oresund  1998 2001   Annals of the Association of American Geographers  94 807 826  2004     Related Topics  Methodological Papers  SaTScan Bibliography  Suggested Citation        Other References in the User Guide    146  Alt KW  Vach W  The reconstruction of  genetic kinship  in prehistoric burial complexes   problems  and statistics  In Bock HH  Ihm P  eds   Classification  data analysis  and knowledge organization   Berlin  Springer Verlag  1991     SaTScan User Guide v7 0 90    147  Baker RD  Testing for space time clusters of unknown size  Journal of Applied Statistics  23 543   554  1996     148  Besag J  Newell J  The detection of clusters in rare diseases  Journal of the Royal Statistical Society   A154 143 155  199      149  Bithell JF  The choice of test for detecting raised disease risk near a point source  Statistics in  Medicine  14 2309 2322  1995     150  Cuzick J  Edwards R  Spatial clustering for inhomogeneous populations  Journal of the Royal  Statistical Society  B52 73 104  1990     151  Diggle PJ  Chetwynd AD  Second order analysis of spatial clustering for inhomogeneous  populations  Biometrics  47 1155 1163  199      152  Diggle P  Chetwynd AG  H  ggkvist R  Morris SE  Second order analysis of space time
68. en

10. Why can't I select a maximum geographical cluster size that is larger than 50% of the population?

Clusters of excess risk that are larger than 50% of the population at risk are better viewed as clusters of lower risk outside the scanning window, and the area outside will always have a very irregular geographical shape. If there is interest in clusters with lower risk than expected, it is more appropriate to select the low rates option on the Analysis Tab.

Results

11. I get an error stating that the output file could not be created. Why?

Windows 2000 and Windows XP have tighter default security settings than Windows 95/98/NT/ME, and under these newer versions of Windows, permission to write to the "Program Files" folder is given only to administrators and power users of that machine. If the output file path includes the "Program Files" folder and you do not have administrative or power user privileges on your computer, Windows prevents SaTScan from creating the output file in the designated location. The solution is to specify a different output file name using a different directory.

12. Since the SaTScan results are based on Monte Carlo simulated random data, why are the p-values the same when I run the analysis twice?

All computer-based simulations are based on pseudo-random number generators. When the same seed is used, exactly the same sequence of pseudo-random numbers will be generated. Since SaTScan us
69. er.

P-values listed for secondary clusters are calculated in the same way as for the most likely cluster, by comparing the log likelihood ratio of secondary clusters in the real data set with the log likelihood ratios of the most likely cluster in the simulated data sets. This means that if a secondary cluster is significant, it can reject the null hypothesis on its own strength without help of any other clusters. It also means that these p-values are conservative.

PARAMETER SETTINGS: A reminder of the parameter settings used for the analysis.

Additional results files: The name and location of additional results files are provided, when applicable.

Related Topics: Output Tab, Clusters Reported Tab, Cluster Information File, Location Information File, Risk Estimates for Each Location, Simulated Log Likelihood Ratios, Cartesian Coordinates, Additional Output Files.

Cluster Information File (*.col.*)

In the cluster information file, each cluster is on one line, with different information about the cluster in different columns. For each cluster there is information about the location and size of the cluster, its log likelihood ratio and the p-value. Except for the ordinal model and when multiple data sets are used, there is also information about the observed and expected number of cases, observed/expected and relative risk. For the ordinal model and multiple data sets, these numbers depend on the data set and/or category, and the information is ins
70. ercent.

This option is only available for the Poisson model.

Related Topics: Adjustments File, Spatial and Temporal Adjustments Tab, Time Aggregation, Poisson Model, Missing Data.

Missing Data

If there is missing data for some locations and times, it is important to adjust for that in the analysis. If not, you may find statistically significant low rate clusters where there is missing data, or statistically significant high rate clusters in other locations, even though these are simply artifacts of the missing data.

Bernoulli Model

To adjust a Bernoulli model analysis for missing data, do the following. If cases are missing for a particular location and time period, remove the controls for that same location and time. Likewise, if controls are missing for a particular location and time, remove the cases for that same location and time. This needs to be done before providing the data to SaTScan. If both cases and controls are missing for a location and time, you are fine, and there is no need for any modification of the input data.

Ordinal Model

To adjust an ordinal model analysis for missing data, do the following. If one or more categories are missing for a particular location and time period, remove all cases in the remaining categories from that same location and time. This needs to be done before providing the data to SaTScan. If all cases in all categories are missing for a location and time, you are fine, and
71. ermutation and exponential models, b = 16 for the Bernoulli model, b = 4 for the ordinal model, b = 20 for the normal model
CAT = the number of categories in the ordinal model (CAT = 1 for other models)
CONT = 3 for the exponential model, CONT = 4 for the normal model, and CONT = 1 for all other models
C = the total number of cases (for the ordinal model or multiple data sets, C = 0)
R = 1 when scanning for high rates only or low rates only, R = 2 when scanning for either high or low rates
D = number of data sets
P = number of processors available on the computer for SaTScan use

For purely spatial analyses and most space-time analyses, TI is much less than G, and so it is the expression to the left of the first plus sign above that is critical in terms of memory requirements. Table 2 provides estimates of the memory requirements when G = L, mg = 0.5 and TI = 1.

G = L      Memory Needed      G = L      Memory Needed
3,500      32Mb               44,000     2Gb
6,500      64Mb               63,000     4Gb
10,000     128Mb              89,000     16Gb
15,000     256Mb              126,000    32Gb
22,000     512Mb              178,000    64Gb
32,000     1Gb                250,000    128Gb

Table 2: Approximate memory requirements for a purely spatial analysis, when the maximum geographical cluster size is 50% of the population.

Special Memory Allocation

When the number of locations is very large while the number of cases, time intervals and simulations are not, SaTScan sometimes uses an alternati
72. [Figure: Clusters Reported Tab Dialog Box, with options to report only clusters smaller than a percentage of the population at risk (<= 50%, default = 50%), a percentage of the population defined in the max circle size file (<= 50%), or a circle with a given kilometer radius]

Clusters Reported Tab Dialog Box

This tab is reached by clicking the Advanced button in the lower right corner of the Output Tab.

Related Topics: Advanced Features, Output Tab, Results of Analysis, Criteria for Reporting Secondary Clusters, Report Only Small Clusters.

Criteria for Reporting Secondary Clusters

SaTScan evaluates an enormous number of different circles/cylinders in order to find the most likely cluster. All of these may be considered secondary clusters with either a high or a low rate. To present all of these secondary clusters is impractical and unnecessary since many of them will be very similar to each other. For example, to add one location with a very small population to the most likely cluster will not decrease the likelihood very much, even if that location contains no additional cases. Such a secondary cluster is not interesting even though it could have the second highest likelihood among all the clusters evaluated.

Rather than reporting information about all evaluated clusters, SaTScan only reports a limited number of secondary clusters using criteria specified by the user. A three stage procedure is used to select the secondary clusters to report:

1. For each circle centroid, SaTScan will only consider the cluster wit
73. es are displayed in the top box of the job status window. Warnings and error messages are displayed in the bottom box of the job status window. Upon successful completion of the calculations, the standard results file will be shown in the job status window.

Related Topics: Launching the Analysis, Warnings and Errors.

Warnings and Errors

[Figure: SaTScan Status Messages and Warnings/Errors Dialog Box. The example shows a cancelled job, with errors reporting case file dates that fall outside the study period.]

Warning Messages

SaTScan may produce warnings as the job is executing. If a warning occurs, a message is displayed in the Warnings/Errors box on the bottom of the job status window. A warning will not stop the execution of the analysis. If a warning occurs, please review the message and access the help system if further information is required.

If y
74. es the same seed for every run, you obtain the same result for two runs when the input data is the same.

13. I ran exactly the same data using two different versions of SaTScan, v2.1 and v3.0/3.1/4.0/5.0/5.1/6.0/7.0, but the p-values are different. Why? Which one is the correct one?

Compared to v2.1, the pseudo-random number generation is done slightly differently in SaTScan v3.0 and later, typically resulting in slightly different p-values. While both are valid and correct, only one p-value should be used. We recommend always using the p-value that was calculated first.

14. I ran exactly the same data using SaTScan v2.1/3.0/3.1/4.0 and SaTScan v5.0/5.1/6.0/7.0, but the results are different. Why?

In earlier versions, SaTScan defined overlapping clusters based on whether the two circles were overlapping. In SaTScan v5.0 and later, two clusters overlap if they have at least one location ID in common. These two definitions are usually the same, but in rare cases they may be different. If you were running the Poisson model, another possible reason for the difference is that SaTScan v5.0 and later uses a more precise algorithm for calculating the expected number of cases when the population dates in the population file are specified using days rather than months or years.

Interpretation

15. In SaTScan, after adjusting for population density and covariates such as age, the null hypothesis is complete spatial randomness. For most disease dat
75. ess 66
Cluster Cases Information File (*.cci.*) 68
Location Information File (*.gis.*) 68
Risk Estimates for Each Location File (*.rr.*) 68
Simulated Log Likelihood Ratios File (*.llr.*) 69
Miscellaneous 70
New Versions 70
Analysis History File 70
Random Number Generator 70
Contact Us 70
Acknowledgements 71
Frequently Asked Questions 73
Input Data 73
[unreadable entry] 74
Results 74
Interpretation 75
Operating Systems 77
SaTScan Bibliography 78
Suggested Citations 78
SaTScan Methodology Papers
76. eters with a "*.prm" file extension. The parameter file is stored in an ASCII text file format.

To save analysis parameters:

1. If the parameters have not previously been saved, select Save As from the File menu. A "Save Parameter File As" dialog will open.
2. Select a directory location from the "Save In" drop-down menu at the top of the dialog box.
3. Enter a name for your parameter file in the "File Name" text box. It is recommended that the "Save As Type" selection remain as Parameter Files (*.prm).
4. Press the Save button.

Once the parameter file is initially saved, save changes to the file by selecting "Save" on the File menu. The file will save without opening the "Save Parameter File As" dialog.

To open a saved parameter file:

1. Select "Open" from the File menu or click on the corresponding button in the toolbar. A Select Parameter File dialog will open.
2. Locate the desired file using the Look In drop-down menu.
3. Once the file is located, highlight the file name by clicking on it.
4. Press the Open button.

A Parameter tab dialog will open containing the saved parameter settings. The location and name of the parameter file are listed in the title bar of this dialog.

Related Topics: Specifying Analysis and Data Options, Basic SaTScan Features, Advanced Features, Batch Mode.

Parallel Processors

If you have parallel processors on your computer, SaTScan can take advantage of this by 
77. g the study period.

Annual rate per 100,000 (Poisson model): This is calculated taking leap years into account and is based on the average length of a year of 365.2425 days. If calculated by hand ignoring leap years, the numbers will be slightly different, but not by much.

Variance (normal model): This is the variance for all observations in the data assuming a common mean.

MOST LIKELY CLUSTER: Summary information about the most likely cluster, that is, the cluster that is least likely to be due to chance.

Radius: When latitude and longitude are used, the radius of the circle is given in kilometers. When regular Cartesian coordinates are used, the radius of the circle is given in the same units as those used in the coordinates file.

Population: This is the average population in the geographical area of the cluster. The average is taken over the whole study period even when it is a space-time cluster whose temporal length is only a part of the study period.

Unexplained Variance (normal model): This is the estimated common variance for all observations in the data that cannot be explained by this particular cluster. It is calculated by using the different estimated means inside and outside the cluster.

P-value: The p-values are adjusted for the multiple testing stemming from the multitude of circles/cylinders corresponding to different spatial and/or temporal locations and sizes of potential clusters evaluated. This 
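As a rough illustration of the "Annual rate per 100,000" entry described above, the following Python sketch converts a case count into an annualized rate using the 365.2425-day average year quoted in the guide. The function name, its arguments and the exact arithmetic are assumptions made for this example; the guide does not spell out the formula SaTScan uses internally.

```python
AVERAGE_DAYS_PER_YEAR = 365.2425  # average year length quoted in the guide


def annual_rate_per_100000(cases, population, study_days):
    """Hypothetical helper: cases per 100,000 population per average year.

    Assumes a constant population over a study period of `study_days` days.
    """
    years = study_days / AVERAGE_DAYS_PER_YEAR
    return cases / (population * years) * 100_000


# Ten average years of follow-up, 50 cases in a population of 100,000.
rate = annual_rate_per_100000(cases=50, population=100_000, study_days=3652.425)

# Ignoring leap years (365-day years) gives a slightly different figure.
rate_ignoring_leap_years = annual_rate_per_100000(50, 100_000, 3650)
```

The second call shows why a hand calculation that ignores leap years differs slightly from the reported figure, as the text notes.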
78. gy

137. Sudakin DL, Horowitz Z, Giffin S. Regional variation in the incidence of symptomatic pesticide exposures: Applications of geographic information systems. Journal of Toxicology - Clinical Toxicology, 40:767-773, 2002.

Psychology

138. Margai F, Henry N. A community-based assessment of learning disabilities using environmental and contextual risk factors. Social Science and Medicine, 56:1073-1085, 2003.

Brain Imaging

139. Yoshida M, Naya Y, Miyashita Y. Anatomical organization of forward fiber projections from area TE to perirhinal neurons representing visual long-term memory in monkeys. Proceedings of the National Academy of Sciences of the United States of America, 100:4257-4262, 2003. (online)

History

140. Witham CS, Oppenheimer C. Mortality in England during the 1783-4 Laki Craters eruption. Bulletin of Volcanology, 67:15-25, 2004.

Criminology

141. Jefferis ES. A multi-method exploration of crime hot spots: SaTScan results. National Institute of Justice, Crime Mapping Research Center, 1998.

142. Kaminski RJ, Jefferis ES, Chanhatasilpa C. A spatial analysis of American police killed in the line of duty. In Turnbull et al. (eds.), Atlas of crime: Mapping the criminal landscape. Phoenix, AZ: Oryx Press, 2000.

143. LeBeau JL. Demonstrating the analytical utility of GIS for police operations: A final report. National Criminal Justice Reference Service, 2000. (online)

144. Beato Filho CC, Assunção RM, Silva BF, Marinho FC, R
79. h the highest likelihood, among those that share that same centroid (grid point).

2. These clusters will be ordered in descending order by the value of their log likelihood ratios, creating a list with the same number of clusters as there are grid points.

3. The most likely cluster will always be reported. Options for reporting secondary clusters follow. Except under the last option, secondary clusters will only be reported if p<1.

No Geographical Overlap: Default. Secondary clusters will only be reported if they do not overlap with a previously reported cluster, that is, they may not have any location IDs in common. Therefore, no overlapping clusters will be reported. This is the most restrictive option, presenting the fewest number of clusters.

No Cluster Centers in Other Clusters: Secondary clusters are not centered in a previously reported cluster and do not contain the center of a previously reported cluster. While two clusters may overlap, there will be no reported cluster with its centroid contained in another reported cluster.

No Cluster Centers in More Likely Clusters: Secondary clusters are not centered in a previously reported cluster. This means that there will be no reported cluster with its center contained in a previously reported more likely cluster.

No Cluster Centers in Less Likely Clusters: Secondary clusters do not contain the center of a previously reported cluster. This means that there will b
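The default "No Geographical Overlap" criterion described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary layout (`id`, `llr`, `locations`) and the function name are invented for the example, the p<1 requirement is left out, and nothing here is taken from SaTScan's actual implementation.

```python
def report_without_overlap(clusters):
    """Hypothetical sketch of the 'No Geographical Overlap' rule.

    `clusters` is a list of dicts with an 'llr' (log likelihood ratio)
    and a 'locations' collection of location IDs. Clusters are visited
    from most to least likely; a cluster is reported only if it shares
    no location ID with any cluster reported before it.
    """
    ordered = sorted(clusters, key=lambda c: c["llr"], reverse=True)
    reported = []
    used_locations = set()
    for cluster in ordered:
        locations = set(cluster["locations"])
        if locations.isdisjoint(used_locations):
            reported.append(cluster)
            used_locations |= locations
    return reported


candidates = [
    {"id": "B", "llr": 9.0, "locations": ["2", "3"]},
    {"id": "A", "llr": 12.0, "locations": ["1", "2"]},
    {"id": "C", "llr": 5.0, "locations": ["4"]},
]
reported_ids = [c["id"] for c in report_without_overlap(candidates)]
```

In this toy input, cluster B shares location "2" with the more likely cluster A, so only A and C are reported. The most likely cluster is always kept because it is visited first.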
80. he standard results file, but with the extensions *.cci.txt and *.cci.dbf respectively, and will be located in the same directory.

Related Topics: Cluster Information File, Location Information File, Output Tab, Results of Analysis, Standard Results File.

Location Information File (*.gis.*)

As an option, a special output file may be created describing the various clusters in a way that is easy to incorporate into a geographical information system (GIS). This file may be requested in ASCII and/or dBase format, and can be accessed using any text editor or spreadsheet program. It will have the same name as the results file, but with the extensions *.gis.txt and *.gis.dbf respectively, and it will be located in the same directory. This file has one row for each location belonging to a cluster. The columns shown depend on the chosen analysis, and include among others the following information:

• Location ID
• Cluster Number
• P-Value of Cluster
• Observed Cases in Cluster
• Expected Cases in Cluster
• Observed/Expected in Cluster
• Observed Cases in Location
• Expected Cases in Location
• Observed/Expected in Location

Note: The second, third, fourth, fifth and sixth column entries are the same for all locations belonging to the same cluster.

Related Topics: Output Tab, Results of Analysis, Standard Results File, Cluster Information File.

Risk Estimates for Each Location File (*.rr.*)

If the option to include risk estimates fo
81. heir financial support is greatly appreciated. The contents of SaTScan are the responsibility of the developer and do not necessarily reflect the official views of the funders.

Related Topics: Statistical Methodology, SaTScan Bibliography.

Download and Installation

To install SaTScan, go to the SaTScan Web site at http://www.satscan.org/ and select the SaTScan download link. After downloading the SaTScan installation executable to your PC, click on its icon and install the software by following the step-wise instructions.

Related Topics: New Versions.

Test Run

Before using your own data, we recommend trying one of the sample data sets provided with the software. Use these to get an idea of how to run SaTScan. To perform a test run:

1. Click on the SaTScan application icon.
2. Click on "Open Saved Session".
3. Select one of the parameter files, for example "nm.prm" (Poisson model), "NHumberside.prm" (Bernoulli model) or "NYCfever.prm" (space-time permutation model).
4. Click on the Execute button. A new window will open with the program running in the top section and a Warnings/Errors section below. When the program finishes running, the results will be displayed.

Note: The sample files should not produce warnings or errors.

Related Topics: Sample Data Sets.

Sample Data Sets

Six different sample data sets are provided with the software. They are automatically downloaded to your compu
82. hman ST, MacDougal L, Danley R, Mrosszczyk M, Sorensen AM, Kulldorff M. Geographical surveillance of breast cancer screening by tracts, towns and zip codes. Journal of Public Health Management and Practice, 6:48-57, 2001.

New York State Department of Health. Cancer Surveillance Improvement Initiative, 2001. (online)

Gregorio DI, Kulldorff M, Barry L, Samociuk H, Zarfos K. Geographic differences in primary therapy for early stage breast cancer. Annals of Surgical Oncology, 8:844-849, 2001. (online)

Roche LM, Skinner R, Weinstein RB. Use of a geographic information system to identify and characterize areas with high proportions of distant stage breast cancer. Journal of Public Health Management and Practice, 8:26-32, 2002.

Jemal A, Kulldorff M, Devesa SS, Hayes RB, Fraumeni JF. A geographic analysis of prostate cancer mortality in the United States. International Journal of Cancer, 101:168-174, 2002.

Michelozzi P, Capon A, Kirchmayer U, Forastiere F, Biggeri A, Barca A, Perucci CA. Adult and childhood leukemia near a high-power radio station in Rome, Italy. American Journal of Epidemiology, 155:1096-1103, 2002.

Zhan FB, Lin H. Geographic patterns of cancer mortality clusters in Texas, 1990 to 1997. Texas Medicine, 99:58-64, 2003.

Thomas AJ, Carlin BP. Late detection of breast and colorectal cancer in Minnesota counties: an application of spatial smoothing and clustering. Statistics in Medicine, 22:113-127, 2003.

Buntinx F, Geys H, Lousbe
83. [unreadable entry] 22
Multivariate Scan with Multiple Data Sets 23
Comparison with Other Methods 24
[unreadable entry] 24
Spatial and Space-Time Clustering 24
[unreadable entry] 26
Data Requirements 26
[unreadable entry] 27
Control File 27
Population File 28
Coordinates File 28
[unreadable entry] 30
Neighbors File 30
Max Circle Size File 30
Adjustments File 31
SaTScan Import Wizard 32
SaTScan ASCII File Format 33
Basic SaTScan Features 35
Input Tab 35
Analysis Tab 38
Output Tab 41
Advanced Features 43
Multiple Data Sets Tab 43
Data Checking Tab 44
Non-Euclidean Neighbors
84. ined by repeating the same analytic exercise on a large number of random replications of the data set generated under the null hypothesis. The p-value is obtained through Monte Carlo hypothesis testing, by comparing the rank of the maximum likelihood from the real data set with the maximum likelihoods from the random data sets. If this rank is R, then p = R / (1 + #simulation). In order for p to be a "nice looking" number, the number of simulations is restricted to 999 or some other number ending in 999 such as 1999, 9999 or 99999. That way it is always clear whether to reject or not reject the null hypothesis for typical cut-off values such as 0.05, 0.01 and 0.001.

The SaTScan program scans for areas with high rates (clusters), for areas with low rates, or simultaneously for areas with either high or low rates. The latter should be used rather than running two separate tests for high and low rates respectively, in order to make correct statistical inference. The most common analysis is to scan for areas with high rates, that is, for clusters.

Non-Compactness Penalty Function

When the elliptic window shape is used, there is an option to use a non-compactness (eccentricity) penalty to favor more compact clusters. The main reason for this is that the elliptic scan statistic will under the null hypothesis typically generate an elliptic most likely cluster since there are more elliptic than circular clusters evaluated. At the same time, the concept of clus
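To make the Monte Carlo p-value described earlier in this section concrete (rank R of the real data set among the simulated ones, p = R / (1 + number of simulations)), here is a minimal Python sketch. The function name and the treatment of ties (a simulated value equal to the observed one counts against the observed one) are assumptions made for the example, not details confirmed by the guide.

```python
def monte_carlo_p_value(observed_llr, simulated_llrs):
    """Hypothetical sketch: p = R / (1 + number of simulations).

    R is the rank of the observed maximum log likelihood ratio among the
    observed and simulated values together, where rank 1 is the largest.
    Ties are counted against the observed value (an assumption).
    """
    rank = 1 + sum(1 for value in simulated_llrs if value >= observed_llr)
    return rank / (1 + len(simulated_llrs))


# 999 simulated data sets, 49 of which reach or exceed the observed value.
simulated = [11.0] * 49 + [3.0] * 950
p = monte_carlo_p_value(10.0, simulated)  # rank 50 out of 1000
```

With 999 simulations the denominator is 1000, which is why the resulting p-values fall on round thresholds such as 0.05 or 0.001.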
85. ional variables that you selected have been matched, click on the Execute button to import the file. This will create a temporary file in SaTScan ASCII file format.

5. If the input file has headings that are exactly the same as the SaTScan variable names, you can click on the Auto Align button to match these automatically.

When importing the case file, the variables to match vary depending on the probability model used. By selecting the probability model at the top of the window, the import wizard will only display the variables relevant to that model.

Step 4: Saving the Imported File

The imported file, which is in SaTScan ASCII file format, must be saved at least temporarily. The default is to save it to the TEMP directory, and after the analysis is completed you may erase the file. You can also save it to some other directory of your choice and use it for future analyses without having to recreate it by using the Import Wizard again.

Related Topics: Input Tab, Case File, Control File, Population File, Coordinates File, Grid File, Max Circle Size File, Adjustments File.

SaTScan ASCII File Format

As an alternative to using the SaTScan Import Wizard, it is also possible to directly write the name of the input files in the text fields provided on the Input Tab, or to browse the file directories for the desired input files using the button to the right of that box. The files must then be in SaTScan file format, which are space delimited ASCII files 
86. istical Regression Software

SaTScan cannot in itself do an adjustment for continuous covariates. Such adjustments can still be done for the Poisson model (5), but it is a little more complex. The first step is to calculate the covariate adjusted expected number of cases for each location ID and time using a standard statistical regression software package like SAS. These expected numbers should then replace the raw population numbers in the population file, while not including the covariates themselves.

The use of external regression software is also an excellent way to adjust for covariates in the exponential model. The first step is to fit an exponential regression model without any spatial information, in order to obtain risk estimates for each of the covariates. The second step is to adjust the survival and censoring time up or down for each individual based on the risk estimates of his or her covariates.

For the normal model, covariates can be adjusted for by first fitting a linear regression model using standard statistical software, and then replacing the observed values with their residuals.

Related Topics: Covariate Adjustments, Covariate Adjustment Using the Input Files, Covariate Adjustment Using Multiple Data Sets, Exponential Model, Methodological Papers, Poisson Model, Population File.

Covariate Adjustment Using Multiple Data Sets

It is also possible to adjust for categorical covariates using multiple data sets. The cases and c
87. istics: Theory and Methods, 26:1481-1496, 1997. (online)

14. Kulldorff M, Feuer EJ, Miller BA, Freedman LS. Breast cancer in northeastern United States: A geographical analysis. American Journal of Epidemiology, 146:161-170, 1997. (online)

15. Kleinman K, Abrams A, Kulldorff M, Platt R. A model-adjusted space-time scan statistic with an application to syndromic surveillance. Epidemiology and Infection, 133:409-419, 2005.

16. Klassen A, Kulldorff M, Curriero F. Geographical clustering of prostate cancer grade and stage at diagnosis, before and after adjustment for risk factors. International Journal of Health Geographics, 4:1, 2005. (online)

17. Huang L, Kulldorff M, Gregorio D. A spatial scan statistic for survival data. Biometrics, 2006, in press. (online)

Adjusting for More Likely Clusters

18. Zhang Z, Kulldorff M, Assunção R. Spatial scan statistics adjusted for multiple clusters. Manuscript under review.

Computational Aspects

Algorithms

19. Kulldorff M. Spatial scan statistics: Models, calculations and applications. In Balakrishnan and Glaz (eds.), Recent Advances on Scan Statistics and Applications. Boston, USA: Birkhäuser, 1999. (online)

Random Number Generator

20. Lehmer DH. Mathematical methods in large-scale computing units. In Proceedings of the second symposium on large scale digital computing machinery. Cambridge, USA: Harvard Univ. Press, 1951.

21. Park SK, Miller KW. Random number generators: Good ones are hard
88. it is possible to define the scanning window as any time period that starts within a predefined "start range" and ends within a predefined "end range".

This option is only available when a retrospective purely temporal or a retrospective space-time analysis is selected on the Analysis Tab.

Related Topics: Temporal Window Tab, Maximum Temporal Cluster Size, Include Purely Spatial Clusters, Study Period, Time Aggregation.

Spatial and Temporal Adjustments Tab

[Figure: Spatial and Temporal Adjustments Tab Dialog Box. Temporal adjustment options: none; nonparametric, with time stratified randomization; log linear trend with a given percentage per year; log linear with automatically calculated trend. Spatial adjustment options: none; nonparametric, with spatial stratified randomization. Temporal, spatial and/or space-time adjustments: adjust for known relative risks using an adjustments file.]

Spatial and Temporal Adjustments Tab Dialog Box

Covariates are adjusted for either by including them in the case and population files or by using multiple data sets, depending on the probability model used. The features on this tab are used to adjust for temporal, spatial and space-time trends and variation. They are only available when using the Poisson probability model.

Related Topics: Advanced Fea
89. justed with a relative risk of 2*2=4.

Related Topics: Adjustments with Known Relative Risk, Missing Data, Spatial and Temporal Adjustments Tab, SaTScan Import Wizard, SaTScan ASCII File Format.

SaTScan Import Wizard

The SaTScan Import Wizard can be used to import dBase, comma delimited, or space delimited files. It works for all import files except the optional Neighbors File. Launch the Import Wizard by clicking on the File Import button to the right of the text field for the file that you want to import. Use the Next and Previous buttons to navigate between the dialogs. Follow the steps below to import files.

Step 1: Selecting the Source File

1. At the bottom of the Select Source File dialog, select the file type extension you are looking for. If you are unsure, select the All Files option.
2. Browse the folders and highlight the file you want to open. It will appear in the File Name text field.
3. Click on Open. The SaTScan Import Wizard will now appear.

Step 2: Specifying the File Format

If you are importing a dBase file, this step is automatically skipped. For all other source files, you need to specify the file structure using the File Format dialog box.

1. First specify whether you have a character delimited or fixed column file format, using the radio buttons under the Source File Type heading.
2. If there are extraneous lines in the beginning of the file, type the number lines that 
90. lthough it could be used for other continuous type data as well. Each observation is a case, and each case has one continuous variable attribute as well as a 0/1 censoring designation. For survival data, the continuous variable is the time between diagnosis and death or, depending on the application, between two other types of events. If some of the data is censored, due to loss of follow-up, the continuous variable is then instead the time between diagnosis and time of censoring. The 0/1 censoring variable is used to distinguish between censored and non-censored observations.

Example: For the exponential model, the data may consist of everyone diagnosed with prostate cancer during a ten-year period, with information about either the length of time from diagnosis until death or from diagnosis until a time of censoring after which survival is unknown.

When using the temporal or space-time exponential model for survival times, it is important to realize that there are two very different time variables involved. The first is the time the case was diagnosed, and that is the time that the temporal and space-time scanning window is scanning over. The second is the survival time, that is, the time between diagnosis and death or, for censored data, the time between diagnosis and censoring. This is an attribute of each case, and there is no scanning done over this variable. Rather, we are interested in whether the scanning window includes exce
91. mal models. When there are multiple data sets, the maximum is defined as a percentage of the combined total population/cases in all data sets.

It is also possible to specify the maximum circle size in terms of actual geographical size rather than population. If latitude/longitude coordinates are used, then the maximum radius should be specified in kilometers. If Cartesian coordinates are used, the maximum radius should be specified in the same units as the Cartesian coordinates.

Alternatively, for either probability model, it is possible to specify a max circle size file to define the maximum circle size. This file must contain a "population" for each location, and the maximum circle size is then defined as a percentage of this population rather than the regular one. This feature may be used when, for example, you want to define the circles in the Bernoulli or space-time permutation models based on the actual population rather than the locations of cases and controls. It may also be used if you want the geographical circles to include, for example, at most 10 counties out of a total of 100, irrespective of the population in those counties. This is accomplished by assigning a "population" of 1 to each county in the special max circle size file and then setting the maximum circle size to 10% of this "population".

If a prospective space-time analysis is performed, adjusting for earlier analyses, and if the max circle size is defined as a percentag
92. means that under the null hypothesis of complete spatial randomness there is a 5% chance that the p-value for the most likely cluster will be smaller than 0.05 and a 95% chance that it will be bigger. Under the null hypothesis there will always be some area with a rate higher than expected just by chance alone. Hence, even though the most likely cluster always has an excess rate when scanning for areas with high rates, the p-value may actually be very close or identical to one.

Recurrence Interval: For prospective analyses, the recurrence interval (or "null occurrence rate") is shown as an alternative to the p-value. The measure reflects how often a cluster of the observed or larger likelihood will be observed by chance, assuming that analyses are repeated on a regular basis with a periodicity equal to the specified time interval length. For example, if the observed p-value is used as the cut-off for a signal and if the recurrence interval is once in 14 months, then the expected number of false signals in any 14-month period is one.

If no adjustments are made for earlier analyses, then the recurrence interval is once in $D/p$ days, where $D$ is the number of days in each time interval. If adjustments are made for a number $A$ of earlier analyses, then the recurrence interval is once every $D / [1 - (1-p)^{1/A}]$ days.

SECONDARY CLUSTERS: Summary information about other clusters detected in the data. The information provided is the same as for the most likely clust
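As a rough illustration of the recurrence interval described earlier in this section, the following Python sketch (not part of SaTScan) computes the interval in days. The function name and its arguments are hypothetical. The adjusted case rests on an assumed reading of the formula: the observed p-value is converted back to a per-analysis probability, 1 - (1 - p)^(1/A), where A is the number of earlier analyses adjusted for.

```python
def recurrence_interval_days(p_value, days_per_interval, n_analyses=None):
    """Return the recurrence interval, in days, for a prospective analysis.

    p_value           -- observed p-value of the cluster (0 < p <= 1)
    days_per_interval -- D, the number of days in each time interval
    n_analyses        -- A, the number of earlier analyses adjusted for,
                         or None when no such adjustment is made
    """
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p_value must be in (0, 1]")
    if n_analyses is None:
        # No adjustment for earlier analyses: once in D / p days.
        return days_per_interval / p_value
    if n_analyses < 1:
        raise ValueError("n_analyses must be at least 1")
    # Assumed reading of the adjusted formula: D / [1 - (1 - p)^(1/A)].
    per_analysis_p = 1.0 - (1.0 - p_value) ** (1.0 / n_analyses)
    return days_per_interval / per_analysis_p


# Daily analyses (D = 1) with an observed p-value of 0.05.
print(recurrence_interval_days(0.05, 1))
# The same p-value after adjusting for 10 earlier analyses.
print(recurrence_interval_days(0.05, 1, n_analyses=10))
```

With A = 1 the adjusted expression reduces to D/p, so the two cases agree when only a single analysis is involved.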
93. memory for the 32-bit Windows version of SaTScan. The Linux version of SaTScan can be used to analyze larger data sets.

Related Topics: Coordinates File, Grid File, Spatial, Temporal and Space-Time Scan Statistics, Spatial Window Tab, Temporal Window Tab, Monte Carlo Replications, Multiple Data Sets Tab, Warnings and Errors.

Results of Analysis

As output, SaTScan creates one standard text-based results file in ASCII format and up to five different optional output files in column format, which can be generated in either ASCII or dBase format. Some of the optional files are useful when exporting output from SaTScan into other software such as a spreadsheet or a geographical information system.

Related Topics: Output Tab, Clusters Reported Tab, Standard Results File, Cluster Information File, Location Information File, Risk Estimates for Each Location, Simulated Log Likelihood Ratios, Analysis History File.

Standard Results File (out)

The standard results file is automatically shown after the calculations are completed. It is fairly self-explanatory, but for proper interpretation it is recommended to read either the section on the statistical method or, even better, one of the methodological papers listed in the bibliography.

SUMMARY OF DATA: Use this to check that the input data files contain the correct number of cases, locations, etc.

Total population (Poisson model): This is the average population durin
94. mes in between those dates, SaTScan will estimate the population through linear interpolation. If all population counts have the same date, the population is assumed to be constant over time.

Multiple Data Sets: It is possible to specify multiple case files, each representing a different data set, with information about different diseases or about men versus women respectively. For the Bernoulli model, each case file must be accompanied by its own control file, and for the Poisson model, each case file must be accompanied by its own population file. The maximum number of data sets that SaTScan can analyze is twelve.

Covariate Adjustments: With the Poisson and space-time permutation models, it is possible to adjust for multiple categorical covariates by including them in the case and population files. For the Bernoulli, ordinal or exponential models, covariates can be adjusted for using multiple data sets.

Related Topics: Input Tab, Multiple Data Sets Tab, Case File, Control File, Population File, Coordinates File, Grid File, SaTScan Import Wizard, SaTScan ASCII File Format, Covariate Adjustments.

Case File

The case file provides information about cases. It should contain the following information:

location id: Any numerical value or string of characters. Empty spaces may not form part of the id.

cases: The number of cases for the specified location, time and covariates.

time: Optional. May be specified in ye
95. morrhagic fever with renal syndrome in China. BMC Infectious Diseases, 6:77, 2006. [online]

Cancer

54. Hjalmars U, Kulldorff M, Gustafsson G, Nagarwalla N. Childhood leukemia in Sweden: Using GIS and a spatial scan statistic for cluster detection. Statistics in Medicine, 15:707-715, 1996.

55. Kulldorff M, Feuer EJ, Miller BA, Freedman LS. Breast cancer in northeastern United States: A geographical analysis. American Journal of Epidemiology, 146:161-170, 1997. [online]

56. Imai J. Spatial disease clustering in Kochi prefecture in Japan. National Institute of Public Health, Epidemiology and Biostatistics Research, 57 96, 1998. [in Japanese]

57. VanEenwyk J, Bensley L, McBride D, Hoskins R, Solet D, McKeeman Brown A, Topiwala H, Richter A, Clark R. Addressing community health concerns around SeaTac Airport: Second Report. Washington State Department of Health, 1999. [online]

58. Hjalmars U, Kulldorff M, Wahlquist Y, Lannering B. Increased incidence rates but no space-time clustering of childhood malignant brain tumors in Sweden. Cancer, 85:2077-2090, 1999.

59. Viel JF, Arveux P, Baverel J, Cahn JY. Soft-tissue sarcoma and non-Hodgkin's lymphoma clusters around a municipal solid waste incinerator with high dioxin emission levels. American Journal of Epidemiology, 152:13-19, 2000.

60. Sheehan TJ, Gers
96. n of the hypothesized cluster. For example, a cluster around a toxic waste site in one country may spur an investigation about clusters around a similar toxic waste site in another country. The spatial scan statistic or other cluster detection tests should then not be used, as they will have low power due to the evaluation of all possible locations even though the hypothesized location is already known. Examples of focused tests are Stone's Test [9], Lawson-Waller's Score Test, and Bithell's Test.

Focused tests should never be used when the foci were defined using the data itself. This would lead to pre-selection bias and the resulting p-values would be incorrect. It is then better to use the spatial scan statistic. If, on the other hand, the point source was defined without looking at the data, then it is better to use the focused test rather than the spatial scan statistic, as the former will have higher power because it focuses on the location of interest.

In addition to various scan statistics, the SaTScan software can also be used to do a focused test in order to evaluate whether there is a disease cluster around a pre-determined focus (ref. 2, p809). This is done by using a grid file with only a single grid point reflecting the coordinates of the focus of interest.

Global Clustering Tests

Most proposed tests for spatial clustering are tests for global clustering. These include, among many others, the methods proposed by Alt and Vach [5], Besag
97. nates file. The Poisson model also requires a population file while the Bernoulli model requires a control file.

Optional Files: One may also specify an optional special grid file that contains geographical coordinates of the centroids defining the circles used by the scan statistic. If such a file is not specified, the coordinates in the coordinates file will be used for that purpose. As part of the advanced features, there is also an optional max circle size file and an optional adjustments file.

File Format: The data input files must be in SaTScan ASCII file format, or you may use the SaTScan Import Wizard for dBase, comma delimited or space delimited files. Using such files, the wizard will automatically generate SaTScan file format files. Both options are described below.

Spatial Resolution: Separate data locations may be specified for individuals, or data may be aggregated for states, provinces, counties, parishes, census tracts, postal code areas, school districts, households, etc.

Temporal Information: To do a temporal or a space-time analysis, it is necessary to have a time related to each case and, if the Bernoulli model is used, for each control as well. This time can be specified as a day, month or year. When the Poisson model is used, the background denominator population is assumed to exist continuously over time, although not necessarily at a constant level. The population file requires a date to be specified for each population count. For ti
98. nce we do not know the size of a cluster a priori and since the population at risk is geographically inhomogeneous. Under the null hypothesis of equal disease risk one expects to see more disease cases in a city compared to a similar sized area in the countryside, just because of the higher population density in the city. The scan statistics in the SaTScan software were developed to resolve these two problems. Since no analytical solutions have been found to obtain the probabilities under these more complex settings, Monte Carlo hypothesis testing is instead used to obtain the p-values.

Spatial and Space-Time Clustering

Descriptive Cluster Detection Methods

In 1987, Openshaw et al. developed a Geographical Analysis Machine (GAM) that uses overlapping circles of different sizes in the same way as the spatial scan statistic, except that the circle size does not vary continuously. With the GAM, a separate significance test is made for each circle, leading to multiple testing, and in almost any data set there will be a multitude of "significant clusters" when defined in this way. This is because under the null hypothesis, each circle has a 0.05 probability of being "significant" at the 0.05 level, and with 20,000 circles we would expect 20,000 × 0.05 = 1,000 "significant" clusters under the null hypothesis of no clusters. GAM is hence very useful for descriptive purposes, but should not be used for hypothesis testing.

Another nice method for descriptive
99. [Figure: Non-Euclidean Neighbors Tab Dialog Box, with a Special Neighbor File option to specify geographical neighbors through a user defined file]

Rather than using circles or ellipses defined by the Euclidean distances between the locations specified in the coordinates and grid files, it is possible to manually specify a neighborhood matrix. For each centroid, its closest, 2nd closest, 3rd closest neighbors and so on are specified in turn. This option is activated by checking the box on this tab and specifying the name of the neighbors file containing the neighbor matrix information. The format of the neighbors file is described in the ASCII File Format section.

Related Topics: Advanced Features, Input Tab, Neighbors File, ASCII File Format.

Spatial Window Tab

[Figure: Spatial Window Tab Dialog Box, with settings for Maximum Spatial Cluster Size (percent of the population at risk, <= 50%, default 50%; percent of the population defined in the max circle size file, <= 50%; or a circle with a given radius in kilometers), Include Purely Temporal Clusters (Spatial Size = 100%), and Spatial Window Shape (Circular)]

Use the Spatial Window Tab to define the exact nature of the scanning window with respect to space
100. nd the maximum of the two is used to represent the log likelihood ratio for that window.

Related Topics: Multiple Data Sets Tab, Covariate Adjustment, Covariate Adjustment Using the Input Files, Covariate Adjustment Using Statistical Regression Software, Methodological Papers, Bernoulli Model.

Spatial and Temporal Adjustments

Adjusting for Temporal Trends

If there is an increasing temporal trend in the data, then the temporal and space-time scan statistics will pick up that trend by assigning a cluster during the end of the study period. If there is a decreasing trend, it will instead pick up a cluster at the beginning of the time period. Sometimes it is of interest to test whether there are temporal and/or space-time clusters after adjusting for a temporal trend.

For the space-time permutation model, the analysis is automatically adjusted for both temporal trends and temporal clusters, and no further adjustments are needed. For the Poisson model, the user can specify whether a temporal adjustment should be made and, if so, whether to adjust with a percent change or non-parametrically.

Sometimes, the best way to adjust for a temporal trend is by specifying the percent yearly increase or decrease in the rate that is to be adjusted for. This is a log-linear adjustment. Depending on the application, one may adjust either for a trend that SaTScan estimates from the data being analyzed, or for the trend as estimated from national or other similar dat
101. ne rabies vaccine-associated adverse events using a very large veterinary practice database. Vaccine, epub ahead of print, 2005.

Gosselin PL, Lebel G, Rivest S, Fradet MD. The Integrated System for Public Health Monitoring of West Nile Virus (ISPHM-WNV): a real-time GIS for surveillance and decision-making. International Journal of Health Geographics, 4:21, 2005. [online]

Gaudart J, Poudiougou B, Ranque S, Doumbo O. Oblique decision trees for spatial pattern detection: optimal algorithm and application to malaria risk. BMC Medical Research Methodology, 5:22, 2005. [online]

Nisha V, Gad SS, Selvapandian D, Suganya V, Rajagopal V, Suganti P, Balraj V, Devasundaram J. Geographical information system (GIS) in investigation of an outbreak [of dengue fever]. Journal of Communicable Diseases, 37:39-43, 2005.

Jones RC, Liberatore M, Fernandez JR, Gerber SI. Use of a prospective space-time scan statistic to prioritize shigellosis case investigations in an urban jurisdiction. Public Health Reports, 121:133-9, 2006.

Pearl DL, Louie M, Chui L, Dore K, Grimsrud KM, Leedell D, Martin SW, Michel P, Svenson LW, McEwen SA. The use of outbreak information in the interpretation of clustering of reported cases of Escherichia coli O157 in space and time in Alberta, Canada, 2000-2002. Epidemiology and Infection, epub ahead of print, 2006.

Fang L, Yan L, Liang S, de Vlas SJ, Feng D, Han X, Zhao W, Xu B, Bian L, Yang H, Gong P, Richardus JH, Cao W. Spatial analysis of he
102. ntensive than the analysis of a single data set. Except for the ordinal model, the computing time for two data sets is much more than twice the time for a single data set. The computing time for s>2 data sets is approximately s/2 times longer than the computing time for two data sets.

Related Topics: Coordinates File, Grid File, Spatial Window Tab, Temporal Window Tab, Monte Carlo Replications, Early Termination of Simulations, Multiple Data Sets Tab.

Memory Requirements

SaTScan uses dynamic memory allocation. Depending on the nature of the input data, SaTScan will automatically choose one of two memory allocation schemes: the standard one and a special one for data sets with very many spatial locations but few time intervals and few simulations.

Standard Memory Allocation

Using the standard memory allocation scheme, the amount of memory needed for large data sets is approximately

$2 \times L \times G \times mg + (b + 4 \times CONT \times P) \times L \times TI \times CAT \times D + 8 \times C \times R \times P$ bytes if $L < 65{,}536$,

and

$4 \times L \times G \times mg + (b + 4 \times CONT \times P) \times L \times TI \times CAT \times D + 8 \times C \times R \times P$ bytes if $L \ge 65{,}536$,

where

L = the number of location IDs in the coordinates file

G = the number of coordinates in the grid file (G = L if no grid file is specified)

mg = maximum geographical cluster size, as a proportion of the population (0 < mg ≤ 1/2, mg = 1 for a purely temporal analysis)

TI = number of time intervals into which the temporal data is aggregated (TI = 1 for a purely spatial analysis)

b = 12 for the Poisson, space-time p
103. o reflect a population size at risk rather than an actual number of people.

covariates: Optional. Any number of categorical covariates may be specified, each represented by a different column separated by empty spaces. May be specified numerically or through characters. The covariates must be the same as in the case file.

Example: If age and sex are the covariates included, with 18 different age groups, then there should be 18 × 2 = 36 rows for each year and census area. With 3 different census years and 32 census areas, the file will have a total of 3 × 32 × 36 = 3456 rows and 5 columns.

Note: Multiple lines may be used for different population groups with the same location, time and covariate attributes. SaTScan will automatically add them.

Note: For a purely temporal analysis with the Poisson model, it is not necessary to specify a population file if the population is constant over time.

Related Topics: Input Tab, Population File Name, Multiple Data Sets Tab, Covariate Adjustment Using Input Files, Max Circle Size File, SaTScan Import Wizard, SaTScan ASCII File Format.

Coordinates File

The coordinates file provides the geographic coordinates for each location ID. Each line of the file represents one geographical location. Area-based information may be aggregated and represented by one single geographical point location. Coordinates may be specified either using the standard Cartesian coordinate system or in latitude and longitude. If two different location I
104. o Test, Analysis Tab, Probability Model Comparison, Methodological Papers.

Space-Time Permutation Model

The space-time permutation model requires only case data, with information about the spatial location and time for each case. The number of observed cases in a cluster is compared to what would have been expected if the spatial and temporal locations of all cases were independent of each other, so that there is no space-time interaction. Therefore, we get a cluster in a geographical area if, during a specific time period, that area has a higher proportion of excess cases, or a smaller deficiency of cases, than surrounding areas. This means that if, during a specific week, all geographical areas have twice the normal number of cases, none of these areas constitutes a cluster. On the other hand, if one geographical area has twice the normal number of cases while other areas have a normal number of cases, then there will be a cluster in that first area. The space-time permutation model automatically adjusts for both purely spatial and purely temporal clusters. Hence there are no purely temporal or purely spatial versions of this model.

Example: In the space-time permutation model, cases may be daily occurrences of ambulance dispatches to stroke patients.

It is important to realize that space-time permutation clusters may be due either to an increased risk of disease, or to a different geographical population distribution at different times, where for example the
105. on Input data.

Related Topics: Spatial and Temporal Adjustments Tab, Spatial and Temporal Adjustments, Temporal Trend Adjustment, Spatial Adjustment, Adjustments File, Poisson Model.

Inference Tab

[Figure: Inference Tab Dialog Box, with sections for Early Termination (Terminate the analysis early for large p-values), Prospective Surveillance, Critical Values, and Iterative Scan]

This tab is reached by clicking the Advanced button in the lower right corner of the Analysis Tab.

Related Topics: Advanced Features, Analysis Tab, Early Termination of Simulations, Adjust for Earlier Analyses in Prospective Surveillance.

Early Termination of Simulations

With more Monte Carlo replications, the power of the scan statistic is higher, but it is also more time consuming to run. When the p-value is small, this is often worth the effort, but for large p-values it is often irrelevant whether, for example, p=0.7535 or p=0.8545. SaTScan provides the option to terminate the simulations early when the p-value is large. With this option, SaTScan will terminate after 99 simulations when p>0.5 at that time, after 199 simulations when p>0.4, after 499 simulations when p>0.2 and after 999 simulations when p>0.1. If it passes all of these without terminating early, it will run the full length with the number of Monte Carlo replications
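The early termination rule described above can be sketched in a few lines of Python. This is only an illustration of the checkpoints listed in the text, not SaTScan's actual code. The function name and the `p_at` callback, which is assumed to return the current p-value estimate after a given number of simulations, are hypothetical.

```python
# (simulations completed, p-value threshold) pairs taken from the text above.
CHECKPOINTS = [(99, 0.5), (199, 0.4), (499, 0.2), (999, 0.1)]


def simulations_run(p_at, requested):
    """Return how many simulations are run under the early termination rule.

    p_at      -- hypothetical callback: p_at(n) is the p-value estimate
                 after n simulations have been completed
    requested -- number of Monte Carlo replications asked for by the user
    """
    for n_sims, threshold in CHECKPOINTS:
        if n_sims >= requested:
            break  # the run ends here anyway, no early stop needed
        if p_at(n_sims) > threshold:
            return n_sims  # p-value is large at this checkpoint: stop early
    return requested  # passed every checkpoint: run the full length


# A clearly non-significant result stops at the first checkpoint.
print(simulations_run(lambda n: 0.75, 9999))   # 99
# A small p-value passes all checkpoints and uses all replications.
print(simulations_run(lambda n: 0.01, 9999))   # 9999
```

The thresholds become stricter as more simulations accumulate, so borderline p-values get more replications before the run is cut short.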
106. one by generating simulated data from the normal distribution, but rather by permuting the space-time locations and the continuous attribute (e.g. birth weight) of the observations. While still being formally valid, the results can be greatly influenced by extreme outliers, so it may be wise to truncate such observations before doing the analysis.

Note: If all values are multiplied by the same constant, or if the same constant is added to all values, the statistical inference will not change, meaning that the same clusters with the same log likelihoods and p-values will be found. Only the estimated means and variances will differ.

Related Topics: Likelihood Ratio Test, Analysis Tab, Probability Model Comparison, Methodological Papers.

Probability Model Comparison

For count data, there are three different probability models available in SaTScan: Poisson, Bernoulli and space-time permutation. The ordinal model is designed for categorical data with an inherent ordering, for example from low to high. There are two models for continuous data: normal and exponential. The latter is primarily designed for survival type data.

The Poisson model is usually the fastest to run. The ordinal model is typically the slowest.

With the Poisson and space-time permutation models, an unlimited number of covariates can be adjusted for by including them in the case and population files. With the Bernoulli, ordinal, exponential and normal models, covariates can be
107. ontrols/population are then divided into categories, and a separate data set is used for each category. This type of covariate adjustment is computationally much slower than the one using the input files, and is not recommended for large data sets. One advantage is that it can be used to adjust the ordinal model for covariates, for which other adjustment procedures are unavailable. A disadvantage is that since the maximum number of data sets allowed by SaTScan is twelve, the maximum number of covariate categories is also twelve.

The adjustment approach to multiple data sets is as follows, when searching for clusters with high rates:

1. For each window location and size, the log likelihood ratio is calculated for each data set.

2. The log likelihood ratio for all data sets with less than the expected number of cases in the window is multiplied by negative one.

3. The log likelihood ratios are then summed up, and this sum is the combined log likelihood for that particular window.

4. The maximum of all the combined log likelihood ratios, taken over all the window locations and sizes, constitutes the most likely cluster, and this is evaluated in the same way as for a single data set.

When searching for clusters with low rates, the same procedure is performed, except that it is then the data sets with more than expected cases that we multiply by negative one. When searching for both high and low clusters, both sums are calculated, a
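The four steps of the multiple data set adjustment can be sketched as follows. This is a minimal Python illustration, not SaTScan's implementation: the function names and the input layout are hypothetical, and the per-data-set log likelihood ratios are taken as given rather than computed. The sketch assumes that for low-rate scans the sign is flipped (multiplied by negative one) for data sets with more cases than expected, and that for a combined high and low scan the larger of the two sums is used for the window.

```python
def combined_llr(llrs, observed, expected, scan="high"):
    """Combine per-data-set log likelihood ratios for one window.

    llrs     -- log likelihood ratio of the window in each data set (step 1)
    observed -- observed number of cases in the window, per data set
    expected -- expected number of cases in the window, per data set
    scan     -- "high", "low" or "both"
    """
    if scan == "both":
        # Both sums are calculated; the larger one represents the window.
        return max(combined_llr(llrs, observed, expected, "high"),
                   combined_llr(llrs, observed, expected, "low"))
    if scan not in ("high", "low"):
        raise ValueError("scan must be 'high', 'low' or 'both'")
    total = 0.0
    for llr, obs, exp in zip(llrs, observed, expected):
        # Step 2: flip the sign for data sets pointing the other way.
        wrong_direction = obs < exp if scan == "high" else obs > exp
        total += -llr if wrong_direction else llr  # step 3: sum up
    return total


def most_likely_cluster(windows, scan="high"):
    """Step 4: return the label of the window with the largest combined value.

    windows -- dict mapping a window label to (llrs, observed, expected)
    """
    return max(windows, key=lambda w: combined_llr(*windows[w], scan=scan))


# Two candidate windows evaluated over two data sets (made-up numbers).
windows = {
    "A": ([3.0, 1.0], [12, 4], [8.0, 6.0]),   # second data set has a deficit
    "B": ([2.5, 2.0], [10, 9], [7.0, 6.0]),   # both data sets have an excess
}
print(combined_llr(*windows["A"]))   # 2.0
print(most_likely_cluster(windows))  # B
```

Window A has the larger single-data-set ratio, but window B wins because both data sets show an excess there.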
108. order of

$L \times SG \times mg \times TI^{k} \times mt \times MC / P$

where

L = number of geographical data locations in the coordinates file (L = 1 for purely temporal analyses)

SG = number of geographical coordinates in the special grid file. If there is no such file, SG = L.

mg = maximum geographical cluster size, as a proportion of the population (0 < mg ≤ 1/2, mg = 1 for a purely temporal analysis)

TI = number of time intervals into which the temporal data is aggregated (TI = 1 for a purely spatial analysis)

mt = maximum temporal cluster size, as a proportion of the study period (0 < mt ≤ 0.9, mt = 1 for a purely spatial analysis)

MC = number of Monte Carlo replications

P = number of processors available on the computer for SaTScan use

k = 0 for a purely spatial analysis

k = 1 for prospective temporal and prospective space-time analyses without adjustments for earlier analyses

k = 2 for retrospective temporal and retrospective space-time analyses

The unit of the above formula depends on the probability model used and on the speed of the computer. When the total number of cases is very large compared to the number of locations and time intervals, the computing time is instead on the order of

$C \times MC / P$

where

C = the total number of cases

MC = number of Monte Carlo replications

P = number of processors available on the computer for SaTScan use

Multiple Data Sets

An analysis using multiple data sets is considerably more computer i
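To compare how different settings affect the run time, the first order-of-magnitude expression in this section can be evaluated directly. The Python sketch below is illustrative only. It assumes the expression reads L × SG × mg × TI^k × mt × MC / P, with k chosen by the type of analysis; the function name and the analysis labels are hypothetical. Because the unit depends on the probability model and the computer, only ratios between two configurations are meaningful.

```python
# Exponent k for each type of analysis, as listed in the text.
K_BY_ANALYSIS = {
    "purely spatial": 0,
    "prospective": 1,    # prospective temporal or space-time, no adjustment
    "retrospective": 2,  # retrospective temporal or space-time
}


def relative_computing_time(L, SG, mg, TI, mt, MC, P, analysis):
    """Return the order-of-magnitude computing time in arbitrary units."""
    k = K_BY_ANALYSIS[analysis]
    return L * SG * mg * TI ** k * mt * MC / P


# Same data, 52 weekly time intervals, 999 replications, 2 processors.
retro = relative_computing_time(100, 100, 0.5, 52, 0.5, 999, 2, "retrospective")
prosp = relative_computing_time(100, 100, 0.5, 52, 0.5, 999, 2, "prospective")
print(retro / prosp)  # the retrospective run costs about TI times more
```

In this reading, a retrospective analysis costs a factor TI more than the corresponding prospective one, and doubling the number of processors halves the estimate.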
109. ormation:

location id: Any numerical value or string of characters. Empty spaces may not form part of the id.

population: Any non-negative number.

The name of the special max circle size file is specified on the Analysis Tab > Advanced Features > Spatial Window Tab.

Note: If a location ID is missing from this file, the population is assumed to be zero. If a location ID occurs more than once, the population numbers will be added.

Related Topics: Input Tab, Population File, Spatial Window Tab, SaTScan Import Wizard, SaTScan ASCII File Format.

Adjustments File

The adjustments file can be used to adjust a Poisson model analysis for any temporal, spatial and space-time anomalies in the data, with a known relative risk. It can for example be used to adjust for missing or partially missing data. (Note: Covariates are adjusted for by using the case and population files or by analyzing multiple data sets, not with this file.) The adjustments file should contain one or more lines for each location for which adjustments are warranted, with the following information:

location id: Any numerical value or string of characters. Empty spaces may not form part of the id. Alternatively, it is possible to specify "All", in which case all locations will be adjusted with the same relative risk.

relative risk: Any non-negative number. The relative risk representing how much more common disease is in this location and time period
110. ou do not want to see the warning messages, they can be turned off by clicking Session > Execute Options > Do not report warning messages.

Error Messages

If a serious problem occurs during the run, an error message will be displayed in the Warnings/Errors box on the bottom of the job status window and the job will be terminated. The user may resolve most errors by reviewing the message and using the help system.

If the error message cannot be resolved, you may press the email button on the job status window. This will generate an automatic email message to SaTScan technical support. The contents of the Warnings/Errors box will be automatically placed in the email message. All a user needs to do is press their email Send key. Users may also print the contents of the Warnings/Errors box and even select, copy (Ctrl+C) and paste (Ctrl+V) the contents if necessary.

One of the most common errors is that the input files are not in the required format, or that the file contents are incompatible with each other. When this occurs, an error message will be shown specifying the nature and location of the problem. Such error messages are designed to help with data cleaning.

Related Topics: Input Data, Data Requirements, SaTScan Support.

Saving Analysis Parameters

Analysis parameters, specified on the Parameter tab dialog, can be saved and reused for future analyses. It is recommended that you save the param
111. ournal, 97:16-18, 2004.

Andrade AL, Silva SA, Martelli CM, Oliveira RM, Morais Neto OL, Siqueira Junior JB, Melo LK, Di Fabio JL. Population-based surveillance of pediatric pneumonia: use of spatial analysis in an urban area of Central Brazil. Cadernos de Saúde Pública, 20:411-421, 2004. [online]

Ozdenerol E, Williams BL, Kang SY, Magsumbol MS. Comparison of spatial scan statistic and spatial filtering in estimating low birth weight clusters. International Journal of Health Geographics, 4:19, 2005. [online]

Viel JF, Floret N, Mauny F. Spatial and space-time scan statistics to detect low clusters of sex ratio. Environmental and Ecological Statistics, 12:289-299, 2005.

Ali M, Asefaw T, Byass P, Beyene H, Karup Pedersen F. Helping northern Ethiopian communities reduce childhood mortality: population-based intervention trial. Bulletin of the World Health Organization, 83:27-33, 2005. [online]

Geriatrics

100. Yiannakoulias N, Rowe BH, Svenson LW, Schopflocher DP, Kelly K, Voaklander DC. Zones of prevention: the geography of fall injuries in the elderly. Social Science and Medicine, 57:2065-73, 2003.

Parasitology

101. Enemark HL, Ahrens P, Juel CD, Petersen E, Petersen RF, Andersen JS, Lind P, Thamsborg SM. Molecular characterization of Danish Cryptosporidium parvum isolates. Parasitology, 125:331-341, 2002.

102. Washington CH, Radday J, Streit TG, Boyd HA, Beach MJ, Addiss DG, Lovince R, Lovegrove MC, Lafon
112. ower.

In SaTScan, the number of replications must be at least 999 to ensure excellent power for all types of data sets. For small to medium size data sets, 9999 replications are recommended since computing time is not a major issue.

Related Topics: Analysis Tab, Likelihood Ratio Test, Computational Speed, Random Number Generator.

Output Tab

[Figure: Output Tab Dialog Box, showing the Results File field, the Optional Output Files check boxes (Cluster Information, Cluster Case Information, Location Information, Risk Estimates for Each Location, Simulated Log Likelihood Ratios/Test Statistics) and the Advanced button.]

The Output Tab is used to set parameters defining the output information provided by SaTScan.

Related Topics: Results of Analysis, Standard Results File, Results File Name, Additional Output Files, Clusters Reported Tab.

Results File Name

Specify the output file name to which the results of the analysis are to be written. This is the standard results file, automatically shown after the completion of the calculations. Four optional output files may also be created, but must be opened manually by the user.

Warning: If you specify the name of a file that already exists, the old file will be overwritten and lost.

Related Topics: Output Tab, Additional Output Files, Standard Results File.
113. parameters, additional options are warranted for some types of analyses, and these are available as advanced features. These features are reached through the Advanced button on the lower right corner of each of the three main tabs. "Advanced" should be interpreted as "additional" or "uncommon" rather than "complex", "difficult" or "better".

Since many of the advanced options depend on the selections made on the Input and Analysis Tabs, it is recommended that those two tabs be filled in first.

Related Topics: Basic SaTScan Features, Multiple Data Sets Tab, Spatial Window Tab, Temporal Window Tab, Spatial and Temporal Adjustments Tab, Inference Tab, Clusters Reported Tab.

Multiple Data Sets Tab

[Figure: Multiple Data Sets Tab Dialog Box. The Advanced Input Features window has the tabs Multiple Data Sets, Data Checking and Non-Euclidean Neighbors; under Additional Input Data Sets it has fields for Case File, Control File (Bernoulli Model) and Population File (Poisson Model).]

It is possible to search for and evaluate clusters in multiple data sets, as described in the Statistical Methodology section. The first data set is defined on the Input Tab. Up to eleven additional data sets can be defined on the Multiple Data Sets Tab. These files must be of the same class as the first one. That is, if the first data set consists of a case and a control file, so must all the others. The time precision and study period must also be the same as on the Input Tab.
114. ptionally many cases with a small or large value of this attribute.

It is important to note that while the exponential model uses a likelihood function based on the exponential distribution, the true survival time distribution need not be exponential, and the statistical inference (p-value) is valid for other survival time distributions as well. The reason for this is that the randomization is not done by generating observations from the exponential distribution but rather by permuting the space-time locations and the survival time/censoring attributes of the observations.

Related Topics: Likelihood Ratio Test, Analysis Tab, Probability Model Comparison, Methodological Papers.

Normal Model

The normal model is designed for continuous data. For each individual, called a case, there is a single continuous attribute that may be either negative or positive. The model can also be used for ordinal data when there are very many categories. That is, ties are allowed.

Example: For the normal model, the data may consist of the birth weight and residential census tract for all newborns, with an interest in finding clusters with lower birth weight.

It is important to note that while the normal model uses a likelihood function based on the normal distribution, the true distribution of the continuous attribute need not be normal. The statistical inference (p-value) is valid for any continuous distribution. The reason for this is that the randomization is not d
115. r a single retrospective analysis, using historic data, or for time-periodic prospective surveillance, where the analysis is repeated for example every day, week, month or year.

Related Topics: Analysis Tab, Spatial Window Tab, Temporal Window Tab.

Temporal Scan Statistic

The temporal scan statistic uses a window that moves in one dimension, time, defined in the same way as the height of the cylinder used by the space-time scan statistic. This means that it is flexible in both start and end date. The maximum temporal length is specified on the Temporal Window Tab.

Related Topics: Analysis Tab, Temporal Window Tab, Space-Time Scan Statistic.

Likelihood Ratio Test

For each location and size of the scanning window, the alternative hypothesis is that there is an elevated risk within the window as compared to outside. Under the Poisson assumption, the likelihood function for a specific window is proportional to:

\[
\left(\frac{c}{E[c]}\right)^{c} \left(\frac{C-c}{C-E[c]}\right)^{C-c} I()
\]

where C is the total number of cases, c is the observed number of cases within the window and E[c] is the covariate-adjusted expected number of cases within the window under the null hypothesis. Note that since the analysis is conditioned on the total number of cases observed, C-E[c] is the expected number of cases outside the window. I() is an indicator function. When SaTScan is set to scan only for clusters with high rates, I() is equal to 1 when the window has more cases than expected under the null h
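As a rough illustration of the Poisson expression in this section, the following Python sketch evaluates its logarithm for a single window. The function name `poisson_log_likelihood_ratio` and the `scan_high_rates` flag are hypothetical names chosen for this example, not part of SaTScan. The sketch assumes 0 < E[c] < C and returns minus infinity when the indicator function is zero.

```python
import math

def poisson_log_likelihood_ratio(c, expected, total, scan_high_rates=True):
    """Log of (c/E[c])^c * ((C-c)/(C-E[c]))^(C-c) * I() for one window.

    c        -- observed cases inside the window
    expected -- E[c], expected cases inside the window under the null
    total    -- C, total number of cases in the data set
    """
    # Indicator I(): when scanning for high rates only, the window counts
    # only if it holds more cases than expected under the null hypothesis.
    if scan_high_rates and c <= expected:
        return float("-inf")  # log of zero
    # Treat 0 * log(0) as 0 so empty regions do not raise an error.
    inside = c * math.log(c / expected) if c > 0 else 0.0
    outside_cases = total - c
    outside = (outside_cases * math.log(outside_cases / (total - expected))
               if outside_cases > 0 else 0.0)
    return inside + outside
```

For example, a window with 10 observed and 5 expected cases out of 100 in total gives a value of about 2.07. Working on the log scale avoids overflow when case counts are large.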
116. r each location is selected, a file with a list of all data locations and the corresponding number of observed cases, number of expected cases, the observed/expected ratio and the relative risk for each location is provided. This may be useful when examining a cluster area in more detail. The information is purely descriptive. There is one line for each Location ID, and the content of the five columns is as follows:

<Location ID> <Observed Cases> <Expected Cases> <Observed/Expected> <Relative Risk>

This file may be accessed using any text editor or spreadsheet program. It will have the same name as the results file, but with the extension *.rr.txt or *.rr.dbf, and it will be located in the same directory.

Related Topics: Output Tab, Results of Analysis, Standard Results File.

Simulated Log Likelihood Ratios File (*.llr.*)

The log likelihood ratio test statistics from the random data sets are not provided as part of the standard output. If desired, they can be printed to a special file which by default has the same name as the output file but with the extension *.llr.txt or *.llr.dbf. There is typically no need for this file, but it can be useful for statistical researchers who may be interested in the distributional properties of the scan statistic under various scenarios.

Related Topics: Output Tab, Results of Analysis, Standard Results File, Monte Carlo Replications.
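One way a researcher might use the simulated values is sketched below in Python: the log likelihood ratio from the real data is ranked among the simulated ones to give a Monte Carlo p-value. The function name and the sample numbers are made up for illustration. The values are assumed to have been read into a list already, since the layout of the file is not described here.

```python
def monte_carlo_p_value(observed_llr, simulated_llrs):
    """Rank-based Monte Carlo p-value.

    The rank of the observed statistic among all data sets (real plus
    simulated) is divided by the number of replications plus one.
    """
    # Count simulated data sets whose statistic is at least as large.
    at_least_as_large = sum(1 for s in simulated_llrs if s >= observed_llr)
    return (at_least_as_large + 1) / (len(simulated_llrs) + 1)

# Illustrative values only: nine simulated statistics and one observed value.
simulated = [1.2, 0.8, 3.1, 2.4, 0.5, 1.9, 2.2, 0.9, 1.1]
p = monte_carlo_p_value(4.0, simulated)  # no simulated value reaches 4.0
```

With this formula the smallest attainable p-value is 1 divided by the number of replications plus one, which is 0.001 for 999 replications.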
117. ral analysis ignores the geographical location of cases, even when such information is provided.

Purely temporal and space-time data can be analyzed in either retrospective or prospective fashion. In a retrospective analysis, the analysis is done only once for a fixed geographical region and a fixed study period. SaTScan scans over multiple start dates and end dates, evaluating both "alive clusters", lasting until the study period end date, and "historic clusters" that ceased to exist before the study period end date. The prospective option is used for the early detection of disease outbreaks, when analyses are repeated every day, week, month or year. Only alive clusters, that is, clusters that reach all the way to the current time as defined by the study period end date, are then searched for.

Related Topics: Spatial, Temporal and Space-Time Scan Statistics, Analysis Tab, Methodological Papers, Computing Time, Spatial Window Tab, Temporal Window Tab, Time Aggregation.

Probability Model

There are five different probability models that can be used: Poisson, Bernoulli, space-time permutation, ordinal and exponential. For purely spatial analyses, the Poisson and Bernoulli models are good approximations for each other in many situations. Temporal data are handled differently, so the models differ more for temporal and space-time analyses.

Poisson Model: The Poisson model should be used when the background population refle
118. rgh D, Broeders G, Cloes E, Dhollander D, Op De Beeck L, Vanden Brande J, Van Waes A, Molenberghs G. Geographical differences in cancer incidence in the Belgian province of Limburg. European Journal of Cancer, 39:2058-72, 2003.

Santamaria Ulloa C. Evaluación de alarmas por cáncer utilizando análisis espacial: una aplicación para Costa Rica. Revista Costarricense de Salud Pública, 12:18-22, 2003. [online]

Sheehan TJ, DeChello LM, Kulldorff M, Gregorio DI, Gershman S, Mroszczyk M. The geographic distribution of breast cancer incidence in Massachusetts 1988-1997, adjusted for covariates. International Journal of Health Geographics, 3:17, 2004. [online]

Fang Z, Kulldorff M, Gregorio DI. Brain cancer in the United States, 1986-95: A geographic analysis. Neuro-Oncology, 6:179-187, 2004.

Hsu CE, Jacobson HE, Soto Mas F. Evaluating the disparity of female breast cancer mortality among racial groups: a spatiotemporal analysis. International Journal of Health Geographics, 3:4, 2004. [online]

Han DW, Rogerson PA, Nie J, Bonner MR, Vena JE, Vito D, Muti P, Trevisan M, Edge SB, Freudenheim JL. Geographic clustering of residence in early life and subsequent risk of breast cancer (United States). Cancer Causes and Control, 15:921-929, 2004.

Campo J, Comber H, Gavin AT. All-Ireland Cancer Statistics 1998-2000. Northern Ireland Cancer Registry / National Cancer Registry, 2004. [online]
119. rlton M, Wymer C, Craft AW. A mark 1 analysis machine for the automated analysis of point data sets. International Journal of Geographical Information Systems, 1:335-358, 1987.

165. Ranta J, Pitkäniemi J, Karvonen M, et al. Detection of overall space-time clustering in non-uniformly distributed population. Statistics in Medicine, 15:2561-2572, 1996.

166. Rushton G, Lolonis P. Exploratory Spatial Analysis of Birth Defect Rates in an Urban Population. Statistics in Medicine, 7:717-726, 1996.

167. Stone RA. Investigation of excess environmental risk around putative sources: statistical problems and a proposed test. Statistics in Medicine, 7:649-660, 1988.

168. Tango T. A class of tests for detecting "general" and "focused" clustering of rare diseases. Statistics in Medicine, 14:2323-2334, 1995.

169. Tango T. A test for spatial disease clustering adjusted for multiple testing. Statistics in Medicine, 19:191-204, 2000.

170. Turnbull B, Iwano EJ, Burnett WS, et al. Monitoring for clusters of disease: application to Leukemia incidence in upstate New York. American Journal of Epidemiology, 132:S136-S143, 1990.

171. Waller LA, Turnbull BW, Clark LC, Nasca P. Chronic disease surveillance and testing of clustering of disease and exposure. Environmetrics, 3:281-300, 1992.

172. Walter SD. A simple test for spatial pattern in regional health data. Statistics in Medicine, 13:1037-1044, 1994.

173. Whittemore AS, Friend N
120. running different Monte Carlo simulations using different processors, thereby increasing the speed of the calculations. The default is that SaTScan will use all processors that the computer has. If you want to restrict the number, you can do that by clicking on Session > Execute Options and selecting the maximum number of processors that SaTScan is allowed to use.

Batch Mode

SaTScan is most easily run by clicking the Execute button at the top of the SaTScan window, after filling out the various parameter fields in the Windows interface.

An alternative approach is to skip the Windows interface and launch the SaTScan calculation engine directly by either:

1. Dragging a parameter file onto the "SaTScanBatch.exe" executable.

2. Writing "SaTScanBatch.exe *.prm" in a batch file or at the command prompt, where *.prm is the name of the parameter file.

Using the batch mode version, it is possible to write special software that incorporates the SaTScan calculation engine with other applications, such as an automated daily surveillance system for the early detection of disease outbreaks. Using SaTScan in this manner requires a reasonable amount of computer skills and sophistication.

When running SaTScan in batch mode, the parameter file may still be changed using the SaTScan Windows interface. It is also possible to change the parameters manually using any text editor or automatically by using some other software product.

When the batch mode 
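For instance, a surveillance script could launch the batch executable with a parameter file as its single argument, as in option 2 of the Batch Mode list. The Python sketch below shows one way to do this. The helper names and the paths passed to them are placeholders invented for this example, not part of SaTScan.

```python
import subprocess

def build_batch_command(executable, parameter_file):
    """Return the argument list for 'SaTScanBatch.exe <file>.prm'."""
    return [str(executable), str(parameter_file)]

def run_batch(executable, parameter_file):
    """Run the calculation engine once and return its exit code."""
    # check=False: the caller decides how to treat a non-zero exit code.
    completed = subprocess.run(
        build_batch_command(executable, parameter_file), check=False
    )
    return completed.returncode
```

A call such as `run_batch("SaTScanBatch.exe", "daily.prm")` would then be scheduled by whatever tool drives the surveillance system. Passing the arguments as a list avoids quoting problems when paths contain spaces.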
121. s. Environmental and Ecological Statistics, 12:301-319, 2005.

Tango T, Takahashi K. A flexibly shaped spatial scan statistic for detecting clusters. International Journal of Health Geographics, 4:11, 2005. [online]

Kulldorff M, Song C, Gregorio D, Samociuk H, DeChello L. Cancer map patterns: Are they random or not? American Journal of Preventive Medicine, 30:S37-49, 2006. [online]

Duczmal L, Kulldorff M, Huang L. Evaluation of spatial scan statistics for irregular shaped clusters. Journal of Computational and Graphical Statistics, 15:428-442, 2006.

Aamodt G, Samuelsen SO, Skrondal A. A simulation study of three methods for detecting disease clusters. International Journal of Health Geographics, 5:15, 2006. [online]

Related Topics: SaTScan Bibliography, Selected Applications by Field of Study, Suggested Citation.

Selected SaTScan Applications by Field of Study

Infectious Diseases

33. Cousens S, Smith PG, Ward H, Everington D, Knight RSG, Zeidler M, Stewart G, Smith-Bathgate EAB, Macleod MA, Mackenzie J, Will RG. Geographical distribution of variant Creutzfeldt-Jakob disease in Great Britain, 1994-2000. The Lancet, 357:1002-1007, 2001.

34. Fevre EM, Coleman PG, Odiit M, Magona JW, Welburn SC, Woolhouse MEJ. The origins of a new Trypanosoma brucei rhodesiense sleeping sickness outbreak in eastern Uganda. The Lancet, 358:625-62
122. s in the ordinal model may be a sample from a larger population or they may constitute a complete set of observations. Ordinal data can be analyzed with the purely temporal, the purely spatial or the space-time scan statistics.

Example: For the ordinal model, the data may consist of everyone diagnosed with breast cancer during a ten-year period, with three different categories representing early, medium and late stage cancer at the time of diagnosis.

The ordinal model requires information about the location of each case in each category. Separate locations may be specified for each case, or the data may be aggregated for states, provinces, counties, parishes, census tracts, postal code areas, school districts, households, etc., with multiple cases in the same or different categories at each data location. To do a temporal or space-time analysis, it is necessary to have a time for each case as well.

With the ordinal model it is possible to search for high clusters, with an excess of cases in the high-valued categories, for low clusters, with an excess of cases in the low-valued categories, or simultaneously for both types of clusters. Reversing the order of the categories has the same effect as changing the analysis from high to low and vice versa.

Related Topics: Likelihood Ratio Test, Analysis Tab, Bernoulli Model, Probability Model Comparison, Methodological Papers.

Exponential Model

The exponential model is designed for survival time data, a
123. s in the specified units.

Example: If interval units are years and the length is two, then the time intervals will be two years long.

Note: If the length of the whole study period is not a whole multiple of the time interval length, the earliest time interval will be the remainder after the other intervals have received their proper length. Hence, the first time interval may be shorter than the specified length.

Important: For prospective space-time analyses, the time interval must be equal to the length between the time-periodic analyses performed. So, if the time-periodic analyses are performed every week, then the time interval should be set to 7 days.

Related Topics: Analysis Tab, Time Precision, Study Period, Computational Speed.

Monte Carlo Replications

For hypothesis testing, the SaTScan program generates a number of random replications of the data set under the null hypothesis. The test statistic is then calculated for each random replication as well as for the real data set, and if the latter is among the 5 percent highest, then the test is significant at the 0.05 level. This is called Monte Carlo hypothesis testing, and was first proposed by Dwass. Irrespective of the number of Monte Carlo replications chosen, the hypothesis test is unbiased, resulting in a correct significance level that is neither conservative nor liberal nor an estimate. The number of replications does affect the power of the test, with more replications giving slightly higher p
124. s several years, but not if you only have one year's worth of data.

Two cruder approaches to dealing with missing data in the space-time permutation model are to remove all data for a particular location if some data are missing for that location, or to remove all data for a particular time period for dates on which there are missing data in any location. The latter is especially useful in prospective surveillance for missing data during the beginning of the study period, to avoid removing recent data that are the most important for the early detection of disease outbreaks.

Note: When there are location/time combinations with missing data, either remove the whole row from the case file or assign zero cases to that location/time combination. If you only remove the number of cases, but retain the location ID and time information, there will be a file reading error.

Warning: The adjustment for missing data only works if the locations and times for which the data are missing are independent of the number of cases in that location and time. For example, if data are missing for all locations with fewer than five observed cases, the adjustment procedures described above will not work properly.

Related Topics: Adjustments File, Adjusting for Known Relative Risks, Bernoulli Model, Ordinal Model, Poisson Model, Space-Time Permutation Model, Spatial and Temporal Adjustments Tab, Time Aggregation.

Multivariate Scan with Multiple Data Sets
125. set to be identical to the coordinates of the location IDs defined in the coordinates file. The latter option ensures that each data location is a potential cluster in itself, and it is the recommended option for most types of analyses.

As an alternative to the circle, it is also possible to use an elliptic window shape, in which case a set of ellipses with different shapes and angles are used as the scanning window together with the circle. This provides slightly higher power for true clusters that are long and narrow in shape, and slightly lower power for circular and other very compact clusters.

Related Topics: Analysis Tab, Coordinates File, Elliptic Scanning Window, Grid File, Maximum Spatial Cluster Size, Spatial Window Tab.

Space-Time Scan Statistic

The space-time scan statistic is defined by a cylindrical window with a circular (or elliptic) geographic base and with height corresponding to time. The base is defined exactly as for the purely spatial scan statistic, while the height reflects the time period of potential clusters. The cylindrical window is then moved in space and time, so that for each possible geographical location and size, it also visits each possible time period. In effect, we obtain an infinite number of overlapping cylinders of different size and shape, jointly covering the entire study region, where each cylinder reflects a possible cluster.

The space-time scan statistic may be used for eithe
126. sis Tab, Output Tab, Advanced Features.

Input Tab

[Figure: Input Tab Dialog Box, with fields for Case File, Control File (Bernoulli Model), Study Period (Start Date and End Date, each as Year/Month/Day), Population File (Poisson Model), Coordinates File and Grid File (optional), a Coordinates choice between Cartesian and Latitude/Longitude, and the Advanced button.]

The Input Tab is used to specify the names of the input data files as well as the nature of the data in these files. If the files are in SaTScan ASCII file format, they may be specified either by writing the name in the text box or by using the browse button. If they are not in SaTScan ASCII file format, they must be specified using the SaTScan import wizard, by clicking on the File Import button. Both the SaTScan ASCII file format and the SaTScan import wizard are described in the Input Data section.

Related Topics: Basic SaTScan Features, Input Data, Multiple Data Sets Tab.

Case File Name

Specify the name of the input file with case data. This file is required for all analyses, irrespective of the probability model used.

Related Topics: Input Tab, Case File.

Control File Name

Specify the name of the input file with control data. This file is only used for analyses with the Bernoulli probability model.

Related Topics: Input Tab, Control File.

Time Precision

Indicate
127. so of interest to evaluate secondary clusters after adjusting for other clusters in the data.

As an advanced option, SaTScan is able to adjust the inference of secondary clusters for more likely clusters in the data. This is done in an iterative manner. In the first iteration, SaTScan runs the standard analysis but only reports the most likely cluster. That cluster is then removed from the data set, including all cases and controls (Bernoulli model) in the cluster, while the population (Poisson model) is set to zero for the locations and the time period defining the cluster. In a second iteration, a completely new analysis is conducted using the remaining data. This procedure is then repeated until there are no more clusters with a p-value less than a user-specified maximum or until a user-specified maximum number of iterations has been completed, whichever comes first.

For purely spatial analyses it has been shown that the resulting p-values for secondary clusters are quite accurate and at most marginally biased.

Related Topics: Clusters Reported Tab, Criteria for Reporting Secondary Clusters, Iterative Scan, Likelihood Ratio Test, Secondary Clusters, Standard Results File.

Covariate Adjustments

A covariate should be adjusted for when all three of the following are true:

• The covariate is related to the disease in question.

• The covariate is not randomly distributed geographically.

• You want to find clusters tha
128. stry
Aggregation: 191 Postal Codes (most with only a single individual)
Precision of case and control times: None
Coordinates: Cartesian
Covariates: None
Data source: Drs. Ray Cartwright and Freda Alexander. Published by J. Cuzick and R. Edwards, Journal of the Royal Statistical Society, B, 52:73-104, 1990.

Space-Time Permutation Model: Hospital Emergency Room Admissions Due to Fever at New York City Hospitals
Case file: NYCfever.cas
Format: <zip> <#cases=1> <date>
Coordinates file: NYCfever.geo
Format: <zip> <latitude> <longitude>
Study period: Nov 1, 2001 - Nov 24, 2001
Aggregation: Zip code areas
Precision of case times: Days
Coordinates: Latitude/Longitude
Covariates: None
Data source: New York City Department of Health

Ordinal Model, Purely Spatial: Education Attainment Levels in Maryland
Case file: MarylandEducation.cas
Format: <county> <#individuals> <category #>
Coordinates file: MarylandEducation.geo
Format: <county> <latitude> <longitude>
Study period: 2000
Aggregation: 24 Counties and County Equivalents
Precision of case times: None
Coordinates: Latitude/Longitude
Covariates: None
Categories: 1 = Less than 9th grade; 2 = 9th to 12th grade, but no high school diploma; 3 = High school diploma, but no bachelor degree; 4 = Bachelor or higher degree
Data source: United States Census Bureau. Information about education comes from the long Cen
129. sus 2000 form, filled in by about 1/6 of households.

Note: Only people age 25 and above are included in the data. For each county, the census provides information about the percent of people with different levels of formal education. The number of individuals reporting different education levels in each county was estimated as this percentage times the total population age 25+, divided by six to reflect the 1/6 sampling fraction for the long census form.

Exponential Model, Space-Time: Artificially Created Survival Data
Case file: SurvivalFake.cas
Format: <location id> <#individuals> <time of diagnosis> <survival time> <censored>
Coordinates file: SurvivalFake.geo
Format: <location id> <x-coordinate> <y-coordinate>
Study period: 2000-2005
Aggregation: 5 Locations
Precision of times of diagnosis: Year
Precision of survival/censoring times: Day
Coordinates: Cartesian
Covariates: None
Data source: Artificially created data.

Normal Model, Purely Spatial: Artificially Created Continuous Data
Case file: NormalFake.cas
Format: <location id> <#individuals> <weight increase>
Coordinates file: NormalFake.geo
Format: <location id> <x-coordinate> <y-coordinate>
Study period: 2006
Aggregation: 26 Locations
Coordinates: Cartesian
Covariates: None
Data source: Artificially created data.

Related Topics: Test Run, Input Data.

SaTScan User Gui
130. t cannot be explained by that covariate.

Here are three examples:

• If you are studying cancer mortality in the United States, you should adjust for age since (i) older people are more likely to die from cancer, (ii) some areas such as Florida have a higher percentage of older people, and (iii) you are presumably interested in finding areas where the risk of cancer is high as opposed to areas with an older population.

• If you are interested in the geographical distribution of birth defects, you can but do not need to adjust for gender. While birth defects are not equally likely in boys and girls, the distribution of the two genders is geographically random at time of birth.

• If you are studying the geography of lung cancer incidence, you should adjust for smoking if you are interested in finding clusters due to non-smoking related risk factors, but you should not adjust for smoking if you are interested in finding clusters reflecting areas with especially urgent needs to launch an anti-smoking campaign.

When the disease rate varies, for example, with age, and the age distribution varies in different areas, then there is geographical clustering of the disease simply due to the age covariate. When adjusting for categorical covariates, the SaTScan program will search for clusters above and beyond that which is expected due to these covariates. When more than one covariate is specified, each one is adjusted for as well as all the interaction terms
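To make the idea of "expected due to these covariates" concrete, here is a small Python sketch of indirect standardization, a common way to compute covariate-adjusted expected counts. It is an assumption of this example that the expected counts are formed this way. The function name and the toy numbers are invented for illustration.

```python
def covariate_adjusted_expected(cases, population):
    """Expected cases per location under the null hypothesis.

    cases and population map (location, stratum) -> count. Under the
    null, risk depends only on the covariate stratum, not on location.
    """
    stratum_cases, stratum_pop = {}, {}
    for (loc, stratum), pop in population.items():
        stratum_pop[stratum] = stratum_pop.get(stratum, 0) + pop
        stratum_cases[stratum] = (stratum_cases.get(stratum, 0)
                                  + cases.get((loc, stratum), 0))
    expected = {}
    for (loc, stratum), pop in population.items():
        # Overall rate in this stratum, applied to the local population.
        rate = stratum_cases[stratum] / stratum_pop[stratum]
        expected[loc] = expected.get(loc, 0.0) + pop * rate
    return expected

# Toy data: area B has more old people, and old people have a higher rate.
population = {("A", "young"): 900, ("A", "old"): 100,
              ("B", "young"): 500, ("B", "old"): 500}
cases = {("A", "young"): 9, ("A", "old"): 10,
         ("B", "young"): 5, ("B", "old"): 50}
expected = covariate_adjusted_expected(cases, population)
```

In this toy data the crude rates differ (19 versus 55 cases per 1000 people), yet the age-adjusted expected counts equal the observed counts in both areas, so nothing remains to be explained once age is accounted for. Using a tuple such as (age group, sex) as the stratum key treats every combination separately, which corresponds to including all interaction terms.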
131. t different times and they are described in different scientific publications. The following bibliography contains selected papers and reports intended to help you:

1. Find the methodological paper(s) in which the various analysis options are presented and discussed in more detail than what is available here in the SaTScan User Guide.

2. Find applications in different scientific areas.

3. Determine the relevant scientific papers to cite.

Suggested Citations

The SaTScan software may be used freely, with the requirement that proper references are provided to the scientific papers describing the statistical methods. For the most common analyses, the suggested citations are:

Bernoulli and Poisson Models: Kulldorff M. A spatial scan statistic. Communications in Statistics: Theory and Methods, 26:1481-1496, 1997. [online]

Space-Time Permutation Model: Kulldorff M, Heffernan R, Hartman J, Assunção RM, Mostashari F. A space-time permutation scan statistic for the early detection of disease outbreaks. PLoS Medicine, 2:216-224, 2005. [online]

Ordinal Model: Jung I, Kulldorff M, Klassen A. A spatial scan statistic for ordinal data. Manuscript, 2005. [online]

Exponential Model: Huang L, Kulldorff M, Gregorio D. A spatial scan statistic for survival data. Manuscript, 2005. [online]

Normal Model: Manuscript in preparation. Until available, please cite this User Guide.

Software: Kulldorff M. and Information M
132. t exceptional since the City has about 3 percent of the U.S. population. If we accept that there is a cluster in Seattle though, and if we adjust for that by removing Seattle from the analysis, then 30 cases in the City out of 30 nationwide is statistically significant. This is similar to a regular multiple regression, where if we adjust for one variable, another variable may suddenly become statistically significant. Note that the opposite is also true. If we remove an area with significantly fewer cases than expected, then a significant cluster with an excess number of cases may become non-significant.

20. For count data, the spatial scan statistic uses a particular alternative hypothesis with an excess risk in a circular cluster, where the number of cases follows a Poisson or Bernoulli distribution. Does this mean that it can only be used to detect such alternative hypotheses?

Many proposed and widely used test statistics do not specify an alternative hypothesis at all. This neither means that they cannot be used for any alternative hypotheses nor that they are good for all alternatives. Likewise, if an explicit alternative is defined, as with the spatial scan statistic, that does not mean that it cannot be used for other alternative hypotheses as well. It is simply a question of the test statistic having good power for some alternative hypotheses and low power for others. The advantage of having a well specified alternativ
133. tant JG, Lammie PJ, Hightower AW. Spatial clustering of filarial transmission before and after a Mass Drug Administration in a setting of low infection prevalence. Filaria Journal, 3:3, 2004. [online]

103. Odoi A, Martin SW, Michel P, Middleton D, Holt J, Wilson J. Investigation of clusters of giardiasis using GIS and a spatial scan statistic. International Journal of Health Geographics, 3:11, 2004. [online]

104. Reperant LA, Deplazes P. Cluster of Capillaria hepatica infections in non-commensal rodents from the canton of Geneva, Switzerland. Parasitology Research, 96:340-342, 2005.

Alcohol and Drugs

105. Hanson CE, Wieczorek WF. Alcohol mortality: a comparison of spatial clustering methods. Social Science and Medicine, 55:791-802, 2002.

Accidents

106. Nkhoma ET, Hsu CE, Hunt VI, Harris AM. Detecting spatiotemporal clusters of accidental poisoning mortality among Texas counties, U.S., 1980-2001. International Journal of Health Geographics, 3:25, 2004. [online]

Syndromic Surveillance

107. Heffernan R, Mostashari F, Das D, Karpati A, Kulldorff M, Weiss D. Syndromic surveillance in public health practice: The New York City emergency department system. Emerging Infectious Diseases, 10:858-864, 2004. [online]

108. Minnesota Department of Health. Syndromic Surveillance: A New Tool to Detect Disease Outbreaks. Disease Control Newsletter, 32:16-17, 2004. [online]

109. Kleinman K, Abrams A, Kulldorff M, Platt R. A model adjusted space tim
134. tead provided in the Cluster Cases Information File.

The exact columns shown depend on the chosen analysis, as shown in Table 3. The file will have the same name as the standard results file, but with the extensions .col.txt and .col.dbf respectively, and will be located in the same directory.

[Table 3: columns of the Cluster Information File by type of analysis. The column numbers in the table cells were lost in extraction; only the row and column labels below are recoverable.]

Analysis types shown in the table:
• Bernoulli, Circular, Cartesian, One Data Set
• Bernoulli, Circular, Lat/Long, Multiple Data Sets
• Poisson/Bernoulli, Circular, Cartesian 5 Dimensions
• Space-Time Permutation, Circular, Lat/Long
• Exponential, Circular, Lat/Long, One Data Set
• Poisson/Bernoulli, Elliptic, Cartesian, One Data Set
• Ordinal, Circular, Lat/Long, One Data Set
• Normal, Circular, Lat/Long, One Data Set

Output variables listed in the table include: Latitude (LATITUDE), X coordinate, Y coordinate, Circle Radius, Ellipse Length of Minor Axis, Ellipse Length of Major Axis, Ellipse Angle, Cluster Start Date (STARTDATE), Location IDs, Log Likelihood Ratio, Test Statistic (TEST_STAT), P Value of Cluster
135. ted. Under the null hypothesis, and when there are no covariates, the expected number of cases in each area is proportional to its population size, or to the person-years in that area. Poisson data can be analyzed with the purely temporal, the purely spatial or the space-time scan statistic.

Example: For the Poisson model, cases may be stroke occurrences while the population is the combined number of person-years lived, calculated as 1 for someone living in the area for the whole time period and 1/2 for someone dying or moving away in the middle of the time period.

The Poisson model requires case and population counts for a set of data locations such as counties, parishes, census tracts or zip code areas, as well as the geographical coordinates for each of those locations.

The population data need not be specified continuously over time, but only at one or more specific "census times". For times in between, SaTScan does a linear interpolation based on the population at the census times immediately preceding and immediately following. For times before the first census time, the population size is set equal to the population size at that first census time, and for times after the last census time, the population is set equal to the population size at that last census time. To get the population size for a given location and time period, the population size, as defined above, is integrated over the time period in question.

Related Topics: Likelihood Rati
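The census-time rule described above for the Poisson model (linear interpolation between census times, a flat population before the first and after the last census time, then integration over the period) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not SaTScan's actual implementation: times are treated as decimal years, and the function names `population_at` and `person_time` as well as the sample figures are hypothetical.

```python
from bisect import bisect_right

def population_at(t, census):
    """Population at time t, given a sorted list of (census_time, population).

    Assumption: linear interpolation between neighbouring census times and a
    constant population before the first and after the last census time.
    """
    times = [time for time, _ in census]
    if t <= times[0]:
        return census[0][1]
    if t >= times[-1]:
        return census[-1][1]
    i = bisect_right(times, t)          # index of first census time after t
    (t0, p0), (t1, p1) = census[i - 1], census[i]
    return p0 + (p1 - p0) * (t - t0) / (t1 - t0)

def person_time(start, end, census):
    """Integrate the piecewise-linear population over [start, end].

    The trapezoid rule is exact here because the curve is linear between
    consecutive breakpoints (the census times that fall inside the period).
    """
    points = [start] + [t for t, _ in census if start < t < end] + [end]
    total = 0.0
    for a, b in zip(points, points[1:]):
        total += 0.5 * (population_at(a, census) + population_at(b, census)) * (b - a)
    return total

# Hypothetical census data for one location: (year, population)
census = [(1973, 1000), (1982, 1900), (1991, 1900)]
print(population_at(1977.5, census))
print(person_time(1973, 1982, census))
```

With only one census time the population is constant, so `person_time` reduces to population multiplied by the length of the period.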
136. ted with biased p-values that are too small, providing "statistically significant" results when none exist. Here, the null hypothesis should be that there is spatial auto-correlation and the alternative hypothesis that there are geographical differences in the risk of food poisoning.

On the other hand, if we are interested in quickly detecting food poisoning outbreaks, we should not adjust for the spatial auto-correlation since we are interested in detecting clusters due to such correlation, and if they are adjusted away, important clusters may go undetected. Here, the null hypothesis is that the food poisoning cases are geographically randomly distributed, adjusted for population density etc., and the alternative hypothesis is that there is some clustering either due to differences in underlying risk factors or spatial auto-correlation. Once the location of a cluster has been detected, it is for the local health officials to determine the source of the cluster to prevent further illness.

19. If there are multiple clusters in the data, does that mean that the p-values are more likely to be significant than their 0.05 nominal significance level suggests, so that chance clusters are detected too often?

No. The opposite is actually true. Looking at United States mortality, suppose we have 1000 cases of a disease in Seattle and 30 in New York City. Seattle is clearly a significant cluster, but 30 cases in New York City out of 1030 in all of the USA is no
137. ter together with the software itself. These and other sample data sets are also available at http://www.satscan.org/datasets/

Poisson Model, Space-Time: Brain Cancer Incidence in New Mexico

Case file: nm.cas
Format: <county> <cases=1> <year> <age group> <sex>
Population file: nm.pop
Format: <county> <year> <population> <age group> <sex>
Coordinates file: nm.geo
Format: <county> <x-coordinate> <y-coordinate>
Study period: 1973-1991
Aggregation: 32 counties
Precision of case times: Years
Coordinates: Cartesian
Covariate #1, age groups: 1 = 0-4 years, 2 = 5-9 years, ..., 18 = 85+ years
Covariate #2, gender: 1 = male, 2 = female
Population years: 1973, 1982, 1991
Data source: New Mexico SEER Tumor Registry

This is a condensed version of a more complete data set with the population given for each year from 1973 to 1991, and with ethnicity as a third covariate. The complete data set can be found at: http://www.satscan.org/datasets/

Bernoulli Model, Purely Spatial: Childhood Leukemia and Lymphoma Incidence in North Humberside

Case file: NHumberside.cas
Format: <location id> <# cases>
Control file: Nhumberside.ctl
Format: <location id> <# controls>
Coordinates file: Nhumberside.geo
Format: <location id> <x-coordinate> <y-coordinate>
Study period: 1974-1986
Controls: Randomly selected from the birth regi
138. tering is based on a compactness criterion in the sense that the cases in the cluster should be close to each other, so that we may be more interested in more compact clusters. When the non-compactness penalty is used, the pure likelihood ratio is no longer used as the test statistic. Rather, the test statistic is defined as the log likelihood ratio multiplied with a non-compactness penalty of the form

$$\left[\frac{4s}{(s+1)^{2}}\right]^{a}$$

where s is the elliptic window shape, defined as the ratio of the length of the longest to the shortest axis of the ellipse. For the circle, s = 1. The parameter a is a penalty tuning parameter. With a = 0, the penalty function is always 1 irrespective of s, so that there is never a penalty. When a goes to infinity, the penalty function goes to 0 for all s > 1, so that only circular clusters are allowed. Other than this, there is no clear intuitive meaning of the penalty tuning parameter a. In SaTScan, it is possible to use either a strong penalty (a = 1) or a medium size penalty (a = 1/2).

Related Topics: Batch Mode, Bernoulli Model, Covariate Adjustments, Elliptic Scanning Window, Exponential Model, Monte Carlo Replications, Ordinal Model, Poisson Model, Secondary Clusters, Space-Time Permutation Model, Standard Results File.

Secondary Clusters

For purely spatial and space-time analyses, SaTScan also identifies secondary clusters in the data set in addition to the most likely cluster, and orders them according 
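To make the behaviour of the non-compactness penalty discussed above concrete, here is a minimal Python sketch. It assumes the penalty has the form [4s/(s+1)^2]^a, which matches the limiting cases stated in the text (always 1 when a = 0, equal to 1 for a circle with s = 1, and tending to 0 for s > 1 as a grows). The function names are hypothetical and are not part of SaTScan.

```python
def noncompactness_penalty(s, a):
    """Penalty [4s / (s + 1)^2]^a for an ellipse of shape s and tuning parameter a.

    s is the ratio of the longest to the shortest axis (s >= 1; s = 1 is a circle).
    a = 0 means no penalty; a = 1 and a = 0.5 are the strong and medium penalties.
    """
    if s < 1:
        raise ValueError("shape s must be >= 1")
    if a < 0:
        raise ValueError("tuning parameter a must be >= 0")
    return (4.0 * s / (s + 1.0) ** 2) ** a

def penalized_test_statistic(log_likelihood_ratio, s, a):
    """Log likelihood ratio multiplied by the non-compactness penalty."""
    return log_likelihood_ratio * noncompactness_penalty(s, a)

# A circle is never penalized; elongated ellipses are penalized more strongly.
for shape in (1, 2, 4):
    print(shape, noncompactness_penalty(shape, 1.0), noncompactness_penalty(shape, 0.5))
```

Under this form, an ellipse with s = 4 keeps 64 percent of its log likelihood ratio under the strong penalty and 80 percent under the medium penalty.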
139. tion model automatically adjusts for purely spatial and purely temporal variation, there is no need to adjust for covariates in order to account for different spatial or temporal densities of these covariates. For example, there is no need to adjust for age simply because some places have a higher proportion of old people than others. Rather, covariate adjustment is used if there is space-time interaction due to this covariate rather than to the underlying disease process. For example, if children get sick mostly in the summer and adults mostly in the winter, then there will be age-generated space-time interaction clusters in areas with many children in the summer and vice versa. When including child/adult as a covariate, these clusters are adjusted away.

Note: Too many covariate categories can create problems. For the space-time permutation model, the adjustment is made at the randomization stage, so that each covariate category is randomized independently. If there are too many covariate categories, so that all or most cases in a category belong to the same spatial location or the same aggregated time interval, then there is very little to randomize, and the test becomes meaningless.

Related Topics: Covariate Adjustments, Covariate Adjustment using Statistical Regression Software, Covariate Adjustment Using Multiple Data Sets, Methodological Papers, Poisson Model, Space-Time Permutation Model, Case File, Population File.

Covariate Adjustment Using Stat
140. to their likelihood ratio test statistic. There will almost always be a secondary cluster that is almost identical with the most likely cluster and that has almost as high a likelihood value, since expanding or reducing the cluster size only marginally will not change the likelihood very much. Most clusters of this type provide little additional information, but their existence means that while it is possible to pinpoint the general location of a cluster, its exact boundaries must remain uncertain.

There may also be secondary clusters that do not overlap with the most likely cluster, and they may be of great interest. The user must decide to what extent overlapping clusters are reported in the results files. The default is that geographically overlapping clusters are not reported.

For purely temporal analyses, only the most likely cluster is reported.

Related Topics: Adjusting for More Likely Clusters, Likelihood Ratio Test, Clusters Reported Tab, Criteria for Reporting Secondary Clusters, Standard Results File.

Adjusting for More Likely Clusters

When there are multiple clusters in the data set, the secondary clusters are evaluated as if there were no other clusters in the data set. That is, they are statistically significant if and only if they are able to cause a rejection of the null hypothesis on their own strength, whether or not the other clusters are true clusters. That is often the desired type of inference. Sometimes, though, it is al
141. ts away all such clusters, to see if there are any space-time clusters not explained by purely spatial clusters. This is done in a non-parametric fashion, through stratified randomization by location, so that the total number of cases in each specific location is the same in the real and random data sets. That is, only the time of a case is randomized.

The default is no spatial adjustment.

Note: It is not possible to simultaneously adjust for spatial clusters and purely temporal clusters using stratified randomization. If both types of adjustments are desired, the space-time permutation model should be used instead. It is possible to adjust for purely spatial clusters with stratified randomization together with a temporal adjustment using a log-linear trend.

Related Topics: Spatial and Temporal Adjustment Tab, Spatial and Temporal Adjustments, Temporal Trend Adjustment, Adjustment with Known Relative Risk, Poisson Model.

Adjustment with Known Relative Risks

The most flexible way to adjust a Poisson model analysis is to use the special adjustments file. In this file, a relative risk is specified for any location and time period combination, and SaTScan will adjust the expected counts up or down based on this relative risk. One use of this option is to adjust for missing data, by specifying a zero relative risk for those location and time combinations for which data is missing.

The required format of the Adjustments File is described in the section 
142. tures, Analysis Tab, Spatial and Temporal Adjustments, Temporal Trend Adjustment, Spatial Adjustment, Adjustment with Known Relative Risk, Poisson Model.

Temporal Trend Adjustment

Temporal trends can be adjusted for in three different ways:

Non-parametric: When the adjustment is non-parametric, SaTScan adjusts for any type of purely temporal variation. This is done by stratifying the randomization by the aggregated time intervals, so that each time interval has the same number of cases in the real and random data sets. That is, it is only the spatial location of a case that is randomized.

Log-linear trend, specified by user: Specify an annual percent increase or decrease in the risk. A decreasing trend is specified with a negative number. For example, if the rate decreases by 1.4 percent per year, then write "-1.4" in the "% per year" box.

Log-linear trend, automatically calculated: Rather than the user specifying the adjusted relative risk, SaTScan can calculate the observed trend in the data and then adjust for exactly that amount of increase or decrease.

The default is no temporal trend adjustment.

Related Topics: Spatial and Temporal Adjustment Tab, Spatial and Temporal Adjustments, Spatial Adjustment, Adjustment with Known Relative Risk, Poisson Model.

Spatial Adjustment

When a purely spatial analysis is performed, the purpose is to find purely spatial clusters. For space-time analyses, this feature adjus
143. vations may be either positive or negative.

Related Topics: Analysis Tab, Bernoulli Model, Exponential Model, Methodological Papers, Ordinal Model, Poisson Model, Probability Model Comparison, Space-Time Permutation Model.

Scan for High or Low Rates

It is possible to scan for areas with high rates only (clusters), for areas with low rates only, or simultaneously for areas with either high or low rates. The most common analysis is to scan for areas with high rates only, that is, for clusters. For the exponential model, high corresponds to short survival. For the ordinal and normal models, high corresponds to large value categories/observations.

Related Topics: Analysis Tab, Likelihood Ratio Test, Methodological Papers.

Time Aggregation

Space-time analyses are sometimes very computer intensive. To reduce the computing time, case times may be aggregated into time intervals. Another reason for doing so is to adjust for cyclic temporal trends. For example, when using intervals of one year, the analysis will automatically be adjusted for seasonal variability in the counts, and when using time intervals of 7 days, it will automatically adjust for weekday effects.

Units: The units in which the length of the time intervals are specified. This can be in years, months or days. The units of the time intervals cannot be more precise than the time precision specified on the input tab.

Length: The length of the time interval
144. ve memory allocation scheme to reduce the total memory requirement. This selection is done automatically. The amount of memory needed for large data sets is then approximately:

4 x L x TI x CAT x EXP x D x MC + 8 x MC x C x R x P bytes

where MC is the number of Monte Carlo simulations and the other variables are defined as above.

Insufficient Memory

If there is insufficient memory available on the computer to run the analysis using either memory allocation scheme, there are several options available for working around the limitation:

• Close other applications.
• Aggregate the data into fewer data locations (reduce L).
• Decrease the number of circle centroids in the special grid file (reduce G).
• Reduce the upper limit on the circle size (reduce mg).
• Run the program on a computer with more memory.

It is highly desirable that there is sufficient RAM to cover all the memory needs, as SaTScan runs considerably slower when the swap file is used, so these techniques may also be used to avoid the swap file. Not all of the above options will work for all data sets. Please note that the following SaTScan options do not influence the demand on memory:

• The length of the study period
• The maximum temporal cluster size
• Type of space-time clusters to include in the analysis

Note: The 32-bit Windows operating system can allocate a maximum of 2 GBytes of memory to a single application, and that is hence the upper limit on the 
145. version of SaTScan is run, the standard results file does not automatically pop up on the screen, but must be opened manually using any available text editor such as Notepad.

Opportunity: There are some parameter options that are not allowed when SaTScan is run under the Windows interface but which can be set when run in batch mode. A few such examples are the number of Monte Carlo replications, the use of ellipses rather than circles, and an unlimited number of multiple data sets. Parameter options not allowed by the Windows interface have not all been thoroughly tested though, so there is some risk involved when running such analyses.

Related Topics: Launching the Analysis, Basic SaTScan Features, Advanced Features, Saving Analysis Parameters.

Computing Time

The spatial and space-time scan statistics are computer intensive to calculate. The computing time depends on a wide variety of variables, and depending on the data set and the analytical options chosen, it could range from a few seconds to several days or weeks. The ordinal model is in general much more computer intensive than the other probability models. Other than that, the three main things that increase the computing time are the number of locations in the coordinates and special grid files, the number of time intervals (for space-time analyses) and the number of data sets used.

Single Data Set

For a single data set, the computing time is approximately on the 
146. with one row for each location-covariate combination and with columns as defined below. Such files can be created using any text editor and most spreadsheets. The order of the columns in the file is very important, but the rows can be in any order. The optional variables, defined above, are optional columns in the SaTScan file format.

Case File Format (.cas):

<location id> <#cases> <time> <attribute 1> ... <attribute N>

The number of attributes and their meaning depends on the probability model, as shown in Table 1.

Probability Model         attribute 1           attribute 2    attribute N
Poisson                   covariate 1           covariate 2    covariate N
Bernoulli                 not used              not used       not used
Space-Time Permutation    covariate 1           covariate 2    covariate N
Ordinal                   category              not used       not used
Exponential               survival time         censored       not used
Normal                    continuous variable   not used       not used

Table 1: Attributes used for different probability models.

Control File Format (.ctl):

<location ID> <#controls> <time>

Population File Format (.pop):

<location ID> <time> <population> <covariate 1> ... <covariate N>

Coordinates File Formats (.geo):

<location ID> <latitude> <longitude>
OR
<location ID> <x-coordinate> <y-coordinate> <z1-coordinate> ... <zN-coordinate>

Special Grid File Formats (.grd):

<latitude> <longitude>
OR
<x-coordinate> <y-coordinate> <z1-coordinate> ... <zN
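As an illustration of the case file layout described above, the following Python sketch splits one whitespace-delimited row into its leading columns and its trailing attributes. It assumes that the location id, case count and time columns are all present in that order; files that omit the optional time column would need different handling. The class name `CaseRecord`, the function `parse_case_line` and the sample row are hypothetical and are not part of SaTScan.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class CaseRecord:
    location_id: str
    cases: int
    time: str              # kept as text; its precision depends on the data set
    attributes: List[str]  # covariates, category, survival time, etc., by model

def parse_case_line(line: str) -> CaseRecord:
    """Parse one row of a case file with location id, count, time and attributes."""
    fields = line.split()
    if len(fields) < 3:
        raise ValueError("expected at least location id, case count and time")
    return CaseRecord(
        location_id=fields[0],
        cases=int(fields[1]),
        time=fields[2],
        attributes=fields[3:],
    )

# Hypothetical row: county, one case, year, age group, sex
record = parse_case_line("Bernalillo 1 1985 12 2")
print(record)
```

Because the column order matters and the row order does not, a reader of this kind can process rows independently and rely only on field position.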
147. you would like to ignore in the text field in the upper right corner.

If you have a character delimited file, use the scrolling menus to select the field separator to be either a comma, a semicolon or white space.

If you have a fixed column file, define the fields using the Field Information box. For each field, type the name, the start column, and the length (maximum number of characters) into the appropriate spaces. Click on the Add button to add another field. The information will appear in the panel on the right. Continue adding fields until you have the appropriate number. To change the information in the right panel, highlight the line you want to change. The information will appear in the Field Information box. Edit the information and click on the Update button when you are done. The updated information will appear in the right panel.

Click on Next to proceed to the next dialog box.

Step 3: Matching Source File Variables with SaTScan Variables

The top grid in this dialog box links the SaTScan variables with the input file variables from the source file. The bottom grid displays sample data from the chosen input file.

1. If there are headers in your file, click the checkbox in the lower left corner.
2. To match the variables, click on one of the places where it says "unassigned".
3. Select the appropriate variable from the input file to go with the chosen SaTScan variable.
4. When all the required and opt
148. ypothesis, and 0 otherwise. The opposite is true when SaTScan is set to scan only for clusters with low rates. When the program scans for clusters with either high or low rates, then I() = 1 for all windows.

The space-time permutation model uses the same function as the Poisson model. Due to the conditioning on the marginals, the observed number of cases is only approximately Poisson distributed. Hence, it is no longer a formal likelihood ratio test, but it serves the same purpose as the test statistic.

For the Bernoulli model the likelihood function is

$$\left(\frac{c}{n}\right)^{c}\left(\frac{n-c}{n}\right)^{n-c}\left(\frac{C-c}{N-n}\right)^{C-c}\left(\frac{(N-n)-(C-c)}{N-n}\right)^{(N-n)-(C-c)} I()$$

where c and C are defined as above, n is the total number of cases and controls within the window, while N is the combined total number of cases and controls in the data set.

The likelihood functions for the ordinal, exponential and normal models are more complex, due to the more complex nature of the case data. We refer to papers by Jung, Kulldorff and Klassen; Huang, Kulldorff and Gregorio; and Kulldorff et al. for the likelihood functions for these models.

The likelihood function is maximized over all window locations and sizes, and the one with the maximum likelihood constitutes the most likely cluster. This is the cluster that is least likely to have occurred by chance. The likelihood ratio for this window constitutes the maximum likelihood ratio test statistic. Its distribution under the null hypothesis is obta
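The Bernoulli likelihood for a single window can be evaluated numerically on the log scale, as in the sketch below. It assumes the standard four-factor form (observed proportions of cases and controls inside and outside the window), with c the cases in the window, n the cases plus controls in the window, C the total cases and N the total cases plus controls. The indicator is implemented here as a comparison of the case proportion inside and outside the window, which is an assumption about how "more cases than expected" is decided. The function names are hypothetical and this is not SaTScan's code.

```python
import math

def _xlogy(x, y):
    """Return x * log(y), treating the 0 * log(0) case as 0."""
    return 0.0 if x == 0 else x * math.log(y)

def bernoulli_log_likelihood(c, n, C, N, scan="high"):
    """Log of the Bernoulli likelihood for one window.

    Returns -inf when the indicator I() is 0, i.e. when the window does not
    have the kind of rate (high or low) that is being scanned for.
    """
    if not (0 < n < N and 0 <= c <= n and 0 <= C - c <= N - n):
        raise ValueError("inconsistent counts")
    inside = c / n                   # proportion of cases inside the window
    outside = (C - c) / (N - n)      # proportion of cases outside the window
    if scan == "high":
        indicator = inside > outside
    elif scan == "low":
        indicator = inside < outside
    elif scan == "both":
        indicator = True
    else:
        raise ValueError("scan must be 'high', 'low' or 'both'")
    if not indicator:
        return float("-inf")
    controls_in = n - c
    controls_out = (N - n) - (C - c)
    return (_xlogy(c, inside)
            + _xlogy(controls_in, controls_in / n)
            + _xlogy(C - c, outside)
            + _xlogy(controls_out, controls_out / (N - n)))

# Hypothetical window: 5 cases among 10 individuals, 10 cases among 100 overall
print(bernoulli_log_likelihood(5, 10, 10, 100))
```

Working on the log scale avoids numerical underflow for large counts; maximizing this value over all candidate windows gives the same most likely cluster as maximizing the likelihood itself.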
149. yses, and if the max circle size is defined as a percentage of the population, then the special max circle size file must be used.

Related Topics: Inference Tab, Computing Time, Type of Analysis, Spatial, Temporal and Space-Time Scan Statistics.

Iterative Scan

The iterative scan option is used to adjust the p-values of secondary clusters for more likely clusters that are found and reported. This is done by doing the analysis in several iterations, removing the most likely cluster found in each iteration, and then reanalyzing the remaining data. The user must specify the maximum number of iterations allowed, in the range 1-32000. The user may also request that the iterations stop when the cluster has a p-value greater than a specified lower bound.

In terms of computing time, each iteration takes approximately the same amount of time as a regular analysis with the same parameters.

Related Topics: Adjusting for More Likely Clusters, Inference Tab, Computing Time.

Clusters Reported Tab

[Screenshot: Advanced Output Features dialog, Clusters Reported tab, with a Set Defaults button.]

Criteria for Reporting Secondary Clusters:
• No Geographical Overlap
• No Cluster Centers in Other Clusters
• No Cluster Centers in More Likely Clusters
• No Cluster Centers in Less Likely Clusters
• No Pairs of Centers Both in Each Others Clusters
• No Restrictions = Most Likely Cluster for Each Grid Point

Maximum Reported Spatial Cluster Size: Report only clust
    