PermuCLUSTER 1.0 User's Guide

1. References

Anderberg, M. (1973). Cluster analysis for applications. New York: Academic Press.

Backeljau, T., De Bruyn, L., De Wolf, H., Jordaens, K., Van Dongen, S., & Winnepenninckx, B. (1996). Multiple UPGMA and neighbour-joining trees and the performance of some computer packages. Molecular Biology and Evolution, 13, 309-313.

SPSS Inc. (2001). SPSS base 11.0 User's guide. Chicago, Ill.: SPSS Inc.

Van der Kloot, W., Bouwmeester, S., & Heiser, W. (2003). Cluster instability as a result of data input order. In H. Yanai, A. Okada, K. Shigemasu, Y. Kano, & J. Meulman (Eds.), New developments in psychometrics: Proceedings of the international meeting of the psychometric society IMPS2001 (p. 569-576). Tokyo: Springer.
2. LEIDEN PSYCHOLOGICAL REPORTS: PSYCHOMETRICS AND RESEARCH METHODOLOGY
PRM 04-01

PermuCLUSTER 1.0 User's Guide
Alexander Spaans, Willem van der Kloot
DEPARTMENT OF PSYCHOLOGY, UNIVERSITY OF LEIDEN, THE NETHERLANDS

Faculteit Sociale Wetenschappen, Studierichting Psychologie, Universiteit Leiden, Postbus 9555, 2300 RB Leiden, Nederland

Copyright 2004 Leiden University, Leiden, The Netherlands

LICENSE AGREEMENT

This Limited Use Software License Agreement is a legal agreement between you, the end user, and Leiden University for the use of PermuCLUSTER Software. By using this software or storing this program on a computer hard drive (or other media), you are agreeing to be bound by the terms of this Agreement.

License. This license allows you to install and use the Software on a single computer, OR install and store the Software on a storage device, such as a network server, used only to run or install the Software on your other computers over an internal network. You are allowed to make one copy of the Software in machine-readable form solely for backup purposes. You must reproduce on any such copy all copyright notices and any other proprietary legends on the original copy of the Software.

Restrictions. You may not decompile, reverse engineer, disassemble, or otherwise reduce the Software to a human-perceivable form. You may not rent, lease, or sublic
3. References

1 Introduction

Hierarchical cluster analysis, as implemented in most of the well-known statistical computer programs, neglects the phenomenon of input order instability. That is, cluster solutions may differ when the rows and columns of the proximity matrix are permuted. This phenomenon is not widely known. It is caused by ties that are present in the initial (dis)similarity matrix or that arise during the process of clustering (Backeljau et al., 1996; Van der Kloot, Bouwmeester, & Heiser, 2003).

PermuCLUSTER has been developed to tackle this phenomenon. It repeats the analysis a large number of times, each time permuting the rows and columns of the proximity matrix. A goodness-of-fit measure is used to compare the solutions and find the optimal one. The number of times the matrix is permuted is set by the user.

PermuCLUSTER is an SPSS add-in and offers almost the same functionality as CLUSTER in SPSS. The main exception is that PermuCLUSTER cannot be run using the SPSS syntax (command language). After installation, PermuCLUSTER is accessible from the Analyze > Classify Menu in SPSS. Generated output is displayed in the SPSS Output Viewer.

2 Getting Started

2.1 Starting the program

After a typical installation, PermuCLUSTER can be started in two ways: from the Windows Start Menu and from the Analyze > Classify Menu in SPSS. Note that PermuCLUSTER will only start up when an instan
4. general tab contains the mandatory settings and the options tab the more optional ones. After the settings have been specified, the analysis can be started by clicking the OK button.

3.1 General Tab

PermuCLUSTER implements the same clustering methods as SPSS (Anderberg, 1973). These are between-groups linkage (between average), within-groups linkage (within average), nearest neighbor (single linkage), furthest neighbor (complete linkage), centroid clustering, median clustering, and Ward's method.

The Number of Permutations indicates how many sequential runs (repeated analyses) should be performed (see Figure 1). In each run, the rows and columns of the original proximity matrix are permuted randomly. The first permutation can be the identity permutation (see also Section 3.2.1). If the first permutation is chosen to be the identity permutation, the outcome of the first run will be equivalent to the outcome of a CLUSTER analysis in SPSS.

3.1.1 Proximities

The input data can be a raw data set as well as a proximity data set. If a raw data set is specified, PermuCLUSTER converts it to proximities using PROXIMITIES in SPSS. The location of a proximity data set can be specified with the browse button. Such a proximity data set should be in the SPSS SAV format, i.e. created with PROXIMITIES or DISTANCES. If a raw data set is taken as the input data, the Analyze Data section should also be specified, as described in
5. optimal solution.

3.2.4 Save

The option Permutation fit table indicates whether or not to save the permutation fit table to disk. The table is written in the SPSS SAV format and contains the following columns:

   permu      permutation
   sid        solution id
   ssdif      sum of squared differences
   nssdif     normalized sum of squared differences
   cophcorr   cophenetic correlation coefficient
   randseed   random seed

4 Program Output

The output that PermuCLUSTER generates appears in the SPSS Output Viewer. By default there is output for Fit and Solution. The solution-related output may appear multiple times and consists of fit, object order, agglomeration schedule, and dendrogram. Which of these items appear can be set with the statistics and plot options (see Sections 3.2.2 and 3.2.3). Note that, depending on the number of permutations and the settings of the output options, generating output in the SPSS Output Viewer can be time consuming. In addition, the permutation fit table can be written to disk (see Section 3.2.4).

4.1 Permutation Fit

The permutation fit table contains the following columns: Permutation, Solution ID, SSDif, Normalized SSDif, Cophenetic Correlation, and Random Seed (see Table 1).

Permutation Fit
                                                         Normalized   Cophenetic
Solution ID                        SSDif                  SSDif        Correlation   Random Seed
faa07be1095b8577ad157ae400b92ece   4.374875000000000E+03  .062292      .847304
e05f7a032912045d8776233e551cc92d   4.374875000000
6. permutation is identity" in section Permutation Randomization.
      b. Further fine-tune the analysis and outcome by setting options (see Section 3.2 for a description of all available options).
   6. Click the OK button to start the analysis.

In case of analyzing a proximity matrix:

   1. Start SPSS.
   2. Start PermuCLUSTER from the Analyze > Classify menu.
   3. On the General Tab:
      a. Specify the Cluster Method which should be used.
      b. Set the Number of Permutations to 1.
      c. Select "Read and analyze a with SPSS Proximities created matrix".
      d. Specify the location of the proximity matrix with the browse button.
         Note: The proximity matrix should be in the SPSS SAV format, i.e. created with PROXIMITIES or DISTANCES.
   4. On the Options Tab:
      a. Enable option "First permutation is identity" in section Permutation Randomization.
      b. Further fine-tune the analysis and outcome by setting options (see Section 3.2 for a description of all available options).
   5. Click the OK button to start the analysis.

5.4 How do I inspect a solution (permutation) listed in the Permutation Fit table that is not an optimal solution (permutation)?

   1. In the Permutation Fit table in the SPSS Output Viewer, select the Random Seed value for the permutation (solution) you want to inspect and copy it to the clipboard.
   2. Go to the SPSS Data View.
   3. Start PermuCLUSTER from the Analyze > Classify menu.
   4. On the General Tab:
      a. Set th
7. 000E+03                                                 .062292      .847304       185086820
e05f7a032912045d8776233e551cc92d   4.374875000000000E+03  .062292      .847304       905185470
878e5145350665de127259465a22654    5.410000000000000E+03  .077030      .806960       655175664
e05f7a032912045d8776233e551cc92d   4.374875000000000E+03  .062292      .847304       739568306
878e5145350665de127259465a22654    5.410000000000000E+03  .077030      .806960       1588812412
faa07be1095b8577ad157ae400b92ece   4.374875000000000E+03  .062292      .847304       1383575782
878e5145350665de127259465a22654    5.410000000000000E+03  .077030      .806960       350330536
878e5145350665de127259465a22654    5.410000000000000E+03  .077030      .806960       1317678938
878e5145350665de127259465a22654    5.410000000000000E+03  .077030      .806960       1648547692
a. Identity permutation

Table 1: Permutation Fit

The permutation column displays the permutation (or run) number to which the other values in the corresponding table row relate. In Table 1 the first permutation is the identity permutation.

The solution identifier (Solution ID) is a summary of the solution for a given permutation. It is based on the agglomeration schedule of that solution and is independent of the cluster method. Solutions with the same solution id have the same agglomeration schedule and are therefore equal.†

The sum of squared differences (SSDif) between the distances d in the proximity matrix and the cophenetic (or ultrametric) distances c in the solution is used as a goodness-of-fit measure to compare solutions (see Equation 1). The lower the sum, the
8. better the fit.

   SSDIF = \sum_{i} \sum_{j>i} (d_{ij} - c_{ij})^2    (1)

† In theory it is possible that two different agglomeration schedules yield the same solution identifier; however, the probability of this happening is negligible.

The normalized sum of squared differences (Normalized SSDif) is the normalized version of SSDif. It is obtained by dividing SSDif by the sum of the squared distances in the proximity matrix (see Equation 2). Note that the Normalized SSDif is not constrained to be less than or equal to 1.

   SSDIFN = \frac{\sum_{i} \sum_{j>i} (d_{ij} - c_{ij})^2}{\sum_{i} \sum_{j>i} d_{ij}^2}    (2)

The cophenetic correlation coefficient (Cophenetic Correlation) is the product-moment correlation between the distances in the proximity matrix and the cophenetic (or ultrametric) distances in the solution.

The random seed (Random Seed) describes the state of the random generator which generated the permutation. Feeding a permutation's random seed back into the random generator will reproduce the permutation. This is useful when performing experiments (see also Section 3.2.1).

4.2 Solution

If there is only one optimal solution, this item is listed once. If there are multiple optimal solutions, this item is listed for each of them.

4.2.1 Fit

The fit table contains the Solution ID, SSDif, Normalized SSDif, Cophenetic Correlation, and Random Seed for the optimal solution (see Table 2). This is an exact copy of the corresponding row in the permutation fit table (see Section 4.1).

Fit N
9. ce of SPSS is already running.

If PermuCLUSTER is not accessible from the Analyze > Classify Menu after installation, it can be added manually (registered) by running "Add PermuCLUSTER To SPSS Analyze Menu" from the Start > Program Files > PermuCLUSTER Menu. This registration tool adds PermuCLUSTER for the current user and the default user (Windows NT, Windows 2000, and Windows XP) on the system. After registration, restart SPSS for the change to take effect. As an alternative, PermuCLUSTER can be registered to SPSS as an add-in with the Menu Editor in SPSS, accessible from the Utilities Menu.

To unregister PermuCLUSTER, run "Remove PermuCLUSTER From SPSS Analyze Menu" from the Start > Program Files > PermuCLUSTER Menu. This unregistration tool removes PermuCLUSTER for the current user and the default user on the system. After unregistration, restart SPSS for the change to take effect. Alternatively, it can be unregistered with the Menu Editor in SPSS.

2.2 System Requirements

PermuCLUSTER will run on computer systems that meet the following minimum hardware and software requirements:

   • Windows 98, Windows ME, Windows NT 4.0, Windows 2000, or Windows XP
   • Pentium or Pentium-class processor
   • 16MB or more of random access memory
   • Graphics adapter with 800 x 600 resolution (SVGA) or higher
   • SPSS 11.0 or higher

3 Program Input

The input for PermuCLUSTER can be specified at the general and options tab (see Figure 1). The
10. e Number of Permutations to 1.
      b. Make sure that the other settings are exactly the same as in the analysis to which the Permutation Fit table in step 1 belongs.
   5. On the Options Tab:
      a. In section Permutation Randomization:
         i. Disable option "First permutation is identity".
         ii. Enable option "Custom Seed" and paste the Random Seed you copied to the clipboard.
      b. Further fine-tune the analysis and outcome by setting options (see Section 3.2 for a description of all available options).
   6. Click the OK button to start the analysis.

5.5 How do I replicate an earlier performed analysis?

   1. In the Permutation Fit table in the SPSS Output Viewer, select the first listed Random Seed value and copy it to the clipboard. If the first permutation is the identity permutation, this will be the random seed of the second permutation. If the first permutation is not the identity, this will be the random seed of the first permutation.
   2. Go to the SPSS Data View.
   3. Start PermuCLUSTER from the Analyze > Classify menu.
   4. On the General Tab:
      a. Make sure that the settings are exactly the same as in the analysis to which the Permutation Fit table in step 1 belongs.
   5. On the Options Tab:
      a. Make sure that the settings are exactly the same as in the analysis to which the Permutation Fit table in step 1 belongs.
      b. Enable option "Custom Seed" and paste the Random Seed you copied to the clipboard.
   6. Click the OK button to start the analysis.
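
The replication steps above rely on one property of seeded random generators: the same seed yields the same sequence of permutations. The following Python sketch illustrates that property only. It is not PermuCLUSTER's implementation; the helper names permutation_sequence and permute_matrix are hypothetical, and the way a single seed drives all runs is an assumption made for the illustration.

```python
import random


def permutation_sequence(n_objects, n_permutations, seed, first_is_identity=False):
    """Return one object order (list of 0-based indices) per run.

    Hypothetical helper: it only mimics the idea of seeded permutations and
    makes no claim about how PermuCLUSTER generates or stores its seeds.
    """
    rng = random.Random(seed)  # same seed -> same sequence of shuffles
    runs = []
    for run in range(n_permutations):
        order = list(range(n_objects))
        # Optionally leave the first run in the original (identity) order.
        if not (first_is_identity and run == 0):
            rng.shuffle(order)
        runs.append(order)
    return runs


def permute_matrix(matrix, order):
    """Apply the same permutation to the rows and columns of a square matrix."""
    return [[matrix[i][j] for j in order] for i in order]


# Two analyses started from the same seed visit the same permutations.
original = permutation_sequence(6, 5, seed=185086820)
replica = permutation_sequence(6, 5, seed=185086820)
print(original == replica)
```

Permuting rows and columns with the same order keeps a symmetric proximity matrix symmetric, which is why permute_matrix indexes both dimensions with one list.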
11. ense the Software. You may not modify the Software or create derivative works based upon the Software. Other than as set forth above (License), you may not make or distribute copies of the Software, or electronically transfer the Software from one computer to another or over a network. Any such unauthorized use shall result in immediate and automatic termination of this license and may result in criminal and/or civil prosecution.

Ownership. The foregoing license gives you limited rights to use the Software. Leiden University retains all right, title, and interest, including all copyrights, in and to the Software and all copies thereof. All rights not specifically granted in this Agreement, including Federal and International Copyrights, are reserved by Leiden University.

Use of produced and derived data. You may use data produced with, or derived from, running PermuCLUSTER in publications, presentations, etcetera, provided you clearly refer to the use of PermuCLUSTER.

DISCLAIMER

This software (PermuCLUSTER) is provided AS IS, without any warranty, express or implied, for fitness for any particular purpose, merchantability, or non-infringement of rights of third parties. Whilst effort has been made to ensure that this software (PermuCLUSTER) is accurate in all respects, no responsibility can be accepted for any loss, damage, injury, or any other occurrence relative to the use of this software. By using the software, the user accepts the
12. entire risk arising out of the use or performance of this software and documentation.

Contents

1 Introduction
2 Getting Started
   2.1 Starting the program
   2.2 System Requirements
3 Program Input
   3.1 General Tab
      3.1.1 Proximities
      3.1.2 Analyze Data
   3.2 Options Tab
      3.2.1 Permutation Randomization
      3.2.2 Statistics
      3.2.3 Plots
      3.2.4 Save
4 Program Output
   4.1 Permutation Fit
   4.2 Solution
      4.2.1 Fit
      4.2.2 Object Order
      4.2.3 Agglomeration Schedule
      4.2.4 Dendrogram
5 Frequently Asked Questions (FAQ)
   5.1 How do I perform a cluster analysis using raw data?
   5.2 How do I perform a cluster analysis using a proximity matrix?
   5.3 How do I perform an SPSS CLUSTER equivalent analysis?
   5.4 How do I inspect a solution (permutation) listed in the Permutation Fit table that is not an optimal solution (permutation)?
   5.5 How do I replicate an earlier performed analysis?
13. ormalized   Cophenetic
Solution ID                        SSDif                  SSDif        Correlation   Random Seed
e05f7a032912045d8776233e551cc92d   4.374875000000000E+03  .062292      .847304       185086820

Table 2: Fit

4.2.2 Object Order

The object order table contains the order of the objects to be clustered in the original proximity matrix after they have been permuted (see Table 3).

Object Order
5 4 6 1 2 3
Note: The above entities are the object identifiers in the original proximity matrix.

Table 3: Object Order

4.2.3 Agglomeration Schedule

The agglomeration schedule lists which clusters are combined at each stage in the clustering process, together with other useful information, e.g. fusion coefficients (merge value) (see Table 4).

Agglomeration Schedule
Columns: Stage; Cluster Combined (Cluster 1, Cluster 2); Coefficients; Stage Cluster First Appears (Cluster 1, Cluster 2); Next Stage.

Stage Cluster First Appears
Cluster 1   Cluster 2   Next Stage
0           0           2
1           0           4
0           0           5
2           0           5
4           3           0

Table 4: Agglomeration Schedule

4.2.4 Dendrogram

The dendrogram gives a visual presentation of the agglomeration schedule (see Figure 6). Note that the coefficients are translated into values between 1 and 25.

Figure 6: Dendrogram (Dendrogram using Average Linkage (Between Groups); axis: Rescaled Distance Cluster Combine, 0 to 25)

5 Frequently Asked Questions (FAQ)

5.1 How do I perform a cluster analy
14. sis using raw data?

   1. Start SPSS.
   2. Load the raw data set into the SPSS Data View.
   3. Start PermuCLUSTER from the Analyze > Classify menu.
   4. On the General Tab:
      a. Specify the Cluster Method which should be used.
      b. Specify the Number of Permutations which should be performed.
      c. Select "Analyze original data in SPSS Data View".
      d. In the Analyze data section:
         i. On the Variables Tab:
            A. Specify if cases or variables should be clustered.
            B. Specify the variables you want to analyze by moving them to the Variable(s) listbox.
            C. Optional: If cases are to be clustered, specify the label variable in the Label Cases by listbox.
         ii. On the Measure Tab:
            A. Specify the distance measure to be used.
            B. Optional: Specify one or more transformations.
         iii. On the Standardize Tab:
            A. Specify the standardization method to be used.
            B. Specify if standardization should be performed on cases or variables.
   5. On the Options Tab:
      a. Further fine-tune the analysis and outcome by setting options (see Section 3.2 for a description of all available options).
   6. Click the OK button to start the analysis.

5.2 How do I perform a cluster analysis using a proximity matrix?

   1. Start SPSS.
   2. Start PermuCLUSTER from the Analyze > Classify menu.
   3. On the General Tab:
      a. Specify the Cluster Method which should be used.
      b. Specify the Number of Permutations which should be performed.
      c. Select "Read and anal
15. the next section.

3.1.2 Analyze Data

With PermuCLUSTER, cases as well as variables can be clustered. See Figure 2 and the next three sections for the Variables, Measure, and Standardize settings.

Figure 1: General Tab

3.1.2.1 Variables. The leftmost listbox contains the numeric and text variables as specified in the SPSS Data View. Variables to be clustered, or for which cases are to be clustered, must be placed in the Variable(s) listbox at the right. If cases are to be clustered, a label variable can also be specified. It must be placed in the Label Cases by listbox (see Figure 2).

Figure 2: Analyze Data, Variables

3.1.2.2 Measure. PermuCLUSTER supports all interval, counts, and binary measures that can also be analyzed by SPSS Hierarchical Cluster Analysis (see Figure 3). These measures can also be transformed. See the SPSS documentation (SPSS Inc., 2001) for details on the different measures and transformations.

3.1.2.3 Standardize. PermuCLUSTER supports all standardization methods that can also be found in SPSS (see Figure 4). Consult the SPSS documentation for more information.

3.2 Options Tab

In PermuCLUSTER, options can be set regarding the input and output of an analysis (see Figure 5 and the next four sections for more informa
16. tion).

3.2.1 Permutation Randomization

Options regarding permutation randomization can be set here. With "First permutation is identity" one can indicate whether or not the original proximity matrix should be permuted randomly at the first run. The matrix will not be permuted at the first run if the first permutation is the identity. In that case the first run analyzes the data in their original order, which produces the same solution as an analysis by SPSS.

Figure 3: Analyze Data, Measure

Figure 4: Analyze Data, Standardize

Figure 5: Options Tab (Permutation Randomization: First permutation is identity, Random seed, Custom seed; Statistics: Permutation fit, Object order, Agglomeration schedule; Plots: Dendrogram; Save: Permutation fit table)

With "Random seed" and "Custom seed" one can indicate whether the seed to initialize the random generator sho
17. uld be randomly chosen (based on the current time) or will be custom (based on input). The random generator is used to generate the random permutations for each run in an analysis. Enabling the custom seed option may be useful when attempting to replicate an earlier performed analysis (see also Sections 4.1 and 5.5).

3.2.2 Statistics

The following output-related options can be set here: proximity matrix, permutation fit, object order, and agglomeration schedule.

With "Proximity matrix" one can indicate whether or not the proximity matrix will be displayed in the SPSS Output Viewer. This option is only available when analysing a raw data set.

The option "Permutation fit" indicates whether or not a table will be displayed in the SPSS Output Viewer containing, for each permutation (run), the solution identifier (Solution ID), sum of squared differences (SSDif), normalized sum of squared differences (Normalized SSDif), cophenetic correlation coefficient (Cophenetic Correlation), and random seed.

The option "Object order" indicates whether or not the SPSS Output Viewer will display, for each optimal solution, the order of the objects to be clustered in the permuted proximity matrix.

The option "Agglomeration schedule" indicates whether or not the agglomeration schedule will be displayed in the SPSS Output Viewer.

3.2.3 Plots

The option "Dendrogram" indicates whether or not a dendrogram will be displayed in the SPSS Output Viewer for each found
18. yze a with SPSS Proximities created matrix".
      d. Specify the location of the proximity matrix with the browse button.
         Note: The proximity matrix should be in the SPSS SAV format, i.e. created with PROXIMITIES or DISTANCES.
   4. On the Options Tab:
      a. Further fine-tune the analysis and outcome by setting options (see Section 3.2 for a description of all available options).
   5. Click the OK button to start the analysis.

5.3 How do I perform an SPSS CLUSTER equivalent analysis?

In case of analyzing raw data:

   1. Start SPSS.
   2. Load the raw data set into the SPSS Data View.
   3. Start PermuCLUSTER from the Analyze > Classify menu.
   4. On the General Tab:
      a. Specify the Cluster Method which should be used.
      b. Set the Number of Permutations to 1.
      c. Select "Analyze original data in SPSS Data View".
      d. In the Analyze data section:
         i. On the Variables Tab:
            A. Specify if cases or variables should be clustered.
            B. Specify the variables you want to analyze by moving them to the Variable(s) listbox.
            C. Optional: If cases are to be clustered, specify the label variable in the Label Cases by listbox.
         ii. On the Measure Tab:
            A. Specify the distance measure to be used.
            B. Optional: Specify one or more transformations.
         iii. On the Standardize Tab:
            A. Specify the standardization method to be used.
            B. Specify if standardization should be performed on cases or variables.
   5. On the Options Tab:
      a. Enable option "First
