
User's Manual - Research


Contents

1. τ-ARGUS 2.2 user's manual

... <DECIMALS> 2

An example of a codelist file (Region.cdl). Here 1 represents Groningen, etc.:
1 Groningen
2 Friesland
3 Drenthe
4 Overijssel
5 Flevoland
6 Gelderland
7 Utrecht
8 Noord-Holland
9 Zuid-Holland
10 Zeeland
11 Noord-Brabant
12 Limburg
Nr North
Os East
Ws West
Zd South

An example of a file with the hierarchical structure (Regio.hrc). Here regions 1, 2 and 3 (Groningen, Friesland and Drenthe) are part of the North, etc.

3.1.3 Specification
When the metadata is ready, the user can specify the tables he/she wants to protect. Via Specify|Specify Tables you come to a window where the tables are specified. In the current version of τ-ARGUS you can specify more than one table, but each table is protected separately unless you have a set of linked tables. This version of τ-ARGUS has a first implementation of linked tables. In the upper part of the window, the first pane shows the explanatory variables and the third pane shows the response variables.

The explanatory variables
On the left is the listbox with the explanatory variables. When you click on > or < you transport the selected variable...
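The hierarchy file marks the depth of each code by repeating a leading string in front of it, once per level (the <HIERLEADSTRING> attribute of the metafile). Below is a minimal Python sketch that turns such a file into a child-to-parent mapping. It assumes the leading string is "@". That is a hypothetical choice, because the actual character has not survived in this copy. The function name is also hypothetical.

```python
def parse_hierarchy(lines, lead="@"):
    """Map each code in a hierarchy file to its parent code (None at the top).

    The depth of a code is the number of repetitions of the leading string
    in front of it. The "@" default is an assumption made for this sketch.
    """
    parents = {}
    stack = []  # stack[d] holds the most recent code seen at depth d
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        depth = 0
        while line.startswith(lead * (depth + 1)):
            depth += 1
        code = line[depth * len(lead):].strip()
        if depth > len(stack):
            raise ValueError(f"code {code!r} skips a level")
        del stack[depth:]  # leave the previous branch
        parents[code] = stack[-1] if stack else None
        stack.append(code)
    return parents
```

The text confirms only the North branch, so here it is as an example: `parse_hierarchy(["Nr", "@1", "@2", "@3"])` maps regions 1, 2 and 3 to Nr, and maps Nr itself to None.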
2. ...or boxes, which results in showing or omitting codes from the table. In this version of τ-ARGUS this is the only allowable recoding of a hierarchical codelist. Pressing the Apply button followed by Close will actually apply the selected recoding. With the Undo button you can go back to the original coding scheme. Below, Region has been recoded into 4 groups and the resulting table is displayed.

[Figure: Table window for GK x Regio (Var2) after recoding Region into 4 groups. It shows the Cell Information pane (value, status, cost, shadow, contributions, top n of shadow), the Change status buttons (Set to Safe, Set to Unsafe, Set to Protected), the Read Hist, Recode and Write table buttons, and the suppression options Singleton, HyperCube, Modul XPress, Modul CPlex, Opt XPress, Opt CPlex and Opt Netw. Cell values are omitted.]

In the case of a non-hierarchical codelist, the right-hand pane will be an edit box. In this edit box a recoding scheme...
3. [Figure: τ-ARGUS main window (menus File, Specify, Modify, Output, Help) showing the Regio tree with the nodes Groningen, Friesland, Drenthe, East, Overijssel, Flevoland, Gelderland, Utrecht, West, Noord-Holland, Zuid-Holland, Zeeland, South, Noord-Brabant and Limburg.]

3.2.1 Table protection
Via Modify|View Table you come to the table window, the heart of τ-ARGUS.

[Figure: Table window for GK x Regio (Var2), with the 3 dig separator and Output View check boxes and the Select Table, Change View and Table Summary buttons. Cell values are omitted.]
4. [Figure: safety rule settings, with percentage, range, Minimum frequency and Min frequency range fields.]

4.4 The Modify menu

4.4.1 Modify|Select Table
In this dialog box you can select the table you want to see. If you have specified only one table, this table is selected automatically and this option cannot be accessed.

4.4.2 Modify|View Table
[Figure: Table window for GK x Regio (Var2), with the Cell Information pane (value, status, cost, shadow), the Change status buttons (Set to Safe, Set to Unsafe, Set to Protected) and the Read Hist, Recode and Suppress buttons. Cell values are omitted.]
5.
4.6 The Help menu

4.6.1 Help|Contents
Shows the contents page of the help file. This program has context-sensitive help.

4.6.2 Help|Options
[Figure: Options dialog.]
A number of options can be changed here. Firstly, if the CPlex optimisation routine is being used, the location of the licence file can be specified here. The default colours for the cells of each status can also be altered.

4.6.3 Help|About
Shows the About box.
6. Overview of the menu items
[Figure: overview of the menu items, including Open Microdata, Open Table, Specify Tables, Table Metadata, Select Table, View Table, Save Table, View, Contents and Options.]

4.2 The File menu

4.2.1 File|Open Microdata
The File|Open Microdata menu allows you to specify the microdata file and the metadata file.

[Figure: Open microdata dialog, with Microdata ...\tau_testw.asc and Metadata ...\tau_testw.rda.]

In this dialog box you can select the microdata file and the corresponding metafile. By default the microdata file has the extension .asc and the metafile .rda. When you click on the browse button you get an Open File dialog box. In this box you can search for the files you want to use. You can choose other file types by clicking on the file types listbox. When you have selected the microdata file, τ-ARGUS suggests a metafile with the same name but with the extension .rda, but only when this file exists. Before you click OK you must have filled in the name of the microdata file.

4.2.2 File|Open Table
File|Open Table allows you to open a table that was previously written to a file (see section 3.1.4 for an example of the file format). The simplest approach is to generate the metafile when entering the table, by applying simple metadata. Here the table dimension and the labels...
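The rule for suggesting a metafile is easy to state as code. Below is a minimal Python sketch. The function name is hypothetical and is not part of τ-ARGUS.

```python
from pathlib import Path
from typing import Optional


def suggest_metafile(microdata: str) -> Optional[Path]:
    """Suggest a metafile for a microdata file.

    Same name, extension .rda, and only suggested when that file exists.
    """
    candidate = Path(microdata).with_suffix(".rda")
    return candidate if candidate.exists() else None
```

For example, `suggest_metafile("tau_testw.asc")` returns `Path("tau_testw.rda")` if that file is present, and `None` otherwise.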
7. τ-ARGUS version 2.2 User's Manual

Document: 4.2-D1
Project: CASC project
Date: April 2003
BPA no.: 769-02-TMO
Statistics Netherlands, P.O. Box 4000, 2270 JM Voorburg, The Netherlands
email: ahnl@rnd.vb.cbs.nl

Contributors: Anco Hundepool, Aad van de Wetering and Ramya Ramaswamy; Peter-Paul de Wolf (HiTaS); Sarah Giessing (GHMiter); Matteo Fischetti, Juan José Salazar and Alberto Caprara (optimisation); Jordi Castro (network solutions).

Contents
Preface
About the name ARGUS
Contact
Acknowledgements
1 Introduction
2 Producing safe tables
2.1 Sensitive cells in magnitude tables
2.2 Sensitive cells in frequency count tables
2.3 Table redesign
2.4 Secondary cell suppression
2.5 Information loss in terms of cell weights
2.6 Series of tables
2.7 The Hypercube/GHMITER method
2.7.1 The method
2.8 The τ-ARGUS implementation of GHMITER
2.8.1 References on GHMiter
2.9 HiTaS
8.
2.10 Network solution for large unstructured 2-dimensional tables
2.11 Functional design of τ-ARGUS
3 A tour of τ-ARGUS
3.1 Preparation
3.1.1 Open a microdata file
3.1.2 The metafile
3.1.3 Specification
3.1.4 Open a table
3.2 The process of disclosure control
3.2.1 Table protection
3.3 Saving the safe tables
4 Description of the menu items
4.1 Overview of the menu items
4.2 The File menu
4.2.1 File|Open Microdata
4.2.2 File|Open Table
4.2.3 File|Exit
4.3 The Specify menu
4.3.1 Specify|Metafile (for microdata)
4.3.2 Specify|Specify Tables
4.3.3 Specify|Table Metadata
9. The Singleton option is a special call to GHMiter to protect singleton cells only (see section 2.9). It is only sensible for the Opt XPress and Opt CPlex options, i.e. the full optimisation routines. It runs GHMiter as a pre-processing step that protects the singleton cells only.

The Read Hist option is an a priori option, mainly intended for microdata. It lets you feed τ-ARGUS a list of cells for which the status given by the standard rules is overruled, i.e. the status of these cells is already specified. The file is free format. Each line has the form:

Code of first spanning variable  Code of second spanning variable  Status of cell

For example:
Nr 4 u
Zd 6 p

The hypercube method (see section 2.7) calculates a suppression pattern without using an LP optimisation module, so it can be used without any additional licence.

In more detail, Opt XPress and Opt CPlex are the full optimisation solutions, while Modul XPress and Modul CPlex are modular, partial optimisation routines. The theory behind these modular routines is explained in section 2.9. The XPress and CPlex methods are two different implementations of HiTaS, the search algorithm described in section 2.9. The Network approach is Jordi Castro's network solution for non-hierarchical two-dimensional tables (see section 2.10).

In every case you will, in principle, end up with a safe table, indicated by the additional blue secondary cells. You are now ready to store the table with the Write table button. Th...
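Below is a minimal Python sketch of reading the free-format Read Hist file described above. The function name is hypothetical. Reading "u" as unsafe and "p" as protected is an assumption drawn from the two example lines. Any other status letter is rejected rather than guessed.

```python
# Assumed meaning of the status letters seen in the manual's example lines.
STATUS_LETTERS = {"u": "unsafe", "p": "protected"}


def parse_apriori(lines):
    """Parse 'code1 code2 status' lines into {(code1, code2): status}."""
    statuses = {}
    for number, line in enumerate(lines, start=1):
        fields = line.split()  # free format: any whitespace separates fields
        if not fields:
            continue  # ignore blank lines
        if len(fields) != 3:
            raise ValueError(f"line {number}: expected 3 fields, got {line!r}")
        code1, code2, letter = fields
        if letter.lower() not in STATUS_LETTERS:
            raise ValueError(f"line {number}: unknown status {letter!r}")
        statuses[(code1, code2)] = STATUS_LETTERS[letter.lower()]
    return statuses
```

With the example lines, `parse_apriori(["Nr 4 u", "Zd 6 p"])` gives `{("Nr", "4"): "unsafe", ("Zd", "6"): "protected"}`.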
10. ...before re-running the hypercube method, the solution will be a feasible solution for τ-ARGUS.

2.8.1 References on GHMiter
[1] Repsilber, R.D. (1994), Preservation of Confidentiality in Aggregated Data, paper presented at the Second International Seminar on Statistical Confidentiality, Luxembourg, 1994.
[2] Repsilber, D. (1999), Das Quaderverfahren, in: Forum der Bundesstatistik, Band 31/1999: Methoden zur Sicherung der Statistischen Geheimhaltung (in German).
[3] Repsilber, D. (2002), Sicherung persönlicher Angaben in Tabellendaten, in: Statistische Analysen und Studien Nordrhein-Westfalen, Landesamt für Datenverarbeitung und Statistik NRW, Ausgabe 1/2002 (in German).
[4] Giessing, S. and Repsilber, D. (2002), Tools and Strategies to Protect Multiple Tables with the GHQUAR Cell Suppression Engine, in: Inference Control in Statistical Databases, Domingo-Ferrer (Editor), Springer Lecture Notes in Computer Science Vol. 2316.
[5] Giessing, S. (2003), Co-ordination of Cell Suppressions: strategies for use of GHMITER, Proceedings of the Joint ECE/Eurostat Work Session on Statistical Data Confidentiality, Luxembourg, 7-9 April 2003.

2.9 HiTaS
HiTaS is a heuristic approach to cell suppression in hierarchical tables. Hierarchical tables are specially linked tables: at least one of the spanning variables has a hierarchical structure, i.e. contains (many) sub-totals. In Fischetti...
11. ...level. Additionally, you can change the coding for the missing values by entering these codes in the relevant textboxes. Pressing the Apply button will actually restructure your table. If required, you can always undo a recoding.

The options at the bottom of the table

Change View
This dialog box pops up when you click on Change View in the Table window (opened earlier via Modify|View Table). You can specify which variable you want in the rows and which in the columns. In the two-dimensional case you can only transpose the table. In the higher-dimensional case the remaining variables go into the layer. For each layer variable a combo box appears at the top of the table, where you can select a code. This shows the corresponding slice of the table.

Table summary
The table summary gives an overview of the number of cells according to their status.
[Figure: Summary for table no. 1, with the number of cells per status (Safe, Safe manual, Protected, Secondary, Secondary manual).]

3 dig separator
This removes or inserts the comma separating the thousands in the values of the table.

Output View
This option shows the table as it will be output, with the suppressed cells (primary and secondary) replaced by an X.

Secondary suppressions
We now discuss the actions in the Suppress pane of the table window after selecting mo...
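The Table Summary is essentially a count of cells per status. Below is a minimal Python sketch. The status names follow the status list in the Cell Information description. Representing a table as a dict from cell key to status string is an assumption for illustration only.

```python
from collections import Counter

# Cell statuses, in the order the manual lists them.
STATUS_ORDER = ["safe", "safe from manual", "unsafe", "unsafe from manual",
                "suppressed", "protected", "zero", "empty"]


def table_summary(statuses):
    """Return [(status, number of cells)] for every status, in a fixed order."""
    counts = Counter(statuses.values())
    unknown = set(counts) - set(STATUS_ORDER)
    if unknown:
        raise ValueError(f"unknown status(es): {sorted(unknown)}")
    return [(status, counts[status]) for status in STATUS_ORDER]
```

Statuses that do not occur are reported with a count of 0, as in a summary dialog that always shows every row.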
12. [Figure: Specify metafile dialog (request protection; Codelist: automatic or codelist filename; Missings; hierarchical: Levels from microdata or Levels from file, with a Leading string).]

The left pane shows the names of the variables. Besides information on the position of the variables, you can indicate whether a variable can be used as an explanatory variable in a table or as a response variable. You can also indicate that a variable is to be used as a sample weight variable. These sample weights can be used in applying the safety rules (see 3.1.2). As shown at the top left of the screen, the input datafile can be in fixed format or in free format with a specified separator. In the bottom half of the window, all kinds of details on the codelist can be stored:
• The codelists: τ-ARGUS will explore the datafile and build the codelist for the explanatory variables. This is the automatic option. Additionally, the user can specify a codelist file. This codelist is used to provide more meaningful labels for the codes in some of the screens of τ-ARGUS.
• Missing values: τ-ARGUS needs to know which missing values are attached to a codelist. For each variable the missing values (at least one) should be specified. In many surveys two missing values are used, e.g. one for "don't know" and one for "refusal".
• Hierarchical co...
13. ...CPlex package. There is also an arrangement with CPlex for the use of τ-ARGUS. The choice is up to the users. However, users who already have a licence for one of these packages can use their current licence for τ-ARGUS as well.

The CASC project
On the one hand, the CASC project can be seen as a follow-up of the SDC project of the 4th Framework Programme. It will build further on the achievements of that successful project. On the other hand, it will have new objectives: it will concentrate more on practical tools and the research needed to develop them. For this purpose a new consortium has been brought together. It will take over the results and products emerging from the SDC project. One of the main tasks of this new consortium will be to further develop the ARGUS software. The SDC project consortium put this software in the public domain, so it is available to the new consortium. The main software developments in CASC are μ-ARGUS, the software package for the disclosure control of microdata, and τ-ARGUS, which handles tabular data.

The CASC project will involve both research and software development. On the research side, the project will concentrate on areas that can be expected to yield practical solutions, which can then be built into future versions of the software. The CASC project has therefore been designed around this software twin ARGUS. This will make the outcome of the research readily available for application in the daily practice of the...
14. ...and columns can be combined, sensitive cells can be suppressed, and additional cells to protect these can be found in some optimum way (secondary cell suppression).

τ-ARGUS is one of a twin set of disclosure control packages. Within the CASC project a tool for microdata called μ-ARGUS is also being developed; it is the twin brother of τ-ARGUS. This is evident not only from the user interfaces of both packages but also from the source code: the bodies of the twins are so closely combined that they are in fact like Siamese twins.

About the name ARGUS
Somewhat jokingly, the name ARGUS can be interpreted as the acronym of "Anti-Re-identification General Utility System". As a matter of fact, the name ARGUS was inspired by a myth of the ancient Greeks. In this myth Zeus has a girlfriend named Io. Hera, Zeus' wife, did not approve of this relationship and turned Io into a cow. She let the monster Argus guard Io. Argus seemed particularly well qualified for this job, because it had a hundred eyes that could watch over Io. When it fell asleep, only two of its eyes were closed, which left plenty of eyes to watch Io. Zeus was eager to find a way to get Io back. He hired Hermes, who could make Argus fall asleep with the enchanting music of his flute. When Hermes played his flute to Argus this indeed happened: all its eyes closed, one by one. When Hermes had succeeded in making Argus fall asleep, Argus was decapitated...
15. ...belonging to multiple sub-tables are counted multiple times. In our experience this concerns particularly the cases where the protection level was reduced to an infinitely small positive value in step 10 (see above). Step 10 is usually required to confirm protection of large, high-level secondary suppressions. These are likely to appear in multiple tables, especially when linked tables are processed. Note that the terms "reduction of the sliding protection ratio" and "reduction of the protection level" are used synonymously in the report file.

• Note that step 11 will make cells eligible for secondary suppression that τ-ARGUS considers as protected, the so-called "frozen" cells (for a discussion of this option see for instance [5]). The file will not be produced for tables with more than four dimensions or more than 50,000 cells.

As this is inconsistent with the current view on protected cells in τ-ARGUS, it leads to the following error message:

TauARGUS: The hypercube method could not suppress this table successfully; some frozen (protected) cells need to be suppressed.

τ-ARGUS then displays the codes and cell values of those suppressed frozen cells:

[Figure: "List of suppressed but frozen cells" dialog, showing the cell value and the codes. See also the file C:\DOCUME~1\ahnl\LOCALS~1\Temp\Frozen.txt.]

When the status of these cells is changed into unprotected...
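The condition behind this error message can be sketched as a check on the proposed suppression pattern. Below is a minimal Python illustration. The function names and the dict-of-statuses representation are hypothetical. The message paraphrases the dialog rather than reproducing τ-ARGUS's internals.

```python
def suppressed_but_frozen(suppressed, statuses):
    """Return the suppressed cells whose status is 'protected' (frozen cells).

    suppressed: iterable of cell keys chosen by the hypercube method
    statuses:   dict mapping cell keys to status strings
    """
    return sorted(cell for cell in suppressed if statuses.get(cell) == "protected")


def check_pattern(suppressed, statuses):
    """Raise an error, as tau-ARGUS does, if any frozen cell was suppressed."""
    frozen = suppressed_but_frozen(suppressed, statuses)
    if frozen:
        raise RuntimeError(
            "The hypercube method could not suppress this table successfully; "
            f"some frozen (protected) cells need to be suppressed: {frozen}")
```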
16. ...can choose what you like best. Additionally, you can change the coding for the missing values by entering these codes in the relevant textboxes. You can also specify the name of a new codelist with the labels for the new coding scheme. Pressing the Apply button will actually restructure your table, and if required you can always undo a recoding.

When you apply a recoding, τ-ARGUS presents the results. Some codes may not have been found, or you may not have followed the syntax described above; in that case an error message is shown. Alternatively, a warning may be issued. For example, if you did not recode all original codes, τ-ARGUS will inform you. This may, however, be your intention, and there is no objection to it. In the example above, τ-ARGUS informs you that 4 codes have not been changed.

4.4.2.2 Recoding a hierarchical variable
In the hierarchical case the code scheme is typically a tree. Globally recoding a hierarchical variable means that you manipulate a tree. The standard Windows tree view is used to present a hierarchical code.

[Figure: Global Recode dialog for a hierarchical variable, with Missing values fields.]

You can fold and unfold parts of the tree with the standard Windows actions (clicking on + and -). The combo box at the top of the screen lets you fold and unfold the tree to a certain...
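The "codes not found" error and the "untouched codes" warning can be illustrated with a recoding given as a plain mapping. Below is a minimal Python sketch. The dict form {old_code: new_code} and the function name are assumptions for illustration. They are not τ-ARGUS's own recode syntax.

```python
def apply_recode(codes, recode):
    """Apply a global recode {old_code: new_code} to a codelist.

    Returns the new codelist (in order of first appearance) and the original
    codes that the recode left untouched, so that a warning can be issued.
    """
    unknown = sorted(set(recode) - set(codes))
    if unknown:
        raise ValueError(f"codes not found: {unknown}")  # error, as in tau-ARGUS
    new_codes = []
    for code in codes:
        new = recode.get(code, code)  # untouched codes keep their value
        if new not in new_codes:
            new_codes.append(new)
    untouched = [code for code in codes if code not in recode]
    return new_codes, untouched
```

For instance, merging codes 1 and 2 of a six-code list into "A" leaves four codes untouched. That matches the kind of warning described above.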
17. ...cells are considered safe or unsafe. In this version of τ-ARGUS the q parameter is fixed at 100. The literature refers to this rule as the "minimum protection of p%" rule. Suppose you want to apply a prior-posterior rule with parameters $p$ and $q$, where $q < 100$. Then choose the parameter $p^{*}$ of the p%-rule as $p^{*} = 100\,p/q$.

With these rules as a starting point, identifying the sensitive cells is easy. This requires a tabulation package that can calculate not only the cell totals but also the number of contributors and the n individual contributions of the major contributors. Tabulation packages such as ABACUS from Statistics Netherlands and SuperCross, developed in Australia by Space-Time Research, have that capability. In fact τ-ARGUS stores not only the sum of the n major contributions for each cell, but the individual contributions themselves. This is very handy when rows, columns, etc. of a table are combined. By merging and sorting the sets of individual contributions of the cells to be combined, one can quickly determine the major contributions of the new cell without going back to the original file. This means the dominance rule can be applied quickly to the combined cells. Combining rows and columns (table redesign) is one of the major instruments to reduce the number of unsafe cells. This, too, is why τ-ARGUS reads microdata files...
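Below is a Python sketch of the merging step and the rules it feeds. It assumes each cell stores its top contributions sorted in descending order. The function names are hypothetical. The dominance rule is written in its usual (n,k) form rather than copied from τ-ARGUS. The last function is the $p^{*} = 100\,p/q$ conversion above.

```python
import heapq


def merge_top(top_a, top_b, n):
    """Top-n contributions of a combined cell, from the stored top lists of its
    parts (both sorted in descending order); no need to reread the microdata."""
    merged = heapq.merge(top_a, top_b, reverse=True)
    return [value for _, value in zip(range(n), merged)]


def p_rule_unsafe(total, top, p):
    """p%-rule: unsafe if the total minus the two largest contributions is
    smaller than p% of the largest contribution."""
    x1 = top[0]
    x2 = top[1] if len(top) > 1 else 0
    return total - x1 - x2 < p / 100 * x1


def dominance_unsafe(total, top, n, k):
    """(n,k) dominance rule: unsafe if the n largest contributions exceed k% of the total."""
    return sum(top[:n]) > k / 100 * total


def p_equivalent(p, q):
    """p%-rule parameter equivalent to a prior-posterior (p,q) rule."""
    return 100 * p / q
```

As a usage example, suppose two cells have stored tops [50, 10] and [40, 5] and totals 70 and 50. The combined cell has total 120 and top [50, 40, 10]. With p = 10 it is safe under the p%-rule, because 120 - 50 - 40 = 30 is not below 5.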
18. ...chosen by the software. More information on the hypercube method can be found in section 2.7. When you are satisfied with the table, you can store it by pressing the Write table button. This does virtually the same as Output|Save Table in the menu (section 4.5.1).

4.4.3 Linked Tables
This option is available when the tables specified have at least one spanning variable in common and the same response variable. An example is shown:

[Figure: Specify Tables dialog with explanatory variables, response variable Var2, shadow variable, cost variable (unity, frequency or variable), safety rule (Dominance rule) and Minimum frequency settings, for two tables that both use response variable Var2.]

After the tables have been computed, the Linked Tables option becomes available in the Modify menu. The following dialog appears. Clicking on Go will protect the tables simultaneously.

[Figure: Linked tables dialog: "There are 2 tables: GK x Regio, YEAR x GK. Do you want to protect them simultaneously?"]

This procedure is only available when mi...
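The availability condition for Linked Tables can be written as a small predicate. Below is a minimal Python sketch with hypothetical names. For more than two tables, the text does not say whether the common spanning variable must be shared by all tables or only pairwise. This sketch assumes it must be shared by all. For the two-table example above, both readings agree.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableSpec:
    explanatory: tuple  # spanning variables, e.g. ("GK", "Regio")
    response: str       # response variable (cell item), e.g. "Var2"


def linked_tables_available(tables):
    """True if all tables share one response variable and at least one spanning
    variable common to all of them (an interpretation for more than two tables)."""
    if len(tables) < 2:
        return False
    same_response = len({t.response for t in tables}) == 1
    common = set(tables[0].explanatory).intersection(*(t.explanatory for t in tables[1:]))
    return same_response and bool(common)
```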
19. • Safe from manual: manually made safe during this session
• Unsafe: according to the safety rule
• Unsafe from manual: manually made unsafe during this session
• Suppressed: made unsafe by the secondary cell suppression
• Protected: cannot be selected as a candidate for secondary cell suppression
• Zero: the value is zero and the cell cannot be suppressed
• Empty: no records contributed to this cell and the cell cannot be suppressed

The second pane on the right, Change Status, allows you to change the cell status. Only a few logical transitions are allowed. Changing the cell status can be useful if an unsafe cell may be published for external reasons (e.g. permission to publish). Setting a cell to Protected prevents the system from selecting it as a secondary suppression.

The Recode button takes you to the recoding system. Recoding is a very powerful method of protecting a table: collapsed cells tend to have more contributors and are therefore usually much safer.

4.4.2.1 Recoding a non-hierarchical variable
Recoding a hierarchical variable differs considerably from recoding a non-hierarchical variable.

[Figure: Global Recode dialog, with the variable list, the codelist for the recode, missing values, and a warning reporting 4 untouched codes.]

In the non-hierarchical case you can specify a global recoding manually. Either...
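The status list above determines which cells the secondary suppression may choose from. Below is a minimal Python sketch. It assumes that only safe and manually safe cells are eligible: unsafe cells are already primary suppressions, protected cells cannot be selected, and zero and empty cells cannot be suppressed. The function names are hypothetical.

```python
# Assumption: only (manually) safe cells can become secondary suppressions.
ELIGIBLE = {"safe", "safe from manual"}


def secondary_candidates(statuses):
    """Return the cells the secondary suppression may choose from."""
    return sorted(cell for cell, status in statuses.items() if status in ELIGIBLE)


def set_protected(statuses, cell):
    """Mimic the 'Set to Protected' button: remove a cell from the candidates."""
    statuses[cell] = "protected"
```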
20. ...made through Specify|Metafile. In this dialog box all attributes of the variables can be specified. We have already explained the layout of this file in section 3.1.2. If a .rda file has been specified under File|Open Microdata, this dialog box shows the contents of that file. If no .rda file has been specified, you can enter the information in this dialog box after pushing the New button; by default the name "newvar" is filled in. Apart from defining a new variable, you can modify or delete an existing one. The following attributes can be specified:
• the name of the variable
• its first position in the data file
• its length and the number of decimals

Furthermore, you can specify the kind of variable: explanatory variable, response variable or weight variable. An explanatory variable can be used as a spanning variable in the rows or the columns of the table. A response (numerical) variable can be used as a cell item. A weight variable specifies the weight of the record and is based on the (post-)sampling design used.

The Holding indicator allows several contributions to one cell from the same holding to be counted together as one contribution, both in the safety rules and in the marginals. For example, a company will be counted as one enterprise record in the totals and marginals, even though you may have records for each individual branch of the company. A restriction is that the data set is sorted (at least, all records for one holding...
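Below is a minimal Python sketch of the holding indicator's effect. It sums the records of each holding into a single contribution before any safety rule sees them. In line with the sorting restriction above, the sketch expects all records of one holding to be adjacent and raises an error otherwise. The function name and the (holding_id, value) record layout are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def contributions_per_holding(records):
    """Sum (holding_id, value) records so that each holding is one contributor.

    The records of one holding must be adjacent, mirroring the requirement
    that the data set is sorted by holding.
    """
    seen = set()
    result = []
    for holding, group in groupby(records, key=itemgetter(0)):
        if holding in seen:
            raise ValueError(f"records of holding {holding!r} are not contiguous")
        seen.add(holding)
        result.append(sum(value for _, value in group))
    return sorted(result, reverse=True)  # largest contributions first
```

Here two branch records of holding H1 (30 and 20) become a single contribution of 50. The cell therefore has two contributors instead of three.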
21.
4.4 The Modify menu
4.4.1 Modify|Select Table
4.4.2 Modify|View Table
4.4.3 Linked Tables
4.5 The Output menu
4.5.1 Output|Save Table
4.5.2 Output|View Report
4.6 The Help menu
4.6.1 Help|Contents
4.6.2 Help|Options
4.6.3 Help|About

Preface
This is the user's manual of τ-ARGUS version 2.2. τ-ARGUS is a software tool designed to assist a data protector in producing safe tables. This version is the second release of τ-ARGUS in the CASC project. Compared with the τ-ARGUS version from the previous project, we have made a major step forward: τ-ARGUS now has facilities to protect hierarchical and linked tables.

The purpose of τ-ARGUS is to protect tables against the risk of disclosure, i.e. the accidental or deliberate disclosure of information about individuals from a statistical table. This is achieved by modifying the table so that it contains less, or less detailed, information. τ-ARGUS allows several modifications of a table: a table can be redesigned, meaning that rows...
22. [Figure: Table window, continued, with the Cell Information pane (value, status, cost, shadow, contributions, top n of shadow), the Change status buttons, the Read Hist and Recode buttons, and the suppression options Singleton, HyperCube, Modul XPress, Modul CPlex, Opt XPress, Opt CPlex and Opt Netw. Cell values are omitted.]

This window shows the table you have selected with Modify|View Table. On the left side the table itself is shown in a spreadsheet view. Safe cells are black, unsafe cells are red, secondary suppressed cells are blue and empty cells are shown as a hyphen. The two check boxes at the bottom left give you some control over the layout:
• Ticking 3 dig separator shows the cell values with a thousands separator. The separator is chosen according to the general Windows settings.
• Output View shows the table with all the suppressed cells replaced by an X. This is how the safe table will be published.

The options at the bottom of the table
Via Change View you can transpose the table. You simply indicate which variable will be the row variable and which the column variable, and the table is transposed. If more than two explanatory variables have been selected, the other variables go into the layer and are shown as combo...
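Below is a minimal Python sketch of how one cell is rendered under Output View and the 3 dig separator. The status strings and function name are hypothetical. Python's "," format specifier always inserts a comma, whereas τ-ARGUS follows the Windows regional settings.

```python
# Statuses whose values are hidden in the Output View (primary and secondary).
HIDDEN = {"unsafe", "unsafe from manual", "suppressed"}


def output_view(value, status, separator=True):
    """Render one cell as it will be published."""
    if status in HIDDEN:
        return "X"
    if status == "empty":
        return "-"  # empty cells are shown as a hyphen
    # Python's "," format always uses a comma; tau-ARGUS instead follows the
    # Windows regional settings.
    return f"{value:,}" if separator else str(value)
```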
23. <HIERLEVELS> The hierarchy is derived from the digits of the codes themselves. The specification is followed by a list of integers giving the width of each level. These integers should add up to the width of the whole code.
• <HIERCODELIST> The name of the file describing the hierarchical structure. Default extension .hrc.
• <HIERLEADSTRING> The string (character) used to indicate the depth of a code in the hierarchy.

An example of a metadata file is shown below. Here the variable Year begins at position 1 of each record and is 2 characters long; missing values are represented by 99. It is also recodeable. The following lines describe the variable Sbi. This variable begins at position 4 and is 5 characters long, and missing values are represented by 99999. Besides being recodeable, this variable is hierarchical, and its hierarchy levels are given: the first 3 characters form the top level, the 4th character the second level and the 5th character the lowest level.

    Year 1 2 99
        <RECODEABLE>
    Sbi 4 5 99999
        <RECODEABLE>
        <HIERARCHICAL>
        <HIERLEVELS> 3 1 1
    ...
        <RECODEABLE>
    Regio 12 2 9
        <RECODEABLE>
        <CODELIST> Region.cdl
        <HIERCODELIST> Region2.hrc
        <HIERLEADSTRING> ...
        <HIERARCHICAL>
    ... 14 4 9999
        <NUMERIC>
        <DECIMALS> 1
        <WEIGHT>
    Var1 19 ...
        <NUMERIC>
    Var2 28 10 9999999999
        <NUMERIC>
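Below is a minimal Python sketch of reading such a metafile and expanding <HIERLEVELS>. The parser assumes the layout shown above: a variable line "name start length missing..." followed by attribute lines that begin with a <TAG>. It is a hypothetical illustration, not τ-ARGUS's own reader. The Sbi code "12345" is made up.

```python
def parse_rda(lines):
    """Parse metafile lines: a variable line 'name start length missing...'
    followed by attribute lines that start with a <TAG>."""
    variables, current = {}, None
    for line in lines:
        text = line.strip()
        if not text:
            continue
        if text.startswith("<"):
            if current is None:
                raise ValueError(f"attribute before any variable: {text!r}")
            tag, _, value = text.partition(" ")
            current["attrs"][tag.strip("<>")] = value.split()
        else:
            name, start, length, *missing = text.split()
            current = {"start": int(start), "length": int(length),
                       "missing": missing, "attrs": {}}
            variables[name] = current
    return variables


def hier_levels(code, widths):
    """Ancestors of a code under <HIERLEVELS>, from the top level down to the code."""
    if sum(widths) != len(code):
        raise ValueError("the level widths must add up to the code width")
    prefixes, end = [], 0
    for width in widths:
        end += width
        prefixes.append(code[:end])
    return prefixes
```

For Sbi with <HIERLEVELS> 3 1 1, the hypothetical code "12345" belongs to "123" at the top level and to "1234" at the second level.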
24. [Figure: Table window with the suppression options Singleton, HyperCube, Modul XPress, Modul CPlex, Opt XPress, Opt CPlex and Opt Netw, the Undo Singleton and Undo Suppress buttons, the 3 dig separator and Output View check boxes, and the Select Table, Change View, Table Summary, Write table and Close buttons.]

This window shows the table you have selected with Modify|View Table. On the left side the table itself is shown in a spreadsheet view. Safe cells are black, unsafe cells are red, secondary suppressed cells are blue and empty cells are shown as a hyphen. The two check boxes at the bottom left give you some control over the layout. In the example shown here the complete table does not fit on the screen; use the scroll bar at the bottom of the table to display the remaining columns.
• 3 dig separator shows the cell values with a thousands separator (see "The options at the bottom of the table" for more details). The separator is chosen according to the general Windows settings.
• Output View shows the table with all the suppressed cells replaced by an X. This is how the safe table will be published.

When you click on a cell in the main body of the table, information about this cell appears in the Cell Information pane:
1. the cell value
2. the cell status
3. the number of contributors to the cell
4. the largest contributors

The status of a cell can be:
• Safe: does not violate the safety rule
• Safe...
25. ...a is present.

[Figure: Specify metafile dialog in free format, with the attributes name, length and decimals; the variable type (explanatory, response, weight, holding indicator, request protection, frequency, topN variable, status indicator); the codelist (automatic or codelist filename); Code for Total; Missings; the hierarchical settings; and the New and Delete buttons.]

Once the Specify Metafile option has been completed, the Specify Table Metadata option also becomes available, and the window shown here is displayed. It allows you to apply safety rules such as the dominance rule and the p%-rule, which are described in sections 2.1 and 3.1.3. As also outlined in section 3.1.3, there are options for the cost function for secondary suppression. If a minimum frequency is required for cell safety, it can be entered in this window as well.

[Figure: Specify table dialog, with the explanatory variables, the response variable and TopN, the cost function for secondary suppression (Frequency, Unity), the safety rule (Dominance rule, p%-rule or None) with its parameters, and Minimum frequency and Min frequency range.]

Two examples of datafiles for 2-way tables are shown here. Note that the cells have already been specified manually to...
26. a table (3.1.4). 3.1.1 Open a microdata file. Both a microdata file and the metadata describing this microdata file are required. The microdata file must be either a fixed-format ASCII file or a free-format file with a specified separator. If you click File|Open Microdata you can specify the name of the microdata file and the name of the file containing the metadata. [Screenshot: Open Microdata dialog with the Microdata and Metadata file fields and the Cancel and OK buttons] The program assumes the extension .ASC for the datafile and .RDA for the metadata, but you can use your own extensions. The metadata is stored in a separate file. If the metadata file has the same name as the datafile apart from the extension, τ-ARGUS fills in this file name automatically. If no metadata file is specified, the user can enter the metadata interactively via the menu option Specify Metafile; this is also the place to make changes to the metadata. Subsection 3.1.2 describes the metadata file for τ-ARGUS. When you enter or change the metadata interactively, the option Specify Metafile brings you to this screen: [Screenshot: Specify metafile window showing the attributes of the variable Regio]
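The automatic fill-in of the metadata file name comes down to swapping the extension. Below is a minimal sketch of that convention using the standard `pathlib` module. The function names are hypothetical, and the choice of `.rda` as the default reflects only the convention stated above.

```python
from pathlib import Path
from typing import Optional

def metadata_candidate(microdata: str, meta_ext: str = ".rda") -> Path:
    """Same name as the microdata file, with the metadata extension."""
    return Path(microdata).with_suffix(meta_ext)

def find_metadata(microdata: str, meta_ext: str = ".rda") -> Optional[Path]:
    """Return the candidate metadata file if it exists; otherwise None,
    in which case the metadata would be entered via Specify Metafile."""
    candidate = metadata_candidate(microdata, meta_ext)
    return candidate if candidate.is_file() else None
```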
27. able, all possible hypercubes with this cell as one of the corner points are constructed. For each hypercube a lower bound is calculated for the width of the suppression interval of the primary suppression that would result from suppressing all corner points of that hypercube. Computing this bound does not require solving the time-consuming Linear Programming problem. If the bound turns out to be sufficiently large, the hypercube is a feasible solution. For each feasible hypercube, the loss of information associated with the suppression of its corner points is calculated. The hypercube that leads to the minimum information loss is selected, and all its corner points are suppressed. After all sub-tables have been protected once, the procedure is repeated iteratively. Within this procedure, when cells belonging to more than one sub-table are chosen as secondary suppressions in one of these sub-tables, they are treated like sensitive cells in the other sub-tables they belong to during further processing. The same iterative approach is used for sets of linked tables. Note that the hypercube criterion is a sufficient but not a necessary criterion for a safe suppression pattern. Thus for particular sub-tables the best suppression pattern may not be a set of hypercubes, in which case the hypercube method will of course miss the best soluti
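For a two-dimensional table the hypercubes are rectangles, and the lower bound described above has a simple closed form. The sketch below is a simplified illustration and not GHMiter itself. It assumes non-negative interior cells, published and fixed margins, a symmetric required margin and, as cost, the sum of the three extra corners. Suppressing the corners a=(i,j), b=(i,m), c=(k,j) and d=(k,m) lets the sensitive cell rise by up to min(b, c) and fall by up to min(a, d).

```python
from itertools import product

def best_rectangle(table, cost, i, j, margin):
    """Simplified 2-D hypercube search for the sensitive cell (i, j).

    A rectangle is feasible when both the upward room min(b, c) and the
    downward room min(a, d) of the sensitive cell reach `margin`. Among the
    feasible rectangles, the one with the lowest cost of the three extra
    corners is returned as (loss, corners), or None if none is feasible."""
    rows, cols = len(table), len(table[0])
    best = None
    for k, m in product(range(rows), range(cols)):
        if k == i or m == j:
            continue  # a rectangle needs a different row and column
        a, b, c, d = table[i][j], table[i][m], table[k][j], table[k][m]
        if min(b, c) < margin or min(a, d) < margin:
            continue  # bound too small: not a feasible hypercube
        loss = cost[i][m] + cost[k][j] + cost[k][m]
        if best is None or loss < best[0]:
            best = (loss, [(i, j), (i, m), (k, j), (k, m)])
    return best
```

Here the cell values double as costs (`cost = table`), which mirrors using the response variable as the cost function.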
28. ables specified. Pressing the Compute tables button makes τ-ARGUS actually compute the requested tables, and you are ready to start the process of disclosure control. τ-ARGUS comes back with the main window showing you the number of unsafe cells per variable per dimension. [Screenshot: Specify Tables window; the list at the bottom contains a table with explanatory variables GK and Regio, response and shadow variable Var2, the dominance rule with n = 3 and k = 75, and a minimum frequency of 1] 3.1.4 Open a Table. This is the second of the options outlined at the beginning of section 3.1; it is reached by selecting Open a Table in the main window of τ-ARGUS. The datafile containing the table to be opened, in the format given below, needs to be specified in the top line. A metadata file can be entered explicitly by the user in this window or created within the program. A simple alternative is to apply the simple metadata option, where th
29. about possible disclosure risks that a frequency count table poses, and of possible disclosure scenarios, in order to simulate the behaviour of an intruder. Such an analysis would probably yield different insights than a simple thresholding rule such as the one sketched in the reference just mentioned. See for instance Leon Willenborg and Ton de Waal (1996), Statistical Disclosure Control in Practice, Springer-Verlag, New York, Section 6.3. 2.3 Table redesign. If a large number of sensitive cells are present in a table, this might indicate that the spanning variables are too detailed. In that case one could consider combining certain rows and columns in the table. This might not always be possible because of publication policy; otherwise the number of secondary cell suppressions might simply be too large. The situation is comparable to the case of microdata containing many unsafe combinations: rather than eliminating them with local suppressions, one can remove them by using global recodings. For tabular data we use the phrase table redesign to denote an operation analogous to global recoding in microdata sets. The idea of table redesign is to combine rows, columns, etc. by adding the contents of corresponding cells from the different rows, columns, etc. It is a property of the dominance rule that a joint cell is safer than any of the individual cells. So as a result of this operation th
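Table redesign can be illustrated by pooling the contributions of two rows cell by cell and re-applying a safety rule. In the sketch below, each cell is a list of contributions, the pooled respondents are assumed to be distinct, and the (n, k) dominance rule with n = 3 and k = 75 stands in for whatever rule is actually configured. All names are hypothetical.

```python
def dominance_unsafe(contribs, n=3, k=75):
    """(n, k) dominance rule on one cell's list of contributions."""
    total = sum(contribs)
    return total > 0 and sum(sorted(contribs, reverse=True)[:n]) > k / 100 * total

def merge_rows(table, r1, r2):
    """Combine rows r1 and r2 by pooling the contributions of
    corresponding cells; the merged row is appended at the end."""
    merged = [a + b for a, b in zip(table[r1], table[r2])]
    rest = [row for idx, row in enumerate(table) if idx not in (r1, r2)]
    return rest + [merged]

def count_unsafe(table):
    return sum(dominance_unsafe(cell) for row in table for cell in row)

table = [[[30, 10], [5, 5, 5, 5, 5]],
         [[10, 10, 10], [6, 6, 6, 6]],
         [[10, 10, 10, 10, 10], [9, 9, 9, 9, 9]]]
```

In this made-up table the first column of rows 0 and 1 holds two unsafe cells. After `merge_rows(table, 0, 1)`, the joint cell has five contributors and passes the rule.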
30. abular data, Universitat Rovira i Virgili (Spain), microdata. [Figure: The CASC tabular data team] 1 Introduction. The growing demand from researchers, policy makers and others for ever more detailed statistical information leads to a conflict. Statistical offices collect large amounts of data for statistical purposes. Respondents are only willing to provide the statistical offices with the required information if they can be certain that these offices will treat their data with the utmost care. This implies that their confidentiality must be guaranteed, which imposes limitations on the amount of detail in the publications. Practice and research have generated insights into how to protect tables, but the problem is certainly not definitively settled. Before going into more detail on the ideas on which τ-ARGUS is based, we give a sketch of the general ideas. At first sight one might find it difficult to understand that information presented in tabular form presents a disclosure risk. After all, one might say, the information is presented only in aggregate form. 2 Producing safe tables. Safe tables are produced from unsafe ones by applying certain SDC measures to the tables. In this section these SDC measures, as far as they are implemented in τ-ARGUS, are discussed. Some key concepts such as sensitive cells, information loss and the like are discus
31. al or not, are checked for primary suppression. Once all primary unsafe cells are known, the secondary cell suppressions have to be found in such a way that each sub-table of the base table is protected and that the different tables cannot be combined to undo the protection of any of the other sub-tables. The basic idea behind the top-down approach is to start with the highest levels of the variables and calculate the secondary suppressions for the resulting table. The suppressions in the interior of the protected table are then transported to the corresponding marginal cells of the tables that appear when crossing lower levels of the two variables. All marginal cells, both suppressed and not suppressed, are then fixed in the calculation of the secondary suppressions of that lower-level table, i.e. they are not allowed to be secondarily suppressed. This procedure is repeated until the tables constructed by crossing the lowest levels of the spanning variables have been dealt with. A suppression pattern at a higher level only introduces restrictions on the marginal cells of lower-level tables. Calculating secondary suppressions in the interior while keeping the marginal cells fixed is then independent between the tables on that lower level, i.e. all these sub-tables can be dealt with independently of each other. Moreover, added primary suppressions in the interior of a lower-level table are dealt with at that same level: secondary suppressions
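The top-down approach needs an order in which the marginals of every sub-table have already been fixed by an earlier, higher-level table. In the following sketch, (i, j) labels the sub-tables that break down depth-i categories of the first variable and depth-j categories of the second, with depth 0 being the total. We assume that the marginals of (i, j) come from (i - 1, j) and (i, j - 1); sorting by i + j is then one valid order. This is an illustration under that assumption and not necessarily the exact order HiTaS uses.

```python
def top_down_order(depths_a, depths_b):
    """Order the level pairs (i, j) of two hierarchies so that every pair
    comes after (i - 1, j) and (i, j - 1); ties are broken by i."""
    pairs = [(i, j) for i in range(depths_a) for j in range(depths_b)]
    return sorted(pairs, key=lambda p: (p[0] + p[1], p[0]))
```

For two hierarchies with 2 and 3 levels of parent categories, the order is (0, 0), (0, 1), (1, 0), (0, 2), (1, 1), (1, 2).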
32. an be set to Safe but not to Protected, as a cell selected for primary suppression cannot be selected for secondary suppression. Of course, if it is declared Safe it is still a candidate for secondary suppression. The second pane on the right, Change Status, allows you to change the cell status. Only a few logical transitions are allowed. Changing the cell status can be useful if an unsafe cell may be published for external reasons (e.g. permission to publish). Changing a cell to Protected prevents the system from selecting it as a secondary suppression. The Recode button brings you to the recoding system. Recoding is a very powerful method of protecting a table: collapsed cells usually have more contributors and therefore tend to be much safer. Recoding is only available when the table has been computed from microdata. [Screenshot: Global Recode window with the variable GK selected, a Maximum level option, a Missing Values field and the Read, Apply, Undo and Close buttons] In the above example we have selected the Region variable to recode. This window behaves differently depending on whether the selected variable is hierarchical or not. In the hierarchical case the codes are shown in a hierarchical tree, and the standard Windows facilities for manipulating trees can be applied here as well. Folding and unfolding of branches is carried out by clicking on the
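The Change Status rules can be expressed as a small transition table. The manual states only two transitions explicitly: Unsafe to Safe is allowed, and Unsafe to Protected is not. It also says that zero and empty cells cannot be suppressed. Every other entry in `ALLOWED` below is a hypothetical reading, added purely for illustration.

```python
from enum import Enum

class Status(Enum):
    SAFE = "safe"
    UNSAFE = "unsafe"
    PROTECTED = "protected"
    ZERO = "zero"
    EMPTY = "empty"

# Only UNSAFE -> SAFE (allowed) and UNSAFE -> PROTECTED (forbidden) come from
# the manual; the remaining entries are assumptions for this sketch.
ALLOWED = {
    Status.UNSAFE: {Status.SAFE},
    Status.SAFE: {Status.UNSAFE, Status.PROTECTED},
    Status.PROTECTED: {Status.SAFE},
    Status.ZERO: set(),
    Status.EMPTY: set(),
}

def can_change(current: Status, new: Status) -> bool:
    return new in ALLOWED.get(current, set())

def is_secondary_candidate(status: Status) -> bool:
    """Only safe cells may be picked as secondary suppressions."""
    return status is Status.SAFE
```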
33. be unsafe or safe. Datafile 1 shows 2 explanatory variables, 1 response variable and an indication of whether a cell is safe or unsafe. [Listing of datafile 1] Datafile 2 shows 2 explanatory variables, 1 response variable, the cell frequency, the top 3 values in each cell and an indication of whether a cell is safe or unsafe. [Listing of datafile 2: one record per cell giving the two codes, the cell value, the frequency, the top 3 values and the status, e.g. "2 C 100 3 40 30 20 u"; cells 2 C and 2 D are marked u (unsafe)] The next stage is to view the table; the operations that follow are as for entering metadata. Below is an example of the table for datafile 1, with the unsafe cells declared in the datafile highlighted. This table will be explained in depth i
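A record of datafile 2 can be read field by field. The sketch below assumes the field order shown in the listing: codes, value, frequency, top values, status. It splits on whitespace by default, or on a given field separator. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class CellRecord:
    codes: Tuple[str, ...]   # one code per explanatory variable
    value: float             # response variable (cell total)
    frequency: int           # number of contributors
    top: List[float]         # largest contributions
    status: str              # 's' (safe) or 'u' (unsafe)

def parse_record(line: str, n_explanatory: int = 2, n_top: int = 3,
                 sep: Optional[str] = None) -> CellRecord:
    fields = [f.strip() for f in line.split(sep)]
    codes = tuple(fields[:n_explanatory])
    value = float(fields[n_explanatory])
    frequency = int(fields[n_explanatory + 1])
    start = n_explanatory + 2
    top = [float(x) for x in fields[start:start + n_top]]
    status = fields[start + n_top]
    return CellRecord(codes, value, frequency, top, status)

def unsafe_cells(lines, **kwargs):
    """Codes of all cells declared unsafe in the datafile."""
    return [r.codes for r in map(lambda l: parse_record(l, **kwargs), lines)
            if r.status == "u"]
```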
34. boxes on the top. Clicking on Table Summary gives you an overview of the variables involved and the number of safe and unsafe cells at the current stage, i.e. before or after recoding or secondary suppressions have been carried out. When you click on a cell, information about this cell is shown in the Cell Information pane: (1) the cell value, (2) the cell status, (3) the number of contributors to the cell and (4) the largest contributors. The status of a cell can be: • Safe: does not violate the safety rule. • Safe from manual: manually made safe during this session. • Unsafe: according to the safety rule. • Unsafe from manual: manually made unsafe during this session. • Suppressed: made unsafe by the secondary cell suppression. • Protected: cannot be selected as a candidate for secondary cell suppression. • Zero: the value is zero and the cell cannot be suppressed. • Empty: no records contributed to this cell and the cell cannot be suppressed. [Screenshot: Summary for table no. 1 (explanatory variable Regio, response variable Var2, not yet protected): Safe manual 0, Unsafe 13, Unsafe manual 0, Protected 0, Secondary 0, Secondary manual 0, Empty 47] These status options can only be changed when it is logically correct to do so. For example, after computing a table, an Unsafe cell (i.e. one selected for primary suppression) c
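The Table Summary is essentially a count of cells per status. The sketch below uses the standard `collections.Counter` and the status categories listed above. The snake_case status strings are our own naming, not τ-ARGUS identifiers.

```python
from collections import Counter

# Status categories from the Table Summary window, in our own naming.
STATUSES = ["safe", "safe_manual", "unsafe", "unsafe_manual", "protected",
            "secondary", "secondary_manual", "zero", "empty"]

def table_summary(cell_statuses):
    """Count the cells per status, reporting 0 for absent statuses."""
    counts = Counter(cell_statuses)
    unknown = set(counts) - set(STATUSES)
    if unknown:
        raise ValueError(f"unknown status: {sorted(unknown)}")
    return {s: counts.get(s, 0) for s in STATUSES}
```

Feeding it 13 unsafe and 47 empty cells reproduces those two counts from the summary screenshot.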
35. can only occur in the same interior, since the marginal cells are kept fixed. However, when a low-level table contains several empty cells, it may be impossible to find a solution if one is restricted to suppressing interior cells only. Unfortunately, backtracking is then needed. Obviously all possible sub-tables should be dealt with in a particular order, such that the marginal cells of the table under consideration have already been protected as the interior of a previously considered table. To that end certain groups of tables are formed in a specific way (see De Wolf 2002). All tables within such a group are dealt with separately using the mixed integer approach. The number of tables within a group is determined by the number of parent categories the variables have one level up in the hierarchy. A parent category is defined as a category that has one or more sub-categories. Note that the total number of sub-tables that have to be considered thus grows rapidly. Singletons. Singleton cells should be treated with extra care. The single respondent in such a cell could easily undo the protection if no extra measures were taken. The most dangerous situation is that there are only two singletons in a row, or one singleton and one other primary unsafe cell. These singletons could easily disclose the other cell. In the current implementation we have made sure that at least two singletons in one row or column
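The rapid growth in the number of sub-tables can be made concrete by counting parent categories per depth. The sketch below assumes that a sub-table is spanned by one parent category of each variable (with its direct children). Under that assumption, the group for depths (i, j) contains one table per pair of parent categories. The hierarchies are given as hypothetical `{code: [children]}` dictionaries.

```python
from collections import defaultdict

def parents_per_depth(tree, root="Total"):
    """Count parent categories (codes with at least one sub-category)
    at each depth of a hierarchy given as {code: [children]}."""
    counts = defaultdict(int)
    frontier, depth = [root], 0
    while frontier:
        nxt = []
        for code in frontier:
            children = tree.get(code, [])
            if children:
                counts[depth] += 1
                nxt.extend(children)
        frontier, depth = nxt, depth + 1
    return dict(counts)

def subtables_per_group(tree_a, tree_b, root_a="Total", root_b="Total"):
    """Number of sub-tables in each group (i, j) of parent depths."""
    pa = parents_per_depth(tree_a, root_a)
    pb = parents_per_depth(tree_b, root_b)
    return {(i, j): pa[i] * pb[j] for i in pa for j in pb}
```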
36. cannot disclose each other's information. To achieve this, we increase the protection margins of these singletons such that the margin of the largest is greater than the cell value of the smallest. References on HiTaS: Fischetti, M. and J.J. Salazar González (1998), Models and Algorithms for Optimizing Cell Suppression in Tabular Data with Linear Constraints, Technical Paper, University of La Laguna, Tenerife. De Wolf, P.P. (2002), HiTaS: a heuristic approach to cell suppression in hierarchical tables, Proceedings of the AMRADS meeting in Luxembourg. Additional reading on the optimisation models can be found at the CASC website: http neon vb cbs nl casc RelatedPapers html 99wol heu r pdf. 2.10 Network solution for large unstructured 2-dimensional tables. Here only the introduction and references of a large document by Jordi Castro are shown. The network flows package for cell suppression (NF CSP) implements two heuristics for the protection of statistical data in two-dimensional tables. The heuristics are improved (i.e. faster) versions of those originally presented in [5] and [7] for the secondary cell suppression problem. General details about the algorithms implemented in NF CSP can be found in [3]; a thorough description will be provided in a future paper. In the first heuristic, derived from [5], only flows of 0 or 1 are sent through the network; we will refer to it as the 0-1 flows heuristic. The second will be denoted as the n f
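The singleton measure described above can be applied to one row or column at a time. This sketch raises the margin of the largest singleton above the value of the smallest one. The increment of 1 assumes integer-valued cells, and the function name and argument layout are hypothetical.

```python
def singleton_margins(values, frequencies, margins):
    """For one row or column holding two or more singletons (frequency 1),
    raise the protection margin of the largest singleton so that it
    exceeds the value of the smallest singleton. Returns new margins."""
    margins = list(margins)
    singles = sorted((idx for idx, f in enumerate(frequencies) if f == 1),
                     key=lambda idx: values[idx])
    if len(singles) >= 2:
        smallest, largest = singles[0], singles[-1]
        if margins[largest] <= values[smallest]:
            margins[largest] = values[smallest] + 1  # assumes integer values
    return margins
```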
37. crodata is read into the program. Currently, when tabular data are entered, only one table can be entered, which rules out linked tables. 4.5 The Output menu. 4.5.1 Output|Save Table. Basically you have three options for storing the tables: (1) As a CSV file. This comma-separated file can easily be read into Excel; often it is better to click on this file in Windows Explorer than to read it into Excel using the File|Open option of Excel. (2) As a CSV file for a pivot table. This lets you use the pivot table facilities of Excel; the status of each cell can optionally be added. (3) As a file in the format code, value, separated by commas. Here the cell status is again optional, and empty cells can be left out of the output file if required. Finally, a report is generated in a directory specified by the user. This report is shown when the table has been written; as it is an HTML file, it can easily be viewed later. 4.5.2 Output|View Report. Views the report file generated with Output|Save Table. [Screenshot: View Report window; the τ-ARGUS report lists the date and time the table was created (03/07/2002, 15:31:34), the original data file, the metadata file, the table file and the table structure]
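The third output option (code, value, separated by commas, with optional status and without empty cells) can be sketched with the standard `csv` module. The exact layout τ-ARGUS writes is not specified here. Writing suppressed cells as X is an assumption of this sketch, made so that no sensitive value reaches the file.

```python
import csv
import io

SUPPRESSED = {"unsafe", "secondary"}  # hypothetical status names

def write_code_value(cells, with_status=False, skip_empty=False):
    """Write (codes, value, status) cells as 'code,...,value[,status]' lines."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for codes, value, status in cells:
        if skip_empty and status == "empty":
            continue
        shown = "X" if status in SUPPRESSED else value  # never publish suppressed values
        row = [*codes, shown]
        if with_status:
            row.append(status)
        writer.writerow(row)
    return out.getvalue()
```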
38. denotes a level in the hierarchy. This is the traditional system for common code lists such as industry codes (NACE). Additionally, in τ-ARGUS you can specify that two or more digits together denote one level. (2) If the coding system does not itself contain the information about the hierarchy, this information should be supplied to τ-ARGUS in an extra file (default extension .hrc). The layout of this file has already been explained in section 3.1.2. This file stores the tree structure of the hierarchy; a special character is used to indicate the depth of a code in the hierarchy. In this example an has been chosen. Additionally, for each variable the missing values (at least one) should be specified. In many surveys two missing values are used, e.g. one for "don't know" and one for "refusal". Specify Metafile for tabular data: the window is similar to that for microdata, with a few changes. Notably, the Status Indicator can be altered along with the code for the total. This window is not operational if the simple metadata option is chosen. [Screenshot: Specify metafile window (free format) for tabular data] 4.3.2 Specify|Specify Tables. In this dialog box you can specify the tables you want to protect. In one run of τ-ARGUS more than one table can be specified, but the tables will be protected separately. Also you have to specify the
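Both ways of describing a hierarchy can be sketched in a few lines: splitting a code into its levels by digit groups, and reading depths from the leading special character of an .hrc-style file. The digit widths are an example. The leading string `@` is a hypothetical choice for illustration only, because the character used in the manual's example is not legible.

```python
def levels_from_digits(code, widths):
    """Split a code into its hierarchy prefixes by digit groups,
    e.g. widths (1, 1, 2) turns '4512' into ['4', '45', '4512']."""
    prefixes, pos = [], 0
    for w in widths:
        pos += w
        if pos > len(code):
            break
        prefixes.append(code[:pos])
    return prefixes

def parse_hrc(lines, lead="@"):
    """Return (code, depth) pairs from an .hrc-style file in which each
    repetition of the leading string adds one level of depth."""
    if not lead:
        raise ValueError("leading string must not be empty")
    result = []
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue
        depth = 0
        while line.startswith(lead):
            line = line[len(lead):]
            depth += 1
        result.append((line.strip(), depth))
    return result
```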
39. des. As τ-ARGUS now has facilities to protect hierarchical tables, you can instruct τ-ARGUS here on the nature of these hierarchical codelists. There are basically two options: (1) The hierarchy can be derived from the digits of the individual codes. Each digit denotes a level, and if required some digits can be grouped together. (2) A file containing the hierarchical structure is specified. In this file the nesting level is indicated by a special character string. In the example above the had been selected. 3.1.2 The Metafile. The metafile describes the variables in the microdata file: both the record layout and some additional information necessary to perform the SDC process. Each variable is specified on one main line followed by one or more option lines. (1) The first line gives the name of the variable, followed by its starting position in each record, the width of the field and either one or two missing value indicators. (2) The following lines specify particular characteristics of the variable: • <RECODEABLE>: this variable can be recoded and used as an explanatory variable. • <CODELIST>: this explanatory variable has a codelist, whose name is specified here. • <NUMERIC>: this variable can be used as a cell item. • <DECIMALS>: the number of decimal positions for this variable. • <WEIGHT>: this variable contains the weighting scheme. • <HIE
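The metafile structure described above (a main line per variable followed by tagged option lines) can be read with a small parser. The sketch assumes whitespace-separated main-line fields and 1-based starting positions. It also treats a line as an option line if it starts with `<` once stripped. These are all assumptions, and `Region.cdl` in the usage example is a hypothetical file name.

```python
def parse_metafile(text):
    """Parse 'name start width missing1 [missing2]' main lines, each
    followed by option lines such as '<RECODEABLE>' or '<DECIMALS> 2'."""
    variables, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("<"):
            if current is None:
                raise ValueError("option line before any variable line")
            tag, _, arg = line.partition(">")
            current["options"][tag[1:].upper()] = arg.strip() or True
        else:
            name, start, width, *missing = line.split()
            current = {"start": int(start), "width": int(width),
                       "missing": missing, "options": {}}
            variables[name] = current
    return variables

def read_field(record, meta):
    """Extract a fixed-format field (starting positions are 1-based)."""
    s = meta["start"]
    return record[s - 1:s - 1 + meta["width"]]

example = """Regio 1 2 99
  <RECODEABLE>
  <CODELIST> Region.cdl
Var2 3 5 99999
  <NUMERIC>
  <DECIMALS> 2
"""
```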
40. dify table. With Suppress you can protect your table by selecting additional cells to be suppressed; this is necessary to make a safe table. In this version of τ-ARGUS you can choose between the hypercube method and the optimal solutions. We have implemented both the full optimal and the modular (partial) search algorithms for hierarchical tables; the modular one uses the HiTaS approach (see section 2.9). This partial method breaks the hierarchical table down into several non-hierarchical tables, protects them and composes a protected table from the smaller pieces. As this method uses the optimisation routines, an LP solver is required: either Xpress or CPlex. It is the responsibility of the users of τ-ARGUS to apply for a licence for one of these commercial packages themselves. Information on obtaining such a licence can be found in a readme file supplied with the software. Just select one of the options and press the Suppress button; τ-ARGUS will start working and finally show you a protected table, with the secondary suppressed cells shown in blue. If you have selected the hypercube method, τ-ARGUS will ask you to complete the following information: [Screenshot: GHMiter specifications window with the option Protection against inferential disclosure required and external a priori bounds on the cell values of 100%] The ratio parameter is
41. e Manual Library, ILOG, 2000. [7] Kelly, J.P., Golden, B.L. and Assad, A.A., Cell suppression: disclosure protection for sensitive tabular data, Networks 22 (1992), 28-55. [8] Castro, J., User's and programmer's manual of the network flows heuristics package for cell suppression in 2D tables, Technical Report DR 2003 07, Dept. of Statistics and Operations Research, Universitat Politècnica de Catalunya, Barcelona, Spain, 2003. See http neon vb cbs nl casc deliv 41D5 NF Tau Argus pdf. 2.11 Functional design of τ-ARGUS. [Figure: Functional design of τ-ARGUS, with boxes for Microdata, Microdata description, Specify table(s), Specify safety criteria, Tabulation, Select table(s), Interactive table redesign, Identify sensitive cells, Secondary cell suppression (Hypercube; Modular with XPress/CPlex; Optimal with XPress/CPlex; Network), Generate safe tabular data, Safe table(s), Tabular data, Table description, Read table and Disclosure report] 3 A tour of τ-ARGUS. This section gives the reader an introduction to the use of τ-ARGUS. Some Windows experience is assumed. Section 4 gives a more systematic description of the different parts of τ-ARGUS. 3.1 Preparation. To start disclosure control with τ-ARGUS there are two possible options: Open a microdata file (3.1.1 to 3.1.3) and Open
42. e can be used to identify the primary unsafe cells. It is also possible to specify no safety rule apart from a minimum frequency value. A further option is the Request Rule, a special option applicable in certain countries to foreign trade statistics. Here cells are protected when the largest contributor represents more than, for example, 70% of the total and that contributor has asked for protection. You therefore need a variable indicating the request. This rule cannot be applied together with any other rule. The request code asked for will replace the value for that contributor. When a cell is set manually to unsafe (an option discussed later), τ-ARGUS cannot calculate safety ranges itself, so the user must supply a safety percentage manually. The safety range corresponding to the minimum frequency rule is approximately zero, so as a rule a small positive value should be entered in the box Minimum frequency range. When you have filled in everything, click the down-arrow button to transport all the specified parameters to the list window at the bottom. You can specify as many tables as you want, but as the memory of a computer is still restricted, you should not overdo it. If you want to modify an already specified table, you press the button. The user can specify more than one response variable; this produces a table for each of the response variables using the spanning vari
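The Request Rule as described above can be sketched directly: a cell is protected when its largest contributor exceeds a threshold share of the total and that contributor asked for protection. The per-contribution `requested` flags are a hypothetical data layout, and 70% is the example threshold from the text.

```python
def request_rule_unsafe(contributions, requested, pct=70.0):
    """Request rule sketch: protect the cell when the largest contributor
    exceeds pct percent of the cell total and that contributor asked
    for protection (requested[i] flags contribution i)."""
    if not contributions:
        return False
    total = sum(contributions)
    top = max(range(len(contributions)), key=lambda i: contributions[i])
    return bool(requested[top]) and contributions[top] > pct / 100 * total
```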
43. e dimension of the table is specified along with a label for the marginal totals. If the simple metadata route is followed, click on Go Directly to create the table; here basic metadata can be generated. The Manual Safety Range is equivalent to the Manual Safety Range in the Specify Tables window. The Field Separator is the character that separates the fields in the datafile. The next allowed operation is to view the table. [Screenshot: Open Table file window with the table data file and table meta data file fields, the Simple meta data option with Field separator, Dimension, Label for total and Safety range, and the Cancel and Go Directly buttons] If the Table metadata file option is chosen, the OK button can be clicked even if no file is selected; this allows the metafile to be specified under Specify Metafile, where the variables can be specified as required. The options are: • Explanatory Variable: the spanning variables used to produce the table. • Response Variable: the variable used to calculate the cell totals. • Weight Variable: if each cell has an associated weight, the variable can be declared here. • Frequency: the number of observations making up the cell total; if there is no frequency variable, each cell is assumed to consist of a single observation. • topN variable: this shows the values
44. e number of unsafe cells is reduced. One can try to eliminate all unsafe combinations in this way, but that might lead to an unacceptably high information loss. Instead one could stop at some point and eliminate the remaining unsafe combinations with other techniques, such as cell suppression. 2.4 Secondary cell suppression. Once the sensitive cells in a table (of either magnitude or frequency count type) have been identified, possibly following table redesign, it might be a good idea to suppress these values. If there are no constraints on the possible values in the cells of a table, this is easy: one simply removes the cell values concerned and the problem is solved. In practice, however, this situation hardly ever occurs. Instead, the cell values are constrained by the published marginals and by lower bounds on the cell values (typically 0). The problem then is to find additional cells that should be suppressed in order to protect the sensitive cells. The additional cells should be chosen in such a way that the interval of possible values for each sensitive cell is sufficiently large. What counts as sufficiently large is specified by the data protector through the protection intervals. In general the secondary cell suppression problem turns out to be hard if the aim is to retain as much information in the table as possible, which is of course a quite natural requirement. The optimisation prob
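The role of the marginals is easy to see for a single row whose total is published. If only one cell is suppressed, it can be recomputed exactly. With two or more suppressed cells, each one is only known to lie between 0 and the unexplained remainder. This sketch deliberately uses only that one row constraint and a lower bound of 0. It ignores column totals, which in a real table would narrow the intervals further.

```python
def row_intervals(values, suppressed, row_total):
    """Feasible range of each suppressed cell in one row with a published
    total, using only that row constraint and lower bounds of 0."""
    published = sum(v for idx, v in enumerate(values) if idx not in suppressed)
    remainder = row_total - published
    if len(suppressed) == 1:
        return {idx: (remainder, remainder) for idx in suppressed}  # disclosed
    return {idx: (0, remainder) for idx in suppressed}
```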
45. e rule and the p% rule, we compute symmetric safety ranges automatically. As a rule, the minimum frequency range should be a small positive value. A manual safety range is also required for cells that have been made unsafe by intervention of the user, so in these two cases the user must provide a safety range percentage. When you have filled in everything, click the down-arrow button to transport all the specified parameters to the list window at the bottom. You can specify as many tables as you want, but as the memory of a computer is still restricted, you should not overdo it. If you want to modify an already specified table, you press the button. Pressing the Compute tables button makes τ-ARGUS actually compute the requested tables, and you are ready to start the process of disclosure control. τ-ARGUS comes back with the main window showing you the number of unsafe cells per variable per dimension. 4.3.3 Specify|Table Metadata. This option is specifically intended for applying safety rules to the table prior to primary suppression. As secondary suppression is an option, the Manual Safety Range has to be set; this is equivalent to the Manual Safety Range in the Specify Tables window. [Screenshot: Specify table window with the explanatory, frequency, response and top-N variables, the cost function for secondary suppression and the safety rule (dominance rule, p% rule or none)]
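A manual safety range percentage can be read as a symmetric protection interval around the cell value. The sketch below checks whether the interval an intruder can derive covers that required interval. Treating the percentage as plus or minus that share of the cell value is our reading of "symmetric safety ranges", not a documented τ-ARGUS formula.

```python
def protection_interval(value, safety_pct):
    """Symmetric protection interval of +/- safety_pct percent of the value."""
    delta = abs(value) * safety_pct / 100
    return value - delta, value + delta

def is_protected(feasible_low, feasible_high, value, safety_pct):
    """True when the derivable interval covers the required one."""
    low, high = protection_interval(value, safety_pct)
    return feasible_low <= low and feasible_high >= high
```

With a safety range of 30%, a cell of 200 needs an interval covering at least 140 to 260.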
46. e statistical institutes (the CASC partners). At first sight the CASC project team may appear rather large. However, there is a clear structure in the project defining which partners work together on which tasks. Sometimes groups working closely together have been split into independent partners only for administrative reasons. Institutes: 1 Statistics Netherlands; 3 University of Plymouth (UK); 4 Office for National Statistics (ONS, UK); 6 The Victoria University of Manchester (UNIMAN, UK); 7 Statistisches Bundesamt (D); Institut d'Estadística de Catalunya (IDESCAT); 10 Instituto Nacional de Estadística; 11 TU Ilmenau; 12 Institut d'Investigació en Intel·ligència Artificial (CSIC); 13 Universitat Rovira i Virgili; 14 Universitat Politècnica de Catalunya. Although Statistics Netherlands is the main contractor, the management of this project is a joint responsibility of the steering committee. This steering committee consists of 5 partners, representing the 5 countries involved and each bearing responsibility for a specific part of the CASC project. CASC Steering Committee (institute, country, responsibility): Statistics Netherlands, Netherlands, overall manager and software development; Istituto Nazionale di Statistica, Italy, testing; Office for National Statistics, UK; Statistisches Bundesamt, Germany, T
47. e to the next box. From the left box with explanatory variables you can select the variables that will be used in the rows or columns of the table. Up to four explanatory variables can be selected to create a table. The response variable: the response variable is used to calculate the cell totals. From the list of response variables you can select one as the cell item; this is the variable for which the table to be protected is calculated. The shadow variable: the shadow variable is used to apply the safety rule. By default this is the response variable, but another variable can be selected. The safety rules are based on the characteristics of the largest contributors to a cell. If a variable other than the response variable is a better indicator of the size of a company, that variable can be used here; for example, turnover, as a proxy for the size of the enterprises, can be a suitable variable for applying the dominance rule even though the table is constructed with another variable. The cost variable: this variable describes the cost of each cell. These costs are minimised when the secondary suppressed cells are calculated (see section 2.5). By default this is the response variable, but another choice is possible. It is also possible to use the frequency of the cells as a cost function. This will minimise the number
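The choice of cost variable determines which suppression pattern counts as cheapest. The sketch below compares three cost functions: the response variable (the default), the cell frequency, and unity (one per cell, an option visible in the Specify table window). The cell layout and names are hypothetical.

```python
def suppression_cost(cells, suppressed, cost_function="response"):
    """Total cost of suppressing the given cells. `cells` maps a cell key
    to a dict with 'value' (response total) and 'frequency'."""
    costs = {
        "response": lambda c: c["value"],       # default cost variable
        "frequency": lambda c: c["frequency"],  # total contributors suppressed
        "unity": lambda c: 1,                   # number of cells suppressed
    }
    try:
        f = costs[cost_function]
    except KeyError:
        raise ValueError(f"unknown cost function: {cost_function}") from None
    return sum(f(cells[key]) for key in suppressed)

cells = {"a": {"value": 500, "frequency": 2},
         "b": {"value": 20, "frequency": 9},
         "c": {"value": 30, "frequency": 7}}
```

Suppressing cells b and c is cheaper than suppressing a when the response variable is the cost. With frequency or unity costs, the single cell a is cheaper.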
48. ect that was partly sponsored by the EU under contract number IST-2000-25069. This support is highly appreciated. The CASC (Computational Aspects of Statistical Confidentiality) project is part of the Fifth Framework Programme of the European Union. The main part of τ-ARGUS has been developed at Statistics Netherlands by Aad van de Wetering and Ramya Ramaswamy, who wrote the kernel, and Anco Hundepool, who wrote the interface. However, this software would not have been possible without the contributions of several others, both partners in the CASC project and outsiders. The German partners at the Statistisches Bundesamt, Sarah Giessing and Dietz Repsilber, have contributed the GHMITER software, which offers a solution for secondary cell suppression based on hypercubes. Peter-Paul de Wolf has built a search algorithm based on the non-hierarchical optimal solutions. This algorithm breaks a large hierarchical table down into small non-hierarchical sub-tables, which are then protected. The optimisation routines have been developed by J.J. Salazar of the University of La Laguna, Tenerife, Spain. Additionally, Jordi Castro has developed a solution based on networks. For solving these optimisation problems τ-ARGUS uses commercial LP solvers. Traditionally we use Xpress as the LP solver; this package is kindly made available to users of τ-ARGUS under a special agreement between the τ-ARGUS team and Dash Optimization, the developers of Xpress. Alternatively, τ-ARGUS can also use the
49. for the marginal totals can be entered, along with the Field Separator and the Safety range (see section 3.1.4 for more details). [Screenshot: Open Table file window using the Simple meta data option] The default extension for a table is .tab, and the default extension for an imported metadata file is .rda; other extensions can be used if required. If no table metadata file is specified, more detailed metadata can be created (see sections 4.3.1 and 4.3.3). [Screenshot: Open Table file window with the additional option Frequency available] 4.2.3 File|Exit. Terminates the τ-ARGUS session. 4.3 The Specify menu. 4.3.1 Specify|Metafile for microdata. Clicking on Specify Metafile gives the user the opportunity either to edit the metafile already read in or to enter the metafile information directly at the terminal. This option is only used when the data have been entered in the form of microdata. If an already calculated table has been entered, modifications to the metafile are
heme can be specified, as can be seen below for the variable GK.

[Figure: the Global Recode window with the edit box for a global recode]

The syntax is as follows. A recode groups together categories as they appear in the original microdata and makes each group a new category with a new code. Each line in the recode file corresponds to one new category. The code of the new category is placed before a colon; the old categories to be grouped into the new category are placed after the colon. Single categories are separated by commas, and a hyphen placed between two categories refers to all categories between and including these two. If a hyphen is only placed before or after a category, it refers to all categories before or after (and including) this category, respectively.

Example: the line
7: -4, 6, 8-10, 13-
means that the categories 0, 1, 2, 3, 4, 6, 8, 9, 10, 13, 14 (if present) are recoded as 7.

Via the Read button you can read an existing recode scheme. In the non-hierarchical case you can also change the codes for missing values here if you wish, and indicate a new codelist.

The Suppress pane
The Suppress button is the most important button. It activates the modules for computing the necessary secondary suppressions. There are a number of options here:
• Hypercube
• Modul Xpress
• Modul Cplex
• Network
• Opt Xpress
• Opt Cplex
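The recode syntax above lends itself to a small parser. The sketch below is not τ-ARGUS code: the function names `parse_recode_line` and `categories_recoded` are hypothetical, and comparing bounds as integers is an assumption chosen to reproduce the worked example (the reference section on global recoding states that τ-ARGUS compares codes as strings).

```python
def parse_recode_line(line):
    """Split a recode line such as '7: -4, 6, 8-10, 13-' into the new code
    and a list of (low, high) bounds; None marks an open end."""
    new_code, sep, rest = line.partition(":")
    if not sep:
        raise ValueError("missing colon in recode line: " + line)
    items = []
    for part in rest.split(","):
        part = part.strip()
        if "-" in part:
            low, high = (s.strip() for s in part.split("-", 1))
            items.append((low or None, high or None))
        else:
            items.append((part, part))  # a single category
    return new_code.strip(), items


def categories_recoded(items, categories):
    """Existing categories covered by the parsed items.
    Assumption: bounds are compared as integers, as in the worked example."""
    covered = []
    for cat in categories:
        for low, high in items:
            if (low is None or cat >= int(low)) and (high is None or cat <= int(high)):
                covered.append(cat)
                break
    return covered


new_code, items = parse_recode_line("7: -4, 6, 8-10, 13-")
print(new_code, categories_recoded(items, range(15)))
# 7 [0, 1, 2, 3, 4, 6, 8, 9, 10, 13, 14]
```

Only categories that actually occur are returned, which mirrors the "(if present)" remark in the example.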
ingle respondent, who can often reasonably be assumed to know that he is the only respondent, could use his knowledge of the amount of his own contribution to recalculate the value of any other suppressed corner point of this hypercube.
• For tables presenting magnitude data, τ-ARGUS will ensure that GHMITER selects secondary suppressions that protect the sensitive cells properly, at least to the extent possible. It is assumed that users of the table can estimate any cell value to within some percentage of its actual value in advance of the publication (the so-called a priori bound). By default τ-ARGUS assumes this percentage to be 100%, but the user can change it in the screen below.

[Figure: the GHMiter specifications window, "Additional parameters for the use of GHMiter", with the check box "Protection against inferential disclosure required" and the field "100 % external a priori bounds on the cell values"]

Considering the given a priori bounds, τ-ARGUS will compute a suitable sliding protection ratio (for an explanation see [5]), to be used by GHMITER to make it select secondary suppressions that are sufficiently large. τ-ARGUS will display the value of this ratio in the report file. This approach ensures that a user of the resulting protected table, using (apart from the assumed a priori information) only information provided by the data of the protected table, would normally not be able to derive any bounds for the contribution of any respondent to a particular sensitive cell close e
is is the same option as Output|Save table.

3.3 Saving the safe table
When the table is safe, it can be written to the hard disk of the computer. You have three options:
1. CSV format: a format easily read by Excel.
2. CSV format for pivot table: convenient for manipulating the table in Excel.
3. Code-value: this writes the table as a text file, with each line showing a row and column value as well as the total for the cell.

[Figure: the Save Table dialog with the option "Add Status"]

And of course an HTML report file is written to a user-specified directory.

4 Description of the Menu Items
In this section we give a description of the program by menu item. The information in this section is the same as the information shown when the help facility of τ-ARGUS is invoked.

4.1 Main Window
[Figure: the τ-ARGUS main window, with the menu bar (File, Specify, Modify, Output, Help), the pane "# unsafe combinations in every dimension" and the columns Variable name, Code and Status]

There are five menu headings. Under File, either a microdata file or a tabular data file can be opened. Specify allows the metadata to be entered or edited, as well as letting the user specify the tables of particular interest along with primary suppressions. Under Modify, this table can be viewed and any secondary suppressions carried out. Output allows the suppressed table to be saved, and finally there is a
lems that will then result are quite difficult to solve and require expert knowledge in the area of combinatorial optimisation.

2.5 Information loss in terms of cell weights
In the case of secondary cell suppression, a data protector might want to differentiate between the candidate cells for secondary suppression. He/she may want to preserve the content of certain cells as much as possible and be willing to sacrifice the values of other cells instead. A mechanism that can be used to make such a distinction between cells in a table is that of cell weights. In τ-ARGUS it is possible to associate different weights with the cells in a table. The higher the weight, the more important the corresponding cell value is considered and the less likely it is to be suppressed. We interpret this by saying that cells with higher associated weights have a higher information content. The aim of secondary cell suppression can then be summarised as producing a safe table from an unsafe one while minimising the information loss, expressed as the sum of the weights associated with the cells that have been secondarily suppressed.

τ-ARGUS offers several ways to compute these weights. The first option is to compute each weight as the sum of the contributions to a cell. Secondly, the weight can be the frequency of the contributors to a cell. Finally, each cell can be weighted as one, which minimises the number of sup
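The information loss described in section 2.5 is a weighted sum over the secondary suppressions. As a minimal sketch, assuming each cell is given as a list of its contributions, the three weighting options can be compared like this; the option names "value", "frequency" and "unity" and the dictionary layout are hypothetical, not τ-ARGUS identifiers.

```python
def cell_weight(contributions, option):
    """Weight of one cell under the three options described above."""
    if option == "value":       # sum of the contributions to the cell
        return sum(contributions)
    if option == "frequency":   # number of contributors to the cell
        return len(contributions)
    if option == "unity":       # every cell counts as one
        return 1
    raise ValueError("unknown weight option: " + option)


def information_loss(cells, secondary, option):
    """Sum of the weights of the secondarily suppressed cells."""
    return sum(cell_weight(cells[c], option) for c in secondary)


cells = {"A": [10, 5], "B": [1, 1, 1], "C": [20]}
for option in ("value", "frequency", "unity"):
    print(option, information_loss(cells, ["A", "B"], option))
# value 18
# frequency 5
# unity 2
```

A secondary suppression pattern that is cheapest under one option need not be cheapest under another, which is why the choice of cost function matters.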
les. However, due to continuous demands from users, we have now also built the option to read ready-made tables, with the restriction that the options for table redesign will not be available.

A problem arises, however, when the marginals of the table are also published. It is then no longer enough to just suppress the sensitive cells, as they can easily be recalculated using the marginals. Even if it is not possible to recalculate the suppressed cell exactly, it is possible to calculate an interval that contains the suppressed cell. This is possible if some constraints are known to hold for the cell values in a table. A commonly found constraint is that the cell values are all nonnegative. If such an interval is rather small, the suppressed cell can be estimated rather precisely, which is not acceptable either. Therefore it is necessary to suppress additional information so that the intervals are sufficiently large.

Several solutions are available to protect the information of the sensitive cells:
• Combining categories of the spanning variables (table redesign). Larger cells tend to protect the information about the individual contributors better.
• Suppression of additional (secondary) cells to prevent the recalculation of the sensitive (primary) cells.

The calculation of the optimal set of secondary cells, with respect to the loss of information, is a complex OR problem. τ-ARGUS will be built around this solution and take
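The recalculation problem can be seen on a single row of a table. The sketch below is deliberately simplified: it uses only one published row total plus nonnegativity, whereas a real intruder would combine all row and column equations (typically with linear programming) and usually obtain tighter intervals. The function name `row_interval` is hypothetical.

```python
def row_interval(row_cells, row_total):
    """Bounds an intruder can derive for a suppressed cell (None) in one row,
    using only the published row total and nonnegativity."""
    missing = sum(v is None for v in row_cells)
    if missing == 0:
        raise ValueError("no suppressed cell in this row")
    remaining = row_total - sum(v for v in row_cells if v is not None)
    if missing == 1:
        return (remaining, remaining)  # recalculated exactly from the marginal
    return (0, remaining)              # each suppressed cell lies in [0, remaining]


print(row_interval([10, None, 5], 30))    # (15, 15)
print(row_interval([None, None, 5], 30))  # (0, 25)
```

With one suppression the cell is disclosed exactly; a second suppression in the same row widens the interval, which is the purpose of secondary suppression.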
lows heuristics, since the network can transport any positive flow.

The current package is linked with three network flows solvers: CPLEX 7.5 [6], PPRN [4], and an efficient implementation of the bidirectional Dijkstra's algorithm for shortest paths [1], which will be denoted as Dijkstra. Later releases of CPLEX will also work if the interface routines are the same as for version 7.5. The 0-1 flows heuristic can use any of the three solvers. The network flows problems formulated by the n-flows heuristic can only be solved with PPRN and CPLEX.

PPRN and Dijkstra were implemented at the Dept. of Statistics and Operations Research of the Universitat Politècnica de Catalunya and are included in NF-CSP. PPRN was originally developed during 1992-1995, but it had to be significantly improved within the CASC project to work with NF-CSP. Dijkstra was developed entirely within the scope of CASC. The third solver, CPLEX 7.5, is a commercial tool and requires purchasing a license. However, PPRN is a fairly good (although less robust) replacement for the network flows routines of CPLEX 7.5, so in principle there is no need for an external commercial solver. Moreover, the Dijkstra solver is by far the most efficient option, although it can only deal with problems formulated by the 0-1 flows heuristic. It should be used whenever possible for efficiency reasons.

Even though two of the three solvers are included in the distribution of NF-CSP, thi
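As a rough illustration of the technique behind the Dijkstra solver, here is a generic textbook bidirectional Dijkstra search, not the NF-CSP implementation. It runs Dijkstra's algorithm from the source and, on reversed arcs, from the target, and stops once the two frontiers can no longer improve the best path found. The adjacency-dictionary input format is an assumption.

```python
import heapq


def bidirectional_dijkstra(graph, source, target):
    """Length of a shortest source-target path in a graph given as
    {node: [(neighbour, nonnegative arc cost), ...]}; inf if unreachable.
    Nodes must be orderable (e.g. strings) because they sit in heap tuples."""
    if source == target:
        return 0.0
    # Reversed arcs, so the backward search can run from the target.
    reverse = {}
    for u, arcs in graph.items():
        for v, w in arcs:
            reverse.setdefault(v, []).append((u, w))
    adj = (graph, reverse)              # side 0: forward, side 1: backward
    dist = ({source: 0.0}, {target: 0.0})
    heaps = ([(0.0, source)], [(0.0, target)])
    settled = (set(), set())
    best = float("inf")
    while heaps[0] and heaps[1]:
        # Stop once no path through the unsettled frontiers can beat best.
        if heaps[0][0][0] + heaps[1][0][0] >= best:
            break
        side = 0 if heaps[0][0][0] <= heaps[1][0][0] else 1
        d, u = heapq.heappop(heaps[side])
        if u in settled[side]:
            continue                    # stale heap entry
        settled[side].add(u)
        for v, w in adj[side].get(u, []):
            nd = d + w
            if nd < dist[side].get(v, float("inf")):
                dist[side][v] = nd
                heapq.heappush(heaps[side], (nd, v))
            other = dist[1 - side]
            if v in other:              # the two searches meet at v
                best = min(best, nd + other[v])
    return best


g = {"a": [("b", 1), ("c", 4)], "b": [("c", 1), ("d", 5)], "c": [("d", 1)], "d": []}
print(bidirectional_dijkstra(g, "a", "d"))  # 3.0
```

Searching from both ends usually settles far fewer nodes than a one-sided search, which is the usual motivation for the bidirectional variant.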
n section 3.2.1.

[Figure: the table window for ExpVar1 × ExpVar2, with the Cell Information pane (value, status, cost, shadow, contributions, top-n of shadow, up/low levels) and the buttons Change status, Recode, Suppress and Write table]

3.2 The process of disclosure control
When the table(s) have been calculated, the main window of τ-ARGUS is shown again, with an overview of all the unsafe cells per variable over all the tables. If you have specified more than one table, you have to choose a table with Modify|Select table. In this example you can go directly to Modify|View table.

The window below is the main menu for τ-ARGUS, showing the number of unsafe combinations per variable. For example, there are 12 unsafe cells in the 2-dimensional aspects of the table and a single unsafe cell for a one-way marginal total for the variable GK. The right-hand window gives the equivalent information for each level of the variable indicated on the left.
ng should be together. The Request protection option is used if the Request Rule under Specify tables is to be applied to this particular variable. See section 3.1.3.

[Figure: the Specify metafile window (fixed format) for the explanatory variable Regio, starting position 12, length 2, decimals 0, with the codelist file c:\Program Files\Tau Argus 2\data\Region.cdl and the hierarchy taken from the file c:\Program Files\Tau Argus 2\data\region2.hrc]

Codelist
As this version of τ-ARGUS is the first version that can handle hierarchical codelists, this information should be made known to τ-ARGUS. If the automatic option is selected in the non-hierarchical case, τ-ARGUS will always explore the data file itself and make the codelist. Alternatively, the user can specify a codelist, but that is only a list containing the labels attached to the codes. These labels are only used to enhance the information displayed by τ-ARGUS on the screen; τ-ARGUS works only with the codes that it has found when it explored the data file. In the hierarchical case there are two options:
1. The hierarchy is derived from the digits in the codes. Each digit of the code
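The first hierarchical option (deriving the hierarchy from the digits of the codes) can be pictured as taking prefixes of a detailed code. The excerpt is cut off before the details, so this is only a sketch under an assumption: each level is taken to add a fixed number of characters, given as a hypothetical `widths` list; the function name `levels_from_digits` is also hypothetical.

```python
def levels_from_digits(code, widths):
    """Codes of all hierarchical levels a detailed code belongs to,
    assuming each level adds a fixed number of characters (widths)."""
    if sum(widths) != len(code):
        raise ValueError("widths do not add up to the code length")
    prefixes, end = [], 0
    for width in widths:
        end += width
        prefixes.append(code[:end])
    return prefixes


print(levels_from_digits("123", [1, 1, 1]))  # ['1', '12', '123']
print(levels_from_digits("0512", [2, 2]))    # ['05', '0512']
```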
nough to disclose this contribution according to the primary sensitivity rule in use.
• Note: if the option Protection against inferential disclosure required in the screen above is deactivated, GHMITER will not check whether secondary suppressions are sufficiently large.
• As mentioned above, GHMITER is unable to add up the protection given by multiple hypercubes. In certain situations, considering the given a priori bounds, it is not possible to provide sufficient protection to a particular sensitive cell (or secondary suppression) by suppressing one single hypercube. In such a case GHMITER is unable to confirm that this cell has been protected properly according to the specified sliding protection ratio. It will then reduce the sliding protection ratio automatically and individually, step by step, for those cells whose protection the program cannot otherwise confirm. In steps 1 to 9 the original ratio is divided by k (for values of k from 2 to 10); if this still does not help, in step 10 it is divided by an extremely large value; and finally, if even that does not solve the problem, step 11 sets the ratio to zero. The τ-ARGUS report file will display the number of cases where the sliding protection range was reduced, by finally confirmed sliding protection range. Note that the number of cases with range reduction reported by this statistic in the report file is very likely to exceed the actual number of cells concerned, because cells
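The step-by-step reduction of the sliding protection ratio described above can be written out as a schedule. This is a minimal sketch: the function name is hypothetical, and the divisor for step 10 is a placeholder, since the text only speaks of "an extremely large value".

```python
def reduction_schedule(ratio, huge_divisor=1e6):
    """Successive sliding protection ratios tried for a cell whose protection
    cannot be confirmed. huge_divisor is a placeholder value (assumption)."""
    steps = [ratio / k for k in range(2, 11)]  # steps 1-9: divide by k = 2..10
    steps.append(ratio / huge_divisor)         # step 10: extremely large divisor
    steps.append(0.0)                          # step 11: ratio set to zero
    return steps


schedule = reduction_schedule(0.3)
print(len(schedule))  # 11
```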
of records contributing to the cells to be suppressed. A third option is that the number of cells to be suppressed is minimised, irrespective of the size of their contributions (unity option).

Weight
If the data file has a sample weight specified in the metadata, the table can be computed taking this weight into account by clicking the Apply Weights option. The safety rules (see below) have been extended to allow the weights to be applied here, by clicking on the Apply Weights in Safety Rule option.

The safety rule
The concept of safety rules is explained in section 2.1. On the left side of the window the type of rule can be selected, as well as the values of the parameters. Additionally, the minimum number of contributors can be chosen. For the dominance rule and the p-rule, safety ranges can be derived automatically. The theory gives formulas for the upper limit only, but for the lower limit we have chosen a symmetric range. However, the theory does not provide any safety range for the frequency rule, and of course not for the cells that have been made unsafe by manual intervention of the user. So in these two cases the user must provide a safety range percentage.

The parameters for the dominance rule (see section 2.1) can be specified: not only the parameters n and k of the dominance rule, but also the minimum number of records contributing to a cell. Alternatively, the minimum protection of the p-rul
of the top n contributors to each cell. The predefined value for TopN is 1. The first variable declared as topN will contain the largest values in each cell, the second variable so declared will contain the second-largest values, etc.

Status Indicator allows the Status option to be highlighted in the left-hand pane, and the status codes for each cell can be changed. The most frequently occurring codes are those declaring a cell either Safe or Unsafe.

The default name for the explanatory variable occurring first in each line of the file containing the table is ExpVar1. Below are displayed the Specify metafile windows as they look for an explanatory variable and for a status variable when there is no frequency variable.

[Figures: the Specify metafile windows (free format) for an explanatory variable and for the variable Status, declared as status indicator with codes for safe and unsafe cells]

Below the Specify metafile window is displayed where frequency dat
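The topN variables hold, per cell, the largest contributions in descending order. As a minimal sketch of how such values could be derived from microdata, assuming records arrive as (cell key, contribution) pairs (a hypothetical layout), position 0 of each list corresponds to the first topN variable, position 1 to the second, and so on. Cells with fewer than n contributors simply get shorter lists here; how τ-ARGUS fills such gaps is not described in this excerpt.

```python
import heapq
from collections import defaultdict


def top_n_per_cell(records, n=1):
    """records: iterable of (cell key, contribution) pairs.
    Returns, per cell, the n largest contributions in descending order."""
    contributions = defaultdict(list)
    for cell, value in records:
        contributions[cell].append(value)
    return {cell: heapq.nlargest(n, values) for cell, values in contributions.items()}


records = [(("A", "x"), 5), (("A", "x"), 9), (("A", "x"), 2), (("B", "y"), 4)]
print(top_n_per_cell(records, n=2))
# {('A', 'x'): [9, 5], ('B', 'y'): [4]}
```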
on and lead to some overprotection. Other simplifications of the heuristic approach that add to this tendency towards over-suppression are the following. When assessing the feasibility of a hypercube to protect a specific target suppression against interval disclosure, the method
• is not able to consider protection that may already be provided by other cell suppressions (suppressed cells that are not corner points of this hypercube) within the same sub-table;
• does not consider the sensitivity of multi-contributor primary suppressions properly, that is, it does not consider the protection already provided in advance of cell suppression through aggregation of these contributions;
• attempts to provide the same relative ambiguity to (possibly large) secondary suppressions that have been selected to protect cells in a linked sub-table as if they were single-respondent primary suppressions, while actually it would be enough to provide the same absolute ambiguity as required by the corresponding primary suppressions.

The section on GHMiter has been contributed by Sarah Giessing, Federal Statistical Office of Germany, 65180 Wiesbaden; e-mail: sarah.giessing@destatis.de.

2.8 The τ-ARGUS implementation of GHMITER
• In the implementation offered by τ-ARGUS, GHMITER makes sure that a single-respondent cell will never appear to be a corner point of one hypercube only, but of at least two hypercubes. Otherwise it could happen that a s
parameters for the dominance rule and the minimum number of records in a cell.

At the moment τ-ARGUS allows only up to 4-dimensional tables, but due to the capacities of the LP solver used (Xpress or CPlex) and the complexity of the optimisations involved, 4-dimensional tables can only be protected by the hypercube method (see section 2.7).

[Figure: the Specify Tables window, with the explanatory variables (GK, Regio), the response, shadow and cost variables (Var2), the safety rule options (dominance rule, p-rule, none, request rule, minimum frequency, manual safety range, apply weights) and the Compute tables button]

The explanatory variables
On the left is the list box with the explanatory variables. When you click on > or <, you transport the selected variable to the next box. From the left box with explanatory variables you can select the variables that will be used in the row or the column of the table.

The response variable
From the list of response variables you can select a variable as response variable: the cell i
pitated Argus' eyes were planted onto a bird's tail, a type of bird that we now know under the name of peacock. That explains why a peacock has these eye-shaped marks on its tail. This also explains the picture on the cover of this manual. It is a copperplate engraving by Gerard de Lairesse (1641-1711) depicting the process in which the eyes of Argus are removed and placed on the peacock's tail.

Like the mythological Argus, the software is supposed to guard something, in this case data. This is where the similarity between the myth and the package is supposed to end, as we believe that the package is a winner and not a loser, as the mythological Argus is.

See Anco Hundepool et al., 2003, μ-ARGUS version 3.2 user's manual, Statistics Netherlands, Voorburg, The Netherlands.
This interpretation is due to Peter Kooiman and dates back to around 1992, when the first prototype of ARGUS was being built by Wil de Jong.
The original copy of this engraving is in the collection of Het Leidsch Prentenkabinet in Leiden, The Netherlands.

Contact
Feedback from users will help improve future versions of τ-ARGUS and is therefore greatly appreciated. The authors of this manual can be contacted directly, in writing or otherwise, with suggestions that may lead to improved versions of τ-ARGUS; e-mail messages can also be sent to argus@cbs.nl.

Acknowledgments
τ-ARGUS has been developed as part of the CASC proj
pressed cells.

2.6 Series of tables
In τ-ARGUS it is possible to specify a series of tables that will be protected one by one and independently of each other. Choosing this option is more efficient, since τ-ARGUS requires only a single run through the microdata to produce all the tables. For the user, too, it is often more attractive to specify a series of tables and let τ-ARGUS protect them in a single session, rather than to have several independent sessions.

2.7 The Hypercube/GHMITER method
In order to ensure tractability also for big applications, τ-ARGUS interfaces with the GHMITER hypercube method of R. D. Repsilber of the Landesamt für Datenverarbeitung und Statistik in Nordrhein-Westfalen, Germany, which offers a quick heuristic solution. The method has been described in depth in [1], [2] and [3]; for a briefer description see [4].

2.7.1 The method
The approach builds on the fact that a suppressed cell in a simple n-dimensional table without substructure cannot be disclosed exactly if that cell is contained in a pattern of suppressed nonzero cells forming the corner points of a hypercube. The algorithm subdivides n-dimensional tables with hierarchical structure into a set of n-dimensional sub-tables without substructure. These sub-tables are then protected successively in an iterative procedure that starts from the highest level. Successively, for each primary suppression in the current sub-t
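In two dimensions a "hypercube" is a rectangle: the target cell plus one cell in the same row, one in the same column and the opposite corner. The sketch below enumerates such rectangles with nonzero corners and ranks them by the total value of the cells that would newly be suppressed. It is a simplified illustration, not GHMITER: it looks at the interior of a single 2-D table without marginals, uses cell values as cost, and ignores a priori bounds, the sliding protection ratio and the checks across sub-tables. The function name is hypothetical.

```python
from itertools import product


def candidate_rectangles(table, target, suppressed=frozenset()):
    """2-D 'hypercubes' that could protect target: every corner must be a
    nonzero cell. Cost = sum of the values of the corners that would be newly
    suppressed. Returns (cost, opposite corner, new suppressions), cheapest first."""
    i, j = target
    candidates = []
    for r, c in product(range(len(table)), range(len(table[0]))):
        if r == i or c == j:
            continue  # a rectangle needs a different row and a different column
        corners = [(i, j), (i, c), (r, j), (r, c)]
        if any(table[a][b] == 0 for a, b in corners):
            continue
        extra = [(a, b) for a, b in corners[1:] if (a, b) not in suppressed]
        candidates.append((sum(table[a][b] for a, b in extra), (r, c), extra))
    return sorted(candidates)


table = [[5, 0, 3],
         [2, 4, 1],
         [6, 7, 8]]
print(candidate_rectangles(table, (0, 0))[0])
# (6, (1, 2), [(0, 2), (1, 0), (1, 2)])
```

Cells already suppressed cost nothing extra, so rectangles that reuse existing suppressions become cheaper.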
s care of the whole process.

A typical τ-ARGUS session is one in which the user is first presented with the table containing only the primary unsafe cells. The user can then choose how to protect these cells. This can involve combining categories, equivalent to the global recoding of μ-ARGUS. The result will be an update of the table with fewer unsafe cells (certainly not more, if the recoding has worked). At a certain stage the user requests the system to solve the remaining unsafe cells by finding secondary cells to protect the primary cells. At this stage the user can choose between several options to protect the primary sensitive cells: either he/she chooses the hypercube method or the optimal solution. In the latter case he/she also has to select the solver to be used (Xpress or Cplex). After this the table can be stored and published.

2.2 Sensitive cells in frequency count tables
In the simplest way of using τ-ARGUS, sensitive cells in frequency count tables are defined as those cells that contain a frequency below a certain threshold value. This threshold value is to be provided by the data protector. This way of identifying unsafe cells in a table is the one that is implemented in the current version of τ-ARGUS. It should be remarked, however, that this is not always an adequate way to protect a frequency count table, yet it is applied a lot. Rather than mechanically applying a dominance rule or a p-rule, one should think
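The threshold rule of section 2.2 amounts to a simple filter. A minimal sketch, assuming counts are stored in a dictionary keyed by cell (a hypothetical layout) and following the text literally: a cell is flagged when its count is strictly below the threshold.

```python
def unsafe_frequency_cells(counts, threshold):
    """Cells of a frequency count table whose count is below the threshold."""
    return sorted(cell for cell, count in counts.items() if count < threshold)


counts = {("A", "x"): 2, ("A", "y"): 14, ("B", "x"): 3, ("B", "y"): 7}
print(unsafe_frequency_cells(counts, threshold=3))  # [('A', 'x')]
```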
s document only describes the features of the heuristics, and from the user's point of view. A detailed description of the PPRN and Dijkstra solvers can be found in [2, 4] and [1], respectively. A full description of the network flows solution can be found in a separate document [8]. The structure of the document is as follows. Section 2 introduces a simple program that shows how to use NF-CSP from the user's application. Section 3 describes the main options and features of the package. In Section 4 we present the set of routines to interface with NF-CSP, grouped by functional categories. A final Appendix lists all the files and routines of NF-CSP.

[1] Ahuja, R.K., Magnanti, T.L., Orlin, J.B.: Network Flows. Prentice Hall, 1993.
[2] Castro, J.: PPRN 1.0 User's Guide. Technical report DR 94/06, Dept. of Statistics and Operations Research, Universitat Politècnica de Catalunya, Barcelona, Spain, 1994.
[3] Castro, J.: Network flows heuristics for complementary cell suppression: an empirical evaluation and extensions. In: J. Domingo-Ferrer (Ed.), Inference Control in Statistical Databases, LNCS 2316, 2002, 59-73.
[4] Castro, J., Nabona, N.: An implementation of linear and nonlinear multicommodity network flows. European Journal of Operational Research 92 (1996) 37-53.
[5] Cox, L.H.: Network models for complementary cell suppression. J. Am. Stat. Assoc. 90 (1995) 1453-1462.
[6] ILOG CPLEX: ILOG CPLEX 7.5 Referenc
sed as well.

2.1 Sensitive cells in magnitude tables
The well-known dominance rule is often used to find the sensitive cells in tables, i.e. the cells that cannot be published as they might reveal information on individual records. More particularly, this rule states that a cell of a table is unsafe for publication if a few (n) major contributors to the cell are responsible for a certain percentage (k) of the total of that cell. The idea behind this rule is that in that case at least the major contributors themselves can determine with great precision the contributions of the other contributors to that cell. The choice n = 3 and k = 70% is not uncommon, but τ-ARGUS allows users to specify their own choice.

As an alternative, the prior-posterior rule has been proposed. The basic idea is that a contributor to a cell has better chances of estimating the competitors in a cell than an outsider, and also that these kinds of intrusions can occur rather often. The precision with which a competitor can estimate is a measure of the sensitivity of a cell. The worst case is that the second-largest contributor is able to estimate the largest contributor. If the second-largest contributor can estimate the largest contribution to within p% of its value, the cell is considered unsafe. An extension is that the global knowledge about each cell is also taken into account. In that case we assume that each intruder has a basic knowledge of the value of each contributor of q%. Note that it is actually the ratio p/q which determines which
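Both rules of section 2.1 can be checked directly on the contributions to a cell. The sketch below uses the usual textbook formulations, not τ-ARGUS source code: the (n, k) dominance rule flags a cell when the n largest contributions exceed k% of the total, and the p/q rule flags it when the sum of all contributions except the two largest is below (p/q) times the largest. That remainder is the uncertainty left to the second-largest contributor, who knows the cell total and his own value; q = 100 gives the plain p% rule. The function names are hypothetical.

```python
def dominance_unsafe(contributions, n=3, k=70.0):
    """(n, k) dominance rule: unsafe if the n largest contributions
    make up more than k percent of the cell total."""
    total = sum(contributions)
    top = sorted(contributions, reverse=True)[:n]
    return total > 0 and sum(top) > k / 100.0 * total


def p_rule_unsafe(contributions, p, q=100.0):
    """Prior-posterior (p/q) rule: unsafe if the contributions other than the
    two largest sum to less than (p/q) times the largest contribution."""
    xs = sorted(contributions, reverse=True)
    if not xs:
        return False
    remainder = sum(xs[2:])
    return remainder < p / q * xs[0]


print(dominance_unsafe([50, 20, 10, 5, 5, 5, 5]))  # True: top 3 = 80% of total
print(dominance_unsafe([20, 20, 20, 20, 20]))      # False: top 3 = 60% of total
print(p_rule_unsafe([100, 50, 5], p=10))           # True: 5 < 10
print(p_rule_unsafe([100, 50, 30], p=10))          # False: 30 >= 10
```

Only the ratio p/q enters the p/q test, which matches the remark in the text.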
tem. This is the variable for which the table to be protected is calculated.

The shadow variable
The shadow variable is the variable that is used to apply the safety rule. By default this is the response variable, but it is possible to select another variable. The safety rules are built on the characteristics of the largest contributors to a cell. If a variable other than the response variable is a better indicator of the size of a company, that variable can be used here.

The cost variable
This variable describes the cost of each cell. These costs are minimised when the secondary suppressed cells are calculated. By default this is the response variable, but another choice is possible. It is also possible to use the frequency of the cells as a cost function; this will minimise the number of records contributing to the cells to be suppressed. A third option is that the number of cells to be suppressed is minimised, irrespective of the size of their contributions.

Weight
If the data file has a sample weight specified in the metadata, the table can be computed taking this weight into account. For this purpose the dominance rule has been extended.

The safety rule
The concept of safety rules is explained in section 2.1. On the left side of the window the type of rule can be selected, as well as the values of the parameters. Additionally, the minimum number of contributors can be chosen. For the dominanc
tti and Salazar (1998) a theoretical framework is presented that should be able to deal with hierarchical and generally linked tables. In the sequel this will be called the mixed integer approach. In that framework, additional constraints are generated for a linear programming problem. The number of added constraints, however, grows rapidly when dealing with hierarchical tables, since many dependencies exist between all possible sub-tables containing many subtotals. The implemented heuristic approach (HiTaS) deals with a large set of sub-tables in a particular order. A non-hierarchical table can be considered to be a hierarchical table with just one level. In that case the approach reduces to the original mixed integer approach and hence provides the optimal solution. In the case of a hierarchical table, the approach provides a sub-optimal solution that minimises the information loss per sub-table, but not necessarily the global information loss of the complete set of hierarchically linked tables.

In the following section a short description of the approach is given. For a more detailed description of the method, including some examples, see e.g. De Wolf (2002).

HiTaS deals with cell suppression in hierarchical tables using a top-down approach. The first step is to determine the primary unsafe cells in the base table, consisting of all the cells that appear when crossing the hierarchical spanning variables. This way all cells representing a sub-tot
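The top-down idea can be pictured for a single hierarchical spanning variable, where each sub-table consists of a parent code and its direct children. The excerpt is cut off before the exact processing order is described, so the breadth-first order below is an assumption that merely illustrates "top-down"; the hierarchy codes and the function name are hypothetical.

```python
from collections import deque


def top_down_subtables(children, root):
    """List (parent, direct children) sub-tables, highest level first,
    in breadth-first order (assumed ordering, for illustration only)."""
    order = []
    queue = deque([root])
    while queue:
        parent = queue.popleft()
        kids = children.get(parent, [])
        if kids:
            order.append((parent, kids))
            queue.extend(kids)
    return order


hierarchy = {"Total": ["Nr", "Zd"], "Nr": ["1", "2"], "Zd": ["12"]}
for parent, kids in top_down_subtables(hierarchy, "Total"):
    print(parent, kids)
# Total ['Nr', 'Zd']
# Nr ['1', '2']
# Zd ['12']
```

Processing higher levels first means that suppressions fixed at one level can be treated as given when the sub-tables below are protected.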
you enter the recoding described below manually or you read it from a file. The default extension for this file is .GRC. There are some rules for how you have to specify a recode scheme.

All codelists are treated as alphanumeric codes. This means that codelists are not restricted to numerical codes only. However, this also implies that the codes 01 and 1 are considered different codes, and that aaa and AAA are different too.

In a recoding scheme you can specify individual codes separated by a comma, or ranges of codes separated by a hyphen. The range is determined by treating the codes as strings and using the standard string comparison; e.g. 0111 < 11, as the 0 precedes the 1, and ZZ < a, as the uppercase Z precedes the lowercase a. Special attention should be paid when a range is given without a left or right value: this means every code less than or greater than the given code. In the first example below, the new category 1 will contain all the codes less than or equal to 49, and code 4 will contain everything larger than or equal to 150.

Example: for a variable with the categories 1 to 182, a possible recode is
1: -49
2: 50-99
3: 100-149
4: 150-
For a variable with the categories 01 to 10, a possible recode is
1: 01, 02
2: 03, 04
3: 05-07
4: 08, 09, 10

Don't forget the colon; if you forget it, the recode will not work. The recode 3: 05-07 is the same as 3: 05, 06, 07. You
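To make the string-comparison rule concrete, the sketch below applies an already parsed recode scheme using Python's ordinary string ordering, which, like the standard comparison described above, places "0111" before "11" and "ZZ" before "a". The scheme layout and function names are hypothetical, and taking the first matching line when ranges overlap is an assumption.

```python
def in_range(code, low, high):
    """Range test with plain string comparison; low or high may be None
    for an open range such as '-49' or '150-'."""
    return (low is None or code >= low) and (high is None or code <= high)


def apply_recode(scheme, codes):
    """scheme: list of (new_code, [(low, high), ...]); a single code is (c, c).
    Returns {old code: new code} for the codes covered (first match wins)."""
    mapping = {}
    for code in codes:
        for new_code, ranges in scheme:
            if any(in_range(code, low, high) for low, high in ranges):
                mapping[code] = new_code
                break
    return mapping


codes = ["01", "02", "03", "04", "05", "06", "07", "08", "09", "10"]
scheme = [("1", [("01", "01"), ("02", "02")]),
          ("2", [("03", "03"), ("04", "04")]),
          ("3", [("05", "07")]),
          ("4", [("08", "08"), ("09", "09"), ("10", "10")])]
print(apply_recode(scheme, codes))
print("0111" < "11", "ZZ" < "a")  # True True
```

Note that with plain string comparison an unpadded code such as "150" sorts before "49", so in this sketch an open range like "-49" only behaves numerically when the codes have equal length; how τ-ARGUS stores codes internally is not described in this section.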
