Data Publication Suite Manual - the VPH
Contents
1. [Figure 28: Advanced data field processing options; the output is the number of days since the reference date]
The reason this looks complex is that there are many ways in which one might de-identify a data item. Indeed, the resulting piece of data is often not the same as the one annotated in the source. For instance, you may wish to change date of birth to age, but then we would explicitly want the output data item to be annotated with the new concept to ensure its meaning is correct. The process may be even more sophisticated than that: the example above shows the use of the Days Since plug-in. Here we not only have a different output type, but the plug-in also requires some other information, in this case a date from which to calculate the elapsed days. Above we have chosen the recruitment date as the datum for the calculation, which is a very common way of de-identifying the dates of events in clinical trials.
6.3 File options
To be presented with the file properties window, you must have selected the checkbox on the source data item indicating that this field contains a file or folder path on the local or potent
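The arithmetic behind the Days Since plug-in can be illustrated with a short Python sketch. It is not the DPS plug-in itself; it only assumes, as in Figure 28, a hypothetical ImagingDate field converted relative to the RecruitmentDate field, with a pseudonymised patient identifier.

```python
from datetime import date


def days_since(event: date, reference: date) -> int:
    """Whole days elapsed from the reference date to the event date.

    A negative result means the event happened before the reference date.
    """
    return (event - reference).days


# Hypothetical source row; field names follow the example in Figure 28.
source_row = {
    "PatientID": "P001",  # assumed to be a pseudonym already
    "RecruitmentDate": date(2012, 3, 1),
    "ImagingDate": date(2012, 3, 15),
}

# Only the derived interval is published: both calendar dates are dropped.
published_row = {
    "PatientID": source_row["PatientID"],
    "ImagingDays": days_since(source_row["ImagingDate"], source_row["RecruitmentDate"]),
}
print(published_row)  # {'PatientID': 'P001', 'ImagingDays': 14}
```

Because only the interval is kept, the relative timing of events remains available for analysis while the absolute dates are withheld.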
2. [Figure 19: Creating a new data transformation collection]
[Figure 20: Annotating a single data instance (a Gender value mapped to the NCI Thesaurus concept Male)]
If you then select one of the values, you can choose to transform it into a standard data type, such as a number or date, or define it as an ontological concept. Figure 20 shows the latter option; as usual, you simply drag the selected concept from the ontology search results into the box. Figure 19 also shows that you can write your own plug-ins to perform this data transformation. For example, a dataset with a column of diagnosis codes could contain hundreds, if not thousands, of unique values. Annotating these by hand in the way just described is not practical, so the DPS offers a way for developers to extend the functionality in this area and automate it. With the completion of the annotation process, the data source is now completely defined and we are ready to decide how to publish it for access by other users.
4.3 View the table contents
[Figure 21: View table contents option]
This will simply display the table contents unaltered from the s
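A transformation collection amounts to a lookup table from each distinct raw value to a concept. The Python sketch below assumes a hypothetical coded Gender column and uses plain concept labels in place of real ontology identifiers; filling such a table automatically is the step a plug-in would take over for large code lists such as diagnosis codes.

```python
def build_collection(column):
    """Start a transformation collection: one entry per distinct value, unmapped."""
    # dict.fromkeys keeps the values in first-seen order and drops duplicates.
    return {value: None for value in dict.fromkeys(column)}


def apply_collection(column, collection):
    """Replace every raw value with its mapped concept label.

    Raises ValueError if any distinct value has not been mapped yet, so that
    no cell is published with an undefined meaning.
    """
    unmapped = [value for value, concept in collection.items() if concept is None]
    if unmapped:
        raise ValueError(f"unmapped values: {unmapped}")
    return [collection[value] for value in column]


# Hypothetical coded column; the real codes depend on the source system.
gender = ["1", "2", "2", "1", "2"]
collection = build_collection(gender)   # {'1': None, '2': None}
collection["1"] = "Male"                # in the DPS: a concept dragged from the ontology search
collection["2"] = "Female"
print(apply_collection(gender, collection))  # ['Male', 'Female', 'Female', 'Male', 'Female']
```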
3. … 15
3.4 RELATIONSHIPS 16
4 ANNOTATION 18
4.1 SEMANTIC DATA ANNOTATION 18
4.2 SEMANTIC DATA TRANSFORMATION 20
4.3 VIEW THE TABLE CONTENTS 21
4.4 VIEW DATA TYPES 22
5 CREATING A NEW DESTINATION 23
6 DE-IDENTIFYING THE DATASET 25
6.1 TABLE OPTIONS 25
6.2 FIELD OPTIONS 25
6.3 FILE OPTIONS 27
7 DATASET PROPERTIES 28
7.1 METADATA SEARCH PROPERTIES 30
7.2 ACCESS CONTROL 30
8 DATA PUBLICATION 31
9 QUERY T
4. [Figure 5: Excel data source configuration]
This plugin has a very similar form to the delimited file plugin in terms of configuration options: you simply select the Excel spreadsheet. Where possible, the system should then produce a table from each of the sheets within the file; these sheets are the equivalent of tables in the DPS. You can inspect the contents of a sheet by selecting it from the dropdown menu and clicking Load preview. There is also an option to select which of the sheets in the file you wish to import into the DPS. Since the process behind the scenes is to ask Excel to export the required sheets to CSV format and then load these into the DPS, the caveats and advice on checking the data typing once imported still hold.
NOTE: This plugin requires Microsoft Office to be installed on the machine in order to work. We have found significant differences in how versions of Office handle this type of remote control and
5. [Figure 8: Microsoft Access data source configuration]
The user simply browses to the location of the MDB file and opens it. You can view the contents of the tables if you wish, but there are no specific options associated with the tables and they will all be imported with the data source. As with the other database plugins, data typing is not a problem, nor should you need to define any relationships for the data source following import, as these will be picked up directly from the database.
NOTE: Since this plugin is explicitly tied to Microsoft Office, it is entirely possible that some versions will not be supported. Whilst we will endeavour to keep the system in step with developments from Microsoft, newer or indeed very old MS Access databases may not load properly. As with the Excel plugin, the fall-back position is to export the tables manually to CSV files and load them with the CSV File Collection plugin.
2.2.3 Microsoft OLE DB plugin
6. 2.3 System specific sources
2.3.1 ClearCanvas Workstation images
[Figure 10: ClearCanvas DICOM data source configuration (ClearCanvas database location, Study Tracker database location, Only upload new studies)]
This plugin provides a mechanism for managing DICOM images and depends entirely on a product called ClearCanvas Workstation. The installation files for a free version of this can be found on the VPH-Share portal. In essence, it can be used in two scenarios to benefit researchers in the imaging community. First, it can be truly integrated with a PACS (Picture Archiving and Communication System), which allows the institution to send specific studies, usually related to a research cohort; these can then be processed and de-identified for publication to VPH-Share. This offers an opportunity to minimise the effort for clinical institutions in provisioning imaging collections to their research teams. Second, it is capable of ingesting and properly structuring large collections of DICOM files. In many institutions research d
7. 4 Annotation
Annotation, in general, is the process of attaching an item of metadata (e.g. a description) to a piece of data with the intention of turning it into information. In this application the fundamental source for annotation is ontologies, and this process is known as semantic annotation.
4.1 Semantic data annotation
Data annotation is the top level of annotation and simply assigns a concept to a data table or to a column within it. In terms of system functionality this step is optional: you do not need to annotate a single element in the source data set. By default, a new set of semantic terms based on the column names of the tables will be created, and the data will be published with these into the SPARQL access point. This approach is valid, but it will make it very difficult for other users to search for and find the data set, and the resulting data set will be almost impossible to interpret without further documentation. Assuming you wish to go further and annotate the data as fully as possible, you can add annotations at both the table and column level. This is done using the ontology search panel: simply type in the name of a concept you think represents the data and click search. You will then be returned a list of concepts that contain the keywords you entered.
NOTE: Many concepts will have multiple terms from a number of different ontologies. Which one to select is still a subject for research, and ultimately the syste
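Why an unannotated column is hard to discover can be sketched in Python. The namespace and concept IRIs below are hypothetical placeholders, not what the DPS actually mints; the sketch only shows that a column falls back to a term derived from its own name unless a curator has assigned an ontology concept.

```python
import re

# Hypothetical base IRI; the namespace the DPS really uses is not documented here.
DEFAULT_NAMESPACE = "http://example.org/dps/"


def default_term(table: str, column: str) -> str:
    """Mint a placeholder term from the table and column names (no shared meaning)."""
    local_name = re.sub(r"[^A-Za-z0-9]+", "_", column).strip("_")
    return f"{DEFAULT_NAMESPACE}{table}/{local_name}"


def term_for(table: str, column: str, annotations: dict) -> str:
    """Prefer the ontology concept a curator assigned; else fall back to the default."""
    return annotations.get((table, column), default_term(table, column))


# Hypothetical annotation: only the Gender column has been given a concept.
annotations = {("PatientTable", "Gender"): "http://example.org/ontology/Gender"}
print(term_for("PatientTable", "Gender", annotations))         # http://example.org/ontology/Gender
print(term_for("PatientTable", "Survival Days", annotations))  # http://example.org/dps/PatientTable/Survival_Days
```

A consumer searching for the Gender concept would find the first column, while the second can only be found by someone who already knows its name.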
8. [Figure 9: OLE DB data source configuration (property grid with ConnectionString and Provider, extended properties DataSource, Type, Fmt, Hdr and Imex, and standard properties Database, DataSource, Password and UserId)]
This is perhaps the most unfriendly plugin for the standard user, but in fact the most powerful from a data management perspective. It is capable of connecting to a very large range of data sources, and its capability depends on other components already installed on the operating system. For this reason it is not possible to give an accurate description of how it might behave on the user's machine. The http://www.connectionstrings.com/net-framework-data-provider-for-ole-db web site gives a good overview of the types of connections one might make with this plugin, but there are many other internet resources available if you search for "oledb".
NOTE: We do not envisage that many users will attempt to use this plugin, but if you are not able to get your data loaded into the DPS, you can request support from the team and we will work with you to find a set of properties for this plugin that meets your needs, in lieu of us developing a more bespoke and friendly plugin for that specific data source.
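The properties in Figure 9 ultimately combine into a single OLE DB connection string of semicolon-separated Key=Value pairs. The Python sketch below illustrates that format only; the helper name is hypothetical, the quoting rule shown is a common one rather than the DPS plugin's own code, and the provider name is an example that depends on the drivers installed locally.

```python
def build_connection_string(properties: dict) -> str:
    """Join OLE DB properties into a 'Key=Value;' connection string.

    A value containing a semicolon or a double quote is wrapped in double quotes
    (embedded double quotes are doubled) so it is read as a single value.
    """
    parts = []
    for key, value in properties.items():
        text = str(value)
        if ";" in text or '"' in text:
            text = '"' + text.replace('"', '""') + '"'
        parts.append(f"{key}={text}")
    return ";".join(parts) + ";"


# Example values only: the provider depends on what is installed on the machine.
props = {
    "Provider": "Microsoft.ACE.OLEDB.12.0",
    "Data Source": r"C:\data\cohort.xlsx",
    "Extended Properties": "Excel 12.0 Xml;HDR=YES;IMEX=1",
}
print(build_connection_string(props))
# Provider=Microsoft.ACE.OLEDB.12.0;Data Source=C:\data\cohort.xlsx;Extended Properties="Excel 12.0 Xml;HDR=YES;IMEX=1";
```

The quoting matters because the extended properties themselves contain semicolons; unquoted, they would be read as separate top-level properties.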
9. whilst we have tested as many as we have access to, it is possible the process will fail on some installations. In this case the fall-back is to use Excel manually to export the sheets to CSV files, and then use either the CSV File Collection plugin (if there are multiple sheets to be imported) or the Delimited Data File plugin (if there is only one).
2.1.3 CSV File Collection
[Figure 6: CSV File Collection data source configuration (CSV Files location, NULL field string, First row has column names, Data is enclosed in quotes, Load preview of the first 50 rows)]
Much of this functionality has been discussed above.
10. 6.1 Table options
There are only two options for tables: Include (the default) or Exclude.
NOTE: If you exclude a table, this will override any options applied to the fields within it, and no data from the table will be published.
[Figure 26: De-identifying table options (Include, Exclude, View Processed Table)]
6.2 Field options
As with tables, the options to publish a field unaltered (Include) or to withhold it from the publication (Exclude) exist on every field. However, there is an additional option, the Properties item.
[Figure 27: De-identifying data field options]
When selected, you will be presented with a properties window in the middle area of the interface that allows you to transform this data item.
[Screenshot: field properties window for ImagingDate, with Enable Deidentification selected, the Days Since de-identification plugin chosen, and PatientTable RecruitmentDate as the reference date]
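The precedence rule in the NOTE above (a table exclusion overrides every field option) can be expressed as a short Python sketch. The profile structure is hypothetical and is not how the DPS stores its de-identification profiles.

```python
from enum import Enum


class Option(Enum):
    INCLUDE = "Include"
    EXCLUDE = "Exclude"


def published_fields(profile: dict) -> dict:
    """Return {table: [fields to publish]} for a de-identification profile.

    profile maps each table name to (table_option, {field_name: field_option}).
    An excluded table publishes nothing, whatever its fields say.
    """
    result = {}
    for table, (table_option, fields) in profile.items():
        if table_option is Option.EXCLUDE:
            continue  # the table-level choice overrides every field option
        result[table] = [name for name, opt in fields.items() if opt is Option.INCLUDE]
    return result


# Hypothetical profile loosely based on the example destination.
profile = {
    "PatientTable": (Option.INCLUDE, {"PatientID": Option.INCLUDE,
                                      "address": Option.EXCLUDE,
                                      "country": Option.INCLUDE}),
    "Images": (Option.EXCLUDE, {"ImagePath": Option.INCLUDE}),
}
print(published_fields(profile))  # {'PatientTable': ['PatientID', 'country']}
```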
11. HE WEB DATASET 32
1 Overview
The Data Publication Suite (DPS) is designed to support the process of publishing clinical or research data sets in a secure, internet-accessible way. When we use the term data sets, we primarily refer to structured and potentially relational data sources such as CSV extracts and relational databases. Once published, the data is made accessible through SPARQL and an SQL-type protocol from the OGSA-DAI project. The general process for publication is as follows, although many of the steps are not mandatory:
- Import a data source.
- Define relationships between the tables, if they exist and are not automatically detected.
- Semantically annotate the data.
- Create a destination container on the server (if you have permissions).
- Create a new destination based on a data source.
- Define a de-identification profile for this destination.
- Publish the data.
- Manage the access list for the resource.
The DPS contains a large number of features for managing all of these requirements, and this document will go through each of them in turn.
[Screenshot: the Data Publication Suite main window, showing the ACS_Dataset1 data source (ClinicalExams, Comorbidities, Medications, PatientTable and Spirometry tables) and the sth_acs_subset1 data destination]
12. User Manual
VPH-Share Data Publication Suite
Version: 12
Date: 30 02 2013
Authors: S. Wood, R. Knight
Institution: Sheffield Teaching Hospitals
Contents
1 OVERVIEW 3
2 CONNECT TO A DATA SOURCE 5
2.1 TEXT BASED SOURCES 6
2.1.1 DELIMITED DATA FILE 6
2.1.2 EXCEL DATA FILE 7
2.1.3 CSV FILE COLLECTION 8
2.2 DATABASE SOURCES 9
2.2.1 MICROSOFT SQL, MYSQL AND ARQ 9
2.2.2 MICROSOFT ACCESS DATABASE 10
2.2.3 MICROSOFT OLE DB PLUGIN 11
2.3 SYSTEM SPECIFIC SOURCES 12
2.3.1 CLEARCANVAS WORKSTATION IMAGES 12
2.3.2 XNAT IMPORTER 13
3 WORKING WITH TABLES AND FIELDS 14
3.1 SOURCE PANEL COLOUR CODING 14
3.2 RENAMING 14
13. You browse to a folder which contains a number of CSV files, and the system will load them all as separate tables. Viewing the table data and the data typing issues are as described above, so they will not be reiterated here.
NOTE: The term CSV stands for Comma-Separated Values, and this plugin requires exactly that. The term CSV is often abused, and we commonly receive delimited files using semicolons or tabs which have a .csv extension. If you have these types of data, either use the Delimited Data File plugin (if there is only one file) or transform the data yourself using something like Excel.
2.2 Database sources
2.2.1 Microsoft SQL, MySQL and ArQ
[Figure 7: Database server data source configuration (Type Description, TypeName, Connection String, Description, Name)]
These plugins are identical from a user's perspective and so will be discussed together. The configuration is rather unfriendly to the naive user, but would be obvious to any administrator with experience of managing these types of database. In essence there is only one option beyond the name and description of the data source, and this is the Connection St
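Before choosing between the CSV File Collection and Delimited Data File plugins, it can help to check what a ".csv" file really uses as its separator. This Python sketch uses the standard library's csv.Sniffer, restricted to comma, semicolon and tab; the sample data is a hypothetical excerpt, and the sniffer is heuristic, so the DPS preview remains the final check.

```python
import csv


def detect_delimiter(sample: str) -> str:
    """Guess which of comma, semicolon or tab separates the fields in sample."""
    return csv.Sniffer().sniff(sample, delimiters=",;\t").delimiter


def suits_csv_file_collection(sample: str) -> bool:
    """True if the text really is comma-separated, whatever the file extension."""
    return detect_delimiter(sample) == ","


# Hypothetical excerpt of a semicolon-delimited file saved with a .csv extension.
sample = (
    "Subject ID;MRI ID;Group\n"
    "OAS2_0001;OAS2_0001_MR1;Nondemented\n"
    "OAS2_0002;OAS2_0002_MR1;Demented\n"
)
print(detect_delimiter(sample))           # ;
print(suits_csv_file_collection(sample))  # False
```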
14. a row in another table when the values in the two fields match.
[Figure 15: Showing the creation of a relationship by dragging one field onto another]
In the example data, each of the tables links to the PatientTable table via the Patient Identifier field. Figure 15 shows the creation of the relationship between the ClinicalExams and PatientTable tables. To display the Relationships window for a table, right-click on the table (see Figure 14) and select Relationships. When the Relationships window is displayed, fields from the Sources tree view can be dragged onto fields in the Relationships window to create a relationship. Once a field is dropped into the Relationships window, a confirmation window is shown which lets you review the fields you have chosen for the relationship and confirm the type and direction of the relationship. Figure 16 shows this confirmation window displaying a preview of the relationship to be created. There are two ty
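What a relationship means (rows relate when the values in the two fields match) can be sketched in Python. The rows are hypothetical, and the sketch assumes a one-to-many link in which one patient can have several clinical exams; it also lists child rows with no matching patient, which usually indicates a wrong key field.

```python
from collections import defaultdict


def related_rows(parent_rows, child_rows, key):
    """One-to-many link: for each parent key, the child rows whose key matches."""
    by_key = defaultdict(list)
    for row in child_rows:
        by_key[row[key]].append(row)
    # .get() does not create entries, so parents without children map to [].
    return {parent[key]: by_key.get(parent[key], []) for parent in parent_rows}


def orphan_rows(parent_rows, child_rows, key):
    """Child rows whose key value matches no parent row."""
    parent_keys = {parent[key] for parent in parent_rows}
    return [row for row in child_rows if row[key] not in parent_keys]


# Hypothetical rows shaped like PatientTable and ClinicalExams.
patients = [{"PatientIdentifier": "P1"}, {"PatientIdentifier": "P2"}]
exams = [
    {"PatientIdentifier": "P1", "HeartRate": 72},
    {"PatientIdentifier": "P1", "HeartRate": 80},
    {"PatientIdentifier": "P3", "HeartRate": 65},
]
links = related_rows(patients, exams, "PatientIdentifier")
print({k: len(v) for k, v in links.items()})              # {'P1': 2, 'P2': 0}
print(orphan_rows(patients, exams, "PatientIdentifier"))  # [{'PatientIdentifier': 'P3', 'HeartRate': 65}]
```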
15. ata is stored on CD/DVD or a central filestore with little or no formal structuring or search capability. The ClearCanvas Workstation can be used to import these collections and allow the users to achieve this structuring, as well as offering an image viewing and processing capability. The ClearCanvas Workstation 2.0 SP1 user's guide, available from the ClearCanvas web site (http://www.clearcanvas.ca), describes how to use it as a DICOM image management system. This plugin extracts the subject information from the image store and publishes it as a coupled database and filestore for the DICOM metadata and image files. When used in conjunction with the ClearCanvas DICOM Upload plugin (see the section on handling file references), the DPS can also de-identify the DICOM files and database records, so images that were originally identifiable can be made available for secondary use. The ClearCanvas database is located under the installation folder and by default can be found at C:\Program Files\clearcanvas\clearcanvas workstation\dicom_datastore\viewer.sdf. The tracking database can be located anywhere you wish, as it is created when the source is imported.
2.3.2 XNAT Importer
16. [Figure 11: XNAT data source plugin]
This plugin connects via web services to an XNAT instance and extracts data from it into a form that is easily queried. It also extracts all of the imaging data and uploads it to the cloud storage services, adding URL links to the locations of the images as it does so. In many ways it utilises the PACS storage model, except that there is often significantly more subject-level data. In terms of what data can be extracted from the system, we have reached the stage where the core data relating to the patients, visits and imaging studies can be extracted. XNAT does, however, allow custom data collection forms to be added, and these cannot be catered for automatically by the plugin. Because of this, every system we have connected to has required some development work on the XNAT Importer to extract all of the data effectively. By default the system will extract only the information it knows about, so this will not prevent publication; but if you require a more comprehensive integration, please contact steven.wood@sth.nhs.uk to discuss how this might be achieved. The importer has an advanced viewer which allows the user to browse the contents of the XNAT repository and get an understanding of what the exported data might look like. Often people who use the XNAT system are unaware of the c
17. ccession may fail, but you should be informed that a process is already underway.
9 Query the Web Dataset
[Figure 35: Query the published dataset (destination context menu: Publish, Exclude, Properties, Query Web Dataset)]
[Figure 36: Start to build a SPARQL query (roottable fields such as smoker, country, PatientID, moving image and fixed image are listed with their types)]
When we run this query with the Run query button, we get the following:
[Query results: Number of Records 10, with columns roottable PatientID, roottable country, roottable gender and roottable moving image; each moving image value is an https URL to a DICOM file in the LOBCDER store]
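The query built in Figure 36 is a SPARQL SELECT. The Python sketch below only builds the text of such a query; it assumes a hypothetical mapping in which each table row is an RDF resource and each column a predicate under a placeholder prefix, so the real vocabulary should be taken from the DPS query builder rather than from this example.

```python
import re


def build_select(prefix_iri: str, columns: list, limit: int = 10) -> str:
    """Build a SPARQL SELECT returning the given columns for each row resource.

    The row-as-resource, column-as-predicate layout is an assumption for
    illustration, not the documented DPS data model.
    """
    # SPARQL variable and local names cannot contain spaces, so normalise them.
    names = [re.sub(r"\W+", "_", column).strip("_") for column in columns]
    projection = " ".join(f"?{name}" for name in names)
    patterns = "\n".join(f"  ?row tbl:{name} ?{name} ." for name in names)
    return (
        f"PREFIX tbl: <{prefix_iri}>\n"
        f"SELECT {projection}\n"
        f"WHERE {{\n{patterns}\n}}\n"
        f"LIMIT {limit}"
    )


# Hypothetical prefix; the column names follow the example query.
print(build_select("http://example.org/roottable#",
                   ["PatientID", "country", "gender", "moving image"]))
```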
18. [Figure 33: Managing access for a role on the dataset (current access list, with a search panel for adding users and groups)]
8 Data publication
At this point, all that is left to do is publish the data set to the server. This is shown in the context menu in Figure 34.
[Figure 34: Publish the destination dataset]
This will do everything necessary to upload the data to the server, and whilst the process is in progress you will be shown a window indicating its progress. Please note that some of the steps can be time-consuming, especially Uploading Data, which is heavily dependent on your network connection.
NOTE: Whilst the client-side process can complete relatively quickly, it can still take several minutes for the server to complete its work and upload the metadata etc. across the VPH-Share network. This means that repeating the publish process in quick su
19. ially networked drive. The options on this page describe how this is to be handled. The first decision is whether the file references should be uploaded at all. If you check the Enable file upload box, the next two controls become enabled. First, you must select a folder in the LOBCDER filestore to upload the files to. Second, there is an option to use a plug-in to process the file or folder before uploading it to the network. Since these plug-ins can be produced by anyone, it is not possible to show a list of available options or what the interface may look like once one is selected, but we will use the example of the DICOM file processor supplied with the DPS to outline the process.
[Figure 29: File processing properties window (Enable file upload, Use file processing plugin: DICOM, Root upload folder, metadata fields, and Plugin Properties: PatientId, PatientName, Temporary directory)]
At the bottom of the window is an area titled Plugin Properties, which is generated by the plugin itself. In this case it asks for a Patient ID, a Patient Name and a location on the local disk where it can temporarily store the processed files before they are uploaded to the LOBCDER services. The other two fields can either have simple text put into them, in which case this inf
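The contract of a file processing plug-in (take a file plus the Plugin Properties, stage a processed copy in the temporary directory for upload) can be sketched in Python. Everything here is hypothetical: the function is not the DPS plug-in API, and a plain dict stored as JSON stands in for DICOM attributes, which a real plug-in would handle with a DICOM library.

```python
import json
import os
import tempfile


def process_file(header: dict, patient_id: str, patient_name: str, temp_dir: str) -> str:
    """Stage a de-identified copy of a file's header and return the staged path.

    The three inputs mirror the Plugin Properties in Figure 29. The source data
    is never modified; only the staged copy is uploaded.
    """
    cleaned = dict(header)                # copy, so the original stays intact
    cleaned["PatientID"] = patient_id     # replacement values from the properties
    cleaned["PatientName"] = patient_name
    for tag in ("PatientBirthDate", "PatientAddress"):  # example identifying tags
        cleaned.pop(tag, None)
    staged_path = os.path.join(temp_dir, f"{patient_id}.json")
    with open(staged_path, "w", encoding="utf-8") as handle:
        json.dump(cleaned, handle)
    return staged_path


original = {"PatientID": "123456", "PatientName": "DOE^JANE",
            "PatientBirthDate": "19600101", "Modality": "MR"}
staging = tempfile.mkdtemp()
path = process_file(original, "ANON001", "ANON001", staging)
with open(path, encoding="utf-8") as handle:
    print(json.load(handle))
# {'PatientID': 'ANON001', 'PatientName': 'ANON001', 'Modality': 'MR'}
```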
20. m will rank the results in an intelligent way, but this does not exist yet.
[Figure 17: Display annotation properties for source field]
Once you have searched for a term, adding it to the table or data field is as simple as dragging it onto the item in the source tree, or dropping it into the box on the field properties window, as shown in Figure 18.
[Screenshot: the Gender field properties window beside the ontology search results, which list terms from the NCI Thesaurus, LHDL_MasterOntology and SNOMED]
21. [Figure 32: Dataset properties (data node address; permission management for the read-only, read-write and administrators groups; metadata fields Name, Category, License, Description, Status and Tags; and an ontology search panel for adding semantic annotations)]
The properties panel for a destination dataset contains two main components: the area used to display and update the global metadata for the resource, and the access control for the resource.
7.1 Metadata Search properties
The primary use of the metadata added to any resource is to enable data discovery. With this in mind, there are three primary fields that support this: the free te
22. n from a source. If you have not already logged into the system, you will be asked to do so at this point, after which you will see a list of all data instances on the server. Select the one created for you and press OK.
[Figure 25: Connecting to a data node and selecting a publication container (Add Destination dialog: enter the Data Node Address given to you by your data node administrator, then choose a dataset)]
Within the destinations section of the interface you will see a mirror image of the table and field structures shown in the sources section, but the options applied to each of these items are now quite different. This is the area where the de-identification profile is created for a given instance. Note that there can be many destinations for the same source, which is the equivalent of creating multiple views of the same data set for different users or groups.
6 De-identifying the dataset
There are many options for de-identifying the dataset. In particular, you can write your own tools and embed them into the application to handle specific data types, so it is not possible to give examples of all the functionality you may find in any given installation. We will, however, go over the core components and some examples of the freely provided tools to give an idea of the process.
23. ndow. The left section of this window lists the installed source types, which can be selected, and the right section displays the configuration of the selected source.
[Figure 3: Highlighting the add source button]
2.1 Text Based Sources
2.1.1 Delimited data file
[Figure 4: Delimited data source configuration (Source name, CSV file to load, Separator: Comma, Semicolon or Tab, NULL field string, First row has column names, Data is enclosed in quotes, Load preview of the first 50 rows)]
This plugin is one of the core plugins of the system and also one of the most difficult to develop, since data can be provided in hugely varied forms. The interface is relatively self-evident in its use, but there are steps that you should perform before proceeding with the data import. First, always load the preview t
24. [Figure 18: Annotating data items with ontological terms]
Should it not be possible to find an ontological term that accurately describes the data item, the current fall-back is to add a free-text description of the data so that this can be passed through to the person consuming the data.
4.2 Semantic data transformation
Sometimes simply annotating the column of data is not enough to fully define the meaning of the actual data held in it. For instance, we annotated the data field in 4.1 with the concept Gender. This, however, is not helpful if the contents of the field are simply … or …, with no indication of which one relates to male and which to female. To help with this issue, we have created facilities for transforming the actual data within the fields to concepts as well, to help with general understanding and querying of the data. The simplest form of this process is to create a new transformation collection, as shown in Figure 19. Once we do this, the software analyses the data in the field and returns the unique values, so we can now add a more formal definition of what each element means.
[Screenshot: Gender properties window; the Data Transformation list offers None, ICD 10 Plugin and Create new]
25. o see what the system makes of the file provided: it attempts to guess what the delimiters are, but this can be incorrect depending on the file format. Second, check the data types the plugin has assigned to the data. There are many ways in which the data can be interpreted, and for you to use the data effectively via the cloud services it needs to be in the correct format. The NULL field string option tells the system to leave an entry in the database truly empty if it contains this string; in database parlance this is called a null entry. What often happens when exporting from another system is that empty entries contain the string null, NULL or even Empty. It is not a problem to publish data with these entries, but anyone querying the data then needs to know that, instead of asking for empty entries, they must ask for entries equal to null (or whatever the value is). Querying for empty entries is a very common scenario, as an empty entry quite often indicates that something has not been done and action is required, so it is important that the publisher and the consumers of the data share the same understanding of this concept. 2.1.1.1 Handling dates and times Dates and times are particularly problematic, as they can be provided in a large number of different formats. For instance, 01/06/2014 in the US would be interpreted as 6 Jan 2014 and in Euro
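The two checks above (confirm the guessed delimiter, and agree on how null entries are written) can be illustrated with Python's standard csv module. This is a sketch under stated assumptions, not the DPS plugin: csv.Sniffer stands in for the DPS's own delimiter guessing, and load_delimited and the NULL_TOKENS set are hypothetical names chosen for the example.

```python
import csv
import io

# Strings some exporting systems write instead of leaving a cell empty.
# Which tokens apply to a given file is an assumption to check per dataset.
NULL_TOKENS = {"null", "NULL", "Empty"}


def load_delimited(text, null_tokens=NULL_TOKENS):
    """Guess the delimiter and header, then read rows with nulls as None."""
    sample = text[:4096]
    sniffer = csv.Sniffer()
    dialect = sniffer.sniff(sample, delimiters=",;\t")
    has_header = sniffer.has_header(sample)
    reader = csv.reader(io.StringIO(text), dialect)
    # Map every null token to None so "empty" means the same thing everywhere.
    rows = [[None if cell in null_tokens else cell for cell in row]
            for row in reader]
    header = rows.pop(0) if has_header and rows else None
    return dialect.delimiter, header, rows


sample = "id;name;weight\n1;Ann;61.5\n2;NULL;70.2\n"
delimiter, header, rows = load_delimited(sample)
print(delimiter)  # ;
print(header)     # ['id', 'name', 'weight']
print(rows)       # [['1', 'Ann', '61.5'], ['2', None, '70.2']]
```

The sniffer is a heuristic, just as the DPS guess is, which is why previewing the result before importing remains the important step.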
26. omplexity of the data model, but when the data is extracted into a relational database this becomes apparent and may be overwhelming. The only real data the plugin needs at present is the URL of the XNAT server and some login credentials. [Screenshot: source tree for a data source with fields such as Last Name, moving image and PatientID] Figure 12 Source tree showing tables and fields 3 Working with Tables and Fields 3.1 Source panel colour coding The icon next to each table shows its linking status: either the table is linked into the rest of the tables (or there is only one table and it does not need linking), or the table is not linked and probably should be. Green text on any of the items in the data tree indicates that the field has a semantic annotation assigned. 3.2 Renaming Some dataset schemas are easily readable by a human and can be used without modification, but some naming conventions are a little verbose (like the example data) and some use unrelated names such as Field1. To make these easier to use, the user can optionally rename the tables and fields without modifying the source. To modify a name, right-click on the item, which displays a context menu (see Figure 14), and click Rename. The name then becomes editable in the source tree view, in the same way as the rename action in Windows Explorer. After renaming the fields of our example data, there is a vast improvement in its readability. Figure 13 shows a diag
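Renaming as described keeps the source untouched and stores the new names separately. The hypothetical Python sketch below shows one way such an alias table could work; SchemaAliases and its methods are illustrative names, not part of the DPS.

```python
class SchemaAliases:
    """Display names kept alongside the source schema.

    The source names are never modified; renaming only records a mapping
    from the original name to the name shown to users.
    """

    def __init__(self):
        self._display = {}

    def rename(self, source_name, display_name):
        self._display[source_name] = display_name

    def display_name(self, source_name):
        # Items that were never renamed are shown under their source name.
        return self._display.get(source_name, source_name)

    def source_name(self, display_name):
        # Map a display name back to the name the source actually uses.
        for source, shown in self._display.items():
            if shown == display_name:
                return source
        return display_name


aliases = SchemaAliases()
aliases.rename("Field1", "Date of Birth")   # hypothetical unrelated name
print(aliases.display_name("Field1"))        # Date of Birth
print(aliases.source_name("Date of Birth"))  # Field1
```

Keeping the reverse lookup means any query written against the readable names can still be translated back to the original source fields.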
27. ormation will be used in every instance of the process over all records in the dataset. However, you can drag and drop any data item from the data sources area into these fields. Once this is done, this data item will be used during the processing of every file associated with the current record, and if the field in question has also been de-identified or encrypted, the new value will be used in the process. The section on metadata is a placeholder for now and is not functional in this version of the DPS. In time, this will be the metadata attached to the files or folders once they are uploaded to the network. [Screenshot: Data Destinations tree with roottable fields address, Last Name, moving image and PatientID shown in different colours] Figure 30 Colour-coded displays on field items As shown above, fields are colour coded to indicate at a glance which of the three options has been applied: red for Excluded, black for Included and green for Processed. 7 Dataset Properties Figure 31 shows how to get to the destination dataset properties. [Screenshot: Destinations trees for sth_acs_dataset and sth_acs_subset1, each with the tables ClinicalExams, Comorbidities, Medications, PatientTable and Spirometry]
28. ource 4.4 View Data types When dealing with text-based file sources, it is especially important to view what the system thinks the data types are. As described earlier, there are many ways in which this interpretation can be performed, and inspecting this list may prompt you to re-export the data in a slightly different format if problems arise. [Screenshot: PatientData source context menu with Set as key, Rename, Relationships, Properties and View Data Types] Figure 22 Display data types option [Field Analysis window with Group data types selected: Date Time: date of birth; Integers: smoker, gender; Doubles: weight, waist; Strings: First Name, Last Name, address, country, image, moving image] Figure 23 Display of data types assigned to each column of the table 5 Creating a new destination At present, standard users cannot create a new data container on the VPH Share data nodes; this has to be done by the system administrator, so please mail steven.wood@sth.nhs.uk for support. Once you have requested and been given your own data instance, the process for populating it with data is as follows. First, right-click on the Source you wish to publish and select Add as new destination. [Screenshot: source context menu with Add as new destination, Rename, Remove and Properties] Figure 24 Adding a new destinatio
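A type assignment like the one in the Field Analysis view can be approximated by trying each type in turn and keeping the narrowest one that fits every non-empty value. The Python sketch below is a hypothetical illustration of that approach; the DPS's actual inference rules are not described here, and the accepted date formats in DATE_FORMATS are an assumption.

```python
from datetime import datetime

# Date formats accepted by this sketch (an assumption, not the DPS list).
DATE_FORMATS = ("%Y-%m-%d", "%d %b %Y")


def _is_int(value):
    try:
        int(value)
        return True
    except ValueError:
        return False


def _is_double(value):
    try:
        float(value)
        return True
    except ValueError:
        return False


def _is_date(value):
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            pass
    return False


def infer_type(values):
    """Return the narrowest type that fits every non-empty value."""
    present = [v for v in values if v not in ("", None)]
    if not present:
        return "string"
    for name, test in (("integer", _is_int), ("double", _is_double),
                       ("dateTime", _is_date)):
        if all(test(v) for v in present):
            return name
    return "string"


def group_by_type(table):
    """Group column names by inferred type, like the Group data types view."""
    groups = {}
    for column, values in table.items():
        groups.setdefault(infer_type(values), []).append(column)
    return groups


columns = {
    "date of birth": ["1971-12-17", "1980-01-05"],
    "smoker": ["0", "1"],
    "weight": ["61.5", "70"],
    "country": ["Nepal", "Jordan"],
}
print(group_by_type(columns))
```

Because the date formats are strict, an ambiguous value such as 01/06/2014 falls through to string, which is exactly the kind of result this view is meant to reveal.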
29. pe would be 1 June 2014. There are system settings within Windows to control the culture of the operating system, but these are often not set appropriately, so you should always check. If you have any control over the data contained in the file, the most unambiguous form for a date is dd mmm yyyy (e.g. 17 Dec 1971), and for times it is hh:mm:ss using the 24-hour clock (e.g. 22:01:55). The reverse form for dates, yyyy-mm-dd (e.g. 1971-12-17), is also interpreted unambiguously. Dates and times are probably the most important types to check properly, since they are the source of most of the problems we have encountered in the data management process. 2.1.2 Excel Data Files [Screenshot: New Source window with Excel Data File selected, showing the Excel file to import, the Sheet Name selector (sheet Beverley rd), the Analyse Field Types option and a Load preview (first 50 rows) of a bus timetable with columns TripID, BusNumber, Direction, Bus Stops and Time]
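The ambiguity described above is easy to demonstrate with Python's datetime.strptime: the same string yields two different dates depending on whether the day or the month is assumed to come first, while the recommended forms leave no room for that choice. This is an illustration only, not part of the DPS.

```python
from datetime import datetime

ambiguous = "01/06/2014"
us = datetime.strptime(ambiguous, "%m/%d/%Y").date()        # month first
european = datetime.strptime(ambiguous, "%d/%m/%Y").date()  # day first
print(us, european)  # 2014-01-06 2014-06-01

# The recommended forms do not depend on day/month order. Note that %b
# matches English month abbreviations under the default C locale.
print(datetime.strptime("17 Dec 1971", "%d %b %Y").date())  # 1971-12-17
print(datetime.strptime("1971-12-17", "%Y-%m-%d").date())   # 1971-12-17
print(datetime.strptime("22:01:55", "%H:%M:%S").time())     # 22:01:55
```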
30. pes of relationship which can be created: one to one and one to many. The latter is the most common and is the type used for all the relationships in the example data. It means that each row of data in the left table in the confirmation window will link to zero, one or multiple rows in the right table. The direction of the relationship is indicated by the 1 and ∞ symbols on the line joining the tables. Ensuring this direction is correct is important; if the tables are the wrong way round, you can click the Swap Fields button to switch their places. The direction of a one-to-one relationship is not important, because it means that for each row in either table there can only be zero or one rows in the other. [Screenshot: Relationship Properties window linking Patient Identifier in the ACS Dataset1 PatientTable to Patient Identifier in ClinicalExams, with One/Many selectors and Swap Fields, Add and Cancel buttons] Figure 16 Relationship properties window, shown to confirm the direction and type of the relationship Once all the relationships have been created, the relationships window will look like Figure 13. A relationship can be modified by double-clicking its line between the tables in the relationships window, which presents the relationship properties window. This is similar to the relationship confirmation window; the only difference is the addition of a Delete button, which allows the relationship to be removed.
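Before confirming a relationship, it can help to check that the data actually supports the chosen type and direction. The hypothetical Python helpers below (not DPS functions) classify a link from the values of the linking field in each table and list right-hand rows that would be orphaned; many-to-many is reported separately because the DPS offers only the two relationship types described above.

```python
def classify_relationship(left_keys, right_keys):
    """Classify a link from the linking-field values of each table."""
    left_unique = len(set(left_keys)) == len(left_keys)
    right_unique = len(set(right_keys)) == len(right_keys)
    if left_unique and right_unique:
        return "one to one"
    if left_unique:
        return "one to many"
    if right_unique:
        # The "one" side is on the right: the tables are the wrong way round.
        return "one to many (swap fields)"
    return "not supported (many to many)"


def orphaned_rows(left_keys, right_keys):
    """Right-hand key values with no matching row on the left."""
    left = set(left_keys)
    return [key for key in right_keys if key not in left]


# Hypothetical Patient Identifier values for two tables.
patients = ["P1", "P2", "P3"]
exams = ["P1", "P1", "P2", "P4"]
print(classify_relationship(patients, exams))  # one to many
print(classify_relationship(exams, patients))  # one to many (swap fields)
print(orphaned_rows(patients, exams))          # ['P4']
```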
31. [Screenshot: New Source window with XNAT Importer selected, showing the connection URL, username and password, a Connect button, the split strategy (Character) and pattern (Project Subject Visit Experiment), the local file path and a Use Cookies option, plus a Browse data pane listing MySpine subjects and experiments with fields such as Diagnosis Symptomatic Segment, Pathology DiscDegeneration and ExclusionCriteria Age65]
32. ram of all the tables of the example data and their renamed fields, which are much easier to read. NOTE: These modified names are used internally by the DPS; this process does not modify the source, nor does it require the source to be modified manually outside the DPS. [Diagram: the tables Comorbidities, Medications, PatientTable, Spirometry and ClinicalExams with their renamed fields, each table including a Patient Identifier field] Figure 13 Relationships diagram 3.3 Keys A key is an item that can uniquely identify records. Within the DPS, a key field within a table identifies a single row of data inside that table, and a key table within a source uniquely identifies the top-level record in the dataset. If the source contains this information, it will be extracted as part of the schema; if the information is missing, you will need to set the keys manually. Since our example data is in a format that cannot store keys, we will need to set them. None of the tables apart from PatientTable has a unique field to identify each row, so key fields should no
33. ring. This is a well-defined term, and we would normally refer users to the http://www.connectionstrings.com website to find out which string to use for their specific database configuration. An example of a connection string for a MySQL database running on the local machine would be: Server=localhost;Database=crim 1;Uid=root;Pwd=myPassword; There are no data typing considerations with these types of plugin, as the types are picked up from the database itself. The ArQ plugin is just a specific version of the MS SQL database connector, with some bespoke processing embedded to maximise the utility of the data stored in an ArQ system. NOTE: In most circumstances the connection string will contain the username and password to access the contents of the database, and this has to be stored in the DPS project file for future use. The project files are not encrypted, so you must ensure that this file is saved in a secure location where it cannot be accessed by anyone who does not have legitimate access to the database contents. 2.2.2 Microsoft Access database [Screenshot: New Source window with Microsoft Access Database selected, showing the MDB location, the Tables list and a Load preview (first 50 rows) of a ZIP Codes table]
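A connection string of this kind is a list of Key=Value pairs separated by semicolons. Since the note above warns that the password ends up in an unencrypted project file, the hypothetical Python sketch below shows how such a string could be split into its parts and redacted before it is shared or logged. This is not a DPS feature, the database name exampledb is a placeholder, and quoted values containing semicolons are deliberately not handled.

```python
# Keys treated as secrets in this sketch (an assumption; extend as needed).
SECRET_KEYS = {"pwd", "password"}


def parse_connection_string(conn):
    """Split a 'Key=Value;Key=Value;' connection string into a dict.

    Simplification: values containing ';' (quoted values) are not supported.
    """
    parts = {}
    for item in conn.split(";"):
        if not item.strip():
            continue  # skip the empty piece after a trailing ';'
        key, _, value = item.partition("=")
        parts[key.strip()] = value.strip()
    return parts


def redact(conn):
    """Rebuild the connection string with secret values masked."""
    parts = parse_connection_string(conn)
    return ";".join(
        f"{k}={'****' if k.lower() in SECRET_KEYS else v}"
        for k, v in parts.items()
    ) + ";"


conn = "Server=localhost;Database=exampledb;Uid=root;Pwd=myPassword;"
print(parse_connection_string(conn)["Server"])  # localhost
print(redact(conn))  # Server=localhost;Database=exampledb;Uid=root;Pwd=****;
```

Redacting a copy does not make the project file itself safe; storing that file in a secure location remains the requirement.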
34. [Screenshot: main application window with the source tree and the relationships panel listing table fields] Figure 1 General application display Figure 1 shows a general screenshot of the software running, with a central tabbed panel describing the table relationships. [Screenshot: application window annotated with its functional panels (Data sources, Ontology search pane, Properties display panel, Data destinations), with ontology search results for Patient from the VPH Share Repository and the NCI Thesaurus] Figure 2 Application functional panels Figure 2 shows the structural components of the interface. Each of these will be explored in detail later in the document, but as an overview the basic functionality is as follows. Data Sources: The system is designed to allow you to manage multiple data sources within the same project, although this can lead to some confusion if you intend to have multiple destinations per source, so normally we would only manage a single source within a project. This panel allows you to manage the data import, the structure of the data and the semantic annotation of the data source. Data Destinations: Each source can have multiple de
35. stinations, and this area is where these are listed and managed. Here you can create a profile for the destination, which also provides data transformation operations, such as withholding data items or processing them to de-identify the original source. Ontology search: A single Google-style search window which allows the user to find semantic terms to annotate data. Two query destinations appear by default: BioPortal and the specialised VPH Share repository. Both work, but VPH Share is far better, so we would normally use it by default. Terms from this window can simply be dragged onto data items in the sources window to perform annotation. Properties panel: This is a tabbed area where other information is displayed. The example in Figure 1 shows the relationship manager, but other tabs include the metadata management for each destination or the specific field properties for a data item. As a design philosophy, this software makes heavy use of context menus, so if you think there should be an option to do something at any point in the application, try right-clicking. 2 Connect to a Data Source As previously mentioned, the first step in publishing data is to connect to locally available data, which the DPS refers to as a data source. Once the DPS application has been started, the user should click the add source button in the data source window, as indicated in Figure 3, which opens the new source wi
36. t be set in these tables. During the publication process, any tables which do not have a key field will have one created automatically. We should always be able to set the key table, because every dataset will always contain at least one table. The key table is the one containing the rows of data that can be linked to all other data through a number of relationships, discussed in the next section. Even though it seems that the key table could be determined automatically once the relationships have been defined, this is not the case, as multiple tables could emerge as the key table, so it is left to the user to decide. Setting an item as a key is simple: right-click on the item (either a table or a field), where you will be presented with the context menu shown in Figure 14, and click Set as key. Since only one key can be allocated to a table or source, if one has already been set it will be removed and the selected item set as the key. [Screenshot: PatientData source tree (roottable with address, country and moving image fields) with the context menu showing Set as key, Rename and Properties] Figure 14 Source table right-click context menu 3.4 Relationships Relationships define how data is related between tables. These are very important, because when they are missing, data becomes orphaned and meaningless. A relationship informs the user or a query engine that a row in one table is linked to
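Whether a table already has a field that can serve as its key comes down to whether that field is present and unique in every row; otherwise a key has to be created. The manual does not describe how the DPS creates keys automatically, so the Python sketch below is a hypothetical stand-in: candidate_key_fields and add_surrogate_key, and the RowId field name, are illustrative assumptions.

```python
def candidate_key_fields(table):
    """Return the fields whose values are present and unique in every row.

    `table` maps field names to equal-length lists of values.
    """
    n_rows = len(next(iter(table.values()), []))
    return [
        name for name, values in table.items()
        if None not in values and len(set(values)) == n_rows
    ]


def add_surrogate_key(table, name="RowId"):
    """Return a copy of the table with a 1-based row number as its key."""
    n_rows = len(next(iter(table.values()), []))
    keyed = {name: list(range(1, n_rows + 1))}
    keyed.update(table)
    return keyed


# Hypothetical example rows.
patient_table = {"PatientID": ["P1", "P2", "P3"], "Gender": ["M", "F", "M"]}
exams = {"PatientID": ["P1", "P1", "P2"], "HeartRate": [70, 72, 70]}
print(candidate_key_fields(patient_table))  # ['PatientID']
print(candidate_key_fields(exams))          # []
print(add_surrogate_key(exams)["RowId"])    # [1, 2, 3]
```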
37. [Query results table with roottable PatientID, country, gender and moving image columns; the returned rows have country values containing Brit (Virgin Islands British and United Kingdom Great Britain)] Figure 39 Results of query with constraints [Query builder screenshot: roottable fields smoker (int), country, moving image and fixed image (string), with Number, Date Time, MOD, DIV and Date Diff functions] Figure 40 Complex query formulation with sub-groupings
38. [Query results table: rows of PatientID, country, gender and DICOM image file URL for patients from countries such as Turkmenistan, Jordan, Taiwan, United Kingdom and Nepal] Figure 37 Simple query results from SPARQL Service The above query has no constraints on it, so it returns a list of all rows in the data source. If we now click on the Where tab, we can add data items again, but this time we are offered the option of placing values and comparison types on them. Figure 38 shows that we have added the country field and selected CONTAINS with the letters Brit. The results of this query are shown in Figure 39. [Query builder screenshot: Where tab with roottable fields smoker (int) and country (string) and comparison types including CONTAINS, STARTS WITH, ENDS WITH, DOES NOT START WITH, DOES NOT END WITH, DOES NOT CONTAIN, EMPTY and NOT EMPTY] Figure 38 Adding constraints to the query [Query results header: Number of Records 2]
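The comparison types offered on the Where tab behave like simple string predicates applied to each row. The Python sketch below mimics them locally on hypothetical rows to show what a constraint such as CONTAINS Brit selects. It is not how the DPS executes queries (the constraints are sent to the SPARQL service, where SPARQL 1.1 provides functions such as CONTAINS and STRSTARTS); the exact semantics here, including treating a missing value as empty, are assumptions.

```python
# Comparison types from the Where tab, as predicates on (value, argument).
COMPARISONS = {
    "CONTAINS": lambda v, x: x in v,
    "DOES NOT CONTAIN": lambda v, x: x not in v,
    "STARTS WITH": lambda v, x: v.startswith(x),
    "DOES NOT START WITH": lambda v, x: not v.startswith(x),
    "ENDS WITH": lambda v, x: v.endswith(x),
    "DOES NOT END WITH": lambda v, x: not v.endswith(x),
    "EMPTY": lambda v, x: v == "",
    "NOT EMPTY": lambda v, x: v != "",
}


def where(rows, field, comparison, value=""):
    """Keep the rows whose `field` satisfies the chosen comparison."""
    test = COMPARISONS[comparison]
    # A missing value (None) is treated as an empty string.
    return [row for row in rows if test(row.get(field) or "", value)]


# Hypothetical rows standing in for the roottable data.
rows = [
    {"country": "Great Britain"},
    {"country": "Nepal"},
    {"country": "British Virgin Islands"},
    {"country": ""},
]
print(len(where(rows, "country", "CONTAINS", "Brit")))  # 2
print(where(rows, "country", "EMPTY"))                  # [{'country': ''}]
```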
39. [Screenshot: destination tree and field list for the ClinicalExams, Comorbidities, Medications, PatientTable and Spirometry tables, showing the PatientTable fields (Age, CohortID, Deceased, EthnicOrigin, Gender, PatientID, StudyCode, SurvivalDays) and the underlying source field names, prefixed cvbrul_ and pats_]
40. xt description, the list of tags (essentially a list of keywords) and the semantic annotations. The name of the resource cannot be changed at this point but is displayed for convenience. The metadata on any resource can also be modified from the master interface. 7.2 Access control Associated with every published dataset are three roles: Data Read Only, Data Read Write and Data Administrator. The management process is the same for each of the roles, so we will pick the Read Only role and show how it can be managed from within the DPS. These roles can also be managed from within the master interface. Figure 33 shows the form that allows you to add or remove individual users or groups from the read-only role on the selected dataset. The form contains two areas: the current access list at the top and, at the bottom, the search results for users or groups that can be added. You can enter any text into the search box; it will be used as part of a contains query on users, groups or both, as dictated by the options in the menu item. Once you have found the item you wish to add, highlight it and click the Add Item hyperlink, and it will be applied. To remove a user or group, simply click the remove hyperlink next to the name in the top window. There is no save on this form: any changes made take effect immediately. [Screenshot: Group Manager window for the sth acs subset dataset read group, listing the user Debora Testi]
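The role form combines two simple operations: a contains search over users, groups or both, and immediate add/remove on the role's access list. The hypothetical Python model below sketches that behaviour; Role and search are illustrative names, not the DPS or master-interface API, and case-insensitive matching is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class Role:
    """One dataset role (for example Data Read Only) and its access list."""
    name: str
    members: set = field(default_factory=set)

    def add(self, principal):
        # Mirrors the form: there is no save step, the change applies at once.
        self.members.add(principal)

    def remove(self, principal):
        self.members.discard(principal)


def search(text, users, groups, scope="both"):
    """Return names containing `text` from users, groups or both."""
    pools = {"users": users, "groups": groups, "both": users + groups}
    needle = text.lower()  # case-insensitive matching is an assumption
    return [name for name in pools[scope] if needle in name.lower()]


users = ["Alice Example", "Bob Sample"]  # hypothetical directory entries
groups = ["example dataset read"]        # hypothetical group name
read_only = Role("Data Read Only")
for name in search("example", users, groups):
    read_only.add(name)
print(sorted(read_only.members))  # ['Alice Example', 'example dataset read']
```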