An introduction to Taverna workflows
Contents
1. Open Middleware Infrastructure Institute UK (www.omii.ac.uk)
An introduction to Taverna workflows

Exercise: Installing the Workbench
- Download Taverna from http://taverna.sourceforge.net
- Windows or Linux: if you are using a modern version of Windows (Win2k or WinXP, with XP preferred) or any form of Linux, Solaris, etc., download the workbench zip file. On Windows, Taverna can simply be unzipped and used. On Linux you will also need to install GraphViz (http://www.graphviz.org), using the appropriate rpm for your platform.
- Mac OSX: download the dmg workbench file. Double-click to open the disk image and copy both components (Taverna and GraphViz) onto your hard disk to run the application.
- YOU WILL ALSO NEED a modern Java Runtime Environment (JRE) or Java Software Development Kit (SDK) from http://java.sun.com (Java 5 or above).

Workbench Layout: AME (Advanced Model Explorer)
The Advanced Model Explorer (AME) is the primary editing component within Taverna. Through it you can load, save and edit any property of a workflow.
- Enables building, loading, editing and saving workflows

Workflow Diagram Window
- Visual representation of workflow
- Shows inputs, outputs, services and control flo
2. Go back to the AME and remove the database and program inputs by right-clicking and selecting "remove from model".

Exercise: String Constants
- Select "string constant" from Available Services. Right-click and select "add to model with name". Insert "program" in the pop-up window.
- Select "string constant" a second time and repeat for a string constant named "database".
- In the AME, right-click on program and select "edit me".
- Edit the text to blastp. Repeat for database and enter SWISS for the swissprot database.
- Run the workflow. It runs in the same way.
- Save the workflow by selecting the save icon at the top of the AME.

How can we use Taverna to annotate our protein with function descriptions?
- In the Available Services panel, find the emboss soaplab services and find the protein_motifs section. Hint: use the simple text search at the top of the panel.
- Find out which of these services enable searching of the Prosite and Prints databases by fetching the service descriptions. To do this, right-click on protein_motifs and select "fetch descriptions".
- Import both services into the workflow model. Connect these services up to the workflow so that you can find prints and prosite matches in the query sequence returned fro
3. image/jpeg: JPEG Image
image/gif: GIF Image
application/zip: Zip File
chemical/x-swissprot: SWISSPROT Flat File
chemical/x-embl-dl-nucleotide: EMBL Flat File
chemical/x-ppd: PPD File
chemical/seq-aa-genpept: Genpept Protein
chemical/seq-na-genbank: Genbank Nucleotide
chemical/x-pdb: Protein Data Bank Flat File
chemical/x-mdl-molfile

Exercise 8: Taverna MIME types
- The chemical mime types are rendered using SeqVista to view formatted sequence data.
- Reset the workbench and load FetchPDBFlatFile from the examples library directory for a demo. The chemical/x-pdb type can be used to view rotating 3D protein images.

Advanced Features
- Spotlight on BioMart
- BioMoby Services
- Iteration
- Control Flow Substituting Services and fault tolerance

Spotlight on Biomart
Biomart enables the retrieval of large amounts of genomic data, e.g. from Ensembl and Sanger, as well as Uniprot and MSD datasets.
- After saving any workflows you want to keep, reset the workbench in the AME.
- Load the workflow BiomartAndEMBOSSAnalysis.xml from the examples directory.
- Run the workflow.
This workflow starts by fetching all gene IDs from Ensembl corresponding to human genes on chromosome 22 implicated in known diseases and with homologous genes in rat and mouse. For each of these gene IDs it fet
4. datasets. You may not reach the end of these exercises, but they will provide some examples to take home.

Exercise 8: Defining Output Formats
So far most of the outputs we have seen have been text, but in bioinformatics we often want to view a graph, a 3D structure, an alignment, etc. Taverna is able to display results using a specific type of renderer if the workflow output is configured correctly.
- Reset the workbench and load convertedEMBOSSTutorial from the examples directory.
- Look at the workflow diagram and read the workflow metadata to find out what the workflow does.
- Run the workflow.

Exercise 8: Defining Output Format
Look at the results. For tmapPlot and outputPlot you will see that the results are displayed graphically. This is achieved by specifying a particular mime type in the output.
- Go back to the AME and look at the metadata for tmapPlot and outputPlot.
- Select MIME Types. As you can see, each has the image/png mime type associated with it.
If you wish to render results in anything other than plain text, you MUST specify the mime type in the workflow output.

Exercise 8: Taverna MIME Types
The following mime types are currently used by Taverna:
text/plain: Plain Text
text/xml: XML Text
text/html: HTML Text
text/rtf: Rich Text Format
text/x-graphviz: Graphviz Dot File
image/png: PNG Image
5. g. geneIdentifier
- Connect this new input to the Get Protein Fasta service by right-clicking on geneIdentifier and selecting "getFasta > id". You always build workflows with the flow of data.
- Define a new workflow output by right-clicking on "workflow output" and selecting "create new output".
- Supply a suitable name, e.g. fastaSequence.
- Connect this new output to the Get Protein Fasta service, remembering to build with the flow of data.
You have now built a simple workflow from scratch.
- Run the workflow by selecting "run workflow" from the Tools and Workflow Invocation menu at the very top of the workbench. You will again need to supply a GI. For later exercises please use a protein GI, e.g. 1220173.

Exercise 6: Stringing Services Together
- We have used Get Protein Fasta to retrieve a sequence from the genbank database. What can we do with a sequence? Blast it? Find features and annotate it? Find GO annotations?
The first thing you need to do is find a service which performs a blast. For this we are going to use the Feta Semantic Discovery Tool. Feta is a tool to semantically describe services. Instead of the user needing to know exactly what a service provider has called their servic
6. ches the 200bp after the five prime end of the genomic sequence in each organism and performs a multiple alignment of the sequences using the EMBOSS tool emma (a wrapper around ClustalW). It then returns PNG images of the multiple alignment along with three columns containing the human, rat and mouse gene IDs used in each case.
- Right-click on the hsapiens_gene_ensembl service and select "configure BioMart query".
- By selecting filters, change the chromosome from 22 to 21. Now the workflow will retrieve all disease genes from chromosome 21 with rat and mouse homologues.
- Run the workflow and look at the results.
- See how the "with disease association" filter was configured and how the sequence exports were configured on the other Biomart queries for mouse and rat.

Find out which diseases are on your chosen chromosome by adding a new Biomart query processor.
- Select hsapiens_gene_ensembl from the Available Services panel and select "invoke with name", as there is already a service with that name.
- Call the service hsapiens_disease.
- Select Filters.
- Configure hsapiens_disease by selecting "id list limit" and the "ensembl gene IDs" filter under the gene tab.
- Configure the output by selecting attribute, and select "Mim morbid accession" under the External Reference tab in the attributes section.

Adding Extra Information
- Connect
7. es, the user can search by the biological tasks that are performed by the services, or by properties of the service, for example the types of inputs it requires or outputs it produces.
- Select the Discover tab and select "uses method" from the first drop-down menu. When you select it, "bioinformatics algorithm" will appear in the adjoining box. Scroll down this list to find "Similarity search algorithm" and then its subclass, "BLAST (basic local alignment search tool)". Select BLAST and click Find Service. The results are all the annotated services that perform blast analyses (there may be more un-annotated ones).

Finding Blast
- Select searchSimple from the list and look at the details.
- Look at the service description. This tells you what the service does and what each input expects and each output produces. It also tells you where the service comes from. For this example we are using BLAST from the DNA Databank in Japan.
- Right-click on searchSimple in the Feta results list and select "add to model". This adds the service to your current workflow in the Design window.
- Before you go back to the Design window, go back to search services and experiment with other ways of finding services, e.g. by task, input, output, resource, etc.

Exercise: Blast it
- Go back to the Design window. SearchSimple will have been imported into your model.
- In t
8. he AME, expand the entry for the search simple service and view the input/output parameters.
- This time you will see three inputs and two outputs. For the workflow to run, each input must be defined. If there are multiple outputs, a workflow will usually run if at least one output is defined.

Exercise 6
- Create an output called blast report in the same way we did before.
- The sequence input for the Blast will be the output from the Get Protein Fasta service. Connect the two together, from Get Protein Fasta "Output Text" to search simple "query".
- Create two more inputs called database and program, and connect them to the database and program inputs on the search simple service.

Exercise 6: Blast it
- Once more, select "run workflow" from the Tools and Workflow Invocation menu. You will see a run workflow window asking for 3 input values.
- Insert a GI (e.g. 1220173), a program (blastp for protein-protein blast) and a database (e.g. SWISS for swissprot).
- Click "run workflow". This time you will see a blast report and a fasta sequence as a result.

Exercise 6: Blast it
- For parameters that do not change often, you will not wish to always type them in as input. In this example the database and blast program may only change occasionally, so there is an alternative way of defining them.
9. lect alternate1 and look at the inputs and outputs. These need to be mapped to the correct inputs and outputs in emma.

Substituting Services
- Right-click on the query input in alternate1 and map it to sequence_direct_data. In both services these inputs expect a set of fasta sequences.
- Right-click on the result output and map it to outseq in emma in the same way.
- Now you have a workflow which will run using emma when it is available, but will substitute DDBJ clustalw for it if emma fails.
Taverna also allows the user to specify the number of times a service is retried before it is considered to have failed. Sometimes network traffic is heavy, so a working service needs to be retried.
- Select tmap from the same workflow. To the right of the service name is a series of 0s and 1s. By simply typing the numbers, the user can specify the number of retries and the time between the retries.
- Change it to 3 retries for tmap and set the status to critical using the final tickbox. Now that it is critical, the whole workflow will be aborted if tmap fails after 3 retries. Failures in non-critical services will not abort the workflow run.

This exercise highlights the services that do not perform biological functions but are vital for running life science workflows.
- Look at the workflow metadata: what does the workflow do?
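The retry, alternate and critical settings described in this fragment can be pictured with a small Python sketch. This is not Taverna code: the names `invoke_with_retries`, `run_processor` and `CriticalServiceFailure` are hypothetical, and the sketch assumes that "3 retries" means one initial attempt plus three further attempts and that alternates are retried in the same way as the primary service.

```python
import time


class CriticalServiceFailure(Exception):
    """Hypothetical error: a critical processor failed after all attempts."""


def invoke_with_retries(service, payload, retries=3, delay_seconds=0.0):
    """Call service(payload); on failure, retry up to `retries` more times.

    Returns (True, result) on success or (False, last_error) on failure.
    """
    last_error = None
    for attempt in range(1 + retries):
        try:
            return True, service(payload)
        except Exception as error:  # any failure counts as a failed attempt
            last_error = error
            if attempt < retries:
                time.sleep(delay_seconds)  # wait before the next retry
    return False, last_error


def run_processor(primary, alternates, payload,
                  retries=3, delay_seconds=0.0, critical=False):
    """Try the primary service first, then each alternate in order.

    If everything fails, a critical processor aborts the run by raising;
    a non-critical one yields None so the rest of the run can continue.
    """
    for service in [primary, *alternates]:
        ok, outcome = invoke_with_retries(service, payload,
                                          retries, delay_seconds)
        if ok:
            return outcome
    if critical:
        raise CriticalServiceFailure("all attempts and alternates failed")
    return None
```

In this picture, emma would be the `primary` callable and the DDBJ clustalw service the single entry in `alternates`, while tmap would be run with `retries=3` and `critical=True`.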
10. low and load the IterationStrategyExample.xml.
- Read the workflow metadata to find out what the workflow does.
- Select the ColourAnimals service and read the metadata for that service. Under the description is the iteration strategy. Click on "dot product". This allows you to switch to cross product.

Iteration
- Run the workflow twice, once with dot product and once with cross product.
- Save the first results so you can compare them. What is the difference? What does it mean to specify dot or cross product?

Substituting services and fault tolerance
Taverna does not own many of the bioinformatics services it provides. This means that it cannot control their reliability. Instead, Taverna provides strategies for dealing with services being unavailable.
- Reload the convertedEMBOSSTutorial.xml from the examples directory.
- Look at the metadata for the emma service. It is an implementation of clustalw.
- Find the DDBJ clustalw service. HINT: use the Feta discovery tool.

Substituting Services
- When you have added this service to your workflow, right-click on it and select "add as alternate".
- In the resulting menu, select emma.
- The DDBJ version of the clustalw service is now added as an alternative to emma in the AME. It will be called alternate1.
- Se
11. low version of the EMBOSS tutorial, and then select the workflow metadata tab at the top of the AME. You will see a text description of the workflow, its author and its unique LSID. When publishing workflows for others, this annotation is useful information and allows the acknowledgement of intellectual property.

Exercise 4: Workflow Features
- Run the workflow by selecting "run workflow" from the file menu.
- Watch the progress of the workflow in the enactor invocation window. As services complete, the enactor reports the events. If a service fails, the enactor reports this also.

5.1 Building a simple workflow from scratch
- Import the Get Protein FASTA service into a new workflow model. First you will need to close the current workflow from the file menu, then find the Get Protein Fasta service again in the Available Services panel.
- Right-click on Get Protein Fasta and import it into the workbench by selecting "Add to Model".
- Go to the AME and expand the node next to the newly imported Get Protein Fasta service. You will see:
  - 1 input (green arrow pointing up)
  - 1 output (purple arrow pointing down)

Exercise 5.2: Adding Input
- Define a new workflow input by right-clicking on "Workflow Input" and selecting "create new Input".
- Supply a suitable name, e
12. m Get Protein Fasta. You will see that soaplab services have many input values.
- Soaplab services have many input parameters, but many have default values, so they may not always need to be altered. In this case you can run the services by simply adding the query sequence. Go to the EMBOSS home page to find out which input(s) relate to the query sequence.
- This extra searching is impractical, but is necessary if the service hasn't been described in Feta.
- Soaplab has an extra metadata section, however: right-click on the service in the AME and select "get soaplab metadata".

Exercise 7: Protein Annotation
- Run the workflow. Now you have blast results and protein domain/motif matches.
- How else can you annotate your protein? As an advanced exercise, you might want to search for other ways of characterising your sequence, e.g. structural elements, GO annotation.

Saving Results
Taverna provides several options for saving data:
- Individual data items can be saved by right-clicking on them.
- All data can be saved to disk.
- Textual/tabular data can be saved to Excel.
- Save all the data from your workflow.

The previous exercises have covered the basics of myGrid workflows. The following demos and exercises cover more advanced features, such as rendering output, configuring BioMart services, dealing with service failure and iterating over
13. Run the workflow.
- Load the workflow entitled genscan shim example.xml from the page http://www.cs.man.ac.uk/~katy/taverna. For an input file, load example_input.txt from the same web page.
What happens? Did all the services return results? Why did some fail?
- Load the workflow entitled genscan shim example2.xml from the page http://www.cs.man.ac.uk/~katy/taverna.
- Look at the workflow metadata: what does the workflow do? How is it different from the previous one?
- Run the workflow using the same input. What happens this time?
Genscansplitter is a shim service: it performs no biological function, it simply parses a results file.

Other shims
- There are many myGrid shim services. These are currently being described in a shim library, but for now a small collection is documented here: http://www.cs.man.ac.uk/~hulld/shims.html
From the list:
- Find a shim that will return a genbank DNA file from an id. Load the example workflow and run it in Taverna.
- Find a shim that will translate DNA. HINT: these services might be in the Feta registry.
- Load the CompareXandYFunctions.xml workflow from the examples directory. This workflow contains several shims. Some are beanshell scripts.
- Select the GetUniqueIDs service in the AME and right-click. Look at the script and see if you can work out what it is doing.
Beanshell scripts allow use
14. rkflow and the service is invoked.

Exercise 3: View Results
- Click on Results. The fasta sequence is displayed on the right when you select "click to view".
- Click on Process Report. Look at processes. This shows the experiment provenance: where and when processes were run.
- Click on Status. Look at options. As workflows run, you can monitor their progress here.

Exercise 3: Conclusion
The processes for running and invoking a single service are the basics for any workflow, and the tracking of processes and generation of results are the same however complicated a workflow becomes. In the next few exercises we will look at some example workflows and build some of our own from scratch.

Exercise 4: Finding and using workflows
- Select Open Workflow from the File menu at the top of the workbench. You will see a selection of xml files in an examples directory. These are workflow definition files.
- Select ConvertedEMBOSSTutorial.xml and a pre-defined workflow will be loaded.
- View the workflow diagram. You will see services in different colours.

Exercise 4: Workflow Documentation
- Find out what the workflow does by reading the workflow metadata.
- In the AME, click on the name of the workflow, in this case A workf
15. rs to write small bespoke Java scripts to allow incompatible services to work together.

Other Shims
The emboss suite of programs has a subdivision called edit.
- All the edit services are shims.
- Experiment with the edit services. Find a service that will remove gaps from sequences.

Useful links
- Taverna user manual: http://www.mygrid.org.uk/usermanual1.7
- Taverna mailing lists: http://taverna.sourceforge.net/index.php?doc=lists.html
16. the input to the hsapiens_gene_ensembl service via the ensembl gene id.
- Create a new workflow output for the disease description output.
- Re-run the workflow and view which diseases are associated with your chromosome.

Spotlight on BioMoby
The process of adding a BioMoby service is different from other services. BioMoby services need to be defined using terms from the Moby Object ontology.
- Reset the workflow and load the blast biomoby.xml workflow from http://www.cs.man.ac.uk/~katy/taverna

Spotlight on BioMoby
- Run the workflow and look at the results. As the workflow name suggests, a blast search is performed on a sequence.
- Look at the workflow diagram. Instead of simply giving the blast service a fasta sequence, there is a Fasta sequence object defined.
- Look at the inputs for Fasta.
- Read the metadata for the Fasta object in the AME window.

Spotlight on BioMoby
The Fasta object is defined by:
- The sequence as a plain string
- The namespace, i.e. the database the sequence came from
- A unique identifier for the sequence
- A name
These extra definitions take time for the user to define, but they have other advantages.

Spotlight on BioMoby
- Right-click on the Fasta object in the AME and select Moby Object Details.
- A pop-up window
17. w Services
- Go to the Available Services panel and right-click on Available Processors. For each type of service, you are given the option to add a new service or set of services.
- Select "Add new wsdl scavenger". A window will pop up asking for a web address. Enter the Blast Web service address.
- Scroll down to the bottom of the Available Services panel and look at the new DDBJ service that is now included.

Exercise 3: Finding and invoking a Service
- Go to the Available Services panel.
- Search for Fasta in the search list box at the top of the panel (we will start with simple sequence retrieval).
- You will see several services highlighted in red.
- Scroll down to Get Protein FASTA. This service returns a fasta sequence from a database if you supply it with a sequence id.

Exercise 3: Invoking a single service
- Right-click on the Get Protein FASTA service and select "Invoke service".
- In the pop-up Run workflow window, add a protein sequence GI by selecting ID and right-clicking. Select "new input value" and enter a value in the box on the right.
  - GI is a genbank gene identifier. You don't need the "gi", just the number. For example, the MAP kinase phosphatase sequence GI 1220173 would be entered as 1220173.
- Click Run wo
18. will show you what BioMoby services a Fasta sequence is produced by and what services it can feed into.
- Right-click on the getDragonBlastText service and select Moby Object Details. This tells you what the service requires as inputs and what it produces as output.

Spotlight on BioMoby
The BioMoby services are annotated using terms from the Moby ontology to enable semantic searching for services.
- BioMoby services are specialist kinds of service from a closed community. The object model, ontology and annotations have been agreed by the BioMoby service providers.
- Semantic discovery queries over other myGrid services are also possible using the myGrid ontology and the Feta Semantic discovery component.
- The myGrid ontology and the BioMoby ontology both share the same service ontology, so Feta can search both types of service.

Iteration
Taverna has an implicit iteration framework. If you connect a set of data objects (for example, a set of fasta sequences) to a process that expects a single data item at a time, the process will iterate over each sequence.
- Reload the BiomartandEMBOSSAnalysis.xml workflow from the examples directory.
- Watch the progress report. You will see several services with "Invoking with Iteration".

Iteration
The user can also specify more complex iteration strategies using the service metadata tag.
- Reset the workf
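As a rough illustration of the iteration ideas in this tutorial, the Python sketch below mimics implicit iteration over a list, plus the dot product and cross product strategies named in the iteration exercise. It is not Taverna code. The function names are hypothetical, the results are returned as flat lists for simplicity, and the dot product here simply stops at the shorter input, which is an assumption of the sketch.

```python
from itertools import product


def implicit_iteration(process, items):
    """Apply a single-item process to every item in a list."""
    return [process(item) for item in items]


def dot_product(process, left, right):
    """Pair inputs by position: first with first, second with second."""
    return [process(a, b) for a, b in zip(left, right)]


def cross_product(process, left, right):
    """Combine every item on the left with every item on the right."""
    return [process(a, b) for a, b in product(left, right)]


def colour_animal(colour, animal):
    """Hypothetical stand-in for a two-input service."""
    return f"{colour} {animal}"


colours = ["red", "green"]
animals = ["cat", "dog"]

print(implicit_iteration(str.upper, ["acgt", "ttga"]))
# ['ACGT', 'TTGA']
print(dot_product(colour_animal, colours, animals))
# ['red cat', 'green dog']
print(cross_product(colour_animal, colours, animals))
# ['red cat', 'red dog', 'green cat', 'green dog']
```

With two inputs of two items each, the dot product gives two results and the cross product gives four, which is the kind of difference to look for when comparing the two workflow runs.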
19. ws
- Enables saving of workflow diagrams for publishing and sharing

Available Services Pane
Lists services available by default in Taverna (3000 services):
- Local java services
- Simple web services
- Gowlab services
- Soaplab services (legacy command-line applications)
- BioMart database services
- BioMoby services
Allows the user to add new services or workflows from the web or from file systems.

Installing Plugins
- Go to the Tools menu at the top of the workbench and select the Plugin manager.
- Select "find new plugins".
- Tick the boxes for Feta and LogBook and install these plugins.
- Two more options, Discover and LogBook, will now have appeared at the top of your screen.
- Feta is now available through the Discover tab.
- To use the LogBook you also need a mySQL database. We will come back to this later.

Exercise 2: Adding New Services
New services can be gathered from anywhere on the web.
- Go to the DDBJ list of available web services at http://xml.nig.ac.jp/wsdl/index.jsp. These services were not designed for use in Taverna, but Taverna can use them if you supply the address of the WSDL file.
- Click on the DDBJ blast service (http://xml.nig.ac.jp/wsdl/Blast.wsdl) and copy the web page address.

Exercise 2: Adding Ne