BioExtract Server User Manual - The University of South Dakota
Contents
1. Removing Tools, Workflows, and Data Extracts from a Group. If you're the owner of a group, you can remove tools, data extracts, and workflows from that group. To remove tools, data extracts, and workflows from a group: (1) Sign in to the BioExtract Server and click the Groups tab. (2) In the left panel, under the Owned Groups heading, select the name of the group you want to remove an element from. The group form opens in the right panel. (3) Click the Tools, Workflows, or Extracts tab, depending on what you'd like to remove. The Add Elements form displays. (4) In the Remove column, select the check box for the element you'd like to remove. Choose as many elements as you like, then click the Update List button. The selected elements will be removed from that group.
2. button. The workflow is saved and displays in the Workflows menu. Executing a Workflow. To execute a workflow, select the desired workflow from the Workflows menu on the Workflows page. A graphical representation of the workflow displays in the right panel. Click the [icon] button at the top of the workflow panel. When a step within the workflow begins to run, it will turn blue.
3. Step 3: Save the resulting data extract. (1) Click the Extracts tab. The tblastx data extract displays. (2) Click the [icon] button. The Save Extract dialog box opens. (3) Enter a name and description for the data extract and click the [icon] button. The data extract is saved and becomes a searchable data source, available on the Query page in Available Data Sources under the heading Miscellaneous. In the Save Extract dialog shown, the description reads: "nucleotide sequence query is translated in six reading frames and the resulting six protein sequences are compared in turn to those in a protein sequence database". Step 4: Remove duplicate sequences in the data extract using the xmknr tool.
4. (3) Narrow the results by clicking the Region filter and specifying the X chromosome and q28 for Band Start and Band End. (4) Click the Attributes tab and select the Gene filter. (a) In Ensembl, deselect Ensembl Gene ID and Ensembl Transcript ID. (5) Click the External filter. (b) In External References, select RefSeq DNA ID.
5. If you know the GI accession number of the sequence(s) you want to retrieve, use the Fetch Sequence Records tool. Retrieved records will display on the Extracts page. Query Form: Select a search field and enter a search term. Press Add Search Line to combine search terms with AND, OR, or AND NOT. Figure 2: The Query page, showing the Query Form using the Boolean operator AND to find Homo sapiens zinc finger BTB domain. Search terms can be combined with Boolean operators to create precise queries. Executing a Wild Card Search. The BioExtract Server offers limited wild card searching functionality, where the asterisk (*) is used to represent any character. For example, to locate the Arabidopsis thaliana basic helix-loop-helix (bHLH) family of proteins, the taxonomy search value would be specified as Arabidopsis thaliana and the gene value on which to search would be bHLH*. This query will return all Arabidopsis thaliana protein records that have a gene annotation beginning with bHLH. Viewing Query Results. Viewing a Set of Records. The results of a query are displayed on the Extracts page. For each record in the result set there are two links. The Local Detai
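The Boolean query and wild card behavior described above can be sketched in a few lines of Python. This is only an illustration, not the server's implementation: the function names `build_query` and `matches_wildcard` are hypothetical, the "Field: term" text layout is an assumption, and the sketch assumes the asterisk matches any run of characters and that matching is case-sensitive, neither of which the manual states.

```python
from fnmatch import fnmatchcase

# Operators offered by the Query Form's "Add Search Line" control.
ALLOWED_OPERATORS = {"AND", "OR", "AND NOT"}


def build_query(lines):
    """Join (operator, field, term) search lines into one query string.

    The operator of the first line is ignored, because nothing precedes it.
    The "Field: term" layout is an assumption made for this sketch.
    """
    parts = []
    for index, (operator, field, term) in enumerate(lines):
        clause = f"{field}: {term}"
        if index == 0:
            parts.append(clause)
            continue
        if operator not in ALLOWED_OPERATORS:
            raise ValueError(f"unsupported operator: {operator!r}")
        parts.append(f"{operator} {clause}")
    return " ".join(parts)


def matches_wildcard(pattern, value):
    """Return True if value matches pattern, where * matches any characters.

    fnmatchcase also gives special meaning to ? and [ ]; the manual only
    mentions the asterisk, so avoid those characters in patterns.
    """
    return fnmatchcase(value, pattern)


query = build_query([
    (None, "Definition", "Homo sapiens"),
    ("AND", "Definition", "BTB domain"),
    ("AND", "Definition", "zinc finger"),
])
print(query)
print(matches_wildcard("bHLH*", "bHLH100"))
```

The pattern `bHLH*` accepts any gene annotation that begins with bHLH, which mirrors the Arabidopsis thaliana example.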
6. Figure 7: The BioExtract Server Tools page, showing the Available Tools with the Similarity Search Tools group expanded and the blastn tool selected. Executing an Analytic Tool. Analytic tools are executed in the BioExtract Server by first selecting a tool from the list of Available Tools on the Tools page. One of the most important characteristics of the BioExtract Server is that it allows the user to specify the input source when executing a tool. For each tool, four possible input methods are available: using the records listed on the Extracts page, using the output from a previously executed tool, entering data into a text box, and uploading a data file. When executing a tool using the "Use records listed on the Extracts page" option, the default file format is FASTA. For each record listed, the BioExtract Server system will retrieve it in FASTA format and create an input file for the tool. Attempts have been made in the BioExtr
7. When the execution completes, the color of the step will change to green. When execution completes, the Provenance button becomes enabled. Click the Provenance button to see the provenance information associated with the execution of the workflow. Modifying a Workflow. One of the major advantages of BioExtract Server workflows is that they can be modified and executed using different data and parameter settings. Warning: Workflow modifications are saved permanently if you are the owner of the workflow. Otherwise, they are saved temporarily. Before you modify a workflow, you may want to make a copy so you don't overwrite the original workflow. See Copying a Workflow for more information. As an example, assume the original workflow contained a query step using accession number L16896 and you're interested in running the same workflow using accession number NM_005341. You would also like to change some of the tblastx tool parameter settings. To modify a query and tool step in a workflow: (1) From the Workflows page, select the workflow you want to modify (e.g., Phylogenetic analysis). This will show you all of the steps included in that workflow. Don't forget to sign in to see your
8. the form and then click Save Tool Changes to complete the process. Step 4: Specify input files. Data from the BioExtract Server is streamed to the local tool by using one or more input files created on your computer when BioExtract Server runs the tool. The data stored in each input file can come from one of four sources, selected when the tool is executed: the current extract records (Extracts page); a previous tool's output; an input file uploaded from your computer; or text typed or pasted directly into a text box. Not all tools require input files, so you may not need to add one at all. Our example tool, Fetch Records, requires a single input file containing a list of IDs representing sequence records in an external data source. Physical Name: We're going to skip this one for now. It will make more sense later, after going through the rest of the fields. Logical Name: As with the tool itself, the logical name is only used within BioExtract Server. It may
9. Figure 8: The BioExtract Server Tools page, showing the Add a New Tool menu expanded with a search for tools containing the word "needle". In this example, the Soaplab needle tool has been selected. Added tools can be found on the Tools page, in the Available Tools menu, in the My Tools group. Modifying Added Tools. Users have the option to change the logical names of the tool attributes (e.g., parameters, inputs, and tool name), which allows them to customize the tool interface. If the input to the tool being added represents sequence data, the user may indicate that the tool can use the BioExtract Server result set as input. To modify an added tool: (1) From the Tools page, click the Customize Your Tools menu and select the tool to be modified. The selected tool opens in
10. system. Once the tool has been added, it can be used like any of the other tools in the BioExtract Server and can even be included in workflows. The tool must meet the following criteria: The tool can be executed from the command line (either a DOS prompt or Linux shell). If the tool only has a graphical or window-like interface, BioExtract Server cannot use it. If you wish to use the output of the local tool as input for another tool on the BioExtract Server, the name and location of any output files produced by the tool must be known before the tool is executed. If the tool meets the criteria above, the following information is required to add the tool: The full path to the location of the tool on your system, for example C:\biotools\mytool.exe on Windows or /usr/local/bio/mytool on Linux. Any required command line arguments. These are pieces of information typed after the program's name, usually specifying the name of the file it should use as input, the file name its output should be written to, etc. For example, let's say we have a tool that can convert sequences in one format to another format. The documentation for the tool may read: usage: seqconv -i filename -if formatName -of formatName, where -i filename is the file to be converted, -if is the format of the input file, and -of is the requested format of the output file. Valid format names: fasta, blast, genbank. We have a file in fasta format located at C:\temp\fasta.txt that
11. the Tools tab, open the Similarity Search Tools group, and select the blastp tool. The tool form opens in the right panel. In the Input Data section, select "Use records on Extracts page formatted as FASTA" and click the Execute button. A data extract will be created on the Extracts page containing sequence records similar to the Gallus gallus cadherin 19 sequence. This data extract can be saved on the BioExtract Server website, shared with others, and used as input into other analytic tools.
12. the right panel. (2) Click the Edit link adjacent to the tool name. The tool form opens. In the tool form you can change the Logical Name, Description, and HelpURL. As an example, let's modify the tool by entering Needleman-Wunsch in the Logical Name box. (3) Click the Save link at the top of the tool form to keep your changes. Then click the Save Tool Changes button to complete the process. Adding a Local Tool. A local tool is a program on your own computer. The tool itself is not uploaded to the BioExtract Server. Rather, the BioExtract Server uses the information given about the tool to execute it on your own
13. to other tools on BioExtract Server. Physical Name, Logical Name, Description, File Name, and Include in Command Line behave exactly the same as they do for input files. Record Number Limit, Description File Name, and Modify Current Extract are not used for local tool output files and can be ignored. Step 6: Command Line Parameters (Arguments). Most tools have a set of options whose values are given by arguments on the command line. Specifying the input and output files for the program is just one example of such arguments. For our purposes here, the terms argument and parameter are identical. BioExtract Server uses the information given about each parameter to add an element in the completed tool's interface where the value for the parameter can be given. This is the interface shown for Fetch Records parameters. Before adding a Parameter, you must first add a Parameter Group. Multiple parameter groups are allowed and can be used to keep related parameters together. For example, a tool may have a set of parameters that affect the appearance of the tool's output. These parameters could be placed within a group called Output Options and will be displayed together in the menu used to run the tool from BioExtract Server. Our example too
14. tool, and applets cannot execute programs or write files without permission, which can only be granted if the applet is placed in a digitally signed JAR file. Since we have signed the file ourselves without using a certificate from one of the third-party Certificate Authorities, the browser reports that the signature cannot be verified. Once you click the button, the applet will download the input files from the server, execute the tool, and upload the output files, displaying a short message for each step. Once it is finished, you may close the applet window. BIOEXTRACT SERVER WORKFLOWS. Introduction. Users do not explicitly create workflows in the BioExtract Server, but implicitly do so by working with the system. As you work in the BioExtract Server, all of your tasks, such as executing queries against selected data sources, applying analytic tools, and saving data extracts, are recorded. At any point, a workflow comprising the performed set of tasks can be saved and subsequently executed as a single unit. Individual tasks within the workflow may also be modified or deleted by the workflow owner. To illustrate a BioExtract Server workflow, consider the task of carrying out a phylogenetic analysis for a set of proteins where the starting point is a particular genomic coding sequence representing only one member of the gene family in a given species. In the BioExtract Server, the steps for accomplishing this task i
15. user clushbou@usd.edu is now a member of the Sanford Research group. (5) Click the Continue button. The BioExtract Server web site opens. Sign in by clicking the sign in link on the top right corner of any BioExtract Server page. (6) Click the Groups tab. Under Member Groups you'll see the name of the group you joined. Select the group name. The group interface opens in the right panel. (7) Click the Members tab. In the Group Members list you'll see your email address. You now have access to the tools, data extracts, and workflows owned by that group. WORKFLOWS THROUGH MYEXPERIMENT. About myExperiment. myExperiment (http://www.myexperiment.org) is a collaborative environment where scientists can publish their workflows and experiment plans, share them with groups, and find those of others. Workflows, other digital objects, and bundles (called Packs) can be swapped, sorted, and searched like photos and videos on the Web. Importing a BioExtract Server Workflow into myExperiment. To import a BioExtract Server workflow into myExperiment: (1) Click the Workflows tab and select the desired workflow. The workflow graph and its control buttons open in the right panel. (2) Click the Export button. The Open dialog box opens. Save the file to your computer desktop. (3) Open the myExperiment
16. we wish to convert to the genbank format. According to the documentation, we could use the following command to accomplish this: seqconv -i C:\temp\fasta.txt -if fasta -of genbank. Any additional requirements or constraints the tool may have in regards to input and output files. For example, some tools may require that all files used for input be located in a particular directory, while others may allow files in any location. Some tools may allow you to specify the name and location for each output file, while others may automatically give a name to the file and place it in a predetermined location. As an example, suppose we have a tool called Fetch Records, which takes a file containing a list of IDs and the name of a data source (e.g., ncbi, embl, refseq). The tool accesses the data source and retrieves the records associated with the IDs. Step 1: Click the Tools tab, then select [icon] from the menu that appears on the left. Step 2: Click on Add a Local Tool, then New Local Tool. Now click on the Edit link next to New Tool in the right-hand panel. Logical Name: a name for the tool that
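As a rough illustration of how the seqconv usage line turns into a concrete command, the sketch below assembles the argument list in Python. The seqconv tool is the manual's own hypothetical converter. The function name `build_seqconv_command` is invented for this example, and the leading hyphens on the options (-i, -if, -of) are an assumption about how the usage line was originally written.

```python
# Format names listed as valid in the seqconv usage text.
VALID_FORMATS = {"fasta", "blast", "genbank"}


def build_seqconv_command(input_path, input_format, output_format,
                          program="seqconv"):
    """Return the argument list for one seqconv conversion.

    The flag spellings (-i, -if, -of) are assumed, not confirmed.
    """
    for name in (input_format, output_format):
        if name not in VALID_FORMATS:
            raise ValueError(f"unknown format name: {name!r}")
    return [program, "-i", input_path,
            "-if", input_format, "-of", output_format]


command = build_seqconv_command(r"C:\temp\fasta.txt", "fasta", "genbank")
print(" ".join(command))
```

Keeping the arguments as a list, rather than one long string, makes it easier to see which pieces would become separate parameters when the tool is described to the BioExtract Server.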
17. (6) Scroll to the top of the tool form and click the Execute button. (7) Now open and run the Fetch Sequence Records tool to create a data extract. From the Tools page, expand the Information Tools group and select the Fetch Sequence Records tool. The tool form will open in the right panel. (8) In the Input Data section, select "Use previously executed tool results". In the associated drop-down menu, select BioMart Query. (9) In Parameter Settings > Database Options, set the database to refseq, then click the Execute button. A data extract will be created on the Extracts page. This data extract can be saved on the BioExtract Server website, shared with others, and used as input into other analytic tools.
18. will be used in the BioExtract Server. It does not have to match the actual name of the program. For our tool, a name such as Fetch Records is suitable. Description: Optional. A description of the tool. HelpURL: Optional. If there is a website associated with your tool, you can enter a link to it here. Location: Not used for local tools. The field should be left blank. Execution Name: Enter the full path of your tool here. If your tool is a Java JAR file (e.g., ResultSetDB.jar), this should be java -jar followed by the full path of your JAR file. For example, on a Linux system the Execution Name might be java -jar /usr/local/tools/ResultSetDB.jar and on a Windows system it might be java -jar C:\ResultSetDB\ResultSetDB\dist\ResultSetDB.jar. Depending on your system configuration, you may need to replace java with the full path to the java command (e.g., C:\Program Files\Java\jre6\bin\java.exe). Can Use Current Extract: Not used for local tools and should be left unchecked. As a note, local tools can use the current extract as input, but this checkbox has no effect on that at all. Step 3: Once this information is entered, click the Save link at the top of the form.
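The rule for the Execution Name field can be summarized in a short Python sketch. The helper name `execution_name` is hypothetical and is not part of the BioExtract Server. The sketch covers only the two cases described above (a JAR file versus any other executable) and assumes the JAR is recognized by its .jar extension.

```python
def execution_name(tool_path, java_command="java"):
    """Return the text to enter in the Execution Name field.

    JAR files are launched through the java command; any other tool is
    referenced by its full path alone. Pass a full path as java_command
    if java is not on the system's search path.
    """
    if tool_path.lower().endswith(".jar"):
        return f"{java_command} -jar {tool_path}"
    return tool_path


print(execution_name("/usr/local/tools/ResultSetDB.jar"))
print(execution_name("/usr/local/bio/mytool"))
```

The manual does not say whether a java path containing spaces needs quoting, so the sketch leaves the strings exactly as given.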
19. workflows. If you want to change the query request, click the plus sign to expand the Execute query step. The step will expand and display modifiable sub-steps: Query and Databases. Select the Query heading. A Query Information form opens in the right panel. You can change the search field and the search term. Change the search term L16896 to NM_005341 and click the Save button to keep your changes. Table 1 displays valid search terms that may be used when modifying a query. In the Query Information form shown, the query changes from "common accn L16896" to "common accn NM_005341". Contains the unique accession number of the sequence or record. Contains all terms fr
20. Click the Tools tab. In the Available Tools menu, expand the Edit Tools group and select the xmknr tool. The tool's form opens in the right panel. (2) In the Input Data section, specify "Use records on Extracts page formatted as FASTA" for the input source. (3) In the Parameter Settings section, set "Sequence type (protein or DNA)" to dna. (4) Click the Execute button. A number of result files are created. In addition, the data extract visible on the Extracts page has been modified to remove any duplicate sequence records. Step 5: Convert the data extract to GenBank format using the FormatConversion tool, as the next tool requires GenBank format to run. (1) Click the Tools tab. In the Available Tools menu, expand the Edit Tools group and select the FormatConversion tool. The tool's form opens in the right panel. (2) In the Input Data section, specify "Use records on Extracts page formatted as FASTA" for the input source. (3) In the Parameter Settings section, set To Format to genbank and From Format to fasta. (4) Click the Execute button. The records in the data extract are now converted to GenBank format.
21. Adding Tools, Workflows, and Data Extracts to a Group. After you create a group, you can add elements such as analytic tools, data extracts, and workflows to your group. Note: Before adding tools, data extracts, and workflows to your group, you must first add them to your private BioExtract Server account. To add tools, data extracts, and workflows to a group: (1) Sign in to the BioExtract Server and click the Groups tab. (2) In the left panel, under the Owned Groups heading, select the name of the group that you want to add an element to. The group form opens in the right panel. (3) Click the Tools, Workflows, or Extracts tab, depending on what you'd like to add. The Add Elements form displays. (4) Click the black arrow on the Add Elements button. A list of tools, workflows, and data extracts appears. Select the name of the element that you'd like to add. Choose as many elements as you like, then click the Add Elements button. The selected elements will be added to the form.
22. P; ADDING TOOLS, WORKFLOWS, AND DATA EXTRACTS TO A GROUP; REMOVING TOOLS, WORKFLOWS, AND DATA EXTRACTS FROM A GROUP; INVITING MEMBERS TO JOIN A GROUP; Workflows through MyExperiment; ABOUT MYEXPERIMENT; IMPORTING A BIOEXTRACT SERVER WORKFLOW INTO MYEXPERIMENT. INTRODUCTION. The BioExtract Server (bioextract.org) is an open, Web-based system designed to aid researchers in the analysis of genomic data by providing a platform for the creation of bioinformatics workflows. Scientific workflows are created within the system by recording tasks performed by the user. These tasks may include querying multiple, distributed data sources, saving query results as searchable data extracts, and executing local and Web-accessible analytic tools. The series of recorded tasks can then be saved as a reproducible, sharable workflow available for subsequent execution with the original or modified inputs and parameter settings. Integrated data resources include interfaces to the National Center for Biotechnology Information (NCBI) nucleotide and protein databases, the European Molecular Biology Laboratory (EMBL-Bank) non-redundant nucleotide database, the Universal Protein Resource (UniProt), and the UniProt Reference Clusters (UniRef) database. The system offers access to numerous preinstalled, curated analytic tools and also provides researchers
23. Version 1.0. BioExtract Server User Manual, University of South Dakota. About Us: The BioExtract Server harnesses the power of online informatics tools for creating and customizing workflows. Users can query online sequence data, analyze it using an array of informatics tools (web service and desktop), create and share custom workflows for repeated analysis, and save the resulting data and workflows in standardized reports. This work was initially supported by NSF grant 0090732. Current work is being supported by NSF DBI 0606909. Copyright 2008 Brendel Group, Iowa State University, and Lushbough Bioinformatics Group, University of South Dakota. Table of Contents: Introduction; Querying Data Sources; EXECUTING QUERY; Creating a Query; Executing a Wild Card Search; VIEWING QUERY RESULTS; Viewing a Set of Records; Filtering a Set of Records; Exporting a Set of Records; Saving a Set of Records; BioExtract Server Analytic Tools; INTRODUCTION; EXECUTING AN ANALYTIC TOOL; Creating Data Extracts Using Analytic Tools; A Second Example With BioMart and Fetch Sequence Records; ADDING AN ANALYTIC TOOL; Selecting From a List of Available Tools; Modifying Added Tools; Adding a Local Tool; Running a Local Tool; BioExtract Server Workflows; INTRODUCTION; CREATING A WORKFLOW; EXECUTING A WORKFLOW; MODIFYING A WORKFLOW; COPYING A WORKFLOW; BioExtract Server Groups; ABOUT GROUPS; CREATING A GROU
24. act Server to provide an appropriate level of abstraction to hide as much low-level format transformation as possible. In situations where this is not possible, users may run intermediate tools, or shims, to perform the necessary format conversions. The BioExtract Server has incorporated a predefined tool for format conversion, and many of the Web services available to users have been explicitly defined for data filtering and transformation. As an example, if the output from one analytic tool is in FASTA format and the input into another tool is required to be in GenBank format, the user may run the intermediate tool, or shim, FormatConversion to convert the FASTA-formatted file to GenBank format. FormatConversion makes the conversion by parsing the ID from the FASTA file and retrieving the record in GenBank format. Creating Data Extracts Using Analytic Tools. The execution of some analytic tools results in the creation of a data extract, which displays on the Extracts page. As an example, let's create a data extract using a Gallus gallus cadherin 19 protein sequence record and the analytic tool BLAST (25). First, we'll query the NCBI Protein Database for Species: Gallus gallus and Definition: cadherin 19. From the query results, use the Select Records button to specify one record as the sequence of interest (in the results shown, record 154152095, cadherin 19, Gallus gallus). Next click
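The first half of what FormatConversion is described as doing, parsing the IDs out of a FASTA file, can be sketched as follows. This is not the tool's actual code. The function name `parse_fasta_ids` is hypothetical, the sketch assumes the ID is the first whitespace-delimited token on each header line, and the sequence letters in the example are placeholders rather than real data. Retrieving each record in GenBank format would be a separate step that is not shown.

```python
def parse_fasta_ids(fasta_text):
    """Collect the ID from every FASTA header line (lines starting with >).

    Assumption: the ID is the first whitespace-delimited token after the
    > character; real headers may need database-specific handling.
    """
    ids = []
    for line in fasta_text.splitlines():
        if not line.startswith(">"):
            continue  # sequence line, not a header
        header = line[1:].strip()
        if header:
            ids.append(header.split()[0])
    return ids


# Placeholder records: the residue letters are invented for illustration.
example = (
    ">154152095 cadherin 19 [Gallus gallus]\n"
    "MKTAAA\n"
    ">330251858 transcription factor bHLH 100\n"
    "MSSGGG\n"
)
print(parse_fasta_ids(example))
```

Each ID returned this way could then be handed to whatever lookup fetches the GenBank version of the record.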
25. age. Clicking a Local Details link displays the selected record's GenBank file. Clicking an External Link takes you to the Website hosting the original data record. Filtering a Set of Records. Records under the Extracts tab may be easily filtered to specific subsets by using the Select Records button. Clicking the check box to the left of a desired record adds that record to the subset. Clicking the Select All on Page button selects all of the records on an individual page. Clicking the Complete button after selecting all desired records generates a new result set containing only the filtered portion. Exporting a Set of Records. Records under the Extracts tab may be downloaded using the Export Records button. The Export Records feature has options for downloading records in FASTA format or in the default format, which depends on the data source queried (Figure 4). For example, if an NCBI data source were queried, the default format would be GenBank.
26. ation to join a group:
1. If you don't have a BioExtract Server account, you'll need to create one before accepting the invitation. Open bioextract.org. At the top right corner of any BioExtract Server page, select register. Fill in the required fields and click Add User. Your account will be created immediately.
2. Return to the invitation email. The invitation email tells you that a BioExtract Server group wants to add you as a member. It gives you the name of the group that you've been invited to join and a link to accept the invitation. Take note of the group's name for future reference.
Sample invitation email: "Hello, The Sanford Research collaboration group at bioextract.org wants to add you as a member. To accept this invitation, go to [invitation link]. Thank you for your time."
3. Click the Invitation link. A BioExtract Server login window opens. Enter your BioExtract Server user name and password.
Click the Accept Invitation button. A window displays "Invitation Accepted. Thank you for accepting the invitation." The
27. ecute button. The output from the execution of the tblastx tool is a BLAST report; furthermore, the tblastx report is turned into a data extract viewable on the Extracts page.
[Screenshots: the tblastx BLAST report for the query Mus musculus collagen alpha 1 type XVIII, and the Extracts page showing "Query returned 50 records".]
28. el. Group data extracts can be found on the Query page in Available Data Sources under the Miscellaneous heading.
Creating a Group
The BioExtract Server provides a facility to create groups of registered users. Groups allow registered users to share tools, workflows, and data extracts. To create a group:
1. You'll first need to create a user account and sign in. At the top right corner of any BioExtract Server page, select register. Fill in the required fields and click Add User. Your account will be created immediately. Return to the BioExtract Server and sign in by clicking the sign in link at the top right corner of any page.
2. Click the Groups tab.
3. In the left panel, under the Additional Actions heading, click Create Group. A new group form opens in the right panel.
4. Click Edit to the right of the heading A New Group. Enter a Name and Description for this new group, then click the Save button. A new group is created. You'll see your new group under the Owned Groups heading in the left panel.
The Name and Description of the newly created group can be edited by selecting the group and editing the information. Once the information has been edited, click the Save link at the top of the panel to keep your changes.
29. Inviting Members to Join a Group
To invite a new user to a group, the group owner sends an invitation to that user. Once the recipient accepts the invitation, they will have access to the tools, workflows, and data extracts owned by the group. To invite other people to your group:
1. Sign in to the BioExtract Server and click the Groups tab.
2. In the left panel, under the Owned Groups heading, select the desired group name. The group form opens in the right panel.
3. Click the Members tab. The Invite Members form displays.
4. Click the Invite Members button. An Invite New Members dialog box appears.
5. Select "Please enter a recipient's email address". A text box will appear.
6. Enter the email address of whomever you want to invite to the group, click the Save button, then click the Send Invitation button to complete the process. A message appears stating the invitation was sent.
When you invite people to join a group, we will immediately send email invitations to the addresses you provided. Once the recipient accepts the invitation, they will have access to the tools, data extracts, and workflows owned by that group.
To accept an invit
30. Figure 1. The BioExtract Server Query page showing the data source NCBI Protein Database selected and the query form set up to find the R2R3 MYB gene in Pinus taeda.
When attempting to locate sequence records associated with a fairly long description, you may find it helpful to use Boolean operators to break down the description into smaller units. As an example, suppose you are interested in locating nucleotide records containing the description "Homo sapiens zinc finger BTB domain". Using the Boolean operator AND, this could be broken into "Homo sapiens" AND "zinc finger" AND "domain", as illustrated in the screen grab below (Figure 2).
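The idea of splitting a long description into smaller units joined by Boolean operators can be sketched in a few lines. This is a hypothetical helper, not part of the BioExtract Server: the function name is invented, and the only operators accepted are the three the query form is described as offering (AND, OR, AND NOT).

```python
from typing import List, Tuple

# Operators the query form is described as supporting.
ALLOWED_OPERATORS = ("AND", "OR", "AND NOT")


def build_query(first_term: str, rest: List[Tuple[str, str]]) -> str:
    """Join search terms with Boolean operators into one query string.

    `rest` is a list of (operator, term) pairs appended in order.
    """
    parts = [first_term.strip()]
    for operator, term in rest:
        if operator not in ALLOWED_OPERATORS:
            raise ValueError(f"unsupported operator: {operator!r}")
        parts.append(operator)
        parts.append(term.strip())
    return " ".join(parts)


# The example from the text: one long description broken into three units.
query = build_query("Homo sapiens", [("AND", "zinc finger"), ("AND", "domain")])
print(query)
```

Each (operator, term) pair corresponds to one extra search line added in the query form with Add Search Line.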
31. ers access analytic tools through the list of Available Tools on the Tools page (Figure 7). The source of the analytic tool's input may be (1) the records listed on the Extracts page, (2) the output from a previously executed tool, or (3) private data provided by the user (uploaded or entered in a text box). Analytic tool parameters may be selected or modified before execution, and resulting output files may be viewed, downloaded, and used as input into subsequently executed analytic tools.
The BioExtract Server offers users the ability to add analytic tools to their BioExtract Server workspace. Users may select such tools from a list of Web services including EMBOSS, SoapLab, BioMoby, and KEGG, with the integration of BioMart (www.biomart.org) currently in development. The BioExtract Server also offers generic support for other SOAP-based Web services and lets users integrate local command-line tools residing on their own workstations through the use of a client-side Java applet. Analytic tools added by users may be annotated through the Customize Your Tools functionality, which allows users to provide detailed descriptions of the tools as well as add a help link to additional information.
32. Figure 5. The Save Extract window showing Extract Name and Description entered. Clicking the Create Extract button creates a data extract which is stored on the BioExtract Server website.
Figure 6. The Query page showing a user's privately owned data extract, Arabidopsis thaliana bHLH, in Available Data Sources under the heading Miscellaneous.
BIOEXTRACT SERVER ANALYTIC TOOLS
Introduction
A number of well-established and unique bioinformatics analytic tools are made available through the BioExtract Server, with the majority integrated as curated Web services. Us
33. flow memory. But if you have been working with the system and want to begin creating a workflow, it is necessary to clear any previously executed tasks from memory. To clear previously executed tasks from memory:
1. Click the Workflows tab and select the Create and Import Workflows heading in the Workflows tree. The Create and Import Workflows form opens in the right panel.
2. In the Create Workflow section, click the button. This will erase any previously executed tasks.
Step 2: Execute the tblastx tool from the Tools tab
1. Cl
34. g. The tool form opens in the right panel. Change parameters according to your preferences. Input into a tool may be modified if the original data was entered in the text box or uploaded as a file. Click the Save button to keep your changes.
Copying a Workflow
To copy a workflow:
1. Click the Create and Import W
35. Groups of which you are a member are listed under Member Groups. As a group member, you can view and execute any of the tools, workflows, and data extracts that you've been given permission to use.
To use group tools, sign in and click the Tools tab. In Available Tools, expand My Tools. By default, group tools are listed along with your private tools. Locate and select the group tool you want to use. The tool form will open in the right panel.
To use group workflows, sign in and click the Workflows tab. By default, all workflows (public, private, and group) are given in one list. Locate and select the group workflow you want to use. The workflow graph and its control buttons will open in the right pan
36. help to use the type of data as the logical name, especially if the tool has more than one input. Since Fetch Records expects this input file to be a list of ids, we'll use "ids" for the logical name.
Description: Any additional information or notes about the tool can be written here.
Record Number Limit: If the current extract is being used as the input, the number of records included will be truncated to this amount (0 means no limit).
File Name: Enter the name, with full path, that BioExtract Server should use to create this file. For the Fetch Records input file we will use C:\biotools\id_input.txt.
Data Types: Not used for local tools.
Uses Current Extract: This control isn't used for local tools and has no effect at all on whether the input can come from the current extract. It's best to leave it unchecked.
[Screenshot: the completed input form, with Logical Name "Ids", Description "Unique sequence identifiers", Record Number Limit 0, File Name C:\biotools\id_input.txt, and File Size Limit 0.]
File Size Limit: If the size of the input file should be restricted to a certain number of bytes, you can enter that number here. This should only be needed if your tool has a limit on the size of the input files given to it. Most tools ca
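The "0 means no limit" convention used by Record Number Limit (and by File Size Limit) can be made concrete with a small sketch. The function names are hypothetical, and what the server actually does when a file exceeds its size limit is not stated in the manual, so the size check below only reports whether the limit is respected.

```python
from typing import List, Sequence


def apply_record_limit(records: Sequence[str], limit: int) -> List[str]:
    """Truncate the records to `limit` entries; a limit of 0 keeps them all."""
    if limit < 0:
        raise ValueError("limit must be 0 (no limit) or a positive number")
    if limit == 0:
        return list(records)
    return list(records[:limit])


def within_file_size_limit(size_in_bytes: int, limit: int) -> bool:
    """Report whether a file size respects the limit; 0 means no limit."""
    if limit < 0:
        raise ValueError("limit must be 0 (no limit) or a positive number")
    return limit == 0 or size_in_bytes <= limit


# Made-up record ids for illustration only.
extract_ids = ["id1", "id2", "id3", "id4"]
print(apply_record_limit(extract_ids, 2))
print(apply_record_limit(extract_ids, 0))
```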
37. ick the Tools tab. In the Available Tools menu, expand the Similarity Search Tools group and select the tblastx tool. The tool's form opens in the right panel.
2. In the Input Data section, specify "Paste or type data into the text area" for the input source and enter the accession number of the nucleotide sequence record as input: "L16896" (without the quotes).
3. Click the Ex
38. in a certain order, this number gives the rank of this particular parameter. The ordering is from lowest to highest, so a parameter with a tab order of 2 appears above one with an order of 3.
Is Mandatory: If checked, the tool will not be able to execute unless a value for this parameter has been entered.
Click the Save link under New Parameter. This is the completed db parameter for Fetch Records:
Logical Name: Database
Physical Name: db
Description: Name of database containing desired sequences
Parameter Type: select
Tab Order: 0
Values: ncbi, embl
Step 7: Saving the Local Tool
Once all of the necessary inputs, outputs, and parameters have been defined, click on the Save Tool Changes button at the bottom of the screen. Please note that all of the sub-forms opened for each input, output, and parameter must be saved before clicking Save Tool Changes. If any of the sub-forms are still open, the following error message will be displayed:
BioExtract Message: You must click Save at the top of the form to keep your changes, then click Save Tool Changes to complete the process.
Running a Local Tool
Once the tool is saved, it will appear under the My Tools group in the Available Tools menu. When you
39. l. Fetch Records has one parameter, which specifies the name of the database containing the desired sequence records. On the command line it would appear as -db ncbi.
To add parameters, begin by clicking on the Create New link next to Parameter Groupings. Then click Edit next to New Grouping. Assign a name to this parameter grouping. The Viewed check box allows you to specify whether you would like to have the parameter group expanded when the tool is being executed.
Now click Create New next to Parameters, and then click the Edit link next to New Parameter. Below is a description of each field required to define a parameter. After describing these fields in general, we will demonstrate how they were used to add parameters for the mutate tool.
Logical Name: As with Inputs and Outputs before, the logical name is the name used for this parameter within BioExtract Server.
Physical Name: This is the actual parameter name as it should appear on the command line. Note that BioExtract Server will not add a "-" automatically, so please remember to do so if your tool requires a "-" before the parameter name.
Description (Optional): Additional information about the parameter.
Paramete
40. ls view record link provides quick access to details pertaining to the selected record. The External Link allows navigation to the Website hosting the original data record (Figure 3).
Figure 3. The BioExtract Server Extracts page showing the results of a query performed on the Query p
41. m the Tools page, in the Available Tools menu, expand the Alignment Tools group and select the ClustalW2 tool. The tool's form opens in the right panel.
2. In the Input Data section, specify "Use previously executed tool results" for the input source and select FetchTranslation and fetchTranslation_results.txt from the associated drop-down menus.
3. Click the Execute button. The tool ClustalW2 runs, creates a multiple sequence alignment, and draws a dendrogram that represents how the sequences are related.
Step 8: Create a TCoffee multiple sequence alignment and dendrogram with the pr
42. n accept files of any size, so if you're not sure, it's probably safe to leave this at the default value of 0 (no limit).
Include in Command Line: If your tool expects the name of the input file to be given after its name in the command line, this box should be checked. If Include in Command Line is checked, BioExtract Server adds the file name to the command used to run the tool. If Include in Command Line is left unchecked, the file will still be created with the name given by File Name (C:\biotools\id_input.txt), but its name will not be added to the command line.
Some tools require a switch like -i or -f before files given in the command line. For example, Fetch Sequence requires a -i before the input file. The command line would contain -i C:\biotools\id_input.txt. Remember the Physical Name field we skipped earlier? If the tool requires a switch before the input file, enter the switch in the Physical Name field.
Once all the information about this input has been entered, click the Save link at the top of the form. Any number of inputs can be added, depending on the requirements or limitations of your tool.
Step 5: Specify output files
The interface for defining output files is very similar to the one used to define input files. As with input files, output files are optional and need only be defined if you would like to use the output of the tool as input
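How the Include in Command Line check box and the Physical Name switch affect the command can be sketched as below. This is a minimal illustration under stated assumptions, not the server's code: the ToolInput class and input_arguments function are invented names, and the dash in "-i" is assumed to be the switch character the manual refers to.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ToolInput:
    """One input definition, mirroring the form fields described above."""
    file_name: str                        # File Name, with full path
    physical_name: str = ""               # optional switch placed before the file
    include_in_command_line: bool = True  # the Include in Command Line box


def input_arguments(inputs: List[ToolInput]) -> List[str]:
    """Return the command-line tokens contributed by the input files."""
    args: List[str] = []
    for item in inputs:
        if not item.include_in_command_line:
            # The file is still created, but its name is not added.
            continue
        if item.physical_name:
            args.append(item.physical_name)
        args.append(item.file_name)
    return args


ids_input = ToolInput(r"C:\biotools\id_input.txt", physical_name="-i")
print(input_arguments([ids_input]))
```

Leaving physical_name empty models a tool that expects the bare file name, and setting include_in_command_line to False models a tool that reads the file from a fixed location without being told its name.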
43. nvolve:
Selecting the tblastx tool and providing the accession number of the nucleotide sequence record as input. The output from tblastx (a BLAST report along with a set of records representing similar sequences) is parsed using a formatting template to produce an initial data extract, i.e., a set of matching nucleotide sequences.
The resulting data extract is saved.
The resulting data extract is used as input into the tool Vmatch (see http://www.vmatch.de) to remove duplicate sequences.
The tool fetchTranslation is run. This tool is defined to use the current data extract as input in GenBank format and returns the protein translations from the GenBank annotated coding sequence (CDS) regions in FASTA format.
The ClustalW tool is run to create the multiple sequence alignment, with the input specified as coming from the previously executed tool fetchTranslation, and to define and draw a dendrogram that represents how the sequences are related.
The TCoffee tool is run to create the multiple sequence alignment, with the input specified as coming from the previously executed tool fetchTranslation, and to define and draw a dendrogram that represents how the sequences are related.
Creating a Workflow
As an example, let's create the workflow outlined above.
Step 1: Preparing to create the workflow
If you have just signed into the BioExtract Server, it's not necessary to clear the work
45. Table 1 rows (search field; example search term; description):
common all; common all BTB domain; om all searchable database fields in the database
common author; common author Zhang; Contains all authors from all references in the database records
common defn; common defn Homo sapiens AND common defn BTB domain AND common defn zinc finger; Includes only those words found in the definition line of a record
common fkey; common fkey gene; Contains the biological features assigned or annotated to the nucleotide sequences and defined in the DDBJ/EMBL/GenBank Feature Table
common gene; common gen; Contains the standard and common names of genes found in the database records
common id; common id 157694498
common keyword; Zbtb8; Contains special index terms from the controlled vocabularies associated with the GenBank, EMBL, DDBJ, SWISS-Prot, PIR, PRF, or PDB databases
common species; Camphor tree
common taxonomy; common taxonomy Mus; Contains the scientific and common names for the organisms associated with protein and nucleotide sequences
common title; common title Plant Physiol; Title of the journal (abbreviation)
Table 1. List of valid search fields and example search terms.
If you want to change tblastx tool parameters, click the plus sign to expand the tblastx tool step. The step will expand and display the modifiable sub-step, Tool. Select the Tool headin
46. orkflows heading on the Workflows page. The Create and Import Workflows form opens in the right panel.
2. In the Create Workflow section, click the button.
3. Next, open the workflow you want to make a copy of and run it. The BioExtract Server will record the running workflow in the background.
4. Once execution is complete, click the Create and Import Workflows heading on the Workflows page. The Create and Import Workflows form opens in the right panel.
5. In the Save Workflow section, enter a new Name and keep or change the Description, then click the Save button. The name of the new, copied workflow will appear in the Workflows menu. You can now modify the original workflow or the copy.
BIOEXTRACT SERVER GROUPS
About Groups
Groups provide a collaborative environment to facilitate the sharing of data extracts, analytic tools, and workflows. Registered users can create new groups by using the Create Group option on the Groups page. When you create a group, you'll be able to invite others to join. Plus, you'll be able to share your tools, workflows, and data extracts with others. In addition, you'll be able to modify your shared workflows and remove tools, data extracts, and workflows from group access. Groups you own are listed under Owned Groups.
47. otein translations retrieved by FetchTranslation
1. From the Tools page, select the TCoffee tool in the Alignment Tools group. The tool's form opens in the right panel.
2. In the Input Data section, specify "Use previously executed tool results" for the input source and select FetchTranslation and fetchTranslation_results.txt from the associated drop-down menus.
3. Click the Execute button. The tool TCoffee runs, creates a multiple sequence alignment, and draws a dendrogram that represents how the sequences are related.
Step 9: Save the workflow
1. Click the Workflows tab and select the Create and Import Workflows heading in the Workflows tree. The Create and Import Workflows form opens in the right panel.
2. In the Save Workflow section, enter a name and description for the new workflow and then click the
48. r Type: This specifies the way values are given for the parameter. Depending on the type, they may be entered directly by the user or chosen from a list of pre-defined values. Below are the details for each type.
text: Creates a field where the value can be typed in directly. Also useful for numeric values.
checkbox: Creates a checkbox. Useful for parameters that don't have any additional values; for example, some programs will print extra information if a -v is present on the command line. If checked, the parameter (specifically, the Physical Name) will appear on the command line.
select: Creates a drop-down menu from which one of several possible values can be chosen. When using the select type, the possible values must be defined. This is not required for any of the other types. To define a set of values, click on the Create New link next to Values. Then click Edit next to the New Parameter Value. Value is the value as it should appear on the command line. If Is Default is checked, this value will be selected by default when the tool's interface is shown.
textarea: Identical to the text type above.
Tab Order (Optional): If you would like the parameters to be displayed
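The way each parameter type could contribute to the command line can be sketched as follows. This is an illustration under assumptions, not the server's implementation: the Parameter class and parameter_arguments function are hypothetical, the dash-prefixed names ("-db", "-v") are assumed spellings of the physical names, and emitting the name and the value as two separate tokens is a guess based on the "db ncbi" example in the manual.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Parameter:
    physical_name: str                       # name as it appears on the command line
    param_type: str                          # "text", "textarea", "checkbox" or "select"
    value: Optional[Union[str, bool]] = None
    mandatory: bool = False                  # the Is Mandatory box


def parameter_arguments(parameters: List[Parameter]) -> List[str]:
    """Return the command-line tokens contributed by the parameters."""
    args: List[str] = []
    for param in parameters:
        if param.param_type == "checkbox":
            # A checked box emits only the physical name, with no value.
            if param.value is True:
                args.append(param.physical_name)
            continue
        if param.value is None or param.value == "":
            if param.mandatory:
                raise ValueError(f"missing value for {param.physical_name}")
            continue
        # text, textarea and select all emit the name followed by the value.
        args.extend([param.physical_name, str(param.value)])
    return args


tool_parameters = [
    Parameter("-db", "select", "ncbi", mandatory=True),
    Parameter("-v", "checkbox", True),
]
print(parameter_arguments(tool_parameters))
```

The mandatory check mirrors the Is Mandatory field: a tool with an unset mandatory parameter cannot be executed, which the sketch models by raising an error.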
49. Figure 4. The BioExtract Server Extracts page showing the Export Records feature with downloading options FASTA and Default (which depends on the data source queried).
Saving a Set of Records
While most of the BioExtract Server's functionality is available to all users, the ability to save records on the BioExtract Server website is available only to users who have registered with the system. Users who are signed into the BioExtract Server may save records as searchable data extracts by using the Save Extract button available from the Extracts page (Figure 5). Once saved, data extracts can be found on the Query page in Available Data Sources under the heading Miscellaneous (Figure 6). All data extracts saved on the BioExtract Server website are privately owned by the user and are only made available to others by explicitly sharing them with a group. This is accomplished by (1) clicking the Groups tab, (2) creating a group under Additional Actions, (3) clicking the new group name, (4) selecting the Extracts tab for the new group, and finally (5) clicking the Add Elements button to select the data extract to share.
50. select the tool, the interface presented is the same one used by all BioExtract Server tools.
[Screenshot: the Fetch Records tool form. Its description reads: "This tool takes a file containing a list of ids and the name of a data source (i.e., ncbi, embl, refseq). The tool accesses the data source and retrieves the records associated with the ids."]
A few moments after clicking Execute, a popup window should appear, followed by another one that looks like this: "The application's digital signature cannot be verified. Do you want to run the application?" This is normal. A Java applet is used to execute the local
51. A Second Example With BioMart and Fetch Sequence Records
Now let's create a new data extract using the BioMart and Fetch Sequence Records tools.
1. From the Tools page, expand the BioMart group and select ENSEMBL GENES 65 (SANGER UK). From the expanded group, select the Homo sapiens genes (GRCh37.p5) dataset. The tool form opens.
52. [Screenshot: FormatConversion tool form; Parameter Settings show conversion options To Format: genbank, From Format: fasta]
Step 6: Retrieve the protein translations from the CDS regions of the DNA sequences using the FetchTranslation tool.
1. From the Tools page, in the Available Tools menu, expand the Information Tools group and select the FetchTranslation tool. The tool's form opens in the right panel.
2. In the Input Data section, specify "Use previously executed tool results" for the input source and select FormatConversion and result.txt from the associated drop-down menus.
3. Click the Execute button. The FetchTranslation tool runs and returns the protein translations in FASTA format.
[Screenshot: FetchTranslation Input Data section with FormatConversion and its result selected]
Step 7: Create a ClustalW2 multiple sequence alignment and dendrogram with the protein translations retrieved by FetchTranslation.
1 Fro
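Step 6 above returns protein translations of CDS regions in FASTA format. The manual does not describe how FetchTranslation works internally; the following is only an illustrative sketch of the general idea (translating a coding sequence with the standard genetic code and writing it as FASTA). The function names `translate_cds` and `to_fasta` are hypothetical.

```python
# Illustrative sketch only; not the FetchTranslation implementation.
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_cds(dna):
    """Translate a coding sequence, stopping at the first stop codon."""
    dna = dna.upper().replace("U", "T")
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE.get(dna[pos:pos + 3], "X")  # X = unknown codon
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def to_fasta(name, sequence, width=60):
    """Format one sequence as a FASTA record with wrapped lines."""
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"
```

For example, `to_fasta("example", translate_cds("ATGGCCTAA"))` produces a two-line FASTA record for the peptide `MA`. Real CDS features can carry a reading-frame offset or an alternative genetic code, which this sketch ignores.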
53. [Screenshot: list of sequence records]
Adding an Analytic Tool: Selecting From a List of Available Tools
Adding an analytic tool by selecting from a list of Web services is fairly simple, primarily because most of the information required by the BioExtract Server system can be obtained by parsing the XML Web service description and translating it into a BioExtract Tool Object. The particular GUI control that the system maps to a specific tool parameter is a function of the parameter type.
From the Tools page, use the Add a New Tool menu to view the list of tools that you can add to your BioExtract Server account. The search text box above the list of tools lets you search for a tool by name. As an example, entering "needle" will highlight those tools containing the word "needle" (Figure 8). Clicking the Save Tool button adds the tool to your BioExtract Server account.
[Screenshot: Add a New Tool list filtered by "needle", with Logical Name and Description columns] Needleman-Wunsch alignment of two
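The passage above says that the GUI control for a tool parameter is a function of the parameter type, and that the search box highlights tools whose names contain the search text. The manual does not list the actual type-to-control mapping, so the table below is an assumption made purely for illustration, as are the names `control_for` and `matching_tools`.

```python
# Hypothetical mapping; the real BioExtract Server mapping is not documented here.
CONTROL_FOR_TYPE = {
    "boolean": "checkbox",
    "enum": "drop-down list",
    "file": "file upload",
    "integer": "text box",
    "string": "text box",
}


def control_for(param_type):
    """Pick a GUI control for a parameter type, defaulting to a text box."""
    return CONTROL_FOR_TYPE.get(param_type.lower(), "text box")


def matching_tools(tool_names, query):
    """Return the tool names containing `query`, ignoring case."""
    needle = query.lower()
    return [name for name in tool_names if needle in name.lower()]
```

With the EMBOSS tool names `["needle", "water", "needleall"]`, `matching_tools(..., "needle")` keeps the first and third entries, which mirrors the highlighting behaviour described for the search box.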
54. website (http://www.myexperiment.org).
4. Click the Workflows tab. Click the GO button in the upper right corner under the New/Upload heading.
[Screenshot: myExperiment workflow entry "Gene function prediction in Macaca Fasciculus"]
5. Click the Choose File button and select the file representing the workflow you exported from the BioExtract Server.
[Screenshot: Upload Workflow form with sections "1. Workflow file/script" and "2. Main metadata"]
6. Click the button at the bottom of the screen.
Select the desired tags and click the button. The workflow has been uploaded to myExperiment.
[Screenshot: myExperiment workflow metadata and tag suggestions]
55. with the option of selecting computational tools from a large list of Web services, including the European Molecular Biology Open Software Suite (EMBOSS), BioMoby, and the Kyoto Encyclopedia of Genes and Genomes (KEGG). The system further allows users to integrate local command-line tools residing on their own computers through a client-side Java applet.
QUERYING DATA SOURCES
Executing a Query
Creating a Query
Queries are constructed in the BioExtract Server by first checking the box for one or more data sources and then composing a query using the Query Form. The screen grab below presents an example of querying the NCBI Protein Database for the R2R3-MYB gene in Pinus taeda (Loblolly pine) (Figure 1). Queries executed against data sources residing at the BioExtract Server respond quickly. For example, a query of Viridiplantae returns in a matter of seconds. Response times for queries against data sources residing at other sites (e.g., NCBI Nucleotide and Protein databases) are consistent with response times for queries made directly at those sites.
[Screenshot: Query page listing Available Data Sources and the Fetch Sequence(s) option]