
GenePalette 1.1 Manual (in PDF format)

1. Above is a list of oligos I designed using the 5 steps. If you want to see how my oligos look on the sequence (Figure 13), you can add them as an Oligo List feature. Copy all of the oligos. Go to the Feature menu and click Add Feature. In the Feature Dialog, give the feature a name like "oligos" and click on the tab labeled Oligo List Feature. Paste the list of oligos into the text area under the Oligo List Feature tab. Click OK. Tutorial 3: Working with Transcription Units. In most cases a GenePalette user will want to use and manipulate the transcription units that are annotated on the sequence. It is important to see where a transcript is located. It is also necessary to be able to modify the annotation so that it matches any further information you have about a transcript. This tutorial shows you how to use the annotation access and editing features of GenePalette. We will use the portion of the Enhancer of split complex that was downloaded during Tutorial 1. A copy of this sequence is available with GenePalette in the file tutorial.seq in the Sequences directory. If you are starting this part from scratch, follow the instructions in "Loading a GenBank Sequence" in Part 1. If you are not connected to the internet, you can follow most of this tutorial using the
2. Su(H) and PN E-box. To add these features from the tutorial library to the sequence, go to the Libraries menu and click the Add Feature From Library menu item. You can add several features from several libraries at the same time by clicking them. You can now see that the Xba I/Kpn I enhancer fragment contains two high-affinity binding sites for Su(H) and two binding sites for proneural bHLH activators. The display shows 3 proneural sites, but if you look closely, one of them is a semi-palindromic match. Exporting Images of the current view: GenePalette lets you export images of both the graphical display and the markup view in GIF or PostScript format (Figure 6). These two formats were chosen to cover the wide range of uses for an output image. A GIF image is a quick way to show a genomic feature in a lab meeting presentation or to email it to a collaborator. A PostScript image is very useful when you want to custom-edit your image for presentations, posters or even publication. The PostScript files produced by GenePalette can be opened in graphical editors such as Adobe Illustrator, where every object in the image can be modified. From the File menu, click Export Graphical View and choose either Export GIF or Export PostScript. Supply a filename for your image. The file selection dialog always starts in a directory called Images under the main application directory. This is a convenient place
3. Experiment with resizing the divider spacing. The Sequence Display: The top area of the loaded GenePalette window contains the nucleotide sequence along with information about the sequence. As you select portions of the sequence, data about your selection is reported in the data panel above. A box also appears in the graphical view to show what you have selected. Select some part of the sequence to try this feature out. You can copy portions of the sequence from this text area. The Markup Display: The markup view is the next area of the main window and is initially a blank pane. Later in the session we will use this area to view features at the nucleotide level. When you interact with the graphical display, the corresponding region is shown here with its features highlighted on the sequence. The Graphical Display: The third main area contains a graphical representation of the sequence. The top panels contain data about this graphical view: the base pair/pixel ratio, a slider bar that adjusts the ratio, a scale bar, and a legend panel. Once we start adding features to this sequence, the legend panel will show what each symbol means. Move the slider to see how the base pair/pixel ratio affects the graphical view. The main part of the graphical display is the graphical representation itself. Note how you can move the scroll bars up/down and left/right to scroll to the parts of the s
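The base pair/pixel ratio is a scale factor. To illustrate it, the Python sketch below assumes a simple linear mapping from 1-based sequence coordinates to horizontal pixels. The function names are hypothetical and are not GenePalette's internals.

```python
# Hypothetical helpers (not GenePalette code); assumes a linear mapping
# between 1-based sequence coordinates and horizontal pixels.

def bp_to_pixel(position: int, view_start: int, bp_per_pixel: float) -> int:
    """Pixel offset of a base, measured from the left edge of the view."""
    if bp_per_pixel <= 0:
        raise ValueError("bp_per_pixel must be positive")
    return int((position - view_start) / bp_per_pixel)


def visible_bp(width_px: int, bp_per_pixel: float) -> float:
    """How many base pairs fit across a panel of the given width."""
    return width_px * bp_per_pixel


# At 10 bp per pixel, base 1001 of a view starting at base 1 sits at pixel 100,
# and a 600-pixel panel spans 6000 bp; lowering the ratio zooms in.
print(bp_to_pixel(1001, 1, 10), visible_bp(600, 10))
```

Under this assumption, moving the slider changes only `bp_per_pixel`. This is why the same region widens or narrows on screen.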
4. [Figure 3: Sequence selection dialog listing Local and Remote Drosophila melanogaster sequences with their lengths. Once a sequence has been stored, it is listed as Local.] Note to Mac OS 9 Users: For Mac OS X users, local access to large sequences is as fast as described above. For users still on OS 9, however, loading all of the genes on a large sequence will still be fairly slow (30-60 seconds). This is because Java support on OS 9 is limited and has been discontinued. Most things work, but many bugs and inefficiencies remain that will not be fixed. Apple has put a lot of energy into a fine implementation of Java on OS X, and I would use this space to urge any Mac user to try GenePalette on OS X. All of the screen shots in this manual were made with the OS X version of GenePalette. Because Java support on OS 9 is dated and support for that operating system is dwindling, GenePalette version 1.1 was not released for OS 9. Using Local Sequence Off-Line: Collections are stored in the GenBank directory, but the Entrez query function is still very useful, because GenBank records are indexed in several ways. You can often reach the right genomic contig by typing in the full name of the gene
5. Three buttons next to the color chooser control the exon data table. The Add Exon button adds an exon to the end of the exon table. The Sort button sorts the exons that have been entered and names them according to position. The Delete Exon button deletes the exon that is highlighted in the table, if one is highlighted. Each exon is a separate row in the table, and you can enter its start and stop values. You cannot edit the Name column, because the program automatically names exons in order along the direction of transcription. Do some quick math: subtract 70 from the start position of Exon 1 of mα and add 13 to the end of this exon. Enter the new values in the row for Exon 1 in the Transcript Editor Dialog. Using tutorial.seq, the exon should now run from 29597 to 30266. Because we trust the coding region designation, we will not alter that part. Click the OK button, and you should have an accurate version of the mα transcript. This concludes the tutorial portion of the manual. The next chapter goes into depth about downloading sequences from GenBank. [Figure: Transcript Editor Dialog] CHAPTER 2 GenBank Access. Introduction: The ability to access sequences and annotation through the Internet is pivotal to the usefulness of GenePalette. GenePalette uses the Entrez server at the National Lib
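Two things from this step can be sketched in Python: the Sort button's naming rule (exons numbered along the direction of transcription) and the exon-end arithmetic. The function names are hypothetical. The starting coordinates in the usage line (29667 to 30253) are the tutorial's final values worked backwards, not numbers quoted from GenePalette.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # (start, stop), 1-based sequence coordinates


def sort_and_name(exons: List[Exon], strand: str) -> List[Tuple[str, Exon]]:
    """Order exons along the direction of transcription and number them.

    Hypothetical helper: strand '+' (top) numbers left to right,
    strand '-' (bottom) numbers right to left.
    """
    ordered = sorted(exons, key=min, reverse=(strand == "-"))
    return [(f"Exon {i}", exon) for i, exon in enumerate(ordered, start=1)]


def extend_exon(exon: Exon, extend_start: int, extend_end: int) -> Exon:
    """Move an exon's start back and its end forward by the given numbers of bases."""
    start, stop = exon
    return (start - extend_start, stop + extend_end)


print(sort_and_name([(500, 600), (100, 200)], "-"))
print(extend_exon((29667, 30253), 70, 13))
```

`extend_exon((29667, 30253), 70, 13)` returns `(29597, 30266)`, the range given above.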
6. a Feature Library. Add Feature From History: Select a feature from the session history to add to the current sequence. The history contains features that have been added or modified in any sequence during the current GenePalette session. It also contains features that have been deleted from libraries during the current session. The Libraries Menu: Add Feature from Library: Lets the user add multiple features from any loaded library to the current sequence. Add Feature to Library: Add a new feature to a loaded library. Add to Library from History: Add a feature from the session history to a loaded library. The history contains features that have been added or modified in any sequence during the current GenePalette session. It also contains features that have been deleted from libraries during the current session. Modify a Library Feature: Use the feature editor dialog to change a library feature. Delete Feature from Library: Delete one or more features from a single library. Rename a Library: Change the name of the library as it appears in the library selection tabs (used when adding features from libraries) and in the library selection dialog. Create New Feature Library: Makes a new feature library. The user must specify a name and a file under which to save it. Load Feature Library: Load a feature library that has been saved. If a library is in the Libraries directory under the main prog
7. can search your local genome collections by gene symbol. Improved organization of local sequences: curate your local GenBank directory by storing collections in their own sub-directories. New clickability functions give better connectivity between interface elements. CHAPTER 1 GenePalette Tutorials. Introduction: This chapter contains several tutorials that guide the user through the operation of GenePalette. The first tutorial gives the bare minimum needed to use the program. The later tutorials build on these fundamentals to demonstrate some of GenePalette's more complex abilities. In these tutorials we take you step by step through the use of the program. Program components (menus, buttons, sliders, etc.) are shown in Bold text. Actions that you should perform in the program are Underlined. Tutorial 1: Reconstruction of a published reporter construct. When studying enhancers, you will often want to understand how a published reporter fragment was constructed. The task appears trivial, but it can be quite time-consuming and difficult. With GenePalette it is done quite easily. In this example we load the genomic region surrounding a gene of the Enhancer of split Complex of Drosophila melanogaster and view it with respect to a published upstream enhancer. We will use this loaded sequence to highlight the basic features of GenePa
8. drawn to be as tall as the tallest overlapping exon. For example, clicking on any of the first 17 exons of the blue Nf1 transcript gives a box that surrounds all three transcripts. If you click on Exon 18, however, the box surrounds only the first two transcripts. Click on the rows in the exon tables of Nf1 to see how exon boxing works. Now that we are comfortable with the basic sequence operations of the main window, it is time to explore the regulatory sequence of the gene E(spl)mγ. Adding Features to a Sequence: In this section we add some features to the sequence we have loaded. A feature is any sequence element that can be described by sequence identity. Examples include transcription factor binding sites, primer sequences, restriction enzyme sites, SNPs, mRNA regulatory motifs, or anything else you can think of. Nellesen et al. (1999) described the cloning of an enhancer element upstream of E(spl)mγ. [Figure: excerpt from Nellesen et al. (1999) describing the cloning of the mγ enhancer.] [Figure 8: Dialog for Adding Features from Feature Libraries. Each library is a tabbed pane in the dialog and contains a table of features. To select features to add, click on the tab for a library and check the features you want in the first column.] Once you hit OK, the feature(s) is searched ac
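One way to model the exon boxing rule above is sketched below. It assumes that transcripts are stacked in rows, with row 0 on the DNA line, and that the box grows to cover every row holding an exon that overlaps the clicked one. The data layout and function names are hypothetical.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # (start, stop), 1-based, inclusive


def overlaps(a: Exon, b: Exon) -> bool:
    """True if two closed intervals share at least one base."""
    return a[0] <= b[1] and b[0] <= a[1]


def boxed_rows(transcripts: List[List[Exon]], row: int, exon_index: int) -> range:
    """Rows covered by the highlight box for one clicked exon (assumed model).

    The clicked exon always overlaps itself, so its own row is always included.
    """
    clicked = transcripts[row][exon_index]
    hits = [r for r, exons in enumerate(transcripts)
            if any(overlaps(clicked, e) for e in exons)]
    return range(min(hits), max(hits) + 1)


# Three stacked transcripts: the first exon is shared by all, the second by two.
demo = [[(1, 10), (50, 60)], [(5, 12), (55, 58)], [(3, 8)]]
print(list(boxed_rows(demo, 0, 0)), list(boxed_rows(demo, 0, 1)))
```

This mirrors the Nf1 example qualitatively: an exon shared by all variants boxes every row, while one shared by only two variants boxes two.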
9. or non-coding, you will select the sequence that spans the whole exon. Click on both coding and non-coding exons in the graphical display to get a feel for how the other components react. Click on different exons of Nf1 so you can really see how exons are selected in the data table. Data Tables: The final area of the main GenePalette window is split vertically into two panels. The rightmost panel is currently empty but will soon contain data about the features we add to the sequence. The leftmost panel contains data about the genes that currently reside on the sequence. Each transcript has an entry in the Combo Box at the top of the panel. The entry is labeled with the name of the gene and the number of alternate transcripts in parentheses, as in Nf1 (3). To see a different transcript, click on the combo box and highlight the gene you want. Directly under the combo box is data about the gene unit: the product name, the gene orientation, and a range that specifies where the coding sequence starts and ends. The final component of this top region is a Combo Box that sets the color of the gene. Try changing the transcript color and note how this changes the color in the graphical display. The bottom portion of each transcript's data panel contains a table of exons. When you click on a row in the exon table, the exon is boxed in the graphical view. So that you can see how exons overlap, the box is
10. save the Ecl136 II site, go to the Libraries Menu and click Add to Library from History. This menu item lets you add to a library any site that has been added to a sequence during the session. Choose the Ecl136 II site and then choose a library to add it to. [Figure 5: Dialog for adding a feature directly to a sequence.] Trimming a Sequence: Now that we have a specific region of mγ that we are interested in, it would be nice to narrow our view so that we can focus on this gene. To do this, select a box that includes both the mγ locus and the Hind III fragment. Go to the Sequence menu and select Trim Sequence to Graphical Selection in the Trim Sequence sub-menu. A new GenePalette window is created that contains just the sequence selected in the box. The old window is still there in case you want to use it. Keep this window around for the third tutorial. Completing the enhancer analysis of mγ: The enhancer described by Nellesen et al. was contained in a Kpn I/Xba I subfragment. Add Kpn I and Xba I from the restriction library, just as we did for Hind III. You can now see the small piece of DNA that was used for the mγ reporter gene. As described in the text, this fragment contains binding sites for both proneural basic Helix-Loop-Helix (bHLH) activators and a transcription factor, Suppressor of Hairless (Su(H)). Included in the Tutorial Library are binding site consensuses for these two, labeled
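Trimming amounts to keeping a sub-range of bases and shifting coordinates so that the new window starts at base 1. The sketch below assumes that features lying wholly inside the selection are carried over with shifted coordinates and that the others are dropped. GenePalette may instead re-search its features against the trimmed sequence. `Feature` and `trim` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Feature:
    name: str
    start: int  # 1-based, inclusive
    end: int


def trim(sequence: str, features: List[Feature], start: int, end: int) -> Tuple[str, List[Feature]]:
    """Keep bases start..end (1-based, inclusive) and re-map feature coordinates.

    Assumption: features that are not wholly inside the kept region are dropped.
    """
    sub = sequence[start - 1:end]
    offset = start - 1
    kept = [Feature(f.name, f.start - offset, f.end - offset)
            for f in features if f.start >= start and f.end <= end]
    return sub, kept


print(trim("ACGTACGTAC", [Feature("a", 3, 5), Feature("b", 8, 12)], 2, 9))
```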
11. the Hind III boxes that appear in the markup view. This repositions the red arrow and highlights the match's row in the Feature Table. Click on any base in the Markup View. This repositions the arrow and selects that base in the Sequence Display. The markup view also shows information about the genes annotated on the sequence. DNA that is not associated with a transcription unit appears as black letters. DNA that encodes a non-coding region of a transcription unit appears as white letters on a gray background, on the strand where the gene resides. Bases that code for the protein portion of a transcript appear in the same color as the coding exons of that transcript. Using the Graphical View to select sequence and create a Markup View: Another way to activate the markup view for a region is to drag out a box in the graphical view. This generates a Markup View of the boxed region and selects the boxed sequence in the Sequence Display. Press the mouse button in the graphical view and drag the mouse across a region of the sequence. Experiment with generating a markup view and using it to see features and transcription unit details. When you box a region that overlaps an exon that is not on the [...] of the DNA, [...] exon will be annotated in the Markup View. When you drag out a box that will co
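The markup coloring can be thought of as a per-base classification against a transcript's exons and coding range. The sketch below treats every base outside an exon (intergenic or intronic) as uncolored "other". How GenePalette actually draws introns is not stated above, so that part is an assumption. The function name is hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive


def classify(position: int, exons: List[Interval], cds: Interval) -> str:
    """Classify one base as 'other', 'noncoding' or 'coding' (hypothetical helper).

    'other'     -> black letters (assumed to include intronic bases)
    'noncoding' -> white letters on a gray background
    'coding'    -> the transcript's coding-exon color
    """
    if not any(s <= position <= e for s, e in exons):
        return "other"
    return "coding" if cds[0] <= position <= cds[1] else "noncoding"


exons = [(10, 20), (30, 40)]
print([classify(p, exons, (15, 35)) for p in (5, 12, 18, 25, 33, 38)])
```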
12. tutorial.seq file in the Sequences directory under the main GenePalette directory. A glossary of all GenePalette menu items is given in Chapter 4. Extracting a cDNA: It is useful to have access to the spliced mRNA implied by the annotation of a transcript on genomic DNA (Figure 14). To get it, go to the Sequence menu and click Extract Transcript cDNA Sequence. A dialog appears with a list of the transcripts currently associated with the sequence. Nf1 is the most helpful gene for demonstrating this, because it is the only gene in the sequence with multiple exons. Select one of the three Nf1 alternates from the Transcript selection dialog. You can easily tell the alternates apart by the last column of the table, labeled Color. Click the OK button. A sequence output dialog appears containing the spliced sequence. From this dialog you can copy the sequence for later use (BLAST, Strider, or whatever you like). When you are done with the cDNA sequence, click the OK button to close the dialog. [Figure 14: Extracting a cDNA. The spliced cDNA sequence is displayed in a dialog text area, where you can select and copy it. When you are done, press the OK button to close the dialog.] Extracting Coding and Non-coding Sequences: The Sequence menu contains two related submenus that let you extract either coding or non-coding sequence from the genomic fragment. Both submenus (Extract No
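The spliced cDNA is the exon sequences joined in order. For a bottom-strand transcript the joined sequence is reverse-complemented so that it reads 5' to 3' along the mRNA. Here is a minimal Python sketch with hypothetical function names, assuming 1-based inclusive exon coordinates on the top strand.

```python
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_cdna(genomic: str, exons: List[Tuple[int, int]], strand: str) -> str:
    """Splice exons (1-based, inclusive, top-strand coordinates) into one sequence.

    Bottom-strand ('-') transcripts are reverse-complemented after splicing.
    """
    spliced = "".join(genomic[s - 1:e] for s, e in sorted(exons))
    return reverse_complement(spliced) if strand == "-" else spliced


genomic = "AAACCCGGGTTT"
print(extract_cdna(genomic, [(1, 3), (7, 9)], "+"))
print(extract_cdna(genomic, [(1, 3), (7, 9)], "-"))
```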
13. GenePalette User Manual, Version 1.1. TABLE OF CONTENTS: New to Version 1.1; Chapter 1: GenePalette Tutorials; Chapter 2: GenBank Access; Chapter 3: Features and Feature Libraries; Chapter 4: Index of Menu Items. New to Version 1.1: Since the release of version 1.04 of GenePalette in June 2002, we have made many changes. The software is improved in so many ways that we decided it deserved its own tenth (version 1.1). Here is a list of the major changes introduced with version 1.1: improved speed of access to fragments of large contigs (human/mouse data is accessed at least 10 times faster); the ability to import gene annotation from Ensembl with our new GenBank format importer; a new space-saving layout that is friendlier to smaller screens; PostScript output of the Graphical and Markup views, so users can edit their images in graphics packages such as Adobe Illustrator; enhanced support for access to local sequences: you
14. am of the most upstream gene all the way to the first base of the next gene (Figure 2). [Figure 2: Select Upstream and Downstream Bases dialog. It shows the number of genes and total bp selected, and warns when extra genes were added to the selection.] The downstream slider works in a similar way. Above the sliders is information about the number of genes and base pairs selected. If extra genes were included in the selection, a message in this dialog tells you how many were added. Loading the selected sequence: When the range of sequence to be loaded has been settled, the parser downloads the sequence from GenBank using its unique gi number. The sequence retrieval system now uses the E-Utilities at Entrez for quick access to sequence fragments. E-Utilities lets users quickly download specific sub-regions of a large sequence. Previously, you had to download the whole sequence to work with a portion at its end. Entrez query basics: A basic Entrez query usually starts with a gene name or symbol. Using the gene symbol as the first word of a query has the added benefit of automatically searching
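NCBI's public EFetch utility downloads a sub-region through its `seq_start` and `seq_stop` parameters. The sketch below only builds such a request URL. It reflects today's E-utilities interface, not necessarily the exact request GenePalette 1.1 sends. The helper name and the example id are hypothetical. EFetch's `id` also accepts accession.version identifiers.

```python
from urllib.parse import urlencode

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_region_url(record_id: str, start: int, stop: int, rettype: str = "gb") -> str:
    """Build an EFetch URL for bases start..stop (1-based, inclusive) of one record."""
    if start < 1 or stop < start:
        raise ValueError("need 1 <= start <= stop")
    params = {
        "db": "nuccore",
        "id": record_id,      # gi number or accession.version
        "rettype": rettype,   # "gb" = GenBank flat file, "fasta" = FASTA
        "retmode": "text",
        "seq_start": start,
        "seq_stop": stop,
    }
    return f"{EFETCH}?{urlencode(params)}"


print(efetch_region_url("12345", 100, 200))  # "12345" is a placeholder id
```

The resulting URL can be fetched with `urllib.request.urlopen`. Requesting only the needed range avoids transferring the whole contig.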
15. at region will be selected for masking. Extract Coding Sequence: Like the Extract Non-Coding Sequence submenu, this submenu lets users mask non-coding sequence while keeping protein-coding sequence. by Numbers: Lets the user enter a range of bases on the current sequence from which to extract coding sequence. Do not confuse this option with the by Numbers option in the Trim submenu. by Gene Boundaries: Lets the user select a core set of genes to include in the masked sequence. Once those genes are selected, the user sets the number of base pairs upstream and downstream of them with a slider. The default slider position keeps everything; move the sliders closer to the selected genes to limit the result to a specific region. Graphical selection: If a box is selected within the graphical view of the window (see Chapter 1), that region will be selected for masking. The Feature Menu: Add Feature: Add a new feature to the currently loaded sequence. Delete Feature: Delete a feature from the currently loaded sequence. Modify Existing Feature: Modify a feature that has been added to the sequence. Include all Current Features in History: Places all of the features loaded on the current sequence into the session history, so that they can be added to other sequences or to
16. ation data is downloaded and compiled, and the Java object data is stored in a file with the extension .eg in the GenBank directory under the main program directory. "eg" stands for EntrezGrabber, the name of the component that interacts with GenBank. Next, the sequence is downloaded line by line and stored in a flat file with the extension .nt (for nucleotide). Both .nt and .eg files are named after the gi number they represent. This is convenient because the sequence and annotation for a given gi must stay the same; if changes are made, a new gi is created and the old one is ignored. For large sequences you might need to attempt the download several times before it succeeds. This problem is more common during peak hours of NLM usage on weekdays. One known bug is that, on rare occasions, the download aborts without being detected and the annotation is not saved properly. The symptom is that when you load a gene from this sequence, you do not get a gene selection dialog and are instead asked to type in a range to extract from the sequence. If you see this problem and know that what you downloaded does have gene annotation, try the download again. Steps for installing sequence collections downloaded from www.genepalette.org: In most cases the genome you want to use will be available at our website. The download from the website will be much
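Because both cached files are named after the gi number, a local lookup only needs the gi and the GenBank directory. The sketch below assumes the file names are exactly `<gi>.eg` and `<gi>.nt`. The helpers are hypothetical. Checking that files exist cannot detect the silently truncated annotation described above; in that case the download still has to be repeated.

```python
from pathlib import Path
from typing import Tuple


def cached_files(genbank_dir: Path, gi: str) -> Tuple[Path, Path]:
    """Assumed paths of the annotation (.eg) and sequence (.nt) files for one gi."""
    return genbank_dir / f"{gi}.eg", genbank_dir / f"{gi}.nt"


def is_cached(genbank_dir: Path, gi: str) -> bool:
    """True only if both files exist; a missing file means the gi must be downloaded."""
    return all(p.is_file() for p in cached_files(genbank_dir, gi))


print(cached_files(Path("GenBank"), "42"))  # "42" is a placeholder gi number
```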
17. bl can export regional annotations in GenBank Flat File format. To load Nkx2-5 from the Ensembl database (Figs. 3-9), go to the Ensembl website (http://www.ensembl.org) in your favorite web browser. Type Nkx2-5 into the search text field. A list of all matching entries in the database is shown (Fig. 8), in alphabetical order by organism. Go down to the matches from the Mus musculus Gene Index. Click on the hyperlink labeled ENSMUSG00000015579. This takes you to the GeneView page for the mouse Nkx2-5 gene (Fig. 8). Go to the bottom row of the Ensembl Gene Report Table and click the hyperlink marked "Export gene data in EMBL, GenBank or FASTA". This hyperlink brings you to the ExportView page (Fig. 9), where you can customize a regional export. The ID field in this web form is already filled in with the gene ID ENSMUSG00000015579. Just below the ID field is a text field for the number of flanking base pairs to download. Enter 200000 into the "Show context of" field. The middle of the page has a number of export options. Select Export as GenBank, and in the checkboxes below, check the box for Gene Information. Once these fields are filled in, click the Export button. A new page comes up containing a GenBank flat file of the 20kb upstream and downstream of Nkx2-5 (Fig. 9). Copy the whole GenBank flat file into your copy buffer, then go to the File menu in GenePalette, clic
18. ces that are spaced by 0 to 4 nucleotides. In standard notation this consensus would be written GGGCCA(N)0-4TGGCCC. This means there can be as few as zero and as many as four N's between the two binding cores. In GenePalette the same consensus would be entered as GGGCCAN[0,4]TGGCCC. The brackets follow the letter that is repeated a variable number of times. The first number is the minimum number of occurrences and the second is the maximum. Every number of repeats from 0 to 4 is allowed. [Figure: Adding a Single Feature to a sequence] Three Types of Feature: Whenever features are manipulated, there are 3 kinds of feature you can add or modify. The same feature editing dialog is used whether you are adding a feature to a library or a sequence, or modifying a feature that already exists in one of those places. The top part of the dialog contains general information that applies to all 3 types of feature. The bottom portion contains a tabbed pane where you select the feature subtype you want to use. If you are not editing a simple feature, this tabbed pane will require more data to be entered. Simple Features: A simple feature is the most common type of feature (Figure 1). It consists of a name, a simple consensus that will be used to search the sequence, any notes that you would like
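A consensus with a variable-length repeat maps naturally onto a regular expression with a bounded quantifier. The Python sketch below assumes the `X[min,max]` bracket form used in the example above and standard IUPAC nucleotide codes. It searches the top strand only and is an illustration, not GenePalette's own matcher. Regex matching returns one non-overlapping match per start position.

```python
import re

# Standard IUPAC nucleotide codes as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

# Assumed syntax: a code optionally followed by [min,max] repeat bounds.
TOKEN = re.compile(r"([ACGTRYSWKMBDHVN])(?:\[(\d+),(\d+)\])?")


def consensus_to_regex(consensus: str) -> re.Pattern:
    """Translate a consensus such as GGGCCAN[0,4]TGGCCC into a compiled regex."""
    consensus = consensus.upper()
    parts, pos = [], 0
    while pos < len(consensus):
        m = TOKEN.match(consensus, pos)
        if not m:
            raise ValueError(f"cannot parse consensus at: {consensus[pos:]}")
        code, lo, hi = m.groups()
        parts.append(IUPAC[code] + (f"{{{lo},{hi}}}" if lo is not None else ""))
        pos = m.end()
    return re.compile("".join(parts))


def find_matches(sequence: str, consensus: str):
    """Top-strand matches as (start, end), 1-based and inclusive."""
    pattern = consensus_to_regex(consensus)
    return [(m.start() + 1, m.end()) for m in pattern.finditer(sequence.upper())]


print(consensus_to_regex("GGGCCAN[0,4]TGGCCC").pattern)
print(find_matches("TTGGGCCATGGCCCAA", "GGGCCAN[0,4]TGGCCC"))
```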
19. ded Features. The Hind III sites show up on the graphical view as vertical lines above and below the sequence. Each line ends in a symbol that identifies the feature. The feature appears both above and below the line because Hind III sites are palindromic, so there is a match on both the top and bottom strands. Observe how Hind III sites flank mγ in a 2.0 kb chunk, as described by Nellesen et al. (1999). A panel containing data for the Hind III feature appears in the lower right-hand corner of the window. There you can find data about the feature and modify its appearance. Use the shape combo box to select a differently shaped symbol for the feature. Use the color combo box to change the color of the feature. Besides choosing between shapes, you can also set the symbol to be a letter or word. Select the Text Symbol option from the shape combo box and type anything you want (like "H") in the dialog that appears. Notice how these changes to the feature symbol change the graphical display. Below the shape controls of the feature panel is a table of all the sites that match the feature. If you click a row in this table, the graphical view centers on that site and a red arrow appears under it. If you click in the leftmost column of this table, the match whose row you clicked will be hidden in
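A site is palindromic when it equals its own reverse complement. The Hind III site AAGCTT is one example, so a single occurrence matches on both strands. The minimal Python sketch below shows this check. The function names are illustrative.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def is_palindromic(site: str) -> bool:
    """True when a site reads the same on both strands."""
    return site.upper() == reverse_complement(site)


def site_hits(sequence: str, site: str):
    """(1-based start, strand) for every exact hit of a site on either strand."""
    seq = sequence.upper()
    hits = []
    for strand, probe in (("+", site.upper()), ("-", reverse_complement(site))):
        start = seq.find(probe)
        while start != -1:
            hits.append((start + 1, strand))
            start = seq.find(probe, start + 1)
    return hits


print(is_palindromic("AAGCTT"), site_hits("CCAAGCTTGG", "AAGCTT"))
```

A palindromic site produces a top-strand hit and a bottom-strand hit at the same position. This is why the Hind III symbol is drawn both above and below the line.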
20. e Entrez Genomes page to find these names. Arabidopsis thaliana: Finding an Arabidopsis gene is extremely easy. Just type in the At number, AtCHRgnnnnn (CHR = chromosome, n = digit; for example, At5g67540). You will definitely want to download the whole chromosome collection of Arabidopsis (see the Local Storage section below). Homo sapiens/Mus musculus: The easiest way to find a gene in the working draft of the human or mouse genome is to use LocusLink to find approved symbols for the gene of interest. You will most likely find several matches to your symbol, because there are so many copies of things. If a query returns many results, it may help to add "contig" and "homo" or "mus" to the end of it. Local storage of large sequences: As genome projects finish their assembly and annotation, the sequence data changes from small scaffolds to large whole-chromosome contiguous segments (contigs). This is the best representation of the sequence in vivo, but it is not necessarily the most convenient way to look at specific genes in silico. Large contigs with thousands of genes that span multiple megabases take a long time to download. To avoid these long downloads, we have added the ability to download a sequence once and access it as needed. To make this very simple, we have made genome collections from prominent mature genome projects (Arabidopsis thaliana, Drosophila
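The At number format can be checked with a regular expression before it is sent as a query. The pattern below is an assumption based on the common AGI convention: "At", a chromosome code (1-5, or C/M for chloroplast and mitochondrion), "g", then five digits. The helper name is hypothetical.

```python
import re

# Assumed AGI pattern: At + chromosome code + "g" + five-digit gene number.
AGI = re.compile(r"At([1-5CM])g(\d{5})", re.IGNORECASE)


def parse_agi(code: str):
    """Return (chromosome, gene number) for an Arabidopsis At number, else None."""
    m = AGI.fullmatch(code.strip())
    return (m.group(1).upper(), int(m.group(2))) if m else None


print(parse_agi("At5g67540"), parse_agi("At6g12345"))
```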
21. e selection table. After clicking their checkboxes, hit the OK button. Choosing flanking base pairs: The next dialog lets you choose how many upstream and downstream flanking bases will be downloaded. It has two slider bars. The first sets the number of bases upstream of the first selected gene, and the second sets the number of bases downstream of the last selected gene. By default the dialog takes the whole intergenic region in both directions, up to the first base of the next transcript. In the current dialog, hitting OK would take 1716 bp upstream of the first gene (Nf1) and 3346 bp downstream of the last selected gene (malpha). The minimum this dialog allows is no sequence upstream or downstream of the selected genes. In this exercise the choice doesn't really matter, since we are generously taking a two-gene radius around mγ. Go ahead and hit OK to take the complete upstream and downstream intergenic regions. Interacting with the loaded sequence: After you press OK, a progress dialog shows the download progress. After the download, the sequence data is loaded into the main window and you can begin to navigate and use the sequence. First, note the anatomy of the loaded window: it has four main areas separated by resizable dividers (Figure 2).
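The default flank sizes described above can be computed from the gene coordinates alone. The sketch below assumes genes are given as sorted (leftmost, rightmost) 1-based positions and measures flanks in sequence coordinates. The helper is hypothetical, and the toy coordinates are not the tutorial's.

```python
from typing import List, Tuple

Gene = Tuple[int, int]  # (leftmost base, rightmost base), 1-based


def default_flanks(genes: List[Gene], first: int, last: int, seq_len: int) -> Tuple[int, int]:
    """Default bases to take left of genes[first] and right of genes[last].

    Each flank runs up to, but not including, the first base of the neighboring
    gene, or to the end of the sequence; overlapping neighbors give 0.
    """
    left_edge = genes[first - 1][1] if first > 0 else 0
    right_edge = genes[last + 1][0] if last + 1 < len(genes) else seq_len + 1
    left = genes[first][0] - left_edge - 1
    right = right_edge - genes[last][1] - 1
    return max(left, 0), max(right, 0)


genes = [(100, 200), (300, 400), (600, 700)]
print(default_flanks(genes, 1, 1, 1000), default_flanks(genes, 0, 2, 1000))
```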
22. elect the number of base pairs upstream and downstream of the genes that will be kept. The default slider position keeps everything; move the sliders closer to the selected genes to trim more. Graphical Selection: If a box is selected within the graphical view of the window (see Chapter 1), that region will be trimmed out of the larger sequence. Extract Non-Coding Sequence: A submenu that masks protein-coding sequence in a portion of the currently loaded sequence, in three different ways. Each way presents the masked sequence in a text area dialog, so that the user can copy it for use elsewhere. by Numbers: Lets the user enter a range of bases on the current sequence from which to extract non-coding sequence. Do not confuse this option with the by Numbers option in the Trim submenu. by Gene Boundaries: Lets the user select a core set of genes to include in the masked sequence. Once those genes are selected, the user sets the number of base pairs upstream and downstream of them with a slider. The default slider position keeps everything; move the sliders closer to the selected genes to limit the result to a specific region. Graphical selection: If a box is selected within the graphical view of the window (see Chapter 1), then th
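Extract Non-Coding Sequence and Extract Coding Sequence are mirror images: one masks the coding bases, the other masks everything else. The Python sketch below illustrates both. The manual does not say which masking character GenePalette uses, so "N" is an assumption, and the helper name is hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive


def mask(sequence: str, cds_segments: List[Interval], keep_coding: bool = False,
         mask_char: str = "N") -> str:
    """Mask coding bases (keep_coding=False) or all non-coding bases (keep_coding=True).

    mask_char defaults to "N" as an assumption, not GenePalette's documented choice.
    """
    coding = [False] * len(sequence)
    for start, end in cds_segments:
        for i in range(max(start, 1) - 1, min(end, len(sequence))):
            coding[i] = True
    return "".join(base if coding[i] == keep_coding else mask_char
                   for i, base in enumerate(sequence))


print(mask("AAAACCCCGGGG", [(5, 8)]))                    # non-coding kept
print(mask("AAAACCCCGGGG", [(5, 8)], keep_coding=True))  # coding kept
```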
23. en to update how many sequence matches were found. Once all of the matching sequences are loaded, a selection dialog appears with the one-line description of each sequence. If you typed the same query (HLHmgamma) into an Entrez nucleotide search in a Web browser (www.ncbi.nlm.nih.gov/Entrez), you would get the same sequences that appear in this selection dialog. The Drosophila genome has been broken into several scaffolds, or sections, of about 250 kb each. Their relatively small size makes these sequences useful for quick searches in GenePalette. Click the checkbox next to the description line labeled "Drosophila melanogaster chromosome 3R, section 92 of 18 of" and hit OK. Choosing genes to load: Once the sequence has been selected, a loading dialog gives a running count of how many of its genes have been loaded. When all of the genes are loaded, the gene annotation data is presented in a selection table. If a gene name in the table exactly matches the first word of the query, the line containing that gene is highlighted, as in our example. To select a gene from this sequence, click the checkbox in the leftmost column of the table. For this example, check HLHmgamma and the two neighboring genes on either side. These are Nf1 and HLHmdelta, which come before HLHmgamma in the table, and HLHmbeta and malpha, which appear below HLHmgamma in the gen
24. ence by GI. This way no sequence is unsearchable. Parsing Genes. As mentioned in the previous section, once a sequence is selected, the GI number for that sequence is sent in a request to the Entrez server for the text version of the GenBank record. If the request is accepted, GenePalette will begin to parse the data contained in this record line by line. If the GenBank record is very long (length in this case is proportional to the number of genes in the sequence), then the loading and parsing will take a long time (see the section about local sequence access below). A GenBank record is delimited by statements initiated with keys that are indented to the left of the annotation data (Figure 1). Each gene in a GenBank record has between one and three keyed entries for each transcription unit contained within. The simplest annotation consists entirely of CDS statements. These statements tell where coding regions begin and end in a comma-delimited list of exons. In genomes with little or no untranslated transcript (microbial, lower eukaryote), this is all you need. In more complex annotation schemes, the record will have statements that describe both the CDS and the mRNA so that untranslated regions of a transcript are annotated. The parser collects data from mRNA and CDS statements and cross-references them through the gene name as specified by the locus_tag or gene field (Figure 1). If a sequence doesn't seem to parse well, it is probably because it is no
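A rough Python sketch of this CDS/mRNA handling follows. It is not GenePalette's parser: it assumes the feature table has already been split into dictionaries with hypothetical `type`, `location`, `locus_tag` and `gene` keys, and it handles only simple `join(...)` and `complement(...)` locations.

```python
import re
from collections import defaultdict


def parse_location(location):
    """Turn a GenBank location such as 'join(100..200,300..450)' or
    'complement(join(<5..10,20..>30))' into (strand, [(start, end), ...]).
    Single-base positions and mixed-strand joins are not handled."""
    strand = "-" if location.startswith("complement(") else "+"
    # Each exon is written start..end; '<' and '>' only flag partial ends.
    exons = [(int(a), int(b))
             for a, b in re.findall(r"<?(\d+)\.\.>?(\d+)", location)]
    return strand, exons


def pair_transcripts(features):
    """Group mRNA and CDS features under one key per gene, preferring the
    locus_tag and falling back to the gene name. Several entries of the
    same type (for example splice variants) are kept in a list."""
    units = defaultdict(lambda: defaultdict(list))
    for feature in features:
        key = feature.get("locus_tag") or feature.get("gene")
        units[key][feature["type"]].append(parse_location(feature["location"]))
    return units
```

Keeping a list per feature type matters because one gene can carry several mRNA entries, as Nf1 does with its three splice variants in Tutorial 1.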
25. equence that you want to see. Genes appear in the graphical view as boxes. The direction of transcription can be seen from the direction of the arrow that comes out of the box: an arrow pointing right is on the top strand, and an arrow pointing left is on the bottom strand. The colored boxes represent coding portions of exons, and the white boxes represent non-coding exon portions. It is important to observe that the gene Nf1 has three splice variants annotated. One variant is displayed on the line that represents the DNA sequence, while the other two variants are placed above the first. Use the scroll bars on the left and bottom of the graphical view to look at the three alternate transcripts of Nf1. Notice that when you click on an exon in this view, several things happen. First, the clicked exon is highlighted in red. Second, the exon will be highlighted in the bottom area, which holds annotation data. Finally, the range of sequence represented by the box you clicked will be highlighted in the sequence display. If you click on a white box, you will select sequence that is in an untranslated region. Figure 2. A GenePalette frame which has loaded the 5 genes Nf1, HLHmdelta, HLHmgamma, HLHmbeta and malpha. If you click on a colored box, you will select sequence that is in the coding region of an exon. If an exon is entirely coding
26. es are available because they are in the Libraries subdirectory under the main directory for the program. Any library contained in this directory will be opened automatically when the program starts up. Creating Libraries. You can create new libraries by going to the Libraries menu. You must choose a name for the library and a place to store the library. Adding to Libraries. To add a feature to a library, go to the Libraries menu and click Add Feature to Library. You then get a selection dialog that asks you which library you want to add to. Select one of the available libraries and hit OK. The normal feature editor dialog comes up, and if you fill it in and hit OK, that feature will be added to the library you selected. As soon as the feature is added, the library is saved to disk, as with any modification to a library. Deleting from Libraries. To delete a feature from a library, go to the Libraries menu and click Delete Feature from Library. You will be asked to choose a library from which features will be deleted. Once it is selected, a table of all features in the library is presented, and you can click as many features as you want to delete. As soon as you click OK, the selected features are deleted from the library and the library is saved to disk. Features deleted from libraries are added to the session history in case you really did not want to delete them. You can then add these features back to the library if you wish by c
27. he annotation better. Ensembl is fast and sometimes it has better annotation, but symbols are referenced by long gene identifiers like ENSMUSG00000015579. For the sake of completeness, we will look at both the GenBank and Ensembl versions of the sequence. Figure 7. A search at NCBI is the best way to find the accepted symbol for a gene as well as what chromosome it is on. Loading a mouse gene through GenBank. To load the Nkx2-5 gene through GenBank, click on Entrez Nucleotide Query from the Genome Tools menu. Type nkx2-5 contig into the Entrez query dialog. We added the word contig to the end of our Entrez query because this is an easy way to narrow our search to the genomic contig sequences we want to use. At the time of writing, the nkx2-5 Entrez search yields two records: a mouse genomic contig on chromosome 17 and a human genomic contig on chromosome 5, consistent with the chromosomal positions seen in Figure 7. Select the sequence that is titled Mus musculus chromosome 17 contig by clicking the checkbox in the leftmost column of the selection table. The program loads genes on this mouse sequence in exactly the same way that the fly E(spl)mγ region was loaded. However, you will probably notice that the process takes longer than it did for the Drosophila sequence. This happens because there are 10 times as many genes on the mouse contig as there are on the fly genomic section, and so a
28. iles that are in any immediate subdirectory of the GenBank directory. Alternatively, you can index a selected number of subdirectories (Figure 4). Once the indexing process is completed, you are ready to search this local index for your gene. Searching Local Collections. Once indexing is over, you can search your local sequences for genes by the gene symbol (Figure 5). Simply go to Search Local Sequences by Gene Symbol under the Genome Tools menu. The difficulty with the local search is that you must know the gene symbol used in your locally stored sequences. The search is not case-sensitive, so you do not have to worry about proper capitalization. When you type in your symbol and press OK, a list of perfect matches is returned. For convenience, the subdirectory in which the match was found is presented. In Figure 5 you can see that a search for the EGFR gene yields 2 hits, one in the Dm scaff directory and one in the D_melanogaster directory. Once you select a sequence to explore, you are brought to a gene list that is identical to the gene list that you use during an Entrez query. The symbol that you input is automatically searched for in the gene list that is loaded, and if present, that row in the gene table is highlighted. CHAPTER 3 Features and Feature Libraries. Introduction. The main intent of the GenePalette creators was to provide a way to view seq
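The case-insensitive, perfect-match lookup described above can be modelled with a dictionary keyed on lower-case symbols. This is an illustrative sketch, not the program's index format; the `(subdirectory, filename, symbols)` records and the file names in the example are hypothetical.

```python
from collections import defaultdict


def build_index(records):
    """records: iterable of (subdirectory, filename, gene_symbols) tuples,
    for example gathered by reading each stored GenBank file once."""
    index = defaultdict(list)
    for subdir, filename, symbols in records:
        for symbol in symbols:
            # Store keys in lower case so that lookups ignore capitalization.
            index[symbol.lower()].append((subdir, filename))
    return index


def search_symbol(index, symbol):
    """Return every (subdirectory, filename) whose gene list contains an
    exact, case-insensitive match for symbol; partial matches are ignored."""
    return list(index.get(symbol.lower(), []))


# Hypothetical collection mirroring the EGFR example in Figure 5.
INDEX = build_index([
    ("Dm_scaff", "scaffold_a.gb", ["Egfr", "numb"]),
    ("D_melanogaster", "arm_2R.gb", ["EGFR"]),
    ("D_melanogaster", "arm_3R.gb", ["Nf1"]),
])
```

Here `search_symbol(INDEX, "egfr")` returns one hit per directory, while a partial symbol such as `"EGF"` returns nothing.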
29. ing consensuses: GGGWWWWCCM and GGGWDWWWCCM. Although these are related, you cannot make a single feature that matches all of the possible sequences implied by these two consensuses. You could use one of the two consensuses alone, but you would miss the versions matched only by the other. You could consolidate the two versions into GGGWDW{2,3}CCM, but this would result in matches that you do not think are real Dorsal binding sites. The best solution to this problem is to make an Oligo List in which each of the two Dorsal consensuses appears as a separate line in the text area. Complex Features. The final type of feature is the Complex Feature (Figure 2). This feature allows the user to restrict a consensus to a subset of the matches implied in the IUPAC code. This is useful if you know of certain species that will match the consensus but are not pertinent to the feature you are trying to highlight. To make a Complex feature, fill in the top of the dialog; you must fill in the Feature Consensus text field at the top. Once the top is filled in, click on the Complex Feature tab and click the button labeled Compile Site Matches. This brings up a list of all possible matches implied by the IUPAC code entered into the Feature Consensus text field. Then you can click the checkboxes next to each species you want disallowed. One example of a Co
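To see why the Oligo List is preferable here, the sketch below expands each IUPAC consensus into the set of sequences it implies and compares the two-line Oligo List with the consolidated GGGWDW{2,3}CCM pattern. It assumes the second Dorsal consensus reads GGGWDWWWCCM and writes the {2,3} repeat out as two explicit consensuses; the helper names are ours, not GenePalette's.

```python
from itertools import product

# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT",
         "S": "CG", "W": "AT", "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
         "H": "ACT", "V": "ACG", "N": "ACGT"}


def expand(consensus):
    """Return every concrete sequence implied by an IUPAC consensus."""
    return {"".join(bases) for bases in product(*(IUPAC[c] for c in consensus))}


# Oligo List: each Dorsal consensus on its own line.
oligo_list = expand("GGGWWWWCCM") | expand("GGGWDWWWCCM")
# Consolidated GGGWDW{2,3}CCM, written out as its 10-mer and 11-mer forms.
consolidated = expand("GGGWDWWCCM") | expand("GGGWDWWWCCM")
# Sites admitted only by the consolidated pattern.
extra = consolidated - oligo_list
```

Under these assumptions the Oligo List covers 128 sequences and the consolidated pattern 144; the 16 extra ten-mers all carry a G at the D position (for example GGGAGAACCA), which is exactly the kind of unwanted match described above.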
30. k on the New Sequence submenu and select the GenBank Flat File option. A GenBank Flat File dialog appears; paste the record into it and click OK. The GenBank flat file will be loaded into GenePalette just as it would have been loaded from Entrez. Note that instead of having a gene name like Nkx2-5, the gene name is ENSMUSG0000000, but if you look back at the record for Nkx2-5, you will see that the number matches. Now that we have learned how to access the sequence in two different ways, you can complete the tutorial with either the Entrez or the Ensembl version. If you are extremely enthusiastic, go ahead and follow the steps using both sequences to see how they are the same. The Nkx2-5 distal enhancer. A studied enhancer of the Nkx2-5 gene has been narrowed down to a region between 3059 and 2554 bp upstream of the transcription start site. Figure 10. View of the Nkx2-5 distal enhancer. GATA binding sites are viewed within a NotI-DraI fragment that has enhancer activity (Searcy RD, Vincent EB, Liberatore CM, Yutzey KE. A GATA-dependent nkx-2.5 regulatory element activates early cardiac gene expression in transgenic mice. Development 1998 Nov;125(22)). To view the bounds of this element, add NotI and DraI from the restricti
31. les painful than doing it through the software. First, download the zip archive for your favorite genome from our website. Use standard unzipping software (WinZip, ZipIt, unzip, etc.) to decompress the archive. Each archive will decompress all of the sequences into a directory with the same name as the archive (e.g., D_melanogaster). Move this directory, which contains all of the stored files, into the GenBank subdirectory under the main program directory. The program will search this GenBank directory and all immediate subdirectories for stored files. This way you can organize your stored genomes by organism name, which is really useful for the local access process described at the end of the chapter. How to use a stored sequence. Once a sequence has either been added from a downloaded sequence collection or manually stored through the program, using this sequence is quite easy. You just do the standard Entrez Nucleotide Query (NLM) from the GenomeTools menu. If a sequence that you have stored appears as a query result, the Source column will show that it is Local (Figure 3). Select the local sequence and go through the steps for selecting a gene region as you normally would. The big differences are that loading all of the genes on the sequence is now much quicker (5-10 seconds), and loading the sequence into GenePalette is faster (1-2 seconds to access the sequence).
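The rule that the program looks in the GenBank directory and its immediate subdirectories, but no deeper, can be sketched with Python's `pathlib`. The directory layout and `.seq` file names in `demo()` are made up for illustration, and the real program may also filter by file type.

```python
import tempfile
from pathlib import Path


def stored_files(genbank_dir):
    """List files directly inside genbank_dir and inside its immediate
    subdirectories; anything nested deeper is ignored."""
    root = Path(genbank_dir)
    files = []
    for entry in root.iterdir():
        if entry.is_file():
            files.append(entry)
        elif entry.is_dir():
            files.extend(p for p in entry.iterdir() if p.is_file())
    return sorted(files)


def demo():
    """Build a throwaway GenBank directory and report which files are found."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "D_melanogaster" / "nested").mkdir(parents=True)
        (root / "loose.seq").write_text("")
        (root / "D_melanogaster" / "arm_3R.seq").write_text("")
        (root / "D_melanogaster" / "nested" / "skipped.seq").write_text("")
        return sorted(p.name for p in stored_files(root))
```

Running `demo()` lists the top-level file and the one inside `D_melanogaster`, but not the file in `nested`, which is why each organism collection should sit one level below the GenBank directory.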
32. lette. Loading a GenBank Sequence. A copy of the sequence downloaded from GenBank is included in the Sequences directory under the main GenePalette directory. If you are not connected to the internet when doing the tutorial, you can go to the File menu and select Open Sequence. Select the file named tutorial.seq. For a tutorial about constructing effective Entrez queries, look at the Entrez Help section featured at www.ncbi.nlm.nih.gov/Entrez. Chapter 2 will provide some helpful hints about finding genome sequence for genes in specific organisms. To load a sequence from GenBank, you first must go to the Genome Tools menu and select the menu item Entrez Nucleotide Query (NLM). Clicking this menu item brings up a dialog that asks for an Entrez query. The text that is entered into the dialog will be sent directly as a nucleotide search to the National Library of Medicine's Entrez server. To load a gene in any sequenced genome, one must know how that gene is referred to in genome annotations. The gene that will be loaded for this exercise is called Enhancer of split mγ, E(spl)mγ. However, in the annotated fly genome it appears as HLHmgamma. For the Drosophila genome, one can consult FlyBase for the official gene symbol, and in most cases that symbol will be used in GenBank as well. To access the genomic region surrounding E(spl)mγ, type hlhmgamma into the Entrez Query dialog box and hit OK. A loading dialog appears on scre
33. licking on the Libraries menu and selecting Add to Library from History. Feature History. All features added to or modified in a sequence are added to a list of features in the session history. Additionally, features deleted from libraries are also added to the session history for safety. You can add features from the history either to the current sequence (Features > Add Feature From History) or to a library (Libraries > Add to Library from History). Another feature of the history is that you can use it to copy features from one library to another. Add a whole bunch of features from a source library to any kind of sequence; these features are then added to the session history. Then you can add them to a different library from the session history, and you have effectively copied features between libraries. Finally, you can add any feature that is part of a saved sequence to the Feature History (Features > Include all Current Features in History). This way, if you added features and saved a sequence, these features can be added to the history and then put in a Feature Library or added to other sequences via the Feature History. CHAPTER 4 Index of Menu Items. Introduction. This chapter gives a 1-2 sentence description of every available menu item in GenePalette. The File Menu. New Sequence > Sequence Only: Displays a dialog where you can create a new se
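The copy-through-history trick amounts to a few dictionary operations. The sketch below is a toy model with hypothetical class and method names, meant only to show why adding library features to a sequence and then adding them from the history into a second library copies them, and why a deleted feature can still be recovered.

```python
class Session:
    """Toy model of feature libraries plus a session history."""

    def __init__(self, libraries):
        self.libraries = libraries  # library name -> {feature name: consensus}
        self.history = {}           # feature name -> consensus

    def add_to_sequence(self, features):
        # Anything added to a sequence is also recorded in the history.
        self.history.update(features)

    def delete_from_library(self, library, name):
        # Deleted features are kept in the history for safety.
        self.history[name] = self.libraries[library].pop(name)

    def add_to_library_from_history(self, library, names):
        for name in names:
            self.libraries[library][name] = self.history[name]


def copy_demo():
    """Copy HindIII from a source library into an empty one via the history,
    then delete it from the source."""
    session = Session({"Restriction": {"HindIII": "AAGCTT"}, "Mine": {}})
    session.add_to_sequence(session.libraries["Restriction"])
    session.add_to_library_from_history("Mine", ["HindIII"])
    session.delete_from_library("Restriction", "HindIII")
    return session
```

After `copy_demo()`, the feature lives in the second library and remains in the history even though it was deleted from the first.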
34. lot more data must be downloaded by GenePalette to completely describe the contig. Once the genes are loaded, the gene selection table automatically highlights the row containing the Nkx2-5 gene, because it was the first word of the Entrez query we used. Click the checkbox for Nkx2-5 and one gene upstream and downstream. Select the maximum range for upstream and downstream sequences (just click OK in the range selection dialog). The nucleotide sequence of the selected region is then downloaded, and a new GenePalette window is generated. In the next section we will go through the same process using the Ensembl database. Loading a gene through Ensembl. Ensembl (http://www.ensembl.org) is a popular annotation database of particular utility to mammalian genome users. In many instances the annotation by Ensembl is ahead of the annotation used at GenBank. When designing the external compatibilities for GenePalette, we placed highest priority on a system that would support as many genomes as possible. For most purposes GenBank seemed to be the very best option: one can query by gene name or symbol, GenBank offers access to all public genomes (including bacterial, viral and yeast), and it is all provided through a single location (Entrez Nucleotides) with a single format (the GenBank Flat File). Neither Ensembl nor the Distributed Annotation System (DAS) seemed to meet these pivotal criteria. However, a helpful GenePalette user mentioned that Ensem
35. melanogaster and Caenorhabditis elegans. As other genomes come out as unmanageable contigs, we will attempt to make new collections available on our website. Steps to storing a sequence. If your genome is not present on the GenePalette website as a collection to download, you will have to manually download sequences to store them. Please let us know if we overlooked your favorite organism so that we can put a genome collection on the web. The interface for storing a sequence locally is very similar to that for the usual exploration of a sequence. You go to the Genome Tools menu and select the option Save GenBank Records and Sequences to Disk. You will get a dialog that asks for an Entrez query; this works in a way that is identical to the previous Entrez query dialog. Next you will get a result dialog that is almost identical to the result dialog for conventional sequence loading. The only difference between these two is that in this dialog you can select multiple sequences for download. Note that the second column of the table in this dialog (Source) indicates whether the sequence is Remote or Local. If the sequence is Local, selecting this box will overwrite the previously saved sequence. You can select one sequence or many sequences to be loaded. If you selected multiple sequences, they will be loaded one by one. If applicable, the program asks if you would like to overwrite a previously saved sequence. First the GenBank annot
36. mplex feature is the consensus binding site for Suppressor of Hairless. We know of 5 octamers that this transcription factor binds with high affinity: CGTGGGAA, CGTGAGAA, CGTGTGAA, TGTGGGAA and TGTGAGAA. You could put these all together with the simple consensus YGTGDGAA, but that would allow the species TGTGTGAA, which is not a high-affinity binding site. In this situation the Complex feature shines as an easy way to define a consensus that has a complex definition. Feature Libraries. Now that we know about the various types of sequence features supported by GenePalette, it is time to learn how you can store and reuse them. Feature libraries hold collections of features that can be added to a sequence. All open windows operate on the same set of libraries: if you add to a library in one GenePalette window, the library in all windows is changed. All manipulations of feature libraries are conducted through the Libraries menu. Included Libraries. The GenePalette software comes equipped with 2 libraries. One is the restriction library, which contains 208 consensuses for commonly used restriction enzymes, listed in alphabetical order. The second included library is a small library that is used with the tutorials in Chapter 1. The Libraries Directory. You may have already noticed that when you start up GenePalette, there are already 2 libraries available to use (see Included Libraries above). These librari
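A Complex feature is, in effect, the set of sequences implied by its IUPAC consensus minus the species you uncheck. A minimal Python sketch of that idea (our own helper names, not GenePalette code) reproduces the Su(H) example:

```python
from itertools import product

# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "Y": "CT",
         "S": "CG", "W": "AT", "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
         "H": "ACT", "V": "ACG", "N": "ACGT"}


def expand(consensus):
    """Every concrete sequence implied by an IUPAC consensus (what the
    Compile Site Matches button lists)."""
    return {"".join(bases) for bases in product(*(IUPAC[c] for c in consensus))}


def complex_feature(consensus, disallowed):
    """Matches implied by the consensus, minus the unchecked species."""
    return expand(consensus) - set(disallowed)


# YGTGDGAA implies 2 x 3 = 6 octamers; excluding TGTGTGAA leaves the 5 sites.
SU_H_SITES = complex_feature("YGTGDGAA", ["TGTGTGAA"])
```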
37. n-Coding Sequence and Extract Coding Sequence use an identical interface to select a sub-region of the genomic sequence for extraction. Once a region is selected, the type of sequence that is desired will remain as A's, C's, T's and G's, while the undesired sequence will be turned into N's. This type of manipulation could prove to be extremely useful for applications such as MEME searches (http://meme.sdsc.edu/meme/website/) for non-coding regulatory motifs, or for highlighting the exon structure of coding sequence. The three ways to extract coding or non-coding sequence will be demonstrated. Note that untranslated regions (5' UTR and 3' UTR) of transcripts are considered non-coding and will not be masked by N's in a non-coding extraction, but will be masked in a coding extraction. Extracting by Numbers. This option displays a dialog that asks for a range to select. Type in a range (a starting bp and an ending bp) and see what happens. If you choose a range that overlaps an exon, you can really see how these features work. Try to extract the range of 1 to 300 as both non-coding and coding. This range overlaps the first 4 exons of Nf1 if you followed the tutorial instructions for Entrez queries or used the tutorial sequence provided with GenePalette. Extracting by Gene Boundaries. Selecting this option results in the display of all transcripts on the sequence, much like the dialog displayed for selecting genes from an Entrez query. Choose the ge
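The masking rule can be illustrated with a small function. It is a sketch, assuming the `coding` intervals are 1-based, inclusive CDS coordinates (so UTRs count as non-coding, as stated above); the function name and signature are hypothetical.

```python
def mask(sequence, coding, keep="noncoding"):
    """Replace the unwanted class of bases with N.

    coding: 1-based, inclusive (start, end) intervals of coding exons (CDS).
    keep="noncoding" masks the coding bases; keep="coding" masks the rest."""
    is_coding = [False] * len(sequence)
    for start, end in coding:
        for i in range(start - 1, end):
            is_coding[i] = True
    want_coding = keep == "coding"
    return "".join(base if flag == want_coding else "N"
                   for base, flag in zip(sequence, is_coding))
```

With one coding interval at bases 3-5 of `ACGTACGTAC`, a non-coding extraction gives `ACNNNCGTAC` and a coding extraction gives `NNGTANNNNN`; the two outputs are complementary masks of the same region.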
38. nes that you want to include in the masked sequence. Note that if you choose several genes, the span between the outermost genes is the minimum possible range that you can select. Other genes that were not selected but which occur in the selected region will be masked anyway. After genes have been selected, you can then choose the range upstream and downstream of the selected genes to include in the masked sequence. It is important to know that the maximum ranges are not the ranges to the next gene on the sequence, but instead include the whole sequence. On the other hand, the minimum range is the shortest distance that will include the selected genes. Extracting by Graphical Selection. To use this option, choose the region you want to include by dragging a selection box around it in the graphical view. Then go to the menu and choose by Graphical Selection from one of the extraction submenus. The sequence represented by the box you selected will be masked and output into the sequence output window. Adding and modifying transcripts. Of all the genes currently loaded into the graphical view, only malpha has not had its untranslated region annotated. If we want to completely understand the organization of a transcription unit, we should make sure that the unit is completely annotated, including 5' and 3' ends if possible. Fortunately for us, someone has already described the malpha transcription unit. If you go to the GenBank record AJ011140 in your web bro
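The range arithmetic for extraction by gene boundaries can be sketched as follows. The assumptions are ours: coordinates are 1-based and inclusive, and "upstream" and "downstream" are applied to the left and right ends of the top strand rather than to each gene's direction of transcription. The function name is hypothetical.

```python
def masking_range(genes, seq_length, upstream, downstream):
    """Region (1-based, inclusive) spanning the selected genes plus the
    chosen flanks, clipped to the ends of the sequence.

    genes: (start, end) pairs in sequence coordinates. With both flanks at 0
    this is the minimum range; very large flanks give the whole sequence."""
    first = min(start for start, _ in genes)
    last = max(end for _, end in genes)
    return max(1, first - upstream), min(seq_length, last + downstream)
```

For two genes at 1000-2000 and 5000-6000 on a 10,000 bp sequence, zero flanks give (1000, 6000), and oversized flanks are clipped to (1, 10000), matching the minimum and maximum ranges described above.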
39. o a fly gene is to use either its gene symbol (if it has one) or the CG number. The CG number is a gene symbol in the form CGnnnn, where n is a number and CG stands for Celera Gene. Even if a gene has a symbol, it will still be associated with a CG number, which you can get from Gadfly or FlyBase. If the gene has a really non-specific gene name, you can narrow the search by adding contig, section or drosophila after the gene name. Anopheles gambiae. Currently the best way to access an Anopheles gene is to go to the mosquito search page at Ensembl (http://www.ensembl.org), type your gene symbol or number into the search field, and look for hits in the gene or family index. This will take you to genes in Ensembl that have the ENSANG identifier that is used in GenBank. Hopefully the web resources for Anopheles gambiae will improve soon. It is recommended to download the Anopheles genome collection from our website. Caenorhabditis elegans. The worm sequence is currently available as whole chromosome sequences. If you have the www-n symbol (where w is a character and n is a number), you are pretty much golden: this symbol alone gives you a limited number of query results to choose from. To access an unnamed gene, you have to find its map element/locus name (old: C36B7.1; new: XHT321). A good cross-reference for getting to this name is on the WormGenes page of the NCBI Acembly website (http://www.ncbi.nlm.nih.gov/IEB/Research/Acembly/index.html?worm). You can also use th
40. o associate with the feature (optional) and text that you would like to symbolize the feature with in the graphical view (optional). You can click the mismatch checkbox to allow a single mismatch between the sequence and the consensus that you entered. For convenience, the bottom panel contains a summary of the IUPAC code. Oligo List Features. This type of feature can be used to find any sequence that matches a list of oligonucleotides (Figure 2). Simply type all of the oligos or simple consensuses you want matched into the text area under the Oligo List tab. You do not need to fill in the Feature Consensus text field at the top of the dialog if you are adding an Oligo List. Each oligo or consensus should have its own line in the text area (separate terms with carriage returns). An Oligo List is useful in a multitude of situations. When you have a primer pair for which you want to have the same symbol, this is a perfect tool. Another use would be if you have random binding site selection data and would like to search for matching oligos rather than making a core consensus: you can put all of the selected oligos into an Oligo List. This adds a different type of specificity to a binding search. The third application for this type of feature is when a feature has multiple simple definitions that cannot be grouped into one simple consensus. For example, in Markstein et al. (2002), the binding site for the Drosophila transcription factor Dorsal was searched for in the fly genome with the follow
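Oligo List matching can be sketched by translating each IUPAC line into a regular expression and scanning both strands. This is an illustration under assumptions, not GenePalette's matcher: it does not implement the single-mismatch option, and the function names are hypothetical.

```python
import re

# IUPAC code -> regular-expression character class.
IUPAC_RE = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]",
            "S": "[CG]", "W": "[AT]", "K": "[GT]", "M": "[AC]", "B": "[CGT]",
            "D": "[AGT]", "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]"}
# Complement of each IUPAC code (e.g. R = A/G pairs with Y = C/T).
COMPLEMENT = str.maketrans("ACGTRYKMSWBDHVN", "TGCAYRMKSWVHDBN")


def reverse_complement(consensus):
    return consensus.upper().translate(COMPLEMENT)[::-1]


def find_oligo_list(sequence, oligos):
    """Return (oligo, strand, 1-based start) for every match of every line
    of the list, on both strands, including overlapping matches."""
    sequence = sequence.upper()
    hits = []
    for oligo in oligos:
        for strand, probe in (("+", oligo.upper()), ("-", reverse_complement(oligo))):
            pattern = "".join(IUPAC_RE[c] for c in probe)
            # A zero-width lookahead lets overlapping sites all be reported.
            for match in re.finditer(f"(?={pattern})", sequence):
                hits.append((oligo, strand, match.start() + 1))
    return sorted(hits, key=lambda hit: hit[2])
```

For example, `find_oligo_list("CCGATACCTATCCC", ["GATA"])` reports GATA on the top strand at position 3 and, through its reverse complement TATC, on the bottom strand at position 9.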
41. on library to your sequence via Add Features from Library under the Libraries menu. About 2.5 kb upstream of Nkx2-5 you will see a DraI site that is the beginning of this distal enhancer (Figure 10). In Searcy et al., several matches to the GATA consensus (simply GATA) were documented to be within this enhancer. To see where GATA binding sites are, add a GATA feature to the sequence: click Add Features from the Features menu. In the Feature dialog, type GATA in both the Feature Name field and the Feature Consensus field. To make things easier to see, you can change the symbol for GATA to be the letter g (go to Text Symbol in the shape selection combo box of the Feature Panel). You can see that there are several GATA matches clustered in the NotI-DraI distal enhancer (Figure 10). Designing the oligos. To design oligonucleotides for EMSA analysis, we usually choose sequences that are around 20-30 nucleotides in length, centered on the binding site to test. For this example it will be easiest to design oligos that are 24 nucleotides long: you want 10 nucleotides upstream and 10 nucleotides downstream of the 4-nucleotide GATA core. In GenePalette one can select oligos in 5 simple steps (Figures 11 and 12): 1. Click on the site in the Graphical View; a Markup View is generated. 2. Click on the first base of the site in the Markup View. That base is selected in the Se
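The 24-nucleotide arithmetic (10 + 4 + 10) can be written as a small helper. This sketch assumes a 1-based position for the first base of the GATA core; the function name and error handling are our own.

```python
def design_oligo(sequence, core_start, core_length=4, flank=10):
    """Cut an EMSA oligo centred on a binding-site core.

    core_start is the 1-based position of the first core base; the oligo is
    flank + core_length + flank long (10 + 4 + 10 = 24 nt for a GATA core)."""
    begin = core_start - 1 - flank
    end = core_start - 1 + core_length + flank
    if begin < 0 or end > len(sequence):
        raise ValueError("binding site too close to the end of the sequence")
    return sequence[begin:end]
```

Given a sequence and the position of a GATA match (for instance `sequence.find("GATA") + 1`), the helper returns the same 24-mer that steps 1-4 select by hand.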
42. or by typing in some other identifier. For example, if you know the CG number for a Drosophila gene that has a well-known symbol (e.g., numb, CG3779), you will be able to get to the Drosophila contig for that gene with far fewer Entrez hits than with the actual symbol: numb gives 246 hits, while CG3779 gives 5. However, it is extremely useful to access locally stored genome collections without ever having to use an internet connection. What if you are using your laptop on a plane? What if you want to browse gene models while lounging on the beach? What if there was a huge power outage on the east coast but you still wanted to know how many AP-1 binding sites are in your promoter? Well, that is what our local access functions were built for. Indexing genes. In order to search for genes contained within your local genome collections, you must first have indexed the sequences contained within your GenBank directory (Figure 4). The indexing process is reset every time that you start a new GenePalette session and is initiated when you select Search Local Sequences by Gene Symbol under the GenomeTools menu. You can also ask GenePalette to index the genes if you have changed the contents of your GenBank directory (Recatalog Local Genes under the Genome Tools menu). When indexing, you have two choices. You can index everything you have: all files in your GenBank directory and any f
43. pboard. Copy All: Copies all of the sequence in the text area of the sequence display into the clipboard. The Sequence Menu. Rename Sequence: Changes the name of the sequence in the sequence display data area, in the title of the window and in the Window menu. Reverse Complement Sequence: Flips the sequence, transcripts and features around so that the reverse complement strand is the strand displayed in the sequence display. This new sequence appears in a new GenePalette window, leaving the old sequence untouched. Extract Transcript cDNA Sequence: Splices all exons together into a cDNA sequence, which is displayed in a dialog that allows you to copy and paste the sequence. Trim Sequence: Submenu that gives you three ways to trim the sequence down to a smaller size. All three ways result in a new GenePalette window that contains the shortened sequence, while the old sequence remains in its current window, unaltered. by Numbers: Allows you to trim by specifying how much sequence to remove from the front and end of the sequence. Entering 5000 into the Front Cut text field will remove 5000 bp from the front of the sequence; entering 5000 into the Rear Cut field will remove 5000 bp from the end of the sequence. by Gene Boundaries: Allows the user to select a core set of genes that should be maintained in the trimmed sequence. Once those genes are selected, the user must use a slider to s
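Two of the operations above, trimming by numbers and splicing exons into a cDNA, reduce to string slicing. The sketch assumes 1-based inclusive exon coordinates and hypothetical function names; reporting a bottom-strand cDNA as its reverse complement is our assumption, not documented GenePalette behavior.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def trim_by_numbers(sequence, front_cut, rear_cut):
    """Drop front_cut bases from the start and rear_cut bases from the end,
    like the Front Cut and Rear Cut fields."""
    return sequence[front_cut:len(sequence) - rear_cut]


def splice_cdna(sequence, exons, strand="+"):
    """Join 1-based inclusive exon intervals into one cDNA string; for a
    bottom-strand transcript, return it reading 5' to 3' on that strand."""
    cdna = "".join(sequence[start - 1:end] for start, end in sorted(exons))
    return reverse_complement(cdna) if strand == "-" else cdna
```

Writing the rear cut as `len(sequence) - rear_cut` rather than a negative index keeps a Rear Cut of 0 from emptying the sequence.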
44. quence by typing in a sequence name and pasting the nucleotides into a text area. In the sequence text area, all non-nucleotide characters are filtered out. New Sequence > GenBank Flat File: Displays a dialog where you can create a new sequence by typing in a sequence name and pasting a GenBank Flat File into a text area. GenBank Flat Files copied from the Ensembl ExportView will be parsed using this menu item (see Tutorial 2, Chapter 1). Open Sequence: Lets you open a sequence that was previously saved in GenePalette. Save Sequence: Saves changes to a sequence that has been saved before, or saves a previously unsaved file to disk. Save Sequence As: Specify and save the sequence under a new file name. Export Graphical View > Export GIF: Exports an image of the Graphical Display in GIF format. Export Graphical View > Export PostScript: Exports an image of the Graphical Display in PostScript format. Export Markup View > Export GIF: Exports an image of the Markup View in GIF format. Export Markup View > Export PostScript: Exports an image of the Markup View in PostScript format. About GenePalette: Displays the About GenePalette window. The hyperlinks are real. Exit: Closes the program. All unsaved files will be verified with the user before closing. The Edit Menu. Copy: Copies sequence currently selected in the text area of the sequence display into the cli
45. quence Display. 3. Decide how many bases upstream you want to include (10 bp), and select that many bases upstream of the first base in the Sequence Display. The length of your sequence selection is displayed in the Sequence Display, and a graphical representation of your selection is boxed in the Graphical View. 4. Now that you know the starting point of your oligo, select sequence downstream from that point to the length that you have decided on for your oligos (24 bp). The length of your selection is displayed in the Sequence Display, and your selection is boxed in the Graphical Display. 5. Copy your selection and paste it into a text file or document. You can reverse complement your sequence using Strider or any number of convenient web tools. Although there are a lot of steps involved, it is also possible to reverse complement oligos in GenePalette. To reverse complement your oligo in GenePalette, go to the File menu, select the New Sequence submenu, and from there select Sequence Only. Paste your sequence into the text area labeled Sequence. A new window with your primer sequence in it is brought up. In this window you can go to the Sequence menu and select Reverse Complement Sequence. A new window appears with the sequence reverse complemented. You can then copy your reverse complemented sequence into your text file or word document. Designed Oligos
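If you prefer to script step 5 rather than open a new GenePalette window, building the complementary oligo is short in Python. This is a generic sketch, not part of GenePalette; it handles only A, C, G and T, and the function names are our own.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(oligo):
    """Complement every base, then reverse so the result reads 5' to 3'."""
    return oligo.translate(COMPLEMENT)[::-1]


def oligo_pair(sense):
    """Return the sense oligo together with its reverse-complement partner,
    the two strands that are annealed for an EMSA probe."""
    return sense, reverse_complement(sense)
```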
46. ram directory, the library will be automatically loaded when the program starts up. Close Feature Library: Closes a selected feature library. The library will not be available for modifying or using until it is opened again. Save Feature Library As: All library changes are saved automatically to the original file that the library was opened with. However, this option lets you save the library as a new file in a new location. The Transcript Menu. Add Transcript: Adds a new transcript to the current sequence; brings up a blank transcript editor dialog. Modify Transcript: Makes changes to a transcript that exists in the sequence. After the target transcript is selected, the data associated with the transcript is loaded into a transcript editor dialog. Copy an Existing Transcript: Makes a copy of a transcript that is associated with the current sequence. This new transcript has the word Copy appended to the old transcript name. Delete Transcript: Allows the user to select multiple transcripts to delete. The GenomeTools Menu. Entrez Nucleotide Query (NLM): Starts a cascade of dialogs that allows the user to search GenBank via the National Library of Medicine's Entrez query. Search Local Genes by Gene Symbol: Allows the user to search all of the sequences stored in the GenBank directory, and immediate subdirectories therein, for genes by gene symbol. If sequence
47. rary of Medicine (NLM) National Center for Biotechnology Information (NCBI, http://www.ncbi.nlm.nih.gov/Entrez/) as its portal to the world of genomic sequence. In this chapter, the methodology of GenePalette's GenBank parser is discussed, along with tips for the most efficient use of GenBank via GenePalette. Getting to know the GenBank Parser. To understand how to use GenePalette most effectively, a modest comprehension of its GenBank interaction cascade is needed. Creating an Entrez query. The most common starting point for access to genomic data through GenePalette will be an Entrez query, just as one would perform through the web interface at http://www.ncbi.nlm.nih.gov/Entrez/ (see Figure 1, Chapter 1). If one is familiar with the genomic GenBank records of one's favorite organism, this step will come naturally; there are some genome-specific tips later in this chapter. The query that is entered is sent to the Entrez server, and the results are received by GenePalette and parsed into a table for user selection. Parsing Entrez query results. The Entrez results page that is parsed is identical to the HTML that underlies the web version of Entrez. If the query returns only one sequence, the program skips the step of asking the user to select a sequence. For the sake of simplicity, only the first 300 query results are returned and displayed. This was done to simplify the programming of the parser as well as to shorten leng
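GenePalette itself sends the query to Entrez and parses the HTML results page. For readers who want to reproduce a comparable search outside GenePalette, NCBI's E-utilities esearch endpoint (a different interface, not the one GenePalette uses) accepts the same query syntax. The sketch below only builds the request URL; the choice of db="nucleotide" and retmax=300, chosen to mirror the result cap described above, are assumptions for illustration.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint (not the HTML page that GenePalette parses).
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(term: str, db: str = "nucleotide", retmax: int = 300) -> str:
    """Build an esearch request URL; retmax caps how many IDs come back."""
    params = {"db": db, "term": term, "retmax": retmax}
    # urlencode quotes the query, turning spaces into '+'.
    return ESEARCH + "?" + urlencode(params)
```

Fetching that URL returns an XML list of matching IDs, which plays roughly the role of GenePalette's sequence selection table.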
48. ross the loaded sequence, and matches are displayed in the graphical view and data tables. Using the restriction library that is packaged with GenePalette and a special library created for this tutorial, we can visualize the creation of this enhancer and easily understand its makeup. About Libraries. Because the addition of features is such a routine operation in GenePalette, we have created a system of Feature Libraries to store profiles for commonly added features. GenePalette comes with a restriction library of 208 enzymes. Additionally, we have included a tutorial library for the purposes of this exercise. We have not included a library of commonly used transcription factor binding sites because there are so many ways to interpret binding data; we feel it is up to users to compile libraries of sites that they believe in. When the GenePalette application is started, all library files contained in the Libraries directory under the main application directory are loaded into the library management system. User libraries that are not contained in this directory can be loaded manually. Adding Library Features. To add a feature from a library to a sequence, go to the Libraries menu and click Add Feature from Library. Find the Hind III feature in the Restriction Library and click the checkbox in the leftmost column of the table. Then click OK. Interacting with ad
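Adding a library feature such as Hind III means scanning the loaded sequence for the consensus and listing every match. A minimal sketch of such a scan, assuming an exact (non-IUPAC) consensus and checking both strands, is shown below; find_sites is a hypothetical helper, not GenePalette's code, and positions are 0-based.

```python
import re

_COMP = str.maketrans("ACGT", "TGCA")


def revcomp(s: str) -> str:
    """Reverse complement of an uppercase A/C/G/T string."""
    return s.translate(_COMP)[::-1]


def find_sites(seq: str, site: str) -> list[tuple[int, str]]:
    """Return sorted (0-based start, strand) pairs for every match of `site`.

    A zero-width lookahead lets overlapping matches be reported. Palindromic
    sites such as AAGCTT (Hind III) are reported once, on the '+' strand."""
    seq, site = seq.upper(), site.upper()
    hits = [(m.start(), "+") for m in re.finditer(f"(?={re.escape(site)})", seq)]
    rc = revcomp(site)
    if rc != site:
        hits += [(m.start(), "-") for m in re.finditer(f"(?={re.escape(rc)})", seq)]
    return sorted(hits)
```

For a non-palindromic consensus such as GATA, a '-' hit at position p means the reverse-strand site TATC starts at p on the top strand.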
49. s have not been indexed, this item will take the user through the indexing process. Recatalog Local Genes: Performs the indexing of local genes. This process does not remove previous indexes: you can catalog one genome at one point and then add another genome at a later point without losing the first genome. Load a Sequence by GI: Prompts the user for a GI number (see Chapter 2) so that genes associated with the sequence identified by this unique number can be selected for download. Reselect Genes from Previously Loaded Annotation: Allows the user to reselect genes from a parsed GenBank record that was loaded by either of the above two ways. A session history of parsed GenBank records is kept. Due to memory limitations, a parsed GenBank record is not added to this session history if the sequence was loaded from a local source. Reselect Sequence from Last Entrez Query: Accesses the list of sequences from the last Entrez query so that you can go back and select a different sequence. This is especially useful if you want to download both the mouse and human versions of a gene. Save GenBank Records and Sequences to Disk: Searches GenBank for records that you would like to download and save on disk, so that future access to both the parsed annotation data and the sequence is rapid. The Window Menu. This menu contains an item for each window currently open in the GenePalette session. Jus
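The behavior of Recatalog Local Genes (new genomes are added without discarding earlier ones) can be pictured as merging new entries into an existing index rather than rebuilding it. This is a hypothetical sketch: GenePalette's actual index format is not described in this manual, and the function name catalog and the file-to-symbols mapping are illustrative assumptions.

```python
def catalog(index: dict[str, set[str]],
            records: dict[str, list[str]]) -> dict[str, set[str]]:
    """Merge newly scanned records into an existing gene-symbol index.

    `records` maps a GenBank file name to the gene symbols found in it.
    Existing entries are kept; new files are only added, never removed."""
    for path, symbols in records.items():
        for symbol in symbols:
            index.setdefault(symbol, set()).add(path)
    return index
```

Calling catalog once per genome directory accumulates entries, so a later search by gene symbol can find genes from every genome cataloged so far.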
50. s tutorial, the user will learn how to access mammalian genome sequences through both GenBank and Ensembl. The user will also learn how to use the integrated interface to move precisely from the graphical representation to the primary sequence. A copy of the sequence downloaded from GenBank is included in the Sequences directory under the main GenePalette directory. If you are not connected to the internet when doing the tutorial, you can go to the File menu, select Open Sequence, and select the file named tutorial2.seq. Detailed information on accessing mammalian genomes will be covered in Chapter 2. Loading a Mammalian Gene. Symbols and Data Sources. When using GenePalette, it is very important to know the acknowledged symbol of the gene of interest. For mammalian genomes, the best way to find a gene symbol as it will appear in GenBank is to search for your gene in LocusLink at NCBI (http://www.ncbi.nlm.nih.gov/LocusLink/, Figure 7). In our case we will be looking at the gene ZZ, which is the same symbol that is used in GenBank records. There are two options for loading sequence and annotations. In the first tutorial we learned how to load sequence from GenBank. The alternate route is to load your sequence from an external source that can output a GenBank Flat File, such as Ensembl (http://www.ensembl.org). As you get used to GenePalette, you will find that each option has its strengths and weaknesses. GenBank can be slow to access, but you might find that you like t
51. t annotated in a standard way. Please let us know if an important organism's sequences are not parsable by GenePalette. [Figure 1. Gene annotations in a GenBank record] Selecting a portion of sequence. Once annotation is loaded and compiled, one of two things happens. If there were no gene annotations on the sequence (unordered working-draft sequence or whatever), the user is prompted to enter a range that they would like to download. In the more common situation, the user is presented with a list of genes that were parsed from the GenBank record (see Figure 1, Chapter 1 for a picture). If the first word of your Entrez query matches a gene in the table, that row will be highlighted. You can click on the genes that you are interested in looking at. If you select a gene that overlaps another gene, all overlapping genes will then be selected (Figure). Choosing upstream and downstream ranges. Once genes from the genomic sequence have been selected, the user must decide how much upstream and downstream sequence to download before the parser can move on. This decision is made with a dialog that has two slider bars, one upstream and one downstream (Figure 2). Between the two sliders is text that symbolizes the genes selected in the previous dialog. The upstream slider will go anywhere from zero bases upstre
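The two rules above (overlapping genes are selected together, then upstream and downstream flanks are added) can be sketched as follows. This is an illustration under stated assumptions, not GenePalette's parser: coordinates are 1-based and inclusive, overlap selection is assumed to be transitive, and "upstream" is simplified to mean toward lower coordinates regardless of gene orientation. Gene, expand_selection, and download_range are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def overlaps(a: Gene, b: Gene) -> bool:
    """True when two inclusive intervals share at least one base."""
    return a.start <= b.end and b.start <= a.end


def expand_selection(selected: set[str], genes: list[Gene]) -> set[str]:
    """Add every gene overlapping a chosen gene, repeating until nothing changes."""
    chosen = set(selected)
    changed = True
    while changed:
        changed = False
        for g in genes:
            if g.name not in chosen and any(
                    overlaps(g, h) for h in genes if h.name in chosen):
                chosen.add(g.name)
                changed = True
    return chosen


def download_range(chosen: set[str], genes: list[Gene], upstream: int,
                   downstream: int, seq_len: int) -> tuple[int, int]:
    """Span of the chosen genes plus flanks, clamped to the record length."""
    picked = [g for g in genes if g.name in chosen]
    lo = max(1, min(g.start for g in picked) - upstream)
    hi = min(seq_len, max(g.end for g in picked) + downstream)
    return lo, hi
```

Clamping matters when a slider asks for more flanking sequence than the GenBank record actually contains.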
52. t click on the line for the window that you want, and it will be brought to the front.
53. th of time spent downloading. Query hits. If a user is constantly performing queries with > 500 hits, there are definitely some things that can be done to increase search specificity; see below for tips on creating Entrez queries. There are three pieces of data associated with each query result. The first piece of information displayed in the Sequence Selection dialog is whether the sequence is available remotely or locally (see the section below about local storage of sequence). The second is a description line that tells the user what is contained within the sequence. The third is a unique identifier called a gi number. A gi number is associated with only one sequence, and when that sequence record changes in any way, the gi number also changes. This unique identifier is used in the next step of the program to request the GenBank record for the selected sequence. It is important to be aware that, although GenePalette successfully parses the overwhelming majority of query results returned, every once in a while there are sequences that are not visible in the results table and therefore can't be selected through an Entrez query. This is because they did not parse properly; the problem is usually due to non-standard nomenclature or record signature in the results page. For now, we have sidestepped this problem by adding a menu item for loading a sequence directly by gi number, GenomeTools menu: Load a Sequ
54. the graphical view. Experiment with clicking rows and checkboxes of the feature table for Hind III. Interacting with Features in the Graphical View. Another way to access and visualize features is through the graphical view. When you click on a feature present in the sequence, three things happen in the main window: (1) a red arrow is displayed under the feature in the graphical view; (2) 50 bp upstream and downstream of the site are loaded into the markup view; and (3) the row for that site is highlighted in the data table (Figure 3). The red arrow serves as a place marker to remind you what region is presented in the markup view. The selection of the feature table row serves as a convenient way to see what the sequence of the match is and to quickly hide unwanted matches to the feature consensus. The markup view allows you to see the consensus match in the context of the surrounding sequence. Click on a Hind III site in the Graphical View to generate a Markup View of the site. Using the Markup View. Clicking on a feature in the Graphical View causes the sequence flanking that feature to be loaded into the Markup View. This view provides a convenient way to examine features at the nucleotide sequence level. Notice that each feature occurring in the view appears as a box around the matched sequence. The feature name and the position of the match start are displayed as a label on the box. Click on the label to
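The 50 bp flanking window loaded into the markup view is simple interval arithmetic, with one edge case: a feature near either end of the sequence has less flank available. A minimal sketch, assuming 1-based inclusive feature coordinates and clamping at the sequence ends (GenePalette's own edge handling is not described here; the function names are hypothetical):

```python
def markup_window(seq_len: int, feat_start: int, feat_end: int,
                  flank: int = 50) -> tuple[int, int]:
    """1-based inclusive window extending `flank` bp on each side of a feature,
    clamped so it never runs past either end of the sequence."""
    lo = max(1, feat_start - flank)
    hi = min(seq_len, feat_end + flank)
    return lo, hi


def markup_sequence(seq: str, feat_start: int, feat_end: int,
                    flank: int = 50) -> str:
    """Return the sequence text covered by markup_window."""
    lo, hi = markup_window(len(seq), feat_start, feat_end, flank)
    return seq[lo - 1:hi]  # convert 1-based inclusive to a Python slice
```

For a Hind III site at positions 1000 to 1005 of a 10 kb sequence, the window is 950 to 1055; a site at 20 to 25 gets only 19 bp of upstream flank.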
55. the resulting gene table for that first word. Terms can be added to a query, separated by spaces. For example, compare these two queries: "egf" and "egf drosophila". The first query yields 500 sequences, while the second gives only 30 sequence results. Terms strung together with spaces are treated as if the AND keyword were used to separate them, which means that results must match both terms (egf and drosophila). You can also require that words appear next to each other by placing a multi-word term in quotes. Another trick of the trade is to use the [ORGN] tag to specify the organism: Caenorhabditis[ORGN] chromosome. An extremely useful trick is to designate a range of sequence lengths that you want to search. To see all Arabidopsis chromosomes, here is a query that gives you only 7 results: Arabidopsis thaliana[ORGN] AND 1009000:5000000[SLEN]. Luckily, most queries by gene name give only a few results, so you don't have to type in a lot of extra terms to get to your gene. Genome-specific tips for Entrez queries. Each organism has its own gene nomenclature and consequently its own best way to get to the sequence through an Entrez query. Here some tips are listed for a few prominent organisms. Links to organism-specific resources are found on the GenePalette website (http://www.genepalette.org) and on the Entrez Genomes page (http://www.ncbi.nlm.nih.gov:80/entrez/query.fcgi?db=Genome). Drosophila melanogaster. The easiest way to get t
56. to keep images created by GenePalette. Open the image you exported in a graphics program to make sure it worked. To export the markup view of the Xba I-Kpn I enhancer area, the first thing to do is to drag out a selection box surrounding the Xba I and Kpn I sites. This will create a markup view of the boxed region. Once the desired section is marked up, go to the File menu, click the Export Markup View submenu, select Export GIF or Export PostScript, and choose a file to export to. Figure shows what both GIF images look like. This concludes the first tutorial. The next tutorial will highlight other main features of GenePalette using the first sequence loaded. [Figure: exported GIF images of the graphical and markup views, showing Hind III, Su(H), and PN E BOX features] Tutorial 2: Creating gel shift oligos for a mammalian enhancer. One typical operation when investigating the regulation of an enhancer is to create oligonucleotides that span predicted binding sites for use in an electrophoretic mobility shift assay. The high level of interconnectivity between the interface components of GenePalette makes it a natural fit for this task. In this tutorial example, we will use GenePalette to access a well-studied enhancer of the mammalian Nkx2-5 gene and create oligos to test some GATA binding sites. During thi
57. uence elements relative to each other and to gene annotations. To do this, one must have a way to locate and view elements of interest in a sequence. We have a basic set of tools for marking up a sequence through consensus definitions called Features. Obviously, when performing routine analysis, it is most efficient to have a way of reusing features instead of typing them in from memory. Consequently, we created a set of tools for managing Feature Libraries. This chapter explains the features of Features. Feature Basics. A Feature is defined as any sequence element that can be described by nucleotide sequence identity. This includes transcription factor binding sites, restriction enzyme sites, SNPs, primers, microsatellites, RNA regulatory motifs, promoter elements, etc. The first thing to know is the basic nomenclature used to define a feature consensus. IUPAC Code. Of course, you can always define a sequence using A's, T's, C's, and G's, but GenePalette also recognizes the single-letter code set by the International Union of Pure and Applied Chemistry for matching multiple bases with a single letter. This code (shown below) appears in the dialogs for adding features and also appears as a tool tip in fields that require IUPAC code. Specifying a variable number of the same base. In addition to the IUPAC code, GenePalette also allows for a variable repeat of the same base. Let's say that you have a transcription factor that binds to two consensus sequen
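One way to see what an IUPAC consensus matches is to translate each code letter into a regular-expression character class. The sketch below uses the standard IUPAC nucleotide codes; it is an illustration of the idea, not GenePalette's matcher, and it does not model GenePalette's variable-repeat syntax, which this excerpt does not show. The example consensus WGATAR is a commonly cited GATA-factor site, used here only for demonstration.

```python
import re

# Standard IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_to_regex(consensus: str) -> re.Pattern:
    """Compile an IUPAC consensus into a case-insensitive regex.
    Raises KeyError for characters that are not IUPAC nucleotide codes."""
    pattern = "".join(IUPAC[c] for c in consensus.upper())
    return re.compile(pattern, re.IGNORECASE)
```

iupac_to_regex("WGATAR") compiles to [AT]GATA[AG], which matches AGATAA and TGATAG but not CGATAA. To search the reverse strand as well, apply the same pattern to the reverse complement of the sequence.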
58. ver more than 5 kb, a Markup View is not generated; instead, a button is presented that allows you to see the Markup View. The reason for suppressing this view in large sequences is that it can take a long time to generate the view, and there is a cost in speed to maintain such a large Markup View. Adding a New Feature to the Sequence. Now that we have added Hind III sites to the sequence, we can see how a 2.1 kb Hind III fragment could encompass the my transcription unit. The next step that was done was to take a 1.2 kb Ecl136 II-Hind III fragment from the 2.1 kb Hind III fragment. Unfortunately, our GenePalette restriction library does not include this site, so we are going to have to look it up and add it directly to the sequence. Go to the Feature menu and select Add Feature. Type Ecl136 II into the field labeled Feature Name and type GAGCTC into the field labeled Feature Consensus. If you are already attached to the idea of using text symbols for features in the graphical view, you can type some text into the Symbol Text field. The Notes field is optional, so you don't have to worry about putting anything into it. Click OK in the dialog, and you can see that there is an Ecl136 II site right between the two Hind III sites. If you drag a box between the upstream Hind III and Ecl136 II sites, you can see that this distance is 1240 bp by looking at the Selected bp data field in the top portion of the sequence display. If you really want to
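The Selected bp readout and the 5 kb markup cutoff both reduce to a length calculation on a dragged selection. A minimal sketch, assuming the selection is counted inclusively (both end bases included) and that "5 kb" means 5,000 bp; neither detail is stated in the manual, and the function names and coordinates in the usage note are hypothetical.

```python
MARKUP_LIMIT_BP = 5000  # assumed reading of the manual's "more than 5 kb" rule


def selected_bp(start: int, end: int) -> int:
    """Length of a 1-based inclusive selection, whichever way it was dragged."""
    lo, hi = sorted((start, end))
    return hi - lo + 1


def markup_generated(start: int, end: int,
                     limit: int = MARKUP_LIMIT_BP) -> bool:
    """True if a selection is small enough for the Markup View to be built."""
    return selected_bp(start, end) <= limit
```

A box dragged from position 101 to 1340 (or back from 1340 to 101) reports 1240 bp and stays under the cutoff, whereas a 5,001 bp selection would show the button instead of the Markup View.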
59. wser, you will see that the 3' and 5' UTRs of elle have been identified. Looking at the record, we can see where the transcript and its CDS begin and end. Doing some quick math, we can see that the 5' UTR is about 70 bp and the 3' UTR of ma is 183 bp (1436 - 1253). At the moment, the coding range for ma is set to the same range as the first exon. All we have to do is change the size of the first exon so that it starts 70 bp further upstream and ends 183 bases further downstream. Go to the Transcript menu and select Modify Transcript. You will have to select a transcript to modify via the transcript selection dialog: select ma and click OK. Now you are in the transcript modification dialog (Figure 13). This dialog is used both for adding new transcripts and for modifying existing transcripts. The Name field holds the transcript name as it will appear both on the tab for that transcript and in the graphical view. The orientation selector is used to indicate the direction of transcription. The fields labeled Coding Range are used to indicate the start and stop of protein coding in the sequence; you can enter zeros into these two text fields to indicate that the transcript does not code for protein. The Notes field is optional. The Color chooser can be used to change the transcript color, much as you would use the same chooser in the tab for the transcrip
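The exon edit described above is plain coordinate arithmetic: widen the exon by the 5' UTR length on its upstream side and by the 3' UTR length on its downstream side, leaving the Coding Range unchanged. A minimal sketch for a single-exon transcript, assuming 1-based coordinates where the start is always the smaller number (so on the minus strand "upstream" means higher coordinates). The helper name and the starting coordinate 500 in the usage note are hypothetical; only the 183 bp figure (1436 - 1253) comes from the text.

```python
def add_utrs(exon_start: int, exon_end: int, utr5: int, utr3: int,
             plus_strand: bool = True) -> tuple[int, int]:
    """Return the new (start, end) of an exon widened by its UTR lengths.

    Coordinates are 1-based with start <= end. On the plus strand the 5' UTR
    extends the start and the 3' UTR extends the end; on the minus strand the
    roles are swapped because transcription runs toward lower coordinates."""
    if plus_strand:
        return exon_start - utr5, exon_end + utr3
    return exon_start - utr3, exon_end + utr5
```

For a plus-strand exon initially set to the CDS range 500 to 1253, add_utrs(500, 1253, 70, 183) gives (430, 1436), so the new exon end lands exactly where the 3' UTR arithmetic says it should.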
