Blixem User Manual

Introduction

This manual explains how to configure, run and use Blixem.

Blixem is an interactive browser of pairwise matches displayed as multiple alignments. It is not strictly a multiple alignment tool; rather, it shows a one-to-many alignment. It is used to check the alignments of nucleotide and amino-acid sequences against a reference sequence.

Blixem is maintained by the Wellcome Trust Sanger Institute and is available as part of the SeqTools package. The software can be downloaded from the Sanger Institute's website (http://www.sanger.ac.uk).

An aside about
Toggle which strand is active by pressing the Toggle strand button on the toolbar, or by pressing the t key. By default, Blixem assumes that the reference sequence passed to it is the forward strand, unless otherwise specified by the reverse-strand command-line argument.

Big Picture

The Big Picture section shows an overview of the reference sequence. The reference sequence coordinates are shown along the top. You can zoom in to view a shorter range by using the Zoom in button at the top left of the screen. Use Zoom out or Whole to zoom out; Whole zooms out to view the full length of the reference sequence.

The big picture consists of two grids showing the alignments for each strand, and two sections between these grids showing the transcripts for each strand. The grids have a scale on the left-hand side showing the percent ID, and alignments are plotted against this scale. The scale and extents of the grids can both be edited: see the Grid properties section in the Settings dialog.

The active-strand alignments and transcripts are shown at the top, and the other strand at the bottom. The direction of the coordinates is determined by the active strand. The active strand can be toggled using the t shortcut key or the Toggle strand button on the toolbar.

Figure
3: The Big Picture section

Red shaded areas in the big picture indicate assembly gaps (gaps in the reference sequence). Assembly gaps are represented by dashes in the FASTA input file.

Bumping the transcript view

By default, exons and introns for the same strand are drawn overlapping each other. They can be expanded, or "bumped", by pressing the b shortcut key or by enabling the relevant option in the View dialog (see Hiding sections of the window).

Figure 4: Expanded transcript view

Detail View

The Detail View shows the actual sequence data for the match sequences. Match sequences are lined up underneath the relevant section of reference sequence, and individual bases are highlighted in different colours to indicate how well they match.

Match colours

  Cyan         exact match
  Grey         mismatch
  Violet       conserved match
  Dot          deletion
  Yellow bar   insertion

Figure 5: Alignment colour key

Alignment lists

There are separate lists of alignments for each strand and reading frame of the reference sequence. Each list has a yellow header bar containing the reference sequence. At the left, the yellow bar shows the reference sequence name and which strand and frame it is, e.g. +1 means forward strand, reading frame 1; -2 means reverse strand, reading frame 2.
Figure 6: Alignment list details. The figure labels the reference sequence name, strand and frame; the first and last coordinates in the displayed range; the match sequence name; the alignment score and percent ID; and the start and end positions of the alignment on the match sequence.

Nucleotide mode

There are two sections to the detail view in nucleotide mode, one for each strand. The active strand is shown at the top and defines the coordinate direction: increasing if the forward strand is active, decreasing if the reverse strand is active.

Figure 7: Alignment lists (nucleotide mode)

Protein mode

There are three sections in the detail view in protein mode, one for each of the three reading frames for the active strand. Only the active strand is shown. To view the other strand, toggle the display using the Toggle strand button or the t shortcut key.

In protein mode, the yellow header bars show the translated reference sequence for that reading frame. STOP and MET codons in the reference sequence are highlighted in red and green. There is also an additional header section at the top showing the nucleotide sequence.
a single base rather than an entire codon, holding Shift while using the left and right arrow keys.

You can move the selection to the start/end of the previous/next match by holding Ctrl while using the left and right arrow keys (limited to just the selected sequences if any are selected; includes all sequences otherwise).

Finding sequences

The Find dialog allows the user to search for sequences by name. Press the Find button on the toolbar, or hit the Ctrl-F shortcut key, to open the Find dialog.

Figure 18: Find dialog

There are three search modes:

• Text search: Search for match sequences by name, or by another column chosen from the Search column drop-down box. The wild card * means any number (or zero) of any character, and ? means one character, which can be any character. Any sequences whose relevant column data matches the search string will be selected, and the display will scroll to the start of the selection.
• List search: The same as text search, but you can enter multiple search strings by placing them on separate lines in the text box.
• DNA search: This searches for a given sub-sequence of nucleotides in the reference sequence. If the sub-sequence is found, the display will scroll to the start of the sub se
can override Blixem defaults by specifying a data type for specific features. Data types can be specified by a source mapping, using the source-data-types stanza, or by using the custom dataType tag in the GFF input file. Possible key-value pairs are the same as for the Program defaults.

    [dna-match]
    link-features-by-name = true
    bulk-fetch = pfetch-socket-embl, pfetch-socket-fasta
    user-fetch = pfetch-http-embl, pfetch-http-fasta, internal

    [protein-match]
    link-features-by-name = true
    bulk-fetch = pfetch-socket-fasta
    user-fetch = pfetch-socket-embl, pfetch-socket-fasta, internal

    [ensembl-variation]
    user-fetch = variation-fetch

Source mapping

This stanza allows you to map a source to a particular data type. The keys should be valid sources that appear in the GFF file, and the values must be stanzas specified in the data-type stanzas.

    [source-data-types]
    EST_Human = dna-match
    EST_Mouse = dna-match
    EST_Pig = dna-match
    EST_Other = dna-match
    SwissProt = protein-match
    TrEMBL = protein-match
    ensembl_variation = ensembl-variation

Sources

These stanzas allow you to set additional information on a per-source basis. Currently the only supported keyword is file. This specifies a file name, which can be included in fetch-method arguments using the substitution variable %f.

    [Tier2_HepG2_cytosol_longPolyA_rep2]
    file = wgEncodeCshlLongRnaSeqHepg2CytosolPapAlnRep2.bam

User settings

The following s
    chr4-04_210623-364887  EST_Human  nucleotide_match  79195  79323  121.000000  +  .  Target=AI095103.1 326 454;percentID=96.9
    ##FASTA
    >chr4-04 44144 154265
    tcttgtttctgtaggagaggccatctccatcagctataaccaaaaaaaaa
    acaaaaaactcctctttttgacaagtttgtaaagcctgtccatctgggtc
    tataataatcctccaggccctatgccactcctctttattcagccagttca

Configuration file

Blixem supports ini-style configuration files, which are used to specify user options and to tell Blixem how to handle particular types of data. Blixem can accept config files by one or both of the following methods:

• A default config file called .blixemrc, located in the user's home directory.
• A file passed on the command line using the -c argument. The contents of this file will take priority if there are any clashes with the default file.

The default config file is generally used for display settings that are set from the Settings dialog. Blixem saves display settings to this file on exit, so it will be created the first time Blixem exits if it does not already exist. You can also edit this file by hand, or add system settings to it (such as the fetch methods) if you wish.

The command-line method is useful when Blixem is called as part of a pipeline, because it allows the calling program to set specific config options (commonly the data-handling properties).

Program defaults

Defaults for the program can be specified in the [blixem] stanza. The properties that can be set are described below.

    [blixem]
    link-features
colour for all groups is orange, so you may wish to change this if you want different groups to be highlighted in different colours.

• Order: When sorting by Group, alignments in a group with a lower order number will appear before those with a higher order number (or vice versa if the sort order is inverted). Alignments in a group will appear before alignments that are not in a group.

You can also hide all sequences that are not part of a group by ticking the "Hide all sequences not in a group" option. This is a quick way of filtering sequences to show only those that you are interested in: any sequences that are not part of a group will be hidden. Note that any sequences in a hidden group will also still be hidden.

To delete a group, click one of the following buttons. This will have an immediate effect, i.e. you don't have to click Apply.

• To delete a single group, click on the Delete button next to the group you wish to delete.
• To delete all groups, click on the "Delete all groups" button.

Running Dotter

To start Dotter from within Blixem, or to edit the parameters for running Dotter, right-click and select Dotter, or use the Ctrl-D keyboard shortcut. The Dotter dialog will pop up.

Figure 23: Dotter dialog

Select the sequence you wish to run Dotter on before o
symbols, which will be populated by Blixem at run time. Use %% to represent a normal % character.

• %p: program name
• %h: host name
• %u: user name
• %m: match sequence name
• %r: reference sequence name
• %s: start coord of feature on reference sequence
• %e: end coord of feature on reference sequence
• %d: dataset
• %S: feature source
• %f: file name (specified in the file tag in the GFF or in the Source stanza)

    [pfetch-socket]
    fetch-mode = socket
    node = pfetch.sanger.ac.uk
    port = 22400
    command = pfetch
    args = --client=%p_%h_%u -q -C -F %m
    errors = no match, Not authorized
    separator = " "
    output = embl

    [pfetch-http]
    fetch-mode = http
    url = http://www.sanger.ac.uk:80/cgi-bin/otter/65/pfetch
    request = request=%m
    port = 80
    cookie-jar = /nfs/users/nfs_g/gb10/otter/ns_cookie_jar
    errors = no match, Not authorized
    separator = " "
    output = fasta

    [www-fetch]
    fetch-mode = www
    url = http://www.sanger.ac.uk/cgi-bin/otter/65/pfetch
    request = request=-F %m

    [variation-fetch]
    fetch-mode = www
    url = http://www.ensembl.org/Homo_sapiens/Variation/Summary
    request = v=%m

    [bam-fetch]
    fetch-mode = command
    command = bam_get
    args = --file=http://hgdownload-test.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeCshlLongRnaSeq/releaseLatest/%f --chr_prefix=chr --gff_feature_source=%S --chr=%r --start=%s --end=%e --dataset=%d
    output = gff

    [internal]
    fetch-mode = internal

    [none]
    fetch-mode = none

Data types

You
the Ctrl Ctrl keys to zoom.

The toolbar

The detail view toolbar contains the following functions. Note that the Help and Settings buttons are included in the detail view toolbar even though they apply to Blixem as a whole.

Figure 12: Detail view toolbar

  Back one page       Scroll the detail view range to the left by one page.
  Back one index      Scroll the detail view range to the left by one base.
  Help                Show help about how to use Blixem.
  About               Show program information.
  Settings            Show the Settings dialog.
  Sort                Show the Sort dialog.
  Zoom in             Increase the font size in the detail view.
  Zoom out            Decrease the font size in the detail view.
  Go to               Go to a particular coordinate.
  First match         Go to the first coordinate of the first alignment.
  Previous match      Go to the start of the current alignment or the end of the previous alignment.
  Next match          Go to the end of the current alignment or the start of the next alignment.
  Last match          Go to the end of the last alignment.
  Forward one index   Scroll the detail view range to the right by one base.
  Forward one page    Scroll the detail view range to the right by one page.

The match navigation functions act only on selected sequences if there is currently a selection; if no sequences are currently selected, they act on all sequences.
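The Previous match and Next match behaviour described in the toolbar table can be read as "jump to the nearest alignment boundary in the chosen direction". The Python sketch below illustrates that reading only; it is not Blixem's implementation. The function names and the representation of alignments as (start, end) pairs are assumptions, and all boundaries are treated uniformly, which may differ from Blixem when alignments overlap.

```python
from bisect import bisect_left, bisect_right


def active_alignments(all_alignments, selected):
    """Use the selected alignments if there are any, otherwise all of them."""
    return selected if selected else all_alignments


def boundaries(alignments):
    """Sorted, de-duplicated start and end coordinates of (start, end) pairs."""
    return sorted({coord for start, end in alignments for coord in (start, end)})


def next_match(alignments, current):
    """Nearest boundary strictly to the right of `current` (or stay put)."""
    coords = boundaries(alignments)
    i = bisect_right(coords, current)
    return coords[i] if i < len(coords) else current


def previous_match(alignments, current):
    """Nearest boundary strictly to the left of `current` (or stay put)."""
    coords = boundaries(alignments)
    i = bisect_left(coords, current)
    return coords[i - 1] if i > 0 else current
```

With alignments at 100-200 and 300-400, a cursor at 150 moves to 200 on Next match (the end of the current alignment), and a cursor at 300 moves to 200 on Previous match (the end of the previous alignment).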
the name Blixem

BLIXEM was originally an acronym for BLast matches In an X-windows Embedded Multiple alignment, although this is a bit of a misnomer now because Blixem can handle any kind of alignment, not just BLAST matches. We have dropped the acronym and the capital letters, so the correct name is just "Blixem".

Getting Started

Running Blixem

As a minimum, Blixem takes the following required arguments:

    blixem --display-type N|P features_file

where features_file is the path name of a GFF version 3 file containing the alignments and any other features. The --display-type (or -t) argument is the only mandatory argument. It defines the display mode: n for nucleotide or p for protein. Run blixem without any arguments to see further usage information.

Input files

Blixem takes one or two files as input: a mandatory GFF version 3 file containing the features, and optionally a separate file containing the reference sequence in FASTA format.

    blixem -t N|P [reference_sequence_file] features_file

If the reference sequence file is not provided, the reference sequence must be supplied in FASTA format at the end of the GFF file, following a comment line that reads ##FASTA.

Note that the reference sequence must always be a nucleotide sequence, and match sequences must be the correct type for the mode, i.e. nucleotide sequences for nucleotide mode or protein sequences for prot
A
    chr4-04  ensembl_variation  SNP  81040  81040  .  +  .  Name=rs2352935;url=http%3A%2F%2Fwww.ensembl.org%2FHomo_sapiens%2FVariation%2FSummary%3Fv%3Drs2352935;variant_sequence=T
    chr4-04  ensembl_variation  insertion  82229  82230  .  +  .  Name=rs35105663;url=http%3A%2F%2Fwww.ensembl.org%2FHomo_sapiens%2FVariation%2FSummary%3Fv%3Drs35105663;variant_sequence=G
    ...
    chr4-04  Augustus  mRNA  119534  119941  .  +  .  ID=transcript21;Name=AUGUSTUS00000051712
    chr4-04  Augustus  exon  119534  119941  .  +  .  Parent=transcript21
    chr4-04  Augustus  CDS  119534  119941  .  +  0  Parent=transcript21

FASTA file

A FASTA file has a header line that starts with ">". We use a custom FASTA header format that contains the sequence name followed by the start and end coordinates, separated by spaces. Note that the FASTA sequence range may be different to the GFF file range. The next line contains the start of the sequence data. The sequence data can be on a single line or separated by newlines (it is usually separated by newlines every 50 characters to aid readability).

    >chr4-04 44144 154265
    tcttgtttctgtaggagaggccatctccatcagctataaccaaaaaaaaa
    acaaaaaactcctctttttgacaagtttgtaaagcctgtccatctgggtc
    tataataatcctccaggccctatgccactcctctttattcagccagttca

Combined GFF and FASTA file

    ##gff-version 3
    ##sequence-region chr4-04 44144 154265
    chr4-04_210623-364887  EST_Human  nucleotide_match  79195  79311  95.000000  +  .  Target=DA692754.1 287 403;percentID=90.6
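The custom FASTA header format described in the FASTA file section (sequence name, then start and end coordinates, separated by spaces) is simple to read programmatically. The following Python sketch shows one way a calling script might parse such a file before handing it to Blixem. The function name is hypothetical, and the sketch assumes a single record per file; it is not taken from Blixem itself.

```python
def parse_reference_fasta(text):
    """Parse a single-record FASTA string whose header is '>name start end'.

    Returns (name, start, end, sequence). Sequence data may be on one line
    or split across several lines; line breaks are removed.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("expected a header line starting with '>'")

    fields = lines[0][1:].split()
    if len(fields) != 3:
        raise ValueError("expected '>name start end' in the header")

    name, start, end = fields[0], int(fields[1]), int(fields[2])
    sequence = "".join(lines[1:])
    return name, start, end, sequence


if __name__ == "__main__":
    example = ">chr4-04 44144 44151\nacgt\nacgt\n"
    print(parse_reference_fasta(example))
```

Because the manual notes that the FASTA range may differ from the GFF range, the sketch deliberately does not compare the header coordinates with anything in the GFF file.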
Written by Gemma Barson (gb10@sanger.ac.uk)
Wellcome Trust Sanger Institute
17 January 2011

Revision History

  Revision                         Date       Author
  First revision (Blixem v4.1.5)   17/01/11   Gemma Barson
  Updated for Blixem v4.1.9        14/02/11   Gemma Barson
  Updated for Blixem v4.1.13       25/03/11   Gemma Barson
  Updated for Blixem v4.1.14       05/04/11   Gemma Barson
  Updated for Blixem v4.1.17       09/05/11   Gemma Barson
  Updated for Blixem v4.2          17/06/11   Gemma Barson
  Updated for Blixem v4.7          02/12/11   Gemma Barson
  Updated for Blixem v4.14         15/06/12   Gemma Barson
    by-name = false
    bulk-fetch = pfetch-socket-embl, pfetch-socket-raw
    user-fetch = pfetch-socket-embl, pfetch-socket-fasta, internal

• link-features-by-name: If true, features with the same name are considered to have the same parent, e.g. exons and introns with the same name are part of the same transcript, or matches with the same name are from the same match sequence.

• bulk-fetch: This specifies the default method to use when batch-fetching sequences on start-up. Its value must be one of the fetch methods specified in the fetch-method stanzas. The results of the fetch are parsed by Blixem. The bulk-fetch method can be overridden for specific data types (see the Data types section). A comma-separated list of fetch methods can be specified if alternative fetch methods should be used when the first fetch fails for some reason. Each fetch method is tried in turn, in the order listed, until all sequences have been successfully fetched or we run out of methods to try.

• user-fetch: This specifies the default method to use when the user interactively fetches a sequence from within Blixem, i.e. by double-clicking on a sequence. Its value must be one of the fetch methods specified in the fetch-method stanzas. The results of the fetch are displayed to the user. The user-fetch method can be overridden for specific data types (see the Data types section). A comma-separated list of fetch methods can be specified if alternative fetch methods should be used if the first fetc
e GFF file are highlighted in the reference sequence nucleotide header. If the "Show variations track" sub-option is also enabled, then an additional line is shown above the nucleotide header showing the alternative bases for each variation. Note that the Variations track can be quickly enabled or disabled by double-clicking the nucleotide header. You can double-click a variation to open its URL.

Figure 24: Variations track

Show polyA tails

When this option is enabled, polyA tails are shown and highlighted in the alignment lists, and polyA signals are highlighted in the reference sequence nucleotide header. If the sub-option "Selected sequences only" is enabled, polyA features will only be shown for the currently selected sequences.

Display options

Show Unaligned Sequence

When this option is enabled, any additional unaligned portions of the match sequences are displayed at the start and end of the alignments. If the "Limit to" sub-option is also enabled, you can specify the maximum number of additional bases to display. If the "Selected sequences only" sub-option is enabled, only the currently selected sequence(s) will display unaligned portions of sequence.

Show Splice Sites

When this option is enabled, splice sites are highlighted in the reference sequence nucleotide header for the currently selected sequence(s). The two bases from the adjacent introns are highlighted in green if they are canonica
ein mode.

GFF file

Blixem uses the GFF version 3 file format. In this section we give a very brief description of this file format; see http://www.sequenceontology.org/gff3.shtml for a full description.

The GFF file should start with the following two comment lines. Additional comments can be included but may be ignored.

    ##gff-version 3
    ##sequence-region chr4-04 44144 154265

Each subsequent line defines a feature. A feature line must have the following 8 tab-separated columns:

    reference_sequence_name  source  type  start  end  score  strand  phase

An optional 9th column defines any tags, separated by semi-colons. Blixem supports the following GFF tags. Additional tags can be supplied but may be ignored.

• Target: required for alignments
• Gap: required for gapped alignments
• ID: required for parent features
• Name: required for transcripts and SNPs
• Parent: required for child features

In addition, Blixem supports the following custom tags:

• percentID: only applicable to alignments; populates the %ID column
• sequence: only applicable to alignments; supplies the sequence data
• variant_sequence: only applicable to variations; supplies the variation data
• url: only used by variations

GFF3 special characters must be escaped.

Transcripts

Note that exons should have a Parent transcript defined, and the Name tag should be set in the parent rather than the child exons. Note that Blixem will recognise exons that do not have a Parent ta
Figure 8: Alignment lists (protein mode). The main header shows the nucleotide sequence; the list headers show the 3-frame translation.

In the nucleotide sequence header, codons are read from top to bottom and then left to right, starting at row 1 for frame 1, row 2 for frame 2, etc. Middle-clicking on a coordinate will highlight the three nucleotides for the selected codon and the currently active reading frame (by default, frame 1). Left-clicking in an alignment list sets the active reading frame.

Figure 9: Selected reading frame and codon. In the figure, the three nucleotides for the selected codon in reading frame 2 are highlighted in blue; darker blue indicates the nucleotide whose coordinate is shown. Select an alignment, or click anywhere in an alignment list, to set the active reading frame.

Exons

Exons are displayed as solid colour blocks in the detail view, coloured green for CDS and red for UTR. Vertical blue lines are drawn at the start and end of the blocks so that it is easy to see whether alignments line up with the exon boundaries.

In protein mode, an exon may not start or end exactly at a codon boundary. A partial or "split" codon like this is indicated in the detail view by cross-hatch highligh
g if they have a Name tag instead, but they may not get grouped correctly with other exons from the same transcript.

Typically one defines the parent transcript, the exons and the CDS regions. Blixem will then calculate the missing components (in this case, the UTR regions and the introns). Blixem will recognise other combinations of inputs and will always calculate the missing components, as long as enough information is provided.

Variations

SNPs, insertions and deletions are supported, as well as combined variations. One may use the generic sequence_alteration type for these, but it is good practice to use more specific types such as SNP or deletion where applicable.

Sample GFF file

A sample GFF file may look like this ("..." denotes that text has been omitted):

    ##gff-version 3
    ##sequence-region chr4-04 44144 154265
    chr4-04  EST_Human  nucleotide_match  79195  79311  95.000000  +  .  Target=DA692754.1 287 403;percentID=90.6;sequence=GATCTGGC...
    chr4-04  EST_Human  nucleotide_match  79195  79323  121.000000  +  .  Target=AI095103.1 326 454;percentID=96.9;sequence=TTTAAATT...
    chr4-04  ensembl_variation  deletion  80798  80799  .  +  .  Name=rs60725655;url=http%3A%2F%2Fwww.ensembl.org%2FHomo_sapiens%2FVariation%2FSummary%3Fv%3Drs60725655;variant_sequence=AA
    chr4-04  ensembl_variation  sequence_alteration  80799  80799  .  +  .  Name=rs57681246;url=http%3A%2F%2Fwww.ensembl.org%2FHomo_sapiens%2FVariation%2FSummary%3Fv%3Drs57681246;variant_sequence=
h fails for some reason. Each fetch method is tried in turn, in the order listed, until the sequence has been successfully fetched or we run out of methods to try.

Fetch methods

These stanzas define custom methods for fetching sequence data. Each fetch method must specify the fetch-mode key, which determines what type of fetch to perform. Other keys depend on the fetch mode. Valid fetch modes and their required keys are:

• socket: node, port, command, args
• http: url, port, cookie-jar, request
• command: command, args
• www: url, request (user-fetch only; opens browser)
• internal: user-fetch only; displays stored sequence
• none: none

In addition, the following keywords are required for bulk-fetch methods:

• separator: Specifies the separator between multiple sequence names when they are compiled into a list.
• output: Defines the output format, and can be one of the following:
    raw: raw sequence data, each sequence separated by a new line
    fasta: FASTA format
    embl: EMBL format
    gff: GFF format, for re-parsing

The following optional keywords can also be included for any fetch method:

• errors: Specifies a list of known error messages. This is used by Blixem to determine whether an error occurred even if the fetch program executed successfully. The value should be a comma-separated list of the expected error message text, e.g.

    errors = no match, Not authorized

The request and args values can include the following substitution
is now hidden.

Alternatively, use the following keyboard shortcuts to toggle visibility of a component:

  1              Hide top pane in detail view
  2              Hide second pane in detail view
  3              Hide third pane in detail view (protein mode only)
  Ctrl-1         Hide top grid in big picture (active strand)
  Ctrl-2         Hide bottom grid in big picture (other strand)
  Shift-Ctrl-1   Hide top exon view (active strand)
  Shift-Ctrl-2   Hide bottom exon view (other strand)

Operation

Navigation

Scrolling

• Middle-click/double-click and then drag in big picture: Jump to a particular region. Dragging moves the highlight box.
• Click on the highlight box and drag: Move the highlight box.
• Middle-click/drag in detail view: Select a base. Releasing the mouse button scrolls the display to centre on the selected base (hold down Ctrl to avoid scrolling).
• Click a feature in the big picture: Selects that feature and scrolls the detail view vertically so that it is visible (if it is in the current detail view range).
• Horizontal scrollbar: Scroll the detail view range.
• Vertical scrollbars: Scroll up/down in the detail view or the big picture.
• Horizontal mouse wheel: Scroll the detail view range (if your mouse has a horizontal scroll wheel).
• Ctrl-left, Ctrl-right
• Home, End
• Ctrl-Home, Ctrl-End
• comma, full stop
• Ctrl Ctrl
• Go to button or p key
• Vertical mouse wheel: Scroll up/down the currently moused-over alignment list in the detail
l, or red if they are non-canonical.

Highlight Differences

When this option is enabled, matching bases are blanked out and mismatches are highlighted, making it easier to see where alignments differ from the reference sequence.

Squash Matches

This groups multiple alignments from the same sequence together into the same row in the detail view, rather than showing them on separate rows.

General settings

Font

Allows you to change the font that is used to display alignments in the detail view. Note that you must select a monospace font, otherwise matches will not be shown aligned correctly. Blixem will warn you if the font you have selected is not monospace.

Fetch mode

Allows you to change the program used to fetch sequence EMBL entries. Currently only available to authorised users within the Sanger Institute.

Columns

Load optional data

Click this button to load optional data from EMBL entries (currently only applicable to authorised users within the Sanger Institute). Note that this operation can take a long time if there are many sequences. The button will be greyed out once optional data has been loaded.

Column visibility

Tick/un-tick the check marks to show/hide individual columns. Adjust the column width by entering the new width, in pixels, in the text box. Note that if you enter a zero width then the column will be hidden, regardless of whether the check mark is ticked or not. Greyed-out columns are optional data colum
log or Find dialog.

• To copy sequence name(s) to the default clipboard, select the sequence(s) and hit Ctrl-C. Sequence names can then be pasted into other applications using Ctrl-V.
• The default clipboard can be pasted into Blixem using Ctrl-V. If the clipboard contains valid sequence names, those sequences will be selected and the display will jump to the start of the selection.
• Note that text from the feedback box and some text labels (e.g. the reference sequence start/end coords) can be copied to the selection buffer by selecting the required text with the mouse, or copied to the default clipboard by selecting it and then hitting Ctrl-C.
• Text can be pasted from the default clipboard into text entry boxes on dialogs, such as the Groups or Find dialog, by using Ctrl-V.

Sorting alignments

Click the Sort button on the toolbar to open the Sort dialog. Select the column you wish to sort by from the top drop-down box on the dialog. You may optionally sort by further columns: you can sort by as many columns as you wish by adding further drop-down boxes using the Add button.

Figure 19: Sort dialog

The default sort order may be ascending or descending, depending on what makes most sense for the selected column, e.g. sorting by position is ascending by default but sorting by score
man]
    fill_color=#ff0000
    line_color=#bb0000

The group name in square brackets denotes a source, and the colours will apply to any features from the GFF file with the same source name. As many groups as required can be defined. Any features whose source does not have a group in the key file will use default colours.

The key-value pairs give the identifier of the colour and the colour string in hexadecimal format (#RRGGBB). Valid colour identifiers recognised by Blixem are:

    fill_color
    line_color
    fill_color_selected
    line_color_selected
    fill_color_utr
    line_color_utr
    fill_color_utr_selected
    line_color_utr_selected

Only fill_color and line_color are mandatory; the selection colours will be calculated automatically if not specified explicitly (a darker shade of the default colour will be used when the feature is selected). For transcripts, the fill_color, line_color etc. items are used for CDS regions, and different colours can be specified for UTR regions using fill_color_utr, line_color_utr etc.

The Blixem Window

The Blixem window consists of two main sections: an overview section, called the big picture, and a detail section showing the actual sequence data. These sections are separated by a splitter bar, so you can maximise the space for the area you are interested in. You can also hide sections of the window using the View menu. Blixem can show sequences in nucleotide or protein mode.
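Returning briefly to the colour key file described earlier in this section: the Python sketch below reads such a file with the standard configparser module, checks that the two mandatory identifiers are present, and fills in missing selection colours with a darker shade. This is an illustration only. The function names, the darkening factor of 0.7 and the exact "identifier=#RRGGBB" syntax are assumptions; Blixem's own parser and shading calculation may differ.

```python
import configparser

# The two identifiers the manual lists as mandatory.
REQUIRED_KEYS = ("fill_color", "line_color")


def darken(hex_colour, factor=0.7):
    """Return a darker shade of a #RRGGBB colour (factor is an assumption)."""
    value = hex_colour.lstrip("#")
    channels = [int(value[i:i + 2], 16) for i in (0, 2, 4)]
    return "#" + "".join(f"{int(c * factor):02x}" for c in channels)


def load_styles(text):
    """Map each source name to its colour identifiers.

    Missing *_selected colours are derived from the corresponding default
    colour; explicitly specified ones are left untouched.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)

    styles = {}
    for source in parser.sections():
        colours = dict(parser[source])
        missing = [key for key in REQUIRED_KEYS if key not in colours]
        if missing:
            raise ValueError(f"[{source}] is missing {missing}")
        for key in REQUIRED_KEYS:
            colours.setdefault(key + "_selected", darken(colours[key]))
        styles[source] = colours
    return styles


if __name__ == "__main__":
    example = "[EST_Human]\nfill_color=#ff0000\nline_color=#bb0000\n"
    print(load_styles(example))
```

A script that generates key files for a pipeline could use a check like this to catch a source stanza that omits fill_color or line_color before Blixem is launched.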
25. menu in Blixem or hit the g shortcut key.

To clear the match set group, choose the Toggle match set option again, or hit the g shortcut key again.

While it is enabled (i.e. toggled on), the match set group can be edited like any other group, via the Edit Groups dialog. Any settings you change (e.g. highlight colour) will be saved, even if the match set group is toggled off and then on again.

If you delete the match set group using the Edit Groups dialog, all of its settings will be lost; you will get the default settings again the next time you enable the match set group. To avoid this, disable it by toggling it off (using the Toggle match set menu option or g shortcut key) rather than by deleting it in the Groups dialog.

Editing groups

To edit a group, right-click and select Edit Groups, or use the Ctrl-G shortcut key.

Figure 22: Groups dialog: edit groups

You can change the following properties for a group. Click on Apply or OK to apply the changes.

Name: You can specify a more meaningful name to help you identify the group.
Hide: Tick this box to hide the alignments in the alignment lists.
Highlight: Tick this box to highlight the alignments.
Colour: The colour the group will be highlighted in, if Highlight is enabled. The default
26. Figure 21: Groups dialog: create group

Creating a group from a search

Right-click and select Create Group, or use the Shift-Ctrl-G shortcut key (or Ctrl-G if no groups currently exist).
Select the Text search or List search radio button and enter some text to search for.
Select the column that you wish to search in the drop-down box at the bottom.
Click OK or Apply.

Notes

List search allows you to enter multiple search strings; place each string on a separate line.

You can use the following wild cards in the search text: an asterisk (*) represents any number of characters; a question mark (?) represents any single character.

You can paste text into the search boxes from the selection buffer by middle-clicking, or from the clipboard using Ctrl-V.

You may paste sequence names directly from another compatible program, e.g. ZMap: click on the feature in ZMap and then middle-click in the text box on the Groups dialog. Grouping in Blixem works on the sequence name alone, so the feature coords output by ZMap will be ignored.

Creating a temporary match set group from the current selection

You can quickly create a group from a current selection (e.g. selected features in ZMap, or just the current selection in Blixem) using the Toggle match set option.

To create a match set group, select the required items and then select Toggle match set from the right-click
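The wild card rules for Text search and List search can be approximated in Python with the standard fnmatch module, as sketched below. This is an illustration rather than Blixem's own matching code: case-sensitive matching is an assumption, the sequence names are sample values, and fnmatch additionally treats square brackets as character classes, which the manual does not mention.

```python
from fnmatch import fnmatchcase


def list_search(names, search_text):
    """Return the names matching any pattern in search_text, one pattern per line.

    '*' matches any number of characters and '?' matches a single character.
    """
    patterns = [line.strip() for line in search_text.splitlines() if line.strip()]
    return [name for name in names if any(fnmatchcase(name, p) for p in patterns)]


NAMES = ["BX503083.1", "BU739888.1", "Q9UII5.1", "Q8TB69.1"]

if __name__ == "__main__":
    # A list search with two patterns, each on its own line.
    print(list_search(NAMES, "B*\nQ8TB6?.1"))
```

A Text search corresponds to calling the same function with a single pattern.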
27. Figure 1: Nucleotide mode. There are two panes in the detail view, one for each strand. The active strand is shown at the top. The active strand can be changed by hitting the Toggle button or the t shortcut key.

Figure 2: Protein mode. There are three panes in the detail view, one for each reading frame of the active strand. The other strand can be activated by hitting the Toggle button or the t shortcut key.

Active Strand

The active reference sequence strand in Blixem controls the orientation of the display: coordinates are shown increasing from left to right for the forward strand, and decreasing for the reverse strand.

The active strand is always shown at the top, i.e. the top grid and top transcript view in the big picture, and the top pane in the detail view. In protein mode, only the active strand is shown in the detail view. One must toggle the strand to view the other strand.
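The effect of the active strand on the display can be summarised in a short Python sketch. The function names and pane labels are hypothetical and only restate the rules above: coordinates increase from left to right when the forward strand is active and decrease when the reverse strand is active, and the detail view has two panes in nucleotide mode and three in protein mode.

```python
def display_coords(start, end, forward_active=True):
    """Return the coordinates of a range in left-to-right display order."""
    coords = list(range(start, end + 1))
    # With the reverse strand active, coordinates decrease from left to right.
    return coords if forward_active else coords[::-1]


def detail_panes(protein_mode):
    """Return the detail-view panes, top to bottom (labels are illustrative)."""
    if protein_mode:
        # One pane per reading frame of the active strand only.
        return ["active strand, frame 1", "active strand, frame 2", "active strand, frame 3"]
    return ["active strand", "other strand"]


if __name__ == "__main__":
    print(display_coords(120000, 120003))
    print(display_coords(120000, 120003, forward_active=False))
    print(detail_panes(protein_mode=True))
```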
28. ns and will only become available once optional data has been loaded.

Grid properties

ID per cell: Use this to change the vertical scale of the grid. A smaller value means the grid will be more spaced out; a larger value means the grid will be more compact.
Max ID: Defines the maximum cut-off value for the ID scale.
Min ID: Defines the minimum cut-off value for the ID scale.

Coverage view properties

Depth per cell: Use this to change the vertical scale of the grid for the Coverage View (see the View menu to turn on the Coverage View). A smaller value means the grid will be more spaced out; a larger value means the grid will be more compact.

Appearance

Use print colours: Select this option to make Blixem use grey-scale colours suitable for printing.

Display colours: Change any of Blixem's custom display colours, such as the colour aligned bases are shown in, or the colour stop codons are highlighted in, etc. There are four colours for each item:

Normal: this is the standard display colour.
Normal (selected): this is the colour used when the item is selected, if applicable. Typically one would use a slightly darker or lighter shade of the Normal colour for this, so that the item does not look radically different when it is selected.
Print: this is the standard colour used when the Use print colours option is enabled.
Print (selected): this is the colour used when Use print colours is enabled and the i
29. Quit (Ctrl-Q): Close Blixem and any spawned processes.
Help (Ctrl-H): Display the user help.
Print (Ctrl-P): Printing options.
Settings (Ctrl-S): Edit settings.
View (v): Show/hide parts of the display.
Create Group (Shift-Ctrl-G): Create a group of sequences.
Edit Groups (Ctrl-G): Edit properties for groups.
Toggle match set (g): Toggle the special match set group on and off. This group is a quick way of creating a group from the current selection buffer, which should contain match sequence names.
Deselect all (Shift-Ctrl-A): Deselect all sequences.
Dotter (Ctrl-D): Run Dotter on the currently selected sequence.
Close all Dotters (Ctrl-W): Close all Dotters that have been opened from this Blixem.

Hiding sections of the window

Use the View dialog to show/hide sections of the window.

1. Right-click and select the View option, or hit the v shortcut key.
2. Toggle check marks on or off to show/hide sections.

Figure 16: The View dialog

Active strand grid
30. or ID is descending. To get the inverse of the default sort order, select the Invert sort order option on the Sort dialog.

Alignments can also be sorted by group. Alignments that are part of a group will then be listed first (before any that are not in a group) and ordered according to the group's order number. See the Groups section for more details.

Fetching sequences

(Currently only available to authorised users at the Sanger Institute.)

Double-click a row to fetch a match sequence's EMBL file.

Grouping sequences

Alignments can be grouped together so that they can be sorted, highlighted, hidden etc.

Creating a group from a selection

Select the sequences you wish to include in the group by left-clicking their rows in the detail view. Multiple rows can be selected by holding the Ctrl or Shift keys while clicking.
Right-click and select Create Group, or use the Shift-Ctrl-G shortcut key. (Note that Ctrl-G will also shortcut to here if no groups currently exist.)
Ensure that the From selection radio button is selected, and click OK or Apply.

If you click Apply, you will be shown the group you just created so that you can edit it. If you click OK, the group will be created with the default properties.
31. page. Scrolls to the start of the first alignment from that sequence, if any are found (Find button).

Toggle strand: Toggle which strand is the active strand.

Feedback box

The feedback box contains information about the currently selected sequence and/or coordinate, if either is selected. Click on a row in the detail view to select a sequence. Middle-click on a base in the detail view to select that coordinate. Text in the feedback box can be selected and copied.

Figure 13: Feedback box (labelled fields: reference sequence coordinate, match sequence name, match sequence coordinate, match sequence length)

Moused-over item feedback area

The area to the right of the toolbar contains information about the currently moused-over item, e.g. a match sequence in the alignment list or a variation in the variations track. For a match sequence, this information includes the sequence name and optional data, such as organism and tissue type, that can be parsed from EMBL files (currently only available to authorised users). To load optional data, see the Settings dialog. Note that the optional data may be incomplete due to the inconsistent information available from the EMBL files.

Figure 14: Moused-over item feedback area

The main menu

Right-click anywhere in the Blixem window to pop up the main menu. The options are:

Figure 15: Main menu
32. quence and the first base in the sub-sequence will be selected.

Enter your search text in the appropriate box and click the OK button to perform the search.

By default, Blixem will start searching from the beginning of the reference sequence range. To start the search from the current position instead, click the Forward or Back button instead of OK. This will start searching from the currently selected base, if there is one selected; if not, it will start from the beginning of the current detail view display range when searching forwards, or from the end of the display range if searching backwards.

Repeat a Find

After clicking OK on the Find dialog, press F3 to repeat the search in a forwards direction, or Shift-F3 to repeat in a backwards direction. Alternatively, if you had selected the Forward or Back button in the Find dialog, then click the Forward or Back buttons again to jump to the next result in that direction.

Copy and paste

When sequence(s) are selected, their names are copied to the selection buffer and can be pasted to another program by middle-clicking in that program.

Sequence names can be pasted from the selection buffer into Blixem by hitting the f keyboard shortcut. If the selection buffer contains valid sequence names, those sequences will be selected and the display will jump to the start of the selection.

Sequence names can also be pasted from the selection buffer into text boxes in dialog boxes such as the Groups dia
33. r after opening the dialog. The selected sequence name will be shown at the top of the dialog. Alternatively, if you just wish to edit the settings, you do not need to select a sequence.

To run Dotter with the default (automatic) parameters, just hit RETURN or click the Execute button.
To enter custom parameters, select the Manual radio button and enter the values in the Start and End boxes.
To save the parameters without running Dotter, click Save and then Cancel.
To save the parameters and run Dotter, click Execute.
To revert to the last saved manual parameters, click the Last saved button.
To revert back to automatic parameters, click the Auto radio button. The coordinates in the Start and End boxes will be recalculated for the currently selected sequence.

Reference sequence versus itself

To run Dotter on the reference sequence versus itself, select the Call on self tick box in the Dotter dialog and then click Execute. This can be useful to analyse internal repeats etc.; see the Dotter manual for more information.

Dotter HSPs only

This starts Dotter in HSP (High-Scoring Pair) mode (see the Dotter manual).

Settings

The settings menu can be accessed by right-clicking and selecting Settings, or by the shortcut Ctrl-S.

Features

Highlight variations: When this option is enabled, bases in the reference sequence that have known variations (such as SNPs, insertions, deletions etc.) loaded from th
34. tanzas are used to specify display settings via the config file, that is, settings that the user can change via the Settings dialog in Blixem. These are saved to the default config file (blixemrc) when Blixem exits, so settings are persistent between Blixem sessions.

[user-settings]

This stanza is used to specify display options that Blixem will use on start-up. These are currently all true/false values, which should be given as 1 for true or 0 for false, except for num-unaligned-bases, which takes an integer value.

[user-settings]
highlight-diffs=0
highlight-variations=1
show-variations-track=1
show-unaligned=0
show-unaligned-selected-seq=1
limit-unaligned=0
show-polya-site=0
show-polya-site-selected-seq=1
show-polya-sig=0
show-polya-sig-selected-seq=1
show-splice-sites=0
num-unaligned-bases=5
squash-matches=0

[column-widths]

This stanza is used to specify column widths that Blixem will use on start-up. It can also be used to hide a column, by specifying a width of zero. Column names should be exactly as they appear in the column headers in Blixem, and are case-sensitive. Widths are specified in pixels.

[column-widths]
Name=120
Source=85
Organism=25
Gene Name=0
Tissue Type=0
Strain=0
Group=0
Score=40
%Id=45
Start=50
End=80

Colour key file

An ini-style key file can be supplied via the styles-file argument in order to tell Blixem what colour to draw certain features in, e.g. TEST Hu
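To make the two config stanzas concrete, the following Python sketch reads a small config text with the standard configparser module. It is an illustration under stated assumptions, not a description of how Blixem itself reads the file: the hyphenated key spellings, the square-bracket stanza names and the key=value layout are reconstructions, and the function name load_settings is hypothetical.

```python
import configparser

SAMPLE_CONFIG = """
[user-settings]
highlight-diffs = 0
highlight-variations = 1
num-unaligned-bases = 5

[column-widths]
Name = 120
Source = 85
Gene Name = 0
"""


def load_settings(text):
    """Return (settings, column widths, hidden columns) from a config text."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep column names case-sensitive
    parser.read_string(text)

    settings = {}
    for key, value in parser["user-settings"].items():
        if key == "num-unaligned-bases":
            settings[key] = int(value)    # the one integer-valued setting
        else:
            settings[key] = value == "1"  # 1 means true, 0 means false

    widths = {name: int(width) for name, width in parser["column-widths"].items()}
    hidden = [name for name, width in widths.items() if width == 0]
    return settings, widths, hidden


if __name__ == "__main__":
    print(load_settings(SAMPLE_CONFIG))
```

A width of zero marks the column as hidden, which is why the Gene Name column ends up in the hidden list for this sample.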
35. tem is selected.

Key

In the detail view, the following colours and symbols have the following meanings:

Where                                  Colour/symbol              Meaning
Alignment list header                  Yellow background          Reference sequence
Alignment list                         Cyan background            Identical residues
Alignment list                         Violet background          Conserved residues
Alignment list                         Grey background            Mismatch
Alignment list                         with grey background       Deletion
Alignment list                         Yellow vertical line       Insertion
Alignment list                         Thin blue vertical line    Boundary of an exon
Nucleotide header (protein mode)       Sky blue background        The three nucleotides for the currently selected codon (darker blue indicates the nucleotide whose coordinate is displayed in the feedback box)
Alignment list header (protein mode)   Pale red background        STOP codon
Alignment list header (protein mode)   Green background           MET codon

Keyboard shortcuts

Ctrl-Q          Quit
Ctrl-H          Help
Ctrl-P          Print
Ctrl-S          Edit settings
V               Show/hide sections of the display
Shift-Ctrl-G    Create group
Ctrl-G          Edit groups (or create a group if none currently exist)
Ctrl-A          Select all sequences in the current list
Shift-Ctrl-A    Deselect all sequences
Ctrl-D          Dotter
Left arrow      Move coordinate selection one index to the left
Right arrow     Move coordinate selection one index to the right
Shift-Left      Same as Left, but in protein mode it scrolls by a single nucleotide
Shift-Right     Same as Right, but in protein mode it scrolls by a single nucleotide
Ctrl-Left       Scroll to
36. the start/end of the previous alignment
Ctrl-Right      Scroll to the start/end of the next alignment
Up arrow        Move row selection up
Down arrow      Move row selection down
Home            Scroll to the start of the display
End             Scroll to the end of the display
Ctrl-Home       Scroll to the start of the first alignment
Ctrl-End        Scroll to the end of the last alignment
                Zoom in detail view
                Zoom out detail view
Ctrl            Zoom in big picture
Ctrl            Zoom out big picture
Shift-Ctrl      Zoom out big picture to view the whole reference sequence
                Scroll left one coordinate
                Scroll right one coordinate
P               Go to position
T               Toggle the active strand
G               Toggle the match set group
1               Toggles visibility of the 1st alignment list
2               Toggles visibility of the 2nd alignment list
3               Toggles visibility of the 3rd alignment list (protein mode only)
Ctrl-1          Toggles visibility of the 1st big picture grid
Ctrl-2          Toggles visibility of the 2nd big picture grid
Shift-Ctrl-1    Toggles visibility of the 1st exon view
Shift-Ctrl-2    Toggles visibility of the 2nd exon view

Notes

Some shortcuts are only applicable if a coordinate is currently selected (middle-click a coordinate to select it).
Some shortcuts are limited to just the selected sequences, if any are selected; otherwise they act on all sequences.
37. ting and by drawing a dotted blue line rather than a solid line. Note that dotted lines may be obscured by solid lines at the same position. The true boundary for split codons would really be either a third or two thirds of the way through the character width, but Blixem does not draw boundaries through the middle of characters, to avoid too cluttered a display.

Figure 10: Exons in the detail view. Split codons are indicated with cross hatching, e.g. the last codon in the selected exon is a split codon because it does not include all three bases for that codon, as you can see from the highlighting in the DNA header.

Coverage view

The coverage view shows a plot of how many alignments there are at each coordinate along the reference sequence. It can give an indication of where the regions of interest are.

Figure 11: Coverage view

The coverage view can be shown/hidden by ticking/unticking the Show coverage view check box on the View dialog, which can be accessed from the right-click menu or by hitting the v shortcut key. The scale of the coverage view is the same as that of the big picture, and it can be navigated in the same manner, i.e. use the horizontal scroll bar or middle-click to scroll, and use the zoom buttons at the top or
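The quantity plotted in the coverage view, the number of alignments at each reference coordinate, can be computed as in the Python sketch below. It is a minimal illustration that assumes each alignment is given as an inclusive (start, end) pair of reference coordinates; the function name and the data layout are hypothetical and say nothing about how Blixem stores alignments.

```python
def coverage(alignments, start, end):
    """Count how many alignments cover each coordinate in start..end (inclusive)."""
    depth = [0] * (end - start + 1)
    for align_start, align_end in alignments:
        # Clip each alignment to the displayed range before counting.
        first = max(align_start, start)
        last = min(align_end, end)
        for coord in range(first, last + 1):
            depth[coord - start] += 1
    return depth


if __name__ == "__main__":
    # Two overlapping alignments over the reference range 1..6.
    print(coverage([(2, 4), (3, 6)], 1, 6))
```

Peaks in the returned list correspond to the regions of interest that the coverage view is meant to highlight.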
38. ve the selection up/down a row using the up/down arrow keys.

Selecting coordinates

You can select a nucleotide/peptide by middle-clicking on it in the detail view. This selects the entire column at that index, and the coordinate number on the reference sequence is shown in the feedback box. The coordinate on the match sequence is also shown if a match sequence is selected.

By default, the display will centre on the selected base when you middle-click. To select a base without scrolling, hold down Ctrl when you middle-click.

For protein matches, when a peptide is selected, the three nucleotides for that peptide (for the active reading frame) are highlighted in the header in blue. The active reading frame is whichever alignment list currently has the focus; click in a different list to change the reading frame. Darker blue highlighting indicates the specific nucleotide that is currently selected (i.e. whose coordinate is displayed in the feedback box).

Figure 17: The 3 nucleotides for the currently selected amino acid in reading frame 3. Selected nucleotide 103596 is shaded in darker blue.

You can move the selection to the previous/next index using the left and right arrow keys. In protein mode, you can move the selected nucleotide by
39. view or the big picture.

Scroll to the start/end of the previous/next match (limited to currently selected sequences if any are selected; includes all sequences otherwise).
Scroll to the start/end of the display.
Scroll to the start/end of the currently selected alignments, or to the first/last alignment if none are selected.
Scroll the detail view range one nucleotide to the left/right.
Scroll the detail view range one page to the left/right.
Scroll to a specific coordinate position.

Zooming

Zoom in/out of the detail view: shortcut keys and toolbar buttons.
Zoom in/out of the big picture: Ctrl shortcut keys and the Zoom in and Zoom out buttons.
Zoom the big picture out to view the full length of the reference sequence: Shift-Ctrl shortcut key and the Whole button.

Selections

Selecting sequences

You can select a sequence by clicking on its row in the alignment list. Selected sequences are highlighted in cyan in the big picture.
You can select a sequence by clicking on it in the big picture.
The name of the sequence you selected is displayed in the feedback box on the toolbar. If there are multiple alignments for the same sequence, all of them will be selected.
You can select multiple sequences by holding down the Ctrl or Shift keys while selecting rows.
You can deselect a single sequence by Ctrl-clicking on its row.
You can deselect all sequences by right-clicking and selecting Deselect all, or with the Shift-Ctrl-A keyboard shortcut.
You can mo
