Bachelor Project (IN3700) 2007 Melissa Cheung, 1228161 Software
Contents
1. [Figure residue: rotated sequence diagrams, text not recoverable] 5.4 Interface Design. One challenge of this project is to visualize a large amount of biological knowledge and data on one screen for the analysis of DNA. This section gives an impression of the layout options discussed during this project for visualizing the combination of Spectrogram Analysis and sequence alignment results. At first, one of the user's wishes was to have the output of a Spectrogram Analysis next to the output of a sequence alignment. The layout for the main output in MIDAS is illustrated in Figure 24. [...] sequence alignment results. The annotations overview contains all the annotations corresponding to the current spectrogram image. The scrollbar can be used to scroll t
2. [Table 3: MoSCoW analysis; table content not recoverable] MultiModal Interface for DNA Alignment of Sequences 83. 10 Project Reflection. After completing MIDAS we can reflect on our project and summarize our experiences. This was actually our first chance to put our theoretical knowledge into practice. Working at a multinational like Philips is certainly not the same as studying at the university. At the university you normally know exactly what to do, when the deadline is, what the assignment is and which persons have the answers to your questions during your assignment. In a company these are the things you have to find out by yourself. We experienced that at a company all the employees have their tasks and their responsibilities and, even more important, their own schedules. Each question, meeting or appointment can be rescheduled since everybody is very busy. We noticed that during our work in a company we also had to deal with costs, for example production costs. To complete MIDAS we used a set of software tools, and some of them were expensive. Then you start looking at other possibilities which are not expensive. We learned how to make decisions about software purchasing depending on the costs, possibilities, necessity and learning curve compared to the time we had. At the Research department o
3. [Figure 6: Task diagram chromosome analysis] Each block in the diagram is one task. By examining all the tasks it becomes clear that many of them can be automated. The user has to hardcode input paths, output folders, standard settings, etc. Then the user runs the script files manually on the MATLAB command line. This approach is error-prone, and the MATLAB command line does not tolerate a single typing error. The user has to stay focused during the whole process: when a function call has been executed, he has to execute the next one with the right parameters. This approach cannot be called user friendly. 4.2.3 Improved situation. The development of MIDAS will contribute to the efficiency of DNA analysis. MIDAS provides an interface where most of the tasks in the current situation are automated or simplified. The user does not have to give complete function calls to the command line or set all the parameters
4. [Figure 9: Workflow Diagram] 5 Design. This chapter describes the design phase of this project. The architectural design is presented in 5.1, which outlines the architectural components of MIDAS. Next, the technical design and specification is presented in 5.2; this section discusses the techniques used in this project. In 5.3 the Java design is presented with class diagrams and sequence diagrams. In 5.4 the interface design issues and storyboards can be found. 5.1 Architectural design. The architectural design describes the possibilities of the system. This section presents the design goals, the outline of the system and subsystems, and the structure of the system. 5.1.1 General priorities (design goals). The following goals are the guidelines for the design process: efficient, low maintenance, user friendly, reliable, flexible, stand-alone. 5.1.2 Outline of the design. The aim of this project is to develop a tool which integrates Spectrogram Analysis and sequence alignment and provides visualization of patterns in the spectrograms. The Spectrogram Analysis is preliminary work executed by Evan Santo. This program consists of MATLAB scripts. The sequence alignment tool BLAT is originally a web-based application, as it aligns a sequence with the genome and the genome database is accessible via the Internet. ClustalW is a tool which al
5. [Figure 12: System Overview] In option 2 the application runs as a stand-alone executable Java program. Thus all the source code has to be integrated in Java. One approach is to rewrite all the MATLAB scripts as Java classes. This approach is not suitable: rewriting all the files is time consuming, and rewriting the scripts in Java would lose the computational power of the high-level, matrix-optimized language. This leads to the other approach, which is to build the MATLAB scripts into Java classes with the MATLAB Builder for Java. The converted MATLAB scripts run on the MATLAB Component Runtime, a stand-alone set of shared libraries required for executing MATLAB-based components on computers without MATLAB. These converted scripts can be called from Java, and the GUI visualizes the output. The main application creates a connection to a Linux server to get the output of BLAT and ClustalW and displays this in the GUI. Option 2 is the solution for this issue. Both options were investigated on several criteria to meet the design goals. The most important ones are efficiency and performance. The first option is to integrate Java in MATLAB, so that the main application is a MATLAB executable. Although it is possible t
6. Object models. An activity diagram represents the overall flow of control of the system. This diagram is presented in Figure 4. [Figure 4: Activity Diagram] 4 Analysis. After setting up the requirements for MIDAS, research was done on the preliminary work which is used in MIDAS. This research includes an investigation of the Spectrogram Analysis. A user analysis for MIDAS is also done in the analysis phase. This chapter describes the preliminary work in Section 4.1. The user analysis in Section 4.2 describes the user profile and presents task diagrams of the current and the improved situation and a workflow diagram. 4.1 Preliminary work: Spectrogram Analysis. The preliminary work consists o
7. This function is able to work with the arguments which are needed for MIDAS. Arguments: imagePath: path of the Image.txt; annotPath: path of the Annotation.txt; outputPath: path of the output folder where the video will be placed; movFileName: the name of the movie which is created. 6.2.3 Running BLAT or ClustalW. The output of the alignment is visualized on a tabbed panel in MIDAS. This tab contains the output of the alignment in text format and the annotation next to the partial image of the cluster which was selected by the user as input for the alignment. This partial image is created with the new MATLAB function CreatePartialImage.m. CreatePartialImage.m (new file). Description. Summary: This function is called from within Java to create a partial image. This image is displayed together with the output of ClustalW. The function creates a partial image indexed by startPos and endPos. These positions are given as parameters from the Java interface. The user selects clusters in this interface. The first index of this cluster is startPos and the last index is endPos. This program has the restriction that at most 2 images can be used to create a single image. This restriction is checked in Java (MIDAS) before making calls to this function. The reason for the maximum of 2 images is that with more than 2 the image becomes too big to fit the output screen in MIDAS. If you accidentally give more images as arguments, so endIm
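The check that at most 2 images may contribute to one partial image can be sketched as follows. This is a minimal Python illustration, not the MIDAS Java code: the function name, the `lines_per_image` parameter and the assumption that positions are 1-based line indices spread evenly over the images are all hypothetical.

```python
MAX_IMAGES = 2  # restriction stated in the text: at most 2 images per partial image


def images_for_selection(start_pos, end_pos, lines_per_image):
    """Return (first_image, last_image) for a selected cluster, 1-based.

    Assumes (hypothetically) that every spectrogram image holds
    `lines_per_image` consecutive sequence lines.
    """
    if start_pos < 1 or end_pos < start_pos:
        raise ValueError("startPos must be >= 1 and endPos >= startPos")
    first_image = (start_pos - 1) // lines_per_image + 1
    last_image = (end_pos - 1) // lines_per_image + 1
    if last_image - first_image + 1 > MAX_IMAGES:
        # Reject before calling the image-building function at all.
        raise ValueError("selection spans more than 2 images")
    return first_image, last_image
```

Validating on the caller's side, as the text describes, keeps the image-building function simple: it can rely on receiving one or two images only.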
8. [Figure 3: Output from a Discontiguous Spectrogram Analysis; screenshots of the MIDAS main window, text not recoverable]
9. [Screenshot residue: BLAT output tab with alignment coordinates and the pairwise alignment for HUM chr12 53016917; text not recoverable]
10. [Figure 14: Deployment Diagram] 5.3 Java Design. To design the Java structure of MIDAS, the Unified Modeling Language is used. Several models have been created, as seen in the Technical design and specification. In this section the class diagram and the class descriptions are presented. Because MIDAS is a broad system with many classes containing many methods, the attributes and methods are left out. 5.3.1 Class Diagram. In Figure 15 the class diagram is presented. There are four packages: MIDAS, spectrogramAnalysis, sequenceAlignment and tools. MIDAS: This package contains the main application, and thus the main function to run MIDAS. The outputs of the Spectrogram Analysis and sequence alignment are combined in this package. One of the most important objects is the general settings object (GeneralSettings class). This object contains all the important information which needs to be known and can be used in the whole system. The classes for visu
11. has a deep understanding of the psychological characteristics, knowledge and experience, function and task characteristics, and physical characteristics of the user, he can fit the product to their needs. The user profile is described below. Psychological characteristics: Attitude: interested in information about genes and DNA; Motivation: he will be happy using MIDAS because it gives him the possibility to speed up the work during analysis of genes. Knowledge and experience: Reading experience: high; Type experience: high; Education: high; System experience: average; Application experience: average; Use of other systems: average; Computer knowledge: high; Programming skills: average; Language: English, Hindi. Function and task characteristics: Frequency of use: daily; Training: is important because it will help the user in the first stage of using MIDAS; Use of system: bioinformatics; Priority: MIDAS is important for the user; Task structure: tasks will decrease using MIDAS. Physical characteristics: Color blindness: possible; Sex: man, woman; Age: 25. 4.2.2 Current situation. Analyzing a single chromosome or a discontiguous chromosome is currently a time-consuming job. Many tasks have to be done manually by each employee of Philips. Task diagrams give a clear view of the tasks which an employee of Philips performs to analyze a chromosome in the current situation. Figure 5 displays the task diagram for analyzing a dis
12. 13 Component Diagram. The deployment diagram (Figure 14) illustrates the components in their execution environment. This diagram shows the hardware for the system, the software that is installed on that hardware, and the middleware used to connect the disparate machines to one another. The deployable MATLAB Component Runtime, the converted Spectrogram Analysis MATLAB package and the main application MIDAS are located on the client machine. The main application MIDAS connects to the Philips server. The sequence alignment tools BLAT and ClustalW are located on this server. The Perl program that fetches the sequences from the Spectrogram Analysis in order to run sequence alignment is executed on the server, which is a Linux environment. It is also possible to load a Spectrogram Analysis into MIDAS; this would be the case when the Spectrogram Analysis is computationally too heavy for the client machine. The Spectrogram Analysis can then be executed in MATLAB on a high-performance platform and loaded into MIDAS.
13. 37 Java comments on code. The clusters hashmap contains information built from the lines and linesWithClusters hashmaps. To compute the coordinates of the first hierarchy level, the method searches for two sequences with the same cluster name. This information can be found in the lines hashmap. With this, the coordinates can be calculated (sequence to sequence on the first level). The cluster name and coordinates are put in the clusters hashmap. Computing the coordinates of higher hierarchy levels is more complex. A sequence which belongs to several clusters has several cluster names on its line. This information is in the linesWithCluster hashmap. The line is read from right to left. In most cases the coordinates of the clusters on the right have already been computed. This is because of the bottom-up approach: the clusters with a sequence-sequence relation and a sequence-cluster relation occur earlier, because the format file is sorted. In case of a cluster with a cluster-cluster relation, it is expected that the two following cluster names in the line are the nodes belonging to the cluster name, with the duplicates filtered. This is only NOT the case for the first cluster name from the left, because this is the cluster of the sequence paired up with the second cluster name on the line. Duplicates are inserted because in some cases it is necessary to repeat which node is the parent of two nodes, due to the way the line is read. As soon as the cluster coordinates ar
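The bottom-up idea described above, where a cluster's coordinate can only be computed once both of its children are known, can be sketched in a few lines. This is an illustrative Python sketch under stated assumptions, not the MIDAS Java method: the `(cluster, left, right)` merge list, the names, and the choice of the midpoint as the parent coordinate are hypothetical.

```python
def cluster_coordinates(leaf_y, merges):
    """Compute one y-coordinate per cluster from its two children.

    leaf_y -- dict mapping each sequence name to its y-coordinate
    merges -- list of (cluster_name, left_child, right_child), sorted
              bottom-up so that children always appear before parents
    """
    coords = dict(leaf_y)  # plays the role of the clusters hashmap
    for name, left, right in merges:
        # Both children are already present because the input is sorted;
        # an unsorted input would raise KeyError here.
        coords[name] = (coords[left] + coords[right]) / 2
    return coords


example = cluster_coordinates(
    {"seq1": 0, "seq2": 10, "seq3": 20},
    [("c1", "seq1", "seq2"), ("c2", "c1", "seq3")],
)
```

Here `c1` is a first-level cluster (sequence to sequence) and `c2` is a higher-level cluster that pairs a cluster with a sequence, which is why it must come after `c1` in the sorted input.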
14. [Figure 10: ClustalW settings panel] The figure displays the Write to File button, which makes it possible to write the ClustalW output to a text file. After pushing this button a save dialog will pop up (Figure 11). [Figure 11: Write to File dialog] 5 Server Connection. In order to run BLAT or ClustalW you must be connected to the Philips Linux server. The genome database and both BLAT and ClustalW are located on this server. 5.1 Connect. To connect to the Linux server, complete these steps: 1. Select Connection from the menu bar and choose Connect to Server. The login interface will pop up (Figure 12). 2. Change the hostname if necessary. 3. Fill in your login and your password and press OK. [Figure 12: Connection panel] 4. If you have typed everything correctly, you should now be connected. 5.2 Disconnect. To disconnect from the Linux server only one action has to be performed: Select Connection from the menu b
15. [Figure 8: Improved task diagram analyzing chromosome] Comparing the task diagrams of the current situation with those of the improved situation, it becomes clear that many tasks are automated or no longer necessary. This increases the efficiency of analyzing DNA. 4.2.4 Workflow. A workflow diagram (Figure 9) was designed during the analysis phase to visualize the operational part of the work procedure. Since the task diagrams of the improved situation are simpler and shorter than those of the current situation, the operations in this workflow diagram are separated into manual tasks and automatic tasks to visualize this difference.
16. All the coordinates are now in the clusters hashmap. If the tree for lines 1-7 is drawn in Java, first all the cluster names of those lines are collected. Next the tree is drawn hierarchically. Figure 36 shows in step B that the first hierarchy level is drawn first; these are lines 1, 2, 4 and 5 in the format file. In step C the second hierarchy level is drawn (line 3). In step D the third hierarchy level is drawn. In steps E and F the fourth and fifth hierarchy levels, respectively, are drawn. 6.2 MATLAB functions. The preliminary work written by Evan Santo and Nevenka Dimitrova was able to create a DNA spectrogram with the annotation of sequences. This work consisted of several MATLAB script files. In project MIDAS some of these files are reused and some new MATLAB functions are programmed. Section 6.2.1 describes the MATLAB functionality for running a discontiguous or chromosome Spectrogram Analysis. Section 6.2.2 explains the file for creating a spectrogram video. Another file, for creating partial spectrogram images, is described in Section 6.2.3. Section 6.2.4 describes the MATLAB functions used for building a hierarchical cluster tree. For an understanding of the complete preliminary work we refer to the readme file of Spectrogram Analysis by Evan Santo. 6.2.1 Running a Spectrogram Analysis. To run a Spectrogram Analysis some changes were made to the preliminary work and some new files were created. Changes: ClusterChromosomeSpectro.m. Description: Prelimin
17. MIDAS starts. 3 Spectrogram Analysis. MIDAS can be used to run a Discontiguous Spectrogram Analysis or a Chromosome Spectrogram Analysis. These are two different actions. The first paragraph explains how to run a Discontiguous Spectrogram Analysis. The second paragraph explains the same for a Chromosome Spectrogram Analysis. The output is explained in the third paragraph, and the fourth paragraph explains how to create and run a spectrogram video on this output. The last paragraph explains how to load an existing project. 3.1 Run Discontiguous Spectrogram Analysis. These steps guide you through running a Discontiguous Spectrogram Analysis: 1. In the menu bar, select Spectrogram Analysis. 2. Select Run New Spectrogram Analysis; a new spectrogram settings window (figure 51) will appear. [Screenshot residue: Spectrogram Analysis Settings window] Figure 1 Discontiguous Sp
18. This in turn enables a bioinformatician to analyze patterns of a group of sequences. Visualisations of other clustering or sorting techniques are also possible using this technique. MIDAS is a standalone application which provides an interface around standard sequence alignment tools such as BLAT and ClustalW, as well as newer alignment tools such as Spectrogram Analysis, by integrating MATLAB code, server connections and data visualizations. 2 Installation. Before you can start working with MIDAS, the application must be installed. MIDAS comes with an installer which installs the program for you. The next steps guide you through the installation process. Check your Windows settings: MIDAS runs optimally on Windows XP with a resolution of 1280x1024. Run setup MIDASV Z2 exe. Click Install. Follow the instructions during the installation. NOTE: The .NET environment is not needed, just press OK. JRE 1.5 (Java Runtime Environment) must be installed. If the JRE is already installed on your computer, the installer will inform you and skip the JRE installation. The Indeo5 codec is needed, so you have to follow the steps of its installer. You do not have to follow these steps if this codec is already installed on your PC. The output directory of MIDAS is set to C:\MIDAS by default. After the installation, push Done. Run MIDASv1.0.jar in C:\MIDAS.
19. error, the bioinformatician will be informed. If the bioinformatician pushes the Cancel button, the current window will close. Special requirements: The bioinformatician must have an account on the Philips server (login name and password) and the IP address of the server. Use case relationships: Includes Spectrogram analysis, Login to Philips server. Use case: Load existing project. Actors: Bioinformatician. Pre-conditions: Use case "Use Spectrogram Analysis" must have been completed in another session, or the output folder has been created on another grid. Post-conditions: The project is loaded and the spectrogram, annotation and cluster tree are visible on a tab in the interface. Basic flow: 1. The bioinformatician selects the folder where the files are located. 2. The bioinformatician gives the input file which was used to create the output files. 3. The bioinformatician clicks the OK button. Alternative flows: In case of a system crash, the system has to be restarted. In case of a connection error, the bioinformatician will be informed. If the bioinformatician pushes the Cancel button, the current window will close. Use case relationships: Includes Spectrogram analysis. Use case: Run BLAT on own input. Actors: Bioinformatician. Pre-conditions: A valid input file with the FASTA (.fa) extension should be present. Post-conditions: The BLAT output is given to the user on a tab in the interface. Basic flow: 1. The bioinformatician sets the required and the additional settings
20. for cancer care. Bioinformatics refers to the creation and advancement of algorithms, computational and statistical techniques, and theory to solve formal and practical problems arising from the management and analysis of biological data. Different software tools are used by bioinformaticians to analyze DNA. One of those tools is BLAT, an algorithm for sequence alignment that searches large databases of protein or DNA sequences. Another tool is ClustalW, which produces biologically meaningful multiple sequence alignments of divergent sequences. A new approach is the Spectrogram Analysis of genomes. With this technique, frequency-domain analysis is performed on genomes using tricolor spectrograms, identifying several types of distinct visual patterns that characterize specific DNA regions. The Spectrogram Analysis was created and developed at Philips by Evan Santo and Nevenka Dimitrova. Spectrogram Analysis finds biologically meaningful patterns which cannot be found by sequence alignment due to computational restrictions. When interesting regions and patterns are found, sequence alignment may be required in order to analyze them further. Currently the sequence alignment is not integrated with the Spectrogram Analysis in one application. This makes searching the genomes a time-consuming, inefficient and expensive process. The aim of this project is to provide bioinformatics tools that integrate sequence alignment g
21. is not restricted, and the output from the Program is covered only if its contents constitute a work based on the Program (independent of having been made by running the Program). Whether that is true depends on what the Program does. 1. You may copy and distribute verbatim copies of the Program's source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice and disclaimer of warranty; keep intact all the notices that refer to this License and to the absence of any warranty; and give any other recipients of the Program a copy of this License along with the Program. You may charge a fee for the physical act of transferring a copy, and you may at your option offer warranty protection in exchange for a fee. 2. You may modify your copy or copies of the Program or any portion of it, thus forming a work based on the Program, and copy and distribute such modifications or work under the terms of Section 1 above, provided that you also meet all of these conditions: a) You must cause the modified files to carry prominent notices stating that you changed the files and the date of any change. b) You must cause any work that you distribute or publish, that in whole or in part contains or is derived from the Program or any part thereof, to be licensed as a whole at no charge to all third parties under the terms of this License. c) If the modified program normally read
22. is not the original, so that any problems introduced by others will not reflect on the original authors' reputations. Finally, any free program is threatened constantly by software patents. We wish to avoid the danger that redistributors of a free program will individually obtain patent licenses, in effect making the program proprietary. To prevent this, we have made it clear that any patent must be licensed for everyone's free use or not licensed at all. The precise terms and conditions for copying, distribution and modification follow. GNU GENERAL PUBLIC LICENSE: TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION. 0. This License applies to any program or other work which contains a notice placed by the copyright holder saying it may be distributed under the terms of this General Public License. The "Program", below, refers to any such program or work, and a "work based on the Program" means either the Program or any derivative work under copyright law: that is to say, a work containing the Program or a portion of it, either verbatim or with modifications and/or translated into another language. (Hereinafter, translation is included without limitation in the term "modification".) Each licensee is addressed as "you". Activities other than copying, distribution and modification are not covered by this License; they are outside its scope. The act of running the Program
23. life time. 11 Conclusions and recommendations. MIDAS has become a tool which combines Spectrogram Analysis with sequence alignment tools. Using MIDAS, the bioinformatician is able to analyze a spectrogram together with the genomic annotation and the spectral clustering, which can be aligned with the alignment tools BLAT or ClustalW. MIDAS is a standalone executable which can be used on single desktops. All this combined makes analyzing DNA sequences more efficient, less time consuming and less expensive. MIDAS brings bioinformaticians efficiency in their work because they can align sequences which they find in a spectrogram, all together in one application. Before MIDAS all these tasks had to be done manually. MIDAS eliminates 2/3 of all the tasks that were involved in the earlier situation. Before project MIDAS was completed, the bioinformatician needed to run separate tools, and at most one at a time. MIDAS eliminates these tasks. MIDAS makes sure that all the tasks which do not have to be done by the user are done automatically and that user tasks are simplified. Analyzing DNA with MIDAS is not a time-consuming job. Since MIDAS is a deployed standalone executable which runs independently of the developers' software, no expensive licences are needed to use MIDAS. The most expensive licence in the process of analyzing DNA usually is the MATLAB licence. MIDAS does not need this licensed so
24. m (new file). Description. Summary: This function is called by buildClusterInfo.m and recursively searches the structure of the hierarchical tree and builds a string array which is passed to buildClusterInfo.m. This string array contains the nodes and the leaves of one permutation value. Function: This function searches all corresponding clusters of one permValue. Algorithm: The linkage table contains three columns: two which contain the nodes or leaves of the tree, and one with the distance, which is not of interest to us. The permValue can be found in one row of the linkage table. In the other column of the same row the corresponding cluster can be found. This clusterValue refers to the other rowNumber of the linkage table which contains the other corresponding clusters. The search for the other corresponding clusters is done recursively until no other clusters can be found. For walking through the tree we used a depth-first search algorithm. The leaves are searched in pre-order. Arguments: permValue: the value in the permutation table; linkagePath: the linkage table created by MATLAB; permutationsPath: the permutation table created by [...]. Output: out: contains a string with clusterNames corresponding to one permValue. A clusterName is a rowNumber in the linkageTable. 6.3 Problems and solutions. During this project several problems occurred. Problems like Java bugs, performance issues and inconsistencies in o
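The recursive search through the linkage table described above can be illustrated with a short sketch. This is a Python illustration under stated assumptions, not the MATLAB function itself: it assumes MATLAB-style labels in which the group created by row k of the table is numbered n_leaves + k, and all function and variable names are hypothetical.

```python
# Example linkage table for 6 leaves: (left, right, distance) per row.
LINKAGE = [(1, 2, 0.5), (3, 4, 1.0), (7, 8, 4.5), (5, 6, 10.0), (9, 10, 14.0)]
N_LEAVES = 6


def clusters_of_leaf(linkage, leaf, n_leaves):
    """Row numbers (1-based) of every cluster containing `leaf`, innermost first."""
    def climb(label):
        for row_number, (left, right, _distance) in enumerate(linkage, start=1):
            if label in (left, right):
                # The group created by this row is labelled n_leaves + row_number;
                # continue the search with that label.
                return [row_number] + climb(n_leaves + row_number)
        return []  # label is the root: no row merges it any further
    return climb(leaf)


def preorder(linkage, label, n_leaves):
    """Depth-first walk below `label`, visiting a node before its children."""
    if label <= n_leaves:
        return [label]  # a leaf, i.e. a single sequence
    left, right, _distance = linkage[label - n_leaves - 1]
    return ([label]
            + preorder(linkage, left, n_leaves)
            + preorder(linkage, right, n_leaves))
```

With the example table, `clusters_of_leaf(LINKAGE, 3, N_LEAVES)` follows leaf 3 through rows 2, 3 and 5, and `preorder(LINKAGE, 9, N_LEAVES)` lists group 9 followed by its left and right subtrees. The distance column is ignored, as in the description.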
25. on these Euclidean distances a linkage table is constructed in MATLAB. The linkage table creates a hierarchical clustered tree.
[Figure 28: Syntax linkage MATLAB. Syntax: Z = linkage(Y); Z = linkage(Y, 'method'). Description: Z = linkage(Y) creates a hierarchical cluster tree using the Single Linkage algorithm. The input Y is a distance vector of length ((m-1)*m/2)-by-1, where m is the number of objects in the original data set. You can generate such a vector with the pdist function. Y can also be a more general dissimilarity matrix conforming to the output format of pdist.]
In the linkage table we can find N-1 linked sequences or groups of sequences. To understand this, the basic definition of a linkage should be clear. For an example with 6 objects, see Table 1.
[Table 1: example linkage]
The first 6 objects are paired up based on the Euclidean distance. The two sequences with the lowest Euclidean distance are considered a pair and are thus more similar. The pairs are new groups of objects; e.g. objects 1 and 2 combined make group 7. Next, it is possible to have group 7 paired up with object 3. The linkage function continues to pair up objects with objects, objects with groups of objects, or groups of objects with groups of objects, based on the Euclidean distances, until N-1 pairs exist. In the case of MIDAS the objects are sequences and the groups are clusters. Thus in this table each node and child can be found an
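The pairing process can be made concrete with a small sketch. The Java class below is a naive single-linkage implementation written for illustration only; it is not the MATLAB linkage function and not code from MIDAS. The class name SingleLinkageSketch and its methods are hypothetical. It assumes the numbering convention described above: objects are 1..N, and the group created by the r-th pairing (1-based) gets id N + r.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: naive single linkage that emits a MATLAB-style table. */
class SingleLinkageSketch {

    /**
     * @param d symmetric distance matrix between n objects
     * @return n-1 rows {idA, idB, distance}; ids are 1-based
     */
    static double[][] linkage(double[][] d) {
        int n = d.length;
        List<List<Integer>> members = new ArrayList<>(); // objects (0-based) per active cluster
        List<Integer> ids = new ArrayList<>();           // linkage id of each active cluster
        for (int i = 0; i < n; i++) {
            List<Integer> single = new ArrayList<>();
            single.add(i);
            members.add(single);
            ids.add(i + 1);
        }
        double[][] table = new double[Math.max(n - 1, 0)][3];
        for (int row = 0; row < n - 1; row++) {
            int bestA = 0;
            int bestB = 1;
            double best = Double.POSITIVE_INFINITY;
            // find the two active clusters with the smallest single-linkage distance
            for (int a = 0; a < members.size(); a++) {
                for (int b = a + 1; b < members.size(); b++) {
                    double dist = clusterDistance(d, members.get(a), members.get(b));
                    if (dist < best) {
                        best = dist;
                        bestA = a;
                        bestB = b;
                    }
                }
            }
            table[row] = new double[] {ids.get(bestA), ids.get(bestB), best};
            // merge B into A; the merged group receives the next free id
            members.get(bestA).addAll(members.get(bestB));
            ids.set(bestA, n + 1 + row);
            members.remove(bestB);   // removes by index (bestB > bestA)
            ids.remove(bestB);       // removes by index
        }
        return table;
    }

    /** Single linkage: distance between the closest pair of members. */
    private static double clusterDistance(double[][] d, List<Integer> a, List<Integer> b) {
        double min = Double.POSITIVE_INFINITY;
        for (int i : a) {
            for (int j : b) {
                min = Math.min(min, d[i][j]);
            }
        }
        return min;
    }
}
```

With 6 objects the table has 5 rows, and the first pairing creates group 7, matching the example in the text. The quadratic search per row is chosen for readability, not speed.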
26. pre-conditions; Checked Console output; Checked Spectrogram Video; Checked post-conditions.
Problems: No problems.
Cause:
Solution:
Result: Watch Spectrogram Video is tested and no errors were found.
Remarks:
Test Case: Overview all present clustering DNA
Actions performed: Checked pre-conditions; Checked Console output; Checked clustering; Checked post-conditions.
Problems: No problems.
Cause:
Solution:
Result: Overview all present clustering DNA is tested and no errors were found.
Remarks:
Test Case: Login to Philips server
Actions performed: Checked pre-conditions; Selected a sequence as input for BLAT, and in the second test for ClustalW; Clicked on run BLAT, and in the second test on run ClustalW; Filled in hostname; Filled in login name; Filled in password; Pushed OK; Checked authenticated message; Checked post-conditions.
Problems: When a wrong input was given to ClustalW, an error message was shown to the user. Clicking the OK button of the error message started the login. Login should not start if a wrong input is given.
Cause: The if condition in the code used OR instead of AND, so there was a bug in the interface.
Solution: We repaired the condition.
Result: Problem solved.
Remarks: After solving this problem, each part of the system where login was needed was tested, and in the end every test completed without errors. T
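The OR versus AND defect reported for the login test case can be illustrated with a small Java sketch. The report does not show the actual condition, so the class name LoginGuardSketch, both method names and the two boolean operands are purely hypothetical; the sketch only shows how such a bug lets the login start on invalid input.

```java
/** Hypothetical sketch of the OR-instead-of-AND bug from the login test case. */
class LoginGuardSketch {

    /** Buggy variant: OR opens the login even when the input is invalid. */
    static boolean shouldStartLoginBuggy(boolean inputValid, boolean notYetAuthenticated) {
        return inputValid || notYetAuthenticated;
    }

    /** Repaired variant: login starts only for valid input that still needs a session. */
    static boolean shouldStartLogin(boolean inputValid, boolean notYetAuthenticated) {
        return inputValid && notYetAuthenticated;
    }
}
```

With invalid input and no session yet, the buggy variant returns true and the repaired variant returns false, which is the behaviour the test case expects.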
27. software, and you are welcome to redistribute it under certain conditions; type 'show c' for details. The hypothetical commands 'show w' and 'show c' should show the appropriate parts of the General Public License. Of course, the commands you use may be called something other than 'show w' and 'show c'; they could even be mouse-clicks or menu items, whatever suits your program. You should also get your employer (if you work as a programmer) or your school, if any, to sign a "copyright disclaimer" for the program, if necessary. Here is a sample; alter the names: Yoyodyne, Inc., hereby disclaims all copyright interest in the program 'Gnomovision' (which makes passes at compilers) written by James Hacker. <signature of Ty Coon>, 1 April 1989. Ty Coon, President of Vice. This General Public License does not permit incorporating your program into proprietary programs. If your program is a subroutine library, you may consider it more useful to permit linking proprietary applications with the library. If this is what you want to do, use the GNU Library General Public License instead of this License.
Appendix C Test Cases
Test Case: Input Chromosome for analysis (fields: Actions performed, Problems, Cause, Solution, Result, Remarks)
Actions performed: Checked pre-conditions; Select chromosome analysis; Set input file path; Set annotation file path; Set output pat
28. sorted format. Next, all the coordinates are computed. Because the file is sorted, first all the clusters that consist of a sequence-sequence relation are found. This is the lowest hierarchy level of a hierarchical clustered tree. Next, all the clusters with sequence-cluster relations are found. Finally, all clusters with cluster-cluster relations are found. By sorting the file, the hierarchy levels are known and all the coordinates can be computed correctly. In order to draw the binary hierarchical tree, all the information of a format file is put into hashmaps. Java hashmaps are a fast way to retrieve information without having to loop through any lists, which benefits performance. The hashmaps created to store the information we need to draw the clusters are:
clusters: Hashmap containing the cluster names with the corresponding x1, y1, x2, y2 coordinates. [Diagram residue: corner points (x1, y1) and (x2, y2) of one cluster]
lines: Hashmap containing String line number and String cluster name. This hashmap only contains line numbers and corresponding cluster names of the first hierarchy level (clusters consisting of a line-line combination).
linesWithClusters: Hashmap containing String line number and String corresponding cluster names. This hashmap contains the rest of the hierarchy levels.
Figure
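A minimal sketch of the lookup idea behind these hashmaps is given below. It is not the MIDAS drawing code: the class name ClusterMapsSketch, the methods addLeafPair and bracketFor, and the coordinate scheme (row height times line number, one fixed x position per level) are assumptions made only to show how a line number leads to a cluster name and then to its coordinates without looping over lists.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the clusters and lines hashmaps used for drawing. */
class ClusterMapsSketch {

    /** cluster name -> {x1, y1, x2, y2} */
    final Map<String, int[]> clusters = new HashMap<>();

    /** line number -> name of the first-level cluster that contains the line */
    final Map<String, String> lines = new HashMap<>();

    /** Registers a first-level cluster made of two lines (assumed layout). */
    void addLeafPair(String clusterName, String lineA, String lineB,
                     int rowHeight, int levelX) {
        int y1 = Integer.parseInt(lineA) * rowHeight;
        int y2 = Integer.parseInt(lineB) * rowHeight;
        clusters.put(clusterName, new int[] {levelX, y1, levelX, y2});
        lines.put(lineA, clusterName);
        lines.put(lineB, clusterName);
    }

    /** Two constant-time lookups: line number -> cluster name -> coordinates. */
    int[] bracketFor(String lineNumber) {
        String name = lines.get(lineNumber);
        return name == null ? null : clusters.get(name);
    }
}
```

Higher hierarchy levels would need a third map in the spirit of linesWithClusters; that part is left out to keep the sketch short.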
29. startIm > 2 it just pastes the first image to the last image, and you end up with 2 images in one large image.
Arguments
inputPath: The full path of the input image, including filename and extension.
outputPath: The full path of the output image, including filename and extension.
startIm: First image to start with.
endim: Last image to end with. startim and endim can be the same if there is only 1 image.
startPos: Start position, indexed in the large image.
endPos: End position, indexed in the large image.
6.2.4 Visualize the hierarchical clustered tree
To visualize the hierarchical clustered tree in MIDAS, two MATLAB script files were developed. They build a text file with a special format in order to pass the cluster information to the Java draw method, which draws this tree.
BuildClusterInfo.m (New File)
Description
Summary: It takes each value of the permutations table and calls findClustersCorresponding2Perm in a loop. When this function has completed, the output file clusterNamesTree.txt is written and ready to be drawn in MIDAS in Java.
Arguments
linkagePath: Path of the linkage table.
permutationsPath: Path of the permutations table.
outputPath: Path where the files should be written.
maxRecursionCounter: Maximum number of recursions. For more information, please read the comments on that part in findClustersCorresponding2Perm.m.
FindClustersCorresponding2Perm
30. used. Using these tools separately is a time-consuming, inefficient and expensive process. MIDAS is a tool that integrates sequence alignment, genome annotation, and spectral clustering and alignment under the same application. The challenge in this project is in representing the knowledge and analyzing the genome data. The DNA data is first transformed into the Fourier domain and clustered in MATLAB based on Euclidean distances between the sequences. Our tool allows visualizing the DNA spectra together with a hierarchical tree in a multimodal interface. This in turn enables a bioinformatician to analyze patterns of a group of sequences. MIDAS is a standalone application which provides an interface around standard sequence alignment tools such as BLAT and ClustalW, as well as newer alignment tools such as Spectrogram Analysis, by integrating MATLAB code, server connections and data visualizations. Java is used as the main programming language during the development of MIDAS. The Spectrogram Analysis script files in MATLAB are converted into Java classes. These classes are used to run standalone MATLAB applications from within Java.
Table of Contents
1 INTRODUCTION
2 PROJECT DEFINITION
2.1 ...
2.2 BACKGROUND
2.3 PROJECT DESCRIPTION
2.4 PROJECT ...
2.5 PROJECT PLAN
3 REQUIREMENTS
3.1 CURRENT SITUATION
31. user friendly interface and web possibilities are considered. This leads to the issue: what is the programming language of the Graphical User Interface for integrating the Spectrogram Analysis, BLAT and ClustalW applications? The options for the programming language are:
• MATLAB
• C++
• Java
• C#
MATLAB could be considered the easiest option, because the Spectrogram Analysis is written in MATLAB. MATLAB has a toolbox named Guide which provides Graphical User Interface building. A disadvantage is that the toolbox is limited in its features. Also, previous work experience with this toolbox showed that action handlers in MATLAB are not easy to work with. As MIDAS is a comprehensive system that needs a user interface which can handle complex action handlers, using MATLAB for GUIs is considered a bad option. This narrows the options to C++, Java and C#. The background of this project's programmers lies in object-oriented programming. All the options support object-oriented programming. Both Java and C# are object-oriented languages, and as C++ is a statically typed, free-form, multi-paradigm language, it also supports object-oriented programming. Considering the time scope of this project and the programmers' background, C++ is not the best option. Although C++ is almost unlimited in programming possibilities, it is a powerful but complex language. Learning C++ and using all its resources would consume too much time in this project. There are many
32. [Figure 3: Use Case Diagram. Actor: BioInformaticus. Use cases: Watch SpectroVideo; Overview all present clustering DNA; Obtain alignment of DNA; Use BLAT single sequence to genome; Login to Philips Server; Obtain alignment of Multiple Sequences. Several use cases are linked by <<Includes>> relations.]
Use Cases
Use case: Input Chromosome for analysis (fields: Actors, Pre-conditions, Post-conditions, Basic flow, Alternative flows, Special requirements)
Actors: Bioinformaticus.
Pre-conditions: Bioinformaticus has started MIDAS and must have valid Chromosome input (a DNA sequence in FASTA format).
Post-conditions: After loading the Chromosome input, the system has read this input and the additional settings and is ready to run Spectrogram Analysis.
Basic flow:
1. Bioinformaticus selects Run new Spectrogram Settings.
2. Bioinformaticus selects Chromosome Analysis.
3. Bioinformaticus inputs a valid Chromosome FA file.
4. Bioinformaticus sets the required settings.
5. Bioinformaticus pushes the OK button in order to run Spectrogram Analysis.
Alternative flows: If the input DNA sequence does not have the FASTA extension, or other required settings are not set correctly, an error is shown to the bioinformaticus, who then has the possibility to correct the input for the system. If the bioinformaticus pushes the cancel button, current windows will c
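The first alternative flow of this use case (reject input without a FASTA extension or with incomplete settings) could be checked along the following lines. This is a hedged sketch, not the MIDAS validation code: the class FastaInputCheck, its methods, the error texts and the accepted extensions (.fa and .fasta) are assumptions.

```java
import java.util.Locale;

/** Hypothetical sketch of the input check from the use case's alternative flow. */
class FastaInputCheck {

    /** Assumption: a file counts as FASTA if its name ends in .fa or .fasta. */
    static boolean hasFastaExtension(String fileName) {
        if (fileName == null) {
            return false;
        }
        String lower = fileName.toLowerCase(Locale.ROOT);
        return lower.endsWith(".fa") || lower.endsWith(".fasta");
    }

    /**
     * @return null when the analysis may start, otherwise the error text to show,
     *         after which the user can correct the input
     */
    static String validate(String fileName, boolean settingsComplete) {
        if (!hasFastaExtension(fileName)) {
            return "Input file must have a FASTA extension (.fa or .fasta)";
        }
        if (!settingsComplete) {
            return "Required settings are not set correctly";
        }
        return null;
    }
}
```

Returning a message instead of throwing keeps the dialog open, so the user can correct the input as the use case requires.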
33. 3.2 SYSTEM DESCRIPTION
3.2.1 FUNCTIONAL REQUIREMENTS
3.2.2 NON FUNCTIONAL ...
3.2.3 CONSTRAINTS
3.2.4 SYSTEM ...
4 ANALYSIS
4.1 PRELIMINARY WORK SPECTROGRAM ANALYSIS
4.2 USER ANALYSIS
4.2.1 USER PROFILE
4.2.2 CURRENT SITUATION
4.2.3 IMPROVED SITUATION
4.2.4 WORKFLOW
5 DESIGN
5.1 ARCHITECTURAL DESIGN
5.1.1 GENERAL PRIORITIES DESIGN ...
5.1.2 OUTLINE OF THE DESIGN
5.1.3 MAJOR DESIGN ISSUES
5.2 TECHNICAL DESIGN AND ...
5.2.1 ...
5.2.2 ENVIRONMENT AND COMPONENT ...
5.3 JAVA DESIGN
5.3.1 CLASS DIAGRAM
5.3.2 DETAILED CLASS DIAGRAM
5.3.3 SEQUENCE DIAGRAM
5.4 INTERFACE DESIGN
6 IMPLEMENTATION
6.1 VISUALIZING THE HIERARCHICAL CLUSTERED ...
6.1.1 SURVEY
6.1.2 HIERARCHICAL CLUSTERED TREE VISUALIZATION ...
6.2 MATLAB FUNCTIONS
6.2.1 RUNNING A SPECTROGRAM ANA
34. [Figure residue: BLAT output table (match counts and target coordinates on a target of size 132449811), followed by the BLAT alignment view for query HUM chr12 53016917 53017116 (200 of 200 bases) against chromosome 12, positions 53016916 to 53017116 of 132449811. The base-by-base alignment text is not recoverable.]
35. [Figure residue: ClustalW multiple sequence alignment blocks for HUM_chr19 58861245 58861444, HUM_chr19 58863723 58863922, HUM_chr19 57476362 57476561 and HUM_chrX 53600309 53600508. The aligned bases are not recoverable.]
36. [Figure 16: MIDAS package. Diagram residue: class boxes ASSpectroAnnotImagesPanel and MIDASSequenceAlignmentPanel]
GeneralSettings
GeneralSettings is an object that can be passed to all the classes in the MIDAS system. It contains all the important information needed to run parts of the system, for example output directories, input values for sequence alignment and locations of files.
MainApplication
MainApplication initializes MIDAS.
MIDAS
MIDAS is the main frame in which all sub-frames are initialized. The main control is in this class. This class also initializes the MATLAB-converted code for the spectrogram video. MIDAS contains two nested classes for layout: TabCloseIcon, for the close button on tabs and for closing the tabs, and BackgroundPanel, for the background image on a panel. For consistency and performance, MIDAS is restricted to running one process at a time, either sequence alignment or Spectrogram Analysis. This is handled by disabling the MIDAS interface for the user while a process is running.
MIDASSequenceAlignmentPanel
MIDASSequenceAlignmentPanel is the panel for the output of sequence alignment. It loads BLAT output or ClustalW output. Depending on the mode (user input, mode 1, or from a Spectrogram Analysis, mode 0), it loads the correct output. This panel contains a MATLAB-converted function; the original file is CreatePartialImage.m.
MIDASSpectroAnnotImagesPanel
MIDASSpect
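The rule that MIDAS runs only one process at a time can be sketched without any GUI code. The class SingleProcessGuard below is hypothetical and is not taken from MIDAS; it only shows one way to refuse a second process while the first is running and to derive from the same flag whether the interface should be enabled.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch: allow one running process and expose the UI state. */
class SingleProcessGuard {

    private final AtomicBoolean running = new AtomicBoolean(false);

    /** @return true if the caller may start its process, false if one is already running */
    boolean tryStart() {
        return running.compareAndSet(false, true);
    }

    /** Called when sequence alignment or Spectrogram Analysis has finished. */
    void finish() {
        running.set(false);
    }

    /** The interface stays disabled for as long as a process is running. */
    boolean isInterfaceEnabled() {
        return !running.get();
    }
}
```

A caller would invoke tryStart() before launching BLAT, ClustalW or a Spectrogram Analysis and finish() in a finally block, so the interface is enabled again even when the process fails.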
37. [Figure 46: Linux login. Dialog fields: Hostname (161.85.26.156), Login, Password.]
The output of BLAT is presented in Figure 47. It is possible to write the output to a file and thus save the output on the desktop PC (Figure 48).
[Figure residue: MIDAS window (menus Spectrogram Analysis, Sequence Alignment, Connection, Help) with a ClustalW Output tab and the Sequence Alignment Outputs panel showing BLAT psl output (psLayout version 3) for query HUM chr12 53016917 53017116. The table values are not recoverable.]
38. [Figure 41: output chromosome SpectrA. Screenshot residue: annotation list of zinc finger domains; status bar Hostname NONE, Login NONE, Authenticated NO.]
[Screenshot residue: MIDAS window (menus Spectrogram Analysis, Sequence Alignment, Connection, Help) with tabs Spectra zincfinger_200_50 and Spectra smallRNAs_200_50 and an annotation list of HUM chr entries.]
39. IGUOUS SPECTROGRAM ANALYSIS
3.2 RUN CHROMOSOME SPECTROGRAM ANALYSIS
3.3 OUTPUT PANEL FROM SPECTROGRAM ANALYSIS
3.4 RUN SPECTROGRAM VIDEO
3.5 LOAD EXISTING SPECTROGRAM ANALYSIS PROJECT
4.1 RUN BLAT
4.2 RUN CLUSTALW
5 SERVER CONNECTION
5.1 CONNECT
5.2 DISCONNECT
1 System description
Major research efforts in Bioinformatics include sequence alignment, gene finding, genome assembly, protein structure alignment, protein structure prediction, prediction of gene expression and protein-protein interactions, and the modeling of evolution. To perform these specific tasks, different tools are used. Using these tools separately is a time-consuming, inefficient and expensive process. MIDAS is a tool that integrates sequence alignment, genome annotation, and spectral clustering and alignment under the same application. The DNA data is first transformed into the Fourier domain and clustered in MATLAB based on Euclidean distances between the sequences. This tool allows visualizing the DNA spectra together with a hierarchical tree in a multimodal interface.
40. LYSIS
6.2.2 CREATING A SPECTROGRAM VIDEO
6.2.3 RUNNING BLAT OR CLUSTALW
6.2.4 VISUALIZE THE HIERARCHICAL CLUSTERED TREE
6.3 PROBLEMS AND SOLUTIONS
7 TESTING AND ...
7.1 TESTING ...
7.2 USABILITY TEST
7.2.1 OBSERVED USABILITY TEST
7.2.2 UNOBSERVED USABILITY TEST
7.3 MIDAS CONSTRAINTS
8 MIDAS ...
9 FUTURE DEVELOPMENT
10 PROJECT REFLECTION
11 CONCLUSIONS AND RECOMMENDATIONS
12 ...
APPENDIX A LICENSE GANYMED SSH 2
APPENDIX B LICENSE FAT JAR
APPENDIX C TEST CASES
APPENDIX D USER ...
1 Introduction
New developments in the medical world are rising fast. As Philips invests in these new developments, they have a high market share in the medical department. A lot of research projects and concepts developed at Philips will contribute to the future of Medical Systems. One of those projects is Women's Health Cross model Bioinformatics
41. MIDAS does not detect errors concerning content.
[Figure residue: MIDAS window with a ClustalW Output tab (Tue Jun 19 17:20:03 IST 2007) showing the ClustalW Multiple Sequence Alignment panel: partial image, input annotations 411 to 416 (HUM chr19 57476362, HUM chr19 58861245, HUM chr19 58863723, HUM chrX 53600309), and the CLUSTAL W (1.83) multiple sequence alignment output for the same four sequences. The aligned bases are not recoverable.]
42. MIDAS v1.0: Multimodal Interface for DNA Alignment of Sequences. Bachelor Project (IN3700), 2007. Melissa Cheung, 1228161, Software Technology, Computer Science. Paul van den Haak, 1221760, Media and Knowledge Technology, Computer Science. Philips Innovation Campus, Bangalore, India. University of Technology Delft, Faculty EWI, The Netherlands. Philips supervisor: Dr. N. Dimitrova. TU Delft supervisor: Dr. L.J.M. Rothkrantz. BSc coordinator: Ir. M. Sepers. Technische Universiteit Delft. PHILIPS.
Preface
This is the report of the Bachelor Project of two students of Computer Science at the University of Technology Delft in The Netherlands. All Computer Science students are required to participate in an internship at a company to finalize the Bachelor phase. During this internship, software engineering techniques are used to develop a prototype. For this bachelor project we participated in a three-month internship at Philips Innovation Campus in Bangalore, India. This report describes the software development process of the prototype we developed at Philips. We have developed a prototype that allows bioinformaticians to analyze DNA by alignment and Spectrogram Analysis. The target groups of this report are software engineers, computer scientists and bioinformaticians. The readers of this report should have basic software engineering knowledge, as this report describes the software development process. The inte
43. NTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
The Java implementations of the AES, Blowfish and 3DES ciphers have been taken (and slightly modified) from the cryptography package released by "The Legion Of The Bouncy Castle". Their license states the following:
Copyright (c) 2000 - 2004 The Legion Of The Bouncy Castle (http://www.bouncycastle.org)
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following co
44. Run multiple sequence alignment (ClustalW) on the user's input
• Write output of BLAT to a file
• Write output of ClustalW to a file
• Connect to Philips Server
• Disconnect from Philips Server
• View help file of BLAT
• View help file of ClustalW
• View help file of MIDAS
• View stack trace of processes of MIDAS
9 Future development
A lot of new ideas, improvements and extensions came up during the evaluation phase of this project. Due to time constraints it was not possible to implement all of this feedback. Since MIDAS 1.0 is the start of a new way to analyze DNA, there are a lot of possibilities for MIDAS to grow. To understand these possibilities and their priority, it is wise to summarize them in a MoSCoW analysis, which separates the Must haves from the Should haves, and the Could haves from the Wishful thinking requirements. Table 2 displays the MoSCoW analysis for MIDAS 2.0.
[Table residue: MoSCoW analysis with columns Requirements, Must Have, Should Have, Could Have, Wishful thinking. The priority marks are not recoverable. Recoverable requirement rows under Spectrogram Analysis:
Load input: Load a FA file as input for the Spectrogram Analysis.
Obtain output of Spectrogram Analysis, the spectrogram video and the ...
Obtain an overview of all the present clustering in the analyzed sequence.
Visualize Clustering.
Annotation: Run the spectrogram analysis with full annotation; Run the spectrogram analysis without full annotation; Run S.A. with annotation; Run S.A. without annotation.
Co]
45. Spectrogram Analysis Project. This project is restricted in that it should have the same format and files as it would have in a MIDAS Spectrogram Analysis output directory.
Load Spectrogram Analysis Project
1. In the menu bar select Spectrogram Analysis, then select Load Existing Project. The load window will pop up (figure 5).
2. Give the correct input arguments:
a. Load directory: Press Load to select a directory.
b. Input File: Press Load to select an input file.
c. Spectrogram Type: Select the correct type.
NOTE: It is very important to give correct input arguments, as MIDAS does not detect errors concerning content.
[Figure 5: Load Spectrogram Analysis panel. Panel text: the directory to be loaded is subject to several constraints (press help for more); the input file is the corresponding input FASTA file of the directory. Fields: Load directory, Input File, Spectrogram Type (Chromosome).]
4 Sequence alignment
NOTE: This option is only available for a Discontiguous Spectrogram Analysis.
To perform a sequence alignment by running BLAT or ClustalW, you must have a current project on an active tab in MIDAS. This can either be the Spectrogram Analysis you are currently working on, or an existing project which you have to load first. On this tab in MIDAS you can run BLAT or ClustalW by following t
46. [Figure 42: output discontiguous SpectrA. Screenshot residue: annotation list ending with HUM chr12 12960280 and HUM chr11 43559020; status bar Hostname NONE, Login NONE, Authenticated NO.]
[Figure 43: output discontiguous SpectrA. Screenshot residue: BLAT and ClustalW controls (annotation number, line number range, Run BLAT, Run ClustalW); selected annotation HUM chr12 53016917; status bar Hostname NONE, Login NONE, Authenticated NO.]
It is possible to perform sequence alignment on the output of the discontiguous Spectrogram Analysis (Figure 43). For sequence alignment in BLAT, select one annotation in the annotation overview. The selection for BLAT is displayed in the lower left corner of the panel. For the selected annotation the BLAT settings panel is loaded (Figure 44). All the settings which can be set for BLAT can be set in this panel. For multiple sequence alignment, an input range can be given to run ClustalW. For the selected input range the ClustalW panel is loaded (Figu
47. Y APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION. 12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. END OF TERMS AND CONDITIONS. How to Apply These Terms to Your New Programs. If you develop a new program, and you want it to be of the greatest possible use to the public, the best way to achieve this is to make it free software which everyone can redistribute and change under these terms. To do so, attach
48. the following notices to the program. It is safest to attach them to the start of each source file to most effectively convey the exclusion of warranty; and each file should have at least the "copyright" line and a pointer to where the full notice is found. <one line to give the program's name and a brief idea of what it does.> Copyright (C) <year> <name of author> This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA. Also add information on how to contact you by electronic and paper mail. If the program is interactive, make it output a short notice like this when it starts in an interactive mode: Gnomovision version 69, Copyright (C) year name of author. Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type 'show w'. This is free
49. al computations of MATLAB, as well as the sequence alignment tools, with the Graphical User Interface in Java. There are 2 possible options.
Option 1 (Figure 11)
[Figure 11: System overview. Diagram residue: MATLAB with MATLAB Compiler producing a stand-alone executable (graphics, mathematical computations); Java main application interface; connections to the Linux server for BLAT (sequence alignment) and ClustalW (Multiple Sequence Alignment Tool).]
In option 1 the application will be run as a stand-alone executable MATLAB program. Thus all the source code has to be integrated in MATLAB. MATLAB contains a Java Virtual Machine (JVM) and Java Metadata Interface (JMI), which will run the Java classes. The GUI is created in Java and will make a connection to a Linux server where BLAT and ClustalW are installed. The output of BLAT and ClustalW will be displayed in the GUI. MATLAB will produce the computations and images of the Spectrogram Analysis and visualize them in the GUI.
Option 2 (Figure 12)
[Figure 12: System Overview. Diagram residue: stand-alone executable; executable JAR file; MATLAB Builder for Java (graphics, mathematical computations); Java main application interface.]
50. alization of the hierarchical clustered tree are also in this package.
Spectrogram Analysis: This package contains the GUIs for the input of the Spectrogram Analysis and the call to run Spectrogram Analysis. The MATLAB functions for Spectrogram Analysis that were converted to Java methods are also in this package.
Sequence alignment: This package contains the client-server classes to connect to the Philips Server. The GUIs and the execution commands for BLAT and ClustalW are also in this package.
Tools: This package contains tools which can be used in all the classes. The console for the system's processes is one tool. Another tool is the file chooser, a directory and file browser for selecting files and directories. Another tool saves a text file of some output on your computer.
[Figure residue: class diagram, printed rotated; the text is not recoverable.]
5.3.2 Detailed class diagram
The description of all the classes is given in this section.
[Figure residue: MIDAS package diagram with User Interface, MainApplication, ClusterTreeGraphicsPanel, ClusterTree and MidasGUI; continues as MID]
51. annotClass: classes of features to annotate
annotType: types of features to annotate
opDelim: delimiting operator, which is operating-system specific
sequenceName: the name of the input file
imageScale: scalar for scaling the image in createSpectroImages.m
numWin: number of windows on one image in each frame in MIDAS
windowHeigth: scalar to stretch the spectrogram vertically

CreateDiscontiguousClusteredSpectrogram.m
Description (preliminary summary): This function returns a matrix of the spectrogram for the DNA sequences given. This program is designed to operate on multiple smaller sequences after they have been read in from a FASTA file formatted as specified in ClusterDiscontiguousSpectrogram.m. There are many differences between CreateChromosomeClusteredSpectrogram.m and this function, many having to do with the way annotation is obtained for each sequence.
winSize: width of the Short Time Fourier Transform (STFT) window
overlap: the overlap between two consecutive STFT windows
normalMethod: method used for the normalization of the pixel values of the color spectrogram
numOfMean, numOfStd: parameters for normalization
The matrix is first clustered in STFT space. This clustering is then used to reference the RGB image so that the rows (windows) can be arranged in the correct order.
Changes: The outputPath is given as an input argument to this file and describes
52. ar and choose Disconnect from Server. You should be disconnected in a few seconds.
53. ary General Public License instead. You can apply it to your programs, too. When we speak of free software, we are referring to freedom, not price. Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for this service if you wish), that you receive source code or can get it if you want it, that you can change the software or use pieces of it in new free programs; and that you know you can do these things. To protect your rights, we need to make restrictions that forbid anyone to deny you these rights or to ask you to surrender the rights. These restrictions translate to certain responsibilities for you if you distribute copies of the software, or if you modify it. For example, if you distribute copies of such a program, whether gratis or for a fee, you must give the recipients all the rights that you have. You must make sure that they, too, receive or can get the source code. And you must show them these terms so they know their rights. We protect your rights with two steps: (1) copyright the software, and (2) offer you this license which gives you legal permission to copy, distribute and/or modify the software. Also, for each author's protection and ours, we want to make certain that everyone understands that there is no warranty for this free software. If the software is modified by someone else and passed on, we want its recipients to know that what they have
54. ary summary: This program reads the DNA sequence from a FASTA file and generates a clustered RGB image of the spectrogram output. This program calls createChromosomeClusteredSpectrogram.m. These two programs deal exclusively with large contiguous sequences, i.e. whole chromosomes. The resulting image is saved to the hard disk so that it can later be processed by createClusteredSpectroVideo.m. This processing will turn the large image into an annotated movie according to the parameters in the createClusteredSpectroVideo.m file.
Author: Evan Santo, 6 5 2006, BMI
Changes: This code is adjusted so that it can be called from within Java. CreateSpectroImages.m is called in this function to create the partial images for the scrollable spectrogram in MIDAS. This .m file is rewritten as a function so that it can be called from within Java. The function call to createChromosomeClusteredSpectrogram.m is rewritten so that it takes arguments and also returns freqDims. freqDims is needed as an argument in CreateSpectroImages.m, and the call to createChromosomeClusteredSpectrogram needs the outputPath as an argument. Every output is placed in the outputFolder which MIDAS specifies. Some new arguments are added; a complete list of the arguments is given below.
Note: For more information about this code, check the original source code of Evan Santo. In these source files the f
55. [Screenshot: ClustalW output panel showing the file Filtered aln, a CLUSTAL W 1.83 multiple sequence alignment of four sequences: HUM chr19 58861245-58861444, HUM chr19 58863723-58863922, HUM chr19 57476362-57476561 and HUM chrX 53600309-53600508.]
56. bject casting. In this chapter the problems and their solutions are presented.

Java look and feel inconsistencies
MIDAS has a Windows look and feel, except for the browse dialogs. This is due to inconsistencies in the JFileChooser dialogs in Java 1.5, for which a number of bugs are known. If the browse dialogs are set to the Windows look and feel, the Java Virtual Machine crashes on a problematic frame. The solution is to switch to the Java look and feel when a browse dialog is initialized and to switch back to the Windows look and feel when the browse dialog is closed.

MATLAB object casting
MATLAB computes with variables without explicit type casts; the variables are treated as double (0.00) values. When the MATLAB scripts are converted to Java, these variables are treated as Integer (0) objects. The Spectrogram Analysis scripts perform many computations and checks on these double variables, but in the script files the objects are not cast to double. Computations and checks that work in MATLAB might therefore fail in Java when some values are treated as Integers instead of doubles. The solution is to cast every object explicitly in MATLAB, for consistency in Java.

Java Hotspot Virtual Machine crashes
If a user runs a method on the MATLAB Component Runtime and gives incorrect inputs concer
57. bserved usability test, other participants worked with MIDAS unobserved on their own desktops. Different results were obtained from the unobserved test. The feedback in this test was that MIDAS did not create the output in the way the participants were used to receiving it. There was a need for better file names and folder names: from the name of the file and the folder, the user must be able to see what kind of DNA was used to create the output and which settings were used. In general this test had a positive result, and the users were able to give detailed feedback and new ideas. Some new bugs were also found in this test which had not been found during earlier test sessions.

7.3 MIDAS Constraints
As MIDAS is in its first stage, there are still some constraints on the system. Most of these constraints can be solved by an improved solution; some require a workaround. The constraints of MIDAS are summarized below, with a workaround or improved solution described for each constraint.

The MCR (MATLAB Component Runtime) runs on its own thread. The Java Virtual Machine is not able to interrupt this thread, so premature termination of the MCR is not possible. Workaround: manually kill the process.

Locating and fetching files in MIDAS is based on predetermined, hard-coded names. Inconsistency may occur when filenames or files in one directory that MIDAS uses are manip
58. 5. Give the correct input arguments:
a. ClustalW location: location of ClustalW on the server. It is possible to Edit the location.
b. Query: the selected query input. It is possible to Show the query in Notepad.
Output Directory: shows the output directory.
GapOpen: number required for the gap open penalty.
Bootstrap: number of bootstraps.
Additional settings: give the command as in a command shell; see the ClustalW help for more.
ClustalW input example:
[Figure 9: ClustalW settings panel]
6. Press the OK button. ClustalW will run. Note: The file transfer between the server and the local machine will cause MIDAS to freeze for some time. Please be patient.
7. The output panel will be loaded with the output files (Figure 10). Note: If some mismatch occurred with the input arguments, the output panel can be empty as
59. [Figure 49: ClustalW output. The panel shows the remainder of the alignment for the HUM chr19 and HUM chrX sequences, the output file location of the ClustalW Filtered dnd file, and the server connection status.]

It is also possible to load an external Spectrogram Analysis project into MIDAS for visualization (Figure 50). The project to be loaded is subject to the constraint that all the files needed to load the visualization in MIDAS are present (see MIDAS constraints).
Load Spectrogram Analysis: The directory to be loade
60. contiguous chromosome and Figure 6 displays the task diagram for analyzing a single chromosome.
[Task diagram: Analyze Discontiguous Chr. Subtasks: analyze video annotation; prepare input (search for input file by hand, search for annotation directory by hand); start MATLAB 7; set work path by hand; ClusterDiscontiguousChr; set additional settings by hand (clustering flag, annotation flag, annotation classes, annotation type); set parameters by hand (input file, annotation directory, window size, window overlap, normal method, number of mean, number of std, operator delimiter); run cluster .m file.]
61. d tree would look like. The sequence alignment outputs are displayed on a separate output screen. Each new output screen, whether for Spectrogram Analysis or sequence alignment, is put on a new tab in MIDAS. The bioinformatician can analyze the biological data in one application just by switching between tabs.

6 Implementation
In this chapter the implementation phase is discussed. Visualizing the hierarchical clustered tree is one of the innovations of MIDAS; the implementation and the algorithms used are presented in Section 6.1. Section 6.2 describes the rewritten and new MATLAB functions. Finally, Section 6.3 discusses the problems encountered during the implementation phase and how they were solved.

6.1 Visualizing the hierarchical clustered tree
One of the most important requirements of this project is to visualize all the clusters present in a spectrogram for pattern analysis. To design and implement the visualization of the hierarchical clustering in MIDAS, it is necessary to investigate how the hierarchical clustering is constructed in the Spectrogram Analysis in MATLAB. This is explained in the survey (6.1.1). After that, the implementation of the visualization of the hierarchical clustered tree is explained step by step (6.1.2).

6.1.1 Survey
In the Spectrogram Analysis the clustering is binary and hierarchical. In this data structure each node always has two children, except the
62. [Figure 50: Load external spectrA project. The dialog asks for the directory to load, the corresponding input FASTA file and the spectrogram type.]

Other features in MIDAS for Spectrogram Analysis are creating a Discontiguous Spectrogram Video or a Chromosome Spectrogram Video and watching the videos. Sequence alignment can also be done on files which the user supplies; this can be selected in the menu bar. Sequence alignment is available for both BLAT and ClustalW. Connecting to and disconnecting from the Philips server are options in the menu bar. Help files are available under the Help menu item: help files for BLAT, ClustalW and MIDAS can be opened, and a stack trace of MIDAS can be viewed. To summarize, the features of MIDAS are:
• Run Discontiguous Spectrogram Analysis on user inputs
• Run Chromosome Spectrogram Analysis on user inputs
• Load an external Spectrogram Analysis
• Create Discontiguous Spectrogram Video
• Create Chromosome Spectrogram Video
• Watch Spectrogram Video
• Run sequence alignment (BLAT) on the output of a discontiguous Spectrogram Analysis
• Run multiple sequence alignment (ClustalW) on the output of a discontiguous Spectrogram Analysis
• Run sequence alignment (BLAT) on user's input
63. d to which cluster they belong. Next, the permutation table is created in the Spectrogram Analysis. The permutation table gives the new, clustered order of the sequence indexes in the spectrogram: for each new clustered sequence index it shows the original sequence index. With this information the visualization of a hierarchical clustered tree can be designed and implemented.

6.1.2 Hierarchical clustered tree visualization step by step
The structure of the hierarchical clustered tree is retrieved by recursively reading the cluster information backwards from the linkage table. The algorithm for this search is as follows:
1. Look up the next sequence from the Spectrogram Image, which is found in the permutation table, and remember the original sequence index.
2. Recursively find all children of this original sequence index in the linkage table and append the cluster information to the format file (pre-order depth-first search algorithm).
3. If there are more sequences, go to step 1.
4. Write the format file.
These steps are explained in more detail next.

1. Look up the next sequence from the Spectrogram Image, which is found in the permutation table, and remember the original sequence index.
[Figure: sequence index of clustered spectrogram]
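The recursive, pre-order walk over the linkage table can be illustrated with a small sketch. It is not taken from the MIDAS source: the class name `LinkageTree` and its methods are hypothetical, and the table layout is an assumption, namely the convention used by MATLAB's `linkage` function, where row i (1-based) of a table for m leaves merges two existing clusters into a new cluster with id m + i. The format file that MIDAS writes is not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the MIDAS implementation.
final class LinkageTree {

    // linkage[i] = {left, right}: the two 1-based cluster ids that are merged
    // into the new cluster (leafCount + i + 1). Ids 1..leafCount are leaves.
    static List<Integer> preOrder(int[][] linkage, int leafCount) {
        List<Integer> visited = new ArrayList<>();
        int root = leafCount + linkage.length; // the last merge is the root
        visit(root, linkage, leafCount, visited);
        return visited;
    }

    // Pre-order depth-first search: record the node, then its two children.
    private static void visit(int id, int[][] linkage, int leafCount, List<Integer> out) {
        out.add(id);
        if (id <= leafCount) {
            return; // a leaf is an original sequence index
        }
        int[] children = linkage[id - leafCount - 1];
        visit(children[0], linkage, leafCount, out);
        visit(children[1], linkage, leafCount, out);
    }

    // The leaves in visiting order give the clustered order of the
    // original sequence indexes.
    static List<Integer> leafOrder(int[][] linkage, int leafCount) {
        List<Integer> leaves = new ArrayList<>();
        for (int id : preOrder(linkage, leafCount)) {
            if (id <= leafCount) {
                leaves.add(id);
            }
        }
        return leaves;
    }

    public static void main(String[] args) {
        // Four sequences: {2,4} -> 5, {1,5} -> 6, {3,6} -> 7
        int[][] linkage = {{2, 4}, {1, 5}, {3, 6}};
        System.out.println(preOrder(linkage, 4));
        System.out.println(leafOrder(linkage, 4));
    }
}
```

In this sketch the leaf order plays the role of the permutation table, and the internal ids visited between the leaves are the clusters that a tree drawing would have to connect.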
64. e computed they are added to the clusters hashmap. If a cluster's coordinates are already computed, the next value is read. The next example explains how a line is read, with all the possible considerations. For example:
A 1356 1356 B 1323 1323 E 1274 1274 E 2247 2185 1640 1849 1640 1274 1849 1323 1356
Line E is read from right to left; the coordinates of A, B, C and D are already computed. With 1356 and 1323 as leaves that are already computed, 1849 is the first unknown cluster. The two values following it are 1323 and 1356; thus 1849 is the cluster of 1323 and 1356. Next, 1274 and 1039 are also known clusters, so we find 1640: 1640 consists of 1274 and 1039. After that we find 1849 again. This value was added to the clusters hashmap as soon as its coordinates were computed, so the next value is read. 1640 is also already known, so the next value is read. 2185 is an unknown cluster; thus 2185 consists of 1640 and 1849. The next value read, 2247, is the first value of the line. In this case cluster 2247 does NOT consist of 2185 and 1640, because the first cluster name from the left is the cluster of the sequence paired with the second cluster name on the line. Thus 2247 is the cluster of sequence E and cluster 2185. The format file in Figure 36 is a much simpler example. First the coordinates of clusters 1 and 10 are computed, followed by 16, 640, 2378 and 3016
65. e conditions. Select discontiguous analysis. Set input file path. Set annotation dir path. Set output path. Set window size. Set window overlap. Test some error input values to see what error message is given to the user. Pressed OK. Checked post-conditions.
Problems: The tree was not visualized, but no error was given by our console.
Cause: The image was too large and did not fit on the screen. The input file contained 67 sequences, while one screen can visualize at most 50 sequences. To check whether an input file contains more than 50 sequences, we divide the number of sequences by the maximum number of sequences which can be displayed. If the answer is less than or equal to one, there is one screen with at most 50 sequences. If the answer is more than one, the image is split into separate frames of 50 sequences each. In this case 67/50 should be more than one, but Java casts this answer to an Integer, which means that the answer was exactly one. The visualization of the cluster tree went wrong because MIDAS saw only one image, which was too big, where it needed two images of the right size.
Solution: We put a cast to double around the division in MATLAB and this solved the problem.
Result: Problem solved.
Remarks: After this problem we tested sequences which were less than 50 or more than 100 just to be sure that our adjustment in the code was working for all the ca
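The truncation described in this test case is easy to reproduce. The sketch below is a Java illustration of the same pitfall, not the MIDAS code (the fix reported above was made on the MATLAB side); the class `FrameCount` and its method names are hypothetical.

```java
// Hypothetical illustration of the integer-division pitfall.
final class FrameCount {
    // Limit quoted in the test case: at most 50 sequences per screen.
    static final int MAX_SEQUENCES_PER_FRAME = 50;

    // Truncating variant: int / int drops the fraction, so 67 / 50 == 1.
    static int framesTruncating(int sequences) {
        return sequences / MAX_SEQUENCES_PER_FRAME;
    }

    // Corrected variant: divide as double, then round up to whole frames.
    static int framesNeeded(int sequences) {
        return (int) Math.ceil((double) sequences / MAX_SEQUENCES_PER_FRAME);
    }

    public static void main(String[] args) {
        System.out.println(framesTruncating(67)); // 1: one frame that is too large
        System.out.println(framesNeeded(67));     // 2: two frames of the right size
    }
}
```

Rounding up after a double division also covers the boundary inputs mentioned in the remarks, such as exactly 50 or just over 100 sequences.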
66. e works based on the Program. In addition, mere aggregation of another work not based on the Program with the Program (or with a work based on the Program) on a volume of a storage or distribution medium does not bring the other work under the scope of this License. 3. You may copy and distribute the Program (or a work based on it, under Section 2) in object code or executable form under the terms of Sections 1 and 2 above provided that you also do one of the following: a) Accompany it with the complete corresponding machine-readable source code, which must be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or, b) Accompany it with a written offer, valid for at least three years, to give any third party, for a charge no more than your cost of physically performing source distribution, a complete machine-readable copy of the corresponding source code, to be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange; or, c) Accompany it with the information you received as to the offer to distribute corresponding source code. (This alternative is allowed only for noncommercial distribution and only if you received the program in object code or executable form with such an offer, in accord with Subsection b above.) The source code for a work means the preferred form of the work for making modifications to it. For an executable wo
67. ectrogram Analysis Settings panel.
3. Select Discontiguous; the default panel is the discontiguous spectrogram settings panel.
4. Give the correct input arguments on the settings panel:
a. Input file: Press Open to select an input FASTA file.
b. Annotation Directory: Press Open to select an annotation directory.
c. Output Location: Press Save to select a directory for saving your output.
d. WindowSize: window size should be an even number between 0 and 50000.
WindowOverlap: window overlap is at maximum windowSize - 1.
Normal method: select an option in the dropdown list.
Number of Mean: must be a positive number.
Number of Std: must be a positive number.
Clustering: check for clustering, uncheck for without clustering.
Annotation: check for annotation, uncheck for without annotation.
i. Annotation Classes: at least one should be checked.
ii. Annotation Types: at least one should be selected.
NOTE: The example input found_spectral_noalign.fa will run approximately 10 minutes with annotation.
5. Press OK and wait for the console to appear. The console will show the processes MIDAS is executing.
6. The output will be loaded. See the output panel for Spectrogram Analysis.

3.2 Run Chromosome Spectrogram Analysis
These steps will guide you through running a Chromosome Spectrogram Analysis.
1. In the menu bar, select Spectrogram Analysis.
2. Select Run New Spectrogram Analysis; a ne
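The input rules in step 4 can be written down as a small validation routine. This is a sketch only, not the checks MIDAS actually performs: the class `SpectrogramSettingsCheck` and its method are hypothetical, "between 0 and 50000" is assumed to exclude both bounds, and a non-negative overlap is an added assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the input rules listed in step 4.
final class SpectrogramSettingsCheck {

    // Returns one message per violated rule; an empty list means the input is valid.
    static List<String> validate(int windowSize, int windowOverlap,
                                 double numOfMean, double numOfStd) {
        List<String> errors = new ArrayList<>();
        // Assumption: both bounds are excluded.
        if (windowSize <= 0 || windowSize >= 50000 || windowSize % 2 != 0) {
            errors.add("Window size should be an even number between 0 and 50000");
        }
        // Assumption: the overlap may not be negative.
        if (windowOverlap < 0 || windowOverlap > windowSize - 1) {
            errors.add("Window overlap is at maximum windowSize - 1");
        }
        if (numOfMean <= 0) {
            errors.add("Number of Mean must be a positive number");
        }
        if (numOfStd <= 0) {
            errors.add("Number of Std must be a positive number");
        }
        return errors;
    }

    public static void main(String[] args) {
        System.out.println(validate(200, 50, 1, 1)); // valid input, no messages
        System.out.println(validate(201, 50, 1, 1)); // odd window size
    }
}
```

Collecting all messages instead of stopping at the first violation lets a settings panel report every problem in one pass.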
68. enome annotation and spectral clustering and alignment under the same application. MIDAS (Multimodal Interface for DNA Alignment of Sequences) will integrate the Spectrogram Analysis, ClustalW and BLAT into one stand-alone application and extend the application to display genomic annotation as well as several statistics obtained by data mining techniques. The user interface will cater to the needs of the bioinformatics researchers. An iterative and incremental software development approach is chosen for this project. In this approach the first phases are requirements and planning. During these phases research will be done on bioinformatics and on the requirements of MIDAS. The next phases are analysis, design and implementation of the MIDAS system. After that, testing the system and evaluating it with users will determine the improvements and new requirements for the next iteration.

The structure of this report is based on the software development process. The project definition can be found in chapter 2, which discusses the aim, background and survey, project environment and plan. With the project defined, the requirements of MIDAS are formed in chapter 3. The next phase is to analyze the system, which is described in chapter 4. In chapter 5 the design of MIDAS is presented; the architectural, technical, Java and interface design can be found in this chapter. As follows the im
69. error the bioinformaticus will be informed by an error message and he can try again.
Special requirements: None.

Use BLAT on output of Spectrogram Analysis
Actors: Bioinformaticus
Pre-conditions: The use cases Use Spectrogram Analysis and Login to Philips server must be completed.
Post-conditions: BLAT output is generated.
Basic flow:
1. Bioinformaticus sets the required and the additional settings.
2. Bioinformaticus presses the OK button.
3. BLAT output is generated.
Alternative flows: In case of a system crash, the system has to be restarted. In case of a connection error, the bioinformaticus will be informed. If the bioinformaticus pushes the cancel button, the current window will close.
Special requirements: The bioinformaticus has to have an account on the Philips server with a login name, a password and the IP address of the server.
Use case relationships: Includes Spectrogram analysis, Login to Philips server.

Use ClustalW on output of Spectrogram Analysis
Actors: Bioinformaticus
Pre-conditions: The use cases Use Spectrogram Analysis and Login to Philips server must be completed.
Post-conditions: ClustalW output is generated.
Basic flow:
1. Bioinformaticus sets the required and the additional settings.
2. Bioinformaticus presses the OK button.
3. ClustalW output is generated.
Alternative flows: In case of a system crash, the system has to be restarted. In case of a connection
70. es and directories have to exist for MIDAS in order to provide the visualization. There is no improved solution or workaround for this constraint.

Loading a project from a location with spaces in its path will cause problems when Microsoft WordPad is started to view a stack trace or annotations. Workaround: search for the files manually and open them.

Sliding through a Spectrogram Analysis with full annotation is a heavy process. There is not yet an improved solution or workaround for this constraint.

8 MIDAS
MIDAS (Multimodal Interface for DNA Alignment of Sequences) is a stand-alone application which integrates Spectrogram Analysis and sequence alignment to analyze DNA, and which also provides a visualization of patterns in spectrograms. In this chapter the features of MIDAS are introduced, along with screenshots. Examples of features are running a Spectrogram Analysis, performing sequence alignment and loading external spectrogram projects. Detailed information about how to use MIDAS and its features can be found in the MIDAS manual in Appendix D. As MIDAS is a stand-alone application, it is provided with an installer. MIDAS requires a Windows platform and a 1280x1024 screen resolution. The start screen of MIDAS is presented in Figure 38. All the features can be found in the menu bar: Spectrogram Analysis, sequence al
71. est Case: Use BLAT
Actions performed: Checked pre-conditions. Set the normal settings and additional settings. Clicked on OK button. Checked output BLAT. Checked console output. Checked post-conditions.
Problems: No problems.
Cause:
Solution:
Result: Use BLAT is tested and no errors were found.
Remarks:

Test Case: Use ClustalW
Actions performed: Checked pre-conditions. Set the normal settings and additional settings. Clicked on OK button. Checked output ClustalW. Checked console output. Checked post-conditions.
Problems: No problems.
Cause:
Solution:
Result: Use ClustalW is tested and no errors were found.
Remarks:

Test Case: Load existing project
Actions performed: Checked pre-conditions. Selected the folder where the output files of the existing project are located. Selected the input file which was used to create the existing project we want to display. Checked console output. Checked post-conditions.
Problems: No problems.
Cause:
Solution:
Result: Load existing project is tested and no errors were found.
Remarks:

Test Case: Run BLAT without existing project
Actions performed: Checked pre-conditions. Set the normal settings and additional settings. Clicked on OK button. Checked output BLAT. Checked console output. Checked post-conditions.
Problems: No problems.
Cause:
Solution:
Result: Run BLAT is tested and no erro
72. f PIC Bangalore we learned that it is very important not to lose preliminary work. Since many students come and go, their work has to be well documented so that it can be continued by other employees or new trainees. We documented MIDAS in such a way that a new programmer does not have to reinvent the wheel. Being a computer science student does not mean that you are familiar with the bioinformatics field. During the development of MIDAS we created an interface for bioinformaticians which has to fit their needs. We investigated this field for three months and saw some interesting possibilities to combine computer science with biological knowledge. Being a trainee in India certainly brings some other challenges unrelated to work and education, such as differences in culture, food, hygiene, music and weather conditions. We learned that India is a country of extremes: warm and cold, spicy and sweet, rich and poor, dry and rainy, loud or very silent. Finishing your Bachelor with a Bachelor Project at Philips Bangalore in India certainly presented us with a lot of challenges. After completing Project MIDAS during a three-month internship, we can look back on our work at Philips and our stay in India as something which broadened our view of the world, further educated us in the world of computer science and bioinformatics, gave us the opportunity to put our pre-training into practice, and gave us good memories which will last for a
73. Spectrogram Analysis
Spectrogram Analysis is developed in MATLAB at Philips Innovation Campus by Evan Santo. It is a program composed of different executable script files which produces a Spectrogram Analysis of genomes and also outputs a Spectrogram Video. With this technique, frequency-domain analysis is performed on the genomes using tricolor spectrograms, identifying several types of distinct visual patterns that characterize specific DNA regions. The patterns and their frequency characteristics are related to the sequence characteristics of DNA. Biologically meaningful patterns are found through this method.

For MIDAS, the BLAT and ClustalW software are located on an internal Linux server. This prevents inaccessibility when the Internet is unavailable, and the tools run faster on an internal server. Thus the application needs to communicate with the server in order to run BLAT or ClustalW. Spectrogram Analysis was rewritten to make it compatible with conversion into Java classes. The conversion of MATLAB code into Java classes has become easier with a new MATLAB tool called MATLAB Builder for Java. Below we discuss all the plug-ins, libraries and tools which are needed.

MATLAB Compiler 4.6
MATLAB Compiler lets you automatically convert your own MATLAB programs into self-contained applications and software components and share them with end users. Applications and components created using MATLAB Compiler do not require MATLAB a
74. f Spectrogram Analysis, which was created by Evan Santo. Spectrogram Analysis is a powerful tool for analyzing the composition and harmonic properties of genomic sequences. The first step in Spectrogram Analysis is converting the DNA sequences into binary indicator sequences. The next step is to apply Fourier transforms to obtain the frequency spectrum of each base. The magnitude of a frequency component reveals how strongly a certain pattern of the nucleotide base is repeated at that frequency. To improve the readability of the results, each nucleotide base is represented by a color. By combining the frequency spaces of the four bases, a color spectrogram is obtained. This color spectrogram visualizes the composition and harmonic properties of a sequence. Spectrogram Analysis allows for the detection of large and imperfect repetitive elements in the genomes that are very hard to search for with current sequence-space methods. With the Spectrogram Analysis it is possible to systematically analyze the whole genome and to analyze across species by considering the frequency spectrum. Hierarchical clustering based on Euclidean distances is applied to the Spectrogram Analysis. The clustering is done because the major drawback of DNA Spectrogram Analysis is the reliance on human perception and memory for the detection of complex patterns, which is impractical for whole chromosomes. Thus clustered spectrogram images allow the extraction of biological meanin
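The first two steps described above, binary indicator sequences followed by a Fourier transform, can be shown in a few lines. This is a minimal sketch, not the Spectrogram Analysis code: the class `IndicatorSpectrum` and its methods are hypothetical, and it evaluates a single discrete Fourier transform coefficient over one window, without the sliding STFT windows, normalization or color mapping of the real tool.

```java
// Hypothetical sketch of indicator sequences and one DFT coefficient.
final class IndicatorSpectrum {

    // Binary indicator sequence: 1.0 where the DNA string has the given base, else 0.0.
    static double[] indicator(String dna, char base) {
        double[] x = new double[dna.length()];
        for (int n = 0; n < dna.length(); n++) {
            x[n] = dna.charAt(n) == base ? 1.0 : 0.0;
        }
        return x;
    }

    // Magnitude of DFT coefficient k: |sum over n of x[n] * exp(-2*pi*i*k*n/N)|.
    static double dftMagnitude(double[] x, int k) {
        double re = 0.0;
        double im = 0.0;
        for (int n = 0; n < x.length; n++) {
            double angle = -2.0 * Math.PI * k * n / x.length;
            re += x[n] * Math.cos(angle);
            im += x[n] * Math.sin(angle);
        }
        return Math.hypot(re, im);
    }

    public static void main(String[] args) {
        // 'A' recurs every 3 bases, so with N = 9 the energy sits at k = N / 3 = 3.
        double[] a = indicator("ATGATGATG", 'A');
        System.out.println(dftMagnitude(a, 3)); // approximately 3
        System.out.println(dftMagnitude(a, 1)); // approximately 0
    }
}
```

A base that recurs with period p in a window of length N produces a large magnitude at k = N / p, which is the kind of repetition a spectrogram makes visible.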
75. f information mining, DNA sequence alignment, finding important motifs, and methods for result visualization and clustering. Important aspects cover pulling in information from multiple modalities of measuring processes in gene regulation (transcriptomic data, proteomic data, gene copy number polymorphisms, DNA-protein interactions) in one unified framework. Currently these modalities are all investigated separately. It can be expected that the problem of multimodality will be tackled in order to solve the problem of causality in molecular events related to gene expression.

2.3 Project description
Currently, sequence alignment tools like BLAST and ClustalW are not integrated with the Spectrogram Analysis into one application. This makes searching the genomes a time-consuming, inefficient and expensive process. Our tool has to:
• Integrate the Spectrogram Analysis and sequence alignment tools into one application and extend the application to display genomic annotation as well as several statistics obtained by data mining techniques
• Make the bioinformatics work less time-consuming
• Make the bioinformatics work efficient
• Make the bioinformatics work less expensive
• Be a standalone executable
• Have a user interface which will cater to the needs of the bioinformatics researchers
• Be tested thoroughly
• Be thoroughly tested during a usability test
• Be supported with a programmer's manual and a user's manual
The name for this project wil
76. ftware because it is compiled into a standalone. There was a strong collaboration between the developers of MIDAS and the bioinformatician, so MIDAS could be designed to fit the bioinformatician's needs. During the development of MIDAS several tests were performed on the code and on the interface. A usability test was done and the feedback was used to redesign and improve the system. The result of the usability test was very positive, which confirmed that the bioinformatician enjoyed using MIDAS. To make sure that MIDAS can be used by everyone and even be extended by programmers, two manuals were written: the Programmers Manual and the User Manual. In the last evaluation phase of this project, and during the end presentation for the research group of PIC Bangalore, a lot of new ideas for extensions came up. To summarize these new requirements and their priorities, the MoSCoW analysis presents the new list of requirements for MIDASv2.0. We recommend that MIDAS be extended to give the bioinformatician more tools, information and statistics to analyze DNA. MIDASv1.0 is only the beginning.
12 Literature
Scientific papers
N. Dimitrova, Y. H. Cheung, M. Q. Zhang, Analysis and Visualization of DNA Spectrograms: Open Possibilities for the Bioinformatics Research, ACM MM 2006, Oct. 25, 2006, pp. 1017-1024
N. Dimitrova, E. Santo, Improvement of spectral analysis as a
77. ftware development. Testing and Evaluation allows the developers to analyze their application and improve it if necessary. Section 7.1 describes the testing phase, Section 7.2 describes the usability tests, and Section 7.3 lists the constraints of MIDAS.
7.1 Testing
Software testing is the process used to measure the quality of a system. Aspects such as correctness, completeness, security, capability, reliability, efficiency, maintainability, compatibility and usability of the system are important during the testing phase. Testing in this project is done by GUI testing and source code testing. The source code is tested with our own test functions and with the JUnit regression testing framework. The following detailed test cases can be found in Appendix C:
- Input Chromosome for analysis
- Input Discontiguous DNA for analysis
- Use Spectrogram Analysis
- Create Spectrogram Video
- Watch Spectrogram Video
- Overview all present clustering DNA
- Login to Philips server
- Use BLAT
- Use ClustalW
- Load existing project
- Run BLAT without existing project
- Run ClustalW without existing project
- Disconnect from Philips Server
Source code testing was used continuously during the development of MIDAS. The major quality aspects of the integration of different tools in one application are reliability and correctness. For example, in MIDAS part of the output of the Spectrogram Analysis is used as input for ClustalW. It is necessary that the part of the output
78. g Spectrogram Analysis is powerful due to the use of binary indicators instead of nucleotide base symbols. By using binary indicators and converting them into the Fourier domain, all kinds of signal processing techniques can be used to analyze these signals. Computations on these signals are less heavy than computations on long strings of nucleotide base symbols. Thus Spectrogram Analysis is able to analyze very long sequences, up to whole genomes.
References
N. Dimitrova, Y. H. Cheung, M. Q. Zhang, Analysis and Visualization of DNA Spectrograms: Open Possibilities for the Bioinformatics Research, ACM MM 2006, Oct. 25, 2006, pp. 1017-1024
N. Dimitrova, E. Santo, Improvement of spectral analysis as a genomic analysis tool
4.2 User Analysis
In the analysis phase it is important to understand the user of your product. A complete understanding of your user contributes to the quality of your end product. The first step in a user analysis is to form a user profile (Section 4.2.1). After forming this profile, the user was analyzed in the current situation (Section 4.2.2) and in the improved situation (Section 4.2.3), the situation after project MIDAS is completed. Section 4.2.4 contains the workflow diagram, which presents the workflow in the new, improved situation.
4.2.1 User profile
A well-formed user profile will contribute to the success of MIDAS. When the developer
79. genomic analysis tool
Jacques Cohen, Bioinformatics: An Introduction for Computer Scientists, Brandeis University, MA 02454
MATLAB tutorials
These can be found on the website of MathWorks, http://www.mathworks.com
- MATLAB Builder for Java 1.0
- MATLAB Compiler 4.0
- MATLAB Statistics Toolbox
Other resources
Wikipedia, http://www.wikipedia.com
Java API, http://java.sun.com/j2se/1.5.0/docs/api
Appendix A
License Ganymed SSH-2
Copyright (c) 2005-2006 Swiss Federal Institute of Technology (ETH Zurich), Department of Computer Science (http://www.inf.ethz.ch), Christian Plattner. All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
a.) Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
b.) Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
c.) Neither the name of ETH Zurich nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRA
80. [Screenshot: BLAT output panel showing aligned sequences for HUM chr12, a Write to File button, and the status bar with Hostname, Login and Authenticated YES]
Figure 48 Write to file interface
The output of ClustalW is presented in Figure 49. On this panel the given input is also displayed, by presenting the partial image and the annotations. The output of ClustalW is appended to the panel. It is also possible to write the ClustalW output to a file on the desktop PC.
[Screenshot: MIDAS ClustalW Output tab with the Partial Image, the Multiple Sequence Alignment Input annotations, and the Multiple Sequence Alignment Outputs with output file location]
81. gure 21, and third, Figure 22, are sequence diagrams of running sequence alignment in BLAT and ClustalW, respectively. In the sequence diagrams of run BLAT and run ClustalW two objects were left out, to save space and to keep the sequence diagrams clear. These two objects are:
- LoginGUI uses the class ServerConnection for connection to the server
- BLATExecutables extends the class ServerExecutables
- ClustalWExecutables extends the class ServerExecutables
[Figure 20: Sequence diagram, run Spectrogram Analysis]
[Figure 21: Sequence diagram, run BLAT]
82. h
- Set start position
- Set orientation
- Set clustering flag
- Set annotation flag
- Set window size
- Set window overlap
- Test some erroneous input values to see what error message is given to the user
- Pressed OK
- Checked post-conditions
The tree was not visualized, but no error was given by our console. The image was too large and did not fit on the screen. The input file contained 67 sequences. On one screen you can visualize at most 50 sequences. To check whether an input file contains more than 50 sequences, we divide the number of sequences by the maximum number of sequences that can be displayed. If the answer is less than or equal to one, there is one screen with at most 50 sequences. If the answer is more than one, the images are split into separate frames of 50 sequences each. In this case 67 / 50 should be more than one, but Java casts this answer to an Integer, which means that the answer was exactly one. The visualization of the cluster tree went wrong because MIDAS saw only one image, which was too big, where it needed two images of the right size. We put a cast to double around the division in MATLAB and this solved the problem.
Problem solved. After this problem we tested inputs with fewer than 50 and more than 100 sequences, just to be sure that our adjustment in the code was working for all cases. After these tests everything seemed to work fine.
Test Case: Input Discontiguous DNA for analysis
Actions performed: Checked pr
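The frame-count problem in this test report can be reproduced in a few lines. The sketch below is written in Python rather than in the Java or MATLAB code of MIDAS, and the function names are hypothetical; it only contrasts a truncating integer division with a floating-point division that is rounded up.

```python
import math

MAX_PER_FRAME = 50  # maximum number of sequences per screen, as stated in the text

def frames_truncating(n_sequences, max_per_frame=MAX_PER_FRAME):
    """Buggy variant: integer division drops the remainder."""
    return n_sequences // max_per_frame

def frames_needed(n_sequences, max_per_frame=MAX_PER_FRAME):
    """Fixed variant: divide as floating point, then round up (at least one frame)."""
    return max(1, math.ceil(n_sequences / max_per_frame))

# The case from the test report: 67 sequences need two frames of at most 50.
buggy = frames_truncating(67)   # 1, so the tree is drawn for a single oversized image
fixed = frames_needed(67)       # 2
```

Checking inputs on both sides of a multiple of 50 (for example 50, 51, 100 and 101 sequences) exercises the boundary cases that the follow-up tests in the report were aiming at.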
83. he process which is described in this paragraph.
4.1 Run BLAT
1. Press sequence alignment in order to start sequence alignment. Two buttons will appear: Run BLAT and Run ClustalW.
2. Select an annotation for BLAT in the Annotation list.
3. Press Run BLAT.
4. Login will appear if you are not logged on to the server yet. You need to have an account for the Philips Linux Server.
Note: The file transfer between the server and the local machine will cause MIDAS to freeze for some time. Please be patient.
5. Give the correct input arguments:
- BLAT location: location of BLAT on the server. There is a possibility to change the location.
- Query: the selected query input. There is a possibility to Show the query in Notepad.
- Database: the database is on the Linux server. Give the correct database.
- Output file: set name and extension.
- Database Type: select the database type.
- Query Type: select the query type.
- Additional settings: give commands as in a command shell (see BLAT help for more information on the options).
BLAT input example
[Screenshot: BLAT settings panel with Input/Output fields (BLAT location, Query, Database, Output file), Type options (Database Type, Query Type) and Additional Options, with the note to refer to the BLAT help file for these options]
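The input arguments listed in step 5 map onto a BLAT command line of the form "blat database query [options] output". The sketch below assembles such a command in Python; it is an illustration only, not the code MIDAS uses on the server. The function name and all paths are hypothetical, while -t, -q, -out and -minScore are standard BLAT options.

```python
import shlex

def build_blat_command(blat_path, database, query, output,
                       db_type="dna", query_type="dna", extra_options=""):
    """Assemble the argument list for one BLAT run.

    extra_options is a free-form string, typed as in a command shell,
    and is split into separate arguments.
    """
    argv = [blat_path, database, query, f"-t={db_type}", f"-q={query_type}"]
    argv += shlex.split(extra_options)
    argv.append(output)
    return argv

# Hypothetical example values; real locations depend on the server setup.
cmd = build_blat_command(
    "/opt/blat/blat", "chr13.fa", "fore.fa", "out.psl",
    extra_options="-out=blast8 -minScore=30",
)
```

Keeping the command as a list of arguments, instead of one concatenated string, avoids quoting problems when a path contains spaces.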
84. help file editor (.hlp). SHM is your editor, and you can produce the help file with SHM and the enclosed compiler.
http://www.fileflash.com/program/2348
License: Freeware
In order to make MIDAS a stand-alone application, more software tools were used. One of these tools is Fat Jar, a special deployment tool for Eclipse. Another tool is used to create a Windows installer. These software tools are described below.
Fat Jar
The Fat Jar Eclipse Plug-In is a deployment tool which deploys an Eclipse Java project into one executable jar. In addition to the Eclipse standard jar exporter, referenced classes and jars are included in the Fat Jar, so the resulting jar contains all needed classes and can be executed directly with "java -jar"; no class path has to be set and no additional jars have to be deployed. Jars, External Jars, User Libraries, System Libraries, Classes Folders and Project Exports are considered by the plug-in. The Main-Class can be selected and Manifest files are merged. The One-JAR option integrates a specialized Class Loader written by Simon Tuffs (http://one-jar.sourceforge.net), which handles jar files inside a jar. Individual files and folders can be excluded or added to the jar.
http://fjep.sourceforge.net
License: GNU General Public License (GNU GPL), see Appendix B
Setup Editor v2.1.0.33
Setup will create very small Windows insta
85. hrough the whole Spectrogram Analysis output. The sequence alignment results are displayed next to the spectrogram image. The disadvantage is that combining all this on one screen makes the output panel very busy.
Figure 24 Storyboard 1
After a meeting where the requirements and design were discussed, it became clear that it is not a priority to have the sequence alignment results and the Spectrogram Analysis output on one screen. Instead, a visualization of the patterns in a spectrogram image has a higher priority. The patterns can be visualized as a hierarchical clustered tree. The issue was how to display the clustering: by colored lines, with each color representing one hierarchy, as in Figure 25, or by drawing a hierarchical tree, as in Figure 26.
Figure 25 Storyboard 2
Figure 26 Storyboard 3
The final decision was to visualize the hierarchical clustered tree as a hierarchical tree. In the hierarchical approach it is clearer which sequence belongs to which clusters than in the color-based approach. Also, the hierarchical tree is closer to the reality of how a hierarchical clustere
86. icancy of the alignment
Main interface show background: Create a background for MIDAS to decorate the design.
Scale images horizontal: When images become too big for the interface, scale them horizontally.
System failure handling: Saved data will not be lost.
System is fast: The code must be optimized for speed.
Deployable: Create a deployable version of MIDAS.
Memory check: Check memory before running a project and give a warning when there is not enough memory.
Reliable: System output cannot have errors; statistics must be correct.
Save project: Possibility to save your project.
Feedback
Make the cluster tree clickable so that the user can interactively select a cluster.
Use correct names in panels: Make sure that all the names are biologically correct.
Spectrovideo name: The video name should be the same as the input file, with the corresponding window size and overlap.
Set a default database when running BLAT, based on the selection from the user.
Window based menu: Build up the menu like Windows; use new, open, save, save as, etc.
Ranges of parameters in panel: Show the ranges of the parameters in the panel so that the bioinformaticus knows what to do.
Back button for changes: Create a back button to reset the project and run it again with new settings.
Create separate panels for BLAT and ClustalW.
Output folder should contain the window size, overlap and input file name.
87. ignment, Connection to and from server, and view help files.
[Screenshot: MIDAS start screen with the menu bar (Spectrogram Analysis, Sequence Alignment, Connection, Help), the title "Midas, Multimodal Interface for DNA Alignment of Sequences, by Melissa Cheung, Paul van den Haak & Nevenka Dimitrova", and the status bar Hostname NONE, Login NONE, Authenticated NO]
Figure 38 MIDAS start screen
Spectrogram Analysis can be started from the menu bar of the start screen. If this option is selected, the Spectrogram Analysis settings will be loaded (Figure 39, Figure 40). It is possible to run a Discontiguous or a Chromosome Spectrogram Analysis. In this settings panel all the input arguments can be given as in the original Spectrogram Analysis.
[Screenshot: Discontiguous Spectrogram Analysis settings panel with Input/Output (input file, annotation directory, output location), Parameter Settings (Windowsize 200, Windowoverlap 50, Normal method, Number of Mean, Number of Std) and Additional Settings (Clustering, Annotation, Annotation Classes: T transcriptional, G genomic, E epigenomic; Annotation Types, hold the Ctrl button to select multiple types)]
Figure 39 Discontiguous Spectrogram Analysis
[Screenshot: Spectrogram Analysis Settings, Chromosome Analys
88. igns multiple sequences. ClustalW can be installed on Windows, Linux and MacOS. For this project the sequence alignment tools and the genome database are accessible from an internal Philips Linux Server. The system contains the following layers (Figure 10):
1. Graphical User Interface
2. Matlab Clustered Tree, Matlab Spectrogram Analysis, BLAT sequence alignment, ClustalW sequence alignment
3. Database connection
Figure 10 Layer overview
1. The Graphical User Interface is a user-friendly environment to control the application. The user is able to run Spectrogram Analysis and sequence alignment and to see a visualization of the patterns of the spectrogram.
2. This layer contains all the mathematical functionality.
3. The BLAT program connects to a database to perform the sequence analysis.
5.1.3 Major design issues
During the conceptual phase two major issues came up. The first issue was to decide in which programming language the Graphical User Interface should be implemented. After deciding on the programming language, the second issue was where to put the main control of the program.
Graphical User Interface programming language
The major issue for designing and implementing the Graphical User Interface is deciding which programming language is the best option. To decide on the programming language, aspects like performance, client-server connections, error handling, platform dependencies
89. index of original spectrogram
Figure 29 Permutation table
Figure 29 is an example of a permutation table. The column indexes correspond to the sequence index in the clustered spectrogram. The first row contains the original sequence index for each column, that is, for each clustered sequence index. The sequence index of the original sequence is used in the linkage table to construct the binary hierarchical tree. This leads to step 2.
2. Find all children of this original sequence index in the linkage table and append the cluster information to the format file, using a pre-order depth-first search algorithm.
[Figure 30 columns: two columns with the sequence index of the original spectrogram (if index > max(n sequences), then index - max = cluster name) and a Distance column; each row is one hierarchical cluster]
Figure 30 Linkage table
In the linkage table the paired-up sequences, clusters, or sequence with cluster can be found (see the survey of this chapter for the definition of the linkage table). Clusters are represented by their cluster name (see Figure 30). The original sequence index is looked up in the linkage table, together with the corresponding node C with which it is combined into a cluster. This corresponding node C can be another sequence, which makes this cluster a leaf. If the corresponding node C is ano
90. is
[Screenshot: Chromosome Analysis settings panel with Input/Output (input file Methylated_Cancer_UNmethylated_Normal.fa, input annotation file, output location), Parameter Settings (Start Position, Orientation, Windowsize 200, Windowoverlap 50, Normal method, Number of Mean, Number of Std) and Additional Settings (Clustering, Annotation, Annotation Classes: T transcriptional, G genomic, E epigenomic; Annotation Types, hold the Ctrl button to select multiple types)]
Figure 40 Chromosome Spectrogram Analysis
The output of a Chromosome Spectrogram Analysis is presented in Figure 41. The annotation overview shows the annotations next to the corresponding spectrogram image; if the spectrogram is clustered, the hierarchical clustered tree is drawn as well. The slider can be used to scroll through all the images of the Spectrogram Analysis. The output of a discontiguous Spectrogram Analysis is presented in Figure 42.
[Screenshot: MIDAS Spectrogram Analysis output tab for Methylated_Cancer_UNmethylated_Normal.fa with zinc finger annotations]
91. l be MIDAS: Multimodal Interface for DNA Alignment of Sequences.
2.4 Project Environment
Project Start Date: April 10, 2007
Project End Date: June 29, 2007
Philips Research Internship Project
Students: Melissa Yuen Shan Cheung (myscheung@gmail.com), Wijnand Paul van den Haak (paulvandenhaak@gmail.com), University of Technology Delft
Mentor: Dr. Nevenka Dimitrova (Nevenka.Dimitrova@philips.com), Reliable Care Solutions Department
Philips Research India - Bangalore (PRI-B), Philips Innovation Campus, MFAR Manyata Tech Park, Manyata Nagar, Nagavara, Bangalore 560 045, INDIA
2.5 Project Plan
The software engineering technique used in this project is iterative and incremental development. In this process the software system is developed incrementally, which allows the developer to take advantage of what was learned during the development of earlier incremental, deliverable versions of the system. The advantages come from both the development and the use of the system. The life cycle of iterative and incremental development is displayed in Figure 1.
[Figure 1 labels: Initial Planning, Planning, Requirements, Analysis & Design, Implementation, Deployment, Testing, Evaluation]
Figure 1 Life cycle iterative and incremental software development process
A Gantt chart is a chart that illustrates a project schedule. It shows the start and finish dates of the tasks. In this vi
92. leaf nodes. Constructing the binary hierarchical clustered tree is based on Euclidean distances. For computing the Euclidean distances, the MATLAB function pdist(X, 'euclidean') is used.
Syntax
Y = pdist(X)
Y = pdist(X, distance)
Y = pdist(X, @distfun)
Y = pdist(X, 'minkowski', p)
Description
Y = pdist(X) computes the Euclidean distance between pairs of objects in the n-by-p data matrix X. Rows of X correspond to observations; columns correspond to variables. Y is a row vector of length n(n-1)/2, corresponding to the n(n-1)/2 pairs of observations in X. The distances are arranged in the order (1,2), (1,3), ..., (1,n), (2,3), ..., (2,n), ..., (n-1,n). Y is commonly used as a dissimilarity matrix in clustering or multidimensional scaling.
To save space and computation time, Y is formatted as a vector. However, you can convert this vector into a square matrix using the squareform function, so that element i,j in the matrix, where i < j, corresponds to the distance between objects i and j in the original data set.
Y = pdist(X, distance) computes the distance between objects in the data matrix X using the method specified by distance, where distance can be any of the following character strings that identify ways to compute the distance:
'euclidean': Euclidean distance (default)
Figure 27 Syntax pdist MATLAB
The Euclidean distance is computed between two sequences of the Spectrogram Analysis. For N sequences there are N(N-1)/2 distances. Based
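The condensed distance vector that pdist returns can be mimicked in a few lines. The following Python sketch is not the MATLAB implementation; the function names are hypothetical and the observations are plain tuples. It produces the N(N-1)/2 Euclidean distances in the pair order (1,2), (1,3), ..., (N-1,N) and shows how to locate one pair in that vector.

```python
import math
from itertools import combinations

def pdist_euclidean(rows):
    """Euclidean distances for all pairs of rows, in the order
    (1,2), (1,3), ..., (1,n), (2,3), ..., (n-1,n)."""
    return [math.dist(a, b) for a, b in combinations(rows, 2)]

def condensed_index(i, j, n):
    """Position of pair (i, j) in the condensed vector (0-based, i < j)."""
    return i * n - i * (i + 1) // 2 + (j - i - 1)

# Hypothetical toy data: four observations with two variables each.
rows = [(0, 0), (3, 4), (6, 8), (0, 1)]
distances = pdist_euclidean(rows)   # 4 * 3 / 2 = 6 values
```

With four observations the vector holds six distances; condensed_index(1, 2, 4) gives the position of the distance between the second and third observation without building the square matrix.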
93. llations easily and quickly without all the overhead of the big tools. Perfect for emailing a few files to your friends or distributing very small applications without the end user having to stress over how to install and run them. It has a full script editor to make your life even easier, a command to create Windows shortcuts and a tool to quickly create self-extracting zip files. Optional compression is now fully integrated.
http://www.glenn.delahoy.com/software/index.shtml
License: Freeware
5.2.2 Environment and Component Models
To illustrate and clarify the relations between the components of MIDAS in their execution environment, the component diagram and the deployment diagram of MIDAS were created. The main purpose of the component diagram is to show the structural relationships between the components of a system. In Figure 13 the MIDAS component is related to all the other components. The user commands the Graphical User Interfaces of the Spectrogram Analysis, Hierarchical Clustered Tree, ClustalW and BLAT, and can use these GUIs to input the preferred arguments. The outputs of these computations are visualized in the MIDAS component.
[Component diagram labels: SpectroAnalysis, Hierarchical Clustered Tree, MIDAS]
Figure
94. lose.
Special requirements: none
Use Case relationships: includes Spectrogram Analysis

Input Discontiguous DNA for analysis
Actors: Bioinformaticus
Pre-conditions: Bioinformaticus started MIDAS and must have valid Discontiguous DNA input (DNA sequence in FASTA format).
Post-conditions: After loading the Multiple DNA input, the system has read this input and the additional settings and is ready to run Spectrogram Analysis.
Basic flow:
1. Bioinformaticus selects Run new Spectrogram Settings.
2. Bioinformaticus selects Discontiguous Analysis.
3. Bioinformaticus inputs a valid Discontiguous FA file.
4. Bioinformaticus sets the required settings.
5. Bioinformaticus selects the OK button in order to run Spectrogram Analysis.
Alternative flows: If the input DNA sequence does not have the FASTA extension, or other required settings are not set correctly, an error is given to the bioinformaticus, who then has the possibility to correct the input for the system. If the bioinformaticus pushes the cancel button, the current window will close.
Special requirements: none
Use Case relationships: includes Use Spectrogram Analysis

Use Spectrogram Analysis
Actors: Bioinformaticus
Pre-conditions: Use case Input Chromosome for analysis or Input Discontiguous DNA for analysis must be completed first.
Post-conditions: The Spectrogram Analysis has completed.
Basic flow:
1. Bioinformaticus is aware that Spectrogram Analysis is
95. lustering: check for clustering, uncheck for without clustering
Annotation: check for annotation, uncheck for without annotation
i. Annotation Classes: at least one should be checked
Annotation Types: at least one should be selected
NOTE: Example input Methylated_Cancer_UNmethylated_Normal.fa will run approximately 10 minutes with annotation.
5. Press OK and wait for the console to appear. The console will show the processes MIDAS is executing.
6. The output will be loaded. See the output panel for Spectrogram Analysis (Section 3.3).
3.3 Output panel from Spectrogram Analysis
After you have completed a Spectrogram Analysis, you can analyze the output. MIDAS works with tabbed panels. The output panel for Spectrogram Analysis contains the content of the Annotation file, the Spectrogram Images, a clustered hierarchical tree and a slider. If the analysis is a Discontiguous Spectrogram Analysis, you can push the sequence alignment button to make a BLAT/ClustalW panel visible (figure 3). This feature is not yet available for Chromosome Spectrogram Analysis, so only the annotation, the spectrogram images and the cluster tree are visible there (figure 4).
[Screenshot: MIDAS output tab for smallRNAs_200_50 with the annotation list (HUM chr12, chr22, chr19 entries)]
96. mage. For sequence 1 the original sequence index is 3196 (Figure 29). By using the search method in the linkage table for the first sequence, the binary tree in Figure 33 is constructed. Sequence 3196 is paired up with 3202 in the linkage table, with cluster name 1 (row 1). The cluster name found is appended to the first line of the format file. Thus the first line contains the cluster name 1.
Figure 33 Tree sequence 1
For sequence 2 the original sequence index is 3202 (Figure 29). This cluster was already found for sequence 1; sequence 1 and sequence 2 form the cluster named 1. Thus the second line in the format file also contains the cluster name 1.
Sequence 3 has the original index 3175 (Figure 29). Using the search method, the corresponding node is found: 3284, and 3284 > N = 3274. This means that this sequence is paired up with another cluster: 3284 - 3274 = 10. The name of the cluster with which this sequence is paired up is 10. The next step is to go to row number 10 (rows are cluster names), where the nodes of this cluster are found. Cluster 10 is a leaf (3181, 3187). The information appended to the format file is 16 and 10. Thus the search ends here for sequence 3 and the following tree is found (Figure 34).
Figure 34 Tree sequence 3
Steps 1 and 2 are also performed for sequences 4, 5, 6 and 7. Sequences 4 and 5 make cluster 10, part of Figure 33. For se
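The lookup used in this worked example can be sketched as below. This is a hedged Python illustration, not the MIDAS or MATLAB code: the function names and the toy linkage table are hypothetical, and only the rule stated in the text is taken from the source (an index above the number of sequences N denotes a cluster whose name is the index minus N, and row numbers are cluster names).

```python
def cluster_name(index, n_sequences):
    """Indexes above n_sequences denote clusters; their name is index - n_sequences.
    Returns None when the index is an original sequence."""
    return index - n_sequences if index > n_sequences else None

def find_pairing(linkage, seq_index, n_sequences):
    """Find the linkage row that contains seq_index.

    Returns (row_name, partner_index, partner_cluster_name), where row_name is
    the 1-based row number, which is also the name of the cluster formed there.
    """
    for row_name, (left, right, _distance) in enumerate(linkage, start=1):
        if seq_index in (left, right):
            partner = right if seq_index == left else left
            return row_name, partner, cluster_name(partner, n_sequences)
    raise ValueError(f"sequence index {seq_index} not found in linkage table")

# Hypothetical toy table for N = 4 sequences: (index, index, distance) per row.
# Row 1 joins sequences 1 and 2; row 2 joins cluster 1 (index 5) with sequence 3.
toy_linkage = [(1, 2, 0.5), (5, 3, 0.9), (6, 4, 1.4)]
row, partner, partner_cluster = find_pairing(toy_linkage, 3, 4)
```

In the toy table sequence 3 is found in row 2 and is paired with index 5, which is cluster 1. This mirrors the case of sequence 3 in the text, where the partner index 3284 exceeds N = 3274 and therefore refers to cluster 10.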
97. manually; MIDAS performs this task. Figure 7 gives an impression of the tasks which must be performed by the user in the improved situation when analyzing a discontiguous chromosome. Figure 8 displays analyzing a single chromosome.
Analyze Discontiguous Chr
- Analyze video annotation
  - Get settings from user
    - Input/output settings: Input file (.fa), Annotation directory, Output location
    - Parameter settings: Window size, Window overlap, Normal method, Number of mean, Number of Std
    - Additional settings: Clustering flag, Annotation flag, Annotation classes, Annotation types
  - Show scrollable video
  - Show annotation
  - Show clusters
- Analyze alignment
  - Get input from user for BLAT
    - Login to Linux server
      - Get user information: Host, Username, Password
    - Get settings from user
      - Input/output settings: BLAT location on server, Query file (.fa), Database file (.fa), Output file, Format of output file
      - Type options: Database type, Query type
      - Additional options: Additional se
98. men's Health, Cross-model Bioinformatics for cancer care. This project aims to develop a tool for Bioinformatics. Section 3.1 discusses the current situation. Section 3.2 presents the system description, which includes the survey, functional and non-functional requirements, constraints and system models. The system models include use case models and object models.
3.1 Current situation
Bioinformatics refers to the creation and advancement of algorithms, computational and statistical techniques, and theory to solve formal and practical problems inspired from the management and analysis of biological data. Major research efforts in the field include sequence alignment, gene finding, genome assembly, protein structure alignment, protein structure prediction, prediction of gene expression and protein-protein interactions, and the modelling of evolution (Wikipedia).
Different software tools, each performing specific tasks, are used in bioinformatics. The best-known software is BLAST/BLAT. BLAST/BLAT is an algorithm for sequence alignment that searches large databases of protein or DNA sequences. The NCBI provides a popular web-based implementation that searches their massive sequence databases. For multiple sequence alignment of DNA or proteins another tool is used: ClustalW. ClustalW produces biologically meaningful multiple sequence alignments of divergent sequences. It calculates the best match for the selected sequences and lines them
99. mmandLine
    - Check when function is ready
  - Inspect output by hand
    - Search for output directory
    - Inspect files
- CreateSpectroVideo
  - Set settings by hand: Image path, Annotation path, Video path, Moviefile name
  - Run video m-file by hand
    - Call function on commandLine
    - Check when function is ready
  - Inspect output by hand
    - Search for output directory
    - Search for video
    - Start video
    - Set pause, stop, play
    - Close MediaPlayer
    - Scroll to region of interest
- Close Matlab
Analyze alignment
- Prepare input
  - BLAT input
    - Check spectro image
    - Search and start movie
    - Specify frame number
    - Specify lines
    - Get sequence from Spectra
  - Prepare files
    - Run perl program
    - Obtain fore.fa and back.fa
- Copy files to Linux server
  - Run some ftp program
  - Login
  - put fore.fa, input.fa
- Run BLAT
  - Connect to Linux server: Start telnet, Connect, Login
  - Search for work directory
  - Run BLAT on commandline
  - Convert to pretty format
- Obtain output
  - Run some ftp program
  - Login
  - get output.psl
Figure 5 Task diagram discontiguous analysis
Analyze Chromosome
- Analyze video annotation
  - Search for input file by hand
  - Search for ann file by hand
  - Set work path by hand
  - ClusterChromosome
    - Set add settings by hand: Clustering flag, Annotation flag, Annotation classes, Annotatio
100. n the only way you could satisfy both it and this License would be to refrain entirely from distribution of the Program. If any portion of this section is held invalid or unenforceable under any particular circumstance, the balance of the section is intended to apply and the section as a whole is intended to apply in other circumstances. It is not the purpose of this section to induce you to infringe any patents or other property right claims or to contest validity of any such claims; this section has the sole purpose of protecting the integrity of the free software distribution system, which is implemented by public license practices. Many people have made generous contributions to the wide range of software distributed through that system in reliance on consistent application of that system; it is up to the author/donor to decide if he or she is willing to distribute software through any other system and a licensee cannot impose that choice. This section is intended to make thoroughly clear what is believed to be a consequence of the rest of this License. 8. If the distribution and/or use of the Program is restricted in certain countries either by patents or by copyrighted interfaces, the original copyright holder who places the Program under this License may add an explicit geographical distribution limitation excluding those countries, so that distribution is permitted onl
101. n existing Spectrogram Analysis directory. This class loads a Spectrogram Analysis project. The directory should contain several files and directories in order to work. Files: Annotation.txt, AnnotationFULL.txt (not necessary), Fourier.txt, Image.txt. Directories with files: Annotation (Annotation.mat); Clustering (Distance.mat, Linkage.mat, Permutations.mat); FreqMatrix (FreqMatrix.mat); Sequence (either sequenceDiscontiguous.mat or sequenceChromosome.mat); SpectroImages (all spectrogram images should be in this folder). Figure 18: Sequence alignment package (sequenceAlignment), with the user interfaces LoginGUI, BLATGUI and ClustalWGUI and the classes BLATExecutables, ClustalWExecutables, ServerExecutables and ClustalWFilterDuplicates. ServerConnection: ServerConnection connects to the Philips server and authenticates. It uses Ganymed SSH-2 for Java for the SSH connection. LoginGUI: LoginGUI is the interface for the server connection. The connection is created in ServerConnection. ServerExecutables: ServerExecutables contains server commands and file synchronization methods. It uses Ganymed SSH-2 for Java for the SSH connection. ClustalWExecutables: ClustalWExecutables passes the commands to the server. ClustalWExecutables extends ServerExecutables. This class gives the run ClustalW command and
102. n type. Set parameters by hand: input file, annotation file, start position, orientation, window size, window overlap, normal method, number of mean, number of Std, operator delimiter. Run cluster m-file by hand (call function on command line, check when function is ready). Inspect output by hand: search for output directory, inspect files. CreateSpectroVideo: set settings by hand (image path, annotation path, video path, movie file name); run video m-file by hand (call function on command line, check when function is ready); inspect output by hand. Start MATLAB; close MATLAB. Analyze alignment: prepare BLAT input (check spectro image, get sequence from SpectrA); copy files to Linux server (run some FTP program, log in); run BLAT (connect to Linux server: start telnet; search for work directory; run BLAT on command line; convert to pretty format); obtain output.
103. nd annotation must be visible on one screen. Clusters can be chosen by giving the start and end index of the cluster. A sequence can be selected as input for BLAT with a mouse click. Every component should be clearly visible. Large components should be scrollable. Security issues: For the connection to the Philips Linux server a Secure Shell protocol must be used. Documentation: There should be a user manual. There should be a programmer's manual. The code must be well documented. Hardware considerations: MIDAS can visualize the interface with a minimum resolution of 1280 x 1024 without any loss of information. A user needs a mouse and a keyboard to give input to MIDAS. General requirements: MIDAS must be a deployable standalone executable. 3.2.3 Constraints. Platform constraints: runs on Windows platforms only. Process constraints: this project stands for 420 hours per head. 3.2.4 System Models. In this section the use case diagram, the use cases and the activity diagram are presented. The use case diagram presents the goals of the bioinformaticus. These goals are described in more detail in the use cases below. Use case diagram MIDAS: <<includes>> Input Chromosome for analysis; Use Spectro Analysis; Input Discontiguous DNA for analysis
104. nditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. Appendix B: License FAT jar. GNU GENERAL PUBLIC LICENSE, Version 2, June 1991. Copyright (C) 1989, 1991 Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA. Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed. Preamble. The licenses for most software are designed to take away your freedom to share and change it. By contrast, the GNU General Public License is intended to guarantee your freedom to share and change free software, to make sure the software is free for all its users. This General Public License applies to most of the Free Software Foundation's software and to any other program whose authors commit to using it. Some other Free Software Foundation software is covered by the GNU Libr
105. ning content which will not cause a MATLAB error, the MATLAB function will return an incorrect result or no result at all. This will cause the Java Virtual Machine to crash and give a Java HotSpot error. This is because the Java code expects some output from MATLAB, but no correct result is provided; thus the Java Virtual Machine crashes. This problem is actually also a constraint on MIDAS. The solution is to provide error handling on this level, but for that expert biological knowledge is needed. MIDAS expects its users to be expert users. Java heap space memory problems. The first approach to visualizing all the spectrogram images was to make all the images scrollable, so that the user could scroll from image 1 to the last image. This approach demands that all the images be loaded at once. The Java heap space cannot hold that many large images. One solution is to expand the Java heap space. This is not the best solution for a stand-alone application, so another approach was implemented: a Java slider. With this approach only one image is loaded at a time, so the heap space is not exceeded. The only disadvantage is that the slider does not flow through the images but reloads each image. 7 Testing and Evaluation. This chapter describes the Testing and Evaluation phases. These are very important phases in so
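The slider approach from snippet 105 (only one spectrogram image held in memory at a time) can be sketched in Java as follows. This is a minimal illustration under stated assumptions, not the MIDAS source: the class name SingleImageHolder is hypothetical, and the loader function stands in for whatever reads one image from disk.

```java
import java.util.function.IntFunction;

/**
 * Hypothetical helper (not part of MIDAS): keeps at most one loaded image,
 * so memory use does not grow with the number of spectrogram images.
 */
final class SingleImageHolder<T> {
    private final IntFunction<T> loader; // assumption: maps an image index to a loaded image
    private int currentIndex = -1;       // -1 means nothing loaded yet; indexes are assumed >= 0
    private T currentImage;

    SingleImageHolder(IntFunction<T> loader) {
        this.loader = loader;
    }

    /** Called when the slider moves; reloads only if the index actually changed. */
    T select(int index) {
        if (index != currentIndex) {
            currentImage = null;                // drop the old image so it can be garbage collected
            currentImage = loader.apply(index); // load exactly one image
            currentIndex = index;
        }
        return currentImage;
    }

    int currentIndex() {
        return currentIndex;
    }

    public static void main(String[] args) {
        // Strings stand in for images in this demonstration.
        SingleImageHolder<String> holder = new SingleImageHolder<>(i -> "image-" + i);
        System.out.println(holder.select(0));
        System.out.println(holder.select(1));
    }
}
```

In a Swing interface, a ChangeListener registered on the JSlider could call select(slider.getValue()). The reload on every slider step is the disadvantage the text mentions: each move costs one image load, but the heap never holds more than one image.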
106. nnect to Philips server. Login: give hostname, login name and password. Connect: connect to the server. Disconnect: disconnect from the server. Connection side: connect in a secure way to the Philips server at PIC Bangalore from everywhere in the world. Connection with pre-known users: build an interface from which known users can select their name and save passwords for connection. BLAT. Load input: load the input which the bioinformaticus selects in the annotation list from the Spectrogram Analysis. Show output: obtain the output of the BLAT sequence alignment. Check input: check the inputted FA file for correctness. Run BLAT separate: run BLAT on a user-specified input file without running 5 4 before. ClustalW. Load input: load the input which the bioinformaticus declares by giving a start line and an end. Show output: obtain the output of the ClustalW sequence alignment. Check input: check the inputted FA file for correctness. Run ClustalW separate: run ClustalW on a user-specified input file without running 5 4 before. Display a graph of p value score the signif
107. nymore to run (www.mathworks.com). MATLAB Builder for Java 1: MATLAB Builder for Java extends the MATLAB Compiler with tools for automatically generating Java classes from your MATLAB algorithms. You can run MATLAB-based classes created by using Builder for Java outside the MATLAB environment, referencing them the same way as any other Java class (www.mathworks.com). Ganymed ssh2: Ganymed SSH-2 for Java is a library which implements the SSH-2 protocol in pure Java (tested on J2SE 1.4.2 and 5.0). It allows one to connect to SSH servers from within Java programs. It supports SSH sessions (remote command execution and shell access), local and remote port forwarding, local stream forwarding, X11 forwarding, SCP and SFTP. There are no dependencies on any JCE provider, as all crypto functionality is included (http://www.ganymed.ethz.ch/ssh2). License: BSD license (the Berkeley Software Distribution license), see Appendix A. The Graphical User Interface of MIDAS relies on the Java JFC package. JFC is short for Java Foundation Classes, which encompass a group of features for building graphical user interfaces (GUIs) and adding rich graphics functionality and interactivity to Java applications. For 7 the 7 toolkit is used, which is a part of the JFC (j2se 1.5.0). A stand-alone application is only complete with help files. A software tool is used to create a Windows-based help file (.hlp): Shalom Help Maker v0.6.1. Shalom Help Maker is a Windows
108. [Figure 4: Output from a Chromosome Spectrogram Analysis. Screenshot; the annotation list shows entries such as "Zinc finger A20 domain", and the status bar reads Hostname: NONE, Login: NONE, Authenticated: NO.] One spectrogram image contains 50 sequences. As the output panel shows one spectrogram image at a time, the corresponding 50 annotations and the cluster tree for that part are loaded. The slider can be used to scroll thro
109. o integrate Java into MATLAB, two disadvantages are found. Integrating Java into MATLAB is not as easy as it seems; it is a time-consuming process. MATLAB is an interpreted language and thus slower than Java. Next to the disadvantages of the integration of the two languages there were some other issues. The GUI is in Java, where the programs should be linked to each other, so it seems illogical to make a MATLAB executable. Creating MATLAB executables requires the MATLAB Compiler 4.5. The issue is that the MATLAB Compiler does not support Java objects, so compiling the application as a stand-alone is not possible. The second option uses the new MATLAB deployment target, MATLAB Builder for Java 1.0. This is an extension of the MATLAB Compiler 4.5. This tool converts MATLAB algorithms into standard Java classes. The new builder eliminates the time-consuming and error-prone process of recoding an algorithm created in MATLAB into Java. A disadvantage is that the new software has to be purchased. It is also necessary to know how to use the MATLAB tools. Considering these two options, the second one is the only solution, because it is not possible to create a stand-alone application from Java files integrated in MATLAB. 5.2 Technical design and specification. This section describes the technical design and specifications of MIDAS. In Section 5.2.1 the technologies used for MIDAS are described: software libraries and plug-ins. Next, in Section 5.2.2
110. of the Spectrogram Analysis is reliable and correct in order to perform multiple sequence alignment. In this example source code testing is used to provide reliability and correctness. With GUI testing, small inaccuracies are discovered. Examples are a wrong warning message that pops up when clicking a button, and empty help files. GUI testing measures the usability, consistency and capability of the system. 7.2 Usability test. In the evaluation phase it is very important to evaluate your product using usability tests. If the participants are the actual end users of your product, they can give feedback so that the product fits their needs. Since MIDAS is developed in an iterative, incremental software process, each iteration needs an evaluation in order to see whether the last iteration has been reached or another iteration is needed to improve the product. The usability test of MIDAS consists of two parts: the observed usability test (7.2.1) and the unobserved usability test (7.2.2). 7.2.1 Observed usability test. The participants during the last evaluation phase were the actual end users of MIDAS, the bioinformaticians of Philips Research Bangalore. A set of assignments was given to the participants, including a manual on how to complete these assignments. After each assignment they were asked to give their feedback. The participants were also observed during their test session in order to investigate their behavi
111. or distribute the Program or its derivative works. These actions are prohibited by law if you do not accept this License. Therefore, by modifying or distributing the Program (or any work based on the Program), you indicate your acceptance of this License to do so, and all its terms and conditions for copying, distributing or modifying the Program or works based on it. 6. Each time you redistribute the Program (or any work based on the Program), the recipient automatically receives a license from the original licensor to copy, distribute or modify the Program subject to these terms and conditions. You may not impose any further restrictions on the recipients' exercise of the rights granted herein. You are not responsible for enforcing compliance by third parties to this License. 7. If, as a consequence of a court judgment or allegation of patent infringement or for any other reason (not limited to patent issues), conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this License, they do not excuse you from the conditions of this License. If you cannot distribute so as to satisfy simultaneously your obligations under this License and any other pertinent obligations, then as a consequence you may not distribute the Program at all. For example, if a patent license would not permit royalty-free redistribution of the Program by all those who receive copies directly or indirectly through you, the
112. our, the errors made and how intuitive MIDAS actually is. An overview of these results is given below. The results from this usability test were very positive, and a lot of ideas came up during these sessions to improve, correct or extend MIDAS. Since it was not possible to implement each specific need of the user, choices have been made. A MoSCoW analysis gives a deeper understanding of all the needs and their priority. In the Improvements chapter (9) a MoSCoW analysis is given for the next iteration, MIDAS 2.0. The test persons were asked to complete the following assignments. Install MIDAS. Feedback: the Indeo5 codec pack should be included in the installer; the Java runtime environment should be included in the installer. Suggestions: (none). Run New Spectrogram Analysis. Feedback: use correct biological names in the panels; add a back button for changes in your settings. Suggestions: show the ranges of the settings on the panel; create a Windows-based menu. Output panel for Spectrogram Analysis. Feedback: separate the BLAT panel and the ClustalW panel from each other. Suggestions: make the cluster tree clickable to select a cluster. Spectrogram Video. Feedback: it works perfectly fine. Suggestions: let video creation run in the background, so that you do not need to wait for it. Load Spectrogram Analysis Project. Feedback: it works perfectly fine. Suggestions: (none). 7.2.2 Unobserved usability test. Besides the o
113. plementation phase will be discussed in chapter 6, which explains the visualization of a hierarchical clustered tree, the MATLAB functions, and the problems and solutions of this project. Chapter 7 is dedicated to the testing and evaluation phase, where the usability tests and evaluation are discussed. In chapter 8 MIDAS and its features are presented. The final phase in this project is defining which improvements can be made for future versions, which can be found in chapter 9. 2 Project Definition. 2.1 Aim. Provide bioinformatics tools that integrate sequence alignment, genome annotation, and spectral clustering and alignment under the same standalone application. 2.2 Background. In bioinformatics the main topics are data analysis and knowledge representation. In both topics there are multiple open challenges: 1. Visualization and interactive tools that pull in information from a variety of genomic tools and resources. Currently the computational tools offer only limited visualization capabilities, so there needs to be better information presentation. 2. Enhanced statistical processing with biological knowledge representation. Data analysis and processing is currently dominated by methods that incorporate statistical analysis and pattern recognition. 3. The full analysis, however, should include an array of tools that will work together to pull in the information from the multiple modalities o
114. puts files on and retrieves them from the server. ClustalWGUI: ClustalWGUI is the user interface for ClustalW. This GUI uses ClustalWExecutables for running ClustalW and synchronizes files with the Linux server. On this GUI additional settings can be set for ClustalW. ClustalWFilterDuplicates: ClustalWFilterDuplicates filters the duplicate sequences for ClustalW. This class reads a file and, if needed, writes a file with the duplicates removed. BLATExecutables: BLATExecutables passes the commands to the server. BLATExecutables extends ServerExecutables. This class gives the run BLAT command and puts files on and retrieves them from the server. BLATGUI: BLATGUI is the user interface for BLAT. This GUI uses BLATExecutables for running BLAT and synchronizes files with the Linux server. On this GUI additional settings can be set for BLAT. Figure 19: Tools. WriteToFileGUI: WriteToFileGUI is an interface which allows a user to set an output directory and the name of the file to be written from the output obtained in MIDAS. This is used for BLAT and ClustalW outputs. Console: Console replaces the Java console. Console displays the Java console output. ConsoleGUI: ConsoleGUI is the interface of the Console. ConsoleGUI contains a text area which prints the stack trace and displays a progress bar. FileChooserGUI: FileChooserGUI is a user interface for browsing jFileChooser ja
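The duplicate filtering that snippet 114 attributes to ClustalWFilterDuplicates might look roughly like the Java sketch below. The source does not say how a duplicate is recognised, so this sketch assumes that two records are duplicates when their sequence text is identical (ignoring case and line wrapping). The class and method names are hypothetical, and the sketch works on lines in memory rather than on files.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of FASTA duplicate filtering; not the MIDAS implementation. */
final class FastaDuplicateFilter {

    /**
     * Returns the FASTA records with every repeated sequence (and its header) removed.
     * Each kept record is emitted as a header line followed by one unwrapped sequence line.
     */
    static List<String> filter(List<String> lines) {
        List<String> out = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        String header = null;
        StringBuilder seq = new StringBuilder();
        for (String line : lines) {
            if (line.startsWith(">")) {
                flush(header, seq, seen, out); // finish the previous record
                header = line;
                seq.setLength(0);
            } else if (header != null) {
                seq.append(line.trim());       // join wrapped sequence lines
            }
        }
        flush(header, seq, seen, out);         // finish the last record
        return out;
    }

    private static void flush(String header, StringBuilder seq, Set<String> seen, List<String> out) {
        if (header == null) {
            return;
        }
        String sequence = seq.toString();
        // Assumption: duplicates are compared case-insensitively on the sequence text only.
        if (seen.add(sequence.toLowerCase())) {
            out.add(header);
            out.add(sequence);
        }
    }

    public static void main(String[] args) {
        List<String> input = List.of(">seq1", "acgt", ">seq2", "ACGT", ">seq3", "ttga");
        filter(input).forEach(System.out::println);
    }
}
```

Keeping the first occurrence of each sequence preserves the original record order, which matters if later steps refer to sequences by position.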
115. quence 6 and 7 the trees in Figure 35 are constructed. Figure 35: Tree sequence 6 and tree sequence 7. Java draw method. After these steps the format file is written; the result is visible in Figure 36. Here you can see the Spectrogram Image next to the format file. As said above, each line contains the clusters for that sequence. As you might notice, duplicates of a cluster name sometimes appear in the format file. This is how the Java draw method knows on which hierarchy level a cluster needs to be drawn. First all the coordinates of the clusters are computed, and then the tree is drawn. This approach has been chosen because MIDAS displays at most 50 sequences in one image. For each image MIDAS loads the corresponding cluster coordinates and draws the clustering for that image. Figure 36: Java draw method (unsorted and sorted format file). Figure 36 shows the unsorted format file. For the Java draw method we first sort the file on the number of clusters of each line, which results in the
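The sorting step mentioned in snippet 115 (ordering the format file by the number of clusters on each line) could be sketched in Java as below. The sketch assumes that cluster names on a line are separated by whitespace and that the order is ascending; the source states neither, and the names FormatFileSorter, clusterCount and sortByClusterCount are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of sorting format-file lines by cluster count; not the MIDAS source. */
final class FormatFileSorter {

    /** Number of whitespace-separated cluster names on one line (0 for a blank line). */
    static int clusterCount(String line) {
        String trimmed = line.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    /** Returns a new list ordered by ascending cluster count; ties keep their original order. */
    static List<String> sortByClusterCount(List<String> lines) {
        List<String> sorted = new ArrayList<>(lines);
        // List.sort is stable, so lines with equal counts stay in file order.
        sorted.sort(Comparator.comparingInt(FormatFileSorter::clusterCount));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("2378 640 1", "1", "640 640 1 16", "3");
        sortByClusterCount(lines).forEach(System.out::println);
    }
}
```

A stable sort is used so that sequences with the same number of clusters keep their relative position, which keeps the result predictable when the file is regenerated.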
116. r screenshot. This sets the height of one window so that the annotation fits next to the window. The image is vertically stretched with this scalar. ClusterDiscontiguousSpectro.m. Description (preliminary summary, FASTA format): This program reads multiple DNA sequences from a FASTA file and generates a clustered RGB image of the spectrogram output. This program calls CreateDiscontiguousClusteredSpectrogram.m. These two programs deal exclusively with multiple sequences provided in FASTA format, with headers that are formatted as described below. This will be primarily of interest when wanting to obtain higher resolution and more focused studies of specific features annotated in genomes, allowing for cross-species and interchromosomal comparisons to be made. The resulting image is saved to the hard disk so it can later be processed by createClusteredSpectroVideo.m. This processing will turn the large image into an annotated movie according to the parameters in the createClusteredSpectroVideo.m file. Author: Evan Santo, 6/26/2006. Header format: > Species_Prefix chr or StartPos EndPos sequence atcgtct > Next sequence header. Changes: This code is adjusted so that it can be called from within Java. CreateSpectroImages.m is called in this function to create the partial images for the scrollable spectrogram in MIDAS. This m-file is rewritten as a function so that it can be called f
117. re 45. All the additional settings can be set in this panel. The BLAT and ClustalW tools are located on the Philips server. If the user is not yet connected and authenticated, that has to be done first. Figure 46: ClustalW. [Screenshot of the BLAT and ClustalW input/output panels. Fields shown for BLAT: BLAT location, Query, Database, Output file, Database Type, Query Type, Additional Options. Fields shown for ClustalW: ClustalW location, Query, Output Directory, GapOpen, Bootstrap, Additional Options.] The additional options of ClustalW are set here. Please refer to the ClustalW help file for these options. Enter your additional options in the text area below. Enter the argument corresponding to the argument in the ClustalW help file. Separate multiple arguments by spaces. The additional options of BLAT are set here. Please refer to the BLAT help file for these options. Enter your additional options in the text area below. Enter the argument corresponding to the argument in the BLAT help file. Separate multiple arguments by spaces. Figure 45: ClustalW settings panel.
118. resting chapters for the software engineers and computer scientists are chapters 2 to 7, as these chapters contain detailed information about each software development phase. A programmer's manual is also included in the appendix. Bioinformaticians might be interested in chapter 6.1, which describes how the visualization of the patterns in spectral clustering is implemented, and chapter 8, which describes the developed prototype and its features. We would like to thank Dr. Nevenka Dimitrova, Research Fellow and Senior Director, our supervisor at Philips for this project. She has guided us in this project and introduced us to the world of Bioinformatics and Spectral Analysis. Thanks to Dr. Chetan Mittal for explaining the spectral ordering technique and supporting us in making this prototype suitable for the bioinformaticians. We would also like to thank Dr. L.J.M. Rothkrantz, our supervisor at the University, for his feedback and support. Thanks to Dr. ir. C.A.P.G. van der Mast for feedback about usability, and to Dr. P.G. Kluit for helping with our Java design issues. June 29, 2007. Melissa Yuen Shan Cheung, Wijnand Paul van den Haak. Summary. Major research efforts in Bioinformatics include sequence alignment, gene finding, genome assembly, protein structure alignment, protein structure prediction, prediction of gene expression and protein-protein interactions, and the modeling of evolution. To perform these specific tasks different tools are
119. rk, complete source code means all the source code for all modules it contains, plus any associated interface definition files, plus the scripts used to control compilation and installation of the executable. However, as a special exception, the source code distributed need not include anything that is normally distributed (in either source or binary form) with the major components (compiler, kernel, and so on) of the operating system on which the executable runs, unless that component itself accompanies the executable. If distribution of executable or object code is made by offering access to copy from a designated place, then offering equivalent access to copy the source code from the same place counts as distribution of the source code, even though third parties are not compelled to copy the source along with the object code. 4. You may not copy, modify, sublicense, or distribute the Program except as expressly provided under this License. Any attempt otherwise to copy, modify, sublicense or distribute the Program is void, and will automatically terminate your rights under this License. However, parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance. 5. You are not required to accept this License, since you have not signed it. However, nothing else grants you permission to modify
120. roAnnotImagesPanel is the panel for the output of a Spectrogram Analysis; it contains the spectrogram images, a slider to go through the images, the annotation and the cluster tree. In addition there is the sequence alignment panel, used to select an annotation from the annotation overview to run on BLAT or to give a range to run on ClustalW. NOTE: The sequence alignment panel is currently only available for Discontiguous SpectrA, as the code for extracting the sequences from SpectrA only supports discontiguous sequences. ClusterTree: ClusterTree builds a hierarchical tree of clusters. It reads the file created by MATLAB (buildClusterInfo.m, findClusterCorrespondingToPerm.m). Then it sorts the information into hash maps and computes the coordinates needed to draw a clustered hierarchical tree. NOTE: If one sequence has more than MAX_RECURSION_COUNTER clusters, the branch for this sequence will not be drawn. ClusterTreeGraphicsPanel: ClusterTreeGraphicsPanel is the panel with the hierarchical tree. The component paints the tree according to the coordinates computed by ClusterTree. The Spectrogram Images corresponding to this tree all have a fixed dimension with a maximum size of 505x750. Thus the panel is fixed to the height of 750. The tree is drawn for each range; the coordinates are translated in order to display the correct part of the tree. NOTE: If one sequence has more than MAX_RECURSION_COUNTER clusters, the branch for this sequence will not be drawn. spectrogramAnaly
121. rom within Java. The function call to createDiscontiguousClusteredSpectrogram.m is rewritten so that it can take arguments and also give back the freqDims. FreqDims is needed as an argument in CreateSpectroImages.m, and the function call to createDiscontiguousClusteredSpectrogram needed the outputPath as an argument. Every output is placed in the outputFolder which MIDAS specifies. This m-file now always creates a text file of the short annotation, because this is used to get the sequence out of a spectrogram by using the Perl program getSequenceFromSpectra.pl in MIDAS. This program takes this annotation file as input. If a user selects annotationstate ON, then the full annotation text is also generated in AnnotationFULL.txt. Some new arguments are added. For a complete view of the arguments a list is placed below. Note: For more information about this code, check the original source code of Evan Santo. In these source files the full comments explain his work. Arguments. segFile: path of input file (FA format). outputPath: path of the folder where the output will be placed. stftWindowSize: length of a sequence that will be considered a window. stftWindowOverlap: overlap between adjacent windows. annotDir: directory where the annotation files are placed. annotState: flag for annotation, on (1) or off (0). cluFlag: flag for clustering, on (1) or off (0). normalMethod: method for normalizing. numOfMean: number of mean. numOfStd: number of Std
122. rs were found. Remarks: (none). Test Case: Run ClustalW without existing project. Actions performed: checked pre-conditions; set the normal settings and additional settings; click on the OK button; checked ClustalW output; checked Console output; checked post-conditions. Problems: no problems. Result: Run ClustalW is tested and no errors were found. Test Case: Disconnect from Philips Server. Actions performed: checked pre-conditions; click on the option Disconnect from server from the menu; check the popup given by the system, which explains that the system is disconnected; checked post-conditions. Problems: no problems. Result: Disconnect from Philips Server is tested and no errors were found. Appendix D: User Manual. MIDAS v1.0, Multimodal Interface for DNA Alignment of Sequences. Melissa Yuen Shan Cheung, Software Technology, Computer Science, Technical University of Delft, The Netherlands, myscheung@gmail.com. Wijnand Paul van den Haak, Media and Knowledge Technology, Computer Science, Technical University of Delft, The Netherlands, paulvandenhaak@gmail.com. April 2007 - June 2007. Table of contents: 1 SYSTEM DESCRIPTION; 2 INSTALLATION; 3 SPECTROGRAM ANALYSIS; 3.1 RUN DISCONT
123. running and sees the progress of the analysis in a console output. 2. Bioinformaticus obtains spectrogram images and annotation. Alternative flows: If the Spectrogram Analysis could not be completed, an error is given to the bioinformaticus in case of an IOException or RuntimeException. In case of a system crash, the system must be restarted. Special requirements: none. Use case relationships: uses Input Chromosome for Analysis, Input Discontiguous DNA for Analysis. Create spectrogram video. Actors: Bioinformaticus. Pre-conditions: Use case Use Spectrogram Analysis must be completed first. Post-conditions: The spectrogram video has been created. Basic flow: 1. Bioinformaticus selects the Create Spectrogram Video option for a chromosome or a discontiguous chromosome. 2. Bioinformaticus is able to watch the spectrogram video in a MATLAB pop-up screen during its creation. Alternative flows: If the spectrogram video could not be created, an error is given to the bioinformaticus in case of an IOException or RuntimeException. Special requirements: none. Use case relationships: includes Use Spectrogram Analysis. Watch spectrogram video. Actors: Bioinformaticus. Pre-conditions: Use cases Use Spectrogram Analysis and Create spectrogram video must be completed first. Post-conditions: The spectrogram video has been created. Basic flow: 1. Bioinformaticus selects the Open video current tab option. 2. Bioinformaticus
124. s commands interactively when run, you must cause it, when started running for such interactive use in the most ordinary way, to print or display an announcement including an appropriate copyright notice and a notice that there is no warranty (or else, saying that you provide a warranty) and that users may redistribute the program under these conditions, and telling the user how to view a copy of this License. (Exception: if the Program itself is interactive but does not normally print such an announcement, your work based on the Program is not required to print an announcement.) These requirements apply to the modified work as a whole. If identifiable sections of that work are not derived from the Program, and can be reasonably considered independent and separate works in themselves, then this License, and its terms, do not apply to those sections when you distribute them as separate works. But when you distribute the same sections as part of a whole which is a work based on the Program, the distribution of the whole must be on the terms of this License, whose permissions for other licensees extend to the entire whole, and thus to each and every part regardless of who wrote it. Thus, it is not the intent of this section to claim rights or contest your rights to work written entirely by you; rather, the intent is to exercise the right to control the distribution of derivative or collectiv
125. ses. After these tests everything seemed to work fine.
Test Case: Use Spectrogram Analysis
Actions performed: Checked pre-conditions. Checked Console output. Checked spectrogram images and annotations. Checked post-conditions.
Problems: No problems
Cause: -
Solution: -
Result: Use Spectrogram Analysis is tested and no errors were found.
Remarks: -
Test Case: Create Spectrogram Video
Actions performed: Checked pre-conditions. Select create Spectrogram video option from menu for a chromosome or a discontiguous chromosome. Inspect the video during its creation. Checked post-conditions.
Problems: When a video is manually erased from the folder, the system still thinks that the video is created and gives a message that the video exists.
Cause: The system only checked if the video folder was present instead of checking the video file.
Solution: The system currently checks if there is any file in the video folder which has the extension .avi.
Result: After solving the problem, the next test for Create Spectrogram Video completed without errors.
Remarks: A few more tests have been executed to check what happens if there is an .avi file, if there is no .avi file, if the bioinformaticus clears the folder and even when the bioinformaticus deletes the folder. Every test resulted positively.
Test Case: Watch Spectrogram Video
Actions performed: Checked
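The fix described for the Create Spectrogram Video test case (look for an .avi file rather than only for the video folder) could be sketched in Java roughly as follows. This is a minimal illustration, not the MIDAS implementation: the class name `VideoFolderCheck` and the method `containsAvi` are hypothetical, and treating the extension case-insensitively is an assumption.

```java
import java.io.File;
import java.util.Locale;

// Hypothetical helper, not taken from the MIDAS sources.
class VideoFolderCheck {

    // Returns true only if the folder exists and holds at least one regular
    // file whose name ends in ".avi" (case-insensitive, by assumption).
    static boolean containsAvi(File videoFolder) {
        File[] entries = videoFolder.listFiles();
        if (entries == null) {
            // listFiles() returns null when the folder is missing or is not a directory.
            return false;
        }
        for (File entry : entries) {
            String name = entry.getName().toLowerCase(Locale.ROOT);
            if (entry.isFile() && name.endsWith(".avi")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        File folder = new File(args.length > 0 ? args[0] : ".");
        System.out.println("video present: " + containsAvi(folder));
    }
}
```

A check of this shape covers the four situations listed in the remarks: an .avi file present, no .avi file, an emptied folder, and a deleted folder.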
126. similarities between Java and C#. However, there are also many differences, with C# being described as a hybrid of C and Java with additional new features and changes. Both C# and Java offer a broad range of resources for designing GUIs and action handlers. Differences can be found in the exception handling: Java supports checked exceptions, while C# only supports unchecked exceptions. Checked exceptions force the programmer to declare all exceptions thrown in a method and to catch all exceptions thrown by a method invocation. This makes error handling more consistent. C# is a relatively new language with a smaller user base than Java. This is due to the fact that Java is older and that Java's network computing support, the Java applets, is often used on the web. Java is the best option for MIDAS. The exception handling is very useful, as MIDAS is a rather complex system. Also, the larger user base, the history and the large amount of useful open source packages Java offers can help to meet the performance and consistency challenges of MIDAS. Of course, integrating three different tools into one application might be troublesome for consistency. The network computing support is very convenient for MIDAS, as it has to communicate with the Philips Linux Server. And in the future MIDAS might become a web application. Main control of MIDAS: The following issue is: How to combine the mathematic
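To make the point about checked exceptions concrete, here is a small Java sketch. It is only an illustration of the language rule discussed in this excerpt; the class and method names are hypothetical and do not come from MIDAS.

```java
import java.io.FileReader;
import java.io.IOException;

// Hypothetical example class, used only to illustrate checked exceptions.
class CheckedExceptionDemo {

    // IOException is a checked exception, so the compiler requires this
    // method to declare it (or to catch it locally).
    static int firstChar(String path) throws IOException {
        try (FileReader reader = new FileReader(path)) {
            return reader.read();
        }
    }

    // The caller is forced to deal with the declared exception as well:
    // removing the try/catch here would be a compile-time error.
    static String describe(String path) {
        try {
            return "first char code: " + firstChar(path);
        } catch (IOException e) {
            return "could not read " + path;
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(args.length > 0 ? args[0] : "input.fa"));
    }
}
```

In C# the equivalent method would compile without any declaration or handler, which is the difference the comparison refers to.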
127. sis
Figure 17 Spectrogram Analysis package
MATLABSpectrogramAnalysis: MATLABSpectro contains the Java methods for the Spectrogram Analysis MATLAB scripts. These scripts were made by Evan Santo. The methods are built for Java by the MATLAB JAVA BUILDER 1.0. The converted scripts are: ClusterDistcontiguousSpectrogram.m, ClusterChromosomeSpectrogram.m, createChromosomeClusteredSpectogram.m, createDistcontiguousClusteredSpectogram.m, createClusteredVideo.m, createSpectroImages.m. For the detailed documentation about this code, please refer to the MATLAB code.
SpectroSettingsGUI: SpectroSettingsGUI is the GUI for the Spectrogram Analysis Settings Panel. This class initializes the SpectroSettingsCHRPanel and SpectroSettingsDISTPanel.
SpectroSettingsPanel: SpectroSettingsPanel contains the general Spectrogram settings for Distcontiguous and Chromosome Clustering.
SpectroSettingsCHRPanel: SpectroSettingsCHRPanel is the panel of chromosome Spectrogram Analysis. This class extends SpectrogramSettingsPanel. This panel contains all the settings to run a chromosome Spectrogram Analysis.
SpectroSettingsDISTPanel: SpectroSettingsDISTPanel is the panel of discontiguous Spectrogram Analysis. This class extends SpectrogramSettingsPanel. This panel contains all the settings to run a discontiguous Spectrogram Analysis.
LoadsSpectroAnalysisGUI: LoadsSpectroAnalysisGUI is the interface for loading of a
128. [Figure 7: screenshot of the BLAT output panel, showing alignment output for HUM chr12 53016917 53017116, the Write to File button and the connection status line (Hostname, Login, Authenticated)]
Figure 7 BLAT output panel
The figure displays the Write to File button, which gives the possibility to write the BLAT output to a text file. After pushing this button a save dialog will pop up (figure 8).
[Figure 8: screenshot of the Write To File dialog with the fields Save directory and Output name]
Figure 8 Write to File dialog
4.2 Run ClustalW
1. Press sequence alignment in order to start sequence alignment. Two buttons will appear: Run BLAT and Run ClustalW.
2. Give the range of the sequences to align in the input fields for ClustalW.
3. Press Run ClustalW.
4. Login will appear if you are not logged on to the server yet. You need to have an account for the Philips Linux Server.
Note: The file transfer between the Server and local machine will cause MIDAS to freeze for some time. Please be patient.
129. stering and with full annotation
• Run Spectrogram Analysis with clustering with full annotation
• Run Spectrogram Analysis with clustering without annotation
• Obtain the output of Spectrogram Analysis projects in a folder
Philips Linux server
• Login by giving hostname, username and password
• Connect to the server
• Disconnect from the server
Use BLAST/BLAT
• Load one sequence selected by the user from the Spectrogram Analysis annotation list
• Obtain the output of the BLAST/BLAT sequence alignment
• Run BLAT separately
Use ClustalW
• Load one cluster specified by the user from the Spectrogram Analysis by giving the start and end index of that cluster
• Obtain the output of the ClustalW sequence alignment
• Run ClustalW separately
Load external projects
• Load a project which has generated its output on a high performance grid
Help files
• Open and read help files for MIDAS, BLAT and ClustalW
Save
• Save the output of BLAT or ClustalW as text files
3.2.2 Non-Functional Requirements
Quality requirements
• Performance characteristics: System is fast
• Reliable: Information statistics are reliable and correct
Error handling and extreme conditions
• Failure handling: Saved data will not be lost
User Interface and human factors
• Clustering, spectrogram image and annotation must be visible on one screen
• ClustalW alignment, partial spectrogram image and annotation must be visible on one screen
• BLAT alignment, partial spectrogram image a
130. sualization it is easy to get an overall impression of the tasks and deadlines. It is also possible to see the dependencies between the tasks, which tasks are done simultaneously and which are not. The tasks are divided among twelve weeks as illustrated in figure 2. The project schedule for MIDAS is illustrated in the Gantt chart (Figure 2).
Figure 2 Gantt chart
Gantt chart Bachelor Project Philips 2007
1 Introduction: 1.1 Formalities Philips; 1.2 Read paper about bioinformatics
2 Analyse phase: 2.1 Research & Requirements
3 Design phase: 3.1 Technical specifications; 3.2 Architectural design; 3.3 Interface design
4 Implementation phase: 4.1 Integrate sequence alignment with spectrA; 4.2 Interface and Visualization; 4.3 Implement shell interface; 4.4 Implement overall system, Testing
5 Evaluation & Redesign phase: 5.1 Usability test; 5.2 Analyze concept redesign; 5.3 Implement redesign, Testing; 5.4 Deployable
6 Final phase: 6.1 Testing, Finishing touch software; 6.2 Write report, Presentation
7 Back-up time
3 Requirements
New developments in the medical world are rising fast. As Philips invests in these new developments, they have a high market share in the medical department. A lot of research projects and concepts developed at Philips will contribute to the future of Medical Systems. One of those projects is Wo
131. 2. Bioinformaticus presses OK button. 3. BLAT output is being generated.
Alternative flows: In case of a system crash, the system has to be restarted. In case of a connection error, the bioinformaticus will be informed. If bioinformaticus pushes cancel button, current windows will close.
Run ClustalW on own input
Actors: Bioinformaticus
Pre-conditions: A valid input file with the FASTA (.fa) extension should be present.
Post-conditions: The ClustalW output is given to the user on a tab in the interface.
Basic flow: 1. Bioinformaticus sets the required and the additional settings. 2. Bioinformaticus presses OK button. 3. ClustalW output is being generated.
Alternative flows: In case of a system crash, the system has to be restarted. In case of a connection error, the bioinformaticus will be informed. If bioinformaticus pushes cancel button, current windows will close.
Disconnect from Philips Server
Actors: Bioinformaticus
Pre-conditions: Use case Login to Philips server must be completed.
Post-conditions: Bioinformaticus is disconnected from the Philips server.
Basic flow: 1. Bioinformaticus selects Disconnect from server in the menu. 2. A pop-up is given to the bioinformaticus that the system is disconnected from the Philips server.
Alternative flows: In case of a system crash, the system has to be restarted.
Special requirements: None
Use Case relationships: Includes Login to Philips server
132. text area below.
[Figure 6: screenshot of the Additional Settings area of the BLAT settings panel, which asks for arguments as named in the BLAT help file, separated by spaces, and offers a BLAT Help button]
Figure 6 BLAT settings panel
7. Press OK button. BLAT will run.
Note: The file transfer between the Server and local machine will cause MIDAS to freeze for some time. Please be patient.
The output panel will be loaded with the output files (figure 7).
Note: if some mismatch occurred with the input arguments, the output panel can be empty, as MIDAS does not detect errors concerning content.
[Figure 7: screenshot of the MIDAS window (menus Spectrogram Analysis, Sequence Alignment, Connection, Help) with the Sequence Alignment Outputs panel showing the output file location and the contents of a BLAT .psl output file]
133. the component diagram and the deployment diagram are described in order to clarify the MIDAS components in its environment.
5.2.1 Software
The programming environments are Java and MATLAB. Java is used for the User Interface, for integrating the programs and for the connection to the Linux server. MATLAB is used for its computational functionality for the data mining part of this project. For programming in Java and MATLAB the following software was used: Eclipse 3.2 (Java) and MATLAB R2006b (MATLAB). MIDAS runs BLAT, ClustalW and Spectrogram Analysis under one application. The specifications of these programs are given below.
BLAT: BLAT is a software tool for bioinformatics which performs rapid mRNA/DNA and cross-species protein alignments. BLAT is more accurate and 500 times faster than popular existing tools for mRNA/DNA alignments and 50 times faster for protein alignments at sensitivity settings typically used when comparing vertebrate sequences (http://www.genomeblat.com).
ClustalW: ClustalW is a general purpose multiple sequence alignment program for DNA or proteins. It produces biologically meaningful multiple sequence alignments of divergent sequences. It calculates the best match for the selected sequences and lines them up so that the identities, similarities and differences can be seen. Evolutionary relationships can be seen via viewing Cladograms or Phylograms (http://www.ebi.ac.uk/clustalw).
134. the normalization of the pixel values of the color spectrogram
numOfMean, numofstd: parameters for normalization
Changes: The matrix is first clustered in STFT space. This clustering is then used to reference the RGB image so that the rows (windows) can be arranged in the correct order. The outputpath is given as an input argument to this file and describes where the output must be placed.
NormalizeImage.m
Description (preliminary summary): This function normalizes the RGB_Image such that all elements have values smaller than one. Two methods are supported: statisticalMax or absoluteMax.
Changes: Numerical values are cast to doubles because Java treats uncast variables from MATLAB as integers.
CreateSpectroImages.m (New File)
Description (summary): This function is called by ClusterDiscontiguousSpectrogram or by ClusterChromosomeSpectrogram in the last command and takes the RGB_Image. This RGB_Image is split into images of n windows each and then saved in the output folder which the user specifies. These separate images are used by MIDAS to make the complete movie scrollable in the interface.
Arguments (outputPath, RGB_Image, imageScale, numWin, windowHeight):
outputPath: to set the path where the images are placed
RGB_Image: to get the images from; freqDims is used in the loop to know how many images are needed
imageScale: set the scaling of the images
numWin: Number of windows pe
135. ther cluster, thus not a leaf, and the tree has more branches. In this case the corresponding node C is bigger than N, the number of sequences. The branch can be found by C - N, which is the cluster name of the next branch. These steps are repeated till the leaves of one tree are found. The search algorithm used to construct the tree in this linkage table is the pre-order depth-first search algorithm. The search order of the pre-order depth-first search algorithm is demonstrated in Figure 31. The order is F B A D C E G I H.
• The cluster names are appended in a format file. Each line number in the format file corresponds to the sequence index. Thus the first line in the format file is sequence 1, the second line is sequence 2, etc. Each sequence belongs to one or several clusters. In the format file the cluster names belonging to one sequence are appended in the corresponding line.
Figure 31 Pre-order depth-first
3. If there are more sequences, go to step 1. For every sequence the corresponding clusters should be found.
4. Write the format file. The format file is written for the draw function in Java.
To illustrate this algorithm, see the next example. For this example the first seven sequences (first image) of smallRNAs.fa are used. smallRNAs.fa contains N = 3274 sequences. The Spectrogram image is presented in Figure 32.
Figure 32 Partial Spectrogram I
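The pre-order depth-first search named in this excerpt can be sketched in Java as below. This is a generic illustration, not the MIDAS or MATLAB code: the `Node` class is hypothetical, and because Figure 31 itself is not recoverable here, the tree shape in `exampleTree` is an assumption chosen so that the traversal yields an order of the form quoted in the text (F B A D C E G I H).

```java
import java.util.ArrayList;
import java.util.List;

// Generic pre-order depth-first traversal; names are illustrative only.
class PreOrderDemo {

    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>();

        Node(String name, Node... kids) {
            this.name = name;
            for (Node kid : kids) {
                children.add(kid);
            }
        }
    }

    // Pre-order: visit the node itself first, then each subtree from left to right.
    static void preOrder(Node node, List<String> out) {
        out.add(node.name);
        for (Node child : node.children) {
            preOrder(child, out);
        }
    }

    static List<String> visitOrder(Node root) {
        List<String> out = new ArrayList<>();
        preOrder(root, out);
        return out;
    }

    // Assumed tree shape (the original figure is not available).
    static Node exampleTree() {
        return new Node("F",
                new Node("B",
                        new Node("A"),
                        new Node("D", new Node("C"), new Node("E"))),
                new Node("G",
                        new Node("I", new Node("H"))));
    }

    public static void main(String[] args) {
        System.out.println(visitOrder(exampleTree())); // prints [F, B, A, D, C, E, G, I, H]
    }
}
```

In the setting of this excerpt, visiting a node would correspond to appending a cluster name to the line of the sequence being processed, with leaves being the sequences themselves.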
136. ttings; Show BLAT help; Show BLAT output
Figure 7 Improved task diagram analyzing discontiguous chromosome
[Task diagram: Analyse Chromosome. Tasks: Analyze video annotation; Analyze alignment. Get settings from user: Input/output settings (Input file .fa, Annotation file, Output location); Parameter settings (Start position, Orientation, Window size, Window overlap, Normal method, Number of mean, Number of Std); Additional settings (Clustering flag, Annotation flag, Annotation classes, Annotation types). Show scrollable video; Show annotation; Show clusters. Get input from user for BLAT: Login to Linux server; Get user information (Host, Username, Password); Get settings from user: Input/output settings (BLAT location on server, Query file .fa, Database file .fa, Output file, Format of output file); Type options (Database type, Query type)]
137. ugh all the Spectrogram Images. After each transition between images, the corresponding annotations and cluster tree are loaded in real time. The labels of the slider are sequence numbers.
3.4 Run Spectrogram Video
The Spectrogram Video can be created in MIDAS for a Chromosome or Discontiguous Spectrogram Analysis output. The videos that are created or shown belong to the currently active and visible tab of MIDAS.
Create Spectrogram Video
1. First, a Spectrogram Analysis output panel tab should be visible, either from Run Spectrogram Analysis or Load Existing Project.
2. In the menu bar select Spectrogram Analysis, select Create Spectrogram Video and select either Create Discontiguous Spectrogram Video or Create Chromosome Spectrogram Video, corresponding to the Spectrogram Analysis Type of the current tab.
3. The Spectrogram Video will be created in MATLAB. Do not interrupt the video, because during the creation of the video MATLAB records the current desktop.
Watch Spectrogram Video
1. Select the Spectrogram Analysis output panel tab from which the video should be played.
2. In the menu bar select Spectrogram Analysis, select Open Video Current Tab.
3. The video will be shown in Media Player if it exists. If the video does not exist, an error message will pop up telling you to create the video first.
3.5 Load existing Spectrogram Analysis project
In MIDAS you can load an existing
138. ulated. Improved solution: Generate unique filenames in Java.
Errors and exceptions (JVM HotSpot crashes during MCR execution, failures of the transfer between Server and local machine) will be shown on the console. These errors are mostly caused by incorrect input arguments. On this level MIDAS does not provide error handling. Improved solution: To handle these errors, expert knowledge is needed. Only the bioinformaticus knows the correct input combinations.
It is not possible to run BLAT or ClustalW on Chromosome Spectrogram Analyses. The code used for extracting Sequences of Spectrograms, getSequenceFromSpectra.pl by Evan Santo, only works for Discontiguous Spectrogram Analyses. The BLAT and ClustalW functions for Chromosome are disabled. Improved solution: getSequenceFromSpectra.pl should be extended to work for Chromosome Spectrogram Analysis.
The clustered hierarchical tree from Spectrogram Analysis can be very extensive, as some sequences have above 500 levels. The cluster tree in MIDAS is built recursively in MATLAB. The MCR (MATLAB Component Runtime) machine can only compute 500 recursions. Thus the cluster tree built in MIDAS is restricted to 200 recursions. A sequence with more than 200 levels will not be drawn. Improved solution: change the format of the input file on which the tree is built.
Loading an external Spectrogram Analysis is constrained to the format of a MIDAS Spectrogram Analysis output directory. This means that certain fil
139. ull comments will explain his work.
Arguments:
segFile: Path of input file (FA format)
outputPath: Path of folder where the output will be placed
stftWindowSize: Length of a sequence that will be considered a window
stftWindowOverlap: Overlap between adjacent windows
annotFile: File where the annotation is placed
annotState: Flag for annotation, on (1) or off (0)
cluFlag: Flag for clustering, on (1) or off (0)
normalMethod: Method for normalizing
numOfMean: Numbers of mean
numOfStd: Numbers of Std
annotClass: Classes of features to annotate
annotType: Types of features to annotate
opDelim: Delimiting operator, which is operating system specific
sequenceName: The name of the input file
startPos: This is the chromosomal coordinate that the sequence from the input file starts at
ori: Orientation can be either or
imageScale: Scalar for scaling the image in createSpectroImages.m
numWin: number of windows on one image (in each frame in MIDAS)
windowHeigth: scalar to stretch the spectrogram vertically
CreateChromosomeClusteredSpectrogram.m
Description (preliminary summary): This function returns a matrix of the spectrogram for the DNA sequence. This program is designed to operate exclusively for large contiguous sequences in conjunction with ClusterChromosomeSpectrogram.m.
winSize: width of the Short Time Fourier Transform (STFT) window
overlap: the overlap between two consecutive STFT windows
normalMethod: method used for
140. up so that the identities, similarities and differences can be seen. Evolutionary relationships can be seen via viewing Cladograms or Phylograms. Another analyzing approach is the Spectrogram Analysis of genomes. With this technique, frequency domain analysis is done on the genomes using tricolour spectrograms, identifying several types of distinct visual patterns characterizing specific DNA regions. The patterns and their frequency characteristics are related to the sequence characteristics of DNA. Biologically meaningful patterns are found through this method.
3.2 System Description
MIDAS will integrate the sequence alignment tools BLAT and ClustalW with the Spectrogram Analysis. The application will visualize the output of Spectrogram Analysis by displaying the genome annotation, spectrogram image and the clustered hierarchical tree. To analyze the output, the bioinformaticus (bioinformatician) is able to interactively select genome annotations for multiple sequence alignment.
3.2.1 Functional Requirements
User: Bioinformaticus
Bioinformaticus is able to:
Use Spectrogram Analysis
• Load input FA format file for Spectrogram Analysis
• Visualize output of Spectrogram Analysis (SpectroVideo, annotation)
• Visualize an overview of all the present clustering in the analyzed sequence
• Run Spectrogram Analysis without clustering and without annotation
• Run Spectrogram Analysis without clu
141. us is able to watch Spectrogram Video.
Alternative flows: If the spectrogram video could not be shown, an error is given to the bioinformaticus in case of an IOException or RuntimeException.
Special requirements: none
Use Case relationships: Includes Use Spectrogram analysis and Create spectrogram video
Overview all present clustering DNA
Actors: Bioinformaticus
Pre-conditions: Use case Use Spectrogram analysis must be completed first.
Post-conditions: The clustering is visible.
Basic flow: 1. Bioinformaticus selects one spectrogram image. 2. Bioinformaticus is able to see clustering in one image.
Alternative flows: If the clustering could not be completed, an error is given to the bioinformaticus in case of an IOException or RuntimeException.
Special requirements: none
Use Case relationships: Includes Use Spectrogram analysis
Login to Philips Server
Actors: Bioinformaticus
Pre-conditions: The server must be online and the user must have a Linux account.
Post-conditions: Login is authenticated and connection is established.
Basic flow: 1. Bioinformaticus fills in hostname. 2. Bioinformaticus fills in login name. 3. Bioinformaticus fills in password. 4. Bioinformaticus pushes OK button to login. 5. Login is authenticated and connection is established.
Alternative flows: If bioinformaticus pushes cancel button, current windows will close. In case of a system crash, the system has to be restarted. In case of a connection or authentication
142. va sun com is used for browsing through own directories. This class contains methods to obtain a pathname to load or save the files needed for input/output for MIDAS.
FileFilterAnnot: FilterFileANNOT extends FileFilter to filter the correct format for input of MIDAS. This filter filters .annot files.
FileFilterFA: FilterFileFA extends FileFilter to filter the correct format for input of MIDAS. This filter filters .fa files.
5.3.3 Sequence Diagrams
Sequence diagrams illustrate the flow of the system by depicting the interactions between the objects. As MIDAS is a comprehensive system with a great deal of features, many sequence diagrams can be produced. In this Section only the most important sequence diagrams are presented: the flows which are not that obvious. The main functionality of MIDAS is to run a Spectrogram Analysis and provide a visualization of the output, and from the output the user has to be able to select annotations for sequence alignment. Sequence diagrams are created for this main functionality. This functionality has been split up into three sequence diagrams. The first sequence diagram (Figure 20) illustrates the interactions between objects when the user runs a Spectrogram Analysis. This sequence diagram describes the choices the user has (analyzing a chromosome or discontiguous) and illustrates the flow of running a discontiguous Spectrogram Analysis. The second Fi
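The file filters described in this excerpt extend FileFilter so that a file chooser only offers files of the right format. A minimal sketch of such a filter for .fa files, using the standard `javax.swing.filechooser.FileFilter` class, might look as follows. The class name `FastaFileFilter`, the description text and the case-insensitive match are assumptions for illustration; the actual MIDAS classes are not reproduced here.

```java
import java.io.File;
import java.util.Locale;
import javax.swing.filechooser.FileFilter;

// Hypothetical filter class, modelled on the description of the .fa filter above.
class FastaFileFilter extends FileFilter {

    // Directories stay visible so the user can keep browsing;
    // regular files are accepted only if they end in ".fa".
    @Override
    public boolean accept(File file) {
        if (file.isDirectory()) {
            return true;
        }
        return file.getName().toLowerCase(Locale.ROOT).endsWith(".fa");
    }

    // Text shown in the file chooser's filter drop-down.
    @Override
    public String getDescription() {
        return "FASTA files (*.fa)";
    }
}
```

Such a filter would typically be installed with `JFileChooser.setFileFilter(new FastaFileFilter())`; a filter for .annot files would differ only in the extension and description.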
143. w spectrogram settings window (figure 2) will appear.
[Figure 2: screenshot of the Spectrogram Analysis Settings window for a Chromosome Analysis, with the groups Input Output (Input File, Input annotation file, Output location), Parameter Settings (Chromosome Start Position, Orientation, Windowsize, Windowoverlap, Normal method, Number of Mean, Number of Std) and Additional Settings (Clustering, Annotation, Annotation Classes: transcriptional, genomic, epigenomic; Annotation Types, hold Ctrl button to select multiple types)]
Figure 2 Chromosome Spectrogram Analysis Settings panel
3. Select Chromosome.
4. Give the correct input arguments on the settings panel:
Input file: Press Open to select an input FASTA file.
Input annotation file: Press Open to select an annotation file.
Output Location: Press Save to select a directory for saving your output.
Start Position: must be a positive number.
Orientation: either or
WindowSize: windowSize should be an even number and between 0 and 50000.
WindowOverlap: window overlap is at maximum windowSize - 1.
Normal method: select an option in the dropdown list.
Number of Mean: must be a positive number.
Number of Std: must be a positive number.
C
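The numeric constraints listed for the settings panel can be expressed as a small validation routine. The Java sketch below is an illustration only: the class and method names are hypothetical, the bounds on the window size are read as "greater than 0 and at most 50000", the overlap limit is read as windowSize minus 1, and a non-negative overlap is assumed; none of these details are confirmed by the manual.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical validator for the numeric fields of the settings panel.
class SpectroSettingsCheck {

    // Returns a list of human-readable problems; an empty list means the values pass.
    static List<String> validate(long startPosition, int windowSize, int windowOverlap,
                                 double numOfMean, double numOfStd) {
        List<String> problems = new ArrayList<>();
        if (startPosition <= 0) {
            problems.add("Start Position must be a positive number");
        }
        // Assumed reading: even, greater than 0 and at most 50000.
        if (windowSize <= 0 || windowSize > 50000 || windowSize % 2 != 0) {
            problems.add("WindowSize should be an even number between 0 and 50000");
        }
        // Assumed reading: overlap is non-negative and at most windowSize - 1.
        if (windowOverlap < 0 || windowOverlap > windowSize - 1) {
            problems.add("WindowOverlap is at maximum windowSize - 1");
        }
        if (numOfMean <= 0) {
            problems.add("Number of Mean must be a positive number");
        }
        if (numOfStd <= 0) {
            problems.add("Number of Std must be a positive number");
        }
        return problems;
    }

    public static void main(String[] args) {
        // Example values; only the start position is taken from the screenshot text.
        System.out.println(validate(29562621L, 200, 50, 3, 3));
    }
}
```

Collecting all problems in one pass, rather than stopping at the first, lets a settings dialog report every invalid field at once.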
144. where the output must be placed. This m-file now always creates a text file of the short annotation, because this is used to get the sequence out of a spectrogram by using the Perl program getSequenceFromSpectra.pl in MIDAS. This program takes this annotation file as input. If a user selects annotationstate ON, then the full annotation text is also generated in AnnotationFULL.txt.
CreateSpectroImages.m (New File): This script file is explained earlier in this Section.
NormalizeImage.m: This script file is explained earlier in this Section.
6.2.2 Creating a spectrogram video
CreateClusteredSpectroVideo.m
Description (preliminary summary): This routine will create a Spectrogram video from 1 large RGB image and a corresponding annotation track as created and formatted by the ClusterChromosomeSpectrogram.m or ClusterDiscontiguousSpectrogram.m programs. Usually the Linux machines with the massive memory are used to create the clustered spectrogram image and annotation track. These variables are then dumped to disk by a file writer. These saved files then have to be dynamically loaded into the workspace on a Windows machine, so the large image can be processed into multiple frames and rendered as one video. Author: Evan Santo.
Changes: This code is adjusted so that it can be called from within Java. This m-file is rewritten as a function so that it can be called from within Java
145. y in or among countries not thus excluded. In such case, this License incorporates the limitation as if written in the body of this License.
9. The Free Software Foundation may publish revised and/or new versions of the General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns. Each version is given a distinguishing version number. If the Program specifies a version number of this License which applies to it and "any later version", you have the option of following the terms and conditions either of that version or of any later version published by the Free Software Foundation. If the Program does not specify a version number of this License, you may choose any version ever published by the Free Software Foundation.
10. If you wish to incorporate parts of the Program into other free programs whose distribution conditions are different, write to the author to ask for permission. For software which is copyrighted by the Free Software Foundation, write to the Free Software Foundation; we sometimes make exceptions for this. Our decision will be guided by the two goals of preserving the free status of all derivatives of our free software and of promoting the sharing and reuse of software generally.
NO WARRANTY
11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED B