
Scalable Execution of LIGO Astrophysics Application in a Cloud



Figure 28 Web Portal Screen Shot 8
Figure 29 Web Portal Screen Shot 9
Figure 30 Web Portal Screen Shot 10

1 Introduction

1.1 Project Background
Gravitational waves (GW) are ripples thought to occur in the fabric of space-time, resulting from interstellar collisions, explosions, or the movement of large and extremely dense objects such as neutron stars. Those ripples can then pass through the space-time that Earth occupies, causing a distortion which Advanced LIGO is meant to pick up. Currently, several interferometric gravitational wave detectors around the world, such as LIGO, VIRGO, GEO600, and TAMA300, have been collecting data that scientists can use to search for GWs [3]. Numerous search technologies for GW data are known to the community. There are many potential sources of GW, of three main types: stochastic, burst, and continuous. Each search for a specific type of source needs to be performed over a certain parameter space, with an optimal balance required between parameter-space mismatch and computational resources. Most of these searches can be represented as a workflow consisting of tasks linked through data dependencies. Each workflow can then be replicated using a different parameter set, and numerous scientists can then use these multiple workflows for analyzing and searching for GWs [3].
Making this possible with ease, efficiency, and scale will help reach a long-awaited breakthrough earlier than most people anticipate.

1.2 Aim
Leverage the strengths of cloud computing for computation-intensive and storage-intensive applications: use compute resources in a scalable manner, balance the load of resources dynamically according to their capabilities, and maximise performance and efficiency.

1.3 Objectives
The problem of executing multiple workflows for multiple users is challenging. Parallel execution of workflows can lead to resource contention, as each workflow instance requires the same set of data as input and a specific number of compute resources, and is bound by deadlines set by users. Thus, the challenging tasks are:
1. Allocate Cloud resources to tasks, workflows, and users effectively to avoid resource contention (the dynamic resource provisioning problem).
2. Minimise delay in executing workflows (the task/workflow/user scheduling problem).
3. Balance the load of different resources.
In this project we accomplish all the goals above.

1.4 Motivation
The main motivation of this project is to solve the issues with the huge computation of the gravitational wave search encountered by the physicists who use the LIGO application. It provides a well-structured solution for the coming LIGO data centre in Australia. We also demonstrated it at the Fourth IEEE International Scalable Computing Challenge (SCALE 2011).
Figure 9 CCE (number of tasks): Engine capacity (load) calculating algorithm
Figure 10 Load Balancing Sequence Diagram
Figure 11 Engine 1, Worker 4/16: Tasks vs Execution Time
Figure 12 Engine 1, Worker 4/16: Tasks vs Time
Figure 13 Engine 2, Worker 4/8/16: Tasks vs Execution Time
Figure 14 Engine 2, Worker 4/8/16: Tasks vs Time
Figure 15 Engine 4, Worker 4/8/16: Tasks vs Execution Time
Figure 16 Engine 4, Worker 4/8/16: Tasks vs Time
Figure 17 Combined Result
Figure 18 1000 tasks with 500 workers: Tasks vs Time
Figure 19 Source B8 Search Result
Figure 20 Search Result of Source B3
Figure 21 Web Portal Screen Shot 1
Figure 22 Web Portal Screen Shot 2
Figure 23 Web Portal Screen Shot 3
Figure 24 Web Portal Screen Shot 4
Figure 25 Web Portal Screen Shot 5
Figure 26 Web Portal Screen Shot 6
Figure 27 Web Portal Screen Shot 7
3 DESIGN
3.1 SYSTEM ARCHITECTURE
3.2 LIGO GRAVITATIONAL WAVE SEARCH SYSTEM DESIGN
3.3 LOAD BALANCER
3.3.1 LOAD BALANCER ALGORITHM
3.3.2 CCE ALGORITHM
3.4 ENGINE
3.4.1 WORKFLOW ENGINE
3.4.2 ENGINE WEB SERVICE
3.5 WORKER
3.6 PORTAL
3.6.1 PROPERTY SETUP
3.6.2 TASK GENERATOR
3.6.3 TASK SUBMISSION
3.6.4 MONITOR
4 EXPERIMENT
4.1 DATA COLLECTION AND ANALYSIS
4.1.1 DIFFERENT COMBINATIONS WITH A FIXED NUMBER OF SOURCES
4.1.1.1 Engine 1
4.1.1.2 Engines 2
4.1.1.3 Engines 4
4.1.1.4 Combined Data
4.1.2 LARGE SCALE EXPERIMENT WITH 500 COMPUTING WORKERS
4.2 APPLICATION OUTPUT
5 CONCLUSION AND FUTURE WORK
6 BIBLIOGRAPHY
7 APPENDICES
APPENDIX A USER MANUAL

Figures
Figure 1 Components of Workflow Engine
Figure 2 Gravitational Waves and LIGO
Figure 3 Livingston Observatory, Livingston, Louisiana
Figure 4 System Architecture for Autonomic Scaling of PaaS Services
Figure 5 System Entities Relationship
Figure 6 Load Balancer Activity Diagram
Figure 7 Description of symbols used in the work
Figure 8 PaaS load balancing algorithm
Each single point represents the result of one task. We plot the graph with 1000 results, which are stored on Amazon S3 dynamically; the 1000 dots link together and become the line above. There is one dot that lies far away from the line: it is the signal, and we have highlighted it with a description. From the point of view of physics, this is the signal of a gravitational wave.

[Plot for Source B3: C statistic vs frequency (Hz), with the mean and maximum C values marked.]
Figure 20 Search Result of Source B3

The graph above (Figure 20) shows a small-scale search with 40 tasks. Some points are missing because of task failures, but it is easy to identify the signal from this graph: there are two signals that fall outside the normal distribution of points, and we have highlighted them with blue descriptions.

5 Conclusion and Future Work
Cloud data centres are prevalent nowadays, and a large number of IT enterprises sell their computer resources in the form of computation. However, coordinating compute resources in a scalable manner is still a big challenge. In this work we provide a whole-system solution that maximises the usage of each compute resource and scales the size of the system based on the amount of requests from users. We design a four-layered system.
Comparing the remaining two charts, we can see that once all the workers are busy, the remaining tasks have to wait until a worker is available.

4.1.1.3 Engines 4
[Charts for 4, 8, and 16 workers; data omitted.]
Figure 15 Engine 4, Worker 4/8/16: Tasks vs Execution Time
[Charts of submit time and finish time per task; data omitted.]
Figure 16 Engine 4, Worker 4/8/16: Tasks vs Time

For the third run, we used 4 engines and allocated 4, 8, and 16 workers to each engine. The first graph is similar to the previous group, which means the number of engines does not affect the performance of an individual task. From the graphs of tasks vs time, we get a clear picture that the time of submission is decided by the availability of workers. The last graph is almost flat, as the number of workers is greater than the number of tasks; therefore nearly all tasks are submitted as soon as the workers are ready.

4.1.1.4 Combined Data
[Combined charts for all engine and worker combinations; data omitted.]
Figure 17 Combined Result

Combining all the data of this series of experiments, we plot the graph shown in Figure 17. It is not hard to read off the average execution time of the LIGO application.
The number of tasks left is useful for deciding how many new engines and new workers still need to be started.

Algorithm 2: CCE (number of tasks): Engine capacity (load) calculating algorithm
1: begin
2:   Set the capability of each free worker to cw
3:   Set the threshold of completion time of a single task to MAXCompTime
4:   for each Engine e in Engines do
5:     Get the average task completion time t of e during the last n minutes
6:     Compute the availability a of e by comparing t with MAXCompTime
7:     Compute the capability c of e according to the percentage of availability, where c = a * numberOfWorkers(e) * maxTasksPerWorker
8:   Refresh compute resource status
9:   if new workers are ready then
10:    Increase the capability of their engine by NtW
11:  notScheduledTasks = tasks remaining to be assigned to an engine
12:  return notScheduledTasks
13: end
Figure 9 CCE (number of tasks): Engine capacity (load) calculating algorithm

[Sequence diagram: authentication, configure params, invoke, submit tasks, monitor resources, calculate engine capability/load, assign tasks to engine, terminate unused resources, store result; see Figure 10.]
For example, if there are 3 engines and WpE = 5, then each engine will have 5 EC2 instances under its management.
NtW: Number of Tasks per Worker, the number of tasks that will be submitted to each worker. Higher values enable multiple tasks to be submitted to a worker, which then run in parallel in the worker as separate processes.
sizeof(Array): this function returns the length of the array that is passed to it as a parameter.
CCE(integer): this is the capacity calculation algorithm. The argument to the algorithm is the size of the task queue.
MAXCompTime: this value signifies the maximum completion time for a task in the application being submitted by the user.
Figure 7 Description of symbols used in the work

First, put all the tasks that need to be done into a queue called Q. Refresh the running compute resources, putting new resources into the array of engines (Engines) and the queue of workers to assign (Rworkers) respectively, according to the type and properties of the instances. Because it takes time to start the compute resources requested in the last loop, we wait until there are enough workers in Rworkers to fully fill the empty slots of one engine in Engines, or until the waiting list of compute resources Rworkers is empty. Then we compute the capability of each engine in Engines with the CCE algorithm; in other words, we measure how many new tasks the existing engines can handle during a reasonable time span. If the existing engines are not enough to handle all the queued tasks, more engines and workers must be started, as described below.
The first core algorithm facilitates the scaling out and scaling in of the compute resources needed for hosting multiple PaaS services at the middleware level; the second is a provisioning algorithm that instantiates, at the infrastructure level, the compute resources required for running user applications. We also describe the detailed functionality of the different components of the LIGO gravitational wave search system. Through a large number of experiments, we analyse the challenges of executing the application and demonstrate the feasibility of our scalable platform design using Amazon Cloud services.

Acknowledgements
A project as large as this is impossible with the power of an individual alone; it requires coordination across different groups and people. Many great ideas were generated in discussions among researchers and staff, and I am only one member of a professional research team. Many people gave me valuable suggestions and help. Firstly, I would like to thank my coordinator, Professor Rajkumar Buyya, Professor of Computer Science and Software Engineering and Director of the Cloud Computing and Distributed Systems (CLOUDS) Laboratory at the University of Melbourne, Australia. He provided me the opportunity to work on such an important project, offered me access to the resources necessary to conduct the research and experiments, and provided supervision and guidance from the top level. Secondly, I would like to thank my supervisor, Dr. Suraj Pandey.
19:  for each Engine e in Engines do
20:    repeat
21:      Assign tasks in Q to e
22:    until the current engine reaches its maximum load
23:    Start execution of tasks assigned to engine e
24: until all the submitted tasks have been finalized
25: end
Figure 8 PaaS load balancing algorithm

3.3.2 CCE Algorithm
First, set the capability of each free worker to cw, the maximum number of tasks that can be allocated to the worker while ensuring they are done in a reasonable time. Set the threshold of completion time of a single task to tt; this threshold is the standard for checking whether an engine is overloaded or not. If an engine is overloaded, we stop assigning tasks to it; otherwise, we assign more tasks to it. Then, for each engine ei of E, we compute the average completion time ti during the last n minutes and compare ti with the threshold tt to obtain the percentage of availability of engine ei. The capability ci of the whole engine ei is its availability multiplied by the number of workers of ei multiplied by the maximum number of tasks per worker cw; ci means how many more tasks can be assigned to engine ei. Then we assign the workers in the queue Rworkers to the free slots of engine ei; because these workers are newly joined free workers, we increase the capability of that engine ei by cw for each of them. Finally, we calculate how many tasks are still left after assigning tasks to each ei of E according to the capability of ei, and return it.
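To make the capacity computation concrete, the following Java fragment is a minimal sketch of the CCE calculation described above. It is not the project's actual source: the class and field names (Engine, avgCompletionTime, and so on) are assumptions, and the availability formula (capped at 100%) is one plausible reading of the comparison between the average completion time and MAXCompTime.

    // Hypothetical sketch of the CCE capacity calculation; names and the
    // availability formula are illustrative assumptions, not project code.
    import java.util.List;

    class Engine {
        int workerCount;           // workers currently attached to this engine
        double avgCompletionTime;  // average task completion time, last n minutes
    }

    class CCE {
        static final int CW = 2;                    // max tasks per worker (assumed)
        static final double MAX_COMP_TIME = 540.0;  // threshold in seconds (assumed)

        /** Returns how many of the queued tasks cannot yet be scheduled. */
        static int notScheduledTasks(int queuedTasks, List<Engine> engines,
                                     int newWorkers) {
            int totalCapability = 0;
            for (Engine e : engines) {
                // Availability: how well the engine stays within the threshold;
                // an overloaded engine contributes proportionally less.
                double availability =
                    Math.min(1.0, MAX_COMP_TIME / e.avgCompletionTime);
                totalCapability += (int) (availability * e.workerCount * CW);
            }
            // Newly started workers raise the capability of their engines.
            totalCapability += newWorkers * CW;
            // Tasks the existing engines cannot absorb remain unscheduled.
            return Math.max(0, queuedTasks - totalCapability);
        }
    }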
The challenge was held in conjunction with the CCGrid 2011 conference in Newport Beach, CA.

2 Technology Review

2.1 Middleware: Workflow Engine
We describe a process of constructing and experimenting with workflows in Grid and cloud computing environments. We then present the components of the Cloudbus Workflow Management System (GWMS).

2.1.1 Workflow Deployment Cycle
In the input phase, scientists provide the input data, batch scripts, sample output files, and application requirements. In order to run the application on the Grid with heterogeneous resources, the executables are required to be compiled and installed (this can be done at runtime) at both remote and local sites. For quality assurance or initial results verification, this step involves testing the conformance of our execution with the sample results provided by the scientists. Once the initial results are verified, the workflow structure and its details need to be composed in the designing phase. In this phase, the automated generation of the workflow, in terms of the workflow language used, can also be done by taking into account the run-time parameters that users might want to change during execution [7]. In the execution phase, the resources where the application can be executed need to be set up. The resource list, its credentials, and the services provided by each resource need to be inserted into the catalogue. When experiments are conducted repeatedly, resource availability and conditions will have changed over time.
[Monitoring table listing worker server names (e.g. ec2-50-19-159-151.compute-1.amazonaws.com), the current task of each (batchprocess), and finished and failed counts.]
Figure 30 Web Portal Screen Shot 10
[Sequence diagram continued: notify result available, view result, get result.]
Figure 10 Load Balancing Sequence Diagram

3.4 Engine
There are two components in each engine. One is the workflow engine, which is responsible for scheduling tasks and allocating them to workers. The other is the engine web service, which receives tasks and commands from the load balancer over SOAP web services.

3.4.1 Workflow Engine
The workflow engine manages the workers that belong to it and the tasks assigned to it. It can monitor the status of workers and the execution of tasks dynamically.

3.4.2 Engine Web Service
Once an engine starts running, it first receives the configuration files from the load balancer, which contain the necessary configuration parameters of the workflow engine. Then the task files, including the application file, service file, and credential file, are received for computation. After all task files are transferred successfully, a command to start the broker is triggered. The broker retrieves all the received tasks and pushes them to the task queue, and all the tasks are assigned to different workers for execution. In the meantime, engines keep checking the status of workers and tasks.

3.5 Worker
The worker is the actual compute resource that executes the LIGO gravitational wave search application. Once a worker has received a task file from the engine, it starts executing. After it finishes the task, it sends the result back to the engine.
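As a concrete illustration of the engine web service contract in Section 3.4.2, the sketch below shows how such a SOAP endpoint could look with standard JAX-WS annotations. This is a hypothetical minimal interface: the operation names and parameters are assumptions, since the report does not list the actual WSDL.

    // Hypothetical JAX-WS sketch of the engine web service; operation names
    // and parameters are illustrative assumptions, not the real interface.
    import javax.jws.WebMethod;
    import javax.jws.WebService;

    @WebService
    public class EngineService {

        /** Receives the workflow engine configuration from the load balancer. */
        @WebMethod
        public void receiveConfiguration(byte[] configFile) {
            // Persist the configuration parameters of the workflow engine.
        }

        /** Receives one task file (application, service, or credential file). */
        @WebMethod
        public void receiveTaskFile(String fileName, byte[] content) {
            // Store the file until all task files have arrived.
        }

        /** Triggered after all task files have transferred successfully. */
        @WebMethod
        public void startBroker() {
            // The broker pushes the received tasks onto the task queue and
            // assigns them to workers for execution.
        }
    }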
He is a Research Fellow at the Cloud Computing and Distributed Systems (CLOUDS) Laboratory of the Computer Science & Software Engineering department at the University of Melbourne, Australia. He guided me on the project from the very beginning of the design to the finalization of the paper; not only in drawing the big picture of the project, but also in implementing its detailed parts, he kept helping me all the way through. He is an expert in both academic research and software development, and discussions with him enlightened me on many perspectives of the project. I highly appreciate all the help he provided. Lastly, I would like to thank all the researchers of the Cloud Computing and Distributed Systems Laboratory, the Department of Computer Science and Software Engineering, and the Physics Department of the University of Melbourne. We thank Amazon for providing us an educational grant that enabled us to experiment using EC2 and S3 Cloud resources.

Table of Contents
1 INTRODUCTION
1.1 PROJECT BACKGROUND
1.2 AIM
1.3 OBJECTIVES
1.4 MOTIVATION
2 TECHNOLOGY REVIEW
2.1 MIDDLEWARE: WORKFLOW ENGINE
2.1.1 WORKFLOW DEPLOYMENT CYCLE
2.2 INFRASTRUCTURE: AMAZON EC2 CLOUD
2.2.1 AMAZON EC2 FUNCTIONALITY
2.3 APPLICATION: LIGO (LASER INTERFEROMETER GRAVITATIONAL WAVE OBSERVATORY)
2.3.1 GRAVITATIONAL WAVES: RIPPLES IN THE FABRIC OF SPACE-TIME
2.3.2 HOW LIGO WORKS
Scalable Execution of LIGO Astrophysics Application in a Cloud Computing Environment

Presented by Dong Leng

Submitted in total fulfilment of the requirements of the degree of Master of Engineering in Distributed Computing

The Cloud Computing and Distributed Systems (CLOUDS) Laboratory
The Department of Computer Science and Software Engineering
The University of Melbourne, Australia

Abstract
The LIGO gravitational wave search application requires a system that can provide resources on demand. Platform as a Service (PaaS) is such a solution: it helps manage user applications in a scalable manner by using resources delivered through a virtualized Cloud data centre. It is a necessary characteristic of the PaaS layer that it dynamically adjust the load on computing platforms based on quality of service (QoS) parameters agreed with Cloud users. In order to make the platform services scalable, there is a need for a decentralized service design and autonomous provisioning algorithms, in contrast to a centralized PaaS service design. In this report we present a software design for the LIGO application which enables multiple PaaS services to run on a decentralized PaaS layer that can manage the dynamic load arising from application services running on top of it. There are two core algorithms in this system design, which automate load balancing at multiple layers of the Cloud software stack: (1) a dynamic resource provisioning algorithm at the middleware level, and (2) a provisioning algorithm at the infrastructure level.
Without support for scheduling and management of data and tasks, in the worst case the parallel execution of these multiple workflows will be reduced to sequential execution due to contention for common Cloud resources. Currently, users of LIGO data, such as Ms Sammut, are bound to submit their search procedures, using Condor DAG scripts, to compute resources located primarily in Germany and the USA that are managed by Condor. Once the Australian Data Centre is established, data generated locally will reside in the Australian Data Centre, where the computations can be carried out. The main impact of the solution (scalable executions of multiple workflows) will be to design, from the ground up, a user-centric, optimally configured Data Centre in preparation for the day when the data is accessible to a wider community of users, e.g. traditional electromagnetic astronomers. The demonstration of a working prototype that dynamically provisions middleware components for load balancing purposes and uses virtualized memory for efficient I/O will show the extensibility and usefulness of virtualized Data Centres (e.g. Cloud data centres provided by Amazon, GoGrid, etc.) in terms of cost savings and resource utilization. From a physics point of view, a successful detection of a source like Sco X-1 will help to determine the origin of the GW quadrupole, while a non-detection will yield upper or lower limits on many of the quantities still under investigation [6].
This requires the services and credentials list to be generated for each execution with the help of the catalogue. The application is then executed on the Grid. Usually debugging and testing are done while the application is being executed, but this depends on the software development process being used. Depending on the analysis of the results from the output phase, the workflow design can be further optimized according to the requirements of the user. These are the basic steps involved in constructing most scientific workflows to be executed on the Grid, but they do not generalize to all applications; the process can get complicated when more user and system requirements are added [7].

[Diagram: user interface, Gridbus Workflow Engine, SFTP, cluster/workstation, Cloud services, and Grid resources.]
Figure 1 Components of Workflow Engine

2.2 Infrastructure: Amazon EC2 Cloud
Amazon Elastic Compute Cloud (Amazon EC2) is a web service that provides resizable compute capacity in the cloud. It is designed to make web-scale computing easier for developers. Amazon EC2's simple web service interface allows you to obtain and configure capacity with minimal friction. It provides you with complete control of your computing resources and lets you run on Amazon's proven computing environment. Amazon EC2 reduces the time required to obtain and boot new server instances to minutes [9].
2.3 Application: LIGO (Laser Interferometer Gravitational Wave Observatory)

2.3.1 Gravitational Waves: Ripples in the Fabric of Space-Time
Albert Einstein predicted the existence of gravitational waves in 1916 as part of the theory of general relativity. He described space and time as different aspects of reality in which matter and energy are ultimately the same. Space-time can be thought of as a fabric defined by the measuring of distances by rulers and the measuring of time by clocks. The presence of large amounts of mass or energy distorts space-time, in essence causing the fabric to warp, and we observe this as gravity. Freely falling objects, whether a soccer ball, a satellite, or a beam of starlight, simply follow the most direct path in this curved space-time [8]. When large masses move suddenly, some of this space-time curvature ripples outward, spreading much the way ripples do on the surface of an agitated pond. Imagine two neutron stars orbiting each other. A neutron star is the burned-out core often left behind after a star explodes. It is an incredibly dense object that can carry as much mass as a star like our sun in a sphere only a few miles wide. When two such dense objects orbit each other, space-time is stirred by their motion, and gravitational energy ripples throughout the universe.
7. With the application files ready, we can now run the application with a single click, as shown in Figure 25. After a short time, the running tasks will be recorded in the database. We can monitor the details of running tasks dynamically by clicking Monitor in the left navigation, as shown in Figure 28. Users can also see the running worker for each task (Figure 29).

[Monitoring tables for execution: application IDs, node names, numbers of jobs, jobs in execution, finished and failed jobs, and submit and finish times.]
Figure 29 Web Portal Screen Shot 9
3.6 Portal
Users can manage all their tasks and cloud resources through the web portal. All the operations are Ajax-driven. The container of the portal is a Tomcat server, and JavaServer Faces and RichFaces are used for the front-end implementation.

3.6.1 Property Setup
The configuration of the workflow, database, and load balancing can be accessed from the configuration section of the portal.

3.6.2 Task Generator
Users first submit a signal source file with the regular parameters, then set up the customized parameters. After all the parameters are set up properly, the portal uses Java's built-in XML support to generate the task XML files; the number of files depends on the search band parameter.

3.6.3 Task Submission
For each source of each user, the task files are saved in a separate folder. When tasks are submitted, the files of the folder are retrieved and the tasks are pushed into a task queue. Tasks are polled from the queue when the load balancer can assign them to an engine for execution (a minimal sketch of this flow is given below, after Section 3.6.4).

3.6.4 Monitor
The portal enables users to monitor all the tasks, the compute resources, and the relationships between them. Tasks are categorized by application: when a user submits an application ID, all the tasks of that application are listed in a table, and the user can monitor the progress of each task, such as the submission time, time consumption, and the number of times the task has failed. There is also another table describing which worker executes each task.
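To make the task submission flow concrete, here is a hypothetical Java sketch: task XML files are read from a per-source folder and pushed into a queue from which the load balancer polls. The file layout and all class and method names are assumptions introduced for illustration.

    // Hypothetical sketch of the portal's task submission flow; the folder
    // layout and all names are illustrative assumptions, not project code.
    import java.io.File;
    import java.util.Queue;
    import java.util.concurrent.ConcurrentLinkedQueue;

    class TaskSubmitter {
        // Queue from which the load balancer polls tasks for assignment.
        private final Queue<File> taskQueue = new ConcurrentLinkedQueue<>();

        /** Queues every task XML file saved for one source of one user. */
        void submitSource(String user, String source) {
            File folder = new File("tasks/" + user + "/" + source);
            File[] taskFiles = folder.listFiles((dir, name) -> name.endsWith(".xml"));
            if (taskFiles == null) {
                return;  // folder missing: nothing to submit
            }
            for (File task : taskFiles) {
                taskQueue.add(task);  // waits here until an engine has capability
            }
        }

        /** Called by the load balancer when an engine can accept another task. */
        File pollTask() {
            return taskQueue.poll();  // null when no tasks are waiting
        }
    }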
3.2 LIGO Gravitational Wave Search System Design
Based on the system architecture described in Section 3.1, we design the system for the LIGO gravitational wave search. Figure 5 shows the entities of the LIGO Gravitational Wave Search System and their interactions. We select Amazon EC2 as the cloud resource provider, and the whole system is deployed on Amazon EC2 to gain better performance and lower latency. There are four main components in this system: web portal, load balancer, engine, and worker. The web portal takes the input of users and provides monitoring of running tasks and results. The load balancer takes the responsibility of distributing tasks and instantiating compute resources dynamically; this is done in an autonomic manner based on the workload of the system. The engine is middleware used to schedule and assign tasks to different workers. The worker is the actual resource that executes the LIGO application. We discuss each of these components in detail in the following sections.

[Diagram: the web portal invokes the load balancer, which queries and updates the database, requests resources from the Amazon EC2 server, and transfers commands and assigns tasks to engines in the Amazon EC2 Cloud.]
Figure 5 System Entities Relationship

3.3 Load Balancer
The load balancer is the core of the whole system. It manages all the compute resources from the cloud data centre; its activities are shown in Figure 6. Based on the amount of requests, the load balancer calculates the number of compute resources needed and sends resource requests to the cloud administration server.
We start the experiment with one engine and limit the number of workers to 4, 8, and 16. Unfortunately, the application still had some bugs: most jobs of the 8-worker group did not record their done time successfully, so we could not collect the results for that group. From Figure 11 we can see that most tasks are done in around 8 minutes; when the worker number goes up to 16, the time rises a little higher and, with failures, fewer tasks are completed. In Figure 12 there are two charts. The left one is for 4 workers: whenever the number of tasks reaches a multiple of 4 there is a sharp upward step, which is reasonable because all the workers are busy. The right one is for 16 workers: the trend is flat, as most of the tasks can be handled at once.

4.1.1.2 Engines 2
[Charts for 4, 8, and 16 workers; data omitted.]
Figure 13 Engine 2, Worker 4/8/16: Tasks vs Execution Time
[Charts of submit time and finish time per task; data omitted.]
Figure 14 Engine 2, Worker 4/8/16: Tasks vs Time

For the second run, we used 2 engines, with the workers again limited to 4, 8, and 16 respectively. This time we obtained the done times for all 3 groups. From the first chart we can see that the engines with 4 and 8 workers have similar execution times, while the execution time of the engine with 16 workers is slightly higher.
The load balancer adjusts the capability of the system in a scalable manner. As more requests come in, the load balancer increases the capability of the system by starting more instances; as fewer tasks remain, it decreases the capability of the system by terminating instances. Once all the tasks have been done, all the instances are shut down.

[Activity diagram: while jobs are left, calculate the engines' capability; if it is not enough and the new workers can fill at least one engine, start new engines and workers; assign jobs to engines and invoke each started engine; when an engine is free, stop the engine and its workers.]
Figure 6 Load Balancer Activity Diagram

3.3.1 Load Balancer Algorithm
Symbols used in the algorithm:
Q: a queue maintaining the list of tasks submitted to the system for execution, as applications, by end users.
Engines: a set of compute resources where the workflow engine has been installed. All the resources in this set function as PaaS middleware.
EC2: an Amazon Elastic Compute Cloud unit. An instance vm(EC2) is a virtual machine that is running in Amazon.
Rworkers: a set of EC2 resources that are configured to execute end-user applications.
WpE: Workers per Engine. This constant directs the algorithms to allocate up to WpE compute resources (workers) to each running workflow engine.
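These tunables can be pictured as a small configuration type; the portal's Ligo Properties page (Appendix A) exposes similarly named settings such as workersPerEngine. The following is a hypothetical sketch, and the default values are invented for illustration only.

    // Hypothetical configuration mirroring the symbols above; the default
    // values are invented, not taken from the project.
    class BalancerConfig {
        int workersPerEngine = 5;   // WpE: workers allocated to each engine
        int tasksPerWorker = 2;     // NtW: tasks submitted to each worker in parallel
        long maxCompTimeSec = 540;  // MAXCompTime: max completion time of one task
    }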
[Edit Workflow form: workflow name and upload of a config file (e.g. configB4.cfg).]
Figure 25 Web Portal Screen Shot 5

[Available Workflows table: application ID, name, input files (e.g. configB9.cfg, configB1.cfg, configB7.cfg), with Show Graph, Edit, Delete, and Run actions.]
Figure 26 Web Portal Screen Shot 6

6. Set up the application parameters, input the name of the source that we just uploaded, and click the CREATE XML button. The application files of the task are generated immediately, and we can browse these files as shown in Figure 28.

[Generate new Ligo Workflow Application form, with the home, ligosoft, lscsoft, and lalpulsar paths and the frequency range.]
Figure 27 Web Portal Screen Shot 7

[Available LIGO application files: the individual frequency XML files of the application (ligo50_0.xml to ligo59_0.xml and so on), each with a Delete action.]
Figure 28 Web Portal Screen Shot 8
To handle all the tasks in Q, we compute how many more engines and workers we still need, and start these compute resources. Next, we poll a task from Q and assign it to an engine, each time decreasing the capability of that engine by 1; we repeat this step until there is no capability left for that engine, and then trigger the start-application function of that engine to run the broker for the newly added tasks. We iterate over all the engines until there is no capability left. After this, we go back to the beginning and repeat all the steps until all tasks have been finished.

Algorithm 1: PaaS load balancing algorithm
1: begin
2:   Maintain a list of submitted tasks in queue Q
3:   repeat
4:     Refresh the list of running compute resources
5:     Divide newly added instances vm(EC2) into the set of engines Engines and workers Rworkers
6:     Associate up to WpE workers in Rworkers with each engine e in Engines
7:     if sizeof(Rworkers) > 0 OR (all waiting compute resources available AND sizeof(Q) > 0) then
8:       Number of tasks remaining to be submitted for execution: nPending = CCE(sizeof(Q))
9:       if nPending > 0 then
10:        Workers to run: wr = nPending / NtW
11:        wPending = wr
12:        for each Engine e in Engines do
13:          Free slots for engine e: es = WpE - current number of workers in e
14:          wPending = wPending - es
15:        if wPending > 0 then
16:          Engines to run: er = wPending / WpE (integer division)
17:          Provision er engines in the Cloud
18:          Provision wr workers in the Cloud
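Algorithm 1 can be read as a provisioning loop. The Java sketch below is a minimal, hypothetical rendering of lines 8-18 (the provisioning decision), reusing the Engine class from the CCE sketch earlier; the helper methods cce(), provisionEngines(), and provisionWorkers() are assumptions standing in for the CCE call and the EC2 provisioning described in this report.

    // Hypothetical sketch of the provisioning decision in Algorithm 1
    // (lines 8-18); helper methods and names are illustrative assumptions.
    import java.util.List;

    class LoadBalancer {
        int wpe;  // WpE: workers per engine
        int ntw;  // NtW: tasks per worker

        void provisionIfNeeded(int queuedTasks, List<Engine> engines) {
            // CCE returns how many queued tasks current engines cannot absorb.
            int nPending = cce(queuedTasks, engines);
            if (nPending <= 0) {
                return;  // existing capability is sufficient
            }
            int workersToRun = nPending / ntw;  // line 10
            int wPending = workersToRun;        // line 11

            // Subtract worker slots still free on the existing engines.
            for (Engine e : engines) {
                int freeSlots = wpe - e.workerCount;  // line 13
                wPending -= freeSlots;                // line 14
            }
            if (wPending > 0) {
                int enginesToRun = wPending / wpe;  // line 16, integer division
                provisionEngines(enginesToRun);     // line 17: start engine VMs
            }
            provisionWorkers(workersToRun);         // line 18: start worker VMs
        }

        int cce(int queuedTasks, List<Engine> engines) {
            return 0;  // placeholder: see the CCE sketch in Section 3.3.2
        }

        void provisionEngines(int n) { /* request n engine instances from EC2 */ }

        void provisionWorkers(int n) { /* request n worker instances from EC2 */ }
    }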
The system includes a web portal for controlling and monitoring by front-end users, a load balancer for scaling the system and assigning tasks, engines for scheduling tasks, and workers for executing tasks. This system has been successfully applied to searching for gravitational waves, which involves a large amount of computation. In our experiments, thousands of tasks can be executed in parallel, the compute resources scale in and out according to the number of tasks, and the load of each resource is balanced dynamically. Even though this system design is well proven, there are still many improvements that could be made in the future. We have solved the bottleneck of the engine with a traditional 3-layer system; however, the web portal still faces pressure when requests increase. A solution is to set up several Tomcat servers to support multiple web portals; we actually implemented this during our experiments, and in our experience it is easy to configure. Another improvement would be to change the web service protocol from SOAP to REST. SOAP is a heavy and complex protocol compared with REST, and while conducting experiments and tests we did have trouble getting response messages with SOAP; REST is more appropriate for this project because of its light weight. At this stage we only support the Amazon EC2 cloud; later, the system could be extended to multiple cloud providers, or even mix cloud data centres with clusters. In addition, the middleware Workflow Engine assigns tasks based on the number of cores of a resource.
This allows you to quickly scale capacity, both up and down, as your computing requirements change. Amazon EC2 changes the economics of computing by allowing you to pay only for capacity that you actually use, and it provides developers the tools to build failure-resilient applications and isolate themselves from common failure scenarios [9].

2.2.1 Amazon EC2 Functionality
Amazon EC2 presents a true virtual computing environment, allowing you to use web service interfaces to launch instances with a variety of operating systems, load them with your custom application environment, manage your network's access permissions, and run your image using as many or as few systems as you desire [9]. To use Amazon EC2, you simply:
- Select a pre-configured, templated image to get up and running immediately, or create an Amazon Machine Image (AMI) containing your applications, libraries, data, and associated configuration settings.
- Configure security and network access on your Amazon EC2 instance.
- Choose which instance type(s) and operating system you want, then start, terminate, and monitor as many instances of your AMI as needed, using the web service APIs or the variety of management tools provided.
- Determine whether you want to run in multiple locations, utilize static IP endpoints, or attach persistent block storage to your instances.
- Pay only for the resources that you actually consume, like instance-hours or data transfer.
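These steps map directly onto EC2 API calls. As a hedged illustration using the AWS SDK for Java (v1), launching a batch of worker instances could look like the sketch below; the AMI ID, instance type, and key pair name are placeholders, not values from this project.

    // Illustrative sketch of launching EC2 worker instances with the AWS SDK
    // for Java v1; the AMI ID, instance type, and key name are placeholders.
    import com.amazonaws.services.ec2.AmazonEC2;
    import com.amazonaws.services.ec2.AmazonEC2ClientBuilder;
    import com.amazonaws.services.ec2.model.RunInstancesRequest;
    import com.amazonaws.services.ec2.model.RunInstancesResult;

    public class WorkerLauncher {
        public static void main(String[] args) {
            AmazonEC2 ec2 = AmazonEC2ClientBuilder.defaultClient();

            RunInstancesRequest request = new RunInstancesRequest()
                    .withImageId("ami-00000000")   // placeholder worker AMI
                    .withInstanceType("m1.small")  // placeholder instance type
                    .withKeyName("my-key-pair")    // placeholder key pair
                    .withMinCount(1)
                    .withMaxCount(4);              // e.g. WpE workers for one engine

            RunInstancesResult result = ec2.runInstances(request);
            result.getReservation().getInstances().forEach(
                    i -> System.out.println("Started " + i.getInstanceId()));
        }
    }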
4 Experiment
In order to fine-tune our system, we conducted a large number of experiments from different perspectives. Even though the process involved many difficulties, such as failures of the application and of Amazon cloud resources, we finally obtained most of the expected data for analysis. Generally there are two groups of experiments: one is a series of experiments with different combinations of engines, workers, and sources; the other is a large-scale experiment for each source with 500 computing workers. In the latter group we obtain a gravitational wave search result that contains all the points.

4.1 Data Collection and Analysis

4.1.1 Different combinations with a fixed number of sources
First, we run a series of experiments with a fixed number of sources and different combinations of engines and workers, limiting the total number of tasks to 40. To gain a better understanding of the practical environment, we try to use a different source for each experiment. We set the number of engines to 1, 2, and 4 respectively, and limit the number of workers to 4, 8, and 16, running through all 9 possible combinations.

4.1.1.1 Engine 1
[Charts for 4 and 16 workers; data omitted.]
Figure 11 Engine 1, Worker 4/16: Tasks vs Execution Time
[Charts of submit time and finish time per task; data omitted.]
Figure 12 Engine 1, Worker 4/16: Tasks vs Time
In the case of searching for gravitational waves, the application consumes a great amount of memory, so it is unstable to assign several tasks to one resource. Optimization could be made to the middleware based on the nature of the application.

6 Bibliography
[1] J. Broberg, R. Buyya, and Z. Tari, "MetaCDN: Harnessing storage clouds for high performance content delivery," Journal of Network and Computer Applications, 32, 2009, pp. 1012-1022.
[2] LIGO Scientific Collaboration, "LAL/LALApps: FreeSoftware (GPL) tools for data analysis," http://www.lsc-group.phys.uwm.edu/daswg/ [Online].
[3] E. Deelman, G. Singh, M. H. Su, J. Blythe, Y. Gil, C. Kesselman, G. Mehta, K. Vahi, G. B. Berriman, J. Good, A. Laity, J. C. Jacob, and D. S. Katz, "Pegasus: A framework for mapping complex scientific workflows onto distributed systems," Scientific Programming, 13, 2005, pp. 219-237.
[4] M. Izal, U. G. Keller, E. Biersack, P. Felber, A. Hamra, and G. L. Erice, "Dissecting BitTorrent: Five months in a torrent's lifetime," 2004.
[5] S. Pandey, D. Karunamoorthy, and R. Buyya, Cloud Computing: Principles and Paradigms, Wiley Press, 2011, pp. 299-319.
[6] L. Sammut, A. Melatos, C. Messenger, and B. Owen, "A Frequency Comb Search for Sco X-1," in American Astronomical Society Meeting Abstracts #214, 2009.
[7] S. Pandey, W. Voorsluys, M. Rahman, R. Buyya, J. Dobson, and K. Chiu, "A Grid Workflow Environment for Brain Imaging Analysis on Distributed Systems," 2009, John Wiley & Sons Ltd.
[8] LSC (LIGO Scientific Collaboration), http://www.ligo.org [Online].
[9] Amazon Web Services (AWS), "What is AWS," http://aws.amazon.com/what-is-aws [Online].

7 Appendices

Appendix A: User Manual
1. Type the address of the web portal into a web browser and log in with your username and password.
[Login page: Workflow Management System for e-Science Applications, The Fourth IEEE International Scalable Computing Challenge (SCALE 2011); welcome banner of the Cloudbus Workflow Management Portal for Scientific Workflow Applications, with logos of the Cloudbus project, The University of Melbourne, the Australian Government (Department of Education, Science and Training), and the Australian Research Council.]
Figure 21 Web Portal Screen Shot 1

2. After logging in to the user panel, select Workflow Property Configuration from the left navigation.
[Left navigation: Workflow Actions (Create, Submit, Monitor, Monitor All), Cloud Services (Generate Amazon Service XML), Ligo Actions (Generate Ligo Workflow XML), fMRI Actions, EMO Actions, Configuration (Workflow Engine, Gridbus Broker), and Ligo Configuration (Configure Ligo Properties).]
Figure 22 Web Portal Screen Shot 2

3. Set up the parameters for the workflow engine and save them.
If the two arms have identical lengths, then interference between the light beams returning to the beam splitter will direct all of the light back toward the laser. But if there is any difference between the lengths of the two arms, some light will travel to where it can be recorded by a photodetector [8].

[Photograph.]
Figure 3 Livingston Observatory, Livingston, Louisiana

The space-time ripples cause the distance measured by a light beam to change as the gravitational wave passes by, and the amount of light falling on the photodetector to vary. The photodetector then produces a signal defining how the light falling on it changes over time. The laser interferometer is like a microphone that converts gravitational waves into electrical signals. Three interferometers of this kind were built for LIGO: two near Richland, Washington, and the other near Baton Rouge, Louisiana. LIGO requires at least two widely separated detectors, operated in unison, to rule out false signals and confirm that a gravitational wave has passed through the Earth.

3 Design

3.1 System Architecture
[Architecture diagram: multiple workflow engine servers, each behind a web service, running on Amazon EC2 with storage on Amazon S3.]
Figure 4 System Architecture for Autonomic Scaling of PaaS Services

We propose a layered design.
[Illustration: two neutron stars orbit each other and collide, merging into one; the force is so great that it creates ripples in space-time, which reach Earth and are detected simultaneously at two observatories.]
Figure 2 Gravitational Waves and LIGO

In 1974, Joseph Taylor and Russell Hulse found such a pair of neutron stars in our own galaxy. One of the stars is a pulsar, meaning it beams regular pulses of radio waves toward Earth. Taylor and his colleagues were able to use these radio pulses, like the ticks of a very precise clock, to study the orbiting of the neutron stars. Over two decades, these scientists watched for, and found, the tell-tale shift in the timing of these pulses which indicated a loss of energy from the orbiting stars: energy that had been carried away as gravitational waves. The result was just as Einstein's theory predicted.

2.3.2 How LIGO Works
LIGO detects the ripples in space-time by using a device called a laser interferometer, in which the time it takes light to travel between suspended mirrors is measured with high precision using controlled laser light. Two mirrors hang far apart, forming one arm of the interferometer, and two more mirrors make a second arm perpendicular to the first. Viewed from above, the two arms form an L shape. Laser light enters the arms through a beam splitter located at the corner of the L, dividing the light between the arms. The light is allowed to bounce between the mirrors repeatedly before it returns to the beam splitter.
The average execution time of the LIGO application is 9 minutes, and most tasks are done in around 9 minutes; only a few take much longer because of execution failures. Fewer workers generally give better performance and incur fewer failures, and 8 workers per engine is an appropriate combination in terms of resource consumption and performance.

4.1.2 Large-scale experiment with 500 computing workers
[Chart: execution time and submission time per task; data omitted.]
Figure 18 1000 tasks with 500 workers: Tasks vs Time

We conducted many large-scale experiments with up to 500 workers. Figure 18 shows the submission time and execution time for each task. There are periodic intervals in the submission time because we set up a thread pool for submitting tasks, to prevent exhausting the resources; when the thread pool is full, the submission of tasks waits. We can also see white gaps between the blue lines because some tasks failed or their done times could not be recorded successfully; this is an issue with the workflow engine that we should address in the future.

4.2 Application Output
[Plot for Source B8: C statistic vs frequency (Hz), with the mean and maximum C values and the signal marked.]
Figure 19 Source B8 Search Result

We obtain Figure 19 after running through all 1000 tasks for source B8.
[Workflow Properties Configuration form: Gridbus Broker host, HSQLDB host and port, tuple space host and port, storage host, storage user, storage directory, storage transfer protocol (FTP), Aneka storage host, Aneka task service URL, and striped file access settings, with example hosts such as ec2-174-129-142-168.compute-1.amazonaws.com.]
Figure 23 Web Portal Screen Shot 3

4. Navigate to the Ligo property configuration and set up the parameters for the load balancer and the LIGO application.
[Ligo Properties Configuration form: workersPerEngine, avgTimeThredhold, wfPerWorker, timeSpan, completionTimeSpan, workerAvailable, engineAMI, workerAMI (e.g. ami-f64db69f), accessKeyId, secretAccessKey, keyPairName, appName, applicationPath, contextPath, wfFolderPath, pkiFilePath, and localKey.]
Figure 24 Web Portal Screen Shot 4

5. Select Ligo Actions, then Generate Ligo Workflow XML, from the left navigation; go to the bottom section, type the name for the workflow application, and upload the source configuration file. Then click the refresh button, and the description and operations of the workflow application will come up in the table.
The layered design can process multiple users and their workflows in a scalable manner, as depicted in Figure 4. At the bottom-most layer, we use virtualized resources provided by various Cloud service providers (e.g. Amazon EC2, GoGrid, Azure). We explore a novel way of managing commonly used data across all VM instances by using application-level virtualized memory, which allows applications on multiple servers to share data without replication, decreasing the total memory needed. At the middleware level, we use our existing middleware solutions: the Workflow Engine [5] for managing an application workflow at the task level, and Aneka [5] for managing the jobs (one workflow block, as depicted in Figure 3) and/or tasks that are submitted by the workflow engine. In order to scale the middleware components out and in, we design a load balancer (distributor) algorithm that dynamically instantiates workflow engines, each running on a separate VM, based on (1) user priority, (2) the number of waiting jobs (user requests coming to the server, i.e. request-level parallelism), and (3) the average completion time of workflows submitted to the virtualized resources. With the coordination of middleware component provisioning and compute resource provisioning at the infrastructure level, we propose to scale resources out and in to manage multiple workflows submitted by a large number of users across Cloud data centres.
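The three scaling criteria just listed can be expressed as a small decision function. The sketch below is hypothetical: the thresholds and names are invented for illustration and are not taken from the project's code.

    // Hypothetical sketch of the scale-out decision based on the three
    // criteria above; thresholds and names are invented for illustration.
    class ScalingPolicy {
        int maxWaitingJobsPerEngine = 10;    // invented threshold
        double maxAvgCompletionSec = 540.0;  // invented threshold

        boolean shouldScaleOut(int highPriorityUsers, int waitingJobs,
                               int engineCount, double avgCompletionSec) {
            boolean queueTooLong =
                waitingJobs > engineCount * maxWaitingJobsPerEngine;
            boolean tooSlow = avgCompletionSec > maxAvgCompletionSec;
            // High-priority users lower the bar for starting a new engine VM.
            return queueTooLong || tooSlow
                || (highPriorityUsers > 0 && waitingJobs > 0);
        }
    }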
