[Figure 20: bar chart comparing the number of database rows in stat_running, the job instance totals, and the aggregate job totals]
Figure 20: Number of Database Entries for a Workflow

The above graph shows how much this structure reduces the number of database rows as jobs progress through our system. This data is from a workflow that was executed on our development portal accessing production grids. The workflow had 6 abstract jobs and was executed as a parameter study, causing a total of 26 job instances to be executed. In total, the workflow took 2687 seconds. As can be seen in the graph, the number of database rows decreases sharply from stat_running to stat_JobInstance and stat_JobInstanceStatus. This drop is primarily because stat_running entries are inserted as a function of both time and job instances, whereas stat_JobInstance entries are bounded by the number of job instances and the state transitions they experience. The drop from stat_JobInstance to stat_AggregateJob is due to two groupings. The first is the grouping of similar job instances into one aggregate job. The second comes from grouping the similar states from stat_JobInstanceStatus into stat_AggregateJobStatus, because a job can enter some states an arbitrary number of times.

5 CONCLUSION

In this project we successfully created a system for the collection of usage statistics for integration with the WS-PGRADE
Storing data that satisfies this requirement allows us to calculate aggregate job data incrementally, adding one job instance at a time.

3.2 CALCULATOR SERVICE

The calculator service's goal was to retrieve the data from the aggregate job tables and calculate the relevant statistics for each of the seven levels for which we provide statistics: portal, DCI, resource, user, concrete workflow, workflow instance, and abstract job. The service did this in three steps: first it queried a set of aggregate job entries, then it calculated the changes in the statistics that each row implied for each of the seven levels needing an update, and finally it performed an update on the statistics database tables. The calculator service also managed some database cleanup for the database component.

For the querying of the aggregate job entries there were several concerns. As our calculator was implemented as a simple web service, we wanted it to pull in only a manageable subset of the aggregate job entries, since the design called for the subset to be stored in memory. This was addressed through a LIMIT clause on the SQL query. Another concern with the query was a race condition with the database component. As the database component needs to write to the stat_AggregateJob table while the calculator needs to read from it, we had to implement a guard that would allow the calculator to know when a stat_AggregateJob entry is complete, or that
and concrete workflows.

2.4 DESIGN CONCERNS

During the design, one of the main issues presented was the amount of memory use and CPU load on the gUSE and WS-PGRADE Grid Portal servers. Our goal was to keep any load on these servers to a minimum so that portal operation would not be impacted significantly. This was one of the primary reasons for our calculator service to be a separate web service from gUSE. We also designed our database components to function on a separate database from the gUSE database if called for.

The main concern for the front end was how to display the amount of data provided in a simple and meaningful way that did not require too much hardcoding. Furthermore, we wanted a simple way to change what was displayed without having to touch the code. Finally, we wanted to be able to display some of the data graphically.

3 IMPLEMENTATION

For this stage of the project we sought to implement our system and add a portlet to the user interface of the WS-PGRADE Grid Portal. We first implemented the changes to the database, which included defining schema changes and stored procedures. Having the database defined allowed us to concurrently implement the user interface and the calculator service.

3.1 DATABASE

The database component of the system focused on creating and modifying database structures in order to aggregate the data from the gUSE system. Our main concern was the scale of the data that we recei
[Figure 11: stat_JobInstance and stat_JobInstanceStatus table structures (userID, portalID, startTime, endTime, terminated, resource, entered; jobInstanceId, jobState, startTime, endTime, id) with the BEF_UPDATE TOAGGJOB trigger]
Figure 11: stat_JobInstance and stat_JobInstanceStatus

The stat_JobInstance table structure maintained data for each of the states that the job touched. There are currently 23 possible states. With the structure shown, it is only required to maintain information about the states that are actually used. Our system, however, is built with the assumption that the number of states can change. Also, one of the states was added specifically for our system. This structure also handles loops in the job state diagram by allowing multiple entries for each of the states.

Figure 12 represents a basic subset of the graph of states that jobs may traverse during their execution. The full list of possible job states is available in the appendix. Primarily we store data on the transitions between states, and combining different states allows us to draw conclusions about where in the system the job is waiting.

[Figure 12: Simplified Job State Diagram]

[Figure 13 begins here: stat_AggregateJob column listing (id, Resource, NumberOfJobs, JobName, StartTS, ...), continued on a later page]
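The report performs this transition bookkeeping inside MySQL triggers and stored procedures on the statistics tables and does not list that SQL. Purely as an illustration of the idea described above, a single transition could be captured in stat_JobInstanceStatus roughly as follows; the column names come from Figure 11, the p_job_instance_id, p_new_state, and p_now parameters are hypothetical, and an open state interval is assumed to be marked by a NULL endTime.

    -- Illustrative sketch only; the real system does this work in database
    -- triggers and stored procedures, not in application-issued SQL.
    -- Close the interval for the state the job instance is leaving...
    UPDATE stat_JobInstanceStatus
       SET endTime = p_now
     WHERE jobInstanceId = p_job_instance_id
       AND endTime IS NULL;

    -- ...and open a new interval for the state it is entering. Re-entering a
    -- state simply adds another row, which is how loops in the state diagram
    -- are preserved.
    INSERT INTO stat_JobInstanceStatus (jobInstanceId, jobState, startTime)
    VALUES (p_job_instance_id, p_new_state, p_now);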
2.2.2 USE CASE DIAGRAM
2.2.4 USER INTERFACE CANDIDATES
2.3 DATA AGGREGATION
2.4 DESIGN CONCERNS
3 IMPLEMENTATION
3.1 DATABASE
3.2 CALCULATOR SERVICE
3.4 UI
3.4.1 TOOLS / LANGUAGES
3.4.2 IMPLEMENTATION PROCESS
3.4.4 FINAL PRODUCT
3.5 CONFIGURATION
4 TESTING
4.1 BACKEND TESTING
4.2 PORTLET TESTING
4.3 FUNCTIONALITY TESTING
4.4 DATABASE MEMORY CONSUMPTION
5 CONCLUSION
5.1 USER INTERFACE
5.2 BACK END
6 FUTURE WORK
6.1 REVISED ARCHITECTURE
6.1.1 METABROKER
6.1.2 ACCOUNTING
6.2 METRICS
6.3 UI ADDITIONS
REFERENCES
GLOSSARY
APPENDIX A JOB STATE TABLE
APPENDIX B CLASS DIAGRAMS
APPENDIX B.1 CALCULATOR SERVICE
APPENDIX B.2 PORTLET DATA ACCESS LAYER
APPENDIX C STAT METRIC DESCRIPTION TABLE
APPENDIX D INSTALLATION MANUAL
APPENDIX D.1 DATABASE DEPLOYMENT
APPENDIX D.2 CALCULATOR DEPLOYMENT
APPENDIX D.3 PORTLET DEPLOYMENT
APPENDIX D.4 STOPPING STATISTICS
APPENDIX E DATABASE DESCRIPTION
APPENDIX F USER MANUAL
F.1 INTRODUCTION
F.2 DCI METRICS
F.3 RESOURCE METRICS
F.4 USER METRICS
F.5 CONCRETE WORKFLOW METRICS
F.6 WORKFLOW INSTANCE AND ABSTRACT JOB METRICS

TABLE OF FIGURES
FIGURE 1 DIRECTED ACYCLIC GRAPH EXAMPLE
FIGURE 2 SYSTEM ARCHITECTURE
FIGURE 3 DATA FLOW DIAGRAM
FIGURE 4 USE CASE DIAGRAM
FIGURE 5 SEQUENCE DIAGRAM: DCI STATISTICS
FIGURE 6 CANDIDATE DESIGN 1
FIGURE 7 UI CANDIDATE DISPLAY DESIGN
FIGURE 8 ...
FIGURE 9 DATA COMPOSITION DIAGRAM
FIGURE 10 STAT_RUNNING TABLE DESCRIPTION
FIGURE 11 STAT_JOBINSTANCE AND STAT_JOBINSTANCESTATUS
[Figure 27 screenshot (continued): DCI page with a resource drop-down list and overall DCI statistics]
Figure 27: Selecting Resource

To navigate to resource metrics, the user needs to have already chosen a DCI. Once a DCI is chosen, a new drop-down list of the available resources on that DCI becomes available. The user can choose the resource they wish to view.

F.4 USER METRICS

To navigate to user metrics, the user clicks the User button at the top of any page; once clicked, they will be directed to a page with all of the current user's metrics.

F.5 CONCRETE WORKFLOW METRICS

To navigate to concrete workflow metrics, the user selects the Concrete Workflow button at the top of any page. Once the choice is made, the user will need to choose which of their concrete workflows they wish to view. They can select up to three to view at a time by holding the shift or ctrl keys while selecting. The statistics are displayed below once the user clicks the button to the right of the selection menu, in the order in which the concrete workflows appear in the list.

F.6 WORKFLOW INSTANCE AND ABSTRACT JOB METRICS

[Figure 28 screenshot begins here: concrete workflow statistics page with workflow selection, continued on a later page]
[Figure 14 (continued): calculator database structure, including the stat_statistics table and the per-level identity tables]
Figure 14: Calculator Database Structure

The above table structure holds the final statistics for our system. This structure allows us to isolate the storage of the statistic values, such as the average, from the identity, such as the resource URL. This simplified the table structure by moving common shared columns into a separate table. The exceptions to this are workflow instance, where we store the start and end time of the workflow instance, and concrete workflow, where we store statistics about the workflow as a whole. However, this difference for workflow instance is not maintained by the calculator and is instead maintained by the database component. While this is not the ideal place for the responsibility, it was necessary because we use that information to know when a workflow is complete, so that the calculator only pulls aggregate jobs from completed workflow instances.

The final task of the calculator service was the database cleanup. Due to the triggers handling the data aggregation in the database component, it
[Figure 18 screenshot (continued): times, standard deviations, and number-of-entries metrics on the portal statistics page]
Figure 18: Final Product

Figure 18 shows the portal metrics and serves as the front page of the portlet. The other pages are laid out in the same manner.

[Figure 19 screenshot begins here: concrete workflow statistics with an abstract job pop-up, continued on a later page]
The portals act as portlet containers and provide basic functionality to incorporate a portlet framework. The WS-PGRADE Grid Portal is the second generation of the original P-GRADE portal. The portal allows the creation and submission of workflows on multiple DCIs. The portal uses the Grid User Support Environment (gUSE) to provide the grid functionality. One of the services is the gUSE repository, which stores the workflow objects to be downloaded and further developed. Furthermore, it provides a forum for collaboration and enables workflows to be shared across the community [10].

1.3.1 WS-PGRADE GRID PORTAL AND GUSE

The WS-PGRADE Grid Portal and the Grid User Support Environment (gUSE) are both products developed by MTA SZTAKI's LPDS branch. The WS-PGRADE Grid Portal is the second generation of the P-GRADE Portal. It is a web-based environment which provides tools for the development and execution of workflow-based grid applications. WS-PGRADE added the capability to better handle both parameter studies and workflows, and the internal structure changed to a modular, service-oriented architecture. This change was implemented through the development of gUSE. gUSE provides a graphical environment in which a user can define and execute grid applications, using WS-PGRADE as the user interface [9].

1.3.2 LIFERAY

Liferay Portal was created in 2004. It is a software platform for building websites and web applications [4]. It can be u
WAITING | 3 | false | StateType.QUEUE
SCHEDULED | 4 | false | StateType.QUEUE
RUNNING | 5 | false | StateType.RUN
FINISHED | 6 | true | StateType.TERMINAL
ERROR | 7 | true | StateType.FAIL
NO_FREE_SERVICE | 8 | false | StateType.PORTAL
DONE | 9 | true | StateType.TERMINAL
READY | 10 | false | StateType.QUEUE
CANCELLED | 11 | true | StateType.TERMINAL
CLEARED | 12 | false | StateType.OTHER
PENDING | 13 | false | StateType.OTHER
ACTIVE | 14 | false | StateType.OTHER
SUSPENDED | 16 | false | StateType.PORTAL
UNSUBMITTED | 17 | true | StateType.TERMINAL
STAGE_IN | 18 | false | StateType.OTHER
STAGE_OUT | 19 | false | StateType.OTHER
UNKNOWN_STATUS | 20 | false | StateType.OTHER
TERM_IS_FALSE | 21 | true | StateType.FAIL
NO_INPUT | 25 | false | StateType.FAIL
CANNOT_BE_RUN | 99 | true | StateType.FAIL
SUCCESS_RUN | 55 | false | StateType.SUCCESSRUN

APPENDIX B CLASS DIAGRAMS

APPENDIX B.1 CALCULATOR SERVICE

This class diagram describes the structure of the calculator service that calculates the statistics based on the aggregate job data.

[Figure 22: StatAggregator class diagram begins here, covering the DBBase, PropertyManager, and Sorter classes; continued on the following pages]
[Figures 22-23 (continued): StatAggregateJob and StatAggregateJobStatus class members of the calculator service, including the accessors and the per-state-type totals used for the statistics]
four categories: overall statistics, run times in states, standard deviations, and the number of times a job entered each state. The first category contained metrics such as the name, overall time, and failure rate. The following categories were dependent on the states a job could enter. The states were run, failed run, queue, portal, and other. These pooled states combined all of the job states available in the system. We created the pooled states because the user would not be interested in all of the individual states available. The run state was when a job would successfully pass through to completion. The failed run state was how long the job would loop through the states, as it was possible to go from run back to queue or another state. The queue state was how long the job was waiting at a resource, and the portal state was the time spent on the portal before being submitted to a resource. Finally, the other state encompasses any states that are not covered by the other joint states.

[Figure 18 screenshot begins here: overall portal statistics, continued on a later page]
great experience to travel abroad for our capstone project. The following individuals deserve particular acknowledgement for their contributions to our project and for always making us feel welcome and a part of the LPDS community. As mentioned previously, Professor Dr. Péter Kacsuk provided us with the opportunity to work in Budapest with the LPDS branch. We would also like to thank Dr. Miklós Kozlovszky for his help and support throughout the project: making us feel welcome, checking up on us when we were ill, answering our daily questions, and always ensuring our time here was both enjoyable and productive. We would like to thank Sándor Ács for his help throughout the project, as well as his beneficial suggestions and ideas on what to do in Budapest, and Gábor Hermann for his friendly approach and assistance with the testing phase of our project. Furthermore, we would like to thank Ákos Balaskó for all his technical support and ideas and for being there to answer daily questions and sort through bugs, as well as Krisztián Karóczkai for his support with the database and with setting up our development environment. We would also like to thank Kitti Varga, who helped us daily with printing, ordering food, and suggesting social events in Budapest, as well as for her welcoming attitude towards us in the office. We would like to thank Réka Makkos, who assisted us with the language barrier, finding train schedules, and providing comfort, and who was always checking in to make sure
information being recorded about how the workflows were executing. This component could be used, for example, to monetize the portal usage.

6.2 METRICS

Another set of future work would be to expand the set of metrics offered, both in the portlet and from an API. Currently our system does not provide data regarding the current state of the portal, DCIs, or users. Metrics on these categories would be useful, in particular to administrators, to gain knowledge on how the portal or DCIs are being used. Specifically, there is a set of metrics about the user that would be available through Liferay. Also, it would be possible to determine how many workflows are currently under submission using the stat_WorkflowInstance table. These additional metrics would be best implemented after the API is created, as they do not all make sense to be stored in a database. Another set of possibly useful metrics would be combinations of the current metrics. Currently it is only possible to view workflow instances individually or combined in the concrete workflow. It could also be feasible for the user to choose a subset of the workflow instances to be combined.

6.3 UI ADDITIONS

Some new features could be added to the user interface in later work. First, a search function could be added to easily find a concrete workflow instead of having to find it in a drop-down list. Second, the UI could be made more customizable and all
it will not have any more job instances added to it. This was solved by only querying aggregate jobs whose executing workflow instance has terminated.

The calculation step had to consume the aggregate jobs and calculate what effect they had on the pertinent statistics. For each aggregate job, the change in statistics is calculated for each portal, user, DCI, resource, concrete workflow, workflow instance, and abstract job, using the identifier shown in the table below.

Table 1: Statistic Level Identifiers
Portal: Portal URL
Resource: Resource URL
Concrete Workflow: Workflow ID (wfID)
Workflow Instance: Workflow Instance ID (wrtID)
Abstract Job: Job Name and Workflow ID (jobName and wfID)
User: User ID

The changes in the statistics are then stored in the database using some combination of SQL updates and SQL inserts for values that do not exist yet.

[Figure 14 begins here: Calculator Database Structure, continued on a later page]
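The report describes this step only as "some combination of SQL updates and SQL inserts", without listing the statements. One compact way to express the same update-or-insert pattern in MySQL is sketched below; the table name stat_JobStateTypeStatistics comes from the installation scripts and the column names from the Figure 14 fragment, but their exact pairing, and the assumed unique key on (statistics_ID, StateType), are guesses, and the delta_* values stand for the per-aggregate-job changes.

    -- Sketch only: add one aggregate job's contribution to a per-state-type
    -- statistics row, creating the row if it does not exist yet.
    INSERT INTO stat_JobStateTypeStatistics
        (statistics_ID, StateType, TotalTimeInStates, SquaresTimeInStates, Num)
    VALUES
        (p_statistics_id, p_state_type, delta_time, delta_squares, delta_num)
    ON DUPLICATE KEY UPDATE
        TotalTimeInStates   = TotalTimeInStates   + VALUES(TotalTimeInStates),
        SquaresTimeInStates = SquaresTimeInStates + VALUES(SquaresTimeInStates),
        Num                 = Num                 + VALUES(Num);

The same effect can be achieved, as the text suggests, with a plain UPDATE followed by an INSERT when no row was affected; the upsert form above is simply the shortest way to show the idea.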
method is to toggle off the StatAggregator using the URL SERVER/StatAggregator/toggle and to drop the trigger on the stat_running table. This will prevent data from progressing through the system and will stop the polling mechanism of the calculator service.

APPENDIX E DATABASE DESCRIPTION

Table 5: Database Table Descriptions

stat_running: Intermediate data; many entries per job instance; supplied by gUSE. Modification: an entered column (default 0); when set to 1, the row is deleted by a database trigger.
stat_JobInstance: Intermediate data; one entry per job instance.
stat_JobInstanceStatus: Intermediate data; one entry per job state transition.
stat_AggregateJob: One entry combining all JobInstances with the same JobName, Resource, and wrtID.
stat_AggregateJobStatus: One entry per job state visited by any of the job instances combined into this aggregate job.
stat_WorkflowInstance: One entry per workflow instance.
stat_AbstractJob: One entry per job in the DAG of a concrete workflow.
stat_ConcreteWorkflow: One entry per concrete workflow.
stat_user: One entry per user.
stat_portal: One entry per portal.
stat_DCI: One entry per DCI.
stat_resource: One entry per resource queue.
stat_statistics: Contains calculated statistics about jobs; one entry for each row in stat_user, portal, DCI, resource, Abs
more layer, viewing resource, workflow instance, or abstract job metrics.

Back End Requirements

Another area of functionality for this project was maintaining the database structures that support the system. The data had to be aggregated in such a manner that we did not consume all of the resources of the database. However, there was a drawback to aggregating data, as detail was lost with every aggregation operation. Therefore, in order to provide as much useful data as possible, the data was organized into aggregate job units, which combined the data for each abstract job of each workflow instance into one structure. This allowed us to aggregate all jobs involved in a parameter study into few entries, as they are all similar. Furthermore, in order to provide data to compare grid resources, we also divided aggregate jobs by the resource on which they were executed.

In order to remove the load of statistic calculation from the grid portal, we also needed a method of pre-calculating the statistics that would be required of us. This service must use the aggregate job entries to calculate the metrics. With this in mind, here are the requirements for the data maintenance portion of the project:

1. The system shall group job instance data.
2. The system shall group job instance data with the same job name, workflow instance, and computing resource into constructs called Aggregate Jobs.
3. The system shall pre-calculate statistics from
top of these resources there is a grid middleware layer that hides the low-level hardware and software differences between resources and provides a standardized interface for use. To add another layer of abstraction, it is also possible to use a grid portal to hide the differences between multiple grid middlewares, such as the WS-PGRADE Portal developed by MTA SZTAKI's Laboratory of Parallel and Distributed Systems.

There are two main categories of resources used in grid computing. First are dedicated resources, called service grids. These can be single monolithic machines or computing clusters. The primary benefit of these resources is that they are dedicated, trustworthy, and powerful. The other type of resource is commonly referred to as a desktop grid. These primarily function using a concept called cycle scavenging, where owners donate their unused CPU time to work on a problem farmed out to the grid [2]. The considerations of desktop grid systems are different than those of service grids, as there are not the same guarantees of availability and trust that there are with service grids.

1.2.1 WORKFLOWS AND JOBS

One of the advantages of the distributed computing paradigm of grid computing is the capability for parallelization. This is further supported by the structure of the applications, or workflows, created to be executed on such grid systems. At a high level, a workflow is defined by a Directed Acyclic Graph (DAG) for whi
updates stat_AggregateJobStatus with appropriate data from stat_JobInstanceStatus.
CreateOrAddToJobInstance: Adds stat_running data to a stat_JobInstance row, or creates one if it does not already exist.

APPENDIX F USER MANUAL

[Figure 25 screenshot: the statistics portlet's portal page, showing failure rate, time, standard deviation, and number-of-entries metrics; continued on a later page]
was impossible to delete consumed entries from the stat_running and stat_JobInstance tables. Instead, we were only able to flag the offending rows for deletion. Therefore, since the calculator is already polling that database, it also runs a SQL delete query to remove the unneeded entries.

3.4 UI

The front end of our system was the portlet integrated into the WS-PGRADE Grid Portal. To accomplish this we used a multitude of tools and Liferay. Liferay was used as part of our development environment to upload the portlet and test its interactions. The creation of the portlet was done in multiple iterations, eventually ending with the final product.

3.4.1 TOOLS / LANGUAGES

The user interface was an additional portlet added to the preexisting webpage. For this, four languages and tools were used: HyperText Markup Language (HTML), JavaServer Pages (JSP), JavaServer Pages Standard Tag Library (JSTL), and JavaScript, plus Google Chart Tools.

1. HTML is the predominant language for the design and display of webpages. It is used to create structure, formatting, and functionality in a webpage.
2. JSP is a technology that enables the design of dynamic Web pages and separates the user interface from the content generation, which allows a Web designer to change the page layout without altering the underlying content [7].
3. JSTL is a collection of tag libraries that implement general-purpose functionality common to many Web applications [7].
4. JavaScri
[Figure 28 screenshot (continued): concrete workflow statistics with failure rate, execution times, and workflow instance selection]
Figure 28: Concrete Workflow Metrics

To view metrics on workflow instances or abstract jobs, the user must have first chosen a concrete workflow. Once a concrete workflow is selected, two drop-down menus of available workflow instances and abstract jobs will appear for each concrete workflow selected. The user will choose one to display, and the metrics will appear in a pop-up window.

Figure 29: Pop-Up Window for Workflow Instance
Factory. The factory queries the database table stat_metric_description, which sends the results back to the factory. This step returns a collection of the metric information back to the portlet. Next, the portlet sends the information to the StatisticsFactory. This factory queries the database for portal metrics and receives the result set, which populates the collection of metric information with data. The information is then sent back to the portlet, and the data is then displayed to the user.

At this point the user can request to view DCI statistics. The portlet accesses the MenuPopulator object. MenuPopulator accesses the database to receive a list of possible DCIs and returns it to the portlet. The portlet produces a selection list for the end user. Once the user makes a selection, the path is the same as for portal metrics, except with DCI information.

2.2.4 USER INTERFACE CANDIDATES

Before starting on the programming aspect of the user interface, we created multiple candidate designs to present as potential candidates for a user interface. The designs were based on the assumption that there would be only one page to display all the data. Furthermore, they were designed before we knew the amount of data we could retrieve and before we had directly interacted with the system. The two designs below are the closest to the final design.

[Figure 6 screenshot begins here: candidate design mock-up of the portal statistics page, continued on a later page]
Grid Portal. In its current state, the system will be able to track the execution of workflow instances and job instances executing on the grid and store this information in an efficient manner. We also created a useful visualization interface that displays this data at several different levels.

5.1 USER INTERFACE

The user interface was successfully implemented as an additional portlet for the WS-PGRADE Grid Portal. The statistics portlet had five pages in the end, displaying portal, user, DCI, resource, and concrete workflow statistics. From the concrete workflow page, the user could choose workflow instance or abstract job metrics, which appear in a pop-up window. All the pages used a consistent format. At the top of each page is a navigational menu so the user can easily visit each page without having to use the browser's back button. On each page, the user is able to hide or show the sections of statistics. If there is no data available for one of the levels, it instead displays "no data available".

5.2 BACK END

Our data management and aggregation services are implemented so that, once deployed, they will be able to track all job instances that are executed on the portal. While complicated, our aggregate job structure aggregates the data into more efficient units while still allowing meaningful comparisons. It was created in a manner that allows it to be run on a server isolated from the gUSE system, allowing for any performance i
L OF GRID COMPUTING, Vol. 3, pp. 221-238.

7. Oracle. JavaServer Pages Technology. Oracle. [Online] Cited: April 22, 2011. <http://www.oracle.com/technetwork/java/overview-138580.html>

8. Kacsuk, Peter. P-GRADE portal family for grid infrastructures. Concurrency and Computation: Practice and Experience, Vol. 23, pp. 235-245, March 2011.

9. Kacsuk, Péter and Farkas, Zoltán. P-GRADE Portal: A generic workflow system to support user communities. Future Generation Computer Systems, Vol. 27, pp. 454-465, May 2011. (Architecture of P-GRADE and basic job and workflow.)

10. SZTAKI LPDS. Welcome to WS-PGRADE Portal. gUSE. [Online] 2011. Cited: April 22, 2011.

11. SZTAKI. MTA SZTAKI Computer and Automation Research Institute, Hungarian Academy of Sciences: The Institute. [Online] Cited: April 1, 2011. http://www.sztaki.hu/institute

12. SZTAKI. People: Kacsuk, Péter. SZTAKI. [Online] Cited: April 1, 2011. http://www.sztaki.hu/people/008001429

13. Kacsuk, P., Karoczkai, K., Hermann, G., Sipos, G. and Kovacs, J. WS-PGRADE: Supporting parameter sweep applications in workflows. Workflows in Support of Large-Scale Science, Nov. 2008. doi: 10.1109/WORKS.2008.4723955

14. Google. Google Chart Tools / Image Charts (aka Chart API). Google Code. [Online] 2011. Cited: April 11, 2011. <http://code.google.com/apis/chart/docs/making_charts.html>

GLOSSARY

Abstract Job: Refers to a job in a Concrete Workf
Project Number: GXS 1101

Extension of Grid Portal Functionalities with Collection and Visualization of Usage Statistics

A Major Qualifying Project Report submitted to the Faculty of the WORCESTER POLYTECHNIC INSTITUTE in partial fulfillment of the requirements for the Degree of Bachelor of Science

by Alessandra Anderson and Sam Moniz

April 28, 2011

Professor Gábor N. Sárközy, Major Advisor
Professor Stanley M. Selkow, Co-Advisor

ABSTRACT

The WS-PGRADE Grid Portal allows users to create and maintain workflows through an intuitive user interface. However, the current version lacks the ability to share metrics about the system. To provide these metrics, a new portlet, database, and web service were developed. The service is responsible for collecting and storing metrics in the database, and the portlet is responsible for the display of these metrics. These additions enable end users to retrieve statistics on the portal, users, DCIs, resources, concrete workflows, workflow instances, and individual jobs from the workflow graph.

ACKNOWLEDGEMENTS

First of all, we would like to thank our sponsor, MTA SZTAKI and the Laboratory of Parallel and Distributed Systems (LPDS), and the laboratory head, Professor Dr. Péter Kacsuk, for allowing us the opportunity to work with the LPDS staff to create an interesting project dealing with the WS-PGRADE Grid Portal. Secondly, we would like to thank Worcester Polytechnic Institute for allowing us this
[Figure 6 screenshot (continued): mock-up with workflow and jobs pull-down lists]
Figure 6: Candidate Design 1

Design 1 has six buttons the user could select for the level; the user would then hit the display button to get to the statistics. Whenever a user chose a button it would appear to be pushed in, to indicate it was selected. The button remained depressed until the user either deselected it or hit display. To select a job or a workflow, the user would be offered a drop-down list to choose from. Again, they could select multiple items to display at once by highlighting more than one.

Advantages: simple for the user; clean and uncluttered.
Disadvantages: two-step process to see statistics; looks unfinished.

[Figure 7 screenshot: candidate display design showing statistics grouped into categories and sub-categories]
Figure 7: UI Candidate Display Design

Figure 7 shows a candidate display design. This design displays each metric in categories and sub-categories. A sub-category is a grouping of statistics, for example times. For each choice the user had selected, a main category, such as Portal, would be generated, and sub-ca
APPENDIX D INSTALLATION MANUAL

In order to deploy the statistics system, there are three components that must be deployed.

APPENDIX D.1 DATABASE DEPLOYMENT

To modify the database with our schema changes, please run the provided scripts:

stat_JobInstance.sql
stat_statistics.sql
stat_running.sql
stat_WorkflowInstance.sql
stat_JobInstanceStatus.sql
stat_AggregateJob.sql
stat_AggregateJobStatus.sql
stat_AbstractJob.sql
stat_JobStateTypeStatistics.sql
stat_portal.sql
stat_ConcreteWorkflow.sql
stat_resource.sql
stat_DCI.sql
stat_metric_description.sql
stat_user.sql
routines.sql

Once these scripts are executed, confirm that there are 15 new tables (or 14 if the stat_running table was already installed, in which case confirm that it was modified). Also confirm that the following triggers and stored procedures are present.

Triggers:
BEFORE UPDATE ON stat_running
BEFORE UPDATE ON stat_JobInstance (TOAGGJOB)
BEFORE INSERT ON stat_ConcreteWorkflow (calculate_workflow)
BEFORE INSERT ON stat_ConcreteWorkflow (calculate_workflow_delta)
BEFORE UPDATE ON stat_WorkflowInstance (calculate_workflow_delta)
BEFORE UPDAT
ZTAKI Desktop Grid and technologies for interoperability, 3G Bridge, that enable cost-efficient alternative platforms for scientific and business applications [10].

2 METHODOLOGY

In order to determine the requirements for our system, we progressed through a series of steps to determine what metrics we wanted to make available to the user, what data we had to store in order to provide those metrics, and how we had to transform the data we received into the data we needed to store. Furthermore, we explored different methods of displaying these metrics to the user.

2.1 ARCHITECTURE

[Figure 2: System Architecture]

Figure 2 reflects the architecture for the system, with our proposed components in red and orange. The proposed components have to receive job status data from gUSE and group it in an efficient and meaningful manner. To do this, the statistics database will handle the grouping of data at the job level, and the proposed calculator service will use the grouped data to calculate statistics and store the calculated values in another database structure for the calculated statistics. The calculated statistics tables will be read by the portlet in order to be displayed to the user.

[Figure 3: Data Flow Diagram: job status information, grouped by job instance, grouped by aggregate job, then consumed as aggregate job data]

The above figure shows how data flows through the propos
aggregate jobs for the user interface.

2.3 DATA AGGREGATION

[Figure 9: Data Composition Diagram: Portal Statistics composed of DCIs, DCIs of Resources, and Resources of Aggregate Jobs, including a Parameter Study Job]

Figure 9 shows a high-level example of the method of aggregating our data into statistics. For each layer, statistics can be generated through some combination of the layer below, down to the Aggregate Job layer. In the diagram, the only data being stored is the data for an Aggregate Job, which is one of two things. If the aggregate job refers to a parameter study node, as in the case of the Parameter Study Job in the diagram, the aggregate job stores statistics about the aggregate of all of the jobs that compose it. Otherwise, there is a one-job-instance-to-one-aggregate-job relationship. This allows us to significantly reduce the volume of data stored.

The aggregate job structure can therefore be used to generate statistics about larger constructs. For instance, Figure 17 shows how statistics about a resource are composed by aggregating statistics about all the aggregate jobs that have been run on that resource. Furthermore, DCI (Distributed Computing Infrastructure) statistics can be aggregated from all the resources that compose the DCI. There are similar paths to aggregate statistics about users, workflow instances, abstract jobs
ation and international companies such as General Electric, the National Aeronautics and Space Administration, and the Office of Naval Research. One of their main research areas is cluster and grid computing [1].

1.5.1 LPDS

The Laboratory of Parallel and Distributed Systems (LPDS) is a branch of MTA SZTAKI that specializes in grid technologies. LPDS is a member of the Hungarian Grid Competence Center and the National Grid Initiative. The department is headed by Professor Dr. Péter Kacsuk, a renowned expert in the field of grid computing and co-editor-in-chief of the Journal of Grid Computing [12]. LPDS has produced five projects, the most prominent being the WS-PGRADE Grid Portal. LPDS participated in the CoreGRID Network of Excellence and works as a project member in all the phases of the largest European grid infrastructure project, EGEE/EGI-InSPIRE. Furthermore, they helped establish the Hungarian Virtual Organization of the European Grid Infrastructure (HUNGRID), extended with the WS-PGRADE Grid Portal. They are also involved in many more projects, both nationally and internationally. They have two goals in grid research:

To provide efficient software development tools and high-level services, together with customizable scientific gateways based on workflows (P-GRADE Grid Portal, gUSE), for harvesting the most widespread grid infrastructures based on gLite, Globus, and BOINC.

To offer easy-to-maintain middleware solutions S
cally executed on some database event, such as an insertion into a table.
User: The user that is using the portal, or the user that is interacting with our system.
User Statistics: Refers to the statistics of all jobs and workflows executed by a given user.
Workflow Instance: A single execution of a Concrete Workflow.
Workflow Instance Statistics: Refers to the statistics of all the job instances that were executed for this workflow instance. Also provides the overall time of execution.

APPENDIX A JOB STATE TABLE

This is a table of the possible states that a job instance can enter on the portal or on a resource. Of particular note is our grouping of them, shown in the State Type column, which was discussed in the paper. If it becomes necessary to change any of these values or add additional values, you must change the enumeration in the calculator service project (StatAggregator.jobState.JobState). If it becomes necessary to add terminal states, you also have to change the ToJobInstance trigger on stat_running. Also, this table is subject to change as control of some of the states is given to the grid middlewares. Also note state 55, which currently only exists in our system to represent the final running state that produced the results.

Table 3: Job States (Name | Identifier | Terminal | State Type Assignment)
INIT | 1 | false | StateType.PORTAL
SUBMITTED | 2 | false | StateType.QUEUE
[Figures 22-23 (continued): Sorter and Puller class members of the StatAggregator service, covering the polling, aggregation, and cleanup operations]
ch the nodes are jobs and the edges are inputs and outputs of those jobs [2].

[Figure 1: Directed Acyclic Graph Example]

Figure 1 shows an example of a DAG. The orange rectangles are jobs, the grey squares are output ports, and the green squares are input ports. The edges are files that are supplied by the output ports to all connected input ports. This structure allows the workflow to be executed in a parallel manner by scheduling jobs for execution as soon as their inputs become available and executing each job as soon as there is a resource available for it. Multiple jobs from the same workflow can therefore be executed in parallel [9]. In combination with repository technologies, a configured workflow can be executed an arbitrary number of times, each execution of which is a workflow instance. In a similar manner, jobs that appear in the DAG (referred to as abstract jobs) can be executed multiple times across multiple workflow instances, or a single abstract job can be executed many times within the same workflow instance when using special ports [13]. Those ports cause the job to be executed for each of some combination of the inputs.

1.3 PORTALS

A portal is a web system that provides an interface for accessing services, such as a grid portal or a gateway platform. Originally all major portals started out as grid portals and were later extended to support other infrastructures such as desktop grids.
FIGURE 12 SIMPLIFIED JOB STATE DIAGRAM
FIGURE 13 STAT_AGGREGATEJOB AND STAT_AGGREGATEJOBSTATUS
FIGURE 14 CALCULATOR DATABASE STRUCTURE
FIGURE 15 UI IMPLEMENTATION GRAPH
FIGURE 16 ORIGINAL USER INTERFACE
FIGURE 17 SECOND ITERATION USER INTERFACE
FIGURE 18 FINAL PRODUCT
FIGURE 19 FINAL PRODUCT: CONCRETE WORKFLOW AND ABSTRACT JOB METRICS
FIGURE 20 NUMBER OF DATABASE ENTRIES FOR A WORKFLOW
FIGURE 21 NUMBER OF DATABASE ENTRIES FOR A WORKFLOW
FIGURE 22 STATAGGREGATOR CLASS DIAGRAM
FIGURE 23 STATAGGREGATOR CLASS DIAGRAM, PART 2
FIGURE 24 PORTLET DATA ACCESS LAYER CLASS DIAGRAM
FIGURE 25 USER INTERFACE
FIGURE 26 SELECTING DCI STATISTICS
FIGURE 27 SELECTING RESOURCE

1 BACKGROUND

In the field of scientific computing there are some complex computational problems that require a large amount of resources to solve. Such tasks as param
Figure 25: User Interface

F.1 INTRODUCTION

The aim of the statistics portlet is to allow users to view metrics on seven levels: portal, user, DCI, resource, concrete workflow, workflow instance, and abstract job. This is accomplished by allowing users to navigate to different pages to see the level of statistics they want. The statistics portlet is an addition to the pre-existing portlets on the WS-PGRADE Grid Portal. The default page view upon clicking the statistics tab is the metrics for the portal. The other pages can be accessed through a menu at the top of the page. For any section, a user can choose to expand or minimize the amount of data they wish to see by clicking on expand or hide. Descriptions and usage information can be found below.

F.2 DCI METRICS

[Figure 26 screenshot: the statistics portlet navigation menu with the DCI selection]
Figure 26: Selecting DCI Statistics

To navigate to DCI metrics, the user clicks the DCI menu button at the top of any page. Once on the DCI page, the user chooses from a drop-down list of available DCIs. Once chosen, the user clicks the DCI button next to it and the metrics will be displayed. The DCI metrics can be useful for comparing different DCIs and checking performance.

F.3 RESOURCE METRICS

[Figure 27 screenshot begins here: the statistics portlet navigation menu, continued on a later page]
ed system. The information starts in the statistics database as entries in the stat_running table, which is populated by a gUSE service. The stat_running entries describe the current state of the job at a specific point in time. These values are then combined, using MySQL database triggers, into structures based on job instances run on the grid. The job instance values are then grouped again into a structure called aggregate jobs, which are a combination of several job instances that share the same job name, workflow instance, and resource. There also exists a web service, the calculator service, that consumes the aggregate jobs and calculates the metrics for the user. The calculated values are then available to the portlet for display to the user.

Overall, this design allows our services and database to be completely isolated from the gUSE systems, which allows the performance to be controlled independently. The exception to this would be the constructs created for the portlet to provide useful menus to the user.

2.2 USER INTERFACE REQUIREMENTS

The UI requirements included functionality requirements and usability requirements. The functionality requirements included being able to show the metrics gathered, accessing the database, having similarity to the rest of the portal, and creating a way to navigate the data. Showing the metrics gathered required providing a layout and a table structure, as well as offering graphical representations of s
[Figure 19 screenshot (continued): workflow instance times and standard deviations, with the abstract job pop-up]
Figure 19: Final Product, Concrete Workflow and Abstract Job Metrics

Figure 19 shows an example of an abstract job within a concrete workflow. The abstract job is displayed in a pop-up window, and the concrete workflow is underneath it.

3.5 CONFIGURATION

In order to remove configuration constants from our code, we employed a Java configuration file for both the portlet and the calculator service. This file contained information about the database connection and any constants that we wanted to be simple to change. This is advantageous, as it makes it simpler to change some of the behavior of the system.

4 TESTING

Our testing approach was a combination of iterative and cumulative tests. As we progressed through the implementation of our system, we had many smaller components that could be tested individually, which was done as we progressed through the implementation. We also had a dedicated time set aside for testing, which focused on functionality, integration, and performance testing. This approach was beneficial as it lent itself to the concurrent development m
ertain levels, and the statetype column allowed us to set one of the five state types we were using. Lastly, the category column allowed statistics to be grouped together so they could all be displayed with a single call. The category dictated which metrics would be displayed together; for example, times in state types was one category. By extracting the presentation information into the database, this extra table cut down the amount of hardcoding considerably and made the system overall easier to modify.

The final iteration incorporated small changes to achieve the final product. First, both graphs were modified to better display the data. Second, hide and expand options were added to each category of statistics. Third, both abstract job and workflow instance were changed to be displayed in pop-up windows instead of on a separate page. Finally, a menu navigator was added at the top of each page, and the titles for categories were changed to better describe the metrics.

3.4.4 FINAL PRODUCT

The final product showed portal metrics when the user accessed the portlet. Users could choose from a top menu to view DCI, user, or concrete workflow statistics. From DCI and concrete workflow, the user could enter another level and view resources, workflow instances, and abstract jobs. Each level was displayed in the same format, except for workflow instance and abstract jobs, which were displayed in pop-up windows. The final metrics that were shown fell into
eter studies, analysis, and other complicated problems are difficult to accomplish due to a lack of resources or computational power. One of the solutions to these problems is grid computing. Grid computing is used to share tasks over multiple computers and shared resources.

MTA SZTAKI, located in Budapest, Hungary, has developed the WS-PGRADE Grid Portal, which is a web-based, service-rich environment for the development, execution, and monitoring of workflows and workflow-based parameter studies on various grid platforms. The WS-PGRADE Grid Portal uses high-level graphical interfaces to allow all levels of users to submit applications in the form of directed acyclic graphs (DAGs) to a large variety of grid solutions. The DAG defines the dependencies between components of the user's workflow, and the job manager then uses various grid resources for processing the application. Furthermore, the portal can access multiple grids simultaneously, which allows easy distribution on multiple platforms [10]. The portal allows users to run jobs on multiple grid infrastructures, such as gLite and other middleware, as well as local clusters [9]. Furthermore, they can submit a workflow to multiple Distributed Computing Infrastructures (DCIs), each of which is comprised of numerous resources.

1.1 PROJECT STATEMENT

The objective of this project was to integrate a new service into the WS-PGRADE Grid Portal which would collect, store, and present data about the execution of
f metrics. Administrators, as mentioned before, are interested in the overall portal statistics as well as the DCI and resource levels. While our system provides the data to all users, the differences between them mean some levels of data will be more useful to a particular type of user. For example, an administrator may be interested in the number of jobs run on a certain resource, while a user may be more interested in the amount of time their workflow took. For this reason the data was divided into multiple levels.

The user can view the levels by choosing different menu options. The portal and user levels assume the statistics to be displayed are those of the current portal and user; the other levels require a choice of which object is to be displayed. This is because there are multiple options; for example, a user can have many concrete workflows, and it is not possible to easily display all of them. The multiple levels and choices allow both administrators and users to view only the statistics they wish to see without having to deal with an overload of information.

Overall, this design works for the system because there is no need for a user to see more. The user may only view their own statistics, because the other levels of statistics provide for comparison. The other choices provide a way to view statistics on individual objects instead of receiving an overload of information. A user can compare DCIs to select which one has been performing the best in
[Figure 24 (continued): portlet data access layer class diagram, including MetricInformation, the StatisticLevel enumeration, and StatisticsFactory]
[Figure 13 (schema): stat_AggregateJob and stat_AggregateJobStatus, with columns including stat_aggregateJob_ID (BIGINT(20)), End_TS (TIMESTAMP), jobstate, wfID, wrtID, userID, and portalID (VARCHAR(255)), the min, max, total, squares, squaresOfRunningTime, and num counters (INT(11)), and a consumed flag (TINYINT(1)).]

Figure 13 shows the final step in the database component, the aggregate job structure. An aggregate job combines data from several job instances that share some data into one structure. All of the job instances combined into an aggregate job share the same job name, workflow instance, and execution resource. The same job name and workflow instance mean that the job instances share the same executable routine; enforcing the same resource allows comparisons between different grid resources executing the same job. The data stored in the aggregate job tables is similar to the data from the job instance table. The main difference is that it is structured to combine several job instances. For each state that any of the job instances visit, the data required to calculate the average time, the number of entries into that state, and the standard deviation are stored. One of the requirements for this table was that the only metric information that we store would be calculable with only the previous metric value and information about the values to add to it.
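To make that incremental requirement concrete, the sketch below keeps only a count, a running total, and a running sum of squares, folds in one more job instance at a time, and recovers the average and standard deviation on demand. The field names mirror the num, total, and squares columns above, but the class itself is an illustrative assumption and not part of the project code.

```java
/**
 * Illustrative accumulator mirroring the num/total/squares columns of the
 * aggregate job tables: each new job instance can be folded in using only
 * the previously stored values, and the derived metrics are computed on demand.
 */
public class RunningTimeAccumulator {
    private long num;       // number of job instances folded in so far
    private double total;   // sum of the observed times
    private double squares; // sum of the squared observed times

    /** Add one more job instance's time without revisiting earlier rows. */
    public void add(double seconds) {
        num += 1;
        total += seconds;
        squares += seconds * seconds;
    }

    public double average() {
        return num == 0 ? 0.0 : total / num;
    }

    /** Standard deviation derived from the stored sums alone. */
    public double standardDeviation() {
        if (num == 0) {
            return 0.0;
        }
        double mean = average();
        double variance = squares / num - mean * mean;
        return Math.sqrt(Math.max(variance, 0.0)); // guard against rounding just below zero
    }

    public static void main(String[] args) {
        RunningTimeAccumulator acc = new RunningTimeAccumulator();
        for (double t : new double[] {10.0, 12.5, 9.0}) {
            acc.add(t);
        }
        System.out.printf("avg = %.2f s, stddev = %.2f s%n", acc.average(), acc.standardDeviation());
    }
}
```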
jobs and workflows on the WS-PGRADE Grid Portal. This addition allows end users, communities, and administrators to retrieve statistics on the portal usage. The design of our project had three major components: data collection, metric calculation, and visualization. The goal of our data collection component was to receive data from the WS-PGRADE Grid Portal and reduce it to an efficient structure. Our metric calculation component consumed that data and calculated the portal's statistics. Finally, our visualization component displayed the statistics to the user in a meaningful form. The motive behind this project was to provide a new service in the WS-PGRADE Grid Portal that would be a useful addition. Although this project mostly provides a new feature for the users, it is also helpful for administrators. For administrators, this feature allows them to keep track of the load on different aspects of the portal, as well as monitor different levels of usage so they can better provide for the users. For the user, our service provides feedback on the execution of their jobs and workflows.

1.2 GRID COMPUTING

Grid computing was originally proposed as a global system to solve computationally intensive problems that could not be solved in a reasonable amount of time even with state-of-the-art supercomputing resources [6]. This problem was addressed by aggregating multiple computing resources that may be geographically or architecturally distinct. On
low

Abstract Job Statistics: Refers to the statistics of all the job instances of the specified Abstract Job, aggregated across workflow instances.
Aggregate Job: Aggregation of all job instances that share the same workflow instance, resource, and job name.
Concrete Workflow: A workflow that is configured for execution.
Concrete Workflow Statistics: Refers to the statistics of all the executions (Workflow Instances) of the specified Concrete Workflow.
DCI (Distributed Computing Infrastructure): A collection of virtual organizations from which computing resources can be accessed.
DCI Statistics: Refers to the statistics of all jobs and workflows executed on the given DCI.
Google Chart Tools: API used to generate diagrams from the statistics.
Job Instance: A job that is executed on the grid.
Level of Statistics (Portal, DCI, Resource, User, Concrete Workflow, Workflow Instance, Abstract Job): Refers to what it is possible for the user to view statistics on.
Portal Statistics: Refers to the statistics of all jobs and workflows executed using the instance of the WS-PGRADE Grid Portal.
Resource: A single computing resource; a set of these makes up a DCI.
Resource Statistics: Refers to the statistics of all jobs and workflows executed on the given Resource queue.
Stored Procedure (SPROC): Executable database code that is stored in and run on the database.
Trigger: Executable database code that is automati
[Figure 22 StatAggregator Class Diagram Part 1: the StateStatistic, Statistics, Entity, and WorkflowInstance classes. StateStatistic and Statistics hold the accumulated values (total, squares, num, TotalJobTime, SquaresJobTime, NumFailedJobs, NumJobs, and a per-StateType map) together with accessors for the min, max, total, squares, and num counters, and expose the addStatistics, addAggregateJob, updateStateStatistics, and updateStatistics operations; WorkflowInstance implements the Entity operations (getKey, getTable, getKeyColumn, getStatFKColumn, getWhereClause, setKeys, insertEntity).]
mpletion time
• Average time jobs are in different states
• The standard deviation for the times
• The number of jobs
• Number of workflows
• Running failure rate
• Number of failed jobs

1.5 MTA SZTAKI

MTA SZTAKI is Hungary's largest and most successful information technology research institute. The name is an acronym, in Hungarian, for the Computer and Automation Research Institute of the Hungarian Academy of Sciences. It is governed by the Hungarian Academy of Sciences and is supervised by the Board of the Institute [11]. It was founded in 1964 and has more than 300 full-time employees. The main task of the institute is to perform basic and application-oriented research in an interdisciplinary setting in the fields of computer science, engineering, information technology, intelligent systems, process control, wide-area networking, and multimedia [11]. They also do contract-based research, development, and training, as well as provide support for domestic and foreign industrial, governmental, and other groups. They are active in both graduate and undergraduate education, offering lectures and classes as well as providing opportunities for students to participate in the work at the institute [13]. The institute is a part of the European Research Consortium for Informatics and Mathematics and a member of the World Wide Web Consortium. They have worked on projects for both Hungarian companies, such as Paks, a Hungarian Nuclear Power St
odel of the back end and the portlets. All of our tests were executed on our development virtual machine. For this project the tests consisted of manual testing. Because the bulk of the functionality was database interaction, it was simpler to test the functionality manually or with test scripts. While we did consider building a Java database test harness for our database code, we deemed it unnecessarily time consuming. Our testing also relied heavily on the logging provided by Apache Tomcat's logging system and the log file catalina.out. This system allowed us to print debugging messages to determine the state of the program when it was running on our development environment instead of our local machines.

4.1 BACKEND TESTING

Throughout the development of the database component's SQL stored procedures and database triggers, continuous testing was done in the form of SQL scripts that simulate a workflow running on the grid. Further testing was provided through executing actual workflows on our development portal to test our system with actual data. The final pass for database testing was a suite of SQL test scripts that exercised the behavior of the database programs in a manner similar to unit tests. The calculator service testing methodology was almost entirely made up of functionality tests: running a workflow and confirming that all of the statistics are correct. There were several types of workflows that we used in order to do this te
ome statistics. The metrics the user needed to be able to view were on several layers, listed below. Each layer had to have the same layout for organization purposes, as well as function in the same manner, even though the data accessed was different. Accessing the database required a way to retrieve the data, and to maintain similarity with the rest of the portal it was necessary to study the previously completed sections. Finally, navigating the data required setting up choice lists as well as menu buttons. The menu buttons were the main navigation, reaching all the top levels of metrics such as portal, user, DCI, and concrete workflow. The choice lists required populating the list with what was available. Furthermore, they required that the user make choices either with a drop-down menu or a user-filled input box. The usability requirements included general user interface standards such as size of text or coloration. Other standards include arrangement, readability, comprehensibility, and usability.

1. Users may view metrics about:
• The WS-PGRADE Grid Portal
• User
• DCI
• Resource
• Concrete Workflow
• Workflow Instance
• Abstract Job

2. Users may choose the navigational buttons:
• DCI
• User
• Concrete Workflow

3. Users may select individual:
• DCIs
• Resources
• Concrete Workflows
• Workflow Instances
• Abstract Jobs

4. Users may compare multiple Concrete Workflows.

2.2.1 USE CASES

For the interface there were multiple levels o
ow is just a parameter study workflow in which all the jobs execute only once, all of the workflows behaved as a parameter study. The most specialized workflow was a workflow that contained jobs that failed; this tested the behavior of our failure rate statistic. Other workflows were created to test the running statistics of our system. The workflow would run a large number of jobs that had a predictable execution time. We were then able to compare the calculated running times with the expected ones.

Table 2 Expected Average Running Time Compared to Reported

Concrete Workflow Name | Expected Average Running Time Per Job | Reported Actual Running Time Per Job
QuickLongRunner | 10 seconds | 345.33 seconds
LongRunner | 60 seconds | 297.55 seconds
LongRunner 10minEach | 600 seconds | 1328.7 seconds

The discrepancy between the expected and the actual running time is due to the service that populates the stat_running table. Currently that service does not distinguish between a job instance waiting in the queue of a resource and the job instance being executed on that resource. This was discovered during the implementation of our system. We therefore were careful to show that, once the stat_running table is populated correctly, our system would return the correct values.

4.4 DATABASE MEMORY CONSUMPTION

As was previously mentioned, the aggregate job structure was designed to reduce the memory consumption of the system.
ow the user to select which statistics to display. Third, the portlet could display multiple levels of metrics at once, for example displaying both the portal and the user metrics together. Finally, better navigation techniques could be implemented, for example tabs instead of a menu, as well as a back or refresh button.

7 REFERENCES

1. Alejandro Abdelnur, Stefan Hepper. Java Portlet Specification. October 7, 2003.
2. Anderson, David P. BOINC: A System for Public-Resource Computing and Storage. 5th IEEE/ACM International Workshop on Grid Computing, 2004, pp. 4-10.
3. gLite. gLite: Lightweight Middleware for Grid Computing. [Online] <http://glite.cern.ch>
4. Liferay, Inc. What is a Portal? Liferay: Enterprise. Open Source. For Life. [Online] <https://www.liferay.com/documentation/additional-resources/whitepapers> (whitepaper "What is a Portal?", PDF)
5. MTA SZTAKI LPDS. P-GRADE Grid Portal. [Online] 2011. [Cited: April 7, 2011.] <http://portal.p-grade.hu>
6. Sipos, Gergely and Kacsuk, Péter. Multi-Grid, Multi-User Workflows in the P-GRADE Grid Portal. 3(3-4), December 6, 2005. Journal
pt is an object-oriented scripting language that is used in web development to create more interactive web pages [5]. Google Chart Tools, or the Google Chart API, is a tool that allows the creation of charts from data and embeds them in a web page. The embedded data must follow the formatting parameters in an HTTP request, and Google then returns a PNG image of the chart [14]. We used this tool because it allowed simple creation of dynamic graphs.

3.4.2 IMPLEMENTATION PROCESS

The UI implementation was done in three main stages, with multiple iterations within them.

[Figure 15 UI Implementation Graph: the iterations and their milestones, including modified graphs, hide and expand, popup windows, and title changes.]

The figure above highlights each iteration and the milestones within it.

3.4.3 ITERATIONS

The UI was developed in three iterations, with numerous milestones for each one. The first iteration consisted of creating a template for the portlet that could access Liferay, accessing the database, displaying the data, and adding the Google Chart API.

[Screenshot of the original user statistics portlet (Figure 16, captioned later): per-user and portal metrics such as overall computation time, numbers of jobs, and average and total times, displayed in rows.]
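The report does not spell out the exact requests it builds, but as an illustration of the HTTP-request style described earlier in this section, the sketch below assembles a chart URL in the format accepted by the legacy Google Image Charts endpoint, which returned the chart as a PNG image. The endpoint and parameter names (cht, chs, chd, chl) are assumptions based on that public, now-deprecated interface, not values taken from the project code, and the data points are arbitrary samples.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Illustrative sketch: build a pie-chart request URL in the legacy Google Image Charts style. */
public class ChartUrlExample {
    public static void main(String[] args) throws UnsupportedEncodingException {
        String base = "https://chart.googleapis.com/chart"; // legacy endpoint (assumption)
        String type = "cht=p";                               // pie chart
        String size = "chs=400x200";                         // width x height in pixels
        String data = "chd=t:104,5036";                      // failed vs. other jobs (sample values)
        String labels = "chl=" + URLEncoder.encode("Failed Jobs", "UTF-8")
                + "|" + URLEncoder.encode("Other Jobs", "UTF-8");

        String url = base + "?" + String.join("&", type, size, data, labels);
        System.out.println(url); // the chart service answers this request with a PNG image
    }
}
```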
[Figure 24 Portlet Data Access Layer Class Diagram (final part): the MetricInformationFactory retrieval methods getAbstractJob, getMetric, getUser, getPortal, getResource, getDCI, and populateMetricData, each filling a Map<String, List<MetricInformation>> for the requested level.]

APPENDIX C STAT METRIC DESCRIPTION TABLE

This table describes the presentation of the metrics we make available to the user on the portlet.

Table 4 stat_metric_description table

Column Name | Pretty Name | Category | Units | Precision | Source Table | For Level | State Type | ID
delta | Workflow Instance Execution Time | 6 | s | 1 | stat_WorkflowInstance | workflowinstance | NULL | 3
NumFailedJobs | Total Number of Failed Jobs | 1 | jobs | 0 | stat_statistics | all | NULL | 5
StdDev | Standard Deviation of Job Average Execution Time | 3 | s | 3 | stat_statistics | all | NULL | 7
Average | Average Time Spent in the Failed Run State | 2 | s | 2 | stat_JobStateTypeStatistics | all | RUN | 10
[Figure 23 (part): the StatAggregator entity classes Resource, DCI, AbstractJob, ConcreteWorkflow, and Portal. Each implements the common Entity operations (getKey, getKeyColumn, getTable, getStatFKColumn, getWhereClause, setKeys, insertEntity, testThenInsertEntity, selectStatisticsID); Resource maps its URL to a DCI via getDCI/populateDCI, and ConcreteWorkflow carries workflowTotalTime, squaresOfWorkflowTime, and numberOfWorkflows together with the GetConcreteWorkflowStatistics and consumeWRTIDs helpers.]
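The per-level classes listed above all expose the same small set of operations, which lets the calculator treat every statistics level uniformly when it reads or writes its rows. The fragment below is a simplified, hypothetical rendering of that pattern for one level (DCI): the method names come from the diagram, but the bodies, the table name, and the column name are placeholders rather than the project's actual schema or code.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

/** Simplified version of the Entity contract sketched in the StatAggregator diagrams. */
interface StatEntity {
    String getTable();      // statistics table this level is stored in
    String getKeyColumn();  // column identifying the level instance
    String getKey();        // value of that column for this instance
    void setKeys(PreparedStatement ps) throws SQLException;
}

/** Hypothetical DCI implementation: one statistics row per DCI name. */
class DciEntity implements StatEntity {
    private final String dciName;

    DciEntity(String dciName) {
        this.dciName = dciName;
    }

    @Override public String getTable() { return "stat_statistics"; } // placeholder table name
    @Override public String getKeyColumn() { return "dciName"; }     // placeholder column name
    @Override public String getKey() { return dciName; }

    @Override
    public void setKeys(PreparedStatement ps) throws SQLException {
        ps.setString(1, dciName);
    }

    /** Example of how a caller could look up this entity's statistics row. */
    PreparedStatement prepareSelect(Connection con) throws SQLException {
        String sql = "SELECT * FROM " + getTable() + " WHERE " + getKeyColumn() + " = ?";
        PreparedStatement ps = con.prepareStatement(sql);
        setKeys(ps);
        return ps;
    }
}
```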
[Continuation of the Figure 16 screenshot: portal-level rows such as the job average execution time, total numbers of jobs, the averages and standard deviations of the time spent in each state, and the number of times jobs entered each state.]

Figure 16 Original User Interface

Figure 16 shows the original design, which displayed the metrics in rows. The second iteration incorporated many changes. First, the site layout was changed to the final version. Second, DCI was added to the levels of metrics. Third, a new way to access the database was implemented. Fourth, the ability to select multiple concrete workflows was added. Fifth, the number of states was decreased to five instead of seven. Finally, a new table structure was added. The new layout is seen below.

[Screenshot of the second-iteration portlet (Figure 17, captioned later): menu entries for DCI, user, and concrete workflow statistics, and a portal statistics page grouped into failure rate, job counts and averages, total times, and the average time spent in the Failed Run, Queue, Portal, and Run states.]
sed for web integration, collaboration, and social application platforms. Liferay is developed by a large open-source community as well as professional contributors, which makes it both flexible and innovative. The Liferay portal is used in the WS-PGRADE Grid Portal as part of the user interface framework. As WS-PGRADE uses the Liferay framework, our user interface was built as portlets that can be viewed on Liferay.

1.3.3 PORTLETS

A portlet is a Java-technology-based web component that processes requests and generates dynamic content. Portlets are used as plug-ins to an existing user interface to provide different features. This allows a website to be customized for each type of user, as well as to provide different content. A portlet is managed by a request and response paradigm and is normally interacted with through its forms and links [12]. A portlet is managed by the portlet container (Liferay in this project), which provides it with the runtime environment. The container manages the portlet's lifecycle as well as its storage and preferences. The container and portlet can be separate entities or built together [1].

1.4 METRICS

Metrics are a measurement of performance, efficiency, or other statistics in an application. For the WS-PGRADE Grid Portal there are numerous metrics for the different aspects of the system. We defined metrics that deal primarily with usage statistics. Among the metrics we were able to calculate are:
• Average job co
ssues to be addressed separately.

6 FUTURE WORK

Throughout the project we created a list of possible features and metrics for our system. However, due to time constraints or complexity, we were unable to implement everything. We want to identify some areas where we feel that future work on our system would be of value. Our suggested enhancements are generally either new features to the system or additional metrics.

6.1 REVISED ARCHITECTURE

Figure 21 Proposed Revised Architecture

Figure 21 proposes changes to the architecture of the system, with our proposed component in purple. Specifically, we would recommend implementing an API service that would replace or add on to the calculator service. This API would provide an access point for the portlet and allow for the possibility of other services using the statistics data. We would further suggest keeping the statistics services separate from the gUSE services to reduce the impact, if at all possible.

6.1.1 META-BROKER

Assuming the API is implemented, one service that could use our data would be the brokering service. The broker is responsible for assigning job instances to computing resource queues. If an API is implemented, the broker could use the past performance of the job or the resource as part of its decision.

6.1.2 ACCOUNTING

Our system can also be used as the first step in an accounting component for gUSE. As previously there was no
statistics: calc_statistics_stats_update
BEFORE UPDATE ON stat_JobStateTypeStatistics: calc_statetype_stats_update
BEFORE INSERT ON stat_statistics: calc_statistics_stats_insert
BEFORE INSERT ON stat_JobStateTypeStatistics: calc_statetype_stats_insert

Stored Procedures: JobInstanceToAggregateJob, CreateOrAddToJobInstance

See the database description section for a brief description of the use of each of these elements. It should also be possible for all of these components to be run on a separate database from the gUSE database if deemed pertinent. If so, please make sure that the connection information is changed appropriately. Also make sure to test the portlet's MenuPopulator.java, as it does use some gUSE database tables in order to provide useful names for concrete workflows, jobs, and DCIs.

APPENDIX D.2 CALCULATOR DEPLOYMENT

There are several options to deploy the calculator service. It is set up as a web service, which can be on the same server as the portal or on a distinct one. The first step is to locate the statAggregator.properties file and set the values in it for the database connection, how long to wait for non-terminated jobs and stat_running entries, and the frequency of the poll. Then install the project as a web service on a server with access to the database, using the configuration information given. Once installed, go to
Column Name | Pretty Name | Category | Units | Precision | Source Table | For Level | State Type | ID
Average | Average Time Spent in the Portal State | 2 | s | 2 | stat_JobStateTypeStatistics | all | PORTAL | 12
Average | Average Time Spent in the Fail State | 0 | s | 2 | stat_JobStateTypeStatistics | all | FAIL | 14
Average | Average Time Spent in the Other State | 2 | s | 2 | stat_JobStateTypeStatistics | all | OTHER | 16
StdDev | Standard Deviation of Time Spent in the Queue State | 4 | s | 2 | stat_JobStateTypeStatistics | all | QUEUE | 18
StdDev | Standard Deviation of Time Spent in the Terminal State | 0 | s | 2 | stat_JobStateTypeStatistics | all | TERMINAL | 20
StdDev | Standard Deviation of Time Spent in the Run State | 4 | s | 2 | stat_JobStateTypeStatistics | all | SUCCESSRUN | 22
Num | Number of Times the Job Entered the Failed Run State | 5 | entries | 0 | stat_JobStateTypeStatistics | all | RUN | 24
Num | Number of Times the Job Entered the Portal State | 5 | entries | 0 | stat_JobStateTypeStatistics | all | PORTAL | 26
Num | Number of Times the Job Entered the Fail State | 0 | entries | 0 | stat_JobStateTypeStatistics | all | FAIL | 28
Num | Number of Times the Job Entered the Other State | 5 | entries | 0 | stat_JobStateTypeStatistics | all | OTHER | 30
TotalTimeInStates | Failed Run | 7 | s | 2 | stat_JobStateTypeStatistics | all | RUN | 34
TotalTimeInStates | Queue | 7 | s | 2 | stat_JobStateTypeStatistics | all | QUEUE | 36
TotalTimeInStates | Terminal | 0 | s | 2 | stat_JobStateTypeStatistics | all | TERMINAL | 38
TotalTimeInStates | Other | 7 | s | 2 | stat_JobStateTypeStatistics | all | OTHER | 40

APPENDIX D INSTALLATION
sting. First was a very simple workflow that just executed one job, which simply waited for a short period of time. This allowed us to quickly test that the data was propagating through the system. We would then manually confirm the values through comparison with the original data.

4.2 PORTLET TESTING

Testing for the portlet consisted of making sure it could handle different information loads as well as operate in the expected way. The first part of testing consisted of testing extreme data, both large and small numbers, as well as having no data. This ensured that the display would never fail, and that even if there was no data it would still work. We also had to test the functionality. This involved making sure every button and selection acted in the way it was supposed to. Furthermore, it was tested on multiple browsers to ensure the portlet worked the same way on each browser.

4.3 FUNCTIONALITY TESTING

In order to show that our system was working as expected, we ran a suite of functionality tests. The goal of these tests was to explore the behavior of the system at a high level. These tests consisted of workflows that would be executed on the portal; after the execution was complete, we viewed the statistics pertinent to the workflow. There were several workflows that were created for these tests. As the edge cases of our system were all related to parameter studies, and because in WS-PGRADE a non-parameter-study workfl
[Continuation of the Figure 17 screenshot: the average time spent in the Run and Other states, followed by the standard deviations of the time spent in each state and the number of times jobs entered each state.]

Figure 17 Second Iteration User Interface

The table that was added to the database, stat_metric_description, created a simpler way of presenting the data. This table was comprised of nine columns: column name, pretty name, category, units, precision, source, for level, statetype, and id. The column name referenced which column the data was being accessed from in the source table. The pretty name and units columns were the description and units, respectively, that would be shown on the portlet. The precision column was the number of decimal places that would be shown. The id was both the primary key for the table and was also used for ordering of statistics within a category. For level specified what the statistic was good for, as some metrics only worked for c
tegories of each type of metric would be created below.

Advantages: Metrics available right away; clean.
Disadvantages: Potentially a lot of scrolling; takes up a lot of room; no customizability.

The final design was loosely based on the two above. These designs evolved into the final design as we progressed through the project. The principle of selecting workflows and abstract jobs, as well as separate categories for displaying, was still incorporated into the final product. Furthermore, they were useful for discussions on how the final interface should look.

2.2.5 FINAL DESIGN

After reviewing the original designs, the final design was proposed. This design consisted of creating multiple pages to display each level of statistic on its own page. The pages were divided into the different levels: a page for portal, user, DCI, and concrete workflow. The user could choose up to three concrete workflows to display at once. For DCI, the user could choose to view individual resources on the selected DCI, and for concrete workflow the user could choose either abstract job or workflow instance metrics to view. After the portal is accessed, the portal metrics are displayed automatically.

Figure 8 Site Map

Figure 8 shows the final site map. The user accesses the portlet, which shows them user statistics. From there they can navigate to DCI, user, or concrete workflow metrics. From there the user can enter one
the URL SERVER/StatAggregator, which is currently set up to toggle the polling mechanism of the service. Alternate initialization may be recommended, using the web.xml file to set the service to start with the server. See the files stataggregate.java and index.jsp to see how to start the service if an alternate method is called for. The calculator service also uses information from the gUSE database; specifically, it uses it in order to provide a resource-URL-to-DCI-name mapping (see Resource.populateDCI).

APPENDIX D.3 PORTLET DEPLOYMENT

To deploy the portlet, first set the values in the configuration file to give database access to the database where the statistics data is being stored, and to set the locale and language defaults to Hungary and Hungarian. The configuration file also requires the URL of the portal. The language and locale are used for formatting the values on the portlet. To deploy the portlet on Liferay, go to the manage tab at the top of the page and select Control Panel. At the bottom of the list, under Server, choose Plugins Installation. Under Plugins Installation, click the button Install More Portlets and choose Upload File. Select Choose File and locate the WAR file to be uploaded. Then click Install and wait for the success message to appear.

APPENDIX D.4 STOPPING STATISTICS

If it becomes necessary to stop the statistics functionality, besides reverting the system, the simplest
the past, and choose individual resources if they wish to view another level. The same works for concrete workflows: the user can choose one and then expand upon it by selecting an abstract job or workflow instance.

2.2.2 USE CASE DIAGRAM

Figure 4 Use Case Diagram

For this system there is only one actor, an End User. This represents anyone using the system, such as an administrator or normal user. Each user can perform the same actions regarding navigation and viewing statistics. The diagram below shows what is possible.

2.2.3 SEQUENCE DIAGRAM

Figure 5 Sequence Diagram: DCI Statistics

Figure 5 is a sequence diagram that demonstrates one path to get statistics, in this case for DCI metrics. This path is similar for all the levels. The portal statistics are displayed first, and then the user needs to choose what to access next. The portlet serves as the user interface for the end user and provides the options for the user. The MenuPopulator is responsible for providing a choice list for the user in applicable cases. MetricInformationFactory provides metric descriptions, such as the name and units of the possible metrics. StatisticsFactory retrieves the data for the given metric description, and the database provides the data for all the objects. The end user, either an administrator or user, accesses the portlet, which accesses MetricsInformation
tractJob, ConcreteWorkflow, WorkflowInstance
stat_JobStateTypeStatistics: Contains calculated statistics about job states.
stat_metric_description: Contains information about the access and grouping of metrics for display.

Table 6 Database Trigger Descriptions

BEFORE INSERT ON stat_running (toJobInstance): Converts stat_running entries into stat_JobInstance and stat_JobInstanceStatus entries. Calls CreateOrAddToJobInstance.
BEFORE UPDATE ON stat_JobInstance (TOAGGJOB): Converts stat_JobInstance and status entries to stat_AggregateJob and stat_AggregateJobStatus. Calls JobInstanceToAggregateJob.
BEFORE UPDATE ON stat_ConcreteWorkflow: Calculates average and standard deviation.
BEFORE INSERT ON stat_ConcreteWorkflow: Calculates average and standard deviation.
BEFORE UPDATE ON stat_WorkflowInstance: Calculates workflow execution time.
BEFORE UPDATE ON stat_statistics: Calculates average and standard deviation.
BEFORE UPDATE ON stat_JobStateTypeStatistics: Calculates average and standard deviation.
BEFORE INSERT ON stat_statistics: Calculates average and standard deviation.
BEFORE INSERT ON stat_JobStateTypeStatistics: Calculates average and standard deviation.

Table 7 Stored Procedures

JobInstanceToAggregateJob: Chooses to create a new stat_AggregateJob entry or updates an existing one. Also inserts or
ved, which consisted of many entries for each job instance run on the grid. As the number of jobs that are run could be very large, due to the nature of parameter study workflows, we determined that we must consume these entries upon their entry into the database. This was accomplished using database triggers, which execute a routine in conjunction with SQL INSERT or UPDATE statements. There were three table structures maintained by the database.

[Figure 10 stat_running table description: its columns are portalURL, userID, wfID, wrtID, jobName, pid, jobStatus, wfStatus, resource, and seq (all VARCHAR(255)), tim (TIMESTAMP), and entered (TINYINT(1)).]

First was the stat_running table, which received data from gUSE in a polling manner. For each job being run on the portal, the portal would periodically query the job's status and record the information in this table. Therefore this table has many entries for each job executed. The stat_running table was consumed using database triggers that executed whenever a row was inserted into it. That trigger would create or add data to the next intermediate table structure, which grouped data by job instance.

[Schema fragment: the job instance table, beginning with id (INT(11)) and the wrtID, jobName, pid, and wfID columns (VARCHAR(255)).]
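Purely as an illustration of how dense the stat_running table becomes, the sketch below counts how many polled status rows were recorded per job instance, grouping on the pid and jobName columns listed in the figure above. The JDBC connection settings are placeholders, and the query is an ad-hoc check rather than part of the project's code.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

/** Ad-hoc check: how many polled status rows does each job instance accumulate? */
public class StatRunningRowCount {
    public static void main(String[] args) throws Exception {
        // Placeholder connection settings; a real deployment reads these from configuration.
        String url = "jdbc:mysql://localhost:3306/guse_stats";
        try (Connection con = DriverManager.getConnection(url, "statuser", "secret");
             Statement st = con.createStatement();
             ResultSet rs = st.executeQuery(
                     "SELECT pid, jobName, COUNT(*) AS samples "
                     + "FROM stat_running GROUP BY pid, jobName")) {
            while (rs.next()) {
                System.out.printf("%s (%s): %d rows%n",
                        rs.getString("pid"), rs.getString("jobName"), rs.getLong("samples"));
            }
        }
    }
}
```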
[Figure 23 StatAggregator Class Diagram Part 2 (concluded): the ConcreteWorkflow, Portal, and User entity classes, each with create, getKey, getKeyColumn, getTable, getStatFKColumn, getWhereClause, setKeys, and insertEntity operations.]

APPENDIX B.2 PORTLET DATA ACCESS LAYER

This diagram describes the structure of the data access layer for the statistics portlet.

[Figure 24 (beginning): the DBBase connection factory, the MenuPopulator class (getConnection, getPortals, getDCIs, getResources(DCI), getWFIDs(userID), getWRTIDs, and related choice-list queries), and the PropertyManager class (getProperties, getProperty).]
we were alright. Furthermore, we would like to thank Zsófia Jávor, who would let us know whenever anything was going on, and Dr. Róbert Lovas, who would always take the time to have a friendly conversation. And to everyone on the staff of LPDS: thank you for providing a warm environment and making our time here both enjoyable and comfortable; we really enjoyed our stay. Finally, we would like to thank our advisor Gábor Sárközy and co-advisor Stanley Selkow for their guidance on our project, the preparation that went into our being here, and our stay in Budapest. We would like to especially thank Professor Sárközy for advising this project and always making sure we were on the right track, both for our project and for our experiences in Hungary.

TABLE OF CONTENTS

ABSTRACT 2
ACKNOWLEDGEMENTS 3
TABLE OF CONTENTS 5
TABLE OF FIGURES 8
1 BACKGROUND 9
1.1 PROJECT STATEMENT 9
1.2 GRID COMPUTING 10
1.2.1 WORKFLOWS AND PARAMETER STUDIES 10
1.3 PORTALS 11
1.3.1 WS-PGRADE GRID PORTAL AND GUSE 12
1.3.2 LIFERAY 12
1.3.3 PORTLETS 12
1.4 METRICS 13
1.5 MTA SZTAKI 13
1.5.1 LPDS 14
2 METHODOLOGY 15
2.1 ARCHITECTURE 15
2.2 USER INTERFACE REQUIREMENTS 17
2.2.1 USE CASES