
Distributed Application Control System (DACS)



[FIGURE 22. TIN_SERVER DATA FLOW: main driver and database operations. Compute the start time for the new interval via SQL; compare the data count to the data threshold; update the timestamp; create skipped intervals; create and send the interval in one transaction.]

(Distributed Application Control System (DACS), IDC 7.3.1, June 2001. Chapter 4: Detailed Design.)

WaveGet_server

WaveGet_server is a data monitor server that polls the request table for auxiliary station waveform requests and initiates actions to acquire the requested waveforms. The actions include IPC message enqueues into one or more Tuxedo queues and the updating of the state of the revised requests in the database. The IPC messages consist of the updated request information. The enqueued messages initiate pipeline processing that ultimately results in the auxiliary waveforms being requested by the Retrieve Subsystem. WaveGet_server processes both new requests and previous requests that have failed to result in successful auxiliary waveform acquisition.
[FIGURE 23. WAVEGET_SERVER DATA FLOW: write and send updated requests; update timestamp; dispatch.]

WaveGet_server manages the retry of previously failed requests. Failures are detected by the DACS and recorded in the request table; WaveGet_server reprocesses previously failed attempts after a small time interval has elapsed.

In archival mode, WaveGet_server changes the state of selected entries in the request table. The intent is to change the state of requests that have either too many failures or are too old. The new state both prevents WaveGet_server (standard mode) from considering these requests and provides a clear indication to an operator that the request is no longer being considered by WaveGet_server.

Input, Processing, Output

tis_server

Figure 18 on page 58 shows data and processing flow for tis_server. tis_server receives input from user-defined parameter files, the database, and the scheduler server. The parameter files specify all processing details for a given instance of the data monitor server; details include database account, station names, database queries, and interval coverage threshold values. The user parameters are used to construct the recurring database queries that check or monitor the availability of new station data.
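The retry rule just described (reprocess failed attempts only after a small time interval has elapsed) can be sketched as below. This is an illustrative Python sketch, not DACS code (the DACS is ANSI C); the field names and the 300-second delay are assumptions, since the real delay is a user parameter.

```python
RETRY_DELAY_SEC = 300  # illustrative value; the actual delay is a user parameter

def requests_due_for_retry(requests, now):
    """Return failed requests whose last attempt is older than the retry delay.

    Each request is a dict with hypothetical keys 'state' and 'last_attempt'.
    """
    return [r for r in requests
            if r["state"] == "failed"
            and now - r["last_attempt"] >= RETRY_DELAY_SEC]

# Example: one request failed long ago, one failed recently, one is queued.
reqs = [
    {"id": 1, "state": "failed", "last_attempt": 1000.0},
    {"id": 2, "state": "failed", "last_attempt": 9000.0},
    {"id": 3, "state": "queued", "last_attempt": 1000.0},
]
due = requests_due_for_retry(reqs, now=9100.0)  # only request 1 is due
```

Only the old failure is selected; the recent failure waits for its delay to elapse, and non-failed requests are never retried.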
[FIGURE 5. DACS AS MIDDLEWARE]

The operating system used at the IDC is Solaris, a version of UNIX by Sun Microsystems; the application software is the SAIC-supplied software; and the DACS middleware is a product called Tuxedo, which is provided by BEA. Tuxedo is widely used for banking applications and in other branches of industry that maintain distributed applications, for example phone companies, courier services, and chain retailers. Tuxedo is a powerful and versatile product of which each application typically uses only a part. This document does not provide an introduction to the full scope of Tuxedo (see [And96] and [BEA96]). Instead, only those features of Tuxedo with a direct bearing on the IDC software are included.

Tuxedo is a transaction manager that coordinates transactions across one or more transactional resource managers. Example transactional resource managers include database servers, such as ORACLE, and the queueing system that is included with Tuxedo. This queueing system is used extensively by the DACS for reliable message storage and forwarding within the IDC Automatic and Interactive Processing software. The disk-based queues and the database maintain the state of the system during any system or process failure. Tuxedo also provides extensive backup and self-correcting ...
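The key property Tuxedo supplies here is atomicity across the database and the disk queue: a state update and its matching enqueue either both happen or neither does. A minimal Python simulation of that all-or-nothing pattern follows; it is a sketch under stated assumptions (a dict stands in for the ORACLE table, a list for the Tuxedo disk queue), not the Tuxedo API.

```python
class RollbackError(Exception):
    """Simulated transaction failure."""

def transactional_dispatch(db, queue, req_id, message):
    """Update a request's state and enqueue its message as one atomic unit.

    On any failure, both the database stand-in (dict) and the queue
    stand-in (list) are restored to their pre-transaction contents,
    mimicking a coordinated rollback across two resource managers.
    """
    db_snapshot = dict(db)
    queue_snapshot = list(queue)
    try:
        db[req_id] = "queued"        # update request state in the "database"
        if message is None:          # simulated enqueue failure
            raise RollbackError("enqueue failed")
        queue.append(message)        # enqueue the IPC message
    except RollbackError:
        db.clear(); db.update(db_snapshot)          # roll back both effects
        queue.clear(); queue.extend(queue_snapshot)
        return False
    return True
```

After a successful call both effects are visible; after a failed call neither is, so a crash between the two steps can never leave the interval tables and the qspace out of synchronization (the invariant the glossary attributes to the DACS programs).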
Figure 4 shows key features of the DACS application that supports the Interactive Processing software. In support of Interactive Processing, the DACS is a messaging-based system that enables data sharing between Interactive Tools. The DACS allows separate programs to exchange messages in near real time. The DACS provides some management of the Interactive Tools by automatically invoking a requested program when needed. This feature allows an analyst to easily summon the processing resources of occasionally used auxiliary programs. A DACS monitoring utility confirms that processes are running and accepting messages. In support of Interactive Processing, the DACS also supports interactive requests to certain Automatic Processing applications. (The DACS queuing system is not shown in the figure.)

[FIGURE 4. DACS APPLICATION FOR INTERACTIVE PROCESSING FUNCTIONALITY: analyst review; Interactive Tools; execution and message IPC; messages and events; Tuxedo for Interactive Processing; Interactive Tools monitoring; automatic pipeline process control; Automatic Processing.]

Figure 5 shows the concept of middleware. The DACS coordinates the execution of various application programs on a network of computers by controlling these applications ...
(Chapter 5: Requirements)

TABLE 11. TRACEABILITY OF FUNCTIONAL REQUIREMENTS: WORKFLOW MANAGEMENT (CONTINUED)

Requirement 25.5: Workflow management shall provide error recovery per data element for failures of the Automatic Processing programs. Error recovery shall consist of a limited number of time-delayed retries of the failed Automatic Processing program. If the retry limit is reached, the DACS shall hold the failed intervals in a failed queue for ...
How fulfilled: This requirement is fulfilled by the DACS tuxshell server.

Requirement 25.6: The DACS shall initiate workflow management of each data element within 5 seconds of data availability.
How fulfilled: Reliable queue messaging (disk- and transaction-based messaging) within the DACS can occur at least 10 times per second, and workflow management of each data element can be initiated with the same frequency. However, tis_server database queries currently take about 20 seconds at the IDC, and tis_server is currently configured to run every 90 seconds. Therefore, the worst case is in excess of 100 seconds after data are available. The 5-second requirement is not possible given the current database server dependence.

Requirement 25.7: Workflow management shall deliver intervals from one Automatic Processing program to the next program in the sequence within five seconds of completion of the first program. If the second program is busy with another interval ...
How fulfilled: Same as above.
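The "in excess of 100 seconds" worst case quoted for requirement 25.6 follows directly from the two figures given (a 20-second query and a 90-second scheduling interval). A quick check of the arithmetic, assuming the period runs from one query start to the next:

```python
QUERY_SECONDS = 20   # current tis_server query duration at the IDC (from the text)
PERIOD_SECONDS = 90  # current tis_server scheduling interval (from the text)

# Data arriving just after a query starts must wait out the rest of the
# period until the next run begins, then wait for that run's query to
# complete before workflow management can be initiated.
worst_case = PERIOD_SECONDS + QUERY_SECONDS  # 110 seconds
```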
TABLE 8. TRACEABILITY OF GENERAL REQUIREMENTS (CONTINUED)

Requirement 7: Operational Mode: pause. Automatic Processing: completion of active automatic processing. Interactive Processing: full interactive processing.
How fulfilled: For Automatic Processing, the pause mode is effected by stalling scheduling of the data monitor servers (using the tuxpad schedule_it script) and possibly the shutdown of the DACS TMQFORWARD servers to stop processing of queued intervals. For Interactive Processing, this requirement is fulfilled the same as above, although this processing mode is not generally applicable to interactive processing.

Requirement 8: The DACS shall be started at boot time by a computer on the IDC local area network. The boot shall leave the DACS in the stop state. After it is in this state, the DACS shall be operational and unaffected by the halt or crash of any single computer on the network.
How fulfilled: The DACS is booted by the operator, usually via tuxpad, and the DACS is effectively in the stop or pause mode, awaiting operator action to initiate the play mode. The DACS can survive the crash of a single computer in most cases. Single points of failure include the database server and the file-logging server, which are accepted single points of failure. The scheduling system queue server is a single point of failure ...
WaveGet_server provides standard mode and archival mode processing. Standard mode processing operates on incomplete requests for data. Archival mode processing operates on requests for which too many retrieval attempts have failed or too much time has elapsed.

In standard mode processing, WaveGet_server sorts all active requests for data by four criteria: the first sort is by priority of request, the second is by transfer method, the third is by station, and the fourth is by time. WaveGet_server prioritizes the requests based upon a list of priority names defined by the user parameters. The priority names define different request types, and within each priority level the requests are grouped by transfer method. Within a transfer method, the requests are sorted by station and by time. After all active requests are sorted, one IPC message per request is enqueued into the configured Tuxedo queue (process 4 in Figure 23).

[FIGURE 23 (detail). WAVEGET_SERVER DATA FLOW: scheduler; reschedule; user parameters; main driver; timestamp; database operations (compute WaveGet time for request table query); query requests; sort by priorities.]
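The four-level sort described above reduces to a single composite sort key, with the priority rank taken from the user-defined list of priority names. The following Python sketch illustrates the idea; the dictionary keys and priority names are assumptions for the example, not DACS parameter names.

```python
def sort_requests(requests, priority_names):
    """Sort active requests by priority (in user-defined order), then
    transfer method, then station, then time, mirroring WaveGet_server's
    four-criteria sort. Unknown priority names sort last."""
    rank = {name: i for i, name in enumerate(priority_names)}
    return sorted(requests, key=lambda r: (
        rank.get(r["priority"], len(priority_names)),
        r["method"],
        r["station"],
        r["time"],
    ))

reqs = [
    {"priority": "routine", "method": "ftp", "station": "ABC", "time": 20.0},
    {"priority": "urgent",  "method": "ftp", "station": "XYZ", "time": 10.0},
    {"priority": "urgent",  "method": "ftp", "station": "ABC", "time": 30.0},
]
ordered = sort_requests(reqs, priority_names=["urgent", "routine"])
```

With this key, every "urgent" request precedes every "routine" one regardless of time, and within a priority and transfer method the station grouping keeps each station's requests together in time order.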
... output to the timestamp table to track interval creation by station. However, in practice the timestamp updates are carried out by database triggers that update this information based upon updates to the wfdisc table. This performance optimization can be considered part of the tis_server design, but its implementation is external to tis_server. Upon interval creation, tis_server enqueues a message containing the interval information into a Tuxedo queue for initiation of a pipeline processing sequence on the time interval. tis_server completes its interval creation cycle by sending an acknowledgement (SETTIME command) to the scheduler server, which results in rescheduling of the next tis_server service call.

tiseg_server

Figure 20 on page 62 shows data and processing flow for tiseg_server. tiseg_server receives input from user-defined parameter files, the database, and the scheduler server. The parameter files specify all processing details for a given instance of the data monitor server; details include database account, auxiliary network, database queries, and station- and time-based interval coverage values. The user parameters are used to construct the recurring database queries that check or monitor the availability of new station data. Initial database input to tiseg_server includes an auxiliary network, which is used to build a complete station, site, and channel table for all monitored auxiliary stations. tiseg_server first carries ...
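The interval-versus-skipped-interval decision that the data monitors make (sketched in the Figure 22 data flow: compare the data count for a candidate interval against a coverage threshold) might look like the following. This is a hypothetical Python sketch; the function name, the fractional-coverage semantics, and the threshold value are assumptions, since the document only states that coverage threshold values come from the parameter files.

```python
def classify_interval(data_count, expected_count, coverage_threshold):
    """Return 'create' when the observed data meet the coverage threshold
    (expressed here as a fraction of the expected sample count), and
    'skip' otherwise, mirroring the create-interval / create-skipped-
    interval branch of the data monitor servers."""
    coverage = data_count / expected_count if expected_count else 0.0
    return "create" if coverage >= coverage_threshold else "skip"

# A 10-minute interval with 580 of 600 expected seconds of data,
# against a hypothetical 90% coverage threshold:
decision = classify_interval(580, 600, coverage_threshold=0.9)
```

In the real servers the "create" branch also enqueues the interval message and updates the timestamp inside one transaction, as described above.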
... stations and other information on geographical maps.

Master machine: Machine that is designated to be the controller of a DACS Tuxedo application. In the IDC application, the customary logical machine identifier (LMID) of the Master is THOST.

message interval: Entry in a Tuxedo queue within the qspace referring to rows in the interval or request database tables. The DACS programs ensure that interval tables and qspace remain in synchronization at all times.

message queue: Repository for data intervals that cannot be processed immediately. Queues contain references to the data while the data remain on disk.

NFS: Network File System. Sun Microsystems protocol that enables clients to mount remote directories onto their own local filesystems.

online: Logged onto a network, or having unspecified access to the Internet.

ORACLE: Vendor of the database management system used at the PIDC and IDC.

parameter (par) file: ASCII file containing values for parameters of a program. Par files are used to replace command-line arguments. The files are formatted as a list of token-value strings.

partitioned: State in which a machine can no longer be accessed from other DACS machines via IPC resources (BRIDGE and BBL).

PIDC: Prototype International Data Centre.

pipeline: (1) Flow of data at the IDC from the receipt of communications to the final automated processed data bef ...
FIGURES

FIGURE 1. IDC SOFTWARE CONFIGURATION HIERARCHY
FIGURE 2. RELATIONSHIP OF DACS TO OTHER SUBSYSTEMS OF IDC SOFTWARE
FIGURE 3. DACS APPLICATION FOR AUTOMATIC PROCESSING
FIGURE 4. DACS APPLICATION FOR INTERACTIVE PROCESSING
FIGURE 5. DACS AS MIDDLEWARE
FIGURE 6. CONCEPTUAL DATA FLOW OF THE DACS FOR AUTOMATIC PROCESSING
FIGURE 7. CONCEPTUAL DATA FLOW OF DACS FOR INTERACTIVE PROCESSING (17)
FIGURE 8. PROCESSING REQUESTS FROM MESSAGE QUEUE (21)
FIGURE 9. TRANSACTION IN DETAIL (22)
FIGURE 10. FORWARDING AGENT (23)
FIGURE 11. CONSTRUCTION OF A PIPELINE (26)
FIGURE 12. DATA FLOW OF THE DACS FOR AUTOMATIC PROCESSING (29)
FIGURE 13. DATA FLOW OF THE DACS FOR INTERACTIVE PROCESSING (34)
FIGURE 14. DATA FLOW OF DACS CSCs FOR AUTOMATIC PROCESSING (50)
FIGURE 15. CONTROL AND DATA FLOW OF DACS CSCs FOR INTERACTIVE PROCESSING (53)
FIGURE 16. DATA MONITOR CONTEXT (55)
FIGURE 17. DATA MONITOR ACKNOWLEDGEMENT TO SCHEDULING SYSTEM (56)
FIGURE 18. TIS_SERVER DATA FLOW (58)
FIGURE 19. CURRENT DATA AND SKIPPED INTERVAL CHECKS (60)
FIGURE 20. TISEG_SERVER DATA FLOW (62)
FIGURE 21. TICRON_SERVER DATA FLOW (64)
FIGURE 22. TIN_SERVER DATA FLOW (66)
FIGURE 23. WAVEGET_SERVER DATA FLOW (68)
FIGURE 24. SCHEDULING SYSTEM DATA FLOW (79)
FIGURE 25. TUXSHELL DATA FLOW (85)
FIGURE 26. DBSERVER DATA FLOW (89)
FIGURE 27. MONITORING UTILITY WORKFLOW (95)
FIGURE 28. WORKFLOW DATA FLOW (97)
FIGURE 29. TUXPAD DESIGN (112)
FIGURE 30. QINFO DESIGN (114)
qspace: Set of message queues grouped under a logical name. The IDC application has a primary and a backup qspace. The primary qspace customarily resides on the machine with logical reference (LMID) OHOST.

server: Software module that accepts requests from clients and other servers and returns replies.

server group: Set of servers that have been assigned a common GROUPNO parameter in the ubbconfig file. All servers in one server group must run on the same logical machine (LMID). Servers in a group often advertise equivalent or logically related services.

service: Action performed by an application server. The server is said to be advertising that service. A server may advertise several services (multiple personalities), and several servers may advertise the same service (replicated servers).

shutdown: Action of terminating a server process as a memory-resident task. Shutting down the whole application is equivalent to terminating all specified server processes: application servers first, admin servers second, in the reverse order that they were booted.

TABLE IV. TECHNICAL TERMS (CONTINUED)

SRVID: Server identifier; integer between 1 and 29999 uniquely referring to a particular server. The SRVID is used in the ubbconfig file and with Tuxedo administrative utilities to refer to ...
... has been converted to a digital count, which is monotonic with the amplitude of the stimulus to which the sensor responds.

Web: World Wide Web; a graphics-intensive environment running on top of the Internet.

WorkFlow: Software that displays the progress of automated processing systems.

workstation: High-end, powerful desktop computer, preferred for graphics and usually networked.
FIGURE 31. SCHEDULE_IT DESIGN (115)
FIGURE 32. ENTITY RELATIONSHIP OF SAIC DACS CSCs (121)
FIGURE 33. DATA ARRIVAL EXAMPLE (139)

TABLES

TABLE I. DATA FLOW SYMBOLS
TABLE II. ENTITY RELATIONSHIP SYMBOLS
TABLE III. TYPOGRAPHICAL CONVENTIONS
TABLE IV. TECHNICAL TERMS
TABLE 1. DATABASE TABLES USED BY DACS (27)
TABLE 2. MAP OF TUXEDO COMPONENTS TO SAIC DACS COMPONENTS (39)
TABLE 3. DACS LIBIPC INTERVAL MESSAGE DEFINITION (103)
TABLE 4. LIBIPC API (106)
TABLE 5. DATABASE USAGE BY DACS (122)
TABLE 6. DACS OPERATIONAL MODES (127)
TABLE 7. FAILURE MODEL (136)
TABLE 8. TRACEABILITY OF GENERAL REQUIREMENTS (144)
TABLE 9. TRACEABILITY OF FUNCTIONAL REQUIREMENTS: AVAILABILITY MANAGEMENT (148)
TABLE 10. TRACEABILITY OF FUNCTIONAL REQUIREMENTS: MESSAGE PASSING (150)
TABLE 11. TRACEABILITY OF FUNCTIONAL REQUIREMENTS: WORKFLOW MANAGEMENT (153)
TABLE 12. TRACEABILITY OF FUNCTIONAL REQUIREMENTS: SYSTEM MONITORING (156)
TABLE 13. TRACEABILITY OF FUNCTIONAL REQUIREMENTS: RELIABILITY (158)
TABLE 14. TRACEABILITY OF CSCI EXTERNAL INTERFACE REQUIREMENTS (161)
TABLE 15. TRACEABILITY OF CSCI INTERNAL DATA REQUIREMENTS (169)
TABLE 16. TRACEABILITY OF SYSTEM REQUIREMENTS (169)

About this Document
Requirement 21: The DACS shall provide a graphical display of the status of message passing with each Interactive Processing program. The status shall indicate the interactive processes capable of receiving messages and whether there are any messages in the input queue for each receiving process.
How fulfilled: This requirement is fulfilled by the dman client.

Requirement 32: The DACS displays shall remain current within 60 seconds of actual time. The system monitoring displays shall provide a user interface command that requests an update of the display with the most recent status.
How fulfilled: This requirement is fulfilled in general because the DACS is always processing in real time or near real time. Specifically, the DACS status at the machine or server level is available in real time via the tuxpad refresh button. WorkFlow updates on an operator-specified update interval or on demand via a GUI selection.

TABLE 12. TRACEABILITY OF FUNCTIONAL REQUIREMENTS: SYSTEM MONITORING (CONTINUED)

Requirement: The DACS run-time status display shall be capable of displaying all processes managed by the availability manager.

Requirement: The DACS message passing display shall be capable of displaying the empty/non-empty message queue status of all processes that can receive messages.

Requirement: The DACS workflow management display shall be capable of ...
... step 8 in Figure 24 on page 79) are sent with the TPNOREPLY flag set, which means there will be no reply (no returned result) in the result queue.

scheduler servers generate output to log files, Tuxedo queues, and Tuxedo servers. The updated scheduling states are enqueued to the schedule queue (Q1 in Figure 24 on page 79). Output to Tuxedo services consists of service calls to data monitor servers. schedclient generates output to the terminal or message window and to the sched command queue (Q2 in Figure 24 on page 79).

Control

scheduler start-up and shutdown are handled by Tuxedo, because scheduler is a Tuxedo application server. Start-up upon system boot-up is initiated by an operator, as is manual start-up and shutdown of one or more of the replicated scheduler servers. However, Tuxedo actually handles process execution and termination. Tuxedo also monitors scheduler servers and provides automatic restart upon any unplanned server termination. schedclient is always started as part of an operator request. The request can be direct, by submission of the schedclient command within a UNIX shell environment, or indirect, by the operator GUI tuxpad (specifically, by the schedule_it GUI).

Interfaces

The interface to the scheduling system is through the schedclient application, which sends commands to scheduler ...
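The command/result matching described elsewhere in this chapter (scheduler and schedclient pair messages via a queue-based correlation identifier) can be sketched as below. This is an illustrative Python simulation, with lists standing in for the Tuxedo command and result queues; the dictionary keys are assumptions.

```python
import uuid

def enqueue_command(command_queue, command):
    """Enqueue a command tagged with a fresh correlation identifier,
    as schedclient does; return the identifier for later matching."""
    corrid = uuid.uuid4().hex
    command_queue.append({"corrid": corrid, "command": command})
    return corrid

def poll_for_result(result_queue, corrid):
    """Scan the result queue for the entry whose correlation identifier
    matches, removing and returning it; results belonging to other
    clients are left in place. Returns None when no match is waiting."""
    for i, result in enumerate(result_queue):
        if result["corrid"] == corrid:
            return result_queue.pop(i)
    return None
```

Because each client only removes results bearing its own correlation identifier, multiple schedclient instances can share one result queue without stealing each other's replies.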
AUDIENCE

This document is intended for all engineering and management staff concerned with the design and requirements of all IDC software in general and of the DACS in particular. The detailed descriptions are intended for programmers who will be developing, testing, or maintaining the DACS.

RELATED INFORMATION

See "References" on page 175 for a list of documents that supplement this document. The following UNIX manual (man) pages apply to the existing DACS software: dbserver(1), dman(1), interval_router(1), libipc(3), birdie(1), recycler_server(1), schedclient(1), scheduler(1), SendMessage(1), tis_server(1), tiseg_server(1), ticron_server(1), tin_server(1), WaveGet_server(1), tuxpad(1), tuxshell(1), WorkFlow(1).

USING THIS DOCUMENT

This document is part of the overall documentation architecture for the IDC. It is part of the Software category, which describes the design of the software. This document is organized as follows:

Chapter 1, Overview: This chapter provides a high-level view of the DACS, including its functionality, components, background, status of development, and current operating environment.

Chapter 2, Architectural Design: This chapter ...
About this Document

This chapter describes the organization and content of the document and includes the following topics:

- Purpose
- Scope
- Audience
- Related Information
- Using this Document

PURPOSE

This document describes the design and requirements of the Distributed Processing Computer Software Configuration Item (CSCI) of the International Data Centre (IDC). The collection of software is more commonly referred to as the Distributed Application Control System (DACS). The DACS consists of commercial off-the-shelf (COTS) software and Science Applications International Corporation (SAIC)-designed Computer Software Components (CSCs), including server applications, client applications, one global library, and processing scripts.

SCOPE

The DACS software is identified as follows:

Title: Distributed Application Control System
Abbreviation: DACS

This document describes the architectural and detailed design of the software, including its functionality, components, data structures, high-level interfaces, method of execution, and underlying hardware. Additionally, this document specifies the requirements of the software and its components. This information is modeled on the Data Item Description for Software Design [DOD94a] and the Data Item Description for Software Requirements Specification [DOD94b].
... Maximum Time to Recover: 5 seconds for detection and 5 seconds to initiate recovery.
How fulfilled: Detected and automatically recovered by the DACS, as discussed previously.

Requirement 41.4: Process timing failure, interactive applications. Maximum Failure Rate: not detectable. Maximum Time to Recover: user detection and recovery.
How fulfilled: In general, the analyst detects and recovers from these failures. The DACS for Interactive Processing does include process monitoring and timeout monitoring for tuxshell child processes.

Requirement 41.5: All others. Maximum Failure Rate: undefined. Maximum Time to Recover: undefined.
How fulfilled: N/A.

TABLE 14. TRACEABILITY OF CSCI EXTERNAL INTERFACE REQUIREMENTS

Requirement 42: The DACS shall interface with the ORACLE database through the GDI.
How fulfilled: All DACS access to the database server is through the GDI.

Requirement 43: The DACS shall read from the wfdisc table. The DACS shall assume wfdisc table entries will follow the data model described in [IDC5.1.1Rev2].
How fulfilled: The DACS data monitor applications tis_server and tiseg_server read the wfdisc table. Access to the table is fully compatible with the published database schema.

TABLE 14. TRACEABILITY OF CSCI EXTERNAL INTERFACE REQUIREMENTS (CONTINUED)

Requirement: The DACS shall insert and update entries in the interval table, which is used as a monitoring point for the Automatic Processing system. As part of reset mode, the DACS ...
... UNIX System Labs, Inc. X Window System is a registered trademark of The Open Group.

Ordering Information

The ordering number for this document is SAIC-01/3001. This document is cited within other IDC documents as [IDC7.3.1].

CONTENTS

About this Document
  Purpose
  Scope
  Audience
  Related Information
  Using this Document
  Conventions

Chapter 1: Overview
  Introduction
  Functionality
  Identification
  Status of Development
  Background and History
  Operating Environment
    Hardware
    Commercial Off-The-Shelf Software

Chapter 2: Architectural Design
  Conceptual Design
  Design Decisions
    Programming Language
    Global Libraries
    Database
    Interprocess Communication (IPC)
    Filesystem
    UNIX Mail
    FTP (20)
    Web (20)
  Design Model (21)
  Distribution and Backup Concept (23)
  Pipelines (25)
  Database Schema Overview (27)
  Functional Description (28)
    Distributed Process Monitoring, Reliable Queueing, and Transactions (28)
    Data Monitoring (30)
    System Scheduling (30)
    Pipeline Processing (31)
    Workflow Monitoring (31)
    Automatic Processing Utilities (32)
    Operator Console (32)
    Interactive Processing (32)
  Interface Design (34)
    Interface with Other IDC Systems (34)
    Interfa...
... a near-continuous fashion. The DACS nominally forms intervals of segments of 10 minutes in length. However, during recovery from a data acquisition system failure, the DACS forms intervals of up to one hour in length. The DACS can be configured to form intervals of practically any size.

Requirement 45.2: The data from each source nominally arrive in piecewise-increasing time order. Data delivery from an individual station may be interrupted and then resumed. Upon resumption of data delivery, the data acquisition system may provide current data, late data, or both. Current data resume with increasing time, and late data may fill in a data gap in either increasing (FIFO) or decreasing (LIFO) time order from the end points of the time gap. Figure 33 shows an example where current continuous data are interrupted and then resumed, followed by examples of both FIFO and LIFO late-data arrival. In (A), continuous data arrive with advancing time. In (B), data are interrupted; no data arrive. In (C), data begin to arrive again, starting with the current time. In (D), both late data and continuous data arrive in tandem; the late data fill in the data gap in FIFO order. In (E), both late data and continuous data arrive in tandem; the late data fill in the data gap in LIFO order. The data acquisition system defines each channel of a seismic ...
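The two late-data orderings above can be illustrated by the order in which a gap's missing segments arrive. A small Python sketch, with arbitrary example times (the segment values are illustrative, not from the document):

```python
def late_arrival_order(gap_segments, mode):
    """Return the arrival order of late segments filling a data gap.

    FIFO: segments arrive from the start of the gap forward
          (increasing time order).
    LIFO: segments arrive from the end of the gap backward
          (decreasing time order).
    """
    ordered = sorted(gap_segments)
    if mode == "FIFO":
        return ordered
    if mode == "LIFO":
        return ordered[::-1]
    raise ValueError("mode must be 'FIFO' or 'LIFO'")

gap = [30, 10, 20]  # hypothetical minute marks missing from the gap
```

Either way the same gap ends up filled; only the arrival order differs, which is exactly what a data monitor watching interval coverage has to tolerate.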
... and time. The sorted list is recorded in a memory-based list and is the central data structure for all server operations (process 3 and M1 in Figure 23 on page 68). The sorted list is pruned of any request names that are not defined in the user-defined list of station names. The pruning involves updating the request states to a user-specified ignore state, which removes the request from further consideration. The sorted list of requests is updated in the database and sent to a Tuxedo queue as one global transaction (processes 4 and 5 in Figure 23 on page 68).

In archival mode processing, WaveGet_server will set the request state to failed for all old requests that have not resulted in successful auxiliary waveform acquisition within a user-specified time lookback and/or have failed an excessive number of times.

WaveGet_server generates output to log files, the database, Tuxedo queues, and the scheduler server. Output to the database includes updates to the request table and timestamp table; request table updates to state queued are coupled with enqueues of the request information to a Tuxedo queue. The enqueue initiates the pipeline processing sequence to retrieve the requested auxiliary waveform. WaveGet_server completes its processing cycle by sending an acknowledgement (SETTIME command) to the scheduler server.
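The archival mode rule above (mark a request failed once it exceeds the user-specified time lookback or failure count) reduces to a simple predicate. An illustrative Python sketch follows; the field names and threshold values are assumptions for the example, since the actual ones are user parameters.

```python
def archive_state(request, now, lookback_sec, max_failures):
    """Return 'failed' when a request is older than the time lookback or
    has accumulated too many failed attempts (WaveGet_server archival
    mode); otherwise return the request's current state unchanged."""
    too_old = now - request["created"] > lookback_sec
    too_many_failures = request["failures"] >= max_failures
    return "failed" if too_old or too_many_failures else request["state"]
```

A request satisfying either condition is moved to the terminal state, which both removes it from standard mode consideration and flags it for the operator, as the design describes.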
... arrives. With an interrupt, the process shall rely on the interrupt, such as activity on a UNIX file descriptor, to indicate when a message is waiting.

Requirement 48: The DACS shall interface with the UNIX operating system to start Automatic Processing programs and wait on the termination of these programs. Processes started by the DACS shall inherit the system privileges of the DACS, including the process group, environment, and file system permissions.

Requirement 49: The DACS shall collect the exit or abnormal-termination status of processes it starts. The exit status shall be used to determine success or failure of the Automatic Processing program. Processes shall use a defined set of exit codes to indicate various levels of success and another set of codes to indicate different types of failure.

Requirement 50: The DACS shall interface with an operator or operators. The DACS shall provide monitoring displays and control interfaces. The monitoring displays shall provide system monitoring for computer status, process status, workflow status, and the message passing service. The information presented with each monitoring display is specified in "System Monitoring" on page 133.

Requirement 51: The control interface shall enable the operator to take actions on the DACS. The control interface supports the functions listed in the following su...
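Requirement 49's use of exit codes could be implemented along the following lines. This Python sketch is illustrative only: the specific code sets are hypothetical, not the DACS's defined sets, and a negative return code (the POSIX convention for death by signal, as surfaced by Python's subprocess module) is treated as abnormal termination.

```python
import subprocess
import sys

SUCCESS_CODES = {0, 1}     # hypothetical: full and partial success
FAILURE_CODES = {2, 3, 4}  # hypothetical: data, configuration, system failure

def classify_exit(returncode):
    """Map a child process's exit status to a coarse outcome.

    Negative returncodes indicate the child was killed by a signal
    (abnormal termination); other codes are looked up in the defined
    success and failure sets."""
    if returncode < 0:
        return "abnormal"
    if returncode in SUCCESS_CODES:
        return "success"
    if returncode in FAILURE_CODES:
        return "failure"
    return "unknown"

# Start a child process, wait on its termination, and classify the result.
proc = subprocess.run([sys.executable, "-c", "raise SystemExit(0)"])
status = classify_exit(proc.returncode)
```

Note the child inherits the parent's environment and permissions by default, matching the inheritance behavior requirement 48 specifies.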
... best form of error handling is a repeated attempt to call the data monitor server. As such, scheduler always schedules a subsequent call to the data monitor service immediately after the service call. This worst-case schedule time is typically set beyond the time the service would next normally be called and is tunable via user parameters. A successful data monitor service call completes with an acknowledgment SETTIME command (step 8 in Figure 24 on page 79) enqueued into the command queue. This acknowledgment command results in an update of the next scheduled time to call this data monitor service.

scheduler commands and results pass through the command and result queues. The results of most commands are simply a boolean success or fail. The show command is an exception, where scheduler returns the human-readable listing of scheduled services. scheduler commands and results are matched by the Tuxedo queue-based correlation identifier that is used by both scheduler and schedclient. schedclient polls the result queue (step b in Figure 24 on page 79) and searches for the matching result of the command that was enqueued into the command queue (step a in Figure 24 on page 79). scheduler commands, such as the SETTIME commands originating from the data monitor applications (for example, tis_server; step 8 in Figure 24 on page 79), are sent with the TPNOREPLY flag set.
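The scheduling pattern just described (pencil in a pessimistic worst-case retry time immediately, then let a SETTIME acknowledgment overwrite it with the normal next call time) can be sketched as follows. This is an illustrative Python simulation; the delay and period values are assumptions, since the actual ones are user parameters.

```python
def schedule_service_call(schedule, service, now, worst_case_delay):
    """Immediately record a worst-case retry time, as scheduler does
    before knowing whether the service call will succeed; if no
    acknowledgment ever arrives, this entry triggers the retry."""
    schedule[service] = now + worst_case_delay

def handle_settime_ack(schedule, service, next_time):
    """A SETTIME acknowledgment from the data monitor replaces the
    worst-case entry with the normal next scheduled call time."""
    schedule[service] = next_time

sched = {}
schedule_service_call(sched, "tis_server", now=0.0, worst_case_delay=600.0)
pessimistic = sched["tis_server"]   # stands until an acknowledgment arrives
handle_settime_ack(sched, "tis_server", next_time=90.0)
```

The attraction of this design is that a crashed or hung data monitor needs no special error path: the absence of the acknowledgment alone causes the retry.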
calls a tuxshell server within a transaction, but the processing application status (success or fail) is sent back to the calling client via a libipc message (process 6 in Figure 15 on page 53). However, the message is not entirely libipc compliant in that tuxshell does not send an IPC broadcast to the interactive session dman client. Finally, tuxshell does not attempt an interval state update in the database because this processing is on the fly and is not represented as an interval in the database.

The structure of messages within the DACS for both Interactive Processing and Automatic Processing is defined by libipc and is described in detail in Table 3. The first column of Table 3 lists the message attribute name, the middle column maps any relationship to the database interval/request table, and the third column defines the attribute and explains how it is used within the DACS for both Interactive and Automatic Processing.

The design decision to base libipc messaging on Tuxedo disk queuing was influenced by several criteria, including convenience, history, and implementation time constraints. The implementation was convenient because messages within the DACS for Automatic Processing are based upon Tuxedo queues and Interactive Process

13. In practice, the lack of the IPC event message does not cause any problems.
collection of intervals (data element references) and shall update the status of intervals in the interval database table.

SYSTEM REQUIREMENTS

The DACS shall be configurable.

53. The implementation of the DACS shall allow for configuration data of the number and type of computers on the network and the number of automated processes of each type allowed to execute on each computer type. The implementation of the DACS also requires the execution parameters for each process in the Automated and Interactive Processing.

54. Only authorized users shall be allowed to initiate processing. Unauthorized requests shall be rejected and logged. The DACS shall require passwords from authorized users at login.

55. The DACS shall operate in the IDC environment.

56. The DACS shall operate in the same hardware environment as the IDC.

57. The DACS requires extensive database queries to detect new wfdisc records. These queries will impact the database server. Otherwise, the DACS shall consume negligible hardware resources.

58. Similarly, the DACS must share the same software environment as the rest of the IDC. While this environment is not exactly defined at this time, it is likely to include:

■ Solaris 7 or 8
■ ORACLE 8.x
■ X Window System (X11R5 or later)
■ TCP/IP network utilities

59. The DACS shall adhere to ANSI C, POSIX, and SQL standards.
configuration parameters of the DACS. The DACS shall only require a user-level prior understanding of UNIX and Motif. The DACS shall be delivered electronically.

The DACS capabilities of workflow management and message passing are ranked equally high in terms of criticality. These capabilities shall function in the event of system failures. The functions of availability management and system monitoring rank next in order of importance. The DACS shall continue to perform the first set of functions even if the second set of functions are unavailable for any reason.

REQUIREMENTS TRACEABILITY

Tables 8 through 16 trace the requirements of the DACS to components and describe how the requirements are fulfilled.

TABLE 8: TRACEABILITY OF GENERAL REQUIREMENTS

Requirement: Operational Mode: shutdown. Automatic Processing: no automatic processing; DACS not running. Interactive Processing: no interactive processing; DACS not running.

How Fulfilled: For Automatic Processing, the DACS can be shut down under operator control using tuxpad scripts (tuxpad and schedule_it) or a Tuxedo administration utility and schedclient. For Interactive Processing, this requirement is fulfilled the same as for Automatic Processing, although in practice the operators tend not to have to administer the DACS because
group, server, and services level. Appropriate backups are configured to seamlessly take over processing as soon as a primary system component fails or becomes unavailable.

boot: Action of starting a server process as a memory-resident task. Booting the whole application is equivalent to booting all specified server processes, admin servers first, application servers second.

client: Software module that gathers and presents data to an application; it generates requests for service and receives replies. This term can also be used to indicate the requesting role that a software module assumes, by either a client or server process.

DACS machines: Machines on a Local Area Network (LAN) that are explicitly named in the MACHINES and NETWORK sections of the ubbconfig file. Each machine is given a logical reference (see LMID) to associate with its physical name.

data monitors: Class of application servers that monitor data streams and data availability, form data intervals, and initiate a sequence of general processing servers when a sufficiently large amount of unprocessed data are found.

dequeue: Remove a message from a Tuxedo queue.

enqueue: Place a message in a Tuxedo queue.

forwarding agent: Application server (TMQFORWARD) that acts as an intermediary between a message queue on disk and a group of processing servers advertising a service. The forwarding agent uses transactions to manage an
host 2 and host 3, three servers (A1, A2, and A3) are running, each of which is capable of providing the service A. The DACS assures that each service request goes to one and only one server and is eventually removed from the message queue only after processing is complete.

FIGURE 8: PROCESSING REQUESTS FROM MESSAGE QUEUE (Tuxedo load balancing of service A requests across hosts)

Figure 9 shows a transaction as one step in a series of processing steps to be applied to data intervals. It shows a processing server An between a message queue A (its source queue) and a message queue B (its destination queue). The processing server advertises service A and is capable of spawning a child process a (the automated processing program) that actually provides the service.

FIGURE 9: TRANSACTION IN DETAIL (processing server An spawns child process a, the application program)

Assuming that queue A contains at least one message, the first step of the transaction is to provisionally remove the uppermost message from queue A. In step 1, information is extracted from the message and sent to processing server An. Server An spawns a child process a and
it automatically starts on machine boot and normally requires zero administration. The crinteractive script is also used by the operator to administer Interactive Processing instance(s).

Requirement: Operational Mode: stop. Automatic Processing: no automatic processing; all automatic processing system status saved in stable storage; all automatic processing programs terminated; all DACS processes idle. Interactive Processing: full interactive processing.

How Fulfilled: For Automatic Processing, the DACS can be stopped under operator control using tuxpad scripts (tuxpad and schedule_it) or a Tuxedo administration utility and schedclient. In the stop mode, all of the DACS is terminated except for the Tuxedo administration servers (for example, BBL) on each DACS machine. For Interactive Processing, this requirement is fulfilled the same as above and also normally is never required.

TABLE 8: TRACEABILITY OF GENERAL REQUIREMENTS (CONTINUED)

Requirement 3: Operational Mode: fast forward. Automatic Processing: full automatic processing configured for burst data (for example, GA replaced by additional

How Fulfilled: For Automatic Processing, the DACS provides extensive support for scaling automatic processing: the number of machines, servers, and services, as well as such resources that are active at any given time. Fas
message out of normal First In, First Out (FIFO) queue order. The DACS sets MSGID_CORRID to the value of interval.intvlid, thereby linking the queue interval message to the database interval record.

2. MSGSRC (database interval: N/A): This field stores the source qspace name and queue. The source is sometimes referred to as the sender, as in the sender that initiated the message send.

3. MSGDEST (database interval: N/A): This field stores the destination qspace name and queue. The destination is sometimes referred to as the receiver, as in the recipient that receives the delivered message.

TABLE 3: DACS LIBIPC INTERVAL MESSAGE DEFINITION (CONTINUED)

4. MSGCLASS (database interval: N/A): This field stores the class of the message, which is generally used to distinguish queue messages between the Automatic and Interactive Processing DACS applications.

5. MSGDATA (database interval: interval.time, endtime, name, class, state, intvlid; request.sta, array, chan, class, start_time, end_time, reqid): For messages sent to or within Automatic Processing, MSGDATA stores interval or request information. These messages originate from either DACS data monitors or an Interactive Tool such as ARS. The tuxshell server extracts this message value as a string and then parses time, class, and name values used to construct the auto
out partial interval processing (process 2 in Figure 20 on page 62). An attempt is made to declare each partial interval complete, querying the database for data availability of the remaining channels for the auxiliary station in question. Data completeness is defined by all remaining channels, or some subset, subject to user-defined parameters. When the minimum number of auxiliary station channels is confirmed, the interval state is updated to queued and the interval information is enqueued to a Tuxedo queue (for example, the DFX queue) to initiate pipeline processing (process 7 in Figure 20 on page 62).

The second and primary processing task of tiseg_server is the interval creation algorithm, whereby complete and partial intervals are created. The interval creation algorithm includes a sort of all wfdisc rows by station names (process 3 in Figure 20 on page 62) to organize interval creation and processing in station lexicographic order. The availability of waveforms on the user-defined monitor channel results in the creation of an interval. The interval is considered only partial if the monitor channel is not joined by the minimum number of affiliated channels for the auxiliary station (process 5 in Figure 20 on page 62), in a check of criteria identical to the partial interval check (process 2 in Figure 20 on p
shows the relationship of the DACS to other subsystems of the IDC software.

FIGURE 2: RELATIONSHIP OF DACS TO OTHER SUBSYSTEMS OF IDC SOFTWARE (the Continuous Data and Retrieve Subsystems deliver waveforms and wfdiscs through the operations database to DACS Automatic Processing and Interactive Processing)

The Continuous Data Subsystem receives data from primary seismic, hydroacoustic, and infrasonic (S/H/I) stations. The Retrieve Subsystem receives data from auxiliary seismic stations. The data consist of ancillary information stored in the ORACLE operations database and binary waveform files stored on the UNIX file system. The ancillary information consists of rows in the wfdisc table, and each row includes file pointers to raw waveform data.

Within the IDC software, the DACS is deployed in two separate application instances. The DACS supports both automatic and interactive processing. The DACS addresses different needs of the software within each of these CSCIs. Figure 3 shows key features of the DACS that support the Automatic Processing software: a continuous IMS d
station channels. The request table is read and updated in a manner similar to the interval table, except that request records are only read and updated, and are not created, by the DACS. The interval records are indexed by a unique identifier stored in the intvlid column, and the lastid table is read and updated to retrieve and assign unique identifiers for each new interval record.

The timestamp table is used to store the progress of interval creation by time. The timestamp records are managed for most of the processing pipelines, where the last successful interval creation for the pipeline is recorded. The timestamp records are also used to store the current wfdisc endtime on a station-by-station basis. Updates to these timestamp records are handled by the database triggers wfdisc_endtime and wfdisc_NVIAR_endtime. Application of the triggers allows substantial performance gains when trying to query wfdisc endtime by station, because there are very few records in the timestamp table compared to the wfdisc table.
station (array, hydroacoustic sensor, or infrasonic sensor) as a separate data source. The result is that some channels may be delivered later than other channels from the same station, or the channels might not be delivered at all.

45.3. Data quality is a prime concern of the IDC mission; however, the DACS makes no determination of data quality. Any data that are available shall be processed.

FIGURE 33: DATA ARRIVAL EXAMPLE. (A) continuous data; (B) interruption of data; (C) resumption of continuous data after an interruption; (D) continuous data and resumption of FIFO late data (heavy lines); (E) continuous data and resumption of LIFO late data (heavy lines)

46. The DACS shall interface with the Interactive Processing programs through a message passing API. The DACS shall provide thi
the DACS. Message queues are interspersed between the elementary services. The distribution scheme is based on the following objectives:

■ Capacity Mapping: All machines should be loaded in accordance with their capacities.

■ Load Limitation: No component of the system should be allowed to overload to a point where throughput would suffer.

■ Load Balancing: All machines should be used to approximately the same level of their total capacity.

■ Minimization of Network Traffic: Whenever possible, mass data flow over the LAN should be avoided. For example, detection processing should usually occur on the machine that holds the data in a disk loop.

■ Catchup Capability: Some extra capacity in terms of processing speed (n times real time) should be reserved for occasions when processing must catch up with real time.

■ Single Point of Failure Tolerance: The system should withstand any single failure, hardware or software, and allow scheduled maintenance of individual hardware or software components without interrupting processing or, if interruption is inevitable, with a seamless resumption of processing.

These objectives cannot always be met. Trade-offs between objectives arise given the fact that hardware and development resources are finite.

Pipelines

During automatic processing the same d
the reliable message broker. Sending processes may simulate one-to-many messaging by iteratively sending the same message to many receivers; many-to-one messaging is similarly supported by multiple point-to-point messaging, that is, receiving processes may receive separate messages from many senders.

How Fulfilled: There is limited and specific support for event message broadcasting, where libipc sends an event broadcast to the DACS dman client for each message send and receive within the interactive session. The dman client also subscribes to Tuxedo event broadcasts, which announce the joining and departing of a client of the interactive session.

17.6. Location transparency: sending and receiving processes do not need to know the physical location of the other.

How Fulfilled: This requirement is fulfilled via libipc messaging, which is based on the Tuxedo reliable queuing service. All addressing of messages is accomplished through logical names.

Requirement: Application programming interface: the message service will be available to the Interactive Processing programs via a software library linked at compile time.

How Fulfilled: This requirement is fulfilled via libipc messaging, which is based on the Tuxedo reliable queuing service.

18. The message passing service shall provide an administrative control process to support administrative actions. The administrative actions shall

How Fulfilled: This requirement is fulfilled by the
the tuxpad message GUI window, as described above. schedule_it receives input from user parameters, from schedclient (following execution of the schedclient show command), and from the user via GUI selections (see Figure 31 on page 115). The user parameters are limited to the file path name of the schedclient user parameter file, which is used for every schedclient command generated and run by schedule_it.

schedule_it is built around the scheduling system's service list, which is stored in an internal array. This array is central to all supported tuxpad operations (M1 in Figure 31 on page 115). The array is initialized and updated by parsing the output of the schedclient show command. The parsed input consists of a list of service names, including the scheduled time for the next service call and the configured delay time. schedule_it displays this service list in the GUI. Selections of one or more services can be checked by the operator to define specific services to stall or unstall using the schedclient stall or unstall commands.

schedule_it is primarily designed to provide a simple and direct front end to schedclient. However, like tuxpad, schedule_it is also designed to support some more sophisticated compound command sequences. An operator selection of the Kick Sched button results in the kick command sent to schedclient, but only after stalling all services in the service list via the stall schedclient command for each service. schedule_it errors are dire
uted application. tuxpad consists of five applications; four of them are manifested in interactive GUIs that are all accessible via the main tuxpad GUI. The five applications are tuxpad, operate_admin, schedule_it, qinfo, and msg_window. The schedule_it and qinfo applications can optionally be run stand-alone, whereas operate_admin and msg_window are integral to tuxpad. All applications are designed to provide an intuitive front end to the underlying Tuxedo administrative commands (for example, tmadmin) and the DACS control clients (for example, schedclient). These front ends generate the Tuxedo and DACS client commands that are run; their output is parsed for results that are then presented to the operator via the GUI.

These primary design objectives necessitated a scripting language including flexible text parsing, support for dynamic memory and variable-length lists, convenient process execution and management, and a high-level GUI toolkit. Perl/Tk, the Perl scripting language with integrated bindings to the Tk GUI toolkit, met all the requirements and is used for the implementation of all five of the tuxpad scripts.

tuxpad drives the Tuxedo command line-based administration tools tmadmin, tmboot, and tmunloadcf (Figure 29). tuxpad also provides one-button access to the qinfo, schedule_it, and msg_window GUIs. tuxpad displays all configured machines
30, 2000. Science Applications International Corporation, Interactive Analysis Subsystem Software User Manual, SAIC-01/3001, 2001. Science Applications International Corporation, Distributed Application Control System (DACS) Software User Manual, Revision 0.1, SAIC-00/3038, 2000.

Glossary

admin server: Tuxedo server that provides interprocess communication and maintains the distributed processing state across all machines in the application. Admin servers are provided as part of the Tuxedo distribution.

AEQ: Anomalous Event Qualifier.

application: DACS Tuxedo System of cooperating processes configured for a specific function to be run in a distributed fashion by Tuxedo. Also used in a more general sense to refer to all objects included in one particular ubbconfig file: machines, groups, servers, and associated shared memory resources, qspaces, and clients.

application server: Server that provides functionality to the application.

architecture: Organizational structure of a system or component.

architectural design: Collection of hardware and software components and their interfaces to establish the framework for the development of a computer system.

archive: Single file formed from multiple independent files for storage and backup purposes. Often compressed and encrypted.

ARS: Analyst Review Station. This a
TABLE 11: TRACEABILITY OF FUNCTIONAL REQUIREMENTS, WORKFLOW MANAGEMENT

Requirement 25: The DACS shall provide workflow management for the Automatic Processing. Workflow management ensures that data elements get processed by a sequence of Automatic Processing programs. A data element is a collection of data, typically a discrete time interval of time series data, that is maintained by processes external to the DACS. The DACS workflow management shall create, manage, and destroy internal references to data elements. The DACS references to data elements are known as intervals. The capabilities of the workflow management are enumerated in the following subparagraphs.

How Fulfilled: This requirement is fulfilled in the DACS by a number of components and features, including reliable queuing, transactions, process monitoring, data monitor servers, tuxshell, and so on.

Requirement 25.4: The DACS shall provide a configurable method of defining data elements. The parametric definition of data elements shall include at least a minimum and maximum time range, a percentage of data required, a list of channels/stations, and a percentage of channels and/or stations required. If th

How Fulfilled: This requirement is fulfilled by the DACS data monitor servers, specifically tis_server and tiseg_server, and the ability to specify the required parameters related to interval creation.
CONSTRUCTION OF A PIPELINE

Database Schema Overview

The DACS uses the ORACLE database for the following purposes:

■ To obtain data availability: acquired waveform data, submitted data requests
■ To obtain interval processing progress via queries to the interval table
■ To create processing intervals and requests and update their states
■ To obtain and store the DACS processing progress by time (for example, tis_server progress)
■ To obtain and store specific station wfdisc endtime information in an efficient manner
■ To obtain network, station, and site affiliation information
■ To store and manage unique interval identifier information

Table 1 shows the tables used by the DACS along with a description of their use. The Name field identifies the database table. The Mode field is R if the DACS reads from the table and W if the system writes updates to the table.

TABLE 1: DATABASE TABLES USED BY DACS

affiliation (Mode: R): This table is a general mapping table which affiliates information. The DACS uses the affiliation information to obtain mappings between network and stations, and stations and sites, during station-based interval creation.

interval (Mode: R/W): This table contains the state of all processing intervals that are created, updated, displayed, and managed b
I call.

46.6. receive: receive a message; an argument specifies the message types to read. This requirement is fulfilled via the ipc_receive libipc API call.

46.7. delete: delete messages from the queue; an argument specifies the most recent or all messages. This requirement is fulfilled via the purge libipc API call.

TABLE 14: TRACEABILITY OF CSCI EXTERNAL INTERFACE REQUIREMENTS (CONTINUED)

Requirement: The DACS shall offer three types of notification of new messages: none, callback invocation, or an interrupt. The type shall be chosen by a process when it registers. With none, the process shall call the poll function to check on message availability. With callback invocation, the process shall register a callback procedure to be executed when a message arrives. With an interrupt, the process shall rely on the interrupt, such as activity on a UNIX file descriptor, to indicate when a message is waiting.

How Fulfilled: Two of the three types of notification are fulfilled, although the second type is fulfilled in a modified form. Message notification type none is fulfilled via explicit calls to the ipc_receive libipc API call. Message notification type callback is fulfilled via the ipc_add_xcallback() libipc call, except that the registered callback or handler function is called
The database serves as the data exchange broker for the DACS and the various Data Services subsystems. The DACS provides message passing and session management to the Interactive Tools within the Interactive Processing System.

Interface with External Users

The DACS has no interface with external users.

Interface with Operators

System operators control and monitor the DACS through tuxpad and WorkFlow, as described above. The DACS for Automatic Processing and Interactive Processing is designed to run unattended and to survive many failure conditions. Ideally, operator control is limited to planned system start-up, shut-down, and maintenance.

The DACS servers record processing progress, such as interval creation and pipeline processing executions, on the system-wide logging directory tree. Automatic Processing progress and problem detection and resolution can be ascertained through the inspection and analysis of one or more of the DACS log files. Operators will often be the first to examine the log files; however, developers of the Automatic Processing programs may examine the files in the course of debugging at system level.

Chapter 3: Tuxedo Components and Concepts

This chapter describes the Tuxedo COTS software product, including the components and function of Tuxedo used
S may delete or alter entries in the interval table to force reprocessing of recent data elements. Purging of the interval table is left to processes outside the DACS.

How Fulfilled: The DACS manages the interval table to reflect the state of all automatic processing. Interval deletion is not generally supported, which is apparently not a problem. Intervals are changed as a part of interval reprocessing, accessible through WorkFlow.

Requirement 45: The DACS shall interface with the wfdisc table of the ORACLE database. The software systems of the Data Services CSCI shall acquire the time series data and populate the wfdisc table. The DACS shall assume a particular model for wfdisc record insertion and updates. The DACS shall be capable of accepting data in the model described by the following subparagraphs.

How Fulfilled: The DACS reads the wfdisc table. Access to the table is fully compatible with the published database schema.

Requirement 45.1: The IDC Continuous Data system acquires seismic, hydroacoustic, and infrasonic waveforms from multiple sources. The data quantity is 5-10 gigabytes of data per day, arriving in a near-continuous fashion. The DACS nominally forms intervals of segments of 10 minutes in length. However, during recovery from a data acquisition system failure, the DACS forms intervals of up to one hour in length.

How Fulfilled: The DACS can be configured to form intervals of practically any size. This requirement is fulfilled through the DACS
Software, IDC DOCUMENTATION

Distributed Application Control System

Approved for public release; distribution unlimited.

Notice

This document was published June 2001 by the Monitoring Systems Operation of Science Applications International Corporation (SAIC) as part of the International Data Centre (IDC) Documentation. Every effort was made to ensure that the information in this document was accurate at the time of publication. However, information is subject to change.

Contributors

Lance Al-Rawi, Science Applications International Corporation
Warren Fox, Science Applications International Corporation
Jan Wüster, Science Applications International Corporation

Trademarks

BEA TUXEDO is a registered trademark of BEA Systems, Inc. Isis is a trademark of Isis Distributed Systems. Motif 2.1 is a registered trademark of The Open Group. ORACLE is a registered trademark of Oracle Corporation. SAIC is a trademark of Science Applications International Corporation. Solaris is a registered trademark of Sun Microsystems. SPARC is a registered trademark of Sun Microsystems. SQL*Plus is a registered trademark of Oracle Corporation. Sun is a registered trademark of Sun Microsystems. Syntax is a PostScript font. UltraSPARC is a registered trademark of Sun Microsystems. UNIX is a registered trademark of
System (DACS), June 2001, IDC 7.3.1.

TABLE 2: MAPPING OF TUXEDO COMPONENTS TO SAIC DACS COMPONENTS

Columns (DACS components): Data Monitor, scheduler, schedclient, tuxshell, interval router, dbserver/recycler, WorkFlow, dman, libipc server/SendMessage, birdie, tuxpad.

Rows (Tuxedo components), with interaction symbols as extracted:

tlisten (tagent): Bs Bs Bs Bs Bs
BRIDGE: Sn Rn, Sn Rn, Sn Rn, Sn Rn, Sn Rn, Sn Rn, Sn, Sn Rn
BBL, DBBL: Ms Ms Mc Ms Ms Ms Mc Mc
TMS: Mt Mt Mt Mt Mt Mt Mt Mt
TMS_QM, TMQUEUE: Eq, Eq Dq, Eq Dq, Eq, Eq, Eq, Eq, Eq Dq
TMQFORWARD: Fs Fs Fs
TMUSREVT: Es Es Er
IPC resources, ubbconfig (tuxconfig): Ds Ds Ds Ds Ds
user logs: Ls Ls Ls Ls Ls Ls Ls Ls
transaction logs: Lt Lt Lt Lt Lt Lt Lt Lt
queue space(s)

TABLE 2: MAPPING OF TUXEDO COMPONENTS TO SAIC DACS COMPONENTS (CONTINUED)

queues: Sm Sm Sm Sm Sm Sm Sm Sm
tmloadcf, tmunloadcf: Gc
tmadmin: Aa Aa
qmadmin: Aq

1. Interaction Symbol Definitions: Bs = Boots the server; Sn, Rn = Sends message ov
These capabilities shall function in the event of system failures. The functions of availability management and system monitoring rank next in order of importance. The DACS shall continue to perform the first set of functions even if the second set of functions are unavailable for any reason.

References

The following sources supplement or are referenced in this document:

[And96] Andrade, J. M., Carges, M. T., Dwyer, T. J., and Felts, S. D., The TUXEDO System: Software for Constructing and Managing Distributed Business Applications, Addison-Wesley Publishing Company, 1996.

[BEA96] BEA Systems, Inc., BEA TUXEDO Reference Manual, 1996.

[DOD94a] Department of Defense, Software Design Description, Military Standard Software Development and Documentation, MIL-STD-498, 1994.

[DOD94b] Department of Defense, Software Requirements Specification, Military Standard Software Development and Documentation, MIL-STD-498, 1994.

[Gan79] Gane, C., and Sarson, T., Structured Systems Analysis: Tools and Techniques, Prentice-Hall, Inc., Englewood Cliffs, NJ, 1979.

[IDC5.1.1Rev2] Science Applications International Corporation, Veridian Pacific-Sierra Research, Database Schema, Revision 2, SAIC-00/3057, PSR-00-TN28

[IDC6.5.1]

[IDC6.5.2Rev0.1]
Tuxedo stores the current state of the application. One copy of the Bulletin Board is on each machine. BBL is launched on each machine after the BRIDGE has been established. It remains in the process table until the application is shut down, completely or on the particular machine.

DBBL generates and manages the Distinguished Bulletin Board, which exists only on the Master machine. DBBL is launched on the Master machine at boot and remains in the process table until the application is shut down. The DBBL keeps all BBLs synchronized so that all machines are in a consistent state across the distributed system. The DBBL automatically restarts any BBL in the case of a crash or accidental kill. The BBL on the Master machine automatically restarts the DBBL upon any failure or crash of the DBBL. When the Master machine is properly migrated to the backup Master machine, the DBBL is also migrated to the new Master machine.

Application Servers

Application servers are Tuxedo-supplied servers which include application-level infrastructure and services that are necessary for many distributed processing applications. The Tuxedo-supplied infrastructure and services include distributed transaction management, reliable disk-based queuing services, and event message passing services.

TMS, TMS QM

These
10. tuxpad includes the five scripts operate, admin, schedule_it, qinfo, and msg_window. Only qinfo uses qmadmin.

SAIC-supplied DACS servers are started by tlisten (via tagent), under Tuxedo operator control or under automatic Tuxedo control. All servers and clients, SAIC- or Tuxedo-supplied, rely upon BRIDGE services for inter-machine communication. tuxpad scripts execute Tuxedo-supplied and DACS-supplied utilities and clients, but tuxpad scripts are not directly connected to the running Tuxedo application. Interaction with the Tuxedo transaction managers is indirect and is handled by Tuxedo on behalf of SAIC DACS components. Queuing transaction is applicable only to interval_router. Enqueue operation is applicable only to interval_router. The ubbconfig (tuxconfig) defines IDC servers that are run and managed by the Tuxedo application; IDC clients are not defined in the application configuration.

The tlisten process is the parent to all Tuxedo servers; its child processes inherit its user identifier (UID), group identifier (GID), and environment. This feature allows the DACS to run under a distinct UID and environment on each machine, provided tlisten is started by the user with this UID in t
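The UID, GID, and environment inheritance that tlisten's children depend on is ordinary UNIX spawn behavior: a child process receives a copy of its parent's environment unless it is explicitly replaced. A minimal, self-contained sketch (in Python rather than the C of the DACS servers; the variable name is invented for illustration):

```python
import os
import subprocess
import sys

def spawn_child_env(var, value):
    """Start a child process and report the environment value it inherited.

    Mirrors, in miniature, how tlisten's children inherit the parent
    environment: the spawned process sees the parent's variables unless
    the caller substitutes a different environment.  DACS_DEMO_VAR-style
    names here are purely illustrative.
    """
    env = dict(os.environ, **{var: value})   # parent environment plus our setting
    out = subprocess.run(
        [sys.executable, "-c",
         f"import os; print(os.environ['{var}'])"],
        env=env, capture_output=True, text=True, check=True)
    return out.stdout.strip()

print(spawn_child_env("DACS_DEMO_VAR", "per-machine-config"))
```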
_server

tin_server creates intervals based upon a trade-off between data availability and elapsed time. Intervals of class T N are inserted into the interval table, and the interval information is enqueued into a Tuxedo queue to initiate pipeline processing. The data availability criterion is based upon the number of completed intervals for a given class or group of processing (processes 5-7 in Figure 22). The processing class or group is flexible in that tin_server exclusively relies on an integer returned from a user-defined SQL query. Thus, tin_server is not concerned with network or station affiliations, and the user-defined data count query must map the completion status of the monitored station set or group to an integer number. A dedicated instance of tin_server is required for each processing group or class; for example, three hydroacoustic groups require three dedicated tin_server instances.

The data availability versus time criteria are based on two user-defined value arrays of equal dimension. These arrays define the minimum number of data counts, or completions, acceptable at a time elapsed relative to present time and the end time of the last interval created. In general, the data count thresholds reduce and/or the data completeness threshold is relaxed as elapsed time increases. If sufficient da
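The paired threshold arrays can be sketched as follows. The function, array names, and units are assumptions for illustration, not the actual tin_server parameters; the point is that the applicable data-count threshold relaxes as elapsed time grows:

```python
def interval_ready(data_count, elapsed, elapsed_steps, count_thresholds):
    """Decide whether a tin_server-style interval should be created.

    elapsed_steps and count_thresholds are the two equal-length arrays
    described above: count_thresholds[i] is the minimum acceptable data
    count once elapsed time has reached elapsed_steps[i] (steps assumed
    ascending).  Thresholds typically decrease with elapsed time, so
    older intervals are eventually formed even from incomplete data.
    """
    if len(elapsed_steps) != len(count_thresholds):
        raise ValueError("arrays must have equal dimension")
    required = None
    for step, threshold in zip(elapsed_steps, count_thresholds):
        if elapsed >= step:
            required = threshold      # latest (most relaxed) applicable rule
    return required is not None and data_count >= required
```

With steps of 300, 1800, and 3600 seconds and thresholds of 10, 5, and 1 completions, a data count of 8 is insufficient early on but becomes sufficient once half an hour has elapsed.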
a ticron_server creates network processing intervals on a regular basis and of a fixed size. tin_server creates intervals of varying type based upon a trade-off between data availability and elapsed time. WaveGet_server initiates processing to acquire auxiliary station waveforms based upon requests for such data.

tis_server

tis_server creates and updates processing intervals of class T S for processing data from continuously transmitting stations. tis_server forms new candidate intervals based upon the timely arrival of new station data and updates existing intervals that were previously skipped due to incomplete or nonexistent station data. The data flow for tis_server is shown in Figure 18.

tis_server creates and updates intervals for all stations specified by the user parameters. The candidate interval check attempts to form a new interval for each station, where the interval start time and end time are current. tis_server attempts to form a column of new intervals that would appear on the right side of the WorkFlow display (see Figure 27 on page 95). Candidate intervals are stored in a temporary, memory-based list during each tis_server cycle. The candidate interval for each station is assessed for data coverage, and the interval is created if a sufficient percentage of overlapping station channels has arrived. The
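The coverage assessment for a candidate interval can be sketched as below. The function and parameter names are assumptions for illustration, not the real tis_server configuration keys; the idea is to count the expected channels whose data overlap the candidate interval and compare the fraction against a threshold:

```python
def coverage_ok(interval, channel_spans, min_fraction):
    """tis_server-style candidate-interval coverage check (a sketch).

    interval is a (start, end) pair; channel_spans maps each expected
    station channel to the (start, end) of the data received for it.
    The interval qualifies when a sufficient fraction of the expected
    channels have data overlapping the candidate interval.
    """
    start, end = interval
    if not channel_spans:
        return False                      # nothing expected, nothing to assess
    overlapping = sum(
        1 for (s, e) in channel_spans.values() if s < end and e > start)
    return overlapping / len(channel_spans) >= min_fraction
```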
ability to specify parameters for variable interval sizes under varying conditions to tis_server.

TABLE 14: TRACEABILITY OF CSCI EXTERNAL INTERFACE REQUIREMENTS (CONTINUED)

45.2 Requirement: The data from each source nominally arrive in piecewise-increasing time order. Data delivery from an individual station may be interrupted and then resumed. Upon resumption of data delivery, the data acquisition system may provide current data, late data, or both. Current data resumes with increasing time, and late data may fill in a data gap in either increasing (FIFO) or decreasing (LIFO) time order from the end points of the time gap.
How Fulfilled: tis_server can handle all described types of data delivery and can create intervals in the order of current data first.

45.3 Requirement: Data quality is a prime concern of the IDC mission; however, the DACS makes no determination of data quality. Any data that are available shall be processed.
How Fulfilled: The DACS does not consider data quality as a criterion for interval creation.

Requirement: The DACS shall interface with the Interactive Processing programs through a m
acknowledgement SETTIME command to the scheduler_server, which results in rescheduling for the next WaveGet_server service call.

Control

Tuxedo boots, monitors, and shuts down the data monitor servers tis_server, tiseg_server, ticron_server, tin_server, and WaveGet_server. Server booting is either initiated by an operator, directly using Tuxedo administrative commands or indirectly via tuxpad, or automatically via Tuxedo server monitoring. During Tuxedo server monitoring, servers are automatically restarted upon any failure. An operator initiates the server shutdown.

Control of the data monitor server function is largely defined by the user parameters. However, the scheduling system enables an operator to start the data monitor service on demand, such that a data monitor cycle can be called at any time; otherwise, the data monitor service is automatically called by the scheduling system on a recurring, scheduled basis. In addition, the same interface allows for stalling and unstalling data monitor service requests, which results in the ability to control whether or not a data monitor server is active and able to initiate interval creation.

Interfaces

The data monitor servers are database applications which receive input data from the database, then exchange or store that data in internal data structures for various types of interval creation algorithms. The detailed process or control sequencing within each data monitor, including inte
age 106.

dman can encounter many error conditions. An example error includes a non-existent agent parameter specification, which prevents dman from running because it does not have a session to which it can connect. A non-existent QMCONFIG environment variable will similarly result in an immediate failure because this variable is required for message polling. One and only one dman per session is permitted, and dman defends against this by exiting with a failure message indicating that the session already has an active dman, if one exists. There are many other types of error conditions that dman attempts to guard against and warn the analyst about. The dman GUI includes a message window which conveniently presents warning messages and other diagnostics to the analyst.

birdie directs error messages to the standard error stream, which is consistent with most command-line driven applications. birdie error conditions are all of the libipc error conditions, because birdie is intended to exercise all libipc API calls.

tuxpad (operate, admin, schedule_it, and msg_window)

tuxpad provides a GUI-based operator console to simplify operation of the DACS. tuxpad satisfies the requirement to provide a convenient, centralized operator console that can be used by the operator to control all aspects of the running distrib
age 62. If the monitor channel is joined by the minimum number of affiliated channels for the auxiliary station, a new row with state queued is inserted into the interval table, and the interval information is enqueued into a Tuxedo queue (process 7 in Figure 20 on page 62).

tiseg_server generates output to log files, the database, Tuxedo queues, and the scheduler_server. Output to the database includes new intervals, both incomplete (interval state partial) and complete (interval state queued). Updates to the database include previously partial intervals updated to queued intervals following the verification of newly arrived data. tiseg_server updates the timestamp table with the current time to record the most recent time of a successful interval creation by tiseg_server. Upon interval creation, tiseg_server enqueues a message containing the interval information into a Tuxedo queue for initiation of a pipeline processing sequence on the interval. tiseg_server completes its interval creation cycle by sending an acknowledgement SETTIME command to the scheduler_server, which results in rescheduling for the next tiseg_server service call.

ticron_server

Figure 21 on page 64 shows data and processing flow for ticron_server. ticron_server receives input from user-defined parameter files, the database, and the scheduler_server. The parameter files specify all processing details for a given instance of the data monitor server. Details include databas
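The partial/queued state decision described above can be sketched as follows. The state names follow the text; the function and its arguments are assumptions for illustration, not the actual tiseg_server interface:

```python
def classify_interval(monitor_present, affiliated_present, min_affiliated):
    """Assign a tiseg_server-style state to a candidate auxiliary interval.

    If the monitor channel has arrived and at least min_affiliated
    affiliated channels accompany it, the interval is inserted with
    state "queued" and enqueued for pipeline processing.  If the monitor
    channel is present but affiliation falls short, the interval is
    recorded as "partial" so a later cycle can upgrade it to "queued"
    once the remaining data arrive.  No monitor channel, no candidate.
    """
    if not monitor_present:
        return None
    if affiliated_present >= min_affiliated:
        return "queued"
    return "partial"
```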
allow a user to add or delete messages from any message queue and to obtain a list of all processes registered to receive messages.
How Fulfilled: the birdie client, which is a driver to test libipc. Most of these requirements, among others, are also fulfilled by the dman client. With dman, the analyst can delete all messages, but not individual messages. Message addition is supported through message sends from specific Interactive Tools within the interactive session.

TABLE 10: TRACEABILITY OF FUNCTIONAL REQUIREMENTS, MESSAGE PASSING (CONTINUED)

19. Requirement: The DACS shall deliver messages within one second of posting, given that network utilization is below 10 percent of capacity.
How Fulfilled: Reliable queue messaging (disk- and transaction-based messaging) within the DACS can occur at least 10 times per second.

20. Requirement: If the receiving process is not active or is not accepting messages, the DACS shall hold the message indefinitely until delivery is requested by the receiving process or deleted by an administrative control process.
How Fulfilled: This requirement is fulfilled via libipc messaging, which is based on the Tuxedo reliable queuing service.

21. Requirement: Interactive processing programs may request the send or receive of messages at any time. Multiple processes may sim
How Fulfilled: This requirement is fulfilled via libipc messaging, which is based on the Tuxedo reliable queuing service.
alyst is principally interested in the review of events formed by Automatic Processing and relies upon the key interactive event review application (process 1), which is implemented by the ARS program. In addition, interactive review relies on a collection of Interactive Tools that exchange messages. The DACS supports asynchronous message passing via the libipc message passing library. The library is based upon Tuxedo disk queuing, and as such, all messages among the Interactive Tools pass through Tuxedo queues. The DACS also supports management of the interactive session, including the ability to start up and shut down Interactive Tools on demand. Interactive session management is implemented by the dman client (process 2). For example, a message sent from ARS to XfkDisplay via libipc results in both an enqueue of the message to the XfkDisplay queue and an IPC event, which is sent to dman by libipc to broadcast the request for XfkDisplay service. dman will automatically start the XfkDisplay application (process 3) if it is not already running. dman monitors all messaging among the Interactive Tools, as well as the health of all Interactive Tools within the session. Interactive Tools can be manually started or terminated via the dman GUI interface. Access to Automatic Processing is provided to a limited degree. Interactive Tools can send messages requesting certain Automatic Processing services for interactive recall processing. This linkage is
ams exactly once for each data element. A program execution is a transaction consisting of start, run, and exit. If the transaction aborts before completion of the exit, the DACS shall retry the transaction a limited, configurable number of times.

38. The DACS shall function as a system in the event of defined hardware and software failures. The failure model used by the DACS is given in Table 7. For failures within the model, the DACS shall mask and attempt to repair the failures. Failure masking means that any process depending upon the services of the DACS, primarily the Automatic and Interactive Processing software, remains unaffected by failures, other than to notice a time delay for responses from the failed process. Failures outside the failure model may lead to undefined behavior; for example, a faulty ethernet card is undetectable and unrepairable by software.

39. The DACS shall detect failures and respond to failures within specified time limits. The time limits are given in Table 7.

40. The DACS shall detect and respond to failures up to a limited number of failures. The failure limits are given in Table 7. For failures over the limit, the DACS shall attempt the same detection and response, but success is not guaranteed.

41. Reliability of a system or component is relative to a specified set of failures listed in Table 7. Th
anagement shall create, manage, and destroy internal references to data elements. The DACS references to data elements are known as intervals. The capabilities of the workflow management are enumerated in the following subparagraphs.

25.1 The DACS shall provide a configurable method of defining data elements. The parametric definition of data elements shall include, at least, a minimum and maximum time range, a percentage of data required, a list of channels/stations, and a percentage of channels and/or stations required. If the data in an interval are insufficient to meet the requirements for an interval, then the data element shall remain unprocessed. In this case, the DACS shall identify the interval as insufficient and provide a means for the operator to manually initiate a processing sequence.

25.2 The DACS shall provide a configurable method of initiating a workflow sequence. The DACS workflow management shall be initiated upon either data availability, completion of other data element sequences, or the passage of time.

25.3 Workflow management shall allow sequential processing, parallel processing, conditional branching, and compound statements.

25.4 Workflow management shall support priority levels for data elements. Late-arriving or otherwise important data elements may be gi
and lddate tables.

1. The BRIDGE server is not included or required for stand-alone Tuxedo applications because all messaging is local to one machine. The current configuration of the DACS for Interactive Processing is stand-alone, and as such, the BRIDGE server is not part of the application.

Filesystem

The DACS uses the UNIX filesystem for reading user parameter files, writing log files, and hosting the Tuxedo qspaces and queues, as well as the Tuxedo transaction log files. The list of libpar-based parameter files is extensive, and in general, each DACS server or client reads one or more parameter files. The DACS servers are routinely deployed in various instances that necessitate distinct parameter files based upon the program's canonical parameter files.

The DACS writes log files at both the system and application level. System-level log files are written by Tuxedo, and one such User Log (ULOG) file exists per machine. System-level errors and messages are recorded in these files. The individual ULOGs are copied to a central location (CLOGs) by application-level scripts. Application-level log files are written by DACS servers and clients to record the progress of processing.

Several special system-wide files are required for the DACS. These files include Tuxedo transaction log files (tlogs), qspac
and alone, although in practice SendMessage is the only candidate for usage outside of WorkFlow. WorkFlow is typically configured to run ProcessInterval upon user selection of interval reprocessing. In turn, the script builds a SendMessage command line and then runs the command. The SendMessage command line includes all interval values, including class, name, time, endtime, state, and interval identifier. SendMessage attempts to enqueue the interval information into a Tuxedo queue. SendMessage is a Tuxedo client application that uses the Tuxedo tpenqueue API call to send to the Tuxedo queue (processes 6 and 7 in Figure 28). SendMessage output is sent to the controlling terminal, which is WorkFlow in this case. WorkFlow redirects SendMessage output to the WorkFlow message window, which reports the results of the command.

12. The WorkFlow GUI is not displayed if a fatal error occurs during startup.

Control

WorkFlow is an interactive client application and is started and shut down by system operators. WorkFlow is primarily designed for monitoring and is therefore primarily a read-only tool. However, interval reprocessing and other possible write-based operations are available. As such, WorkFlow is typically started via shell scripts that limit access to read-only for public monitoring of the automated
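The command-line construction that ProcessInterval performs can be sketched as below. The real script is C Shell and the name=value argument form here is an invented illustration, not the actual SendMessage option syntax; the point is that every interval field must be carried on the command line:

```python
def build_sendmessage_cmd(interval):
    """Assemble a SendMessage-style command line from an interval record.

    The field list follows the text above (class, name, time, endtime,
    state, and interval identifier); the argument format is a sketch,
    not the documented SendMessage interface.  Refusing to build an
    incomplete command mirrors the requirement that all interval values
    be passed along for enqueueing.
    """
    required = ("class", "name", "time", "endtime", "state", "intvlid")
    missing = [k for k in required if k not in interval]
    if missing:
        raise ValueError(f"interval record incomplete: {missing}")
    return ["SendMessage"] + [f"{k}={interval[k]}" for k in required]
```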
application servers manage transactions, including the create, commit, rollback, abort, and timeout transactional commands or elements. For each server group, the system automatically boots two TMSs (Transaction Manager Servers), and for the server groups operating on qspaces, the system boots two TMS QMs (TMS for Queue Management).

TMQUEUE

TMQUEUE enqueues and dequeues messages from a qspace for other servers, for example, for the data monitors. Each qspace must have at least one instance of TMQUEUE. At least one backup instance of TMQUEUE per qspace is recommended.

TMQFORWARD

The forwarding agent TMQFORWARD dequeues messages from a specific disk queue and sends them for processing to a server that advertises the corresponding service. By convention, queue names and service names are identical. In the IDC application, the servers advertising processing services are various instances of tuxshell, the general application server. tuxshell is discussed in Chapter 4, Detailed Design, on page 47.

Because TMQFORWARD works in a transactional mode, it does not commit to dequeueing messages from a queue until the server signals success. Upon any failure, or if a configured time-out value (the -t value on the TMQFORWARD command line in the ubbconfig file) is reached, TMQFORWARD terminates the transact
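The transactional dequeue behavior can be modelled in a toy form: a message is removed from the queue only when the handler succeeds, and a failure leaves the message in place for a retry. This is an in-process sketch only; the real mechanism is a Tuxedo transaction spanning the dequeue and the service call:

```python
from collections import deque

def forward_once(queue, handler):
    """One TMQFORWARD-style attempt: dequeue-and-forward transactionally.

    The message is consumed only if the handler succeeds; on an
    exception the "transaction" rolls back, i.e. the message stays at
    the head of the queue for a later retry.
    """
    if not queue:
        return False
    msg = queue[0]                 # peek inside the transaction
    try:
        handler(msg)
    except Exception:
        return False               # rollback: message remains queued
    queue.popleft()                # commit: message is consumed
    return True

def retry_demo():
    """Show a message that fails once, stays queued, and then succeeds."""
    failures = {"count": 0}
    def handler(msg):
        if msg == "flaky" and failures["count"] == 0:
            failures["count"] += 1
            raise RuntimeError("simulated application failure")
    q = deque(["flaky", "steady"])
    outcomes = []
    while q:
        outcomes.append(forward_once(q, handler))
    return outcomes
```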
are

60. The DACS shall use common UNIX utilities (for example, cron, sendmail) and system calls (for example, sockets, exec) whenever possible, to take advantage of widespread features that shall aid portability.

61. Vendor-specific UNIX utilities shall be isolated into separate modules for identification and easy replacement, should the need arise.

62. The DACS shall implement middleware layers to isolate third-party software products and protocol standards.

63. The DACS shall implement the functions of workflow management, availability management, inter-process communications, and system monitoring as separate, stand-alone programs.

64. The DACS shall use COTS for internal components where practical. Practical in this situation means where there is a strong functional overlap between the DACS requirements and COTS capabilities.

65. The DACS shall be designed to scale to a system twice as large as the initial IDC requirements without a noticeable degradation in time to perform the DACS functions.

66. The DACS requires a capable UNIX system administrator for installation of the DACS components and system-level debugging of problems such as file system full, insufficient UNIX privileges, and network connectivity problems.

67. The DACS shall be delivered with a System Users Manual that explains the operations and run-time options of the DACS. The manual shall also specify all
are failed, the application within the context of a pipeline processing sequence, all within a Tuxedo transaction. tuxshell parses the IPC message to retrieve values to build the application program command line to be executed (process 2 in Figure 25 on page 85). The IPC message is string-based and contains name-value pairs in libpar(3) fashion. The values extracted from the message are limited to the name key values that are user-defined. Typically, a station or network name, time, and endtime will be included in the name key values. This is true in general because tuxshell manages the processing of an application server that operates on an interval of time. The elements of the command line are user-defined and allow for the substitution of the parsed values (process 3 in Figure 25 on page 85). The completed command line is executed (process 4 in Figure 25 on page 85), and tuxshell then initiates monitoring of the child process. Monitoring of the application server includes capturing the exit code of the process if it terminates in a normal manner, killing the process if a time-out condition arises, and detecting an abnormal termination following various UNIX signal exceptions (process 5 in Figure 25 on page 85). A normal application program run terminates with an exit code indicating success or failure, subject to user-specified exit values. A successful match of the exit code results in an attempt to forward the processing interval to the nex
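The parse-and-substitute step can be sketched as follows. The `$name` template syntax and whitespace-separated `name=value` message form are assumptions made for the sketch; the actual tuxshell parameter format is defined by its man page and the libpar(3) conventions:

```python
def build_command(template, message):
    """tuxshell-style command-line construction (illustrative sketch).

    The IPC message is string-based name=value pairs; the user-defined
    template marks substitution points for the parsed values, so a
    station or network name, time, and endtime can be injected into the
    application program command line.
    """
    values = dict(pair.split("=", 1) for pair in message.split())
    parts = []
    for token in template.split():
        if token.startswith("$"):
            parts.append(values[token[1:]])   # substitute a parsed value
        else:
            parts.append(token)               # literal command-line element
    return parts
```

For example, a template of `DFX -sta $sta -t $time` combined with a message `sta=ARCES time=100` yields the argument vector for the child process.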
art of the IDC software and have been developed and supplied by the PIDC. The purpose of this chapter is to describe the basic design of all SAIC-developed components. Operation of these components is described in [IDC6.5.2Rev0.1], and man pages describe all parameters that can be used to control and modify functions within the components. The first section, Data Flow Model, gives an overview of the interrelationships between the individual CSCs, which are described in detail in the Processing Units section.

DATA FLOW MODEL

In the context of Automatic Processing, the DACS includes CSCs for the following functions:

■ Data monitoring
■ Creation of pipeline processing sequences
■ Centralized scheduling of the data monitoring servers
■ Generalized execution and monitoring of Automatic Processing applications
■ Centralized database updates
■ Host-based routing of pipeline processing sequences by data source
■ Automatic retries of failed pipeline sequences following system-level errors
■ Interactive graphical presentation of all pipeline processing intervals, including support for on-demand reprocessing

CSCs are also included for the operation of the DACS for Automatic Processing via several convenient GUI-based operator consoles.

In the context of Interactive Processing, the DACS includes CSC
as a pending message (process 4). dman tracks all message traffic through Tuxedo IPC events, which are automatically broadcast to dman via the libipc message send and receive API calls that the Interactive Tools use. Access to Automatic Processing is provided for the purpose of interactive recall processing (process 2 and processes 5-7). The TMQFORWARD/tuxshell configuration for managing Interactive Processing applications (processes 5-7) works in a similar, but not identical, manner with the DACS for Automatic Processing. In Interactive Processing, TMQFORWARD calls a tuxshell server within a transaction; however, the processing application status (success or fail) is sent back to the calling client via a libipc message (process 6). In addition, tuxshell does not attempt an interval state update in the database, because this processing is on the fly and is not represented as an interval in the database (the calling client, ARS, does not insert an interval into the database).

3. The interactive session can be managed by the analyst log GUI application (not shown in Figure 15). This application manages analyst review by assigning blocks of time to analysts for analysis. This application can optionally start dman.

4. The label interactive recall processing (process 7 in Figure 15) refers generally to the various types of Automatic Processing that are used within Interactive Processing. These include interactive beaming (BOTF), interactive seism
ase updates via service calls to dbserver, and both operations are part of one transaction. Failure of either operation results in a transaction rollback, as described above.

tuxshell generates output to log files, the database (via dbserver), and Tuxedo queues. Output to the database includes updates to the interval or request tables. Database updates are coupled with enqueues, as described above.

Within the context of Interactive Processing, tuxshell supports all previously described processing, with one exception and one addition. An IPC request from an Interactive Processing client (for example, ARS) results in tuxshell returning the exit value directly back to the calling client via an IPC message. In addition, an IPC event is sent to the DACS client dman. This IPC event is consistent with IPC messaging within the interactive session, where any message send or receive is accompanied by a broadcast to dman, notifying this client of each message operation within the interactive session. The acknowledgement IPC message and event are not coupled with any database updates via dbserver requests. Essentially, the application program that is run on behalf of the interactive client is run on the fly and is of interest only to the analyst who owns the interactive session. The pipeline operator is not interested i
FIGURE 3: DACS APPLICATION FOR AUTOMATIC PROCESSING
(Figure elements: Continuous Data Subsystem, continuous station waveforms; Retrieve Subsystem, auxiliary seismic station waveforms; wfdiscs; Db Operations; intervals; workflow monitoring; Tuxedo for Automatic Processing; data monitors; automatic pipeline process control; Automatic Processing.)

In support of Automatic Processing, the DACS is a queue-based system for scheduling a sequence of automated processing tasks. The processing tasks collectively address the mission of the Automatic Processing software, while the DACS adds a non-intrusive control layer. The DACS supports sequential, parallel, and compound sequences of processing tasks, collectively referred to as processing pipelines. These processing pipelines are initiated by the DACS data monitor servers, which query the database looking for newly arrived data. Confirmed data results in new processing intervals that are stored in the database and the DACS queuing system. The database intervals record the state of processing, and this state is visually displayed through the GUI-based WorkFlow monitoring application.
ata interval is processed by a number of application programs in a well-defined processing sequence known as a pipeline. For example, station processing consists of the application programs DFX and StaPro, and network processing for SEL1 is comprised of GA_DBI, GAassoc, GAconflict, and WaveExpert. Figure 11 shows how a pipeline can be constructed.

The data monitor checks the state of the database and creates intervals and enqueues messages when a sufficient amount of unprocessed data are present, or when some other criterion is fulfilled (for example, a certain time has elapsed). Each processing server receives messages from its source queue and spawns child processes that perform the actual processing step in interaction with the database. After completion, the processing server places a new message in its destination queue, which in turn is the source queue for the next processing server downstream, and so on, until messages finally arrive in the done queue.

FIGURE 11: PIPELINE CONSTRUCTION
(Figure elements: data monitor; Db operations; processing servers A, B, and C; application programs a, b, and c; done queue.)
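The queue-chained data flow just described can be sketched as a toy model: each stage's destination queue is the next stage's source queue, and fully processed messages land in the done queue. This models only the data flow; the real pipeline is driven by Tuxedo disk queues and TMQFORWARD, not in-process deques:

```python
from collections import deque

def run_pipeline(stages, source_messages):
    """Drain a chain of queue/processor stages in pipeline order.

    stages is an ordered list of (queue_name, processing_function)
    pairs; the data monitor's messages enter the first queue, each
    stage forwards its result to the next stage's queue, and the last
    stage forwards to "done".  Stage and queue names are illustrative.
    """
    queues = {name: deque() for name, _ in stages}
    queues["done"] = deque()
    queues[stages[0][0]].extend(source_messages)   # data monitor enqueues here
    for i, (name, process) in enumerate(stages):
        dest = stages[i + 1][0] if i + 1 < len(stages) else "done"
        while queues[name]:
            msg = queues[name].popleft()
            queues[dest].append(process(msg))      # server processes and forwards
    return list(queues["done"])
```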
ave changed state, as well as new time intervals. This current interval information is used to update or add to the hash table.

Input to WorkFlow via the GUI consists of pointer selections to vertically scroll through the list of timelines, horizontally scroll across all time intervals, scale the interval history and duration, retrieve interval class, name, times, state, and interval identifier from a specific brick, and reprocess a specific brick (if enabled). Additionally, GUI input is accessible via pull-down menus that enable WorkFlow re-initialization, update on demand, display of exception (failed) intervals, and display of a color-based legend for color-to-state mappings. All GUI input results in exercising various control and interface functions that are described in the following sections.

WorkFlow output is primarily defined by the GUI display and is in part under user control, as described above. The update cycle is automatic and manual (via a menu selection) and results in an updated visualization of the hash table. WorkFlow diagnostics are sent to the GUI message window at the bottom of the WorkFlow display. WorkFlow error messages, particularly of the fatal variety, are sent to the controlling terminal when the GUI message window is not yet displayed.

ProcessInterval and SendMessage are driven by WorkFlow, and as such, their input is provided by WorkFlow. Both the ProcessInterval C Shell script and the SendMessage program can be run st
bparagraphs.

50.1 The DACS control interface shall allow selection from among the automatic processing modes listed in Table 6 on page 127.

50.2 The DACS control interface shall allow run-time reconfiguration of the host computer network. Reconfiguration may take the form of added, deleted, or upgraded workstations. The DACS shall allow an operator to dynamically identify the available workstations. When a workstation is removed from service, the DACS shall migrate all processes on that workstation to other workstations. The time allowed for migration shall be the upper run-time limit for the Automatic Processing programs; in other words, running programs shall be allowed to complete before the migration occurs.

50.3 The DACS control interface shall allow run-time reconfiguration of the DACS programs. Reconfiguration shall allow an increase, decrease, or migration of Automatic Processing programs.

50.4 The DACS control interface shall allow access to the availability manager for starting or stopping individual DACS and Automatic Processing programs.

50.5 The DACS control interface shall allow manual processing and reprocessing of data elements through their respective sequences.

The DACS shall acquire time from a global time service.

CSCI INTERNAL DATA REQUIREMENTS

52 The DACS shall maintain a
by the DACS and includes the following topics:

■ Processing Units
■ Tuxedo Components of DACS

PROCESSING UNITS

The DACS consists of the COTS software product Tuxedo and SAIC-developed components. This chapter describes the building blocks of Tuxedo used by the DACS. Table 2 maps the Tuxedo components described in this chapter to the SAIC-developed components. The mapping implies either direct or indirect interaction between the components; the type of interaction is specified by a set of symbols that are defined in the table.

TUXEDO COMPONENTS OF DACS

Listener Daemons (tlisten, tagent)

Listener daemons are processes that run in the background on each DACS machine. Listener daemons are started before, and independently of, the rest of the distributed application to support the initial application boot on each machine (the bootstrapping of the application). If an application is distributed, as the DACS is for automatic processing, a Tuxedo daemon (tlisten) maintains the network connections among the various machines that are part of the application by listening on a particular port. One and only one tlisten process must be running on each machine in a distributed application at all times. Without tlisten, a machine is not accessible for requests to boot servers.
ce with External Users 35
Interface with Operators 35

Chapter 3: Tuxedo Components and Concepts 37
PROCESSING UNITS 38
TUXEDO COMPONENTS OF DACS 38
Listener Daemons (tlisten, tagent) 38
Administrative Servers 42
Application Servers 43
IPC Resources 45
Special Files 45
Utility Programs 46

Chapter 4: Detailed Design 47
DATA FLOW MODEL 48
PROCESSING UNITS 54
Data Monitor Servers 54
scheduler_server and schedclient
tuxshell
dbserver
interval_router and recycler_server
WorkFlow, SendMessage, and ProcessInterval
libipc, dman, and birdie
tuxpad, operate, admin, schedule_it, and msg_window
DATABASE DESCRIPTION
Database Design
Database Schema

Chapter 5: Requirements
INTRODUCTION
GENERAL REQUIREMENTS
FUNCTIONAL REQUIREMENTS
Availability Management
Message Passing
Workflow Management
System Monitoring
CSCI EXTERNAL INTERFACE REQUIREMENTS
CSCI INTERNAL DATA REQUIREMENTS
SYSTEM REQUIREMENTS
REQUIREMENTS TRACEABILITY
References
Glossary
Index

FIGURES
FIGURE 1
FIGURE 2
FIGURE 3
FIGURE 4
FIGURE 5
FIGURE 6
FIGURE 7
FIGURE 8
FIGURE 9
construction of the hash table can be expensive, but the initialization or start-up delay is still bounded by the database select on the interval or request table.

11. O notation, or order notation, is used to quantify the speed characteristics of an algorithm. For example, a binary search tree would be O(log2 n), or on the order of log-base-two search time; O(1) implies direct lookup, which is optimal.

FIGURE 28 WORKFLOW DATA FLOW (diagram: user parameters and Db operations on intervals and requests feed the Main Driver, which maintains a hash table of time intervals, builds timeline widgets, and displays intervals in the GUI; ProcessInterval and SendMessage send messages and update intervals)

The hash table is updated during every WorkFlow update cycle. The WorkFlow update cycle consists of interval database queries where the select is confined to all intervals of interest that have moddate values within the previous 5 minutes. Retrieved rows include time intervals that h
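A minimal sketch of one WorkFlow update cycle, assuming the moddate column from the interval table; the `intvlid` key, the row layout, and the in-memory filter (the real select happens in the database query itself) are illustrative assumptions.

```python
hash_table = {}  # keyed by interval identifier

def update_cycle(rows, now, window=300):
    """One WorkFlow update pass: fold interval rows whose moddate falls
    within the last `window` seconds (5 minutes by default) into the
    hash table, adding new intervals or replacing changed state."""
    for row in rows:
        if now - row["moddate"] <= window:
            hash_table[row["intvlid"]] = row

now = 1_000_000.0
update_cycle([
    {"intvlid": 1, "state": "queued", "moddate": now - 60},    # recent: kept
    {"intvlid": 2, "state": "done",   "moddate": now - 7200},  # stale: ignored
], now)
```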
cted to the tuxpad message GUI window as described above.

The GUIs for tuxpad, schedule_it, qinfo, and msg_window are implemented using the Tk windowing toolkit, which is accessible via the interpreted Perl/Tk scripting language. The GUI design and layout rely upon widgets for a main form upon which buttons, scroll bars, text lists, and text input boxes are constructed in a GUI widget hierarchy specific to each script.

GUI Control

tuxpad is typically started by the operator, usually through a system-wide start script such as start Tuxpad. tuxpad should be run on the THOST for complete access to all features and must be run as the Tuxedo DACS user (UID) that has permission to run the commands tmadmin, qmadmin, and so on. qinfo can be run stand-alone or, more typically, is started by tuxpad following operator selection of the info button. tuxpad takes care to remote-execute qinfo on the QHOST machine, which is essential because the qmadmin command must be run on the QHOST. schedule_it can also be run stand-alone but is usually run following operator selection of the Scheduler button. The same holds true for msg_window, which is displayed following operator selection of the Msg Window button. All tuxpad scripts are terminated following operator selection of the Exit buttons on each respective GUI.

Interfaces

Data exchange among tuxpad, operate, admin,
ctional categories of availability management, message passing, workflow management, system monitoring, and reliability.

Availability Management

Availability management refers to the availability of UNIX processes. An availability manager is a service that starts and stops processes according to predefined rules and on-the-fly operator decisions. The rules usually specify a certain number of processes to keep active; if one should terminate, then a replacement is to be started.

9 The DACS shall be capable of starting and stopping any configured user-level process on any computer in the IDC LAN. The DACS shall provide an interface to an operator that accepts process control commands. A single operator interface shall allow process control across the network.

10 The DACS shall maintain (start and restart) a population of automated and interactive processes equal to the number supplied in the DACS configuration file. The DACS shall also monitor its internal components and maintain them as necessary.

11 The DACS shall start and manage processes upon messages being sent to a named service. If too few automated processes are active with the name of the requested service, the DACS shall start additional processes, up to a limit, that have been configured to provide that servic
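The availability-management rule above (keep the running population equal to the configured count for each named service, replacing any process that terminates) can be sketched as a reconciliation step. The service names and counts below are invented for illustration.

```python
def reconcile(configured, running):
    """Return (to_start, to_stop): how many processes of each named
    service must be started or stopped so the running population
    matches the configured count."""
    to_start, to_stop = {}, {}
    for name, want in configured.items():
        have = running.get(name, 0)
        if have < want:
            to_start[name] = want - have
        elif have > want:
            to_stop[name] = have - want
    return to_start, to_stop

# One process of the hypothetical tuxshell-DFX service has died, and one
# dbserver too many is running; the manager computes the corrective actions.
start, stop = reconcile({"tuxshell-DFX": 4, "dbserver": 2},
                        {"tuxshell-DFX": 3, "dbserver": 3})
```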
ctural Design

FIGURE 6 CONCEPTUAL DATA FLOW OF THE DACS FOR AUTOMATIC PROCESSING (diagram: data monitor, workflow monitor, and scheduling servers performing Db operations on wfdiscs and intervals; a processing resource allocation server; Tuxedo queues; a generalized queue-forwarding server and processing servers carrying interval messages through transactions to data-processing application programs; automatic reprocessing of failures due to system errors; reprocessing of failures under operator control; process monitoring; operator console)

Interval data are reliably stored in Tuxedo disk queues, which will survive machine failure. The data monitor servers can enqueue the interval data directly into a Tuxedo queue, where the queue name is user defined. Optionally, a processing resource allocation server can enqueue interval data into one queue from a set of possible queues, the selection being a function of the interval type or name (process 3 in Figure 6). A Tuxedo queue-forwarding server dequeues the interval data from a Tuxedo queue within a transaction proc
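The transactional dequeue behavior described above (the dequeue only commits if processing succeeds; otherwise it rolls back so the message survives, until repeated failures route it to an error queue) can be sketched as follows. The retry limit, message fields, and exception-based failure signal are illustrative assumptions, not the Tuxedo mechanism itself.

```python
from collections import deque

def forward(queue, service, failed_queue, max_retries=3):
    """Dequeue one message inside a pseudo-transaction: on success the
    dequeue commits; on failure it rolls back (the message is requeued)
    until the retry limit routes it to the error queue."""
    msg = queue.popleft()              # tentative dequeue
    try:
        service(msg)                   # hand the interval to the server
    except Exception:
        msg["retries"] = msg.get("retries", 0) + 1
        if msg["retries"] >= max_retries:
            failed_queue.append(msg)   # give up: error queue for recycling
        else:
            queue.append(msg)          # rollback: message survives, retry later
        return False
    return True

q, errq = deque([{"intvl": "SEL1"}]), deque()

def flaky(msg):
    raise RuntimeError("child process failed")

for _ in range(3):                     # three failed attempts exhaust the limit
    forward(q, flaky, errq)
```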
cupy the space on the right. The WorkFlow design enables convenient scaling of the amount of interval information displayed on screen. The horizontal pixel size of each time block is reduced or enlarged depending on the number of intervals displayed. The GUI-based controls enable the operator to adjust the history, or number of intervals (hours), and the duration, which is essentially the horizontal size of each WorkFlow brick.

A requirement also exists to enable the operator to reprocess any interval via GUI control. Intervals eligible for reprocessing are defined via user parameters and are typically limited to intervals with state(s) that define a terminal condition, such as failed (error) or even done (success). SendMessage enables interval reprocessing by translating database interval information into a Tuxedo queue-based message and then routing the message to a Tuxedo queue to initiate pipeline processing for the desired interval. ProcessInterval is a shell script that facilitates linking WorkFlow and SendMessage.

(figure annotations: intervals have completed station processing but are too late for inclusion in SEL1; skipped intervals with 0-80 percent of waveform data are not queued for processing; lookback control)
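SendMessage's translation of a database interval row into a queue message, and the routing of that message to a queue to initiate reprocessing, might look like the following sketch. The field layout, the pipe-delimited format, and the class-named queue are invented for illustration; they are not the actual DACS message format.

```python
def build_message(interval_row):
    """Translate a database interval row into a flat text message that a
    queue-based pipeline stage could consume (layout is illustrative)."""
    return "{class}|{name}|{time:.1f}|{endtime:.1f}|reprocess".format(**interval_row)

def route(message, queues):
    """Route a reprocessing message to the queue named after its interval class."""
    qname = message.split("|", 1)[0]
    queues.setdefault(qname, []).append(message)

queues = {}
route(build_message({"class": "DFX", "name": "ABC",
                     "time": 100.0, "endtime": 1300.0}), queues)
```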
d control its forwarding function.

generalized processing server: DACS application server (tuxshell) that is the interface between the DACS and the Automatic Processing software. It executes application programs as child processes.

instance: Running computer program. An individual program may have multiple instances on one or more host computers.

TABLE IV TECHNICAL TERMS (CONTINUED)

LMID: Logical machine identifier; the logical reference to a machine used by a Tuxedo application. LMIDs can be descriptive, but they should not be the same as the UNIX hostname of the machine.

Master machine: Machine that is designated to be the controller of a DACS Tuxedo application. In the IDC application, the customary logical machine identifier (LMID) of the Master is THOST.

message (interval): Entry in a Tuxedo queue within the qspace referring to rows in the interval or request tables. The DACS programs ensure that interval tables and qspace remain in synchronization at all times.

message queue: Repository for data intervals that cannot be processed immediately. Queues contain references to the data while the data remains on disk.

partitioned: State in which a machine can no longer be accessed from other DACS machines via IPC resources (BRIDGE and BBL).

qspace
d data availability, form data intervals, and initiate a sequence of general processing servers when a sufficiently large amount of unprocessed data are found.

dequeue: Remove a message from a Tuxedo queue.

detection: Probable signal that has been automatically detected by the Detection and Feature Extraction (DFX) software.

DFX: Detection and Feature Extraction. DFX is a programming environment that executes applications written in Scheme, known as DFX applications.

diagnostic: Pertaining to the detection and isolation of faults or failures.

disk loop: Storage device that continuously stores new waveform data while simultaneously deleting the oldest data on the device.

DM: Data monitor.

dman: Distributed Application Manager. This software element of the DACS manages the availability (execution) of processes.

E

enqueue: Place a message in a Tuxedo queue.

F

failure: Inability of a system or component to perform its required functions within specified performance requirements.

forwarding agent: Application server (TMQFORWARD) that acts as an intermediary between a message queue on disk and a group of processing servers advertising a service. The forwarding agent uses transactions to manage and control its forwarding function.

G

GA: Global Association application. GA associates S, H, I phases to events.

generalized processing serve
d pipeline processing system and allows full access for the pipeline operators. WorkFlow start shell scripts also exist for convenient monitoring of the request table. The ProcessInterval shell script is run by WorkFlow as described in the previous section. The SendMessage application is run by ProcessInterval in the WorkFlow context, also described above. The SendMessage client can be run stand-alone, and usage is similar to any standard command-line application, except that, as a Tuxedo client application, SendMessage must be run on an active Tuxedo host.

Interfaces

The WorkFlow GUI is designed around the expectation of a relatively high-performance graphical subsystem that is accessible through a high-level programming interface that likely includes an abstract, class-based GUI toolkit. The GUI toolkit should enable extension so that new GUI components can be created if required for unique feature requirements, speed, or implementation convenience. WorkFlow is currently implemented using the X11 Window System with the Xlib, Xt, and Motif toolkits and libraries. The GUI design and layout rely upon widgets for a graphical canvas (main form) upon which pull-down menus, scroll bars, scale bars, a message window, and the main form windows for brick and class-name display are constructed in one GUI widget hierarchy. The displayed timelines are handled via a custom timeline widget that controls display and management of each brick on the time
d to establish the times of the next interval.

The tin_server interval creation algorithm creates a timely, or current, interval. The interval is complete if a sufficient number of data counts versus elapsed time can be confirmed. The interval is unresolved if insufficient data counts are present but elapsed time has not run out. The interval is incomplete if sufficient data counts cannot be confirmed following a maximum user-defined time lapse. All time-based comparisons are relative to the present time and the end time of the last interval created.

tin_server computes the start time for the current interval as a function of the last interval created, a value from the timestamp table, and a user-defined lookback value (process 3 in Figure 22 on page 66). The timestamp value and lookback value are generally only relevant if no previous intervals exist in the database, such as upon system initialization, when a new system is run for the first time. The end time is computed as a function of the user-defined values for target interval size and time-boundary alignment. The latter feature allows for interval creation that can be snapped to a timeline grid such that intervals fall evenly on the hour or the selected minute interval (process 4 in Figure 22 on page 66). Having established the candidate interval star
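The start/end-time computation above can be sketched as follows. The numeric values are invented, and the snap-down rule for boundary alignment is an illustrative assumption about how "snapped to a timeline grid" might be realized.

```python
def interval_start(last_end, timestamp, lookback):
    """Start of the next interval: the end of the last interval if one
    exists; otherwise (e.g., system initialization) the timestamp value
    pushed back by the user-defined lookback."""
    return last_end if last_end is not None else timestamp - lookback

def interval_end(start, target, boundary=None):
    """End of the candidate interval: start plus the target size,
    optionally snapped down so intervals fall evenly on the grid
    (e.g., boundary=3600 aligns on the hour)."""
    end = start + target
    if boundary:
        end -= end % boundary
    return end

# First run: no previous interval, so fall back to timestamp - lookback.
s = interval_start(None, timestamp=7200.0, lookback=600.0)
e = interval_end(s, target=1200.0, boundary=3600.0)
```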
dMessage errors should only occur if the Tuxedo queuing is not available or the Tuxedo qspace is full, both of which would be indicated in the GUI message window.

libipc, dman, and birdie

libipc and dman satisfy requirements for DACS support of distributed asynchronous messaging between Interactive Tools, management of an interactive session through the monitoring of messages and Interactive Tools within the session, and execution of Interactive Tools on demand. All Interactive Tools (for example, ARS, dman, and XfkDisplay) link to and use libipc for message passing and session management.

libipc consists of an Application Programming Interface (API), or library of routines, which enables reliable, distributed, asynchronous messaging and message and client monitoring within an interactive session. dman is a GUI-based interactive client with special bindings to the libipc library to enable session monitoring and management. birdie is a command-line-based application that is primarily intended as a test driver to exercise the libipc API. birdie permits arbitrary access to all session-level functions (for example, delete a message in a queue) and as such can be used by operators, either directly or via embedding in scripts, to perform certain manipulations on queues. Figure 15 on page 53 shows the data flow
didate intervals; new intervals after wfdisc check; setback time: max end time from interval table, current time, current setback time)

FIGURE 19 CURRENT DATA AND SKIPPED INTERVAL CHECKS

Candidate intervals that were not enqueued for processing by tis_server because the threshold value was not exceeded are known as skipped intervals. However, late-arriving data may complete an interval, and tis_server may check the data contents of all skipped intervals (light gray bricks; see Skipped Interval Check in Figure 19) to see if enough data have been received to surpass the threshold percentage (black bars; see Skipped Interval Check in Figure 19). If a skipped interval for which the threshold percentage has been exceeded is found, the interval state is updated to queued (yellow bricks, new intervals after wfdisc check; see Skipped Interval Check in Figure 19) and a corresponding message is enqueued into a Tuxedo queue.

tis_server can create new intervals or update previously skipped intervals based only upon the addition of other intervals in the database. Therefore, tis_server is not necessarily dependent on wfdiscs. More generally, tis_server requires start-time and end-time data. The start time and end time could be related to database wfdiscs or just as eas
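The skipped-interval re-check (promote an interval to queued once late-arriving data push its coverage past the threshold) can be sketched as follows. The 0.8 default mirrors the 80-percent figure above; the coverage function and interval fields are stand-ins for the real wfdisc data-count query.

```python
def recheck_skipped(intervals, coverage, threshold=0.8):
    """Re-examine skipped intervals: if the waveform coverage fraction
    now meets the threshold, mark the interval queued so a message can
    be enqueued for it; return the promoted intervals."""
    promoted = []
    for iv in intervals:
        if iv["state"] == "skipped" and coverage(iv) >= threshold:
            iv["state"] = "queued"
            promoted.append(iv)
    return promoted

ivs = [{"id": 1, "state": "skipped"}, {"id": 2, "state": "skipped"}]
# Late data completed interval 1 (95% coverage); interval 2 is still short.
promoted = recheck_skipped(ivs, coverage=lambda iv: 0.95 if iv["id"] == 1 else 0.40)
```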
dministrative recovery, or cleanup from machine failures, is accomplished through tmadmin executions using the tmadmin pclean and bbclean sub-commands. The tuxpad output outside the main GUI window consists of output messages and errors generated by the executed commands (for example, tmboot, qinfo, schedule_it). The output from the commands is captured by tuxpad and redirected to the tuxpad temporary output file that is written to /tmp/tuxpad/tuxpad.pid on the local machine. The output is visible to the operator provided the msg_window script is running so that the message window GUI is displayed.

qinfo receives input from user parameters, the qmadmin utility (following execution of the command), and the user via GUI selections (see Figure 30 on page 114). The user parameters define the QHOST and qspace name that is to be opened and queried. Parameters also define the complete list of queues that are to be queried and parsed to determine the current number of messages stored in each queue. This list includes specification of the color graph that is output by qinfo in the GUI. User input is limited to control of the vertical scroll bar, which enables output of any queue plots that are not presently visible on screen. qinfo errors are directed to
e. If an interactive process is not active, the DACS shall start a single instance of the application when a message is sent to that application.

12 The DACS shall be fully operational in stop mode within 10 minutes of network boot.

13 The DACS shall detect process failures within 30 seconds of the failure and server hardware failures within 60 seconds.

14 The DACS shall start new processes and replace failed processes within five seconds. This time shall apply to both explicit user requests and the automatic detection of a failure.

15 The DACS shall be capable of managing (starting, monitoring, terminating) 50 automated and interactive processing programs on each of up to 50 computers.

16 The DACS shall continue to function as an availability manager in the event of defined hardware and software failures. Reliability on page 134 specifies the DACS reliability and continuous-operations requirements.

Message Passing

Message passing in the context of the DACS refers to the transmission of messages between cooperating interactive applications. Message passing is a service provided by the DACS to processes that operate outside the scope of the DACS; the DACS does not interpret or otherwise operate on the message.

17 The DACS shall provide a message-passing service for the interactive processing system. The message-passing service shall have the attributes of being reliable, asynchronous, ordered, scoped, point-to-point, and location-trans
e 19 shows the logic used to form intervals for current data and check for skipped data. Candidate intervals of user-specified length are formed by tis_server between the end of the last existing time interval in the interval table (yellow brick; see Current Data in Figure 19) and the end of the newest data record in the wfdisc table (black bars; see Current Data in Figure 19) for a particular station (white bricks, candidate intervals; see Current Data in Figure 19). These intervals are inserted into the interval table by tis_server (see Current Data in Figure 19).

A skipped interval is created only if a queued interval exists or has been confirmed later in time than the skipped interval; that is, a skipped interval is never a leading interval. As a result, a skipped interval following a station outage only appears after the station resumes transmitting data, which results in one or more new queued intervals.

(figure residue, Figure 19: Current Data panel: max end time from interval table, end time from wfdisc, current time; interval table, wfdisc table, candidate intervals, new intervals after wfdisc check. Skipped Interval Check panel: max end time from interval table, end time from wfdisc, current time; interval table, wfdisc table, can
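Candidate-interval formation between the end of the last interval and the end of the newest wfdisc record can be sketched as follows; the epoch-second values are invented for illustration, and a trailing partial interval is simply left for a later cycle.

```python
def candidate_intervals(last_end, newest_data_end, length):
    """Form fixed-length candidate intervals from the end of the last
    existing interval up to the end of the newest data record."""
    out, start = [], last_end
    while start + length <= newest_data_end:
        out.append((start, start + length))
        start += length
    return out

# 3600 seconds of new data yield three 1200-second candidate intervals.
cands = candidate_intervals(last_end=1000.0, newest_data_end=4600.0, length=1200.0)
```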
e 50). The generalized processing server (tuxshell) calls one or more processing applications (for example, DFX) to send the processing interval to the desired (requested) automatic processing task (process 8 in Figure 14 on page 50). tuxshell manages the execution of the processing task, handling a successful or failed run. Failed processing of an interval results in a transaction rollback of the queue message by TMQFORWARD. TMQFORWARD initiates reprocessing of the interval, which repeats the queue-forwarding sequence (processes 5-8 in Figure 14 on page 50). Successful processing of an interval results in an enqueue of an updated message into another downstream Tuxedo queue (for example, StaPro) and a transactional commit of the original queue message dequeued by TMQFORWARD. The downstream Tuxedo queue manages the next step in the pipeline processing sequence, which duplicates the queue-forwarding sequence (processes 5-8 in Figure 14). tuxshell updates the interval data in the database by sending an updated interval state to dbserver, which in turn issues the actual database update command to the ORACLE database (process 7 in Figure 14 on page 50). Queue intervals that failed due to system errors (for example, a machine crash) and have been directed to a system-wide error queue are automatically recycled back into the appropriate Tuxedo message queue by recycler_server (process 11 in Figure 14 on page 50). Figure 15 shows the data flow amon
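The success/retry/failed outcome handling that tuxshell applies to each finished child process can be sketched as follows; the queue names, the retry limit, and the message fields are illustrative assumptions.

```python
def handle_outcome(msg, return_code, retry_limit, queues):
    """Dispatch a finished child process: success forwards the message
    downstream; failure requeues it for retry until the limit routes it
    to the failed queue. Returns the resulting interval state."""
    if return_code == 0:
        queues["next"].append(msg)      # enqueue into the downstream queue
        return "done"
    msg["retry"] = msg.get("retry", 0) + 1
    if msg["retry"] > retry_limit:
        queues["failed"].append(msg)    # retry count exceeded
        return "failed"
    queues["source"].append(msg)        # requeue into the source queue
    return "retry"

queues = {"next": [], "failed": [], "source": []}
states = [handle_outcome({"intvl": 7}, rc, retry_limit=1, queues=queues)
          for rc in (0, 1)]             # one successful run, one failed run
```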
e 68 shows data and processing flow for WaveGet_server. WaveGet_server receives input from user-defined parameter files, the database, and the scheduler_server. The parameter files specify all processing details for a given instance of the data monitor server. Details include database account, state names used for query and update of the request table, database queries, and values for sorting and managing the requests. The user parameters are used to construct the recurring database queries to determine if any requests should be passed to the messaging system, or if any requests should be declared failed and aborted so that no further data requests are attempted.

In standard-mode processing, WaveGet_server considers recent requests subject to three factors: maximum lookback, current time, and time of last run. Determination of the time interval is a function of a user-specified maximum lookback, the current time, and the most recent run of the WaveGet_server cycle, which is recorded in the timestamp table (process 2 in Figure 23 on page 68). The time interval, or time period of interest, is inserted into a user-specified request query, which retrieves all requests (process 3 in Figure 23 on page 68). The user-specified query is purposely flexible so that any practical query filters or clauses can be applied. The retrieved requests are sorted according to four search criteria, including a user-specified priority and the request's transfer-method name
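The standard-mode time window (bounded by the last run and the maximum lookback) and the request ordering can be sketched as follows. Only two of the four sort criteria named above are shown, and the field names and values are assumptions for illustration.

```python
def request_window(now, last_run, max_lookback):
    """Time window of requests to consider: everything since the last
    run, but never farther back than the maximum lookback."""
    return (max(last_run, now - max_lookback), now)

def order_requests(requests):
    """Sort retrieved requests: higher user-specified priority first,
    then by transfer-method name (two of the four real criteria)."""
    return sorted(requests, key=lambda r: (-r["priority"], r["method"]))

w = request_window(now=10_000.0, last_run=9_000.0, max_lookback=3_600.0)
reqs = order_requests([{"priority": 1, "method": "ftp"},
                       {"priority": 5, "method": "dialup"}])
```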
e account, class and size of target intervals to be created (for example, SEL1, 20 minutes), database queries, and time-based interval creation values (for example, the setback time). The user parameters are used to construct the recurring database queries to determine the time and duration of the last interval (class) created. Initial database input to ticron_server includes timestamp and interval information, which is used to build new time interval(s) depending on when the last interval was created and the current time.

ticron_server processing is straightforward and creates intervals as a function of time. The ticron_server interval creation algorithm includes determination of a start time for the next interval it will create. This start time is a function of the most recent end time of the last created interval, a value noted (optionally) in the timestamp table (process 2 in Figure 21 on page 64). Associated end times for each interval are computed as a function of the target interval size and a user-defined time setback value (process 3 in Figure 20 on page 62). One or more intervals are created by ticron_server depending on whether the computed new interval of time exceeds the target interval length (process 4 in Figure 21 on page 64). Completed interval(s) are written to the database and then enqueued in
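ticron_server's time-based interval creation (emit a full target-length interval only once it fits before the current time minus the setback) can be sketched as follows, with illustrative epoch-second values.

```python
def ticron_intervals(last_end, now, target, setback):
    """Create zero or more fixed-size intervals, starting at the end of
    the last created interval and never extending past (now - setback);
    an interval is emitted only when a full target length has elapsed."""
    limit, out, start = now - setback, [], last_end
    while start + target <= limit:
        out.append((start, start + target))
        start += target
    return out

# With a 300-second setback, only two full 1200-second intervals fit so far.
ivs = ticron_intervals(last_end=0.0, now=3000.0, target=1200.0, setback=300.0)
```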
e data in an interval are insufficient to meet the requirements for an interval, then the data element shall remain unprocessed. In this case, the DACS shall identify the interval as insufficient and provide a means for the operator to manually initiate a processing sequence.

25.2 Requirement: The DACS shall provide a configurable method of initiating a workflow sequence. The DACS workflow management shall be initiated upon either data availability, completion of other data element sequences, or the passage of time.
How Fulfilled: This requirement is fulfilled by the DACS data monitor servers and the ability to specify the required parameters related to interval creation.

TABLE 11 TRACEABILITY OF FUNCTIONAL REQUIREMENTS: WORKFLOW MANAGEMENT (CONTINUED)

Requirement: Workflow management shall allow sequential processing, parallel processing, conditional branching, and compound statements.
How Fulfilled: This requirement for sequential processing and compound processing is fulfilled by the DACS process-sequencing function (TMQFORWARD and tuxshells). Distributed parallel processing is achieved in part by configuring or replicating like servers across machines and/or across processors within a machine. Parallel processing pipelines, or sequences, and conditional branching ar
e files and the Tuxedo system configuration file (ubbconfig), which defines the entire distributed application at the machine, group, server, and service level.

UNIX Mail

The DACS relies upon mail services for automatic email message delivery to system operators when pending messages overflow in Tuxedo queues.

FTP

The DACS does not directly use or rely upon File Transfer Protocol (FTP).

Web

A Web- and Java-based Tuxedo administration tool is available for administration of the DACS. However, this tool is not used because the custom DACS operator console (tuxpad) is preferred over the Tuxedo Web-based solution.

Design Model

The design of the DACS is primarily determined by the fault-tolerance and reliability requirements previously described. This section presents a detailed description of some of the key design elements related to the DACS servers and services, namely reliable queuing, transactions, fault-tolerant processing via backup servers, and queue-based pipeline processing for Automatic Processing.

Figure 8 shows the logical relations between message queue, service, server, and host. The message queue A contains a number of requests for service A (for example, data intervals to be processed by the application program DFX). On three different hosts (physical UNIX machines) host 1
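For orientation, a heavily abbreviated ubbconfig might look like the following. The section names (*RESOURCES, *MACHINES, *GROUPS, *SERVERS) are standard Tuxedo, but every identifier, path, and value below is an invented placeholder, not the IDC configuration.

```
*RESOURCES
IPCKEY     51002            # shared-memory key for the bulletin board
MASTER     THOST            # LMID of the Master machine
MODEL      MP               # multi-machine (distributed) application

*MACHINES
host1      LMID=THOST  TUXDIR="/opt/tuxedo"  APPDIR="/dacs/app"
           TUXCONFIG="/dacs/app/tuxconfig"

*GROUPS
DACSGRP    LMID=THOST  GRPNO=1

*SERVERS
tuxshell   SRVGRP=DACSGRP  SRVID=10  CLOPT="-A"
```

The machine, group, server, and service levels mentioned in the text correspond to these sections; the real file would enumerate every DACS host, server group, and server instance.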
e first column indicates the types of failures that the DACS shall detect and recover from. The second column lists the maximum rate of failures guaranteed to be handled properly by the DACS; however, the DACS shall strive to recover from all errors of these types regardless of frequency. The third column lists the upper time bounds on detecting and recovering from the indicated failures. Again, the DACS shall strive to attain the best possible detection and recovery times.

TABLE 7 FAILURE MODEL

No. | Failure Type | Maximum Failure Rate | Maximum Time to Recover
41.1 | workstation crash | 1 failure per hour (non-overlapping) | 60 seconds for detection and 5 seconds to initiate recovery
41.2 | process crash | five per hour, onset at least 5 minutes apart | 5 seconds for detection and 5 seconds to initiate recovery
41.3 | process timing failure (all but interactive applications) | five per hour, onset at least 5 minutes apart | 5 seconds for detection and 5 seconds to initiate recovery
41.4 | process timing failure (interactive applications) | not detectable | user detection and recovery
41.5 | all others | undefined | undefined
e fulfilled through the use of data monitor servers. Data monitor server instances create new pipeline sequences as a function of specified availability criteria. As such, parallel pipelines are broken, or decomposed, into multiple sub-pipelines, where each sub-pipeline is created by a specific data monitor server instance. There is no supported mechanism within Tuxedo/DACS to specify and process a complex pipeline processing sequence as one parameter or one process-sequence expression or function.

Requirement: Workflow management shall support priority levels for data elements. Late-arriving or otherwise important data elements may be given a higher priority so that they receive priority ordering for the next available Automatic Processing program. Within a single priority group, the DACS shall manage the order among data elements by attributes of the data, including time and source, and by attributes of the interval, including elapsed time in the queue. The ordering algorithm shall be an option to the operator.
How Fulfilled: This requirement is fulfilled via the DACS data monitor support for priority-based queuing and related support for interval creation that gives preference to late-arriving or otherwise important data. Operator access to this support is through data monitor parameter files.
…survive most failure conditions, as discussed previously. No time limits have been specified, but the DACS can be configured to service most failure conditions and recover from them in less than 10 seconds.

TABLE 13: TRACEABILITY OF FUNCTIONAL REQUIREMENTS, RELIABILITY (CONTINUED)

Requirement: The DACS shall detect and respond to failures, up to a limited number of failures. The failure limits are given in Table 7. For failures over the limit, the DACS shall attempt the same detection and response, but success is not guaranteed.
How Fulfilled: This requirement is fulfilled via the DACS's ability to survive most failure conditions, as discussed previously. Or, if this requirement refers to application failures, these failures are handled as described by tuxshell.

Requirement: Reliability of a system or component is relative to a specified set of failures, listed in Table 7. The first column indicates the types of failures that the DACS shall detect and recover from. The second column lists the maximum rate of failures guaranteed to be handled properly by the DACS; however, the DACS shall strive to recover from all errors of these types regardless of frequency. The third column lists the upper time bounds on detecting and recovering from the indicated failures. Again, the DACS shall strive to attain the best possible detection and recovery times.
How Fulfilled: This requirement…
…the processing sequence, or receive the message from another tuxshell if within a compound tuxshell processing sequence.
■ Extract certain parameters from the message, for example, start time, end time, and station name for a processing interval.
■ Use these parameters to create a command line that calls an application program and contains a set of parameters and parameter files.
■ Spawn a child process by passing the command line to the operating system.
■ Update the appropriate row in the interval or request table to status xxx-started, with the name of the application program replacing xxx.
■ Monitor the outcome of processing, and:
  - if successful, as determined by the child process's return code, enqueue a message into the next queue in the processing sequence and update the interval state to done-xxx, or call another specified tuxshell in the case of a compound tuxshell processing sequence;
  - in case of failure, as determined by the child process's return code, requeue the message into the source queue, update the interval state to retry, and increment the retry count; or, if the retry count has been exceeded, place the message in the failed queue and update the interval state to failed-xxx;
  - in case of time-out, as determined by the processing time exceeding a configured value, kill the child process, requeue the message into the source queue, update the interval state to retry, and increment the retry count.
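The outcome-handling rules above amount to a small state machine. The following Python sketch is illustrative only: the real tuxshell is a C Tuxedo server, and the retry limit and application name used here are hypothetical placeholders. It maps a child process's return code or time-out to the interval-state transition and queue action described above.

```python
MAX_RETRIES = 3  # hypothetical limit; in tuxshell this comes from configuration


def next_transition(exit_code, timed_out, retry_count, app="DFX"):
    """Map a child-process outcome to (new interval state, new retry count,
    queue action), following the success/failure/time-out rules above."""
    if timed_out:
        # time-out: kill the child, requeue to the source queue, mark retry
        return ("retry", retry_count + 1, "requeue-source")
    if exit_code == 0:
        # success: enqueue into the next queue in the processing sequence
        return (f"done-{app}", retry_count, "enqueue-next")
    if retry_count + 1 > MAX_RETRIES:
        # retry count exceeded: divert the message to the failed queue
        return (f"failed-{app}", retry_count + 1, "failed-queue")
    # ordinary failure: requeue to the source queue and count the retry
    return ("retry", retry_count + 1, "requeue-source")
```

A time-out takes precedence over the exit code, since the child is killed before it can report a meaningful status.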
These terms are also included in the Glossary, which is located at the end of this document.

TABLE I: DATA FLOW SYMBOLS (1)

Symbols are defined for: host computer; process; external source or sink of data (duplicated, left and right); data store (duplicated, left and right); disk store (D); database store (Db); mass store (MS); queue; control flow; data flow; decision.

1. Symbols in this table are based on Gane-Sarson conventions.

TABLE II: ENTITY RELATIONSHIP SYMBOLS

Symbols are defined for: one A maps to one B; one A maps to zero or one B; one A maps to many Bs; one A maps to zero or many Bs; database table (with primary key, foreign key, and attribute 1 through attribute n).

TABLE III: TYPOGRAPHICAL CONVENTIONS

Element: database table. Font: bold. Example: interval.
Element: database columns (when written in the database table and column dot notation). Font: italics. Example: interval.state.
Element: processes, software units, and libraries. Example: tuxshell.
Element: us…
…queue to which messages will be recycled. TMQFORWARD monitors the error queue and sends any available messages in that queue to recycler_server. recycler_server extracts the source service name, which is the queue name, from the interval message. As with interval_router, the source service name is extracted by the Tuxedo FML32 library. recycler_server resets the failure count and time-out count to zero by updating the corresponding fields in the interval message. This is done because the recycled message is intended for retry as if it were a new interval with no previous failed attempts. recycler_server then attempts to enqueue the revised interval message to the originating queue. recycler_server returns a success or failure service-call return value to the calling TMQFORWARD, depending on the status of the enqueue operation. recycler_server logs all routing progress to the user-defined log file.

Control

Tuxedo controls the start-up and shutdown of dbserver, interval_router, and recycler_server because they are Tuxedo application servers. However, dbserver, interval_router, and recycler_server can also be manually shut down and booted by the operator. Tuxedo controls all actual process executions and terminations. Tuxedo also monitors the servers and provides automatic restart upon any unplanned server termination.

Interfaces

Operators can assist in the control of dbserver, interval_router, and recycler_server…
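The recycling step itself is simple: zero the counters and hand the message back to the queue it came from. A minimal Python sketch follows; the field names are assumptions, since the real server manipulates Tuxedo FML32 fields in the interval message.

```python
def recycle(message):
    """Return (destination queue, revised message) for a message taken from
    the error queue: counters are reset so the interval retries as if new."""
    revised = dict(message)       # leave the dequeued original untouched
    revised["failcount"] = 0      # assumed field name for the failure count
    revised["timeoutcount"] = 0   # assumed field name for the time-out count
    # the originating queue name is carried in the message as the source service
    return revised["source_service"], revised
```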
…the workflow management.

30.2 The same display shall provide a summary that indicates the processing-sequence completion times for all intervals available to Interactive Processing that are more recent than the last data migration.

31. The DACS shall provide a graphical display of the status of message passing with each Interactive Processing program. The status shall indicate the interactive processes capable of receiving messages and whether there are any messages in the input queue for each receiving process.

32. The DACS displays shall remain current within 60 seconds of actual time. The system monitoring displays shall provide a user-interface command that requests an update of the display with the most recent status.

33. The DACS run-time status display shall be capable of displaying all processes managed by the availability manager. The DACS message-passing display shall be capable of displaying the empty/non-empty message-queue status of all processes that can receive messages. The DACS workflow-management display shall be capable of displaying all intervals currently managed by the workflow management.

34. The DACS shall provide these displays simultaneously to 1 user, although efforts should be made to accommodate 10 additional users.

35. The DACS shall continue to function as a system monitor in the event of defined hardware and software failures. The DACS reliability and continuous-operations requirements are described in Reliability on page 134.
…each active Automatic Processing program; there can be up to fifty processes per computer. The size and composition of an interval is left as a detail internal to the DACS.

27. The DACS shall continue to function as a workflow manager in the event of defined hardware and software failures. The DACS reliability and continuous-operations requirements are specified in Reliability on page 134.

System Monitoring

System monitoring in the context of the DACS refers to monitoring of DACS-related computing resources. System monitoring does not include monitoring of operating systems, networks, or hardware, except for the detection of, and workaround for, computer crashes.

28. The DACS shall provide system monitoring for computer status, process status, workflow status, and the message-passing service.

29. The DACS shall monitor the status of each computer on the network, and the status of all computers shall be visible on the operator's console, current to within 30 seconds.

30. The DACS shall provide an interface to indicate the run-time status of all processes relevant to Automatic Processing and Interactive Processing. This set of processes includes database servers and DACS components.

30.1 The DACS shall provide a display indicating the last completed automatic processing step for each interval within the workflow management.
These design goals are satisfied by management of the ORACLE database connection, such that a temporary disconnect or failure can be retried after a wait period.

WorkFlow, SendMessage, and ProcessInterval

WorkFlow provides a graphical representation of time-interval information in the system database (the interval and request tables). WorkFlow satisfies the system requirement to provide a GUI-based operator console for the purpose of monitoring the progress of all automated processing pipelines in real or near-real time. The current state of all automated processing pipelines is recorded in the state column of each record in the interval database table and in the status column of the request database table. WorkFlow visualizes the Automatic Processing pipeline and the progress of analyst review by displaying rows, or timelines, organized by pipeline type or class (for example, TIS: time interval by station) and processing name or station (for example, ARCES: seismic station); see Figure 27. Each horizontal timeline row is composed of contiguous time-interval columns, or bricks. A WorkFlow brick is colored according to the interval state, where the mapping between state and color is user defined. The timeline axis is horizontal, with the current time (GMT) on the right side. All interval bricks shift to the left as time passes, and newly created intervals oc…
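The connection-management goal stated above, retrying a temporary disconnect after a wait period, can be sketched as a simple retry loop. This is illustrative only: dbserver actually talks to ORACLE through the GDI library, and the attempt count and wait period here are hypothetical.

```python
import time


def submit_with_retry(submit, attempts=3, wait_seconds=1.0):
    """Attempt a database submit; on a transient connection failure,
    wait and retry, re-raising the error only after the last attempt."""
    for attempt in range(attempts):
        try:
            return submit()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # retries exhausted; surface the failure to the caller
            time.sleep(wait_seconds)
```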
…based on the ProcessInterval script and SendMessage client.

TABLE 14: TRACEABILITY OF CSCI EXTERNAL INTERFACE REQUIREMENTS (CONTINUED)

Requirement 51: The DACS shall acquire time from a global time service.
How Fulfilled: This requirement is not met. The DACS relies upon external support for clock synchronization (for example, system cron jobs, which attempt to synchronize all machines' clocks on the LAN once per day). Or, the DACS relies on the database server for a single source of time. However, the DACS uses both methods for time synchronization without a consistent criterion.

TABLE 15: TRACEABILITY OF CSCI INTERNAL DATA REQUIREMENTS

Requirement 52: The DACS shall maintain a collection of intervals (data element references) and shall update the status of intervals in the interval database table.
How Fulfilled: This requirement is fulfilled by various DACS elements, including the data monitor servers, tuxshell, and dbserver.

TABLE 16: TRACEABILITY OF SYSTEM REQUIREMENTS

Requirement 53: The implementation of the DACS shall allow for configuration of the number and type of computers on the network and the number of automated processes of each type allowed to execute on each computer.
How Fulfilled: This requirement is fulfilled by the ubbconfig file and the parameter files for each DACS application.
FIGURE 27: MONITORING UTILITY WORKFLOW (showing the horizontal scale control, a timeline for network processing, and timelines for station processing)

Input/Processing/Output

WorkFlow receives input from three sources: user parameters, the database, and the user (via manipulations and selections in the GUI). The user parameters specify database account values, query options, and definitions for all classes and names of time intervals that WorkFlow will monitor.

WorkFlow maintains an internal table of all time intervals. The size of the table can be significant because WorkFlow is required to display tens of thousands of bricks, which can span a number of timelines (easily 100) with hundreds of intervals on each timeline. Access to the table for interval updates must be fast enough to avoid interactive response delays in the GUI. To meet these requirements, WorkFlow is designed around a hash table, which achieves O(1) access for nearly instantaneous recall of a specific interval. The hash table is shown as an internal data structure (M1) in Figure 28. The hash table is built during WorkFlow initialization, when all time intervals subject to a user-specified time lookback are retrieved from the database.
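The O(1) lookup requirement can be met with a table keyed on a (timeline, interval start time) pair. The toy Python equivalent below is only a sketch: WorkFlow itself is not Python, and the choice of key is an assumption for illustration.

```python
class IntervalTable:
    """Hash table of interval bricks keyed by (timeline, start time),
    giving constant-time recall of a specific interval for GUI updates."""

    def __init__(self):
        self._bricks = {}

    def put(self, timeline, start, state):
        # inserting an existing key updates the brick's state in place
        self._bricks[(timeline, start)] = state

    def lookup(self, timeline, start):
        # O(1) average-case recall of a specific interval's state
        return self._bricks.get((timeline, start))
```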
…This requirement is fulfilled via the DACS's ability to survive most failure conditions, as discussed previously.

41.1 workstation crash failure
Maximum Failure Rate: one per hour, non-overlapping. Maximum Time to Recover: 60 seconds for detection and 5 seconds to initiate recovery.
How Fulfilled: This requirement is fulfilled via the DACS's ability to survive a workstation failure, subject to the DACS being configured with sufficient backup servers to survive a single machine failure. The specified detection and recovery times can be met through configuration of the ubbconfig file.

41.2 process crash failure
Maximum Failure Rate: five per hour, onset at least 5 minutes apart. Maximum Time to Recover: 5 seconds for detection and 5 seconds to initiate recovery.
How Fulfilled: Same as above, but at the process level.

41.3 process timing failure (all but interactive applications)
Maximum Failure Rate: five per hour, onset at least 5 minutes apart. Maximum…
How Fulfilled: To the extent that this requirement refers to process time-outs, it is fulfilled through tuxshell's support for child-process time-out management. Otherwise, all process failures are…
Element: user-defined arguments and variables used in parameter (par) files or program command lines. Font: all CAPS. Example: TARGET INTERVAL SIZE.
Element: COTS (BEA Tuxedo-supplied) server software. Font: all CAPS. Example: BRIDGE.
Element: titles of documents. Example: DACS Software User Manual; Distributed Application Control System.
Element: computer code and output; filenames, directories, and web sites; text that should be typed in exactly as shown. Font: courier.

TABLE IV: TECHNICAL TERMS

admin server: Tuxedo server that provides interprocess communication and maintains the distributed processing state across all machines in the application. Admin servers are provided as part of the Tuxedo distribution.

application: (DACS, Tuxedo) System of cooperating processes configured for a specific function, to be run in a distributed fashion by Tuxedo. Also used in a more general sense to refer to all objects included in one particular ubbconfig file: machines, groups, servers, and associated shared-memory resources (qspaces), and clients.

application server: Server that provides functionality to the application.

backup component: System component that is provided redundantly. Backups exist on the machine…
…describes the architectural design of the DACS, including its conceptual design, design decisions, functions, and interface design.
■ Chapter 3: Tuxedo Components and Concepts. This chapter describes key software components and concepts of the Transactions for UNIX, Extended for Distributed Operations (Tuxedo) COTS software product used by the DACS.
■ Chapter 4: Detailed Design. This chapter describes the detailed design of the SAIC-supplied Distributed Processing CSCs, including their data flow, software units, and database design.
■ Chapter 5: Requirements. This chapter describes the general, functional, and system requirements of the DACS.
■ References. This section lists the sources cited in this document.
■ Glossary. This section defines the terms, abbreviations, and acronyms used in this document.
■ Index. This section lists topics and features provided in the document, along with page numbers for reference.

Conventions

This document uses a variety of conventions, which are described in the following tables. Table I shows the conventions for data flow diagrams. Table II shows the conventions for entity relationship diagrams. Table III lists typographical conventions. Table IV explains certain technical terms that are unique to the DACS and are used in this document. For convenience, these…
…sends a message over the network for a server; receives a message via the network for a server.
Ms, Mc: Monitors the server with process management; monitors the client with no process management.
Mt: Manages servers' and clients' queue transactions.
Eq, Dq: Enqueues a message for a server or client; dequeues a message for a server or client.
Fs: Forwards a queue-based service call within a queue-based transaction.
Es, Er: Sends an event message for a client or server; receives an event message for a client or server.
Sends, receives, and stores local messages and state for server and client using IPC resources.
Ds: Defines a server to the application in the ubbconfig/tuxconfig files.
Ls: Logs system-level server or client messages to disk.
Lt: Logs server and client transactions to disk.
Sq: Stores servers' and clients' queues.
Sm: Stores server and client queue messages.
Gc: Generates a text version of the system configuration that can be parsed for the current state of servers, machines, and so on.
Aa: Administers the application, including starting, stopping, and monitoring servers and machines.
Aq: Administers Tuxedo queuing.

2. Data Monitors include five servers: tis_server, tiseg_server, ticron_server, tin_server, and WaveGet_server.
3. Only SendMessage interacts directly with Tuxedo; WorkFlow is strictly a database application.
FIGURE 24: SCHEDULING SYSTEM DATA FLOW (showing the scheduler server, schedclient, ticron_server, and tiseg_server; the scheduler QSPACE with three queues: Q1 schedule, the service state table; Q2 sched_command, server commands; Q3 sched_result, command results; and enqueue/dequeue operations under a global transaction, with rollback possible, plus synchronous and asynchronous service calls)

The state queue consists of one, and only one, queue element: the scheduler state; this is the key to the fault-tolerant design. TMQFORWARD starts a transaction (step 2) and then dequeues and forwards the queue message (the state) to one of the scheduler servers running on any of several hosts (step 3). It does not matter which scheduler server receives the call, because all servers are equally stateless until they are passed state within the global transaction. If one or more commands exist in the command queue, they are dequeued (step 4) and applied to the scheduler state, resulting in an updated state. This updated state is requeued into the state queue (step 7). At this point, the state queue technically has two queue elements in it: the previous and the updated scheduler state. However, neither queue element is v…
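Steps 3, 4, and 7 above reduce to: take the single state element, fold in any pending commands, and requeue the result. The Python sketch below models that sequence with plain lists; it is an illustration only, and the Tuxedo global-transaction semantics are not reproduced here.

```python
def forward_state(state_queue, command_queue, apply_command):
    """Dequeue the single scheduler state, apply all pending commands,
    and requeue the updated state (steps 3, 4, and 7 above)."""
    state = state_queue.pop(0)        # step 3: state forwarded to a server
    while command_queue:              # step 4: drain any pending commands
        state = apply_command(state, command_queue.pop(0))
    state_queue.append(state)         # step 7: requeue the updated state
    return state
```

Because every scheduler server is stateless until handed the dequeued state, any replica can run this function with the same result.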
…server by using the Tuxedo command-line administration utilities, directly or indirectly via tuxpad.

Error States

dbserver, interval_router, and recycler_server can fail during start-up if the user parameter file is nonexistent or contains invalid settings. Start-up errors are recorded in the local Tuxedo ULOG file of the machine hosting the failed server. Service failures, including database submit failure in the case of dbserver, or enqueue failures in the case of interval_router and recycler_server, result in failure return codes to the calling servers, as described above. In each case, the calling server handles these service failures.

These application servers benefit from server replication, wherein a given server instance can be replicated across more than one machine. In this scenario, recovery from any server or machine failure is seamless because the replicated server takes over processing. Tuxedo recovers the failure of a dbserver, interval_router, or recycler_server due to a program crash by automatically restarting the server.

Database connection management is included in dbserver. An application server such as dbserver runs for long periods of time between reboots, and so dbserver's runtime duration might exceed that of the ORACLE database server. In general, these design goals are satisfied…
…messages to a particular destination queue name. Message routing can be used to ensure that detection processing of data from a particular station is directed to a specified queue. The DACS can be configured to process messages from specific queues on specific machines (for example, a machine that physically holds the corresponding diskloop on a local disk). interval_router can also be used to implement data-dependent routing, for example, to make a distinction between seismic and infrasonic stations.

recycler_server

Under certain system error conditions, queue messages may be diverted to the error queue. For example, replicated servers that advertise a service may become unavailable if an operator inadvertently shuts down all servers that advertise the service. A TMQFORWARD could subsequently try to send the message to the now-unavailable service. In case of such a failure, the message ends up in the error queue, perhaps after failed attempts by the TMQFORWARD. An operator could attempt to manually recover this message (recover the processing interval). However, recycler_server automatically handles retries in this failure scenario. recycler_server regularly checks the error queue and recycles any messages found there by placing the messages back in their original queues (processes 11 and 12 in Figure 14 on page 50). The error queue is distinct from the failed queue, which collects messages from repeated application processing failures. Reprocessing…
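Data-dependent routing of this kind reduces to a lookup from a message attribute, such as the station name, to a destination queue name. A minimal sketch follows; the station and queue names in the usage example are invented for illustration and are not part of any real DACS configuration.

```python
def route(message, routing_table, default_queue):
    """Pick a destination queue from the station carried in the message;
    stations absent from the routing table fall through to a default queue."""
    return routing_table.get(message.get("sta"), default_queue)
```

In the real interval_router, the station field would be pulled from the FML32 interval message, and the routing table would come from the server's parameter file.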
…Interactive Processing and Automatic Processing can exchange messages. As a result, a unified messaging model across the two systems that exchange messages was implemented. In practice, Interactive Processing and Automatic Processing run in separate DACS applications, and as such the messaging does not cross between the systems. However, this configuration was not anticipated and therefore was not part of the design decision. Earlier DACS implementations had also successfully used the unified model. The TMQFORWARD/tuxshell scheme is re-used within the Interactive Processing configuration, and as such some leveraging is realized even though the systems run in separate applications. It would be possible to re-implement the DACS for Interactive Processing based upon a messaging infrastructure separate from Tuxedo. Such an implementation would likely have to include a gateway or bridge process to pass messages from Interactive Processing to the Tuxedo-based DACS for Automatic Processing.

TABLE 3: DACS LIBIPC INTERVAL MESSAGE DEFINITION

Field 1: MSGID. Database interval column: interval.intvlid. Description: Each Tuxedo queue message can have a unique identifier assigned at the application level (not assigned by Tuxedo, which assigns its own identifier to each queue message for internal purposes). This unique identifier is known as the queue correlation ID (CORRID), and this value can be used for query access to the queue message, for example, to delete or read the…
…(process 5). The queue forwarder passes the DACS generalized processing server the interval data as part of a service call (process 6). The generalized processing server calls one or more processing applications, which subject the processing interval to the automatic processing task (process 9). The generalized processing server manages the execution of the processing task and handles successful or failed runs and time-outs. Failed processing intervals, as well as time-out of the application program, result in a transaction rollback of the queue interval by the Tuxedo queue forwarder, and a retry, which repeats the queue-forwarding sequence (processes 5, 6, 7, and 9). Successful processing intervals result in an enqueue of the updated interval into another downstream Tuxedo queue and a transactional commit of the original queue interval dequeued by the Tuxedo queue forwarder. The downstream Tuxedo queue manages the next step in the pipeline processing sequence, which repeats the queue-forwarding sequence (processes 5, 6, 7, and 9). The generalized processing server manages the interval data in the database by updating the interval state to reflect the current processing state. The actual database update is handled by the generalized database application server, which retains one connection to the database while multiplexing database access for a number of generalized processing servers (process 7). Queue intervals that fail due to system errors (for examp…
…message-passing API. The DACS shall provide this interface as a library for use by the developers of the Interactive Processing programs. The library shall contain entry points to allow processes to register, subscribe, unregister, send, poll, receive, replay, and delete messages. The DACS shall offer several types of notification when new messages are sent to a process. The API is specified in more detail in the following list.

How Fulfilled: This requirement is fulfilled by libipc, except that the ability to replay messages was not addressed. Message subscription is limited to broadcasts to the dman client upon any message send and receive. The message-polling implementation was changed due to a problem with Tuxedo unsolicited-message handling. The problem required heavier-weight polling, although the increased polling time was well within the relatively light message-timing requirements. The change requires querying the queue to see if a new message has been received. The original implementation relied upon relatively lightweight broadcasts that were sent by libipc to the receiving client (the client that was being sent the message). Soliciting broadcast traffic is lighter weight than actually checking the receive queue.

register (connect to the messaging system): arguments specify the logical name and physical location of the process and the method of notification for waiting messages. This requirement is fulfilled via the ipc_at…
…queue space, or qspace in Tuxedo literature, is a collection of queues. The automated system of the IDC application software works with two qspaces, a primary and a backup, on two different machines, with dozens of queues in each qspace.

Utility Programs

tmloadcf, tmunloadcf

The program tmloadcf loads (converts) the Tuxedo/DACS configuration from a text file to binary, machine-readable form. The program tmunloadcf unloads (converts) the binary, machine-readable form back to a text file.

tmadmin

tmadmin is a command-line utility that provides monitoring and control of the entire application. This Tuxedo client reads from and writes to the BBL running on the master machine to query and alter the distributed application.

qmadmin

qmadmin is a command-line utility that provides monitoring and control of a disk qspace. This Tuxedo client creates, reads from, and writes to a qspace on a Tuxedo queue host machine.

Chapter 4: Detailed Design

This chapter describes the detailed design of the SAIC-developed DACS CSCs (non-COTS DACS) and includes the following topics:
■ Data Flow Model
■ Processing Units
■ Database Description

This chapter introduces the DACS servers, clients, and auxiliary programs that are p…
…every time. The reason for the change is described in requirement 46. The handler function invokes ipc_receive to check for a new message. The handler function is called as part of an X11 timer-event callback, which is currently configured to happen every 1/2 second, unless the client application cannot presently be interrupted (for example, during a database submit). Message notification of type interrupt is not supported, and this feature currently is not needed.

Requirement: The DACS shall interface with the UNIX operating system to start Automatic Processing programs and wait on the termination of these programs. Processes started by the DACS shall inherit the system privileges of the DACS, including the process group, environment, and file system.
How Fulfilled: This requirement is fulfilled by tuxshell.

TABLE 14: TRACEABILITY OF CSCI EXTERNAL INTERFACE REQUIREMENTS (CONTINUED)

Requirement: The DACS shall collect the exit or abnormal-termination status of processes it starts. The exit status shall be used to determine the success or failure of the Automatic Processing program. Processes shall use a defined set of exit codes to indicate various levels of success and another set of codes to indicate different types of failure.
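The polling scheme described above, a periodic timer callback that checks the receive queue unless the client is busy, can be sketched as a single timer tick. This is an illustration only: the real client uses an X11 timer and ipc_receive, and the busy check stands in for conditions such as an in-progress database submit.

```python
def poll_tick(receive_queue, handler, busy=lambda: False):
    """One timer tick: do nothing if the client cannot be interrupted or
    no message is waiting; otherwise deliver exactly one message."""
    if busy() or not receive_queue:
        return None
    message = receive_queue.pop(0)  # check the queue for a new message
    handler(message)                # hand it to the application's handler
    return message
```

Delivering at most one message per tick keeps each callback short, so the GUI event loop is never blocked for long.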
…new station data. Initial database input to tis_server includes station and network affiliations, used to build a complete station site and channel table for all monitored stations.

tis_server creates and updates intervals for processing data from continuously transmitting stations. tis_server forms new candidate intervals based upon the timely arrival of new station data and updates existing intervals that were previously skipped due to incomplete or nonexistent station data.

tis_server generates output to log files, the database, Tuxedo queues, and the scheduler server. Output to the database includes new intervals, be they incomplete (interval state skipped) or complete (interval state queued). Updates to the database include previously skipped intervals updated to queued intervals following the verification of newly arrived data. tis_server also optionally supports…

WaveGet_server detects Retrieve Subsystem request (retrieve) failures by querying the request state and request statecount in the database. Depending on the state and the number of failed requests (the value of request.statecount), WaveGet_server determines whether subsequent requests should be made, or whether the state should be updated to failed, to terminate the request and eliminate it from consideration in future WaveGet_server invocations.
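The retry decision described above reduces to a check of the request state and failure count. A Python sketch follows; the threshold value is hypothetical, since the real limit is a WaveGet_server configuration detail.

```python
MAX_FAILED_ATTEMPTS = 5  # hypothetical threshold; configured in practice


def next_request_action(state, statecount):
    """Decide whether a request should be retried, terminated, or ignored,
    based on request.state and request.statecount as described above."""
    if state == "failed":
        return "ignore"          # already terminated; never reconsidered
    if statecount >= MAX_FAILED_ATTEMPTS:
        return "mark-failed"     # terminate: update the state to failed
    return "retry"               # make a subsequent retrieve request
```

Marking a request failed serves both purposes noted earlier: it removes the request from standard-mode consideration and signals to the operator that WaveGet_server has given up on it.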
FIGURE 32: ENTITY RELATIONSHIP OF SAIC DACS CSCs (diagram relating the interval, request, wfdisc, and affiliation tables through keys and attributes such as intvlid, reqid, wfid, chanid, orid, evid, sta, chan, net, class, state, statecount, time, endtime, and lddate)

Database Schema

Table 5 shows the database tables used by the DACS. For each table used, the third column shows the purpose for reading or writing each attribute for each relevant CSC.

TABLE 5: DATABASE USAGE BY DACS

Table: affiliation. Action: read. Usage by CSC: tis_server, tiseg_server; net and sta are read to map a network name to station names and a station name to station sites.
Table: interval. Action: read/write. Usage by CSC: Data Monitors; time, endtime, class, state, and intvlid are read, created, and updated by the interval-creation algorithms. dbserver: state is updated v…
Mode: fast forward. Automatic Processing: full automatic processing (configured for burst data; for example, GA replaced by additional instances of DFX). Interactive Processing: full interactive processing.
Mode: play. Automatic Processing: full automatic processing (configured for normal operation). Interactive Processing: full interactive processing.
Mode: slow motion (1). Automatic Processing: partial automatic processing (configured to run only a core subset of automatic processing tasks). Interactive Processing: full interactive processing.
Mode: rewind. Automatic Processing: full automatic processing, after resetting the database to an earlier time. Interactive Processing: full interactive processing.
Mode: pause. Automatic Processing: completion of active automatic processing. Interactive Processing: full interactive processing.

1. Slow motion is used to maintain time-critical automatic processing when the full processing load exceeds the processing capacity.

An additional general requirement is:

8. The DACS shall be started at boot time by a computer on the IDC local area network. The boot shall leave the DACS in the stop state. After it is in this state, the DACS shall be operational and unaffected by the halt or crash of any single computer on the network.

FUNCTIONAL REQUIREMENTS

This section provides specific requirements for the DACS. Each subparagraph describes a group of related requirements. The requirements are grouped into the functional…
6. ipc amp: Retrieves the number of messages queued for the list of queue names specified. Automatic: N; Interactive: birdie only. Uses Tuxedo MIB calls to retrieve the number of messages.

7. ipc_purge(): Deletes the first or all messages from the specified message queue. Automatic: N; Interactive: birdie only. Uses message dequeue(s) to purge queue message(s).

8. ipc_client_status(): Retrieves the processing status for each client defined in the list of specified clients. Automatic: N; Interactive: birdie only. Uses Tuxedo MIB calls to determine the client processing status.

9. ipc_add_xcallback(): Registers a client callback function, which is invoked periodically for the purposes of polling an IPC queue. Presumably the callback function will use ipc_receive() to retrieve IPC messages. The frequency of the callbacks is currently fixed at two times per second. Automatic: N; Interactive: all except dman. N/A.

10. ipc_remove_xcallback(): Removes the client callback function from the client's libXt (X toolkit) event loop. Automatic: N; Interactive: all except dman. N/A.

11. ipc_get(): Retrieves error status following all libipc calls and detailed error information for any error conditions. Automatic: N; Interactive: all. N/A.
capability so that network interruptions or scheduled maintenance activity do not disrupt processing.

IDENTIFICATION

The DACS components are identified as follows:

■ birdie
■ dbserver
■ dman
■ interval_router
■ libipc
■ msg_window
■ operate_admin
■ ProcessInterval
■ qinfo
■ recycler_server
■ schedclient
■ schedule_it
■ scheduler
■ SendMessage
■ ticron_server
■ tin_server
■ tis_server
■ tiseg_server
■ tuxpad
■ tuxshell
■ WaveGet_server
■ WorkFlow

2. The DACS currently does not use Tuxedo for coordinating or managing ORACLE database transactions. The DACS relies upon the native Generic Database Interface (GDI) API, libgdi, for all database operations. As such, the DACS coordinates database and Tuxedo queuing transactions within the specific server implementation and without automatic Tuxedo control. Inherent coordination of database and queuing transactions (for example, two-phase commits) would require passing ORACLE transactions through Tuxedo.

STATUS OF DEVELOPMENT

This document describes software that is, for the most part, mature and complete.

BACKGROUND AND HISTORY

A previous implementation of the DACS, based upon the Isis distributed processing system, was deployed into operations at the PIDC at the Center for Monitoring Research (CMR) in Arlington, Virginia, U.S.A., in the early 1990s.
interact with all of the DACS functions. The queuing function, transactions, replicated or backup servers, and pipeline processing are described in the previous section. The Tuxedo-supplied distributed process monitoring function involves the real-time monitoring of every DACS server, IDC- or COTS-supplied, such that the servers are automatically rebooted upon any application failure or crash.

[Architectural data-flow figure (Chapter 2, Architectural Design), not reproduced here. It shows the scheduling server; recycler_server (automatic reprocessing of failures due to system errors); the data monitor servers tis_server, ticron_server, tiseg_server, tin_server, and WaveGet_server; interval_router; the workflow monitor WorkFlow; Tuxedo queues; the TMQFORWARD forwarding agent; the generalized processing server tuxshell; the database server dbserver; the operator console tuxpad (reprocessing of failures under operator control); and the data processing application program DFX, connected by interval messages and database operations on wfdiscs and intervals.]
during subsequent calls to the data monitor server, in an attempt to reconnect to the database server. In this scenario, the data monitor servers never crash or terminate due to database server downtime.

General database query, insert, or update errors are handled via an attempt to roll back as much of the interval-creation cycle's work or progress as possible prior to ending the current interval-creation cycle. Included in this error-state processing is an attempt to keep Tuxedo queue inserts and database inserts or updates as one transaction, such that the database operation(s) are not committed until the Tuxedo enqueue(s) are successful. This is shown in all of the data monitor data-flow diagrams (Figures 18-23). Errors for all database failures are logged to the data monitor log files.

scheduler, schedclient

scheduler and schedclient support the DACS scheduling system. scheduler satisfies the requirement for a centralized server for automatic data monitor calls, and schedclient satisfies the requirement for a tool for the centralized management of the scheduling system. The DACS data monitor application servers (for example, tis_server and WaveGet_server) await service calls from scheduler to carry out their data monitoring service and return acknowledgments to scheduler following completion of their service calls.
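The one-transaction behavior described above, committing database work only after the Tuxedo enqueue succeeds, can be sketched as follows. The FakeDB and FakeQueue classes are illustrative stand-ins, not the GDI (libgdi) or Tuxedo interfaces.

```python
# Sketch of the error handling described above: interval-creation database
# work is committed only after the queue insert succeeds; on any queue
# failure the cycle's progress is rolled back.
class FakeDB:
    def __init__(self):
        self.rows, self.pending = [], []
    def begin(self):
        self.pending = []
    def insert(self, row):
        self.pending.append(row)
    def rollback(self):
        self.pending = []
    def commit(self):
        self.rows += self.pending
        self.pending = []

class FakeQueue:
    def __init__(self, fail=False):
        self.items, self.fail = [], fail
    def enqueue(self, item):
        if self.fail:
            raise RuntimeError("enqueue failed")
        self.items.append(item)

def create_interval(db, queue, interval):
    db.begin()
    try:
        db.insert(interval)        # database insert, not yet committed
        queue.enqueue(interval)    # Tuxedo enqueue; may fail
    except RuntimeError:
        db.rollback()              # undo the interval-creation work
        return False
    db.commit()                    # commit only after a successful enqueue
    return True

db, ok_q, bad_q = FakeDB(), FakeQueue(), FakeQueue(fail=True)
print(create_interval(db, ok_q, "intvl-1"), db.rows)    # True ['intvl-1']
print(create_interval(db, bad_q, "intvl-2"), db.rows)   # False ['intvl-1']
```

The ordering, enqueue before commit, is what keeps the two resources consistent without a true two-phase commit.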
the DACS CSCs for Interactive Processing. Tuxedo provides the reliable message-passing infrastructure for the DACS, including reliable queuing and process monitoring (process 3). libipc provides the asynchronous message passing among the Interactive Tools within Interactive Processing. This library is linked into all Interactive Processing clients (for example, ARS and dman) and is not explicitly listed in the figure.

Actions within the interactive session are started by an analyst. The analyst either explicitly starts the analyst review station tool, ARS (process 2), or it is automatically started by dman, the DACS interactive session manager client process. Storing messages within a disk-based Tuxedo queue ensures that the messaging is asynchronous, because the message send and receive are part of separate queuing operations and transactions. Asynchronous messaging allows one Interactive Tool (for example, ARS, process 2) to send a message to another Interactive Tool that is not currently running. XfkDisplay is used as an example in Figure 15, and similar control and data flow applies to other Interactive Tools. The dman client provides a demand-execution feature, which starts an interactive client that is not already running.

2. dbserver can update interval state or request state.
2. ipc_detach(): Detaches the calling client from the IPC session pointed to by the ipcConn object argument. Automatic: N; Interactive: all. N/A.

3. ipc_send(): Sends a message to the specified message queue within the IPC session pointed to by the ipcConn object argument. Automatic: N; Interactive: all except dman. Uses a message enqueue and an event broadcast to the session's dman client.

4. ipc_receive(): Retrieves the next message in the specified message queue within the IPC session pointed to by the ipcConn object argument. Automatic: N; Interactive: all except dman. Uses a message dequeue and an event broadcast to the session's dman client.

5. ipc_check(): Returns boolean true if a new message has arrived at the default queue since the last ipc_receive() call. The default queue is the queue name provided during the ipc_attach() call and is defined in the ipcConn object. This function always returns boolean true, due to an implementation change to libipc. Automatic: N; Interactive: all except dman. N/A.
The current Tuxedo-based DACS has been used at the PIDC and at the International Data Centre of the Comprehensive Nuclear-Test-Ban Treaty Organization (CTBTO) in Vienna, Austria, since the spring of 1998. The graphical operator console, tuxpad, was deployed during 1999, and the DACS scheduling system was completely redesigned in early 2000.

OPERATING ENVIRONMENT

The following paragraphs describe the hardware and COTS software required to operate the DACS.

Hardware

The DACS is highly scalable and is designed to run on Sun Microsystems SPARC workstations and SPARC Enterprise servers. The DACS for automatic processing runs on a distributed set of machines that can scale from a handful of machines to tens of machines, depending on the data volume and available computing resources. The DACS for interactive processing is most typically run in a stand-alone, single-SPARC-workstation configuration. SPARC workstation and server models are always changing, but a representative workstation is the SPARC Ultra 10, and a representative enterprise server is the SPARC Ultra Enterprise 4000 configured with six Central Processing Units (CPUs). Typically, the hardware is configured with between 64 and 1,024 MB of memory and a minimum of 10 GB of magnetic disk. The required disk space is defined by other subsystems.
programming interface; the message service will be available to the Interactive Processing programs via a software library linked at compile time.

18. The message-passing service shall provide an administrative control process to support administrative actions. The administrative actions shall allow a user to add or delete messages from any message queue and to obtain a list of all processes registered to receive messages.

19. The DACS shall deliver messages within one second of posting, given that network utilization is below 10 percent of capacity.

20. If the receiving process is not active or is not accepting messages, the DACS shall hold the message indefinitely, until delivery is requested by the receiving process or the message is deleted by an administrative control process.

21. Interactive Processing programs may request the send or receive of messages at any time. Multiple processes may simultaneously request any of the message services.

22. The DACS shall be capable of queuing (holding) 10,000 messages for each process that is capable of receiving messages.

23. The size limit of each message is 4,096 (4K) bytes in length.

24. The DACS shall continue to function as a message-passing service in the event of defined hardware and software failures. The DACS reliability and continuous-operations requirements are specified in "Reliability" on page 134.
third-party software products and protocol standards.
How Fulfilled: This requirement is fulfilled to a reasonable degree. The interactive messaging library, libipc, was implemented with the requirement in mind, in that the Tuxedo layer is separated from the general messaging API wherever possible. For Automatic Processing, layering is in certain cases challenging, because deployment of a Tuxedo application such as the DACS is at the system and user configuration levels.

62. The DACS shall implement the functions of workflow management, availability management, inter-process communications, and system monitoring as separate stand-alone programs.
How Fulfilled: This requirement is fulfilled to a reasonable degree. Workflow management is implemented by several cooperating programs. Availability management and system monitoring are handled in part by Tuxedo, which relies on a distributed set of servers to carry out this function. Inter-process communications are handled by a variety of programs, libraries, and system resources (such as qspace disk files).

TABLE 16: TRACEABILITY OF SYSTEM REQUIREMENTS (CONTINUED)

Requirement: The DACS shall use COTS for internal components where practical. Practical in this situation means where there is a strong functional overlap between the DACS requirements and COTS capabilities.
How Fulfilled: This requirement is fulfilled by Tuxedo.

Requirement: The DACS shall be designed to
this environment, and the distinct UIDs have been specified in the MACHINES section of the ubbconfig file. To launch other servers, tlisten uses tagent, which is supplied by Tuxedo. In contrast to tlisten, tagent is only launched on demand and promptly exits after completing its task.

Administrative Servers

Administrative servers are Tuxedo-supplied servers that implement the fundamental elements and infrastructure of the distributed application. These include network-based message passing, management of the state of the distributed application, distributed transaction management, and queuing services.

BSBRIDGE and BRIDGE

The bootstrap bridge (BSBRIDGE) is launched by tlisten when the user boots the administrative servers on a machine. BSBRIDGE prepares the launch of the permanent BRIDGE and exits as soon as BRIDGE has been established. BRIDGE manages the exchange of all information between machines, such as the passing of messages. BRIDGE remains in the process table until the application is shut down completely or on the particular machine. If BRIDGE crashes or is terminated accidentally, the machine is partitioned (can no longer be accessed from other DACS machines via IPC resources, BRIDGE, and BBL), and operator intervention is required to restore processing on the machine.

BBL, DBBL

The Bulletin Board Liaison (BBL) generates and manages the Bulletin Board. The Bulletin Board is a section of shared memory in which Tuxedo maintains the run-time state of the distributed application.
WorkFlow: interval records are read and displayed graphically, and state is updated as part of interval reprocessing.

Table: lastid
Action: read/write
Usage by CSC: Data Monitors: keyvalue and keyname are used to ensure unique intvlids for each interval creation, via a lock-for-update database operation.

Table: request
Action: read/write
Usage by CSC: WaveGet_server: array, chan, start_time, end_time, state, statecount, and requestor are used to prioritize and request auxiliary waveform acquisition or to terminate repeated and unsatisfied requests. tiseg_server: array, start_time, end_time, and state are used to initiate auxiliary station processing for requests that are complete, as defined by state. dbserver: state is updated via tuxshell service calls to dbserver. WorkFlow: request records are read and displayed graphically.

TABLE 5: DATABASE USAGE BY DACS (CONTINUED)

Table: timestamp
Action: read/write
Usage by CSC: Data Monitors: procclass, procname, and time are used to track interval-creation progress and to retrieve the current wfdisc station endtime. Database triggers update time upon any wfdisc endtime update.

Table: wfdisc
Action: read
Usage by CSC: tis_server, tiseg_server: time, endtime, sta, and chan are used to determine data availability for continuous and auxiliary data stations.
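The lastid pattern in Table 5 (a lock-for-update operation guaranteeing unique intvlids) can be sketched as follows. This is a hypothetical SQLite illustration: the real ORACLE schema would take the row lock with SELECT ... FOR UPDATE, for which BEGIN IMMEDIATE stands in here.

```python
import sqlite3

# Sketch of the lastid usage described above: an interval creator locks the
# lastid row, increments keyvalue, and commits, so that concurrent creators
# never hand out the same intvlid.
def next_intvlid(conn):
    cur = conn.cursor()
    cur.execute("BEGIN IMMEDIATE")  # stands in for the "lock for update"
    cur.execute("SELECT keyvalue FROM lastid WHERE keyname = 'intvlid'")
    value = cur.fetchone()[0] + 1
    cur.execute("UPDATE lastid SET keyvalue = ? WHERE keyname = 'intvlid'",
                (value,))
    cur.execute("COMMIT")           # commit releases the lock
    return value

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE lastid (keyname TEXT, keyvalue INTEGER)")
conn.execute("INSERT INTO lastid VALUES ('intvlid', 100)")
print(next_intvlid(conn), next_intvlid(conn))   # 101 102
```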
seismic recall (RSEISMO), interactive hydro recall (RHYDRO), and interactive auxiliary data request (IADR).

[Figure 15, the control- and data-flow diagram for the DACS CSCs for Interactive Processing, is not reproduced here. It shows the analyst starting ARS (or ARS being started automatically by dman); IPC messages between ARS and XfkDisplay; dman's demand execution, event handling, message monitoring, and process monitoring; Tuxedo message queuing via TMQFORWARD; database operations; and interactive recall processing with result acknowledgments.]

FIGURE 15. CONTROL AND DATA FLOW OF DACS CSCS FOR INTERACTIVE PROCESSING

PROCESSING UNITS

SAIC DACS CSCs consist of the following processing units:

■ Data monitor servers (tis_server, tiseg_server, ticron_server, tin_server, and WaveGet_server)
■ scheduler, schedclient
■ tuxshell
■ dbserver, interval_router, recycler_server
■ WorkFlow, SendMessage, and ProcessInterval
Important details related to the libipc API are listed in Table 4. The first column in the table lists the API call name. The second column describes the call. The third column indicates if the call is used by any of the DACS CSCs for Automatic Processing; in general, the DACS CSCs for Automatic Processing do not rely upon libipc for their messaging, and their usage is limited to fairly trivial convenience functions. The fourth column indicates which Interactive Processing DACS clients use the API call. The final column briefly notes the API call's usage of queuing, events, and Tuxedo Management Information Base (MIB) calls. The Tuxedo MIB API provides for querying and changing the distributed application.

dman input and output beyond that already described and related to libipc is described in the Interactive Analysis Subsystem Software User Manual [IDC6.5.1] and the dman man page, dman(1). birdie is a command-line-driven program, and its inputs and outputs are described in the birdie man page, birdie(1).

TABLE 4: LIBIPC API

1. ipc_attach(): Attaches the calling client to the IPC session defined by the QSPACE environment variable and the group and name arguments. Returns a pointer to an ipcConn object, which provides access to the IPC session for this client. Automatic: N; Interactive: all. Uses a message enqueue to test the specified default IPC queue.
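The attach/send/receive pattern in Table 4 can be mocked as follows. This is an in-memory sketch whose shapes are assumed; the real libipc is a C library whose queues live in a Tuxedo qspace, and the class and argument names here are illustrative only.

```python
# Minimal mock of the libipc usage pattern described in Table 4: a client
# attaches to an IPC session, obtains a connection object, and exchanges
# messages through named queues.
class IpcConn:
    def __init__(self, qspace, client_queue):
        self.qspace = qspace            # would come from the QSPACE variable
        self.client_queue = client_queue
        self.queues = {}                # queue name -> list of messages

def ipc_attach(qspace, group, name):
    # Real ipc_attach() tests the client's default queue with an enqueue;
    # here we simply build the connection object.
    return IpcConn(qspace, name)

def ipc_send(conn, queue, message):
    conn.queues.setdefault(queue, []).append(message)

def ipc_receive(conn, queue):
    q = conn.queues.get(queue, [])
    return q.pop(0) if q else None      # FIFO delivery, None when empty

conn = ipc_attach("IPCQSPACE", "ARS", "XfkDisplay")
ipc_send(conn, "XfkDisplay", "request-fk-display")
print(ipc_receive(conn, "XfkDisplay"))   # request-fk-display
```

Because the queue outlives any one call, a message can be sent to a tool that is not yet running, which is the asynchronous property the document describes.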
primarily to the start time and end time of database intervals. Therefore, it is possible to specify query parameters that are entirely based upon the interval table, whereby tis_server forms new intervals based upon the progress of other, related intervals. This generalized use of tis_server is employed in a number of cases to form pipeline-processing sequences based upon the existence of specific interval states within a specified range of time. The design of tis_server addresses a number of complexities specifically related to continuous station data transmission (wfdisc-based monitoring); therefore, the more general interval-based monitoring uses of tis_server exercise a relatively small percentage of the server's features.

tiseg_server

tiseg_server creates intervals of the class that correspond to relatively short segments of irregular duration from auxiliary seismic stations. The created intervals are enqueued into a Tuxedo queue to initiate detection and station processing. tiseg_server periodically checks the wfdisc table for new entries originating from seismic stations. Each auxiliary seismic station has a designated monitor channel that serves as the time-reference channel for forming the intervals in the interval table. Complete (queued) intervals are formed in the interval table when the monitor channel is found along with all other expected channels (Figure 20). Incomplete (partial) intervals are formed when the monitor channel is found in the absence of a specified minimum number of related station channels.
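The completeness rule described above can be sketched as a small state decision. The channel codes, the exact rule, and the function name are simplified assumptions, not the tiseg_server implementation.

```python
# Sketch of the tiseg_server interval rule described above: an interval is
# formed only when the monitor channel is present; it is "queued" (complete)
# when the minimum number of expected channels is also present, otherwise
# it remains "partial" until the missing channels arrive.
def interval_state(channels, monitor_channel, min_channels):
    if monitor_channel not in channels:
        return None                      # no interval formed yet
    if len(channels) >= min_channels:
        return "queued"                  # complete interval: start processing
    return "partial"                     # wait for missing channels

# Example with illustrative channel codes.
print(interval_state({"BHZ"}, "BHZ", 3))                  # partial
print(interval_state({"BHZ", "BHN", "BHE"}, "BHZ", 3))    # queued
```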
Workflow Management

Workflow management, in the context of the DACS, refers to the marshalling of data through data-processing sequences. The steps (tasks) in a data-processing sequence are independent of each other, with the exception of order. That is, if step B follows step A, then step B may be initiated any time after the successful termination of step A. The independence of the processing tasks allows task B to run on a different computer than task A.

Workflow management allows for different types of ordering. Sequential ordering requires that one task run before another task. Parallel ordering allows two tasks to execute simultaneously, yet both must finish before the next task in the sequence may begin. Conditional ordering allows one of two tasks to be selected as the next task in the sequence, based on the results of the current processing task. Finally, compound ordering allows for a sub-sequence of tasks within a task sequence. A compound statement requires all internal processing steps to finish before the next interval is submitted to the compound statement.

25. The DACS shall provide workflow management for Automatic Processing. Workflow management ensures that data elements get processed by a sequence of Automatic Processing programs. A data element is a collection of data, typically a discrete time interval of time-series data, that is maintained by processes external to the DACS. The DACS workflow management
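The ordering types above can be encoded as a small interpreter. The encoding and task names are illustrative assumptions, not the DACS representation; a compound sub-sequence is simply a nested "seq" or "par" step.

```python
# Illustrative evaluation of the four ordering types: sequential, parallel,
# conditional, and (by nesting) compound.
def run(step, execute):
    kind = step[0]
    if kind == "task":                  # leaf: one Automatic Processing program
        return execute(step[1])
    if kind == "seq":                   # sequential: strictly one after another
        result = None
        for sub in step[1:]:
            result = run(sub, execute)
        return result
    if kind == "par":                   # parallel: all must finish before the
        return [run(sub, execute)       # next task in the sequence may begin
                for sub in step[1:]]
    if kind == "cond":                  # conditional: choose the next task
        _, test, if_true, if_false = step
        chosen = if_true if run(test, execute) else if_false
        return run(chosen, execute)
    raise ValueError("unknown ordering: " + kind)

order = []
def execute(name):
    order.append(name)                  # record execution order
    return True

run(("seq", ("task", "DFX"),
     ("par", ("task", "task-A"), ("task", "task-B")),
     ("task", "GA")), execute)
print(order)   # ['DFX', 'task-A', 'task-B', 'GA']
```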
msg_window could be run stand-alone; however, the tuxpad temporary file name would have to be known, which is possible but not convenient to determine.

[Figure 30, the qinfo design diagram, is not reproduced here. It shows the qinfo main driver reading user parameters, queue names, and qspace colors; opening the qspace (qopen); querying queue information via qmadmin (qhost, qspace, queue info); writing to stdout/stderr and a /tmp file keyed by the tuxpad pid; and updating a GUI display of queues by color every n seconds.]

FIGURE 30. QINFO DESIGN

schedule_it provides a GUI to display and manipulate the scheduling system's schedule service table. The script is a convenient front end to the schedclient command (Figure 31). schedule_it issues schedclient commands and parses results from schedclient. The schedclient commands supported by schedule_it are as follows:

■ show, for on-demand querying and displaying of the service list
■ stall and unstall, for stalling or unstalling user-selected service(s)
■ init, for re-initializing the scheduling system
■ kick, for resetting the scheduling system

[Figure 31, the schedule_it design diagram, is not reproduced here. It shows schedule_it reading user parameters and the tuxpad pid from /tmp, issuing the schedclient show command, and parsing the returned service list.]
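A front end like schedule_it can be sketched as a command builder around the four supported actions. The option syntax shown is an assumption for illustration, not taken from the schedclient man page.

```python
# Hypothetical sketch of how a wrapper such as schedule_it might build the
# schedclient command lines it issues for show, stall, unstall, init, and
# kick. The argument layout is assumed, not documented schedclient syntax.
def schedclient_command(action, services=None):
    valid = {"show", "stall", "unstall", "init", "kick"}
    if action not in valid:
        raise ValueError("unsupported schedclient action: " + action)
    cmd = ["schedclient", action]
    if services and action in {"stall", "unstall"}:
        cmd.extend(services)            # user-selected service names
    return cmd

print(schedclient_command("stall", ["tis_server"]))
# ['schedclient', 'stall', 'tis_server']
```

In practice the wrapper would pass such a list to a process spawner and parse the service list that show returns.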
the aborted transaction requeues the message to the top of the originating queue and increases the retry count. This recycling action continues until a retry threshold, set at queue-creation time, has been exceeded, at which point TMQFORWARD drops the message. If all servers advertising the service are busy, TMQFORWARD waits for one to become available. If the service is not being advertised, TMQFORWARD enqueues the message into the error queue.

TMSYSEVT, TMUSREVT

TMSYSEVT and TMUSREVT are servers that act as event brokers. These servers allow communication between application servers and clients and are used only in the interactive DACS application.

1. TMQFORWARD can call any server that advertises the same service name as the name of the queue that TMQFORWARD monitors. The DACS uses TMQFORWARDs that only call tuxshell servers.

IPC Resources

Tuxedo uses several IPC resources: shared memory, message queues, and semaphores. These resources must be sized correctly within the operating system (in the /etc/system file) and are dynamically allocated and freed by Tuxedo at run time.

Special Files: ubbconfig, tuxconfig

The binary tuxconfig file contains the complete configuration of the application in machine-readable form. The Tuxedo operator on the Master machine generates this file by compiling the ASCII ubbconfig file.
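The recycling policy described above (requeue on failure, drop past the threshold) can be modeled in a few lines. This is an illustrative model, not Tuxedo code; the retry threshold is the value set at queue-creation time.

```python
# Model of the TMQFORWARD retry behavior described above: a failed service
# call requeues the message with an increased retry count; once the retry
# threshold is exceeded the message is dropped.
def forward(message, call_service, retry_limit):
    retries = 0
    while True:
        if call_service(message):
            return "delivered"
        retries += 1                    # aborted transaction requeues, count up
        if retries > retry_limit:
            return "dropped"            # threshold exceeded: drop the message

calls = []
def always_fail(msg):
    calls.append(msg)
    return False

print(forward("intvl", always_fail, retry_limit=2), len(calls))  # dropped 3
```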
1. Data Monitors include tis_server, tiseg_server, ticron_server, tin_server, and WaveGet_server.
2. The IDC does not use the request-based interval-creation feature of tiseg_server.

Chapter 5: Requirements

This chapter describes the requirements of the DACS and includes the following topics:

■ Introduction
■ General Requirements
■ Functional Requirements
■ CSCI External Interface Requirements
■ CSCI Internal Data Requirements
■ System Requirements
■ Requirements Traceability

INTRODUCTION

The requirements of the DACS can be categorized as general, functional, or system requirements. General requirements are nonfunctional aspects of the DACS. These requirements express goals, design objectives, and similar constraints that are qualitative properties of the system; the degree to which these requirements are actually met can only be judged qualitatively. Functional requirements describe what the DACS is to do and how it is to do it. System requirements pertain to general constraints, such as compatibility with other IDC subsystems, use of recognized standards for formats and protocols, and incorporation of standard subprogram libraries.

GENERAL REQUIREMENTS

The DACS capabilities derive
TABLE 13: TRACEABILITY OF FUNCTIONAL REQUIREMENTS, RELIABILITY (CONTINUED)

Requirement: The DACS shall execute Automatic Processing programs exactly once for each data element. A program execution is a transaction consisting of start, run, and exit. If the transaction aborts before completion of the exit, the DACS shall retry the transaction a limited, configurable number of times.
How Fulfilled: This requirement is fulfilled by the DACS TMQFORWARD server and transaction.

Requirement: The DACS shall function as a system in the event of defined hardware and software failures. The failure model used by the DACS is given in Table 7. For failures within the model, the DACS shall mask and attempt to repair the failures. Failure masking means that any process depending upon the services of the DACS (primarily the Automatic and Interactive Processing software) remains unaffected by failures, other than to notice a time delay for responses from the failed process. Failures outside the failure model may lead to undefined behavior; for example, a faulty ethernet card is undetectable and unrepairable by software.
How Fulfilled: This requirement is fulfilled via the DACS ability to survive most failure conditions, as discussed previously.

Requirement: The DACS shall detect failures and respond to failures within specified time limits. The time limits are given in Table 7.
How Fulfilled: This requirement is fulfilled via the DACS ability to survive most failure conditions, as discussed previously.
Partial intervals in the interval table are completed (updated to queued) when the minimum number of missing channels can be found within a user-specified elapsed-time period.

[Figure 20, the tiseg_server data-flow diagram, is not reproduced here. It shows the main driver reading user parameters; rescheduling and updating the timestamp via scheduler; applying the partial-interval check to existing intervals; sorting the wfdisc by station and checking for the monitor channel; inserting new partial intervals, updating partial intervals, and checking minimum channels for full intervals; and writing and sending intervals as one transaction toward DFX.]

FIGURE 20. TISEG_SERVER DATA FLOW

ticron_server

ticron_server creates fixed-length intervals of the class based on a fixed elapsed-time setback prior to the current real time. Created intervals are inserted into the interval table.
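The fixed-length, fixed-setback interval described for ticron_server reduces to simple arithmetic. The function name and the epoch-seconds representation are assumptions for illustration.

```python
# Sketch of the ticron_server interval arithmetic described above: the
# interval ends a fixed elapsed-time setback before "now" and has a fixed
# length. Times are epoch seconds.
def next_ticron_interval(now, length, setback):
    endtime = now - setback          # hold back from real time by the setback
    return (endtime - length, endtime)

print(next_ticron_interval(now=1000.0, length=60.0, setback=600.0))
# (340.0, 400.0)
```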
visible to the rest of the system until the global transaction is resolved by either commit or rollback, after which only one queue element will remain in the state queue. If scheduler returns success to TMQFORWARD (step 3) following success of the updated requeue (step 7), TMQFORWARD will commit the global transaction. This commit operation results in the commit of the original dequeue operation (step 2), after commits for the command(s) dequeued (step 4), any results enqueued (step 5), and the enqueue of the updated state (step 7). Otherwise, if scheduler returns fail to TMQFORWARD (step 3), TMQFORWARD rolls back the global transaction. This rollback operation negates all queuing operations, including any dequeues from the command queue (step 4), enqueues to the result queue (step 5), requeues to the state queue (step 7), and the original dequeue from the state queue (step 2).

Prior to scheduler returning success to TMQFORWARD and the final transaction commit (step 7), data monitor servers are called for all services that are at or past their scheduled time (step 6). The data monitor service call is asynchronous and cannot be rolled back; therefore, it is not considered part of the global transaction. In practice, this limitation does not present a problem, because the function of the scheduling system is to call the data monitor servers on schedule. Failure of a data monitor service is outside the scope of the scheduling system design.
with other IDC systems 34
machines viii
operational modes 127
operator interface 35
data flow symbols v
data monitors viii
  ticron_server 63, 64, 67
  tiseg_server 61
data monitor servers 54
DBBL 42
dbserver 31, 32, 51, 89, 91
  control 92
  error states 93
  I/O 91
  interfaces 92
dequeue viii
distinguished bulletin board 43
distributed processing 8, 23
distribution objectives 24
dman 100
  control 109
  error states 109
  I/O 105
  interfaces 109

E
enqueue viii
entity-relationship symbols vi

F
forwarding agent viii, 23
functional requirements 128

G
generalized processing server (tuxshell) viii
general requirements 126
  traceability 144

H
hardware requirements 11
host 21
instance

I
Interactive Processing 6, 32
  conceptual data flow 16
interval ix
interval_router 30, 32, 49, 90
  control 92
  error states 93
  I/O 91
  interfaces 92
IPC resources 45

L
lastid 27, 121, 122
libgdi 119
libipc 19, 100
  control 109
  error states 109
  I/O 105
  interfaces 109
libraries, global 18
listener daemons (tlisten and tagent) 38
LMID ix
load balancing 24
load limitation 24
log files 20

M
Master ix
message ix
message passing
  requirements 129
  traceability 150
message queue ix, 21, 45
middleware 7
minimization of network traffic 24
msg_window 110
  control 118
  error states 119
  I/O 115
  interfaces 118
Interactive Processing: full interactive processing.
How Fulfilled: For Automatic Processing, the DACS provides extensive support for scaling the number of machines, servers, and services, as well as which of these resources are active at any given time. Slow motion can be effected via tuxpad by deactivating or shutting down a class of servers (for example, network processing) or by reducing the number of a particular type of server (for example, reducing the number of DFX instances). In addition, the tuxpad schedule_it script can be used to stall data monitor instances, to eliminate or reduce the creation of new pipeline-processing sequences. For Interactive Processing, this requirement is fulfilled the same as above, although this processing mode is not generally applicable to Interactive Processing.

Operational Mode: rewind
Automatic Processing: full automatic processing after resetting the database to an earlier time.
Interactive Processing: full interactive processing.
How Fulfilled: For Automatic Processing, the rewind processing mode requires an operator to delete intervals in the interval table, or set them to state skipped where applicable, so that data monitor servers will completely reprocess a time period of data. For Interactive Processing, this mode is not applicable as far as the DACS is concerned; Repeated Event Review is controlled by the analyst.
interval_router receives input from user parameters and data monitor application servers. The user parameters define the mapping between interval name (same as station or sensor name) and the target Tuxedo queue name, as well as the name of the qspace to which the messages will be routed. A data monitor server such as tis_server can optionally rely upon interval_router for enqueuing new intervals into Tuxedo queues. A tis_server sends interval_router the interval IPC message, and interval_router performs the enqueue operation as a function of the interval name. The interval name is extracted from the interval message by the Tuxedo FML32 library, which provides an API interface for reading from and writing to Tuxedo IPC messages. The interval message source and destination fields are set by interval_router to conform with the DACS interval-message format standard (see libipc, below, for details). interval_router then attempts to map the interval name to the target queue as defined by the user parameters. interval_router returns a success or failure service-call return value to the calling tis_server, depending on the status of the mapping and/or enqueue operation. interval_router logs all routing progress to the user-defined log file.

recycler_server

recycler_server receives input from user parameters and a TMQFORWARD server. The user parameters define the name of the qspace.
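The name-to-queue mapping performed by interval_router can be sketched as follows. The routing table, field names, and queue names are hypothetical; the real server reads the interval name from the message via the FML32 API and enqueues into a Tuxedo qspace.

```python
# Sketch of the interval_router behavior described above: extract the
# interval (station/sensor) name from the message, map it through the
# user-parameter table to a target queue, enqueue, and report the result
# to the calling data monitor server.
ROUTES = {"STA01": "detection-queue", "STA02": "aux-queue"}  # assumed params

def route_interval(message, routes, enqueue):
    name = message["name"]           # interval name carried in the message
    queue = routes.get(name)
    if queue is None:
        return "FAILURE"             # mapping failed: caller sees failure
    enqueue(queue, message)          # perform the enqueue for the caller
    return "SUCCESS"

sent = []
status = route_interval({"name": "STA01"}, ROUTES,
                        lambda q, m: sent.append((q, m["name"])))
print(status, sent)   # SUCCESS [('detection-queue', 'STA01')]
```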
…workflow status, and the message passing service. — …tuxpad and dman for the DACS clients and servers.

29. The DACS shall monitor the status of each computer on the network, and the status of all computers shall be visible on the operator's console, current to within 30 seconds. — This requirement is fulfilled in the DACS through Tuxedo, WorkFlow, tuxpad, and dman for DACS clients and servers.

TABLE 12: TRACEABILITY OF FUNCTIONAL REQUIREMENTS, SYSTEM MONITORING (CONTINUED)

Requirement — How Fulfilled

30. The DACS shall provide an interface to indicate the run-time status of all processes relevant to Automatic Processing and Interactive Processing. This set of processes includes database servers and DACS components. — This requirement is fulfilled in the DACS through tuxpad and dman, but the database server is not monitored because it is not a DACS process.

30.1. The DACS shall provide a display indicating the last completed automatic processing step for each interval within the workflow management. — This requirement is fulfilled by the WorkFlow application.

30.2. The same display shall provide a summary that indicates the processing sequence completion times for all intervals available to Interactive Processing that are more recent than the last data migration
…shall migrate all processes on that workstation to other workstations. The time allowed for migration shall be the upper run-time limit for the Automatic Processing programs; in other words, running programs shall be allowed to complete before the migration occurs. — How Fulfilled: Run-time host and server migration is supported by the DACS and is accessible via tuxpad. Run-time addition of a workstation is supported if the workstation was defined in the ubbconfig file. Presumably the workstation is defined but is dormant until an operator decides to migrate or initiate processing on the machine. Unconfigured workstations cannot be added during run time; Tuxedo supports this feature, but the DACS does not currently use it.

The DACS control interface shall allow run-time reconfiguration of the DACS programs. Reconfiguration shall allow an increase, decrease, or migration of Automatic Processing programs. — Run-time server migration is supported by the DACS and is accessible via tuxpad.

The DACS control interface shall allow access to the availability manager for starting or stopping individual DACS and Automatic Processing programs. — This requirement is fulfilled via tuxpad.

The DACS control interface shall allow manual processing and reprocessing of data elements through their respective sequences. — This requirement is fulfilled via the interval reprocessing feature of WorkFlow, which is based
…shall maintain, start, and restart a population of automated and interactive processes equal to the number supplied in the DACS configuration file. The DACS shall also monitor its internal components and maintain them as necessary. — Complete process monitoring, including boot and shutdown of all configured processes, as well as monitoring and restart of all configured processes, is provided by the DACS via Tuxedo.

The DACS shall start and manage processes upon messages being sent to a named service. If too few automated processes are active with the name of the requested service, the DACS shall start additional processes (up to a limit) that have been configured to provide that service. If an interactive process is not active, the DACS shall start a single instance of the application when a message is sent to that application. — For Automatic Processing, the Tuxedo DACS generally starts servers and keeps them running, so server startup upon message send is not typically required. However, server scaling is supported, wherein the number of active servers advertising a given service name can increase as the number of queued messages increases. For Interactive Processing, the dman client supports demand execution, which starts a single application instance upon a message send if the application is not already running.

The DACS shall be fully operational in stop mode within 10 minutes of network boot. — For Automatic Processing
…will not result in anything more than rolling back state and retrying until the problem is fixed. The replicated, fault-tolerant design of the scheduling system allows for continued successful system scheduling during n-1 scheduler_server failures when n replicated servers are configured. schedclient is relatively simple and may only fail to submit commands to the schedule command queue if the Tuxedo queuing system is unavailable or has failed. Notice of such failures is immediate, and failures are reported to the user via the controlling environment, be it a command shell or the tuxpad GUI message window.

tuxshell

IDC Automatic Processing applications such as DFX and GA are not DACS servers or clients. Rather, they are child processes of the generalized processing server tuxshell. tuxshell satisfies the system requirements for support of basic but reliable pipeline process sequencing. Pipeline process sequencing requires application software execution and management within a transactional context. tuxshell performs the following functions as a transaction when called by a TMQFORWARD or another tuxshell (Figure 25): 1. Receive the message that was dequeued from the source queue by the TMQFORWARD that is upstream in the
…real time. SEL3 (Standard Event List 3): bulletin created by totally automatic analysis of both continuous data and segments of data specifically downloaded from stations of the auxiliary seismic network. Typically the list runs 12 hours behind real time. server: Software module that accepts requests from clients and other servers and returns replies. server group: Set of servers that have been assigned a common GROUPNO parameter in the ubbconfig file. All servers in one server group must run on the same logical machine (LMID). Servers in a group often advertise equivalent or logically related services. service: Action performed by an application server. The server is said to be advertising that service. A server may advertise several services (multiple personalities), and several servers may advertise the same service (replicated servers). shutdown: Action of terminating a server process as a memory-resident task. Shutting down the whole application is equivalent to terminating all specified server processes, admin servers first, application servers second, in the reverse order that they were booted. Solaris: Name of the operating system used on Sun Microsystems hardware. SRVID: Server identifier; an integer between 1 and 29999 uniquely referring to a particular server. The SRVID is used in the ubbconfig file and with Tuxedo administra
…a single machine crash can be directed to a system-wide error queue, from where they are automatically recycled back into service by the automatic reprocessing server (process 11). The system operator can control the DACS via the GUI-based operator console (process 8). Control includes complete DACS bootup or shutdown; boot and shutdown on a machine, process group, or process (server) basis; control of the DACS scheduling system; and monitoring of Tuxedo queues. The system operator can also manually reprocess failed intervals via a feature of the workflow monitoring system (process 10). Figure 7 shows the conceptual data flow of the DACS for Interactive Processing, using as an example a request for frequency-wavenumber (Fk) analysis of a signal. Here the DACS supports the asynchronous messaging between Interactive Tools, manages the interactive session by monitoring messages within the session, and starts the Interactive Tools on demand. All messages exchanged between the Interactive Tools pass through Tuxedo disk queues. Storing messages within a disk-based Tuxedo queue ensures that the messaging is asynchronous because the message send and receive are part of separate queuing operations and transactions. Asynchronous messaging allows for one Interactive Tool (process 1) to send a message
…schedule_it, and qinfo. The GUI presents the messages in a scrolling window that can be cleared via a button press. The total number of buffered messages is also displayed. msg_window is designed around a UNIX tail command that is issued on the tuxpad temporary logging file created by tuxpad (process 4 in Figure 29). tuxpad redirects standard output and standard error to the temporary file so that all output by tuxpad, and by any other program or script that is started by tuxpad (for example, schedule_it), is captured and displayed. msg_window is started by a tuxpad button and is intended to run via tuxpad. qinfo provides a GUI to display the state of a Tuxedo qspace. The script is a convenient front end to the Tuxedo qmadmin queue administration utility (Figure 30). qinfo runs qmadmin on the specified QHOST. The QHOST can be reset within tuxpad so that the backup qspace can also be monitored via a separate qinfo instance. qinfo dynamically updates the display at a user-defined interval, presenting colored bars to show the number of messages in each queue. qinfo issues the qmadmin commands and parses command output to open the qspace (command qopen) and to obtain the name and number of messages queued in every queue in the qspace (command qinfo). The qspace and queues that are monitored by qinfo are defined by user parameters, where each queue name to be monitored is specified along with the color to use for the message queue length graph.
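qinfo's parse step, extracting per-queue message counts from captured administration-utility output, can be illustrated roughly as below. The sample output format is invented for the example (real qmadmin output is laid out differently), so treat this purely as a sketch of the parse-then-graph idea.

```python
# Hypothetical sketch of qinfo-style parsing: extract per-queue message
# counts from captured utility output. The sample text is invented; real
# qmadmin output has a different layout.

def parse_queue_depths(text):
    """Return {queue_name: message_count} for lines shaped 'name count'."""
    depths = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            depths[parts[0]] = int(parts[1])
    return depths

sample = """DFX_ARCES 12
GAassoc_sel1 0
tisQ 3"""

depths = parse_queue_depths(sample)
```

A display layer would then map each count to a colored bar, refreshing at the user-defined interval.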
…line. The ProcessInterval and SendMessage interfaces, and the interaction between each other and WorkFlow, are described in the previous sections.

Error States

WorkFlow errors and failures can occur at program initialization or during program execution. The most typical error state is invalid or incomplete user parameters. User parameters define the time interval classes, state-to-color mappings, and interval reprocessing commands, as well as database account and query information. Incorrect database parameters usually result in WorkFlow termination. Incomplete color state specification can result in program termination or unexpected and confusing color mappings. Insufficient color map availability is a common error state whereby WorkFlow will not even start. WorkFlow produces relevant error messages to direct the user to a solution. Runtime WorkFlow errors are most typically associated with a database server failure where, for example, the server may go away for a period of time. WorkFlow has been designed to survive a database server outage via recurring attempts to reconnect to the database server and resume normal continuous monitoring. ProcessInterval errors are probably due to invalid user parameters, which should become apparent via error messages provided to the WorkFlow GUI message window. Sen
…failure. — How Fulfilled: This requirement is fulfilled by tuxshell.

The DACS shall interface with an operator or operators. The DACS shall provide monitoring displays and control interfaces. The monitoring displays shall provide system monitoring for computer status, process status, workflow status, and the message passing service. The information presented with each monitoring display is specified in "System Monitoring" on page 133. The control interface shall enable the operator to take actions on the DACS. The control interface supports the functions listed in the following subparagraphs. — This requirement is fulfilled by the tuxpad scripts, WorkFlow, and the dman client.

The DACS control interface shall allow selection from among the automatic processing modes listed in Table 6 on page 127. — This requirement is fulfilled by the tuxpad scripts tuxpad and schedule_it. The processing modes are defined in requirements 1-7.

TABLE 14: TRACEABILITY OF CSCI EXTERNAL INTERFACE REQUIREMENTS (CONTINUED)

Requirement — How Fulfilled

The DACS control interface shall allow run-time reconfiguration of the host computer network. Reconfiguration may take the form of added, deleted, or upgraded workstations. The DACS shall allow an operator to dynamically identify the available workstations. When a workstation is removed from service, the DACS shall
…usually stored in files, which control the behavior of applications at run time. configuration item: Aggregation of hardware, software, or both, treated as a single entity in the configuration management process. control flow: Sequence in which operations are performed during the execution of a computer program. COTS: Commercial Off-the-Shelf; terminology that designates products, such as hardware or software, that can be acquired from existing inventory and used without modification. crash: Sudden and complete failure of a computer system or component. CSC: Computer Software Component. CSCI: Computer Software Configuration Item. DACS: Distributed Application Control System. This software supports inter-application message passing and process management. DACS machines: Machines on a Local Area Network (LAN) that are explicitly named in the MACHINES and NETWORK sections of the ubbconfig file. Each machine is given a logical reference (see LMID) to associate with its physical name. daemon: Executable program that runs continuously without operator intervention. Usually the system starts daemons during initialization. Example: cron. data flow: Sequence in which data are transferred, used, and transformed during the execution of a computer program. data monitors: Class of application servers that monitor data streams an
…mated processing application command line. For messages returned to an Interactive Tool from tuxshell, MSGDATA stores a success or fail code string that represents the status of the automated processing application. For messages within Interactive Processing, MSGDATA stores string-based IPC messages relevant to the sender and receiver Interactive Tools. These IPC messages may include algorithm parameters, database account and table names, file path names, scheme code, and so on.

6. MSGDATA2 (N/A): This field stores interval priority assigned by a DACS data monitor. DACS queuing optionally supports out-of-order dequeuing (for example, via TMQFORWARD) based upon interval priority. The data monitor server tis_server can enqueue new intervals such that more recent or current data are processed before older or late-arriving data.

7. MSGDATA3 (N/A): This field stores application processing time-out failcounts, which are managed by tuxshell.

8. MSGDATA4 (N/A): This field is reserved for future use.

9. MSGDATA5 (N/A): This field is reserved for future use.

10. FAILCOUNT (N/A): This field stores application processing failcounts, which are managed by tuxshell.

Input/Processing/Output

Inputs and outputs within libipc are largely based upon the details, or call semant
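The field roles above can be pictured as a simple record. This dataclass is only an illustration of those roles (field names follow the table); the real messages are Tuxedo FML32 buffers, and the priority-dequeue helper is a hypothetical stand-in for TMQFORWARD's out-of-order dequeuing.

```python
from dataclasses import dataclass, field

# Illustrative record mirroring the DACS message fields described above.
# The real messages are Tuxedo FML32 buffers; this is only a sketch.

@dataclass
class DacsMessage:
    msgdata: str          # command line / status string / IPC payload
    msgdata2: int = 0     # interval priority (enables out-of-order dequeue)
    msgdata3: int = 0     # time-out failcount, managed by tuxshell
    msgdata4: str = ""    # reserved for future use
    msgdata5: str = ""    # reserved for future use
    failcount: int = 0    # application failcount, managed by tuxshell

def dequeue_order(messages):
    """Priority-dequeue sketch: higher msgdata2 (more recent data) first."""
    return sorted(messages, key=lambda m: -m.msgdata2)

msgs = [DacsMessage("old", msgdata2=1), DacsMessage("new", msgdata2=5)]
ordered = dequeue_order(msgs)
```

This shows why a data monitor can enqueue recent intervals so they are processed before late-arriving data.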
…increment the time-out retry count; or, if the time-out retry count has been exceeded, place the message into the failed queue and update interval state to timeout-xxx. Go to sleep and await the next service call.

The preceding list is applicable to tuxshell for Automatic Processing. For Interactive Processing, database operations are absent (in other words, no interval table updates), and an additional reply message (success or failure) is sent to the sender (for example, ARS), the value of which is equal to the return code of the child. tuxshell works in a transactional mode. tuxshell rolls back any changes to the queues and the interval (request) table if some error other than a failure of the application program occurs. Application program failures, both orderly ones with non-zero return codes and ungraceful terminations, are handled through the retry/failed mechanism described previously. However, child processes access the database independently and not through the DACS, so they are responsible for ensuring the rollback upon abnormal termination or time-out.

[Figure 25: TUXSHELL DATA FLOW. Inputs: user parameters and the IPC message forwarded by TMQFORWARD or another tuxshell; the Main Driver parses and extracts key values and performs database interval operations.]
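The retry accounting described above can be sketched as follows. The retry limit, queue names, and state strings are assumptions for illustration; in the real system this bookkeeping lives in the MSGDATA3/FAILCOUNT message fields and runs inside a Tuxedo transaction.

```python
# Hedged sketch of tuxshell-style time-out accounting: on each time-out,
# bump the count and either retry (requeue) or give up (failed queue).
# The retry limit and state names are illustrative assumptions.

RETRY_LIMIT = 3

def handle_timeout(msg, retry_limit=RETRY_LIMIT):
    """Return (action, new_state) for a timed-out application run."""
    msg["timeout_count"] = msg.get("timeout_count", 0) + 1
    if msg["timeout_count"] > retry_limit:
        return "failed_queue", "timeout-failed"   # operator must reprocess
    return "retry", "timeout-retry"

m = {"interval": "example-interval"}
actions = [handle_timeout(m)[0] for _ in range(4)]
```

After the limit is exceeded the message lands in the failed queue, matching the "update interval state and await the next call" step above.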
…computer type.

TABLE 16: TRACEABILITY OF SYSTEM REQUIREMENTS (CONTINUED)

Requirement — How Fulfilled

Only authorized users shall be allowed to initiate processing. Unauthorized requests shall be rejected and logged. The DACS shall require passwords from authorized users at login. — Administration of the DACS, typically carried out through tuxpad, is limited to the Tuxedo user or the user that owns the DACS processes defined in the ubbconfig file. Password authentication is implicitly handled by the operating system. The DACS has not implemented any authentication specific to the CSCI; Tuxedo offers various options to do so if needed.

The DACS shall operate in the IDC environment. — Fulfilled.

The DACS shall operate in the same hardware environment as the IDC. — Fulfilled.

The DACS requires extensive database queries to detect new wfdisc records. These queries will impact the database server. Otherwise, the DACS shall consume negligible hardware resources. — This requirement has been fulfilled. Even though the Tuxedo-based DACS manifests in a large number of processes spread across the LAN, the processes consume a relatively small amount of computing resources. The expense of the wfdisc queries has been partially mitigated through the introduction of database t
…in "Reliability" on page 134.

Reliability

Reliability in the context of the DACS refers primarily to the integrity of the workflow management and message passing, and secondarily to the continued but perhaps limited operation of the DACS during system failures. The DACS is one of the primary providers of computing reliability in the IDC System. The integrity of the DACS guarantees that messages are delivered exactly once and Automatic Processing is invoked exactly once for each data element. Messages and data sequences are preserved across system failures. When forced to choose, the DACS takes the conservative approach of preserving data at the expense of timely responses. The DACS provides continued operation in the event of defined system failures. The DACS operation may be interrupted briefly as replacement components are restarted, possibly on other computers. The DACS monitors and restarts both internal components and Automatic Processing programs. Interactive programs are not restarted because it is not known whether the user intentionally terminated a program.

36. The DACS shall deliver each message exactly once after the successful posting of the message by the sending process.

37. The DACS shall execute Automatic Processing progr
…in monitoring these intervals, for example, via the WorkFlow display.

Control

Tuxedo controls the startup and shutdown of tuxshell because tuxshell is a Tuxedo application server. However, tuxshell can also be manually shut down and booted by the operator. Tuxedo actually handles all process execution and termination. Tuxedo also monitors tuxshell servers and provides automatic restart upon any unplanned server termination.

Interfaces

Operators use the Tuxedo command-line administration utilities, directly or indirectly (by tuxpad), to manually boot and shut down tuxshell.

Error States

tuxshell can fail during startup if the user parameter file is non-existent or contains invalid settings. Startup errors are recorded in the local Tuxedo ULOG file of the machine hosting the failed tuxshell server. tuxshell error handling of the application (server child) process is fairly extensive and is described in "Input/Processing/Output" on page 78. tuxshell servers benefit from server replication, wherein a given tuxshell instance can be replicated across more than one machine. In this scenario, recovery from any server or machine failure is seamless because the replicated tuxshell server takes over processing. Tuxedo recovers the program crash by automatically restarting the server.

dbserver
…programs on each machine and using the underlying operating system to maintain contact. The UNIX operating system contains some tools for distributed command execution (the suite of remote commands: rsh, rusers, rcp), but these lack the extended functionality necessary to support a highly available automatic application. In particular, these tools intrinsically do not support process monitoring, process and resource replication and migration, and transactions, which are all important elements in a highly available and fault-tolerant distributed application. Figure 5 shows how the DACS controls the application software that is running on several machines in a distributed fashion. The individual instances of the DACS coordinate among themselves using features of the underlying operating system and the LAN connecting the machines. The DACS provides UNIX process management, failure retries, controlled startup and shutdown, priority processing, run-time reconfiguration, a monitoring interface, and fault-tolerant operations. All of these functions are supported across a distributed computing platform.

[Figure 5: computer1, computer2, and computer3 each run application software on their own operating systems, coordinated by the DACS over the LAN.]
…in the interval and request database tables. Workflow monitoring is primarily a read-only operation. However, failed intervals can be reprocessed under operator control (process 10 in Figure 12 on page 29). The interval reprocessing function is implemented by the SendMessage client and ProcessInterval script, which collectively change the state (in the database) of the interval being reprocessed and requeue the interval message to the source queue. These operations manually initiate automatic processing on the interval.

Automatic Processing Utilities

Elements of scalability and reliability in the DACS are provided by several Automatic Processing utilities. Two of these utilities have been described above: dbserver updates the database for all interval state or request state updates within the DACS (process 7 in Figure 12 on page 29), and interval_router (process 3 in Figure 12 on page 29) routes interval messages created by the data monitor servers to a set of queues as a function of the interval name. System errors, such as a machine crash or network failure, can and do result in messages that cannot be reliably delivered within the distributed processing system. The DACS message passing is based on Tuxedo disk queues, which safeguard against the loss of messages during system failures. Queue ope
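The reprocessing step, changing the interval's state and requeuing its message as one all-or-nothing operation, can be sketched as follows. The state names and the dict/list stand-ins for the database row and the Tuxedo disk queue are assumptions; the real SendMessage/ProcessInterval pair works against Oracle and Tuxedo /Q.

```python
# Sketch of the SendMessage/ProcessInterval reprocessing idea: reset the
# interval's state and requeue its message to the source queue together.
# A dict and a list stand in for the database row and the disk queue.

def reprocess_interval(db, queue, interval_id):
    """Requeue a failed interval; roll back the state change on failure."""
    row = db.get(interval_id)
    if row is None or row["state"] != "failed":
        return False                  # only failed intervals are eligible
    old_state = row["state"]
    row["state"] = "queued"           # database state change...
    try:
        queue.append({"interval": interval_id})   # ...and requeue together
    except Exception:
        row["state"] = old_state      # undo on enqueue failure
        return False
    return True

db = {"example/0600": {"state": "failed"}}
q = []
ok = reprocess_interval(db, q, "example/0600")
```

Requeuing to the source queue is what makes the normal automatic pipeline pick the interval up again.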
…and libtable. The software of the DACS is linked to a number of standard system libraries, the bulk of which are required for X11 Window GUI-based applications such as WorkFlow. The software is also linked to several Oracle COTS libraries, indirectly through run-time linking by libgdi. The software is linked to the following Tuxedo COTS libraries: libbuft, libfml, libfml32, libgp, libtux, and libtux2.

Database

See "Database Schema Overview" on page 27 for a description of database tables and their usage by the DACS.

Interprocess Communication (IPC)

By its very nature of being a distributed processing system, the DACS uses and implements various types of IPC and IPC resources. All Tuxedo queuing operations are a form of IPC message passing across machines. Tuxedo provides the BRIDGE server, which runs on each distributed machine in the DACS and provides a single point for all Tuxedo-based distributed message sends and message receives. The libipc messaging library implements a message-passing API based upon Tuxedo queuing. The Tuxedo system makes extensive use of the UNIX system IPC resources, including shared memory, message queues (memory based), and semaphores. Finally, the DACS relies upon the ORACLE database for another type of IPC via creation, update, and read of the interval, request, and timestamp
…nents of the operation, for simplicity. A Reply Queue feature is provided by Tuxedo but is not exploited for building pipelines in the IDC application; instead, the processing server places messages directly in the next queue of the processing sequence (queue B in Figure 9, not shown in Figure 10).

[Figure 10: FORWARDING AGENT. A request passes from the requester through the forwarding agent to the processing server; the response/reply is returned.]

Distribution and Backup Concept

Even with multiprocessor machines, no single computer within the IDC has the capacity to run the entire IDC software. Therefore, the application must use several physical machines. Moreover, the number of data sources exceeds the number of available processors by an order of magnitude, and processing the data from a single source requires substantial computing resources. This combination suggests a queueing system to distribute the processing load over both space and time. The constraints imposed by the computer resources lead to the design of the IDC software as a distributed application with message queues. Processing is divided into a number of elementary services. These services are provided by server programs, which run on a number of machines under the control of
…ing the text file ubbconfig using the command tmloadcf. The syntax is checked before the compilation. At boot time, the tuxconfig binary file is then automatically propagated to all machines in the application. The current state of the configuration of the application can be observed using the command tmunloadcf or with the tuxpad GUI.

User Logs

All Tuxedo processes write routine messages, warnings, and error messages to ASCII user log files (ULOG.mmddyy, with mmddyy representing month, day, and year). The log files are kept on a local disk partition for each machine to avoid losing logs or delaying processing due to network problems.

Transaction Logs

Tuxedo tracks all currently open transactions on all machines by recording transaction states in tlog files. Consequently, open transactions are not lost even if a machine crashes. The tlog files are binary and have the internal structure of a Tuxedo device.

Queue Spaces and Queues

The DACS uses the Tuxedo queuing system to store processing requests that have been issued (for example, by a data monitor) but have not yet been executed. These process requests are stored as messages in disk queues. Each queue holds requests for a certain service (for example, GAassoc_sel1 or DFX_recall), where the service name matches the queue name. A qu
…communicating concurrently. — This requirement is fulfilled via libipc messaging, which is based on the Tuxedo reliable queuing service.

Ordered: messages are delivered in the order they were sent. — This requirement is fulfilled via libipc (FIFO) messaging, which is based on the Tuxedo reliable queuing service.

Scoped: messages sent and received by one interactive user are not crossed with messages sent and received by another user. — This requirement is fulfilled via libipc messaging, which is based on the Tuxedo reliable queuing service. Message scoping is supported via queue names that are scoped to application name and session number. Multiple analysts running on a single machine would have to run in their own sessions. In general, the operational model is one analyst per machine, and it is up to analysts to manage their own sessions within a single machine.

TABLE 10: TRACEABILITY OF FUNCTIONAL REQUIREMENTS, MESSAGE PASSING (CONTINUED)

Requirement — How Fulfilled

175. Point-to-point: there is a single sender and a single receiver for each message. The DACS need not support broadcast or multicast. — All messaging is point-to-point, but with the required asynchronous delivery, wherein the Tuxedo queuing system is
…not shown in Figure 13, but this function was described above as the generalized processing server tuxshell (processes 6 and 9 in Figure 12 on page 29).

3. The stand-alone configuration is a system configuration decision based largely upon the notion of one analyst, one machine. The DACS for Interactive Processing could be distributed over a set of workstations through configuration changes.

[Figure 13: DATA FLOW OF THE DACS FOR INTERACTIVE PROCESSING. An analyst's Interactive Tool issues an interactive FK IPC request; the interactive session manager monitors the Tuxedo queues and starts the FK client on demand; FK computation and image display (XfkDisplay) return results via Tuxedo queues, with transactions, process monitoring, and events.]

INTERFACE DESIGN

This section describes the DACS interface with other IDC systems, external users, and operators.

Interface with Other IDC Systems

The DACS controls Automatic Processing by initiating and managing pipeline processing sequences. The DACS relies upon the Continuous Data Subsystem to acquire new sensor data so that new processing time intervals can be generated.
…nterval.m and birdie.m; tuxpad; operate_admin; schedule_it; qinfo; and msg_window. The following paragraphs describe the design of these units, including any constraints or unusual features in the design. The logic of the software and any applicable procedural commands are also provided.

Data Monitor Servers

The DACS data monitor servers satisfy system requirements to monitor data availability and to initiate automated pipeline processing as the availability criteria are met (Figure 16). The data monitor servers (tis_server, tiseg_server, ticron_server, tin_server, and WaveGet_server) share the following general design features:

- Initiate a processing cycle when called by scheduler.
- Apply the availability criteria using the database, and create or update data intervals, inserting or updating rows in the interval or request table, depending on the availability and timeliness of the data being assessed.
- Enqueue a message into a Tuxedo queue for (1) each new interval created with state queued and (2) each existing interval for which the state is updated from skipped to queued, to initiate processing of an automated pipeline.
- Return an acknowledgment of completion of the processing cycle to scheduler by sending a SETTIME command to scheduler (perform an enqueue command to the schedule
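The shared cycle shape listed above can be summarized in a compact sketch. The availability check and the queue are stubs standing in for the database queries and Tuxedo enqueues, and all names are illustrative.

```python
# Illustrative shape of a data monitor processing cycle (see the list
# above). The availability check and enqueue are stubs; real monitors
# query the database and enqueue into Tuxedo disk queues.

def processing_cycle(candidates, available, queue):
    """Classify candidate intervals and enqueue the available ones."""
    states = {}
    for interval in candidates:
        if available(interval):
            states[interval] = "queued"
            queue.append(interval)        # initiates pipeline processing
        else:
            states[interval] = "skipped"  # may be re-queued on a later cycle
    return states   # caller then acknowledges scheduler (SETTIME)

q = []
states = processing_cycle(["a", "b"], lambda i: i == "a", q)
```

A skipped interval stays in the interval table so a later cycle (or an operator) can promote it to queued.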
…into the interval table and a Tuxedo queue to initiate network processing (Figure 21). The length of the intervals is nominally set to 20 minutes, but this parameter and other parameters are user configurable. Network processing is performed several times, at successively greater time delays from the current time, to produce the various bulletin products of the IDC. To maintain the delay in processing, a setback time is used. The bottom portion of Figure 19 on page 60 shows the setback criterion used by ticron_server (yellow bricks, new intervals; see "Setback Time" in Figure 19 on page 60). The effect of applying this criterion is that network processing in the SEL1, SEL2, and SEL3 pipelines maintains constant delays (currently 1 hour 20 min, 5 hours 20 min, and 11 hours 20 min, respectively) relative to the current time.

[Figure 21: TICRON_SERVER DATA FLOW. scheduler triggers the Main Driver; inputs: user parameters; database operations: update timestamp, determine next start time, break into multiple target-size intervals, compute intervals; write and send intervals in one transaction to the SEL1/2/3 queues; reschedule.]

tin_server
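The setback arithmetic can be checked with a small sketch. The interval length and the per-pipeline delays follow the figures quoted above; the function itself, and the idea of returning None while still inside the setback window, are assumptions for illustration.

```python
# Sketch of ticron_server-style setback arithmetic: the next interval may
# be created only once it ends at least `setback` seconds behind now.
# Delays follow the text (1 h 20 min, 5 h 20 min, 11 h 20 min).

SETBACKS = {"SEL1": 80 * 60, "SEL2": 320 * 60, "SEL3": 680 * 60}
INTERVAL_LEN = 20 * 60   # nominal 20-minute intervals (user configurable)

def next_interval(now, last_end, pipeline):
    """Return (start, end) of the next interval, or None if not yet due."""
    end = last_end + INTERVAL_LEN
    if end > now - SETBACKS[pipeline]:
        return None      # still inside the setback window; try later
    return last_end, end

iv = next_interval(now=100_000, last_end=90_000, pipeline="SEL1")
late = next_interval(now=100_000, last_end=95_000, pipeline="SEL1")
```

Because each call advances by one fixed-length interval behind a fixed setback, each pipeline's delay relative to the current time stays constant.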
number of overlapping channels, and percentage threshold, is defined by the user parameters.

[FIGURE 18: TIS_SERVER DATA FLOW — the main driver, called by scheduler, reads user parameters and the affiliation, interval, timestamp, and wfdisc tables via Db operations; checks data availability for candidate intervals (M); applies the coverage algorithm to produce skipped and created intervals (M2); and writes the intervals in one transaction, with a database commit on success or a database rollback on failure.]

The data coverage algorithm accumulates the number of seconds of overlapping channels for each station and then calculates a coverage percentage. The coverage percentage is compared to the user-specified threshold value, and if sufficient data are found, a new interval is created and stored in memory (M2 in Figure 18). The new interval state is set to "queued". A message containing information about the interval is enqueued into a Tuxedo queue that initiates pipeline processing. If the threshold is not exceeded, the interval state is set to "skipped" and the interval is not queued for processing. Figur
o configuration (binary UBB).

[FIGURE 29: TUXPAD DESIGN — the main driver, driven by user parameters (D1), parses tmunload output against the Tuxedo configuration (binary UBB, D2) and tmadmin stdout/stderr to update the status of machines, groups, and servers; displays and logs messages (tmp tuxpad pid files, D3); boots and shuts down the Tuxedo administrative servers (BSBRIDGE, BBL, BRIDGE, DBBL) via operate_admin (process 5); boots and shuts down machines, groups, and servers in the correct order via tmboot and tmshutdown (process 6); partitions and cleans the machine bulletin board via tmadmin pclean and bbclean (process 7); displays the queue graph via qinfo; and manages the DACS scheduler via schedule_it.]

operate_admin is a separate, or compartmentalized, tuxpad function that performs the shutdown (tmshutdown) and boot (tmboot) of the Tuxedo administrative servers (BSBRIDGE, BRIDGE, BBL, DBBL) for all Tuxedo DACS machines (process 5 in Figure 29). operate_admin boots the machines in the order in which they appear in the UBB configuration and shuts them down in the reverse order.

msg_window provides a GUI for the display of messages, warnings, and errors that are produced by tuxpad, schedu
o this server.

transaction — Set of operations that is treated as a unit. If one of the operations fails, the whole transaction is considered failed, and the system is rolled back to its pre-transaction processing state.

tuxpad — DACS client that provides a graphical user interface for common Tuxedo administrative services.

ubbconfig file — Human-readable file containing all of the Tuxedo configuration information for a single DACS application.

1. Tuxedo clients send and receive messages to and from a server, enqueue messages to a Tuxedo queue, or remove messages from a Tuxedo queue. 2. Tuxedo servers are booted and shut down by the DACS and may run on a remote machine. Servers may be supplied by the Tuxedo distribution (uppercase names) or by application programmers (lowercase names).

Chapter 1

Overview

This chapter provides a general overview of the DACS software and includes the following topics:

■ Introduction
■ Functionality
■ Identification
■ Status of Development
■ Background and History
■ Operating Environment

INTRODUCTION

The software of the IDC acquires time series and radionuclide data from stations of the International Monitoring System (IMS) and other locations. These data a
ocessing after one and only one successful run. The decision to introduce a reliable queuing system addresses many of the fault-tolerance requirements, because all processing is managed through reliable disk queues under transaction control. The DACS is designed around the Tuxedo distributed processing COTS product to satisfy the requirements to support automatic failover in the case of hardware and software failures.

The decision to use Tuxedo for the message-passing requirement for the Interactive Tools was based upon the preference for a unified distributed processing solution for both Automatic Processing and Interactive Processing. In addition, the Interactive Tools rely upon some limited access to Automatic Processing for on-the-fly signal processing; such a requirement further justifies a single, unified distributed processing solution. However, a Tuxedo implementation for Interactive Processing could be considered an overly heavyweight solution, because the features of the COTS product far surpass the fairly limited message-passing and interactive session management requirements.

Programming Language

Each software unit of the DACS is written in the C programming language unless otherwise noted in this document. The tuxpad script is implemented using the Perl scripting language.

Global Libraries

The software of the DACS is linked to the following shared development libraries: libaesir, libgdi, libipc, libpar, libstdtime, a
of DACS CSCs for Interactive Processing. The data flow among the various processes and the DACS is described in "Data Flow Model" on page 48. The messages exchanged between the Interactive Tools (all libipc messages) pass through Tuxedo disk queues. Storing messages within a disk-based Tuxedo queue ensures that the messaging is asynchronous, because the message send and receive are part of separate queuing operations and transactions. For example, under analyst control (a in Figure 15 on page 53), a message sent from ARS (process 2 in Figure 15 on page 53) intended for XfkDisplay is enqueued by libipc into the XfkDisplay queue. Asynchronous messaging allows for the possibility that XfkDisplay may not be currently running in the analyst's interactive session.

libipc uses Tuxedo-based events (memory-based broadcast messages) to signal dman for each message send or receive within the interactive session (processes 3 and 1 in Figure 15 on page 53). The Tuxedo server TMUSREVT (not shown in Figure 15) processes all user events for Tuxedo clients and servers. The event processing includes notification and delivery of a posted event (for example, from ARS) to all clients or servers that subscribe to the event or event type (for example, dman). dman tracks the processing status of all clients within the analyst's interactive session via libipc. dman executes XfkDisplay on demand if it is not already running (process 4 in Figure 15 on page 53). dman uses the proce
of failure. This single point of failure can be masked by migrating the scheduling queue server to an existing machine that is already a single point of failure, such as the database server or file-logging server. The rewind mode is also partially addressed by operator-assisted interval reprocessing by WorkFlow. Full automatic reprocessing could be provided by the WorkFlow reprocessing model by augmenting the existing scheme to support reprocessing of all intervals, or all intervals of a particular class, for a specified range of time. However, this feature would have to be consistent with the fact that application software must be able to repeat the processing steps. Furthermore, reprocessing is also subject to IDC policy decisions, particularly where intermediate or final processing results have been published or made available as IDC products.

TABLE 9: TRACEABILITY OF FUNCTIONAL REQUIREMENTS — AVAILABILITY MANAGEMENT

Requirement: The DACS shall be capable of starting and stopping any configured user-level process on any computer in the IDC LAN. The DACS shall provide an interface to an operator that accepts process control commands. A single operator interface shall allow process control across the network.
How Fulfilled: Any DACS process can be started or stopped by the operator using tuxpad or a Tuxedo administration utility.

The DACS shal
ommands exist to initialize (or re-initialize) the schedule service table (O1 in Figure 24 on page 79), add new services, delete existing services, stall services, unstall services, display the current schedule service table, and enable or disable the scheduler server's ability to call services. The schedule commands sent by schedclient are passed to the scheduler server via the tpacall Tuxedo API function for asynchronous service calls. The string-based commands are packed into a Tuxedo STRING buffer, which is interpreted by scheduler. The scheduler server does not return any data to schedclient, but with the "show" command, scheduler enqueues the service list in text form to the result queue (step 5 in Figure 24 on page 79). schedclient polls the result queue, waiting for the "show" command result (step b in Figure 24 on page 79). In practice, schedclient commands are handled by the schedule_it GUI, which is part of the tuxpad operator console (tuxpad in Figure 24 on page 79).

Error States

scheduler can fail during start-up if the user parameter file is non-existent or contains invalid settings. Start-up errors are recorded in the local Tuxedo ULOG file of the machine hosting the failed scheduler server. In general, the scheduling system is designed to continue operation during system failures, such as a Tuxedo queuing system error, which may only be transient in nature. Because the schedule state is stored in a reliable disk queue, failures wil
on of the entire distributed application configuration. These arrays are central to all supported tuxpad operations (M1–M4 in Figure 29 on page 112). The arrays are updated automatically on a user-specified interval or, more typically, on demand following operator selection of the refresh (R) GUI button. The arrays are updated through a parsing of the tmadmin command, which outputs the current state of the distributed application.

The current state of the DACS is returned and displayed on the tuxpad main display, with the presentation organized by user selection of the GUI-provided scrolled vertical lists of machines, groups, or servers. A color-coded number is displayed adjacent to each listed machine, group, or server. The number represents the number of elements running (number of machines, groups, and servers), where a value other than 0 or 1 is most relevant for servers, which can be configured to run many replicated copies. The color red denotes shut down, green denotes running, and yellow represents running where the number running is not the configured maximum.

tuxpad is designed to drive all operator tasks for system start-up and system maintenance. Initial system booting, system shut-down, and all intermediate machine, group, or server booting and shut-down are handled via tuxpad-driven tmboot and tmshutdown commands. The commands are built on the fly and target specific machines, groups, or servers selected by the user through the tuxpad GUI. A
ore analyst review. 2. Sequence of IDC processes, controlled by the DACS, that either produce a specific product (such as a Standard Event List) or perform a general task (such as station processing).

PS — Processing server.

Q

qspace — Set of message queues grouped under a logical name. The IDC application has a primary and a backup qspace. The primary qspace customarily resides on the machine with logical reference (LMID) QHOST.

R

real time — Actual time during which something takes place.

run — 1. Single, usually continuous, execution of a computer program. 2. To execute a computer program.

S

SAIC — Science Applications International Corporation.

Scheme — Dialect of the Lisp programming language that is used to configure some IDC software.

script — Small executable program written with UNIX and other related commands that does not need to be compiled.

SEL1 — Standard Event List 1. S/H/I bulletin created by totally automatic analysis of continuous timeseries data. Typically, the list runs one hour behind real time.

SEL2 — Standard Event List 2. S/H/I bulletin created by totally automatic analysis of both continuous data and segments of data specifically downloaded from stations of the auxiliary seismic network. Typically, the list runs five hours behind rea
pable of displaying all intervals currently managed by the workflow management.
How Fulfilled: This requirement is fulfilled by tuxpad, dman, qinfo, and WorkFlow.

Requirement: The DACS shall provide these displays simultaneously to 1 user, although efforts should be made to accommodate 10 additional users.
How Fulfilled: Any number of users logged in as the Tuxedo user can access tuxpad. Typically, dman would only be accessed by the analyst that is using the interactive session that dman is managing. WorkFlow can be viewed by any number of users.

Requirement: The DACS shall continue to function as a system monitor in the event of defined hardware and software failures. (The DACS reliability and continuous-operations requirements are described in "Reliability" on page 134.)
How Fulfilled: This requirement is fulfilled via the DACS's ability to survive most failure conditions, as discussed previously.

TABLE 13: TRACEABILITY OF FUNCTIONAL REQUIREMENTS — RELIABILITY

Requirement: The DACS shall deliver each message exactly once after the successful posting of the message by the sending process.
How Fulfilled: This requirement is fulfilled via the Tuxedo reliable queuing service, which uses transactions to ensure that each message is delivered only once.
parent. The message-passing service shall provide an API to the interactive processing programs. Each attribute is specified in the following subparagraphs.

17.1 Reliable — messages are not lost, and no spurious messages are created. A consequence of reliable messages is that the same message may be delivered more than once if a process reads a message, crashes, restarts, and then reads a message again.

17.2 Asynchronous — sending and receiving processes need not be running or communicating concurrently.

17.3 Ordered — messages are delivered in the order they were sent (FIFO).

17.4 Scoped — messages sent and received by one interactive user are not crossed with messages sent and received by another user.

17.5 Point-to-point — there is a single sender and a single receiver for each message. The DACS need not support broadcast or multicast, although sending processes may simulate either by iteratively sending the same message to many receivers (one-to-many). Similarly, many-to-one messaging is supported by multiple point-to-point messaging; that is, receiving processes may receive separate messages from many senders.

17.6 Location transparency — sending and receiving processes do not need to know the physical location of the other. All addressing of messages is accomplished through logical names.

17.7 Application pro
passes some of the information previously extracted from the message to the child process (step 2). The information passed to the child process typically designates a data interval on which the service is to be performed. The child process processes the data and signals its completion to the processing server (step 3). If the data were processed successfully, a message is placed provisionally in queue B (step 4). The concluding step (5) commits (finalizes) the changes to the source queue A and the destination queue B.

If a failure occurs on any of the steps 0 through 5, the entire transaction is rolled back, which means that the provisional queuing operations in step 0 and step 4, and any other change in the state of the system (for example, in the database), are reversed. The rollback applies not only to failures of the actual processing by the child process but also to the queuing operations, the actions of the processing server, and the final commit.

Figure 10 provides further detail on the interface between the message queue and the processing server. It shows that a forwarding agent mediates between the two. Only the forwarding agent (a Tuxedo-supplied server called TMQFORWARD, described in "Application Servers" on page 43) handles the queue operations. Figure 10 omits the transactional compo
pecifies the DACS reliability and continuous-operations requirements.
How Fulfilled: database server and file-logging server machines, which are accepted single points of failure.

TABLE 10: TRACEABILITY OF FUNCTIONAL REQUIREMENTS — MESSAGE PASSING

Requirement 17: The DACS shall provide a message-passing service for the interactive processing system. The message-passing service shall have the attributes of being reliable, asynchronous, ordered, scoped, point-to-point, and location transparent. The message-passing service shall provide an API to the interactive processing programs. Each attribute is specified in the following subparagraphs.
How Fulfilled: The message-passing requirements are fulfilled by the DACS libipc API. Location transparency (messaging across machines, or via the LAN) is fully supported but not generally used at the IDC.

Requirement 17.1: Reliable — messages are not lost, and no spurious messages are created. A consequence of reliable messages is that the same message may be delivered more than once if a process reads a message, crashes, restarts, and then reads a message again.
How Fulfilled: This requirement is fulfilled via libipc messaging, which is based on the Tuxedo reliable queuing service.

Requirement 17.2: Asynchronous — sending and receiving processes need not be run
How Fulfilled: This requirement is fulfilled via libipc
pplication provides tools for a human analyst to refine and improve the event bulletin by interactive analysis.

ASCII — American Standard Code for Information Interchange. Standard unformatted 256-character set of letters and numbers.

B

backup component — System component that is provided redundantly. Backups exist on the machine, group, server, and services level. Appropriate backups are configured to seamlessly take over processing as soon as a primary system component fails or becomes unavailable.

beam — 1. Waveform created from array station elements that are sequentially summed after being steered to the direction of a specified azimuth and slowness. 2. Any derived waveform, for example, a filtered waveform.

Beamer — Application that prepares event beams for the notify process and for later analysis.

boot — Action of starting a server process as a memory-resident task. Booting the whole application is equivalent to booting all specified server processes (admin servers first, application servers second).

bulletin — Chronological listing of event origins spanning an interval of time. Often the specification of each origin (or event) is accompanied by the event's arrivals and sometimes with the event's waveforms.

C

CCB — Configuration Control Board.

CDE — Common Desktop Environment.

child process — UNIX process created by the fork routine. The child process is a snapshot of
r — DACS application server (tuxshell) that is the interface between the DACS and the automatic processing system. It executes application programs as child processes.

GUI — Graphical User Interface.

H

host — Machine on a network that provides a service or information to other computers. Every networked computer has a hostname by which it is known on the network.

hydroacoustic — Pertaining to sound in the ocean.

I

IDC — International Data Centre.

infrastructure — Foundation and essential elements of a system or plan of operation.

instance — Running computer program. An individual program may have multiple instances on one or more host computers.

IPC — Interprocess communication. The messaging system by which applications communicate with each other through libipc common library functions. See tuxshell.

J

Julian date — Increasing count of the number of days since an arbitrary starting date.

L

LAN — Local Area Network.

launch — Initiate, spawn, execute, or call a software program or analysis tool.

LMID — Logical machine identifier; the logical reference to a machine used by a Tuxedo application. LMIDs can be descriptive, but they should not be the same as the UNIX hostname of the machine.

M

Map — Application for displaying S/H/I events
r command queue (see Figure 17 on page 56).

[FIGURE 16: DATA MONITOR CONTEXT — scheduler, driven by user parameters and controlled through tuxpad and schedule_it, calls a data monitor; the data monitor queries, inserts, and updates the wfdisc, request, and interval tables via Db operations and enqueues messages into the Tuxedo queues (DFX, GA, REB, EVCH, dispatch, and so on).]

[FIGURE 17: DATA MONITOR ACKNOWLEDGEMENT TO SCHEDULING SYSTEM — scheduler calls WaveGet_server, tis_server, tiseg_server, and tin_server; each returns a SETTIME command that acknowledges the last scheduler call and schedules the next.]

All of the data monitors are database applications, and all monitoring is based upon periodic polling of the database to check for availability based on varying criteria. Different data monitors are used to create different classes of intervals. User parameters define the queries used to check for the availability of data that each data monitor server is designed to assess. tis_server creates detection processing intervals based upon the availability of new continuous station data. tiseg_server creates detection processing intervals based upon the availability of new auxiliary seismic station dat
raphs.

The DACS interfaces with the Database Management System through the GDI, with the operator through an operator interface, with the Interactive Processing through a messaging interface, and with the host operating system through system utilities. The exact data model exported by the Database Management System is critical to the DACS.

42. The DACS shall interface with the ORACLE database through the GDI.

43. The DACS shall read from the wfdisc table. The DACS shall assume wfdisc table entries will follow the data model described in [IDC5.1.1Rev2].

44. The DACS shall insert and update entries in the interval table, which is used as a monitoring point for the Automatic Processing system. As part of reset mode, the DACS may delete or alter entries in the interval table to force reprocessing of recent data elements. Purging of the interval table is left to processes outside the DACS.

45. The DACS shall interface with the wfdisc table of the ORACLE database. The software systems of the Data Services CSCI shall acquire the time series data and populate the wfdisc table. The DACS shall assume a particular model for wfdisc record insertion and updates. The DACS shall be capable of accepting data in the model described by the following subparagraphs.

45.1 The IDC Continuous Data system acquires seismic, hydroacoustic, and infrasonic waveforms from multiple sources. The data quantity is 5-10 gigabytes of data per day, arriving in
rations that cannot be successfully completed typically result in message redirection to an error queue. These messages are then automatically requeued for reprocessing attempts by recycler_server (process 11 in Figure 12 on page 29).

Operator Console

The operator console function provides an interface for controlling the DACS (process 8 in Figure 12 on page 29). This function is implemented by tuxpad, a convenient, centralized operator console that can be used to control all aspects of the running distributed application.

Interactive Processing

The DACS provides several key functions for Interactive Processing, including asynchronous message passing, session management for Interactive Tools, and access to Automatic Processing applications. The Interactive Tools are used by an analyst (see Figure 13) within an interactive session that is typically hosted by a single workstation. Tuxedo is thus configured to run stand-alone on the single workstation, which results in all the DACS processes (queuing and Automatic Processing) being isolated on this machine. The stand-alone machine is still connected to the operational LAN, with full access to the database server, and so on. The an

2. Tuxedo queue message loss or queue corruption could occur if the physical disk drive hosting the qspace failed.
re passed through a number of automatic and interactive analysis stages, which culminate in the estimation of the location and origin time of events (earthquakes, volcanic eruptions, and so on) in the earth, including its oceans and atmosphere. The results of the analysis are distributed to States Parties and other users by various means.

Approximately one million lines of developmental software are spread across six CSCIs of the software architecture. One additional CSCI is devoted to the run-time data of the software. Figure 1 shows the logical organization of the IDC software. The Distributed Processing CSCI technically includes the DACS; however, in practice, the DACS is synonymous with the Distributed Processing CSCI. The DACS consists of the following CSCs:

■ Application Services. This software consists of the SAIC-supplied server and client processes of the DACS.

■ Process Monitoring and Control. This software consists of scripts and GUIs that control the way the DACS operates.

■ Distributed Processing Libraries. This software consists of libraries common to the DACS processes.

■ Distributed Processing Scripts. This software consists of a few utilities that create and manage certain aspects of the DACS.
riggers. The database triggers update wfdisc end-time values in an efficient manner, saving similar queries that would otherwise be submitted against the wfdisc table. Similarly, the DACS must share the same software environment as the rest of the IDC. While this environment is not exactly defined at this time, it is likely to include Solaris 7 or 8, ORACLE 8.x, the X Window System (X11R5 or later), and TCP/IP network utilities.
How Fulfilled: Fulfilled.

TABLE 16: TRACEABILITY OF SYSTEM REQUIREMENTS (CONTINUED)

Requirement 59: The DACS shall adhere to ANSI C, POSIX, and SQL standards.
How Fulfilled: Fulfilled.

Requirement 60: The DACS shall use common UNIX utilities (for example, cron, sendmail) and system calls (for example, sockets, exec) whenever possible, to take advantage of widespread features that shall aid portability. Vendor-specific UNIX utilities shall be isolated into separate modules for identification and easy replacement should the need arise.
How Fulfilled: The DACS limits vendor-specific products to Tuxedo. The DACS makes use of public-domain software, such as Perl/Tk (Perl with Tk GUI bindings). As such, the requirement is fulfilled.

Requirement 61: The DACS shall implement middleware layers to isolate t
rnal interfaces is shown in each of the data monitor server data flow diagrams (Figures 18-23).

Error States

The data monitor servers can handle three primary failure modes: a spontaneous data monitor server crash, a database server failure, and a Tuxedo queuing failure. Attempts are made to automatically recover from each failure mode.

Spontaneous data monitor server crashing normally results from a previously unexercised program defect or a system resource limit. Tuxedo automatically restarts the data monitor servers upon server failure. Server failures due to system resource limitations (for example, swap or virtual memory exceeded) can be more easily recovered from than those due to program defects, because such a resource error may be transient or resolved by operator intervention. In this case, the failure recovery is automatic for the data monitor server. Server failures due to a previously unknown program defect are typically more problematic, because although the program reboot is automatic, the program defect is often repeated, resulting in an endless server reboot cycle.

The data monitor servers accommodate a variety of database server error conditions. If the database server is unavailable, the data monitor server attempts to reconnect for a maximum number of times during the current interval-creation cycle before giving up. This cycle is repeated durin
rocessing, the DACS can take several minutes to completely boot across the LAN, but the time does not exceed 10 minutes. For Interactive Processing, the DACS boots in approximately 30 seconds.

TABLE 9: TRACEABILITY OF FUNCTIONAL REQUIREMENTS — AVAILABILITY MANAGEMENT (CONTINUED)

Requirement 13: The DACS shall detect process failures within 30 seconds of the failure and server hardware failures within 60 seconds.
How Fulfilled: The DACS can be configured to detect server and machine failures well within the required specification. The configuration is via the Tuxedo ubbconfig file.

Requirement 14: The DACS shall start new processes and replace failed processes within five seconds. This time shall apply to both explicit user requests and the automatic detection of a failure.
How Fulfilled: Same as above.

Requirement 15: The DACS shall be capable of managing (starting, monitoring, terminating) 50 automated and interactive processing programs on each of up to 50 computers.
How Fulfilled: The DACS can scale to the required specification and beyond.

Requirement 16: The DACS shall continue to function as an availability manager in the event of defined hardware and software failures. ("Reliability" on page 134 s
How Fulfilled: The DACS continues to function, or can be configured to function, in the event of most process and system failures. Exceptions include failure of the
rver or, optionally, the interval_router server can enqueue the interval data into one queue from a set of possible queues, as a function of the interval name (process 4). System operators can use the WorkFlow application to monitor the progress of Automatic Processing (process 9), which renders database time-interval states as colored bricks.

1. tuxpad is the most typical interface to schedclient.

[FIGURE 14: DATA FLOW OF DACS CSCs FOR AUTOMATIC PROCESSING — tuxpad and scheduler drive tis_server, which performs Db operations; Tuxedo queues (DFX, StaPro, and so on) feed tuxshell, which runs application programs on the waveforms and updates the database through dbserver; failed operations are redirected to error queues, where recycler_server retries them according to return code; WorkFlow monitors progress; Tuxedo queuing, transactions, and process monitoring run on every host.]

The Tuxedo queue forwarder (TMQFORWARD) passes the interval data to tuxshell as part of a service call (processes 5 and 6 in Figure 14 on pag
rward to next queue or tuxshell.

[FIGURE 25: TUXSHELL DATA FLOW — tuxshell receives a (possibly compound) message, builds the command to execute, executes the command, and monitors it subject to a timeout; it then returns success or failure, retries or fails via the database server, and updates the interval or failed queue.]

Input/Processing/Output

Figure 25 on page 85 shows tuxshell's data and processing flow. tuxshell receives input from user-defined parameter files and IPC messages through a Tuxedo service call. The Tuxedo service call originates from a TMQFORWARD server or another tuxshell (processes 0 and 1 in Figure 25 on page 85). The parameter files and IPC message specify all processing details for a given instance of the tuxshell server. Details include the name of the application program to be executed and managed, various keys and values used in the construction of the application program command line, database state values, processing sequencing values, and the name of the database service used for database updates. The user parameters are used to execute and manage the application program and forward, retry, or decl
192. ry station waveform requests and initiates actions to acquire the requested waveforms. For each interval created or updated, a data monitor also sends a processing request message to interval_router (process 3 in Figure 12 on page 29) or, depending on configuration, bypasses interval_router and enqueues the message(s) directly in Tuxedo queues. The Tuxedo queue messages seed the DACS with time-interval-based pipeline processing requests, which are managed by the DACS.

System Scheduling

The system scheduling function provides a centralized server, scheduler, for automatic data monitor calls and a tool for centralized management of the scheduling system (process 1 in Figure 12 on page 29). The DACS data monitor application servers (for example, tis_server and WaveGet_server) await service calls from scheduler to perform or complete their data monitoring function and return acknowledgments to scheduler following completion of their service cycle. The scheduling system can be controlled by the user via the schedclient application. The tuxpad GUI operator console provides a convenient interface to schedclient.

Pipeline Processing

The pipeline processing function provides for reliable process sequencing (process 6 in Figure 12 on page 29) and is implemented by the generalized processing server tu
193. s a library and is therefore not explicitly started or stopped; instead, it is embedded (linked) into client applications. dman is started by the analyst, either manually via the desktop GUI environment (such as the CDE) or via the analyst log application. The dman GUI is controlled by the analyst. dman is typically stopped via a script that is bound to a CDE button, or dman can be terminated by selecting the dman GUI's exit menu option. birdie is started, controlled, and stopped by an operator, or via a script that embeds birdie commands within it.

Interfaces

The exchange of data and control among libipc and its clients, including dman, has been described in the sections "libipc," "dman," and "birdie" on page 100; "Input, Processing, Output" on page 105; and "Control" above. birdie is basically a driver for libipc, and it exchanges data with dman and other session clients via libipc. The operator provides command-line input, which is interpreted by birdie and included within the libipc API calls.

Error States

The libipc implementation tests for many error conditions. Example errors include nonexistent QSPACE environment variables, bad queue names, and attempts to send or receive messages when not attached to an interactive session. The errors are returned to the calling client via API return codes. Error detection and detailed error codes and messages are accessible via the ipc_get call (see Table 4 on p
194. s for API-level message passing between applications in the Interactive Processing CSCI, as well as a GUI-based application for the monitoring of all interactive applications and messages within an interactive session. This latter CSC includes message-based demand execution, automatic execution, and user-assisted execution and termination of interactive applications within the session.

Figure 14 shows the data flow among the DACS CSCs for Automatic Processing. Tuxedo provides the reliable distributed processing infrastructure for the DACS, including reliable queuing, transactions, and process monitoring (bottom bar in Figure 14). The DACS is controlled by the system operator through the centralized operator GUI tuxpad.¹ Operator control includes complete DACS bootup or shutdown; bootup and shutdown on a machine basis, a process group basis, or a process (server) basis; control of the DACS scheduling system; and monitoring of Tuxedo queues. The DACS scheduling system is managed by schedclient (process 1), which is used to send commands to the scheduling server, scheduler (process 2). The operational database is monitored by the DACS data monitor servers, such as tis_server (process 3), in a recurring attempt to create processing intervals subject to data availability. Confirmation of sufficient data results in new interval information that is inserted into both the database and Tuxedo queues. The enqueues are either directly initiated by the data monitor se
195. s interface as a library for use by the developers of the Interactive Processing programs. The library shall contain entry points to allow processes to register, subscribe, unregister, send, poll, receive, replay, and delete messages. The DACS shall offer several types of notification when new messages are sent to a process. The API is specified in more detail in the following list:

46.1 register: connect to the messaging system; arguments specify the logical name and physical location of the process and the method of notification for waiting messages.
46.2 subscribe: specify types of messages to read; argument lists the message types to read.
46.3 unregister: disconnect from the messaging system; argument indicates whether to keep or discard unread messages.
46.4 send: send a message to another process by logical name; arguments specify the message type, message data, and return address of the sender.
46.5 poll: request empty/non-empty status of the incoming message queue.
46.6 receive: receive a message; argument specifies the message types to read.
46.7 delete: delete messages from the queue; argument specifies the most recent message or all messages.

47 The DACS shall offer three types of notification of new messages: none, callback invocation, or an interrupt. The type shall be chosen by a process when it registers. With none, the process shall call the poll function to check on message availability. With callback invocation, the process shall register a callback procedure to be executed when a message
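The API semantics listed in items 46.1 through 46.7 can be illustrated with a toy in-memory message bus. This sketch is not the DACS libipc implementation; the class name, method signatures, and message-tuple layout are all invented for illustration of the required semantics.

```python
from collections import defaultdict, deque

class MessageBus:
    """Toy in-memory sketch of the register/subscribe/send/poll/receive/
    delete semantics described in requirements 46.1-46.7 (illustrative only)."""

    def __init__(self):
        self._queues = defaultdict(deque)   # logical name -> pending messages
        self._subscriptions = {}            # logical name -> set of types, or None for all

    def register(self, name):
        # 46.1: connect; physical location and notification method omitted here
        self._queues[name]
        self._subscriptions.setdefault(name, None)

    def subscribe(self, name, types):
        # 46.2: restrict which message types this process reads
        self._subscriptions[name] = set(types)

    def unregister(self, name, keep_unread=False):
        # 46.3: disconnect; optionally discard unread messages
        if not keep_unread:
            self._queues.pop(name, None)
        self._subscriptions.pop(name, None)

    def send(self, sender, recipient, mtype, data):
        # 46.4: message type, data, and return address of sender
        self._queues[recipient].append((mtype, data, sender))

    def poll(self, name):
        # 46.5: empty/non-empty status of readable messages
        return len(self._readable(name)) > 0

    def receive(self, name, types=None):
        # 46.6: dequeue the oldest readable message, or None if there is none
        readable = self._readable(name, types)
        if not readable:
            return None
        msg = readable[0]
        self._queues[name].remove(msg)
        return msg

    def delete(self, name, all_messages=False):
        # 46.7: drop the most recent message, or all messages
        q = self._queues[name]
        if all_messages:
            q.clear()
        elif q:
            q.pop()

    def _readable(self, name, types=None):
        allowed = types if types is not None else self._subscriptions.get(name)
        return [m for m in self._queues[name]
                if allowed is None or m[0] in allowed]
```

A subscribed process sees only its subscribed message types, which mirrors the requirement that subscribe filters what receive returns.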
196. scale to a system twice as large as the initial IDC requirements without a noticeable degradation in time to perform the DACS functions. This requirement is fulfilled by Tuxedo.

The DACS requires a capable UNIX system administrator for installation of the DACS components and for system-level debugging of problems such as full file systems, insufficient UNIX privileges, and network connectivity problems. This requirement is fulfilled, although the DACS has matured to the point that a UNIX system administrator is not required for the majority of the DACS installation task.

The DACS shall be delivered with a System Users Manual that explains the operations and run-time options of the DACS. The manual shall also specify all configuration parameters of the DACS. The DACS shall only require a user-level prior understanding of UNIX and Motif. This requirement is fulfilled; see [IDC6.5.2Rev0.1].

The DACS shall be delivered electronically. This requirement is fulfilled.

TABLE 16. TRACEABILITY OF SYSTEM REQUIREMENTS (CONTINUED)

Requirement / How Fulfilled

68 The DACS capabilities of workflow management and message passing are ranked equally high in terms of criticality. This requirement is fulfilled via the DACS ability to survive most failure conditions, as discussed previously.
197. schedule_it, qinfo, and msg_window is primarily file based. tuxpad updates machine, group, and server status by parsing the standard file output returned from a run of tmadmin. schedule_it and qinfo work along the same lines by parsing standard file output from schedclient and qmadmin, respectively. msg_window updates the GUI message window with any new output written to the tuxpad pid file in /tmp by tuxpad, schedule_it, or qinfo. Data exchange within tuxpad, schedule_it, and qinfo is based upon memory stores. These memory stores maintain dynamic lists of machines, groups, and servers (in the case of tuxpad), queues (in the case of qinfo), and scheduled services (in the case of schedule_it).

Error States

tuxpad, operate, admin, schedule_it, qinfo, and msg_window are, for the most part, front ends to the Tuxedo administrative commands and the schedclient DACS command. These commands are generated in well-known constructions, and as such, not many error states are directly associated with the scripts. Exercising the scripts and the options selectable within the GUI can and does, however, result in errors. The breadth of the error states is substantial because tuxpad controls and administers a distributed application. The discussion of general system errors is beyond the scope of this document. However, the tuxpad m
199. sg_window GUI provides a convenient capture of error messages that can be used by the operator to initiate system remedies. "Operator Interventions" on page 65, "Maintenance" on page 89, and "Troubleshooting" on page 101 of [IDC6.5.2Rev0.1] can be used as a source for debugging the DACS error states.

DATABASE DESCRIPTION

This section describes the database design, database entity relationships, and database schema required by the DACS. The DACS relies on the database for all aspects of interval creation, updating, and monitoring. Management of the interval table involves access to several other database tables. The DACS also reads and updates the request table. Access to the database is made through libgdi.

Database Design

The entity-relationship model of the schema is shown in Figure 32. The database design for the DACS is based upon the interval table, where one interval record is created by the DACS for each Automatic Processing pipeline and for each defined interval of time. The interval state column is updated by the DACS to reflect the processing state (pipeline sequence) as each interval is processed. Station-based pipeline processing is driven by wfdisc records, which are read to determine any newly acquired station-channel waveforms that have not yet been processed. Static affiliation records are read to map a network (net name) to a list of stations, to map a station to a list of station sites, to map a station site to a list of
200. shown to provide seamless fault tolerance within a Tuxedo queuing system.

Input, Processing, Output

Figure 24 shows the design of the fault-tolerant scheduling system. The sequenced queuing, transaction, and execution steps are numbered. The Tuxedo reliable queuing system provides the foundation for the reliable scheduling system. The queuing system consists of the built-in Tuxedo forwarding servers (TMQFORWARD) as well as the queues Q1 (schedule), Q2 (sched_command), and Q3 (sched_result): the scheduler state, command, and result queues, respectively. scheduler and schedclient input, output, and control flow are also shown in the figure. However, the figure does not show that both the scheduler servers and schedclient receive input from user parameters via libpar.

The scheduler state consists of the table of scheduled services and their next due times, plus other global state (for example, kick state). When a due time is equal to the current time, scheduler issues a service call to a server advertising the required service. These services are typically advertised by data monitors; for example, tis_server advertises the services tis, tis_late, tis_verylate, and others. The state table is encapsulated in one Tuxedo queue element that is reliably maintained in the state queue Q1. The queue structure is based upon a T
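The due-time dispatch just described can be sketched as a simple loop over a table of (next_due, service, interval) entries. The in-memory heap and the service names are illustrative assumptions only; the real scheduler keeps its state table as a single element in the Tuxedo state queue Q1 and reschedules on the SETTIME acknowledgment from each data monitor.

```python
import heapq

def run_scheduler(state_table, until):
    """Sketch of the scheduler's due-time loop (illustrative only).

    state_table: list of (next_due, service_name, recurrence_interval).
    Dispatches every service whose due time is <= `until`, rescheduling
    each dispatched service at due + interval (mimicking the SETTIME ack).
    Returns the ordered list of (time, service) calls issued.
    """
    heap = list(state_table)
    heapq.heapify(heap)
    calls = []
    while heap and heap[0][0] <= until:
        due, service, interval = heapq.heappop(heap)
        calls.append((due, service))                    # issue the service call
        heapq.heappush(heap, (due + interval, service, interval))
    return calls
```

The heap pops services strictly in due-time order, so interleaved recurrences (for example, a fast tis cycle and a slower tis_late cycle) dispatch correctly.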
201. sing. Tuxedo provides the core distributed processing environment in the DACS; Tuxedo servers are present on all DACS machines. This is shown at the bottom of Figure 6, where Tuxedo queuing, transactions, and process monitoring interact with all of the DACS functions. The DACS monitors the database for data, creates processing intervals characterized by their start times and end times, subject to data availability (process 2), and manages a pipeline sequence of processing tasks for each interval. The data monitor servers are called on a recurring basis by a scheduling server (process 1), which manages the scheduling and execution of the data monitor services based upon user parameters and input from the data monitors. New processing intervals result in a new pipeline processing sequence that consists of one or more processing tasks. The processing interval information is placed in both the database and Tuxedo queues. Each processing interval contains a state field, which is set by the DACS to reflect the current processing state of the interval. System operators can monitor the progress of Automatic Processing by collectively monitoring a time window of intervals in the database. Such process (workflow) monitoring (process 4) is conveniently presented through a GUI-based display, which renders time-interval states as colored bricks.
202. ssing of failed intervals is handled under operator control via the workflow monitoring utility WorkFlow. Application failures and subsequent reprocessing are normally part of the operator's investigation into the reason for the failure. System errors, which are often transient in nature, are ideally reprocessed automatically. The design of recycler_server is influenced by the DACS system-wide requirement to provide fault tolerance.

Input, Processing, Output

dbserver

dbserver receives input from user parameters and from tuxshell application servers. The user parameters define the ORACLE database account to which dbserver connects and forwards database statements. tuxshell servers send dbserver the database update messages through an IPC message string. The IPC input message consists of a fully resolved SQL statement that is simply submitted to the ORACLE database server via a standard libgdi call. dbserver further uses a libgdi call to commit the database submission, assuming a successful database update. dbserver returns a success or failure service-call return value to the calling tuxshell, depending on the status of the database operation. dbserver logs all database statements and progress to the user-defined log file.

interval_router

interval_router rece
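The dbserver pattern just described (accept one fully resolved SQL statement, submit it, commit, and report success or failure back to the calling tuxshell) can be sketched as follows. Python's sqlite3 stands in for the ORACLE server and the libgdi calls; this is a sketch of the pattern, not the dbserver implementation.

```python
import sqlite3

def dbserver_submit(conn, sql_statement):
    """Sketch of the dbserver service: submit one fully resolved SQL
    statement, commit on success, and return a success/failure status
    to the caller (sqlite3 stands in for ORACLE/libgdi)."""
    try:
        conn.execute(sql_statement)   # submit the statement as received
        conn.commit()                 # commit the database submission
        return True                   # success return to the calling tuxshell
    except sqlite3.Error:
        conn.rollback()               # failure: caller rolls back its queue transaction
        return False
```

Centralizing submissions in one such server is what lets many tuxshell instances share a small number of database connections.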
203. ssing status for each client, to visually present to the analyst the status of that client. dman monitors all message traffic within the interactive session via the libipc events described above and can therefore keep track of, and visually display, the consumption and creation of messages. In addition, dman can query the number of queued messages for any session client queue, which is required at session startup to determine the absolute number of pending messages in each queue.

The relationship between libipc and the DACS for Automatic Processing is limited, and nonexistent for the purposes of the dman client. However, libipc defines the structure of the IPC messages that are used within Automatic Processing and Interactive Processing, as well as between these subsystems. ARS relies upon Automatic Processing for interactive recall processing, such as DFX-based Beam on the Fly (BOTF) processing. Recall processing depends upon a standard libipc-based message send by ARS to the BOTF queue, which is configured within the interactive session queuing system (processes 2 and 5 in Figure 15 on page 53). The TMQFORWARD/tuxshell configuration for managing Automatic Processing applications works in a similar, but not identical, manner to the DACS for Interactive Processing (processes 5-7 in Figure 15 on page 53). TMQFORWARD
204. [FIGURE 31. SCHEDULE_IT DESIGN. Figure content: Main Driver; schedule_it service list; schedclient; stall/unstall selected services; kick scheduler; initialize system service list; stdout/stderr; sched_command and sched_result queues; tuxpad pid file in /tmp.]

Input, Processing, Output

tuxpad receives input from user parameters; from the Tuxedo and DACS administrative commands and clients that it executes; and from the user, via manipulations and selections on the GUI. User parameters define various optional user preferences, the Tuxedo master host (THOST), and the primary Tuxedo queuing server (QHOST). The user parameters also include pointers to all standard system variables (for example, IMSPAR) that are required for successful execution of the Tuxedo and DACS commands. Machine, group, and server booting and shutdown, carried out via the tmadmin command, must be executed on the THOST; as such, tuxpad must also be run on the THOST.

During tuxpad initialization, internal arrays of configured machines, groups, servers, and services are created by parsing the output of the tmunloadcf command. This command returns an ASCII text versi
205. t Tuxedo queue or to the next tuxshell, depending on user parameters (process 6 in Figure 25 on page 85). Successful forwarding is always coupled with a database update via a service call to the database server dbserver. Forwarding failures (a database service request failure, a Tuxedo enqueue failure, or a failure of the next tuxshell service request) result in a rollback of processing. The rollback is Tuxedo queue based, wherein the transaction opened by the calling TMQFORWARD is undone and the IPC message is returned to the source queue. In the case of tuxshell compound processing, where one tuxshell is called by another tuxshell (process 7 in Figure 25 on page 85), the service requests are unwound by failure returns, and the original transaction from the originating TMQFORWARD is rolled back.

Illegal exit codes, application server timeouts, or abnormal process terminations are handled by tuxshell in a similar manner. Basically, processing intervals are either retried or declared failed, subject to a user-specified maximum number of retries (process 5a in Figure 25 on page 85). Retry processing results in requeuing the interval into the source queue. Error processing results in enqueueing the interval into the user-specified failure queue. tuxshell queuing operations are always coupled with datab
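tuxshell's retry-or-fail disposition described above can be reduced to a small decision function. The outcome labels and parameter names here are illustrative assumptions, not identifiers from the tuxshell source.

```python
def dispose_interval(exit_ok, retry_count, max_retries):
    """Sketch of tuxshell's disposition of a processed interval
    (process 5a): forward on success; on failure, requeue to the source
    queue until the user-specified retry limit is exhausted, then
    enqueue to the failure queue. Labels are illustrative."""
    if exit_ok:
        return "forward"    # on to the next queue or next tuxshell
    if retry_count < max_retries:
        return "retry"      # requeue the interval into the source queue
    return "fail"           # enqueue the interval into the failure queue
```

In the real system each of these outcomes is coupled with a database state update through dbserver, so the interval table always reflects the queue disposition.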
206. t and end time, the interval creation algorithm proceeds to confirm the required data counts as a function of time, as described above and shown in processes 5 and 6 in Figure 22 on page 66. The data count query is user defined and is usually targeted at a logical processing group, such as a network of seismic stations or a group of hydroacoustic sensors. Complete intervals are created, along with an enqueue into a Tuxedo queue, as one logical transaction (process 7 in Figure 22 on page 66). Following a successful complete-interval creation and enqueue, the end time of the interval is recorded in the timestamp table (process 9 in Figure 22 on page 66). Incomplete intervals are created absent an enqueue (process 8 in Figure 22 on page 66).

tin_server generates output to log files, the database, Tuxedo queues, and the scheduler server. Output to the database includes the complete and incomplete intervals and timestamp table updates. Upon interval creation, tin_server queues the time-interval information to a Tuxedo queue for initiation of a pipeline processing sequence on the time interval. tin_server completes its interval creation cycle by sending an acknowledgement (SETTIME command) to the scheduler server, which results in rescheduling for the next tin_server service call.

WaveGet_server

Figure 23 on pag
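The time/data threshold check that drives this interval creation can be sketched as follows. The threshold pairs, maximum-wait value, and return labels are illustrative assumptions, not IDC configuration values; the idea is only that the required data count relaxes as the interval ages, and that a skipped interval is created once a defined amount of time has elapsed.

```python
def classify_interval(elapsed, data_count, thresholds, max_wait):
    """Sketch of a time/data threshold function for interval creation.

    thresholds: list of (min_elapsed, required_count) pairs; the latest
    applicable pair determines the required data count, so the
    requirement relaxes as elapsed time grows (values illustrative).
    Returns "complete", "skipped", or "wait".
    """
    required = None
    for min_elapsed, count in sorted(thresholds):
        if elapsed >= min_elapsed:
            required = count            # latest applicable threshold wins
    if required is not None and data_count >= required:
        return "complete"               # create interval + enqueue, one transaction
    if elapsed >= max_wait:
        return "skipped"                # create skipped interval, no enqueue
    return "wait"                       # re-check on the next service cycle
```

With, say, thresholds of (0 s, 10 channels) and (600 s, 5 channels), a young interval needs the full data count while an older one is completed with less, and an interval older than max_wait is skipped so interval creation stays current.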
207. t for instances of DFX.

How fulfilled (continued): ...ward can be displayed via tuxpad by deactivating or shutting down one type of server and activating or booting another type of server(s), for example, GA replaced by additional instances of DFX. For Interactive Processing, this requirement is fulfilled the same as above, although this processing mode is not generally applicable to interactive processing.

4 Operational Mode: play. Automatic Processing: full automatic processing (automatic processing configured for normal operation). Interactive Processing: full interactive processing.
How fulfilled: For Automatic Processing, the play processing mode is usually initiated by starting the scheduling of the data monitor servers. This is accomplished via the kick command to the scheduling system, typically using the tuxpad schedule_it script. For Interactive Processing, the play processing mode is the default and automatic processing mode following the DACS startup (following analyst workstation boot).

TABLE 8. TRACEABILITY OF GENERAL REQUIREMENTS (CONTINUED)

Requirement: Operational Mode: slow motion. Automatic Processing: partial automatic processing (automatic processing configured to run only a core subset of automatic processing tasks). Interact
208. ta are confirmed, a complete interval is created and the interval information is enqueued into a queue. If insufficient or no data are available after a defined amount of time, a skipped interval is created. The end time of the created interval, whether complete or skipped, defines the start time for the next interval's elapsed-time measurement. The updating of skipped intervals is based upon a user-defined SQL query; tin_server does not supply time values for substitution in the SQL query. Skipped intervals returned from the query are updated to complete intervals and then enqueued into a queue (process 2). The IDC software uses tin_server to create intervals for Hydroacoustic Azimuth Estimation, which are labeled with the class HAE. Explicit classes and states of intervals are configurable for each data monitor; this document lists the generic names, which coincide often, but not always, with the explicit names. There are no formal requirements for skipped-interval processing for tin_server. The creation of skipped intervals is intended primarily to keep interval creation current relative to the present time, thereby avoiding interval gaps or the stalling of interval creation due to delays or failure of the processing that is monitored by tin_server.

[Figure fragment: scheduler; Reschedule]
209. [FIGURE 1. IDC SOFTWARE CONFIGURATION HIERARCHY. Figure content: the major IDC software groupings, including Data for Processing, Processing Services, Management, and Monitoring Software; the Automatic Processing, Interactive Processing, Retrieve, Subscription, Radionuclide, and Web Subsystems; Distributed Processing; Data Services; Database Libraries; Configuration Management; System Monitoring; and COTS components.]

Overview

The DACS is the software between the operating system (OS) and the IDC application software. The purpose of this middleware is to distribute the application software over several machines and to control and monitor the execution of the various components of the application software. Figure 2
210. tach libipc API call. The physical location of the process is implied (transparent to the messaging system). The method of notification for waiting messages is not addressed by this function.

46.2 subscribe (specify types of messages to read; argument lists message types to read): This requirement is fulfilled specifically for dman, where libipc broadcasts to dman upon any message send and receive among clients within the interactive session. A general subscribe mechanism is not provided by libipc and is apparently not required. However, Tuxedo supports general publish/subscribe messaging.

TABLE 14. TRACEABILITY OF CSCI EXTERNAL INTERFACE REQUIREMENTS (CONTINUED)

Requirement / How Fulfilled

46.3 unregister (disconnect from messaging system; argument indicates whether to keep or discard unread messages): This requirement is fulfilled via the ipc_detach() libipc call, although there is no mechanism to direct discarding of unread messages by this function.

46.4 send (send a message to another process by logical name; arguments specify message type, message data, and return address of sender): This requirement is fulfilled via the ipc_send() libipc API call.

46.5 poll (request empty/non-empty status of incoming message queue): This requirement is fulfilled via the ipc_pending() libipc API
211. tems, because the DACS imposes relatively minor disk-space requirements, the one exception being server process logging, which shares significant disk-space usage requirements with other CSCIs. The DACS relies upon other system infrastructure and services, including the LAN, the Network File System (NFS), the ORACLE database server, and the mail server.

Commercial Off-The-Shelf Software

The software is designed for Solaris 7, ORACLE 8i, and Tuxedo 6.5.

Chapter 2: Architectural Design

This chapter describes the architectural design of the DACS and includes the following topics:

- Conceptual Design
- Design Decisions
- Functional Description
- Interface Design

CONCEPTUAL DESIGN

The DACS was designed to address requirements for reliable distributed processing and message passing within the IDC System. The requirements include a number of processing and control features necessary for reliable automatic processing across a distributed network of computers. The message passing requirements for Interactive Processing entail features for passing messages between Interactive Tools and managing the Interactive Tools session. Figure 6 shows the conceptual data flow of the DACS for Automatic Proces
212. terval_router, and recycler_server.

dbserver

dbserver provides an interface between the ORACLE database and the DACS servers. All instances of tuxshell within the context of Automatic Processing operate on the interval (or request) table in the database through dbserver. Any number of tuxshell servers send database update statements to one of several replicated dbservers. In turn, dbserver submits the database update to the ORACLE database server (Figure 26). This setup has the advantage that fewer database connections are required. Conservation of database connections (and of concurrent database connections) is at least an implicit system requirement; as such, the inclusion of dbserver within the pipeline processing scheme of the DACS was an important design decision.

[FIGURE 26. DBSERVER DATA FLOW. Figure content: calling client or server (for example, tuxshell); user parameters; SQL string; send SQL statement to database; interval and request tables.]

interval_router

The routing of messages to particular instances of a server for different data sources is supported by interval_router (process 5 in Figure 14 on page 50). Message routing is manifest in message enqueues into a set of defined queues. Each message route is a function of the message data, where the user-defined parameters map data valu
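The interval_router mapping just described (user parameters map message data values to target queues) can be sketched as a small routing function. The pattern and queue names are invented for illustration; the real routing configuration comes from the user-defined parameter files.

```python
def route_interval(interval_name, routing_table, default_queue=None):
    """Sketch of interval_router's data-driven routing: the first
    routing-table entry whose pattern prefixes the message data (here,
    the interval name) selects the target queue. Names illustrative."""
    for pattern, queue in routing_table:
        if interval_name.startswith(pattern):
            return queue
    return default_queue   # fall-through queue when no mapping applies
```

A routing table such as [("HAE", "hae_queue"), ("GRP", "grp_queue")] then splits one incoming stream of interval messages across per-pipeline queues, which is what allows different server instances to serve different data sources.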
213. the parent at the time it called fork IDC DOCUMENTATION Software click Select an element on the screen by posi tioning the pointer over the element then pressing and immediately releasing the mouse button client Software module that gathers and pre sents data to an application it generates requests for services and receives replies This term can also be used to indicate the requesting role that a software mod ule assumes by either a client or server process command Expression that can be input to a com puter system to initiate an action or affect the execution of a computer pro gram Common Desktop Environment Desktop graphical user interface that comes with SUN Solaris component One of the parts of a system also referred to as a module or unit Computer Software Component Functionally or logically distinct part of a computer software configuration item typically an aggregate of two or more software units Computer Software Configuration Item Aggregation of software that is desig nated for configuration management and treated as a single entity in the con figuration management process Distributed Application Control System DACS June 2001 IDC 7 3 1 IDC DOCUMENTATION Software configuration 1 hardware Arrangement of a com puter system or components as defined by the number nature and interconnec tion of its parts 2 software Set of adjustable parameters usual
214. the workflow management shall queue the interval and deliver the interval with the highest priority in the queue within 5 seconds of when the second program becomes available.

TABLE 11. TRACEABILITY OF FUNCTIONAL REQUIREMENTS: WORKFLOW MANAGEMENT (CONTINUED)

Requirement: The DACS shall be capable of queuing (holding) 10,000 intervals for each active Automatic Processing program; there can be up to fifty processes per computer. The size and composition of an interval is left as a detail internal to the DACS.
How fulfilled: This requirement is fulfilled via the Tuxedo reliable queuing service, which can be scaled well beyond the specification.

Requirement: The DACS shall continue to function as a workflow manager in the event of defined hardware and software failures. The DACS reliability and continuous-operations requirements are specified in "Reliability" on page 134.
How fulfilled: This requirement is fulfilled via the DACS ability to survive most failure conditions, as discussed previously.

1. This feature has been at least partially implemented but has not been sufficiently tested to date.

TABLE 12. TRACEABILITY OF FUNCTIONAL REQUIREMENTS: SYSTEM MONITORING

28 Requirement: The DACS shall provide system monitoring for computer status, process status, wor
How fulfilled: This requirement is fulfilled in the DACS through Tuxedo and WorkFlow.
215. [Table continued. Columns: Function, Description, Interactive Processing usage, Tuxedo MIB usage.]

12. ipc_get_group(): convenience function that extracts the IPC group name, given the specified IPC queue name (IPC address). Usage: N (all clients except dman); Tuxedo MIB: N/A.
13. ipc_get_name(): convenience function that extracts the IPC name, given the specified IPC queue name (IPC address). Usage: Y (all clients except dman); Tuxedo MIB: N/A.
14. ipc_make_address(): returns the IPC address (IPC queue name) based upon the specified IPC group and name. Usage: Y (all clients except dman); Tuxedo MIB: N/A.

1. libipc-based clients that are relevant to the DACS for Interactive Processing include dman, birdie, ARS, XfkDisplay, Map, PolariPlot, SpectraPlot, IADR, and AEQ.
2. The ipc_check() call was intended to enable a check for pending queue messages without an actual message read (dequeue). Problems with the Tuxedo unsolicited-message-handling feature required an implementation change wherein polling is carried out via explicit calls to ipc_receive(). The implementation change included making ipc_check() always return true, which in effect forces an ipc_receive() call for every client-based attempt to check for any new messages.
3. The callbacks are added to the clients' libXt-based Xtoolkit event loop in the form of a timer-based event, via the XtAppAddTimeOut libXt call.

Control

libipc i
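The three convenience calls above imply that an IPC address (queue name) is a composition of the group and the name. The following sketch assumes a simple separator-joined encoding; that encoding is an assumption for illustration only, since the actual libipc address format is not specified in this text.

```python
def ipc_make_address(group, name, sep="."):
    """Sketch of ipc_make_address(): compose an IPC address (queue name)
    from a group and a name. The '.' separator is an assumed encoding."""
    return f"{group}{sep}{name}"

def ipc_get_group(address, sep="."):
    """Sketch of ipc_get_group(): extract the group from an IPC address."""
    return address.split(sep, 1)[0]

def ipc_get_name(address, sep="."):
    """Sketch of ipc_get_name(): extract the name from an IPC address.
    Splitting only on the first separator keeps any separator characters
    that happen to appear inside the name."""
    return address.split(sep, 1)[1]
```

The useful property is the round trip: get_group and get_name applied to make_address recover the original group and name.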
…tive utilities to refer to this server.

StaPro: Station Processing application for S/H/I data.

Glossary

station: Collection of one or more monitoring instruments. Stations can have either one sensor location (for example, BGCA) or a spatially distributed array of sensors (for example, ASAR).

subsystem: Secondary or subordinate system within the larger system.

T

TI: Class of DACS servers that form time intervals by station/sensor (for example, tis_server).

TMS: Transaction manager server.

transaction: Set of operations that is treated as a unit. If one of the operations fails, the whole transaction is considered failed, and the system is rolled back to its pre-transaction processing state.

Tuxedo: Transactions for UNIX, Extended for Distributed Operations.

tuxpad: DACS client that provides a graphical user interface for common Tuxedo administrative services.

tuxshell: Process in the Distributed Processing CSCI used to execute and manage applications. See IPC.

U

ubbconfig file: Human-readable file containing all of the Tuxedo configuration information for a single DACS application.

UID: User identifier.

UNIX: Trade name of the operating system used by the Sun workstations.

V

version: Initial release or re-release of a computer software component.

W

waveform: Time-domain signal data from a sensor (the voltage output, where the voltage…
…to a Tuxedo queue to initiate pipeline processing of the intervals. ticron_server generates output to log files, the database, Tuxedo queues, and the scheduler_server. Output to the database includes new intervals and updates to the timestamp table. Upon interval creation, ticron_server enqueues a message containing interval information into a Tuxedo queue for initiation of a pipeline processing sequence on the interval. ticron_server completes its interval creation cycle by sending an acknowledgement (SETTIME command) to the scheduler_server, which results in rescheduling for the next ticron_server service call.

tin_server

Figure 22 on page 66 shows data and processing flow for tin_server. tin_server receives input from user-defined parameter files, the database, and the scheduler_server. The parameter files specify all processing details for a given instance of the data monitor server. Details include database account, class name and size of target intervals to be created (for example, HAE WAKE GRP, 10 minutes), database queries, and arrays of time and data-count values. These values define the time-data threshold function for interval creation. The user parameters are used to construct the recurring database queries to determine the time and duration of the last interval created, so that the start time and end time of the next interval can be established. Initial database input to tin_server includes timestamp and interval information use…
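The paired arrays of time and data-count values describe a threshold that trades data availability against elapsed time: the longer tin_server waits, the fewer stations it requires before creating the interval. A minimal sketch of such a time-data threshold function, with entirely hypothetical configuration values and function names, might look like this:

```python
import bisect

def required_count(elapsed_s, times, counts):
    """Data count required to create an interval after elapsed_s seconds.

    times/counts are the paired parameter arrays; the threshold relaxes
    (count drops) at each configured time step.
    """
    i = bisect.bisect_right(times, elapsed_s) - 1
    if i < 0:
        return counts[0]
    return counts[i]

def should_create_interval(elapsed_s, data_count, times, counts):
    """True when the observed data count meets the current threshold."""
    return data_count >= required_count(elapsed_s, times, counts)

# Illustrative configuration: require all 30 stations at first,
# then progressively fewer as time passes, then force creation.
TIMES = [0, 600, 1200, 1800]   # seconds elapsed since interval end time
COUNTS = [30, 20, 10, 0]       # station data counts required at each step
```

With this configuration an interval with only 25 of 30 stations is held back early on, but the same interval would be created (forced) once 30 minutes have elapsed, matching the forced/skipped-interval behavior shown in the tin_server data flow.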
…to another Interactive Tool that is not currently running. A DACS application tracks all message traffic through Tuxedo IPC events (process 2). This application provides execution on demand for any Interactive Tool that has been sent a message and is not currently running in the analyst's interactive session (process 3).

[Figure: analyst review via Interactive Tools (ARS, FK), IPC request/result and broadcast paths, interactive session manager, start client, interactive client, computation and image display, Tuxedo queues, transactions, process monitoring, and events]

FIGURE 7: CONCEPTUAL DATA FLOW OF DACS FOR INTERACTIVE PROCESSING

Distributed Application Control System (DACS), IDC-7.3.1, June 2001

IDC DOCUMENTATION: Software, Chapter 2: Architectural Design

DESIGN DECISIONS

All design decisions for the DACS are measured against, and can be traced to, the significant reliability requirements for Automatic Processing. In general, the DACS must provide fault tolerance and reliability in case of machine, server, and application failures. Fundamentally, all processing managed by the DACS must be under transaction control so that processing tasks can be repeated for a configured number of retries, declared failed following a maximum number of retries, and forwarded for further pr…
[Figure: Tuxedo queues, transactions, and process monitoring underpin the automatic-processing data flow]

FIGURE 12: DATA FLOW OF THE DACS FOR AUTOMATIC PROCESSING

Data Monitoring

The data monitoring function determines whether new data have become available or if a data condition or state is met. If the monitored condition is met, interval data are inserted into the database (or existing rows are updated from interval state to interval queued), and the interval information is inserted into Tuxedo queues (process 2 in Figure 12 on page 29). The data monitored in the database varies, and several data monitor servers process the different types of data.

The component tis_server monitors S/H/I data delivered from stations that have a continuous real-time data feed. tiseg_server monitors auxiliary seismic station data. ticron_server monitors a timestamp value in the database, which tracks the last time the server created a network processing interval. The server forms network processing intervals by time, and so its primary function ensures the timely creation of the network processing intervals. tin_server monitors station processing progress by querying the state of a group of stations. The server creates intervals based upon a trade-off between data availability and elapsed time. WaveGet_server is a data monitor server that polls the request table for auxilia…
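The common pattern shared by all of these data monitor servers, as described above, is a polling pass that promotes each satisfied interval from the interval state to the queued state and hands it to a pipeline queue. The sketch below models one such pass; the dict-based "database" and list-based "queue" stand in for the Oracle tables and Tuxedo reliable queues, and all names are illustrative.

```python
def monitor_pass(intervals, queue):
    """One polling pass of a data monitor server (illustrative model).

    Moves every interval whose data condition is met from 'interval'
    state to 'queued' and hands it to the pipeline queue.
    """
    for ivl in intervals:
        if ivl["state"] == "interval" and ivl["data_ready"]:
            ivl["state"] = "queued"     # update the database row
            queue.append(ivl["name"])   # enqueue for pipeline processing

db = [
    {"name": "ABC 12:00-12:10", "state": "interval", "data_ready": True},
    {"name": "DEF 12:00-12:10", "state": "interval", "data_ready": False},
]
pipeline_q = []
monitor_pass(db, pipeline_q)
```

After one pass, only the interval whose data condition was met is queued; the other remains in the interval state awaiting the next pass.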
…ultaneously request any of the message services.

Requirement 22: The DACS shall be capable of queuing (holding) 10,000 messages for each process that is capable of receiving messages.
How fulfilled: This requirement is fulfilled via the Tuxedo reliable queuing service, which can be scaled well beyond the specification.

Requirement 23: The size limit of each message is 4,096 (4K) bytes in length.
How fulfilled: This requirement is fulfilled via the Tuxedo reliable queuing service, which can be scaled beyond the specification.

Requirement 24: The DACS shall continue to function as a message-passing service in the event of defined hardware and software failures. The DACS reliability and continuous-operations requirements are specified in "Reliability" on page 134.
How fulfilled: This requirement is fulfilled via the DACS's ability to survive most failure conditions, as discussed previously.

1. Tools and messages reside on a single machine; Interactive Processing is configured to run on a stand-alone analyst machine (all Interactive…).
2. The maximum message size was increased to 65,536 bytes for the Interactive Auxiliary Data Request System. This increase deviates from the model of passing small referential data between processes for both Interactive and Automatic Processing. The change was made specifically for Interactive Processing. This change encourages a re-examination of the messaging requirements (message size, message reliability, and so on).
…Tuxedo Field Manipulation Language (FML) message.

The state queue must be seeded with an initial scheduler table, at least the first time the system is started. This is accomplished by the schedclient init command. This command empties the state queue (if necessary) and then enqueues the initial state into the state queue (step 1). Subsequent system restarts can optionally issue another init command upon system bootup, or they can choose to pick up exactly where the system left off, because the last scheduler state remains in the state queue.

10. schedclient shuts down the TMQFORWARD server prior to dequeuing the scheduler state from the state queue and then reboots TMQFORWARD after enqueuing the new initial state into the scheduler state queue, to complete the reset of the scheduling system. The TMQFORWARD server is shut down and started through Tuxedo tmadmin commands that are generated and issued by schedclient. The TMQFORWARD management is necessary to avoid race conditions whereby TMQFORWARD might dequeue the scheduler state before schedclient, which would result in two or more scheduler states; this would manifest in repeated and possibly conflicting scheduling calls to the data monitor servers.

Distributed Application Control System (DACS), IDC-7.3.1, June 2001

[Figure: tuxpad, TMQFORWARD, scheduler, scheduler queues Q1 and Q2, sched command, tis_s…]
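The reset sequence described above, including the TMQFORWARD shutdown that prevents the race on the scheduler state, can be modelled conceptually as follows. This is a sketch only: the `Forwarder` class and method names stand in for TMQFORWARD and the tmadmin-driven commands, and a Python list stands in for the Tuxedo state queue.

```python
class Forwarder:
    """Stand-in for the TMQFORWARD server (illustrative only)."""
    def __init__(self):
        self.running = True
    def shutdown(self):   # modelled tmadmin-driven stop
        self.running = False
    def boot(self):       # modelled tmadmin-driven start
        self.running = True

def schedclient_init(state_queue, forwarder, initial_state):
    """Reset the scheduling system without racing TMQFORWARD."""
    forwarder.shutdown()               # 1. stop TMQFORWARD so it cannot dequeue
    state_queue.clear()                # 2. empty the state queue if necessary
    state_queue.append(initial_state)  # 3. seed the initial scheduler state
    forwarder.boot()                   # 4. restart TMQFORWARD
    # Invariant: exactly one scheduler state must exist at all times.
    assert len(state_queue) == 1

fwd = Forwarder()
state_q = ["stale state"]
schedclient_init(state_q, fwd, "initial scheduler table")
```

Ordering is the whole point of the sketch: if the forwarder were still running between steps 2 and 3, it could dequeue concurrently and leave the system with zero or multiple scheduler states.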
…ve from the twin needs: to manage the processes in the system, and to add an additional layer of fault tolerance. The process management includes starting, stopping, monitoring, communicating, and tasking (assigning work). The fault tolerance includes reconfiguring Automatic Processing in the event of a computer failure.

The DACS shall provide the following modes in support of Automatic Processing and Interactive Processing: shutdown, stop, fast-forward, play, slow-motion, rewind, and pause. Table 6 describes the modes. Fast-forward mode (catch-up mode) is configured to add more front-end automatic processing when recovering from a significant time period of complete data outage or system down time. Rewind mode allows for reprocessing of the most recent data by resetting the database to an earlier time. Pause mode allows current processing tasks to finish prior to a shutdown of the system.

IDC DOCUMENTATION: Software, Chapter 5: Requirements

TABLE 6: DACS OPERATIONAL MODES

Mode: shutdown
Automatic Processing: no automatic processing; DACS not running
Interactive Processing: no interactive processing; DACS not running

Mode: stop
Automatic Processing: no automatic processing; all automatic processing system status saved in stable storage; all automatic processing programs terminated; all DACS processes idle
Interactive Processing: full interactive processing
…ven a higher priority so that they receive priority ordering for the next available Automatic Processing program. Within a single priority group, the DACS shall manage the order among data elements by attributes of the data (including time and source) and by attributes of the interval (including elapsed time in the queue). The ordering algorithm shall be an option to the operator.

25.5 Workflow management shall provide error recovery per data element for failures of the Automatic Processing programs. Error recovery shall consist of a limited number of time-delayed retries of the failed Automatic Processing program. If the retry limit is reached, the DACS shall hold the failed intervals in a failed queue for manual intervention.

25.6 The DACS shall initiate workflow management of each data element within 5 seconds of data availability.

25.7 Workflow management shall deliver intervals from one Automatic Processing program to the next program in the sequence within five seconds of completion of the first program. If the second program is busy with another interval, the workflow management shall queue the interval and deliver the interval with the highest priority in the queue within 5 seconds of when the second program becomes available.

26. The DACS shall be capable of queuing (holding) 10,000 intervals for…
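The per-interval error recovery of requirement 25.5 amounts to a bounded number of time-delayed retries, after which the interval is parked in a failed queue for manual intervention. The sketch below models that policy; retry delays are represented as a returned "retry after" time rather than real sleeps, and the limit, delay, and field names are all illustrative, not values from the source.

```python
MAX_RETRIES = 3     # illustrative retry limit
RETRY_DELAY_S = 60  # illustrative time delay between retries

def handle_failure(interval, failed_queue, now_s):
    """Record one processing failure for an interval.

    Returns the time at which the next retry should run, or None if
    the retry limit is reached and the interval has been parked in
    the failed queue for manual intervention.
    """
    interval["failures"] = interval.get("failures", 0) + 1
    if interval["failures"] >= MAX_RETRIES:
        failed_queue.append(interval)   # hold for manual intervention
        return None                     # no further automatic retries
    return now_s + RETRY_DELAY_S        # time-delayed retry

failed_q = []
ivl = {"name": "HAE 12:00-12:10"}
t1 = handle_failure(ivl, failed_q, now_s=1000)  # first failure: retry later
t2 = handle_failure(ivl, failed_q, now_s=1060)  # second failure: retry later
t3 = handle_failure(ivl, failed_q, now_s=1120)  # limit reached: parked
```

In the real system the retry is implemented by Tuxedo transaction rollback and redelivery, and the failed queue is a Tuxedo error queue; the sketch only captures the counting-and-parking policy.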
…process groups, servers, and service names contained in the distributed Tuxedo DACS application in the main tuxpad window. Mappings between logical machine names and actual machine names, process group names and numbers, server names and server identifiers, and server names and service names are also displayed in the main tuxpad window. The mappings are interpreted following a parsing of the complete Tuxedo UBB configuration, which is generated upon execution of the tmunloadcf command. The mapping and current state of the machines, groups, and servers is kept current via parsing the output from the tmadmin command on a recurring and on-demand basis. tuxpad is also aware of the Tuxedo DACS notion of the backup (or replicated) server and is able to organize the server display to conveniently present the status of both primary and backup servers. Machine, group, and server booting and shutdown are handled by tuxpad executions of the tmboot and tmshutdown commands.

[Figure: tuxpad gets the DACS configuration by querying Tuxedo for all configured machines (1), groups (3), servers, and services (4)]
tuxshell

Pipeline process sequencing includes application software execution and management within a transactional context. tuxshell receives interval messages within a TMQFORWARD transaction (process 5 in Figure 12 on page 29). tuxshell extracts parameters from the interval message, constructs an application processing command line, and then executes and manages the processing application (process 9 in Figure 12 on page 29). The processing application is typically an Automatic Processing program (for example, DFX). Processing failures result in transaction rollback and subsequent retries, up to a configured maximum number of attempts. Successful processing results in forwarding the interval information via an enqueue into a downstream queue in the pipeline sequence. The state of each interval processed is updated through server calls to the database application server, dbserver (process 7 in Figure 12 on page 29).

Workflow Monitoring

The workflow monitoring function provides a graphical representation of interval information in the system database, in particular in the interval and request database tables (process 4 in Figure 12 on page 29). The monitoring function is implemented by the WorkFlow program, which provides a GUI-based operator console for the purpose of monitoring the progress of all automatic processing pipelines in real or near-real time. The current state of all processing pipelines is recorded in the state column of each row i…
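The extract-parameters, build-command-line, execute-and-check cycle that tuxshell performs for each interval message can be sketched as follows. The message fields and command-line flags here are invented for illustration (the source does not specify the application command-line syntax), and `true` is used as a placeholder program; the real tuxshell runs Automatic Processing programs such as DFX inside a TMQFORWARD transaction.

```python
import shlex
import subprocess

def build_command(msg):
    """Construct a processing command line from interval-message fields.

    The field and flag names are hypothetical examples, not the actual
    DACS message layout.
    """
    return [msg["program"], "-sta", msg["station"],
            "-start", str(msg["start"]), "-end", str(msg["end"])]

def run_interval(msg):
    """Execute the application and report success or failure.

    On failure the caller would roll the transaction back (triggering a
    retry); on success it would forward the interval downstream.
    """
    result = subprocess.run(build_command(msg))
    return result.returncode == 0

msg = {"program": "true", "station": "ASAR", "start": 0, "end": 600}
cmd_str = shlex.join(build_command(msg))
```

The exit status is the only contract between tuxshell and the application in this model: nonzero means rollback and retry, zero means enqueue to the next pipeline stage.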
…by the DACS.

lastid (mode R/W): This table contains identifier values, which the DACS uses to ensure a unique interval identifier (intvlid) for each interval created.

TABLE 1: DATABASE TABLES USED BY DACS (CONTINUED)

Name: request (mode R/W)
Description: This table contains the state of auxiliary waveform requests, which the DACS uses to manage and initiate auxiliary waveform acquisition processing. Optionally, this table is used to create auxiliary station pipeline processing intervals. (1)

Name: timestamp (mode R/W)
Description: This table contains time markers, which the DACS uses to track interval creation progress and to retrieve wfdisc endtime by station.

Name: wfdisc (mode R)
Description: This table contains references to all acquired waveform data, which the DACS reads to determine data availability for the creation of processing intervals.

1. The IDC does not currently use this feature.

FUNCTIONAL DESCRIPTION

This section describes the main functions of the DACS. Figure 12 and Figure 13 on page 34 are referenced in the Functional Description.

Distributed Process Monitoring, Reliable Queueing, and Transactions

Tuxedo provides the core distributed processing environment in the DACS. Tuxedo servers are present on all DACS machines. This is shown at the bottom of Figure 12, where Tuxedo queuing, transactions, and process monitorin…
…cycle. The scheduling system was designed to be fault tolerant. To achieve this objective, the system is based upon the reliable Tuxedo disk queuing system. The principal design decision involved the selection of either the database or the Tuxedo queuing system as a stable storage resource. The database is a single point of failure. The Tuxedo queuing system includes automatic backup queuing, with some limitations: the state of the primary queuing system is frozen until recovery by operator intervention. Such a scenario works for the DACS Automatic Processing software, where new interval creation and processing proceed by using the backup DACS qspace, even though unfinished intervals are trapped in the primary qspace until the primary queuing system is restored. This scenario is not sufficient for the scheduling system, because the scheduler state is frozen during a queuing system failure, and there is one and only one scheduling system state. As such, the Tuxedo queuing system is also a single point of failure for the scheduling system. After weighing various trade-offs, a decision was made to base the scheduling system on the Tuxedo queuing system. Justifications for this decision included an implementation that appeared to be more straightforward and consistent with the rest of the Tuxedo-based DACS implementation, and some promise of achieving seamless fault tolerance in the future.

9. Hardware solutions, such as dual-ported disk drives, have been…