PBS Pro User Guide
Contents
1. mtime = Thu Aug 23 10:41:07 2003
    Output_Path = south:/u/susan/tns3d.o89
    Priority = 0
    qtime = Thu Aug 23 10:11:09 2003
    Rerunable = True
    Resource_List.mem = 300mb
    Resource_List.ncpus = 1
    Resource_List.walltime = 00:20:00
    session_id = 2083
    Variable_List = PBS_O_HOME=/u/susan,PBS_O_LANG=en_US,
        PBS_O_LOGNAME=susan,PBS_O_PATH=/bin:/usr/bin,
        PBS_O_SHELL=/bin/csh,PBS_O_HOST=south,
        PBS_O_WORKDIR=/u/susan,PBS_O_SYSTEM=Linux,
        PBS_O_QUEUE=workq
    euser = susan
    egroup = mrj
    queue_type = E
    comment = Job run on node south - started at 10:41
    etime = Thu Aug 23 10:11:09 2003

[PBS Pro 5.4 User Guide, p. 73]

6.1.6 List User-Specific Jobs

The -u option to qstat displays jobs owned by any of a list of user names specified. The syntax of the list of users is:

    user_name[@host][,user_name[@host],...]

Host names are not required, and may be "wild carded" on the left end, e.g. "*.pbspro.com". A user_name without a "@host" is equivalent to "user_name@*", that is, at any host.

    % qstat -u james

                                              Req'd  Elap
    Job ID    User   Queue  Jobname  Sess NDS TSK Mem Time
    16.south  james  workq  aims14
    18.south  james  workq  aims14
    52.south  james  workq  subrun

    % qstat -u james,barry

    51.south  barry  workq  airfoil
    52.south  james  workq  subrun
    54.south  barry  workq  airfoil

6.1.7 List Running Jobs

The -r option to qstat d
2. 4.6 User Authorization: Windows Security Tokens

Under Windows, a job can be run in one of two ways. It can be run without requiring a user password (the default), or the user can supply a password via qsub or xpbs. For a password-less job, PBS will create a security token (authentication identifier) for the user. This identifier will not be unique, so the job will not have access rights to some system resources, such as network shares. For a passworded job, the authentication identifier will be unique, so users can access, within their job script, folders in a network share.

[p. 28, Chapter 4: Submitting a PBS Job]

If you want to supply a password via qsub, use the -Wpwd option and supply the password when prompted:

    qsub -Wpwd job.script

The password specified will not be shown on screen. It is passed on to the program, which encrypts it and saves it securely for use by the job. The password can also be specified in xpbs using the SUBMIT-PASSWORD entry box in the Submit window. The password you type in will not be shown on the screen.

Keep in mind that in a multi-node job, the password supplied will be propagated to all the sister nodes. This requires that the password be the same on the user's accounts on all the nodes. Using a domain account for a multi-node job is ideal in this case.

Important: Because of an enhanced security feature found in Windows 2003 Advanced Server, you may not be able to run non-passworded
3. resource_name=value

The resource values are specified using the following units:

    node_spec   specifies the number and type of nodes, processors per node, tasks per node, etc., as needed by multi-node jobs. See "Running Multi-node Jobs" on page 117 for a complete explanation of use.

    resc_spec   specifies a set of resources and the conditions under which they should be allocated to a single-node job. See section 4.10 "Single-Node Conditional Requests" on page 43.

    time        specifies a maximum time period the resource can be used. Time is expressed in seconds as an integer, or in the form:
                    [[hours:]minutes:]seconds[.milliseconds]

    size        specifies the maximum amount in terms of bytes (default) or words. It is expressed in the form integer[suffix]. The suffix is a multiplier defined in the following table. The size of a word is the word size on the execution host.

                    b or w     bytes or words
                    kb or kw   Kilo (1024) bytes or words
                    mb or mw   Mega (1,048,576) bytes or words
                    gb or gw   Giga (1,073,741,824) bytes or words

    string      is comprised of a series of alpha-numeric characters containing no whitespace, beginning with an alphabetic character.

    unitary     specifies the maximum amount of a resource which is expressed as a simple integer.

Different resources are available on different systems, often depending on the architecture of the computer itself. The ta
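The size suffix table above can be illustrated with a short conversion routine. This is a minimal sketch, not part of PBS: the function name parse_size and the word_size parameter are hypothetical, and the default of 8 bytes per word is only an assumption, since the guide says the word size depends on the execution host.

```python
import re

# Multipliers from the suffix table: kilo, mega and giga are powers of 1024.
_SCALE = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_size(text, word_size=8):
    """Convert a size value such as '300mb' or '2kw' to a number of bytes.

    word_size is a hypothetical stand-in for the word size of the
    execution host; 8 is an assumed default, not a PBS constant.
    """
    match = re.fullmatch(r"(\d+)(?:([kmg]?)([bw]))?", text.strip().lower())
    if match is None:
        raise ValueError(f"not a size value: {text!r}")
    number, scale, unit = match.groups()
    amount = int(number) * _SCALE[scale or ""]
    # A bare integer or a "b" suffix counts bytes; a "w" suffix counts words.
    return amount * word_size if unit == "w" else amount
```

For instance, parse_size("300mb") gives the byte count behind the Resource_List.mem value of 300mb used in the qstat examples of this guide.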
4. rerunnable or not

The -r y|n option declares whether the job is rerunable. To rerun a job is to terminate the job and requeue it in the execution queue in which the job currently resides. The value argument is a single character, either "y" or "n". If the argument is "y", the job is rerunable. If the argument is "n", the job is not rerunable. The default value is "y", rerunable.

    qsub -r n mysubrun

    #!/bin/sh
    #PBS -r n

4.9.9 Specifying which shell to use

The -S path_list option declares the shell that interprets the job script. The option argument path_list is in the form:

    path[@host][,path[@host],...]

Only one path may be specified for any host named, and only one path may be specified without the corresponding host name. The path selected will be the one with the host name that matched the name of the execution host. If no matching host is found, then the path specified without a host will be selected, if present. If the -S option is not specified, the option argument is the null string, or no entry from the path_list is selected, then PBS will use the user's login shell on the execution host.

    qsub -S /bin/tcsh mysubrun

    #!/bin/sh
    #PBS -S /bin/tcsh@mars,/usr/bin/tcsh@jupiter

4.9.10 Setting a job's priority

The -p priority option defines the priority of the job. The priority argument must b
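The shell selection rule of section 4.9.9 can be sketched in a few lines. This is only an illustration of the rule as described, assuming the path[@host][,path[@host],...] form of path_list and an exact string comparison of host names; the function select_shell and its parameters are hypothetical, and the real comparison performed by PBS may differ.

```python
def select_shell(path_list, exec_host, login_shell):
    """Pick the shell for a job from a -S style path_list.

    Order of preference, following the text: the path whose host matches
    the execution host, then the path given without a host, then the
    user's login shell. All three arguments are illustrative inputs.
    """
    hostless = None
    for entry in filter(None, path_list.split(",")):
        path, _, host = entry.partition("@")
        if host == exec_host:
            return path
        if not host and hostless is None:
            hostless = path
    return hostless or login_shell
```

With the path_list from the example directive, a job running on jupiter would get /usr/bin/tcsh, while a job on any other unlisted host would fall back to the login shell because no host-less path is given.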
5. File staging is a way to specify which files should be copied onto the execution host before the job starts, and which should be copied off the execution host when it completes. (For file staging under Globus, see "PBS File Staging through GASS" on page 104.)

The -W stagein=file_list and -W stageout=file_list options to qsub specify which files are staged (copied) in before the job starts, or staged out after the job completes execution. On completion of the job, all staged-in and staged-out files are removed from the execution system. The file_list is in the form:

    local_file@hostname:remote_file

regardless of the direction of the copy. Note that the "@" character is used for separating the local and remote specification.

The name local_file is the name of the file on the system where the job executes. It may be an absolute path or relative to the home directory of the user. The name remote_file is the destination name on the host specified by hostname. The name may be absolute or relative to the user's home directory on the destination host. Thus for stage in, the direction of travel is:

    local_file ← remote_host:remote_file

and for stage out, the direction of travel is:

    local_file → remote_host:remote_file

Note that all relative paths are relative to the user's home directory on the respective hosts. The following example shows how to stage in a file named grid.dat loca
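The file_list form can be made concrete by splitting an entry into its three parts. The sketch below assumes the local_file@hostname:remote_file form described above and, as a further assumption not stated in this excerpt, that several entries are separated by commas. The names StageEntry and parse_file_list, and the host name used in the usage note, are hypothetical.

```python
from typing import List, NamedTuple


class StageEntry(NamedTuple):
    local_file: str   # name on the system where the job executes
    hostname: str     # remote host
    remote_file: str  # name on the remote host


def parse_file_list(file_list: str) -> List[StageEntry]:
    """Split a stagein/stageout file_list into its entries."""
    entries = []
    for item in file_list.split(","):
        # "@" separates the local name from the remote specification.
        local_file, at, remote = item.partition("@")
        # The first ":" after the host separates host and remote name.
        hostname, colon, remote_file = remote.partition(":")
        if not (at and colon and local_file and hostname and remote_file):
            raise ValueError(f"malformed file_list entry: {item!r}")
        entries.append(StageEntry(local_file, hostname, remote_file))
    return entries
```

For example, parse_file_list("grid.dat@serverA:/u/james/grid.dat") yields one entry whose local_file is grid.dat; the same entry describes a copy from serverA for stage in and a copy to serverA for stage out.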
6. the user name under which the job is to be run. This attribute is available only to the batch administrator.

The name used as a basename for various files, such as the job file, script file, and the standard output and error of the job. This attribute is available only to the batch administrator.

True if the job is an interactive PBS job.

The login name on the submitting host of the user who submitted the batch job.

The state of the job.

The time that the job was last modified, changed state, or changed locations.

    qtime            The time that the job entered the current queue.

    queue            The name of the queue in which the job currently resides.

    queue_rank       An ordered, non-sequential number indicating the job's position within the queue. This is provided as an aid to the Scheduler. This attribute is available to the batch manager only.

    queue_type       An identification of the type of queue in which the job is currently residing. This is provided as an aid to the Scheduler. This attribute is available to the batch manager only.

    resources_used   The amount of resources used by the job. This is provided as part of job status information if the job is running.

    server           The name of the Server which is currently managing the job.

    session_id       If the job is running, this is set to the session id of the first executing task.

    substate         A numerical indicator of the substate of the job. The substate is used by the PBS
7. s job, the user must specify how many nodes and of what type are required for the job. The user's parallel job must then execute tasks on the allocated nodes. This chapter explains how to submit such multi-node (parallel) jobs to PBS.

Important: Recall that the previously discussed resc_spec requests are for single-node jobs, and are not supported for multi-node jobs. See also section 4.10 "Single-Node Conditional Requests" on page 43.

9.1 Node Specification Syntax

The nodes resources_list item is set by the user (via the qsub command) to declare the node requirements for the job. It is a string of the form:

    -l nodes=node_spec[+node_spec...][#suffix]

[p. 118, Chapter 9: Running Multi-node Jobs]

where node_spec can be any of the following:

    N                      Number of nodes needed. If a number is used, it must be listed first.
    nodename               Host name of the specific node requested.
    property[:property...] One or more site-specific node properties.
    ppn=X                  The number of processes (tasks) per node; defaults to 1.
    cpp=Y                  The number of CPUs (threads) per process; defaults to 1.
    N:spec[:spec...]       Number of nodes, followed by any of the above requests, which may be further followed by additional of the above requests.

The node specification value is one or more node_spec joined with the "+" character. Each node_spec represents one or more nodes at run time. If no number is specified, one (1) is assumed. The total number of virtual processors alloca
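The node specification grammar above can be read mechanically. The following is a minimal sketch under stated assumptions: node_spec items are joined with "+", the parts of one node_spec are joined with ":", and an optional suffix follows a "#" (the separators are assumptions here, since the punctuation in this excerpt was damaged). The function names and the dictionary layout are hypothetical, and the sketch does not try to tell node names from properties.

```python
def parse_node_spec(spec):
    """Break one node_spec into count, names/properties, ppn and cpp."""
    result = {"count": 1, "names_or_properties": [], "ppn": 1, "cpp": 1}
    parts = spec.split(":")
    # If a number is used, it must be listed first.
    if parts[0].isdigit():
        result["count"] = int(parts.pop(0))
    for part in parts:
        key, equals, value = part.partition("=")
        if equals and key in ("ppn", "cpp"):
            result[key] = int(value)
        else:
            result["names_or_properties"].append(part)
    return result


def parse_nodes(value):
    """Parse the value of -l nodes=... into node_spec dicts and a suffix."""
    spec_text, _, suffix = value.partition("#")
    return [parse_node_spec(s) for s in spec_text.split("+")], suffix or None
```

As a usage example, parse_nodes("4:ppn=2+mars") returns two entries: four nodes with two processes each, and one node named (or having the property) mars, with the defaults of 1 applied where nothing was given.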
8. ... 80
    Using xpbs TrackJob Feature ... 82
7 Working With PBS Jobs ... 83
    Modifying Job Attributes ... 83
    Deleting Jobs ... 84
    Holding and Releasing Jobs ... 85
    Sending Messages to Jobs ... 87
    Sending Signals to Jobs ... 88
    Changing Order of Jobs Within Queue ... 89
    Moving Jobs Between Queues ... 90
8 Advanced PBS Features ... 93
    Job Exit Status ... 93
    Changing Job umask ... 94
    Requesting qsub Wait for Job Completion ... 94
    Specifying Job Dependencies ... 94
    Delivery of Output Files ... 97
    Input/Output File Staging ... 98
    The pbsdsh Command ... 101
    Globus Support ... 102
    Advance Reservation of Resources ... 106
    Checkpointing SGI MPI Jobs ... 113
    Running PBS in a DCE Environment ... 114
    Running PBS in a Kerberos Environment ... 114
9 Running Multi-node Jobs ... 117
    Node Specification Syntax ... 117
    Job-specific Nodes File ... 119
    EXAMPLE
9. The MailOptions argument is a string which consists of either the single character "n", or one or more of the characters "a", "b", and "e". If no email notification is specified, the default behavior will be the same as for "-m a".

    a   send mail when job is aborted by batch system
    b   send mail when job begins execution
    e   send mail when job ends execution
    n   do not send mail

    qsub -m ae mysubrun

    #!/bin/sh
    #PBS -m b

4.9.6 Setting e-mail recipient list

The -M user_list option declares the list of users to whom mail is sent by the execution server when it sends mail about the job. The user_list argument is of the form:

    user[@host][,user[@host],...]

If unset, the list defaults to the submitting user at the qsub host, i.e. the job owner.

    qsub -M james@pbspro.com mysubrun

4.9.7 Specifying a job name

The -N name option declares a name for the job. The name specified may be up to and including 15 characters in length. It must consist of printable, non-whitespace characters with the first character alphabetic, and contain no special characters. If the -N option is not specified, the job name will be the base name of the job script file specified on the command line. If no script file name was specified and the script was read from the standard input, then the job name will be set to STDIN.

    qsub -N myName mysubrun

    #!/bin/sh
    #PBS -N myName

4.9.8 Marking a job as
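The job naming rules of section 4.9.7 lend themselves to a quick pre-submission check. The sketch below covers only the rules that are stated precisely (length, first character, printable non-whitespace characters); the guide's "no special characters" rule is not defined in this excerpt and is therefore not checked. Both function names are hypothetical.

```python
import os


def is_valid_job_name(name):
    """Check a -N job name against the rules quoted above.

    Checked: 1 to 15 characters, first character alphabetic, all
    characters printable and non-whitespace. Not checked: the unspecified
    "no special characters" rule.
    """
    return (
        0 < len(name) <= 15
        and name[0].isascii()
        and name[0].isalpha()
        and all(ch.isprintable() and not ch.isspace() for ch in name)
    )


def default_job_name(script_path=None):
    """Name used when -N is absent: script base name, or STDIN."""
    return os.path.basename(script_path) if script_path else "STDIN"
```

For the examples in this section, is_valid_job_name("myName") is true, and default_job_name("/u/susan/mysubrun") gives mysubrun.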
10. app arg1 on all four nodes allocated to the job (i.e. the default behavior):

    #!/bin/sh
    #PBS -l nodes=4
    #PBS -l walltime=1:00:00
    pbsdsh myapp app arg1

8.8 Globus Support

Globus is a computational software infrastructure that integrates geographically distributed computational and information resources. Jobs are normally submitted to Globus using the utility globusrun. When Globus support is enabled for PBS, jobs can be routed between Globus and PBS. Contact your PBS system administrator to learn if Globus support has been enabled on your PBS systems.

8.8.1 Running Globus jobs

To submit a Globus job, users must specify the globus resource name (gatekeeper), as the following example shows:

    qsub -l site=globus:globus-resource-name pbsjob

The pbs_mom_globus daemon must be running on the same host where the pbs_server is running. Be sure the pbs_server has a nodes file entry server-host:gl in order for Globus job status to be communicated back to the Server by pbs_mom_globus.

Also, be sure to create a Globus proxy certificate by running the utility grid-proxy-init in order to submit jobs to Globus without a password. If a user's job fails to run due to an expired proxy credential or a non-existent credential, then the job will be put on hold and the user will be notified of the error by email.

8.8.2 PBS and Globusrun

If you're familiar with the globusrun utility, the following ma
11. /etc/pbs.conf file.

    timeoutSecs       specify the number of seconds before timing out waiting for a connection to a PBS host.
    xtermCmd          the xterm command to run driving an interactive PBS session.
    labelFont         font applied to text appearing in labels.

[p. 64, Chapter 5: Using the xpbs GUI]

    fixlabelFont      font applied to text that labels fixed-width widgets such as listbox labels. This must be a fixed-width font.
    textFont          font applied to a text widget. Keep this as a fixed-width font.
    backgroundColor   the color applied to the background of frames, buttons, entries, and scrollbar handles.
    foregroundColor   the color applied to text in any context.
    activeColor       the color applied to the background of a selection, a selected command button, or a selected scroll bar handle.
    disabledColor     color applied to a disabled widget.
    signalColor       color applied to buttons that signal something to the user about a change of state. For example, the color of the Track Job button when returned output files are detected.
    shadingColor      a color shading applied to some of the frames to emphasize focus as well as decoration.
    selectorColor     the color applied to the selector box of a radiobutton or checkbutton.
    selectHosts       list of hosts (space separated) to automatically select/highlight in the HOSTS listbox.
    selectQueues      list of queues (space sep
    selectJobs
    selectOwners
    selectStates
    selectRes
    selectExecTime
    selectAcctName
12. one Job Scheduler (pbs_sched), and one or more execution servers (pbs_mom). The PBS System can be set up to distribute the workload to one large timeshared system, multiple timeshared systems, a cluster of nodes to be used exclusively or temporarily shared, or any combination of these.

[p. 12, Chapter 2: Concepts and Terms]

The remainder of this chapter provides additional terms, listed in alphabetical order.

    Account         An account is an arbitrary character string, which may have meaning to one or more hosts in the batch system. Frequently, account is used by sites for accounting or charge-back purposes.

    Administrator   See Manager.

    API             PBS provides an Application Programming Interface (API) which is used by the commands to communicate with the Server. This API is described in the PBS Pro External Reference Specification. A site may make use of the API to implement new commands if so desired.

    Attribute       An attribute is an inherent characteristic of a parent object (Server, queue, job, or node). Typically, this is a data item whose value affects the operation or behavior of the object and can be set by the owner of the object. For example, the user can supply values for attributes of a job.

    Batch or Batch Processing
                    This refers to the capability of running jobs outside of the interactive login environment.

    Complex         A complex is a collection of hosts managed by one

    Destination
    Destination Identifier
    File Staging
13. "+" or "-" is not specified, then this implies enabling of access. If only hostname is given, then users logged into that host are allowed access to like-named accounts on the local host. If only username is given, then that user has access to all accounts (except Administrator-type users) on the local host. Finally, if both hostname and username are given, then the user at that host has access to the like-named account on the local host.

3.6.2 Windows .rhosts File

The Windows .rhosts file is located in the user's HOMEDIR, with the format:

    hostname username

This file can also determine if a remote user is allowed to submit jobs to the local PBS Server, if the mapped user is an Administrator type of account.

3.6.3 Windows User's HOMEDIR

Each Windows user is assumed to have a home directory (HOMEDIR) where his or her PBS job would initially be started. The home directory is also the starting location of files when users specify relative path arguments to the qsub/qalter -W stagein/stageout options.

PBS determines a user's HOMEDIR in one of the following four ways:

    1. The value of "Home directory" when one runs the command:

           net user username

    2. If 1. did not return a value, then it consults the user's USERPROFILE environment variable, and if %USERPROFILE%\My Documents exists, then uses the following path:

           %USERPROFILE%\My Documents\PBS Output

       For instance, for user pbstest, the HOMEDIR will be c
14. ple jobs simultaneously, called timeshared nodes. Often the term host rather than node is used in conjunction with timeshared, as in timeshared host. A timeshared node will never be allocated exclusively or temporarily shared. However, unlike cluster nodes, a timeshared node can be over-committed if the local policy specifies to do so. (See also virtual processors.)

This is any collection of nodes controlled by a single instance of PBS (i.e., by one PBS Server).

An exclusive VP is one that is used by one and only one job at a time. A set of VPs is assigned exclusively to a job for the duration of that job. This is typically done to improve the performance of message-passing programs.

    Temporarily shared VP
                    A temporarily shared node is one where one or more of its VPs are temporarily shared by jobs. If several jobs request multiple temporarily shared nodes, some VPs may be allocated commonly to both jobs and some may be unique to one of the jobs. When a VP is allocated on a temporarily shared basis, it remains so until all jobs using it are terminated. Then the VP may be re-allocated, either again for temporarily shared use or for exclusive use.

                    If a host is defined as timeshared, it will never be allocated exclusively or temporarily shared.

    Load Balance    A policy wherein jobs are distributed across multiple timeshared hosts

    Queue
    Node Attribute
    Node Property
    Portable Batch System
15. stream. See the qsub and qalter command description for more detail.

The time after which the job may execute. The time is maintained in seconds since Epoch. If this time has not yet been reached, the job will not be scheduled for execution and the job is said to be in wait state.

A list of group_names@hosts which determines the group under which the job is run on a given host. When a job is to be placed into execution, the Server will select a group name according to the rules specified for use of the qsub command.

The set of holds currently applied to the job. If the set is not null, the job will not be scheduled for execution and is said to be in the hold state. Note, the hold state takes precedence over the wait state.

    Job_Name      The name assigned to the job by the qsub or qalter command.

    Join_Path     If the Join_Paths attribute is "oe", then the job's standard error stream will be merged, inter-mixed, with the job's standard output stream and placed in the file determined by the Output_Path attribute. The Error_Path attribute is maintained, but ignored. However, if the Join_Paths attribute is "eo", then the job's standard output stream will be merged, inter-mixed, with the job's standard error stream and placed in the file determined by the Error_Path attribute, an

    Keep_Files
    Mail_Points
    Mail_Users
    Output_Path
    Priority
    Rerunable
    Resource_List
    Shell_Path_List
16. 2 Viewing Specific Information

When requesting queue or Server status, qstat will output information about each destination. The various options to qstat take as an operand either a job identifier or a destination. If the operand is a job identifier, it must be in the following form:

    sequence_number[.server_name][@server]

where sequence_number.server_name is the job identifier assigned at submittal time (see qsub). If the .server_name is omitted, the name of the default Server will be used. If @server is supplied, the request will be for the job identifier currently at that Server.

If the operand is a destination identifier, it takes one of the following three forms:

    queue
    @server
    queue@server

If queue is specified, the request is for status of all jobs in that queue at the default Server. If the @server form is given, the request is for status of all jobs at that Server. If a full destination identifier, queue@server, is given, the request is for status of all jobs in the named queue at the named server.

[p. 70, Chapter 6: Checking Job / System Status]

Important: If a PBS Server is not specified on the qstat command line, the default Server will be used. (See discussion of PBS_DEFAULT in "Environment Variables" on page 21.)

6.1.3 Checking Server Status

The -B option to qstat displays the status of the specified PBS Batch Server. One line of output is generated for each Server queried. The three letter abbreviations
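The two operand forms accepted by qstat can be told apart and decomposed with simple string splitting. This is a sketch of the rules as described above, assuming the sequence_number[.server_name][@server] and queue / @server / queue@server forms; the function names and the default_server argument (standing in for the Server named by PBS_DEFAULT) are hypothetical.

```python
def parse_job_id(operand, default_server):
    """Split a job identifier into its documented parts."""
    ident, _, at_server = operand.partition("@")
    sequence, _, server_name = ident.partition(".")
    if not sequence.isdigit():
        raise ValueError(f"not a job identifier: {operand!r}")
    return {
        "sequence_number": int(sequence),
        # An omitted .server_name means the default Server.
        "server_name": server_name or default_server,
        # @server, if given, names the Server the job is currently at.
        "at_server": at_server or None,
    }


def parse_destination(operand, default_server):
    """Return (queue, server); queue is None for the @server form."""
    queue, _, server = operand.partition("@")
    return (queue or None, server or default_server)
```

For instance, parse_destination("workq", "south") selects queue workq at the default Server, while parse_destination("@fast", "south") selects all jobs at Server fast.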
17. ... 31
resc_spec 29, 31, 33
Reservation
    deleting 112
    showing status of 111
    submitting 107
resources_list 33, 117

S
Scheduler 9
Scheduling 7
scp 27
Scrollbar 62
Secure Shell 27
Server 8
SGI MPI 113
SIGKILL 88
SIGNULL 88
SIGTERM 88
size 29
ssh 27
string 29
Suppressing job identifier 42
System
    integration 6
    monitoring 5

T
Task 14
Task Manager 101
TCL 49
Temporarily shared VP 11
TGT 115
time 29
Timeshared
    node 10
    vs cluster node 123
TK 49
tm(3) 101
TMPDIR 126
tracejob 17

U
umask 94
UNICOS 31
unitary 29
User
    defined 14
    ID (UID) 14
    interfaces 4
    name mapping 6

V
Veridian 3
Viewing Job Information 71
Virtual Processor (VP) 14

W
Wait for Job Completion 94
Widgets 61
Windows 19, 20
    security tokens 27
Windows 2000 6, 21
Windows 2003 28
Workload management 2

X
xpbs 17, 64
    admin 17
    buttons 57
    configuration 61
    usage 49, 75, 80, 87, 89, 97
xpbsmon 17
X-Windows 63
18. \Documents and Settings\pbstest\My Documents\PBS Output

    3. If there is no %USERPROFILE%\My Documents directory, but %USERPROFILE% itself exists, then HOMEDIR is set to:

           %USERPROFILE%\PBS Output

    4. If %USERPROFILE% does not exist, then the default HOMEDIR is finally:

           J ve SUSERPROFILE S PBS

Under Windows 2000 (not XP), an Administrator can specify a home directory for a user via the Control Panel->Settings->Users and Passwords dialog box, clicking on the Advanced tab and then the Advanced button, which will give the directory structure for Users. Click on Users, select the username, and then right-click to bring up Properties. Now click on the Properties tab; the home folder is available for update under the Profile tab. You will need to explicitly include the drive information for the home folder.

Important: You must specify a directory for the home folder that is accessible to the user. If the directory has incorrect permissions, PBS will be unable to run jobs for the user.

3.7 Environment Variables

While we're on the topic of the user's environment, we should mention that there are a number of environment variables provided to the PBS job. Some are taken from the user's environment and carried with the job. Others are created by PBS. Still others can be explicitly created by the user for exclusive use by PBS jobs. All PBS-provided en
19. Support works with parallel programming libraries such as MPI, PVM and HPF. Applications can be scheduled to run within a single multi-processor computer or across multiple systems.

System Monitoring includes a graphical user interface for system monitoring. Displays node status, job placement, and resource utilization information for both stand-alone systems and clusters.

Job Interdependency enables the user to define a wide range of inter-dependencies between jobs. Such dependencies include execution order, synchronization, and execution conditioned on the success or failure of another specific job (or set of jobs).

Computational Grid Support provides an enabling technology for meta-computing and computational grids, including support for the Globus Grid Toolkit.

Comprehensive API includes a complete Application Programming Interface (API) for sites who desire to integrate PBS with other applications, or who wish to support unique job scheduling requirements.

Automatic Load-Leveling provides numerous ways to distribute the workload across a cluster of machines, based on hardware configuration, resource availability, keyboard activity, and local scheduling policy.

Distributed Clustering allows customers to utilize physically distributed systems and clusters, even across wide-area networks.

Common User Environment offers users a common view of the job submission, job querying, system status, and job tracking over all systems.

Cross
20. The following example shows requesting different memory amounts depending on the architecture that the job runs on:

    qsub -l resc="(arch=='solaris7' && mem=100MB) || (arch=='irix' && mem=1GB)"

Furthermore, it is possible to specify multiple resource specification strings. The first resc specification will be evaluated. If it can be satisfied, then it will be used. If not, then the next resc string will be used. For example:

    qsub -l resc="ncpus=16&&mem=1GB&&walltime=1:00" \
         -l resc="ncpus=8&&mem=512MB&&walltime=2:00" \
         -l resc="ncpus=4&&mem=256MB&&walltime=4:00"

indicates that you want 16 CPUs, but if you can't have 16 CPUs, then give you 8 with half the memory and twice the wall-clock time. But if you can't have 8 CPUs, then give you four with 1/4 the memory and four times the walltime.

This is different than putting them all into one resc specification. If you were to do:

    qsub -l resc="ncpus=16||ncpus=8||ncpus=4"

you would be requesting the first available node which has either 16, 8, or 4 CPUs. In this case, PBS doesn't go through all the nodes checking for 16 first, then 8, then 4, as it does when using multiple resc specifications.

Important: Note the difference between comparison and assignment within a resc_spec. The comparison operators only impact which node is selected for the job; the
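The difference between several resc specifications and a single specification with alternatives comes down to the order of the two loops. The sketch below is a toy model only, not the PBS Scheduler's algorithm: nodes and requests are plain dictionaries, the field names ncpus and mem_mb and the node names are hypothetical, and a node "satisfies" a request when it has at least the requested amount of each resource.

```python
def satisfies(node, request):
    """True if the node offers at least the requested CPUs and memory."""
    return node["ncpus"] >= request["ncpus"] and node["mem_mb"] >= request["mem_mb"]


def first_satisfiable(specs, nodes):
    """Multiple resc specifications: try each spec in order over all nodes.

    Returns (index of the spec used, node name), or None if none fits.
    """
    for index, spec in enumerate(specs):
        for node in nodes:
            if satisfies(node, spec):
                return index, node["name"]
    return None


def first_node_any(alternatives, nodes):
    """One resc specification with alternatives: take the first node
    that satisfies any alternative, whichever alternative that is."""
    for node in nodes:
        if any(satisfies(node, alt) for alt in alternatives):
            return node["name"]
    return None
```

With a 4-CPU node listed before an 8-CPU node, first_satisfiable walks past the 16-CPU request and settles on the 8-CPU request and node, whereas first_node_any stops at the 4-CPU node, which mirrors the contrast drawn in the text.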
21. -W block=opt   "Requesting qsub Wait for Job Completion" on page 94
    -W pwd=passwd  "Running PBS in a DCE Environment" on page 114
    -W umask=nnn   "Changing Job umask" on page 94
    -z             "Suppressing job identifier" on page 42

4.9.1 Specifying Queue and/or Server

The -q destination option to qsub allows you to specify a particular destination to which you want the job submitted. The destination names a queue, a Server, or a queue at a Server. The qsub command will submit the script to the Server defined by the destination argument. If the destination is a routing queue, the job may be routed by the Server to a new destination. If the -q option is not specified, the qsub command will submit the script to the default queue at the default Server. (See also the discussion of PBS_DEFAULT in "Environment Variables" on page 21.) The destination specification takes the following form:

    -q [queue][@host]

    qsub -q queue mysubrun

    #!/bin/sh
    #PBS -q queueName

    qsub -q @server mysubrun
    qsub -q queueName@serverName mysubrun
    qsub -q queueName@serverName.domain.com mysubrun

4.9.2 Redirecting output and error files

The -o path and -e path options to qsub allow you to specify the name of the files to which the standard output (stdout) and the standard error (stderr) file streams should be written. The path argument is of the form:

    [hostname:]path_name

where hostname is the name of
22. batch system. It may be made up of nodes that are allocated to only one job at a time, or of nodes that have many jobs executing at once on each node, or a combination of these two scenarios.

    Destination     This is the location within PBS where a job is sent for processing. A destination may be a single queue at a single Server, or it may map into multiple possible locations, tried in turn until one accepts the job.

    Destination Identifier
                    This is a string that names the destination. It is composed of two parts and has the format queue@server, where server is the name of a PBS Server and queue is the string identifying a queue on that Server.

    File Staging    File staging is the movement of files between a specified location and the execution host. See "Stage In" and "Stage Out" below.

    Group ID (GID)  This unique number represents a specific group (see Group).

    Group           Group refers to a collection of system users (see Users). A user must be a member of a group and may be a member of more than one. Within UNIX and POSIX systems, membership in a group establishes one level of privilege. Group membership is also often used to control or limit access to system resources.

    Hold            An artificial restriction which prevents a job from being selected for processing. There are three types of holds. One is applied by the job owner, another is applied by a PBS Operator, and a

    Job or Batch Job
    Manager
    Operator
    Owner
    POSIX
    Rerunable
    Stage In
    Stage Out
23. correspond to various job limits and counts as follows: Maximum, Total, Queued, Running, Held, Waiting, Transiting, and Exiting. The last column gives the status of the Server itself: active, idle, or scheduling.

    % qstat -B
    Server       Max Tot Que Run Hld Wat Trn Ext Status
    fast.pbspro  14 13 0 Active

When querying jobs, Servers, or queues, you can add the -f option to qstat to change the display to the full or long display. For example, the Server status shown above would be expanded using -f as shown below:

    % qstat -Bf
    Server: fast.pbspro.com
        server_state = Active
        scheduling = True
        total_jobs = 14
        state_count = Transit:0 Queued:13 Held:0 Waiting:0 Running:1 Exiting:0
        managers = james@fast.pbspro.com
        default_queue = workq
        log_events = 511
        mail_from = adm
        query_other_jobs = True
        resources_available.mem = 64mb
        resources_available.ncpus = 2
        resources_default.ncpus = 1
        resources_assigned.ncpus =
        resources_assigned.nodect = 1
        scheduler_iteration = 600
        pbs_version = PBSPro_5_2_1

6.1.4 Checking Queue Status

The -Q option to qstat displays the status of all (or any specified) queues at the (optionally specified) PBS Server. One line of output is generated for each queue queried. The three letter abbreviations correspond to limits, queue states, and job counts as follows: Maximum, Total, Enabled Status, Started Status, Queued, Running, Held, Waiting, Transi
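The full Server display is regular enough to post-process in a script. The following minimal sketch assumes the "name = value" layout and the "State:count" fields of state_count as they appear in the example output; it works on text you have already captured from qstat -Bf and does not call PBS itself. Both function names are hypothetical.

```python
def parse_full_status(text):
    """Collect 'name = value' lines from a full status display."""
    attributes = {}
    for line in text.splitlines():
        name, separator, value = line.partition(" = ")
        if separator:
            attributes[name.strip()] = value.strip()
    return attributes


def parse_state_count(value):
    """Turn 'Transit:0 Queued:13 ...' into a dict of integer counts."""
    counts = {}
    for field in value.split():
        state, _, number = field.partition(":")
        counts[state] = int(number)
    return counts
```

Applied to the example above, the six state counts add up to the total_jobs value of 14, which is a quick sanity check when reading the summary line.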
24. is equal to the priority.

Restricts selection to those jobs residing at the specified destination. The destination may be one of the following three forms:

    queue
    @server
    queue@server

If the -q option is not specified, jobs will be selected from the default Server. If the destination describes only a queue, only jobs in that queue on the default batch Server will be selected. If the destination describes only a Server, then jobs in all queues on that Server will be selected. If the destination describes both a queue and a Server, then only jobs in the named queue on the named Server will be selected.

    -r rerun        Restricts selection of jobs to those with the specified Rerunable attribute. The option argument must be a single character. The following two characters are supported by PBS: y and n.

    -s states       Restricts job selection to those in the specified states. The states argument is a character string which consists of any combination of the characters: E, H, Q, R, S, T, U, and W. The characters in the states argument have the following interpretation:

    -u user_list

Table 9: Job States Viewable by Users

    State   Meaning
    E       Job is in the process of Exiting
    H       Job has been placed on Hold
    Q       Job is in the Queued state
    R       Job is in the Running state
    S       Job has been Suspended
    T       Job is Transiting between states
    U       Job suspended due to workstation user acti
25. line method for accomplishing a particular task is presented first, followed by the xpbs method.

7.1 Modifying Job Attributes

There may come a time when you need to change an attribute on a job you have already submitted. Perhaps you made a mistake on the resource requirements, or perhaps a previous job ran out of time, so you want to add more time to a queued job before it starts running. Whatever the reason, PBS provides the qalter command.

Most attributes can be changed by the owner of the job while the job is still queued. However, once a job begins execution, the resource limits cannot be changed. These include: cputime, walltime, number of CPUs, and memory.

The usage syntax for qalter is:

    qalter job-resources job-list

The job-resources are the same option and value pairs used on the qsub command line. (See "Submitting a PBS Job" on page 24.) Only those attributes listed as options on the command will be modified. If any of the specified attributes cannot be modified for a job for any reason, none of that job's attributes will be modified.

The following examples illustrate how to use the qalter command. First we list all the jobs of a particular user. Then we modify two attributes as shown (increasing the wall-clock time from 13 to 20 minutes, and changing the job name from "airfoil" to "engine"):

    qstat -u barry
                                               Req'd  Elap
    Job ID  User  Queue  Jobname  Sess  NDS  TSK  Mem  Time  S  Ti
26. not your job. To prevent this, you need to preserve the job's exit status in your .logout file by saving it at the top, then doing an explicit exit at the end, as shown below:

    set EXITVAL = $status
    [previous contents of .logout here]
    exit $EXITVAL

Likewise, if the user's login shell is csh, the following message may appear in the standard output of a job:

    Warning: no access to tty, thus no job control in this shell

This message is produced by many csh versions when the shell determines that its input is not a terminal. Short of modifying csh, there is no way to eliminate the message. Fortunately, it is just an informative message and has no effect on the job.

3.6 Setting Up Your Windows Environment

This section discusses the setup needed for running PBS Pro under a Microsoft Windows environment.

3.6.1 Windows hosts.equiv File

The Windows hosts.equiv file determines the list of non-Administrator accounts that are allowed access to the local host, that is, the host containing this file. This file also determines whether a remote user is allowed to submit jobs to the local PBS Server, with the user on the local host being a non-Administrator account. This file is usually:

    %WINDIR%\system32\drivers\etc\hosts.equiv

The format of the hosts.equiv file is as follows:

    [+|-] hostname username

"+" means enable access, whereas "-" means to disable access. If
27. o combined in either order. Or the argument is the letter n. If -k is not specified, neither file is retained.

e    The standard error file is to be retained on the execution host. The file will be placed in the home directory of the user under whose user id the job executed. The file name will be the default file name given by job_name.esequence, where job_name is the name specified for the job and sequence is the sequence number component of the job identifier.

o    The standard output file is to be retained on the execution host. The file will be placed in the home directory of the user under whose user id the job executed. The file name will be the default file name given by job_name.osequence, where job_name is the name specified for the job and sequence is the sequence number component of the job identifier.

eo   Both standard output and standard error will be retained.

oe   Both standard output and standard error will be retained.

n    Neither file is retained.

    qsub -k oe mysubrun

    #!/bin/sh
    #PBS -k eo

4.9.19 Suppressing job identifier

The -z option directs the qsub command to not write the job identifier assigned to the job to the command's standard output.

    qsub -z mysubrun

    #!/bin/sh
    #PBS -z

4.9.20 Interactive-batch jobs

The -I option declares that the job is to be run interactively. The job will be queued and scheduled as any PBS batch job, b
28. option defines the user name under which the job is to run on the execution system. If unset, the user_list defaults to the user who issued the qsub command. The user_list argument is of the form:

    user[@host][,user[@host],...]

Only one user name may be given per specified host, and only one of the user specifications may be supplied without the corresponding host specification. That user name will be used for execution on any host not named in the argument list. A named host refers to the host on which the job is queued for execution, not the actual execution host. Authorization must exist for the job owner to run as the specified user. (See "User Authorization" on page 26 for details.)

    qsub -u james@jupiter,barney@purpleplanet mysubrun

4.9.15 Specifying job groupID

The -W group_list=g_list option defines the group name under which the job is to run on the execution system. The g_list argument is of the form:

    group[@host][,group[@host],...]

Only one group name may be given per specified host. Only one of the group specifications may be supplied without the corresponding host specification. That group name will be used for execution on any host not named in the argument list. If not set, the group_list defaults to the primary group of the user under which the job will be run.

    qsub -W group_list=grpA,grpB@jupiter mysubrun

4.9.16 Specifying a local account

The -A account_st
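The user_list rules above (one user per named host, at most one entry without a host, which then applies to every host not named) can be sketched as a lookup. This is an illustration only: `resolve_user` is a hypothetical helper, and falling back to the submitter when no entry matches is an assumption made here, since the text only describes the default for an unset list.

```python
def resolve_user(user_list: str, host: str, submitter: str) -> str:
    """Pick the execution user name for `host` from a user_list string."""
    by_host = {}
    default_user = None
    for entry in (e.strip() for e in user_list.split(",")):
        if not entry:
            continue
        user, at_sign, entry_host = entry.partition("@")
        if at_sign:
            if entry_host in by_host:
                raise ValueError(f"more than one user given for host {entry_host!r}")
            by_host[entry_host] = user
        else:
            if default_user is not None:
                raise ValueError("only one user may be given without a host")
            default_user = user
    if host in by_host:
        return by_host[host]
    # Assumption: with no matching entry, run as the submitting user.
    return default_user if default_user is not None else submitter
```

The same shape would apply to the group_list argument, with the user's primary group taking the place of the submitter as the fallback.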
29. paging through the rows of data, and each group of fields gets one scrollbar for horizontally scanning long entry strings. Moving from field to field can be done using the <Tab> (move forward), <Cntrl-f> (move forward), or <Cntrl-b> (move backward) keys.

A spinbox is a combination of an entry widget and a horizontal scrollbar. The entry widget will only accept values that fall within a defined list of valid values, and incrementing through the valid values is done by clicking on the up/down arrows.

A button is a rectangular region appearing either raised or pressed that invokes an action when clicked with the left mouse button. When the button appears pressed, then hitting the <RETURN> key will automatically select the button.

A text region is an editor-like widget. This widget is brought into focus with a click of the left mouse button. To manipulate this widget, simply type in the text. Use of arrow keys, backspace/delete key, mouse selection of text for deletion or overwrite, and copying and pasting with sole use of mouse buttons are permitted. This widget is usually accompanied by a scrollbar for vertically scanning a long entry.

5.9 xpbs X-Windows Preferences

The resources that can be set in the X resources file, ~/.xpbsrc, are:

serverHosts
    list of Server hosts (space separated) to query by xpbs. A special keyword PBS_DEFAULT_SERVER can be used which will be used as a placeholder for the value obtained from the
30. represented by the attribute of the job is greater than or equal to the value represented by the option argument.

.gt.  The value represented by the attribute of the job is greater than the value represented by the option argument.

.le.  The value represented by the attribute of the job is less than or equal to the value represented by the option argument.

.lt.  The value represented by the attribute of the job is less than the value represented by the option argument.

The available options to qselect are:

-a [op]date_time
    Restricts selection to a specific time, or a range of times. The qselect command selects only jobs for which the value of the Execution_Time attribute is related to the date_time argument by the optional op operator. The date_time argument is in the POSIX date format [[CC]YY]MMDDhhmm[.SS], where MM is the two digits for the month, DD is the day of the month, hh is the hour, mm is the minute, and the optional SS is the seconds. CC is the century and YY the year. If op is not specified, jobs will be selected for which the Execution_Time and date_time values are equal.

-A account_string
    Restricts selection to jobs whose Account_Name attribute matches the specified account_string.

-c [op]interval
    Restricts selection to jobs whose Checkpoint interval attribute matches the specified relationship. The values of the Checkpoint attribute are def

-h hold_list
31. system blocks that can be used by all processes in the job. Units: size. Example: qsub -l pf=1000

pmppt   Maximum amount of wall clock time used on the MPP by a single process in the job. Units: time. Example: qsub -l pmppt=4:00:00

pncpus  Maximum number of processors used by any single process in the job. Units: unitary. Example: qsub -l pncpus=4

ppf     Maximum number of file system blocks that can be used by a single process in the job. Units: size. Example: qsub -l ppf=500

procs   Maximum number of processes in the job. Units: unitary. Example: qsub -l procs=128

psds    Maximum number of data blocks on the SDS (secondary data storage) for any process in the job. Units: size. Example: qsub -l psds=300

sds     Maximum number of data blocks on the SDS (secondary data storage) for the job. Units: size. Example: qsub -l sds=1000

4.9 Job Submission Options

There are many options to the qsub command. The table below gives a quick summary of the available options; the rest of this chapter explains how to use each one.

Table 4: Options to the qsub Command

Option             Function and Page Reference
-A account_string  "Specifying a local account" on page 40
-a date_time       "Deferring execution" on page 38
-c interval        "Specifying job checkpoint interval" on page 39
-e path            "Redirecting output and error files" on page 35
-h                 "Holding a job (delaying execution)" on page 38
-I                 "Interactive-batch jobs" on page 42
-j join            "Merging output and err
32. the HOSTS region.

a boolean value (true or false) indicating whether or not to iconize the QUEUES region.

a boolean value (true or false) indicating whether or not to iconize the JOBS region.

a boolean value (true or false) indicating whether or not to iconize the INFO region.

a curly-braced list of resource names, grouped by architecture, known to xpbs. The format is as follows:

    { <arch-type1> resname1 resname2 ... resnameN }
    { <arch-type2> resname1 resname2 ... resnameN }
    { <arch-typeN> resname1 resname2 ... resnameN }

Chapter 6: Checking Job / System Status

This chapter introduces several PBS commands useful for checking status of jobs, queues, and PBS Servers. Examples for use are included, as are instructions on how to accomplish the same task using the xpbs graphical interface.

6.1 The qstat Command

The qstat command is used to request the status of jobs, queues, and the PBS Server. The requested status is written to the standard output stream (usually the user's terminal). When requesting job status, any jobs for which the user does not have view privilege are not displayed.

6.1.1 Checking Job Status

Executing the qstat command without any options displays job information in the default format. (An alternative display format is also provided, and is discussed below.) The default display includes the following information: The j
33. this check is denied by the Scheduler.

If the submitter did not indicate that the submission command should wait for confirmation or rejection (-I option), he will have to periodically query the Server about the status of the reservation or wait for a mail message regarding its denial or confirmation.

8.9.1 Submitting a PBS Reservation

The pbs_rsub command is used to request a reservation of resources. If the request is granted, PBS provisions for the requested resources to be available for use during the specified future time interval. A queue is dynamically allocated to service a confirmed reservation. Users who are listed as being allowed to run jobs using the resources of this reservation will submit their jobs to this queue via the standard qsub command. (For details see "Submitting a PBS Job" on page 24.)

Although a confirmed resources reservation will accept jobs into its queue at any time, the scheduler is not allowed to schedule jobs from the queue before the reservation period arrives. Once the reservation period arrives, these jobs will begin to run, but they will not in aggregate use up more resources than the reservation requested.

The pbs_rsub command returns an ID string to use in referencing the reservation and an indication of its current status. The actual specification of resources is done in the same way as it is for submission of a job.

Following is a list and description of options to the pbs_rsub command.

-R d
34. to the standard output or the standard error file of the job. Click the Send Message button to complete the process.

7.5 Sending Signals to Jobs

The qsig command requests that a signal be sent to executing PBS jobs. The signal is sent to the session leader of the job. Usage syntax of the qsig command is:

    qsig [-s signal] job_identifier

If the -s option is not specified, SIGTERM is sent. If the -s option is specified, it declares which signal is sent to the job. The signal argument is either a signal name (e.g. SIGKILL), the signal name without the SIG prefix (e.g. KILL), or an unsigned signal number (e.g. 9). The signal name SIGNULL is allowed; the Server will send the signal 0 to the job, which will have no effect. Not all signal names will be recognized by qsig. If it doesn't recognize the signal name, try issuing the signal number instead.

The request to signal a batch job will be rejected if:

    The user is not authorized to signal the job.
    The job is not in the running state.
    The requested signal is not supported by the execution host.
    The job is exiting.

Two special signal names, "suspend" and "resume" (note, all lower case), are used to suspend and resume jobs. When suspended, a job continues to occupy system resources but is not executing and is not charged for walltime. Manager or operator privilege is r
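The three accepted spellings of the qsig signal argument (full name, name without the SIG prefix, or number) can be illustrated with Python's standard signal module. This is a sketch of the naming rule only, not how qsig itself is implemented; `resolve_signal` is a hypothetical name, the special names suspend and resume are not handled, and accepting NULL without the prefix is an assumption.

```python
import signal


def resolve_signal(arg: str = "SIGTERM") -> int:
    """Map a qsig-style signal argument to a signal number."""
    if arg.isdigit():
        return int(arg)  # unsigned signal number, e.g. "9"
    name = arg if arg.startswith("SIG") else "SIG" + arg
    if name == "SIGNULL":
        return 0  # signal 0 has no effect on the job
    try:
        return int(signal.Signals[name])
    except KeyError:
        # Mirrors the advice in the text: fall back to the number.
        raise ValueError(f"unrecognized signal name {arg!r}; try the signal number")
```

Calling `resolve_signal()` with no argument gives the number for SIGTERM, matching the default described above.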
35. upon the Selection Criteria currently selected. This is discussed in the xpbs portion of the next section.

6.3 The qselect Command

The qselect command provides a method to list the job identifier of those jobs which meet a list of selection criteria. Jobs are selected from those owned by a single Server. When qselect successfully completes, it will have written to standard output a list of zero or more job identifiers which meet the criteria specified by the options. Each option acts as a filter restricting the number of jobs which might be listed. With no options, the qselect command will list all jobs at the Server which the user is authorized to list (query status of). The -u option may be used to limit the selection to jobs owned by this user or other specified users.

When an option is specified with an optional op component to the option argument, then op specifies a relation between the value of a certain job attribute and the value component of the option argument. If an op is allowable on an option, then the description of the option letter will indicate the op is allowable. The only acceptable strings for the op component, and the relation the string indicates, are shown in the following list:

.eq.  The value represented by the attribute of the job is equal to the value represented by the option argument.

.ne.  The value represented by the attribute of the job is not equal to the value represented by the option argument.

.ge.  The value
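The op relations (eq, ne, ge, gt, le, lt) amount to a comparison between a job attribute and the option value, applied as a filter. The sketch below shows that idea on an in-memory dictionary; it is not the qselect implementation, and `RELATIONS`, `select_jobs`, and the sample data layout are all hypothetical.

```python
import operator

# The six relation names, mapped to comparison functions.
RELATIONS = {
    "eq": operator.eq,
    "ne": operator.ne,
    "ge": operator.ge,
    "gt": operator.gt,
    "le": operator.le,
    "lt": operator.lt,
}


def select_jobs(jobs, attribute, op, value):
    """Return ids of jobs whose attribute stands in relation `op` to value."""
    relation = RELATIONS[op.strip(".")]  # accept "ge" as well as ".ge."
    return [
        job_id
        for job_id, attrs in jobs.items()
        if attribute in attrs and relation(attrs[attribute], value)
    ]
```

Chaining several such calls, each on the result of the previous one, would correspond to each option acting as a further filter on the list.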
36. virtual memory (i.e. size) for any single process in the job. Example: qsub -l pvmem=200mb

resc      Single-node variable resource specification string (resc_spec). See also section 4.10 "Single-Node Conditional Requests" on page 43. Units: string.

software  Allows a user to specify software required by the job. The allowable values and effect on job placement are site dependent. Contact your PBS system administrator to learn what, if any, values for software have been configured at your site. Units: string.

vmem      Maximum aggregate amount of virtual memory used by all concurrent processes in the job. Units: size. Example: qsub -l vmem=400mb

walltime  Maximum amount of real time (wall-clock elapsed time) which the job needs to execute (run). Units: time. Example: qsub -l walltime=4:00:00

On Cray systems running UNICOS 8 or later, there are additional resources that may be requested by PBS jobs, as shown below.

Table 3: PBS Resources on Cray UNICOS

mppe           The number of processing elements used by a single process in the job. Units: unitary. Example: qsub -l mppe=512

mppt           Maximum amount of wall clock time used on the MPP in the job. Units: time. Example: qsub -l mppt=4:00:00

mta, mtb, mth  Maximum number of magnetic tape drives required in the corresponding device class of a or b. Units: unitary. Example: qsub -l mta=1

pf             Maximum number of file
37. Altair® PBS Pro User Guide 5.4 for UNIX®, Linux, and Windows®

Portable Batch System User Guide
PBS-3BA01: Altair PBS Pro 5.4, Updated: February 11, 2004.
Edited by: James Patton Jones

Copyright 2004 Altair Grid Technologies, LLC. All rights reserved.

Trademark Acknowledgements: "PBS Pro", "Portable Batch System" and the PBS Juggler logo are trademarks of Altair Grid Technologies, LLC. All other trademarks are the property of their respective owners.

Altair Grid Technologies is a subsidiary of Altair Engineering, Inc. For more information, and for product sales and technical support, contact Altair at:

    URL: www.altair.com, www.pbspro.com
    Email: sales@pbspro.com, support@pbspro.com

    Location        Telephone                          e-mail
    North America   +1 248 614 2425                    pbssupport@altair.com
    China           +86 (0)21 5393 0011                support@altair.com.cn
    France          +33 (0)1 4133 0990                 francesupport@altair.com
    Germany         +49 (0)7031 6208 22                support@altair.de
    India           +91 80 658 8540, +91 80 658 8542   support@altair-eng.soft.net
    Italy           +39 832 315573, +39 800 905595     support@altairtorino.it
    Japan           +81 3 5396 1341                    aj-support@altairjp.co.jp
    Korea           +82 31 728 8600                    support@altair.co.kr
    Scandinavia     +46 (0)46 286 2050                 support@altair.se
    United Kingdom  +44 (0)1327 810 700                support@uk.altair.com

For online documentation purchases, visit: store.pbspro.com

This document is proprietary information of Altair Grid Techn
38. Summary of Node Specification Options  123
    MPI Jobs with PBS  124
    PVM Jobs with PBS  124
    OpenMP Jobs with PBS  124
10  Appendix A: PBS Environment Variables  125
11  Appendix B: Converting From NQS to PBS  127

List of Tables

    PBS Pro User and Manager Commands  17
    PBS Resources Available on All Systems  30
    PBS Resources on Cray UNICOS  31
    Options to the qsub Command  33
    xpbs Server Column Headings  52
    xpbs Queue Column Headings  53
    xpbs Job Column Headings  55
    xpbs Buttons and PBS Commands  57
    Job States Viewable by Users  79
    qsub Options vs. Globus RSL  103
    PBS Job States vs. Globus States  104
    Node Specification Options  123
    PBS Environment Variables  125

Preface

Intended Audience

PBS Pro is the professional workload management system from Altair that provides a unified queuing and job management interface to a set of computing resources. This document provides the user with the information required to use the Portable Batch System (PBS), including creating, submitting
39. S_NODEFILE would contain the names of the three allocated nodes, each listed twice, as such: A B C A B C

    #!/bin/sh
    #PBS -l nodes=3:ppn=2

A more elaborate, and possibly contrived, example would be the following, which requests two virtual processors on one node plus an additional four virtual processors on a second node. The PBS_NODEFILE would contain the two node names multiple times in the following order: A B A B B B

    #!/bin/sh
    #PBS -l nodes=1:ppn=2+1:ppn=4

This allows a user to request varying numbers of processes on nodes and, by setting the number of NPROC on the mpirun command to the total number of allocated nodes, run one process on each node. This is useful if one set of files has to be created on each node by a setup process regardless of the number of processes that will run on the nodes during the computation phase.

9.3.3 Processes (Tasks) vs CPUs

It is further possible to specify the number of CPUs (threads) per process using the cpp modifier. For example, the node specification:

    #!/bin/sh
    #PBS -l nodes=4:ppn=3:cpp=2

requests a total of four separate nodes; three parallel processes (tasks) should be run on each node. This means each node will appear in the PBS_NODEFILE three times, and two CPUs should be allocated to each process so that each process can run two threads, which results in the OMP_NUM_THREADS and NCPUS environment variables being set to two. The above specification yi
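The node-file orderings quoted above (A B C A B C, and A B A B B B) are consistent with listing the allocated nodes round-robin, one entry per pass, until each node has appeared ppn times. The sketch below reproduces those two orderings under that assumption; it is inferred from the examples rather than taken from PBS, `expand_nodefile` is a hypothetical name, only numeric node counts are handled, and the cpp modifier is ignored because it does not change how often a node is listed.

```python
from typing import List


def expand_nodefile(spec: str, hosts: List[str]) -> List[str]:
    """Expand a nodes= value such as '1:ppn=2+1:ppn=4' into node-file lines."""
    per_node = []  # ppn for each allocated node, in request order
    for chunk in spec.split("+"):
        parts = chunk.split(":")
        count = int(parts[0])
        ppn = 1
        for modifier in parts[1:]:
            key, _, value = modifier.partition("=")
            if key == "ppn":
                ppn = int(value)
        per_node.extend([ppn] * count)
    if len(per_node) > len(hosts):
        raise ValueError("more nodes requested than hosts supplied")
    lines = []
    # Round-robin: pass k lists every node that still needs an entry.
    for pass_no in range(max(per_node, default=0)):
        for host, ppn in zip(hosts, per_node):
            if pass_no < ppn:
                lines.append(host)
    return lines
```

With `"4:ppn=3:cpp=2"` and four hosts, each host is listed three times, which agrees with the description in section 9.3.3.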
40. Server internally. The attribute is visible to privileged clients, such as the Scheduler.

Chapter 5: Using the xpbs GUI

The PBS graphical user interface is called xpbs, and provides a user-friendly, point-and-click interface to the PBS commands. xpbs utilizes the tcl/tk graphics toolsuite, while providing the user with the same functionality as the PBS CLI commands. In this chapter we introduce xpbs, and show how to create a PBS job using xpbs.

5.1 User's xpbs Environment

Depending on how PBS is installed at your site, you may need to allow xpbs to display on your workstation. However, if the PBS client commands are installed locally on your workstation, you can skip this section. (Ask your PBS administrator if you are unsure.)

Make sure your X-Windows session is set to permit the xpbs client to connect to your local X-server. Do this by running the xhost command with the name of the host from which you will be running xpbs, as shown in the example below:

    xhost server.pbspro.com

Next, on the system from which you will be running xpbs, set your X-Windows DISPLAY variable to your local workstation. For example, if using the C-shell:

    setenv DISPLAY myWorkstation:0.0

However, if you are using the Bourne or Korn shell, type the following:

    export DISPLAY=myWorkstation:0.0

5.1.1 Starting xpbs

Once your PBS environment is set up, launch xpbs: xpbs a
41. System Scheduling ensures that jobs do not have to be targeted to a specific computer system. Users may submit their job, and have it run on the first available system that meets their resource requirements.

Job Priority allows users the ability to specify the priority of their jobs; defaults can be provided at both the queue and system level.

Username Mapping provides support for mapping user account names on one system to the appropriate name on remote server systems. This allows PBS to fully function in environments where users do not have a consistent username across all the resources they have access to.

Fully Configurable. PBS was designed to be easily tailored to meet the needs of different sites. Much of this flexibility is due to the unique design of the scheduler module, which permits complete customization.

Broad Platform Availability is achieved through support of Windows 2000 and every major version of UNIX and Linux, from workstations and servers to supercomputers. New platforms are being supported with each new release.

System Integration allows PBS to take advantage of vendor-specific enhancements on different systems (such as supporting cpusets on SGI systems).

Chapter 2: Concepts and Terms

PBS is a distributed workload management system. As such, PBS handles the management and monitoring of the computational workload on a set of one or more computers. Mo
42. You can combine multiple requests by separating them with a comma, as follows:

    qsub -l ncpus=16,walltime=4:00:00 mysubrun
    16389.cluster.pbspro.com

The same rule applies to the job script as well, as the next example shows:

    #!/bin/sh
    #PBS -l walltime=1:00:00,mem=400mb
    #PBS -l ncpus=4
    #PBS -j oe
    ./subrun

4.4 How PBS Parses a Job Script

The qsub command scans the lines of the script file for directives. An initial line in the script that begins with the characters "#!" or the character ":" will be ignored and scanning will start with the next line. Scanning will continue until the first executable line, that is, a line that is not blank, not a directive line, nor a line whose first non-white-space character is "#". If directives occur on subsequent lines, they will be ignored.

A line in the script file will be processed as a directive to qsub if and only if the string of characters starting with the first non-white-space character on the line and of the same length as the directive prefix matches the directive prefix (i.e. "#PBS"). The remainder of the directive line consists of the options to qsub in the same syntax as they appear on the command line. The option character is to be preceded with the "-" character.

If an option is present in both a directive and on the command line, that option and its argument, if any, will be ignored in the directive. The command line takes pr
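The scanning rules in section 4.4 can be restated as a short routine: skip an initial "#!" or ":" line, collect lines that start with the directive prefix, pass over blank lines and other comments, and stop at the first executable line. This is a sketch of the described behavior and not qsub's own code; `scan_directives` is a hypothetical name, and splitting the options with shlex is a simplification chosen for the example.

```python
import shlex
from typing import List


def scan_directives(script: str, prefix: str = "#PBS") -> List[List[str]]:
    """Return the option words of each directive found before the first executable line."""
    lines = script.splitlines()
    # An initial "#!" or ":" line is ignored.
    if lines and (lines[0].startswith("#!") or lines[0].startswith(":")):
        lines = lines[1:]
    directives = []
    for line in lines:
        stripped = line.lstrip()
        if stripped.startswith(prefix):
            directives.append(shlex.split(stripped[len(prefix):]))
        elif stripped == "" or stripped.startswith("#"):
            continue  # blank lines and comments do not end the scan
        else:
            break  # first executable line: later directives are ignored
    return directives
```

Applied to the example script above, this returns the three option lists for `-l walltime=1:00:00,mem=400mb`, `-l ncpus=4`, and `-j oe`; a directive placed after `./subrun` would not be returned. Merging the result with command-line options, where the command line takes precedence, is left out of the sketch.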
43. a host to which the file will be returned and path_name is the path name on that host. You may specify relative or absolute paths. If you specify only a file name, it is assumed to be relative to your home directory. The following examples illustrate these various options.

    #!/bin/sh
    #PBS -o /u/james/myOutputFile
    #PBS -e /u/james/myErrorFile

    qsub -o myOutputFile mysubrun
    qsub -o /u/james/myOutputFile mysubrun
    qsub -o myWorkstation:/u/james/myOutputFile mysubrun
    qsub -e myErrorFile mysubrun
    qsub -e /u/james/myErrorFile mysubrun
    qsub -e myWorkstation:/u/james/myErrorFile mysubrun

4.9.3 Exporting environment variables

The -V option declares that all environment variables in the qsub command's environment are to be exported to the batch job.

    qsub -V mysubrun

    #!/bin/sh
    #PBS -V

4.9.4 Expanding environment variables

The -v variable_list option to qsub allows you to specify additional environment variables to be exported to the job. variable_list names environment variables from the qsub command environment which are made available to the job when it executes. The variable_list is a comma-separated list of strings of the form variable or variable=value. These variables and their values are passed to the job.

    qsub -v DISPLAY,myvariable=32 mysubrun

4.9.5 Specifying e-mail notification

The -m MailOptions defines the set of conditions under which the execution server will send a mail message about the job
44. after jobs jobid have terminated with errors. See previous csh warning.

afterany:jobid[:jobid...]
    This job may be scheduled for execution after jobs jobid have terminated, with or without errors.

on:count
    This job may be scheduled for execution after count dependencies on other jobs have been satisfied. This form is used in conjunction with one of the before forms, see below.

before:jobid[:jobid...]
    When this job has begun execution, then jobs jobid... may begin.

beforeok:jobid[:jobid...]
    If this job terminates execution without errors, then jobs jobid... may begin. See previous csh warning.

beforenotok:jobid[:jobid...]
    If this job terminates execution with errors, then jobs jobid... may begin. See previous csh warning.

beforeany:jobid[:jobid...]
    When this job terminates execution, jobs jobid... may begin.

If any of the before forms are used, the jobs referenced by jobid must have been submitted with a dependency type of on. If any of the before forms are used, the jobs referenced by jobid must have the same owner as the job being submitted. Otherwise, the dependency is ignored.

Error processing of the existence, state, or condition of the job on which the newly submitted job depends is a deferred service, i.e. the check is performed after the job is queued. If an error is detected, the new job will be deleted by the Server. Mail will be sent to the job submitter stating the error. The
45. and manipulating batch jobs, querying status of jobs, queues, and systems, and otherwise making effective use of the computer resources under the control of PBS.

Related Documents

The following publications contain information that may also be useful to the user of PBS:

PBS-3BQ01  PBS Pro Quick Start Guide: offers a short overview of the installation and use of PBS Pro.

PBS-3BA01  PBS Pro Administrator Guide: provides the system administrator with information required to install, configure, and manage PBS, as well as a thorough discussion of how the various components of PBS interoperate.

PBS-3BE01  PBS Pro External Reference Specification: discusses in detail the PBS application programming interface (API), security within PBS, and intra-daemon communication.

Ordering Software and Publications

To order additional copies of this and other PBS publications, or to purchase additional software licenses, contact an authorized reseller, or the PBS Sales Department. Contact information is included on the copyright page of this document.

Document Conventions

PBS documentation uses the following typographic conventions.

abbreviation  If a PBS command can be abbreviated (such as subcommands to qmgr), the shortest acceptable abbreviation is underlined.

command       This fixed width font is used to denote literal commands, filenames, error messages, and program output.

input         Literal user input is s

manpage(x)

terms
46. arated) to automatically select/highlight in the QUEUES listbox.

list of jobs (space separated) to automatically select/highlight in the JOBS listbox.

list of owners checked when limiting the jobs appearing on the Jobs listbox in the main xpbs window. Specify value as "Owners: <list_of_owners>". See -u option in qselect(1B) for format of <list_of_owners>.

list of job states to look for (do not space separate) when limiting the jobs appearing on the Jobs listbox in the main xpbs window. Specify value as "Job_States: <states_string>". See -s option in qselect(1B) for format of <states_string>.

list of resource amounts (space separated) to consult when limiting the jobs appearing on the Jobs listbox in the main xpbs window. Specify value as "Resources: <res_string>". See -l option in qselect(1B) for format of <res_string>.

the Execution Time attribute to consult when limiting the list of jobs appearing on the Jobs listbox in the main xpbs window. Specify value as "Queue_Time: <exec_time>". See -a option in qselect(1B) for format of <exec_time>.

the name of the account that will be checked when limiting the jobs appearing on the Jobs listbox in the main xpbs window. Spe

selectCheckpoint, selectHold, selectPriority, selectRerun, selectJobName, iconizeHostsView, iconizeQueuesView, iconizeJobsView, iconizeInfoView, jobResourceList
47. aste countless hours learning the nuances of different computing environments, rather than being able to focus on their core priorities. PBS Pro addresses these problems for computing-intensive industries such as science, engineering, finance, and entertainment.

Now you can use the power of PBS Pro to better control your computing resources. This allows you to unlock the potential in the valuable assets you already have, while at the same time reducing dependency on system administrators and operators, freeing them to focus on other activities. PBS Pro can also help you effectively manage growth by tracking real usage levels across your systems and enhancing effective utilization of future purchases.

1.3 History of PBS

In the past, UNIX systems were used in a completely interactive manner. Background jobs were just processes with their input disconnected from the terminal. However, as UNIX moved onto larger and larger machines, the need to be able to schedule tasks based on available resources increased in importance. The advent of networked compute servers, smaller general systems, and workstations led to the requirement of a networked batch scheduling capability. The first such UNIX-based system was the Network Queueing System (NQS), funded by NASA Ames Research Center in 1986. NQS quickly became the de facto standard for batch queueing.

Over time, distributed parallel systems began to emerge, and NQS was inadequ
48. ated list of entries of the form: group@host. Entries on this list help control the enqueuing of jobs into the reservation's queue. Jobs owned by members belonging to these groups are either allowed or denied entry into the queue. Any group on the list is to be interpreted in the context of the Server's host, not the context of the host from which qsub was submitted. This list becomes the acl_groups list for the reservation's queue.

Specifies a comma-separated list of entries of the form: hostname. These entries help control the enqueuing of jobs into the reservation's queue by allowing or denying jobs submitted from these hosts. This list becomes the acl_hosts list for the reservation's queue.

Declares a name for the reservation. The name specified may be up to 15 characters in length. It must consist of printable, non-white-space characters with the first character alphabetic.

Specifies a list of resources required for the reservation. These resources will be used for the limits on the queue that is dynamically created to service the reservation. The aggregate amount of resources for currently running jobs from this queue will not exceed these resource limits. In addition, the queue inherits the value of any resource limit set on the Server, if the reservation request itself is silent about that resource.

Interactive mode is specified if the submitter wants to wait for an answer to the request. The pbs_rsu
49. atetime Specifies reservation starting time. If the reservation's end time and duration are the only times specified, this start time is calculated. The datetime argument adheres to the POSIX time specification: [[[[CC]YY]MM]DD]hhmm[.SS]. If the day, DD, is not specified, it will default to today if the time hhmm is in the future. Otherwise, the day will be set to tomorrow. For example, if you submit a reservation having a specification -R 1110 at 11:15am, it will be interpreted as being for 11:10am tomorrow. If the month portion, MM, is not specified, it defaults to the current month provided that the specified day DD is in the future. Otherwise, the month will be set to next month. Similar comments apply to the two other optional, left-hand components.

Options: -E datetime, -D timestring, -m mail_points, -M mail_list, -u user_list, -g group_list

Specifies the reservation end time. See the -R flag for a description of the datetime string. If start time and duration are the only times specified, the end time value is calculated.

Specifies reservation duration. Timestring can either be expressed as a total number of seconds of walltime, or it can be expressed as a colon-delimited timestring, e.g. HH:MM:SS or MM:SS. If the start time and end time are the only times specified, this duration time is calculated.

Specifies the set of events that cause the Server to send mai
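The reservation duration timestring described in the excerpt above accepts either a plain count of seconds or a colon-delimited form (HH:MM:SS or MM:SS). The following is a minimal sketch, in POSIX shell, of how a script might normalize such a value to seconds before comparing durations. The function name duration_seconds is hypothetical and is not part of PBS.

```shell
# Hypothetical helper (not a PBS command): convert a duration timestring
# (seconds, MM:SS, or HH:MM:SS) to a total number of seconds.
duration_seconds() {
    old_ifs=$IFS
    IFS=:
    set -f                  # no filename expansion while splitting
    set -- $1               # split the argument on colons
    set +f
    IFS=$old_ifs
    total=0
    for field in "$@"; do
        # Strip leading zeros so "08" is not read as an invalid octal number.
        while [ "${field#0}" != "$field" ]; do
            field=${field#0}
        done
        total=$((total * 60 + ${field:-0}))
    done
    echo "$total"
}
```

For instance, duration_seconds 30:00 and duration_seconds 1800 both describe the same 30-minute reservation.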
50. ause it, unlike some rcp implementations, always exits with a non-zero exit status for any error. Thus MOM knows if the file was delivered or not. Fortunately, the secure copy program, scp, is also based on this version of rcp and exits with the proper status code.

If using rcp, the copy of output or staged files can fail for at least two reasons:

1. If the user's .cshrc script outputs any characters to standard output, e.g. contains an echo command, pbs_rcp will fail.
2. The user lacks authorization to access the specified system. (See discussion in "User Authorization" on page 26.)

If using Secure Copy (scp), then PBS will first try to deliver output or stage-in/out files using scp. If scp fails, PBS will try again using rcp (assuming that scp might not exist on the remote host). If rcp also fails, the above cycle will be repeated after a delay, in case the problem is caused by a temporary network problem. All failures are logged in MOM's log, and an email containing the errors is sent to the job owner.

For delivery of output files on the local host, PBS uses the /bin/cp command. Local and remote delivery of output may fail for the following additional reasons:

1. A directory in the specified destination path does not exist.
2. A directory in the specified destination path is not searchable by the user.
3. The target directory is not writable by the user.

8.6 Input/Output File Staging
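The scp-then-rcp delivery cycle described in the excerpt above can be pictured with a short sketch. This is only an illustration of the retry logic in POSIX shell, not the code MOM actually runs. The function name deliver, the number of cycles, and the delay are assumptions, since the guide does not state how many times the cycle is repeated or how long the delay is.

```shell
# Hypothetical sketch of the delivery cycle: try scp, fall back to rcp,
# and repeat the whole cycle after a delay.
# Usage: deliver SRC DEST [CYCLES] [DELAY_SECONDS]   (defaults are assumptions)
deliver() {
    src=$1
    dest=$2
    cycles=${3:-3}
    delay=${4:-5}
    attempt=1
    while [ "$attempt" -le "$cycles" ]; do
        if scp -q "$src" "$dest" 2>/dev/null; then
            echo "delivered with scp (cycle $attempt)"
            return 0
        fi
        # scp may be missing on the remote host, so try rcp next.
        if rcp "$src" "$dest" 2>/dev/null; then
            echo "delivered with rcp (cycle $attempt)"
            return 0
        fi
        echo "cycle $attempt failed" >&2
        if [ "$attempt" -lt "$cycles" ]; then
            sleep "$delay"
        fi
        attempt=$((attempt + 1))
    done
    return 1
}
```

Relying on the exit status of each copy command is what makes the fallback possible, which is why the guide stresses that pbs_rcp and scp return a non-zero status on any error.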
51. b command will block up to the number of seconds specified while waiting for the scheduler to either confirm or deny the reservation request. A negative number of seconds may be specified and is interpreted to mean: if the confirm/deny decision isn't made in the number of seconds specified, automatically delete the reservation request from the system. If automatic deletion isn't being requested, and if the scheduler doesn't make a decision in the specified number of seconds, the command will return the ID string for the reservation and show the status as unconfirmed. The requester may periodically issue the pbs_rstat command with ID string as input to monitor the reservation's status.

-W other-attributes=value...
This allows a site to define any extra attribute on the reservation.

The following example shows the submission of a reservation asking for 1 node, 30 minutes of wall-clock time, and a start time of 11:30. Note that since an end time is not specified, PBS will calculate the end time based on the reservation start time and duration.

pbs_rsub -l nodes=1,walltime=30:00 -R 1130
R226.south UNCONFIRMED

A reservation queue named "R226" was created on the local PBS Server. Note that the reservation is currently unconfirmed. Email will be sent to the reservation owner either confirming the reservation or rejecting it. The owner of the reservation can submit jo
52. b from the system. Visible to any client.

The following attributes are read-only: they are established by the Server and are visible to the user, but cannot be set or changed by a user.

alt_id
For a few systems, such as Irix 6.x running Array Services, the session id is insufficient to track which processes belong to the job. Where a different identifier is required, it is recorded in this attribute. If set, it will also be recorded in the end-of-job accounting record. For Irix 6.x running Array Services, the alt_id attribute is set to the Array Session Handle (ASH) assigned to the job.

ctime
The time that the job was created.

etime
The time that the job became eligible to run, i.e. in a queued state while residing in an execution queue.

exec_host
If the job is running, this is set to the name of the host or hosts on which the job is executing. The format of the string is "node/N[*C][+...]", where "node" is the name of a node, "N" is process or task slot on that node, and "C" is the number of CPUs allocated to the job. C does not appear if it is one.

egroup
If the job is queued in an execution queue, this attribute is set to the group name under which the job is to be run. This attribute is available only to the batch administrator.

euser
If the job is queued in an execution queue, this attribute is set to

hashname, interactive, Job_Owner, job_state, mtime
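As a worked example of the exec_host string described in the excerpt above, the sketch below totals the CPUs allocated to a job. It assumes the format node/N[*C] with entries joined by "+", which is how the punctuation-stripped description is read here. The function name exec_host_cpus is hypothetical and is not a PBS command.

```shell
# Hypothetical helper: sum the CPUs in an exec_host string.
# Assumed format: node/N[*C][+node/N[*C]...]; C defaults to 1 when absent.
exec_host_cpus() {
    total=0
    old_ifs=$IFS
    IFS='+'
    set -f                      # entries contain "*", so disable globbing
    for entry in $1; do
        case $entry in
            *\**) cpus=${entry##*\*} ;;   # text after the last "*"
            *)    cpus=1 ;;
        esac
        total=$((total + cpus))
    done
    set +f
    IFS=$old_ifs
    echo "$total"
}
```

With this reading, an exec_host value of south/0*2+north/1 describes two CPUs on south and one on north.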
53. ble below lists the available resources that can be requested by PBS jobs on any system. Following that is a table of additional PBS resources that may be requested on computer systems running the Cray UNICOS operating system.

Table 2: PBS Resources Available on All Systems

Resource | Meaning and Usage | Units
arch | System architecture needed by job. qsub -l arch=linux | string
cput | Maximum, aggregate CPU time required by all processes in job. qsub -l cput=5:00:00 | time
file | Maximum disk space requirements for any single file to be created by job. qsub -l file=300mb | size
mem | Maximum amount of physical memory (RAM) required by job. qsub -l mem=512mb | size
ncpus | Number of CPUs (processors) required by job. qsub -l ncpus=16 | unitary
nice | Requested job priority, e.g. equivalent to nice on UNIX. qsub -l nice=30 | unitary
nodes | Number and/or type of nodes needed by job. (See also section 9.1 "Node Specification Syntax" on page 117.) | node_spec
pcput | Per-process maximum CPU time, i.e. for any single process in the job. qsub -l pcput=3600 | time
pmem | Per-process maximum amount of physical memory, i.e. for any single process of the job. qsub -l pmem=100mb | size
pvmem | Per-process maximum amount of
54. bs against the reservation using the qsub command, naming the reservation queue on the command line with the -q option, e.g.:

qsub -q R226 aims14
299.south

Important: The ability to submit, query, or delete advance reservations using the xpbs GUI is not available in the current release.

8.9.2 Identification and Status

When the user requests an advance reservation of resources via the pbs_rsub command, an option ("-I n") is available to wait for confirmation response. The value "n" that is specified is taken as the number of seconds that the command is willing to wait. This value can be either positive or negative. A non-negative value means that the Server/scheduler response is needed in "n" or fewer seconds. After that time the submitter will need to use pbs_rstat or some other means to discern success or failure of the request. For a negative value, the command will wait up to "-n" seconds for the request to be either confirmed or denied. If the response does not come back in "-n" or fewer seconds, the Server is to automatically delete the request from the system.

8.9.3 Showing Status of PBS Reservations

The pbs_rstat command is used to show the status of all the reservations on the PBS Server. There are three different output formats: brief, short (default), and long. The following examples illustrate these three options. The short option (-S) will show all the reservations in a s
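The sign convention for the wait value given to pbs_rsub, as described in the excerpt above, is easy to misread, so here is a small sketch that spells it out. It only restates the rule from the text in POSIX shell. The function name interpret_wait and the output labels are hypothetical.

```shell
# Hypothetical helper: explain what a pbs_rsub wait value of N seconds means.
# Non-negative: wait up to N seconds, then return with the request unconfirmed.
# Negative: wait up to |N| seconds, then have the Server delete the request.
interpret_wait() {
    n=$1
    if [ "$n" -ge 0 ]; then
        echo "wait=$n on_timeout=return-unconfirmed"
    else
        echo "wait=$((0 - n)) on_timeout=delete-request"
    fi
}
```

In the non-negative case the submitter is still responsible for checking the outcome later, for example with pbs_rstat.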
55. cify name of 35 stage in 13 stage out 13 staging 5 12 26 98 G GASS 104 Global Grid Forum 4 Globus 9 103 defined 102 globusrun 102 103 Grid Toolkit 5 job states 104 jobs 103 RSL 103 Graphical user interface 16 Grid 3 4 5 Group defined 13 ID GID 13 GUI 16 H Hold defined 13 job 38 or release job 85 I Information Power Grid 4 Interactive batch jobs 42 Interdependency 5 J Job Account_Name 45 alt_id 47 batch 13 Checkpoint 45 comment 47 74 ctime 47 depend 45 dependencies 94 egroup 47 Error_Path 45 etime 47 euser 48 exec_host 47 Execution_Time 45 group_list 45 hashname 48 Hold_Types 45 identifier 24 interactive 48 Job_Name 46 Job_Owner 48 job_state 48 Join_Path 46 Keep_Files 46 Mail_ Points 46 Mail Users 46 management ix mtime 48 name 36 Output_Path 46 Priority 46 qtime 48 queue 48 queue_rank 48 queue_type 48 Rerunable 46 Resource_List 46 resources_used 48 selecting using xpbs 80 sending messages to 87 sending signals to 88 server 48 PBS Pro 5 4 131 Administrator Guide session_id 48 Shell_Path_List 46 stagein 47 stageout 47 states 64 79 80 104 substate 48 tracking 82 User_List 47 Variable_List 47 K Kerberos 114 KRB5 114 krb5 114 L Listbox 61 Load Balance 11 Load Leveling 5 M Manager 13 Message Passing Interface 124 meta computing 4 MOM 9 Monitoring 7 Moving jobs between queues 90 MP_HOSTFILE 119 MPI 124 MRJ Technology Solutions xi N NASA Ames Resea
56. cify value as "Account_Name: <account_name>". See -A option in qselect(1B) for format of <account_name>.

the checkpoint attribute relationship (including the logical operator) to consult when limiting the list of jobs appearing on the Jobs listbox in the main xpbs window. Specify value as "Checkpoint: <checkpoint_arg>". See -c option in qselect(1B) for format of <checkpoint_arg>.

the hold types string to look for in a job when limiting the jobs appearing on the Jobs listbox in the main xpbs window. Specify value as "Hold_Types: <hold_string>". See -h option in qselect(1B) for format of <hold_string>.

the priority relationship (including the logical operator) to consult when limiting the list of jobs appearing on the Jobs listbox in the main xpbs window. Specify value as "Priority: <priority_value>". See -p option in qselect(1B) for format of <priority_value>.

the rerunnable attribute to consult when limiting the list of jobs appearing on the Jobs listbox in the main xpbs window. Specify value as "Rerunnable: <rerun_val>". See -r option in qselect(1B) for format of <rerun_val>.

name of the job that will be checked when limiting the jobs appearing on the Jobs listbox in the main xpbs window. Specify value as "Job_Name: <jobname>". See -N option in qselect(1B) for format of <jobname>.

a boolean value (true or false) indicating whether or not to iconize
57. d the Output_Path attribute will be ignored. If Keep_Files contains the values "o" (KEEP_OUTPUT) and/or "e" (KEEP_ERROR), the corresponding streams of the batch job will be retained on the execution host upon job termination. Keep_Files overrides the Output_Path and Error_Path attributes.

Identifies when the Server will send email about the job.

The set of users to whom mail may be sent when the job makes certain state changes.

The final path name for the file containing the job's standard output stream. See the qsub and qalter command description for more detail.

The job scheduling priority assigned by the user.

The rerunable flag given by the user.

The resource list is a set of name=value strings of the resources required by the job. The value also establishes the limit of usage of that resource. If not set, the value for a resource may be determined by a queue or Server default established by the administrator.

A set of absolute paths of the program to process the job's script file.

stagein
The list of files to be staged in prior to job execution.

stageout
The list of files to be staged out after job execution.

User_List
The list of user@hosts which determines the username under which the job is run on a given host.

Variable_List
This is the list of environment variables passed with the Queue Job batch request.

comment
An attribute for displaying comments about the jo
58. de south started Thu Aug 23 south james workg subrun S Te Lo SS Not Running No available resources on nodes N South susan workgq solver 0 C C 6 1 13Display Queue Limits The q option to qstat displays any limits set on the requested or default queues Since PBS is shipped with no queue limits set any visible limits will be site specific The limits are listed in the format shown below o 3 qstat q server south Queue Memory CPU Time Walltime Node Run Que Im State workq 6 2 Viewing Job System Status with xpbs The main display of xpbs shows a brief listing of all selected Servers all queues on those Servers and any jobs in those queues that match the selection criteria discussed below Servers are listed in the HOST panel near the top of the display 76 Chapter 6 Checking Job System Status To view detailed information about a given Server i e similar to that produced by qstat fB select the Server in question then click the Detail button Likewise for details on a given queue i e similar to that produced by qstat 0 select the queue in question then click its corresponding Detail button The same applies for jobs as well i e qstat f You can view detailed information on any displayed job by selecting it and then clicking on the Detail button Note that the list of jobs displayed will be depen dent
59. de specification resource requirement but does specify a number of CPUs via the 1 ncpus syntax is allocated processors as if the job did have a node specification of the form 1 nodes 1 cpp PBS Pro 5 4 123 User Guide 9 4 Summary of Node Specification Options Table 12 Node Specification Options Desired Node Resulting ee CPU Process Node Specification Option to gsub PRS eee Layout NODEFILE Variable 1 CPU anywhere 1 ncpus 1 S 1 1 entire node 1 nodes 1 A 1 4 entire nodes 1 nodes 4 ABCD 1 3 CPUs on I node 1 nodes 1 ppn 3 AAA 3 3 CPUs oneachof 1 nodes 4 ppn 3 cpp 2 ABCDAB 2 4 nodes with 2 CDABCD CPUs allocated per process 2 nodes with 2 1 nodes 1 ppn 2 1 ppn 2 ABAB 23 virtual processors 2 per node 1 node with 2 vir 1 nodes 1 ppn 2 1 ppn 3 ABABB 2 tual processors 3 plus 1 node with 3 virtual processors 9 4 1 Time shared vs Cluster Nodes The difference between time shared and cluster nodes is Time share nodes may not be requested exclusively with the excl suffix More processes than CPUs can be run on time shared nodes but not on cluster nodes Allocation of cluster nodes remains based on the number of virtual processors 124 Chapter 9 Running Multi node Jobs 9 5 MPI Jobs with PBS On a typical system to execute a Message Passing Interface MPI program you would use the mpirun command For example here is a sample PBS scrip
60. dern workload management solutions like PBS Pro include the features of traditional batch queueing but offer greater flexibility and control than first-generation batch systems (such as the original UNIX batch system, NQS).

Workload management systems have three primary roles:

Queuing
The collecting together of work or tasks to be run on a computer. Users submit tasks or "jobs" to the resource management system, where they are queued up until the system is ready to run them.

Scheduling
The process of selecting which jobs to run, when, and where, according to a predetermined policy. Sites balance competing needs and goals on the system(s) to maximize efficient use of resources (both computer time and people time).

Monitoring
The act of tracking and reserving system resources and enforcing usage policy. This covers both user-level and system-level monitoring, as well as monitoring of the scheduling policies to see how well they are meeting the stated goals.

2.1 PBS Components

PBS consists of two major component types: user-level commands and system daemons. A brief description of each is given here to help you understand how the pieces fit together, and how they affect you.

[Figure: PBS components: PBS Commands, Job Server, Scheduler, MOM, Batch Job]

Commands
PBS supplies both UNIX command-line programs that are POSIX 1003.2d conforming and a graphical interface. T
61. e an integer between -1024 and +1023 inclusive. The default is no priority, which is equivalent to a priority of zero.

This option allows the user to specify a priority between jobs owned by that user. Note that it is only advisory: the Scheduler may choose to override your priorities in order to meet local scheduling policy. If you need an absolute ordering of your jobs, see "Specifying Job Dependencies" on page 94.

qsub -p 120 mysubrun

#!/bin/sh
#PBS -p 300

4.9.11 Deferring execution

The "-a date_time" option declares the time after which the job is eligible for execution. The date_time argument is in the form [[[[CC]YY]MM]DD]hhmm[.SS], where CC is the first two digits of the year (the century), YY is the second two digits of the year, MM is the two digits for the month, DD is the day of the month, hh is the hour, mm is the minute, and the optional SS is the seconds.

If the month, MM, is not specified, it will default to the current month if the specified day DD is in the future. Otherwise, the month will be set to next month. Likewise, if the day, DD, is not specified, it will default to today if the time hhmm is in the future. Otherwise, the day will be set to tomorrow. For example, if you submit a job at 11:15am with a time of "1110", the job will be eligible to run at 11:10am tomorrow. Other examples include:

qsub -a 0700 mysubrun

#!/bin/sh
#PBS -a 10220700

4.9.12 Holding a job (d
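The rule above for a time given without a day (today if hhmm is still in the future, otherwise tomorrow) can be checked with a small sketch. This is an illustration in POSIX shell, not PBS code. The function names to_minutes and resolve_day are hypothetical, and treating a time equal to the current time as "tomorrow" is an assumption, since the text only says "in the future".

```shell
# Hypothetical helpers illustrating the "today or tomorrow" default.
# Times are four-digit hhmm strings such as 0700 or 1115.
to_minutes() {
    hh=${1%??}          # first two digits
    mm=${1#??}          # last two digits
    hh=${hh#0}          # drop one leading zero so "08" is not read as octal
    mm=${mm#0}
    echo $((hh * 60 + mm))
}

# resolve_day NOW_HHMM TARGET_HHMM
resolve_day() {
    now=$(to_minutes "$1")
    target=$(to_minutes "$2")
    if [ "$target" -gt "$now" ]; then
        echo today
    else
        echo tomorrow       # assumption: an equal time also rolls over
    fi
}
```

Running resolve_day 1115 1110 reproduces the example from the text: a job submitted at 11:15am with a time of 1110 becomes eligible tomorrow.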
62. e complete without special recognition of the first two beta test sites. Thomas Milliman of the Space Sciences Center of the University of New Hampshire was the first beta tester. Wendy Lin of Purdue University was the second beta tester and holds the honor of submitting more problem reports than anyone else outside of NASA.

Chapter 1: Introduction

This book, the User Guide to the Portable Batch System, Professional Edition (PBS Pro), is intended as your knowledgeable companion to the PBS Pro software. The information herein pertains to PBS in general, with specific information for PBS Pro 5.4.

1.1 Book organization

This book is organized into 9 chapters, plus two appendices. Depending on your intended use of PBS, some chapters will be critical to you, and others may be safely skipped.

Chapter 1 gives an overview of this book, PBS, and the PBS team.
Chapter 2 discusses the various components of PBS and how they interact, followed by definitions of terms used in PBS and in distributed workload management.
Chapter 3 introduces the user to PBS, describing the user interfaces and the user's UNIX environment.
Chapter 4 describes the structure and components of a PBS job, and explains how to create and submit a PBS job.
Chapter 5 introduces the xpbs graphical user interface, and shows how to submit a PBS job using xpbs.
Chapter 6 describes how to c
63. ecedence. If an option is present in a directive and not on the command line, that option and its argument, if any, will be taken from there.

4.5 User Authorization

When the user submits a job from a system other than the one on which the PBS Server is running, system-level user authorization is required. This authorization is needed for submitting the job, for PBS to return output files (see also "Delivery of Output Files" on page 97), and for file staging (see also "Input/Output File Staging" on page 98).

Important: The username under which the job is to be executed is selected according to the rules listed under the "-u" option to qsub (see "Specifying job userID" on page 39). The user submitting the job must be authorized to run the job under the execution user name (whether explicitly specified or not).

Such authorization is provided by any of the following three methods:

1. The host on which qsub is run (i.e. the submission host) is trusted by the execution host. This permission may be granted at the system level by having the submission host as one of the entries in the execution host's /etc/hosts.equiv file naming the submission host. For file delivery and file staging, the host representing the source of the file must be in the receiving host's /etc/hosts.equiv file. Such entries require system administrator access.

2. The host on which qsub is run (i.e. the submission host) is explicitly trusted by each ex
64. ecified destination
order: for exchanging order of two selected jobs in a queue
run: for running selected job(s) (-admin only)
rerun: for requeueing selected job(s) that are running (-admin only)

The middle portion of the Jobs Panel has abbreviated column names indicating the information being displayed, as the following table shows:

Table 7: xpbs Job Column Headings

Heading | Meaning
Job id | Job Identifier
Name | Name assigned to job, or script name
User | User name under which job is running
PEs | Number of Processing Elements (CPUs) requested
CputUse | Amount of CPU time used
WalltUse | Amount of wall-clock time used
S | State of job
Queue | Queue in which job resides

5.2.5 xpbs Info Panel

The Info panel shows the progress of the commands executed by xpbs. Any errors are written to this area. The INFO panel also contains a minimize/maximize button for displaying or iconizing the Info panel.

5.2.6 xpbs Keyboard Tips

There are a number of shortcuts and key sequences that can be used to speed up using xpbs. These include:

Tip 1. All buttons which appear to be depressed into the dialog box/subwindow can be activated by pressing the return/enter key.
Tip 2. Pressing the tab key will move the blinking cursor from one text field to another.
Tip 3. To contiguously select more than one entry: click <left mouse button> then drag the mouse acros
65. ecution host via the user's .rhosts file in his/her home directory. The .rhosts must contain an entry for the system on which the job executed, with the user name portion set to the name under which the job was executed. For file delivery and file staging, the host representing the source of the file must be in the user's .rhosts file on the receiving host. It is recommended to have two lines per host, one with just the "base" host name and one with the full host.domain.name.

3. PBS may be configured to use the Secure Shell (ssh/scp) for system access and file transfers. If so configured, the user should set up his/her .shosts file or SSH keys in a similar manner as described in method 2 above. For further discussion on using ssh/scp see "Delivery of Output Files" on page 97.

For example, the following entry in user susan's .rhosts file on the execution host server1 would permit user susan to run jobs submitted from her workstation wks031:

cat .rhosts
wks031 susan

Furthermore, in order for Susan's output files from her job to be returned to her automatically by PBS, she would need to add an entry to her .rhosts file on her workstation naming the execution host server1:

cat .rhosts
server1 susan

If instead Susan has access to several execution hosts, she would need to add all of them to her .rhosts file:

cat .rhosts
server1 susan
server2 susan
server3 susan
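The recommendation above to keep two lines per host (one with the base host name, one with the full host.domain.name) can be sketched as a small generator. This is an illustration in POSIX shell only. The function name rhosts_lines is hypothetical, and the domain used in the usage example is a placeholder.

```shell
# Hypothetical helper: print rhosts-style entries for a user.
# Usage: rhosts_lines USER HOST...
# For a fully qualified host, both the base name and the full name are printed.
rhosts_lines() {
    user=$1
    shift
    for fqdn in "$@"; do
        base=${fqdn%%.*}            # text before the first dot
        echo "$base $user"
        if [ "$base" != "$fqdn" ]; then
            echo "$fqdn $user"
        fi
    done
}
```

The output of, for example, rhosts_lines susan server1.pbspro.com server2.pbspro.com could be reviewed and then appended to the file by hand. Note that rsh-style tools typically ignore an rhosts file that is writable by anyone other than its owner.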
66. elaying execution)

The "-h" option specifies that a user hold be applied to the job at submission time. The job will be submitted, then placed in a hold state. The job will remain ineligible to run until the hold is released. (For details on releasing a held job see "Holding and Releasing Jobs" on page 85.)

qsub -h mysubrun

#!/bin/sh
#PBS -h

4.9.13 Specifying job checkpoint interval

The "-c interval" option defines the interval (in minutes) at which the job will be checkpointed, if this capability is provided by the operating system (i.e. under SGI IRIX and Cray Unicos). If the job executes upon a host which does not support checkpointing, this option will be ignored. The interval argument is specified as:

n: No checkpointing is to be performed.
s: Checkpointing is to be performed only when the Server executing the job is shutdown.
c: Checkpointing is to be performed at the default minimum time for the Server executing the job.
c=minutes: Checkpointing is to be performed at an interval of minutes, which is the integer number of minutes of CPU time used by the job. This value must be greater than zero.
u: Checkpointing is unspecified, thus resulting in the same behavior as "s".

If "-c" is not specified, the checkpoint attribute is set to the value "u".

qsub -c c mysubrun

#!/bin/sh
#PBS -c c=10

4.9.14 Specifying job userID

The "-u user_list
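The accepted forms of the checkpoint interval argument listed above (n, s, c, c=minutes with minutes greater than zero, and u) can be captured in a small validation sketch. This is written in POSIX shell as an illustration. The function name valid_checkpoint_arg is hypothetical, and qsub performs its own checking regardless.

```shell
# Hypothetical helper: succeed (exit status 0) only for the checkpoint
# interval forms listed in the text: n, s, c, c=<minutes>, u.
valid_checkpoint_arg() {
    case $1 in
        n|s|c|u)
            return 0
            ;;
        c=*)
            minutes=${1#c=}
            case $minutes in
                ''|*[!0-9]*) return 1 ;;    # empty or not a whole number
            esac
            [ "$minutes" -gt 0 ]            # must be greater than zero
            ;;
        *)
            return 1
            ;;
    esac
}
```

A wrapper script could call this before submitting, for example: if valid_checkpoint_arg c=10; then qsub -c c=10 mysubrun; fi.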
67. elds (4 nodes * 3 processes * 2 CPUs) for a total of 24 CPUs.

9.3.4 Order of Nodes in PBS_NODEFILE

If the job only requests one process per node, the order of the nodes will match the order requested. However, if multiple processes are placed per node, the file will contain each separate node first, listed in order to match the request, followed by the required number of repeating occurrences of each node. For example, if a user requests the following nodes:

#!/bin/sh
#PBS -l nodes=1:ppn=3+1:ppn=2+1:ppn=1

then the PBS_NODEFILE will contain: A, B, C, A, B, A. This allows the user to have a parallel job step that runs only one process on each node, e.g. by setting nproc=3. This is useful if the job requires files to be set up, one per node, on each node before the main computation is performed.

If a user specifies -l nodes=1:ppn=3:cpp=2, then node A will be listed 3 times and the environment variables OMP_NUM_THREADS and NCPUS will both be set to 2.

If the user specifies -l nodes=1:ncpus=2, then node A will be listed in the PBS_NODEFILE once, and the environment variables OMP_NUM_THREADS and NCPUS will both be set to 2. Which method is used depends on the individual application, and how the user wants the processes to be distributed across the nodes allocated to the job.

9.3.5 Interaction of nodes vs ncpus

If a job is submitted to PBS with a nodes resource specification (-l nodes=X) and wit
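The node ordering and the CPU arithmetic described in the excerpt above can be reproduced with a short sketch. It is a model of the stated rule (each node listed once in request order, then repeated occurrences), written in POSIX shell. It is not how PBS itself builds the file. The function names nodefile_order and total_cpus and the NAME:PPN argument form are hypothetical.

```shell
# Hypothetical model of the PBS_NODEFILE ordering rule.
# Usage: nodefile_order NAME:PPN...   e.g. nodefile_order A:3 B:2 C:1
# Each pass lists, in request order, every node that still has processes left.
nodefile_order() {
    pass=1
    more=1
    while [ "$more" -eq 1 ]; do
        more=0
        for spec in "$@"; do
            name=${spec%%:*}
            ppn=${spec##*:}
            if [ "$ppn" -ge "$pass" ]; then
                echo "$name"
            fi
            if [ "$ppn" -gt "$pass" ]; then
                more=1          # at least one node needs another pass
            fi
        done
        pass=$((pass + 1))
    done
}

# Total CPUs for NODES nodes, PPN processes per node, CPP CPUs per process.
total_cpus() {
    echo $(( $1 * $2 * $3 ))
}
```

Calling nodefile_order A:3 B:2 C:1 prints A, B, C, A, B, A on separate lines, matching the example in the text, and total_cpus 4 3 2 gives the 24 CPUs of the earlier calculation.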
68. entry is meant to be selected via a single <left mouse button> click, <shift key> plus <left mouse button> click for contiguous selection, or <cntrl key> plus <left mouse button> click for non-contiguous selection.

To the right of the Hosts Panel are a series of buttons that represent actions that can be performed on selected host(s). Use of these buttons will be explained in detail below. The buttons are:

detail: provides information about selected Server host(s). This functionality can also be achieved by double clicking on an entry in the Hosts listbox.
submit: for submitting a job to any of the queues managed by the selected host(s).
terminate: for terminating PBS Servers on selected host(s) (-admin only).

Note that some buttons are only visible if xpbs is started with the "-admin" option, which requires manager or operator privilege to function.

The middle portion of the Hosts Panel has abbreviated column names indicating the information being displayed, as the following table shows:

Table 5: xpbs Server Column Headings

Heading | Meaning
Max | Maximum number of jobs permitted
Tot | Count of jobs currently enqueued in any state
Que | Count of jobs in the Queued state
Run | Count of jobs in the Running state
Hd | Count of jobs in the Held state
Wat | Count of jobs in the Waiting state
Trn | Count of jobs in the Transiting state
Ex
69. equired to suspend or resume a job.

The three examples below all send a signal 9 (SIGKILL) to job 34:

qsig -s SIGKILL 34
qsig -s KILL 34
qsig -s 9 34

Important: On most systems the command "kill -l" (that's "minus ell") will list all the available signals. The UNIX manual page for kill(1) usually also lists the available signals.

To send a signal to a job using xpbs, first select the job(s) of interest, then click the signal button. Doing so will launch the Signal Running Job dialog box, shown below.

From this window, you may click on any of the common signals, or you may enter the signal number or signal name you wish to send to the job. Click the Signal button to complete the process.

7.6 Changing Order of Jobs Within Queue

PBS provides the qorder command to change the order of (or reorder) two jobs. To order two jobs is to exchange the jobs' positions in the queue or queues in which the jobs reside. The two jobs must be located at the same Server, and both jobs must be owned by the user. No attribute of the job (such as priority) is changed. The impact of changing the order within the queue(s) is dependent on local job scheduling policy; contact your systems administrator for details.

Important: A job in the running state cannot be reordered.

Usage of the qorder command is:

qorder job_identifier1 job_identifier2

Bo
70. estrictions on use, see "Interactive-batch jobs" on page 42.

[Screenshot: xterm window showing an interactive batch job submitted with qsub -I]

5.6 Exiting xpbs

Click on the Close button located in the Menu bar to leave xpbs. If any settings have been changed, xpbs will bring up a dialog box asking for a confirmation in regards to saving state information. The settings will be saved in the xpbs configuration file, and will be used the next time you run xpbs.

5.7 The xpbs Configuration File

Upon exit, the xpbs state may be written to the user's $HOME/.xpbsrc file. Information saved includes: the selected host(s), queue(s), and job(s); the different jobs listing criteria; the view states (i.e. minimized/maximized) of the Hosts, Queues, Jobs, and INFO regions; and all settings in the Preferences section. In addition, there is a system-wide xpbs configuration file, maintained by the PBS Administrator, which is used in the absence of a user's personal .xpbsrc file.

5.8 Widgets Used in xpbs

The various panels, boxes, and regions (collectively called "widgets") of xpbs and how they are manipulated are described in the fol
71. fiers which specify the jobs to be moved to the new destination.

To move jobs between queues or between Servers using xpbs, select the job(s) of interest, and then click the move button. Doing so will launch the Move Job dialog box from which you can select the queue and/or Server to which you want the job(s) moved.

Chapter 8: Advanced PBS Features

This chapter covers the less commonly used commands and more complex topics which will add substantial functionality to your use of PBS. The reader is advised to read chapters 5-7 of this manual first.

8.1 Job Exit Status

The exit status of a job is normally the exit status of the shell executing the job script. If a user is using csh and has a .logout file in the home directory, the exit status of csh becomes the exit status of the last command in .logout. This may impact the use of job dependencies which depend on the job's exit status. To preserve the job's status, the user may either remove .logout or edit it as shown in this example:

set EXITVAL = $status
[ previous contents remain unchanged ]
exit $EXITVAL

Doing so will ensure that the exit status of the job persists across the invocation of the .logout file.

8.2 Changing Job umask

The "-W umask=nnn" option to qsub allows you to specify what umask PBS should use when creating and/or co
72. file and to redirect output.
o Use the Environment Variables to Export subwindow to have current environment variables exported to the job.
o Use the Job Name field in the OPTIONS subwindow to give the job a name.
o Use the Notify email address and one of the buttons in the OPTIONS subwindow to have PBS send you mail when the job terminates.

Now that the script is built you have four options of what to do next:

Reset options to default
Save the script to a file
Submit the job as a batch job
Submit the job as an interactive-batch job

Reset clears all the information from the submit job dialog box, allowing you to create a job from a fresh start.

Use the FILE field (in the upper left corner) to define a filename for the script. Then press the Save button. This will cause a PBS script file to be generated and written to the named file.

Pressing the Confirm Submit button at the bottom of the Submit window will submit the PBS job to the selected destination. xpbs will display a small window containing the job identifier returned for this job. Clicking OK on this window will cause it and the Submit window to be removed from your screen.

Alternatively, you can submit the job as an interactive-batch job by clicking the Interactive button at the bottom of the Submit Job window. Doing so will cause an X terminal window (xterm) to be launched, and within that window a PBS interactive-batch job submitted. For details and r
73. file at host over to localfile at the executing Globus machine. The same process is used for a stageout directive:

    localfile@host:outputfile

PBS will take care of copying the localfile on the executing Globus host over to the output file at host. Globus mechanisms are used for transferring files to hosts that run Globus; otherwise pbs_scp, pbs_rcp, or cp is used. This means that if the host given in the argument runs Globus, then Globus communication will be opened to that system.

8.8.4 Limitation

PBS does not currently support co-allocated Globus jobs, where two or more jobs are simultaneously run (distributed) over two or more Globus resource managers.

8.8.5 Examples

Here are some examples of using PBS with Globus.

Example 1: If you want to run a single-processor job on globus gatekeeper mars.pbspro.com, using whatever job manager is currently configured at that site, then you could create a PBS script like the following example:

    % cat job.script
    #PBS -l site=globus:mars.pbspro.com
    echo `hostname` Hello world Globus style

Upon execution, this will give the sample output:

    mars Hello world Globus style

Example 2: If you want to run a multi-processor job on globus gatekeeper pluto.pbspro.com/jobmanager-fork with cpu count set to 4, and shipping the architecture-compatible executable mpitest over to the Globus host pluto for execution, then compose a script and submit as fo
74. following examples illustrate the most common uses for job dependencies.

Suppose you have three jobs (job1, job2, and job3) and you want job3 to start after job1 and job2 have ended. The first example below illustrates the options you would use on the qsub command line to implement these job dependencies.

    % qsub job1
    16394.jupiter.pbspro.com
    % qsub job2
    16395.jupiter.pbspro.com
    % qsub -W depend=afterany:16394:16395 job3
    16396.jupiter.pbspro.com

As another example, suppose instead you want job2 to start only if job1 ends with no errors (i.e. it exits with a no-error status):

    % qsub job1
    16397.jupiter.pbspro.com
    % qsub -W depend=afterok:16397 job2
    16396.jupiter.pbspro.com

You can use xpbs to specify job dependencies as well. On the Submit Job window, in the other options section (far left, center of window), click on one of the three dependency buttons: after depend, before depend, or concurrency. These will launch a Dependency window in which you will be able to set up the dependencies you wish. The After Dependency dialog box is shown below.

8.5 Delivery of Output Files

To transfer output files, or to transfer staged-in or staged-out files to/from a remote destination, PBS uses either rcp or scp depending on the configuration options. PBS includes a version of the rcp(1) command from the BSD 4.4 lite distribution, renamed pbs_rcp. This version of rcp is provided bec
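When a dependency involves several jobs, the colon-separated value for the -W option is easy to mistype. The following is a minimal sketch of a helper that assembles it; depend_arg is a hypothetical function name, not part of PBS, and the script only prints the qsub command line rather than running it.

```shell
#!/bin/sh
# Sketch only: depend_arg is a hypothetical helper, not a PBS command.

# Build the value for qsub's -W option from a dependency type and job IDs.
depend_arg() {
    dep_type=$1
    shift
    arg="depend=$dep_type"
    for jobid in "$@"; do
        arg="$arg:$jobid"
    done
    printf '%s\n' "$arg"
}

# Print (do not run) the command line for job3 from the first example.
echo "qsub -W $(depend_arg afterany 16394 16395) job3"
```

In practice the job identifiers would be the ones returned by the earlier qsub commands, captured into shell variables.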
75. formation being displayed, as the following table shows.

Table 6: xpbs Queue Column Headings

    Heading   Meaning
    Max       Maximum number of jobs permitted
    Tot       Count of jobs currently enqueued in any state
    Ena       Is queue enabled? yes or no
    Str       Is queue started? yes or no
    Que       Count of jobs in the Queued state
    Run       Count of jobs in the Running state
    Hld       Count of jobs in the Held state
    Wat       Count of jobs in the Waiting state
    Trn       Count of jobs in the Transiting state
    Ext       Count of jobs in the Exiting state
    Type      Type of queue: execution or route
    Server    Name of Server on which queue exists

5.2.4 xpbs Jobs Panel

The Jobs panel is composed of a leading horizontal JOBS bar, a listbox, and a set of command buttons. The JOBS bar lists the queues that are consulted when listing jobs; the bar also contains a minimize/maximize button for displaying or iconizing the Jobs region. The listbox displays information about jobs that are found in the queue(s) selected from the Queues listbox; each listbox entry is meant to be selected (highlighted) via a single <left mouse button> click, <shift key> plus <left mouse button> click for contiguous selection, or <cntrl key> plus <left mouse button> click for non-contiguous selection.

The region just above the Jobs listbox shows a collection of command buttons whose labels describe criteria used for filtering t
76. h NQS and PBS. The existing script is copied and PBS directives (#PBS) are inserted prior to each NQS directive (either QSUB or Q) in the original script.

    nqs2pbs existing-NQS-script new-PBS-script

Important: Converting NQS date specifications to the PBS form may result in a warning message and an incomplete converted date. PBS does not support date specifications of "today", "tomorrow", or the names of the days of the week such as "Monday". If any of these are encountered in a script, the PBS specification will contain only the time portion of the NQS specification (i.e. #PBS -a hhmm[.ss]). It is suggested that you specify the execution time on the qsub command line rather than in the script. All times are taken as local time. If any unrecognizable NQS directives are encountered, an error message is displayed. The new PBS script will be deleted if any errors occur.

Section "Setting Up Your UNIX/Linux Environment" on page 18 discusses PBS environment variables. For NQS compatibility, the variable ENVIRONMENT is provided, set to the same value as PBS_ENVIRONMENT.

A queue complex in NQS was a grouping of queues within a batch Server. The purpose of a complex was to provide additional control over resource usage. The advanced scheduling features of PBS eliminate the requirement for queue complexes.

Index A Access Control 4 113 Account 12 Accoun
77. hat the user wishes to run. In this case, PBS sees lines 6-7 as being user commands.

We will see shortly how to use the qsub command to submit PBS jobs. Any option that you specify to the qsub command line can also be provided as a PBS directive inside the PBS script. PBS directives come in two types: resource requirements and job control options.

In our example above, lines 2-4 specify the -l resource_list option, followed by a specific resource request. Specifically, lines 2-4 request 1 hour of wall-clock time, 400 megabytes (MB) of memory, and 4 CPUs. Line 5 is not a resource directive. Instead, it specifies how PBS should handle some aspect of this job. Specifically, the -j oe requests that PBS join the stdout and stderr output streams of the job into a single stream. Finally, line 7 is the command line for executing the program we wish to run: our example submarine simulation application, subrun.

While only a single command is shown in this example (e.g. subrun), you can specify as many programs, tasks, or job steps as you need.

4.2 Creating a PBS Job

There are several ways to create a PBS job. The most common are by using your favorite text editor and by using the PBS graphical user interface (GUI). The rest of this chapter discusses creating and submitting jobs using the command line interface. The next chapter explains in detail how to use the xpbs GUI to create and s
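As a rough sketch of the kind of script described above, the following shell fragment prints a seven-line job script whose layout matches the line numbers in the text (resource requests on lines 2-4, the -j oe option on line 5, the program on line 7) and then lists its directives. The exact directive spellings and the ./subrun path are assumptions based on that description, and job_script is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: job_script is a hypothetical helper that prints a sample
# PBS job script; the directive values follow the description in the text.
job_script() {
    cat <<'EOF'
#!/bin/sh
#PBS -l walltime=1:00:00
#PBS -l mem=400mb
#PBS -l ncpus=4
#PBS -j oe

./subrun
EOF
}

# List the directive lines PBS would read from the script.
job_script | grep '^#PBS'
```

To the shell each #PBS line is an ordinary comment, which is why the same file can serve both as a script and as a carrier for qsub options.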
78. he Jobs listbox contents. The list of jobs can be selected according to the owner of jobs (Owners), job state (Job_States), name of the job (Job_Name), type of hold placed on the job (Hold_Types), the account name associated with the job (Account_Name), checkpoint attribute (Checkpoint), time the job is eligible for queueing/execution (Queue_Time), resources requested by the job (Resources), priority attached to the job (Priority), and whether or not the job is rerunnable (Rerunnable).

The selection criteria can be modified by clicking on any of the appropriate command buttons to bring up a selection box. The criteria command buttons are accompanied by a Select Jobs button, which when clicked will update the contents of the Jobs listbox based on the new selection criteria. Note that only jobs that meet all the selected criteria will be displayed.

Finally, to the right of the Jobs panel are the following command buttons, for operating on selected job(s):

    detail    provides information about selected job(s). This functionality can also be achieved by double-clicking on a Jobs listbox entry.
    modify    for modifying attributes of the selected job(s).
    delete    for deleting the selected job(s).
    hold      for placing some type of hold on selected job(s).
    release   for releasing held job(s).
    signal    for sending signals to selected job(s) that are running.
    msg       for writing a message into the output streams of selected job(s).
    move      for moving selected job(s) into some sp
79. he PBS Products unit as a subsidiary company named Altair Grid Technologies, focused on PBS Pro and related Grid software.

1.4 About the PBS Team

The PBS Pro product is being developed by the same team that originally designed PBS for NASA. In addition to the core engineering team, Altair Grid Technologies includes individuals who have supported PBS on computers all around the world, including some of the largest supercomputers in existence. The staff includes internationally recognized experts in resource and job scheduling, supercomputer optimization, message-passing programming, parallel computation, and distributed high-performance computing. In addition, the PBS team includes co-architects of the NASA Metacenter (the first full-production geographically distributed meta-computing grid), co-architects of the Department of Defense MetaQueueing (prototype Grid) Project, co-architects of the NASA Information Power Grid, and co-chair of the Global Grid Forum's Scheduling Group.

1.5 About Altair Engineering

Through engineering, consulting, and high-performance computing technologies, Altair Engineering increases innovation for more than 1,500 clients around the globe. Founded in 1985, Altair's unparalleled knowledge and expertise in product development and manufacturing extend throughout North America, Europe, and Asia. Altair specializes in the development of high-end, open CAE software solutions for modeling, vi
80. heck status of a job, and request status of queues, nodes, systems, or PBS Servers.
Chapter 7 discusses commonly used commands and features of PBS, and explains how to use each one.
Chapter 8 describes and explains how to use the more advanced features of PBS.
Chapter 9 explains how PBS interacts with multi-node and parallel applications, and illustrates how to run such applications under PBS.
Appendix A provides a quick reference summary of PBS environment variables.
Appendix B includes information for converting from NQS/NQE to PBS.

1.2 What is PBS Pro?

PBS Pro is the professional version of the Portable Batch System (PBS), a flexible workload management system originally developed to manage aerospace computing resources at NASA. PBS has since become the leader in supercomputer workload management and the de facto standard on Linux clusters.

Today, growing enterprises often support hundreds of users running thousands of jobs across different types of machines in different geographical locations. In this distributed heterogeneous environment, it can be extremely difficult for administrators to collect detailed, accurate usage data or to set system-wide resource priorities. As a result, many computing resources are left under-utilized, while others are over-utilized. At the same time, users are confronted with an ever-expanding array of operating systems and platforms. Each year, scientists, engineers, designers, and analysts must w
81. hese are used to submit, monitor, modify, and delete jobs. These client commands can be installed on any system type supported by PBS and do not require the local presence of any of the other components of PBS. There are three command classifications: user commands (which any authorized user can use), operator commands, and manager (or administrator) commands. Operator and manager commands, which require specific access privileges, are discussed in the PBS Pro Administrator Guide.

The Job Server daemon is the central focus for PBS. Within this document, it is generally referred to as the Server or by the execution name pbs_server. All commands and the other daemons communicate with the Server via an Internet Protocol (IP) network. The Server's main function is to provide the basic batch services, such as receiving/creating a batch job, modifying the job, protecting the job against system crashes, and running the job. Normally, there is one Server managing a given set of resources. However, if the Server Failover feature is enabled, there will be two Servers.

The Job Executor is the daemon which actually places the job into execution. This daemon, pbs_mom, is informally called MOM, as it is the mother of all executing jobs. (MOM is a reverse-engineered acronym that stands for Machine Oriented Mini-server.) MOM places a job into execution when it receives a copy of the j
82. hese definitions before beginning the planning process prior to installation of PBS. The terms are defined in an order that best allows the definitions to build on previous terms.

Node: A node to PBS is a computer system with a single operating system (OS) image, a unified virtual memory space, one or more CPUs, and one or more IP addresses. Frequently, the term execution host is used for node. A computer such as the SGI Origin 3000, which contains multiple CPUs running under a single OS, is one node. Systems like the IBM SP and Linux clusters, which contain separate computational units each with their own OS, are collections of nodes. Nodes can be defined as either cluster nodes or timeshared nodes, as discussed below.

Cluster Node: A node whose purpose is geared toward running multi-node or parallel jobs is called a cluster node. If a cluster node has more than one virtual processor, the VPs may be assigned to different jobs (job-shared) or used to satisfy the requirements of a single job (exclusive). This ability to temporarily allocate the entire node to the exclusive use of a single job is important for some multi-node parallel applications. Note that PBS enforces a one-to-one allocation scheme of cluster node VPs, ensuring that the VPs are not over-allocated or over-subscribed between multiple jobs. (See also node and virtual processors.)

Timeshared Node: In contrast to cluster nodes are hosts that always service multi
83. hort, concise form. (This is the default display if no options are given.) The information provided is the identifier of the reservation, name of the queue that got created for the reservation, user who owns the reservation, the state, the start time, duration in seconds, and the end time.

    % pbs_rstat -S
    Name  Queue  User   State  Start        Duration  End
    R226  R226   james  CO     Today 11:30  1800      Today 1
    R302  R302   barry  CO     Today 15:50  1800      Today 1
    R304  R304   james  CO     Today 15:46  1800      Today 1

The brief option (-B) will only show the identifiers of all the reservations:

    % pbs_rstat -B
    Name: R226.sout
    Name: R302.sout
    Name: R304.sout

The full option (-f) will print out the name of the reservation followed by all the attributes of the reservation.

    % pbs_rstat -f R226
    Name: R226.south
    Reserve_Owner = james@south
    reserve_type = 2
    reserve_state = RESV_CONFIRME
    reserve_substate = 2
    reserve_start = Fri Aug 24 11:30:00 2001
    reserve_end = Fri Aug 24 12:00:00 2001
    reserve_duration = 1800
    queue = R226
    Resource_List.ncpus = 1
    Resource_List.neednodes = 1
    Resource_List.nodect = 1
    Resource_List.nodes = 1
    Resource_List.walltime = 00:30:00
    Authorized_Users = james@south
    south Fri Aug 24 06:30:53 2001 Fri Aug 24 06:30:53 2001
    Variable_List = PBS_O_LOGNAME=james,PBS_O_HOST=south
    euser = james
    egroup = pbs

8.9.4 Delete PBS Reservations

The pb
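The start, duration, and end values in the full display are related by simple arithmetic: the end time is the start time plus the duration in seconds. The sketch below checks that relationship for times given as HH:MM. resv_end is a hypothetical helper name, and the sketch assumes two-digit hours and minutes and a duration that is a whole number of minutes.

```shell
#!/bin/sh
# Sketch only: resv_end is a hypothetical helper, not a PBS command.

# Add a duration in seconds to an HH:MM start time and print the HH:MM end.
resv_end() {
    start=$1
    duration=$2
    hours=${start%%:*}
    minutes=${start##*:}
    # Strip one leading zero so values such as 08 are not read as octal.
    hours=${hours#0}
    minutes=${minutes#0}
    total=$(( hours * 60 + minutes + duration / 60 ))
    printf '%02d:%02d\n' $(( total / 60 % 24 )) $(( total % 60 ))
}

resv_end 11:30 1800    # reservation R226 in the sample display
resv_end 15:50 1800    # reservation R302 in the sample display
```

For R226 this gives 12:00, which agrees with the reserve_end value shown in the full display.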
84. hout an ncpus specification (no -l ncpus=y), then PBS will set the ncpus resource to match the number of CPUs required by the nodes specification.

The following relates what happens if a job is submitted with a combination of -l ncpus=value and -l nodes=value. The variable X is used to indicate any integer larger than 1. Y and Z can be any legal integer. The variable N is any integer equal to or greater than 1.

Given -l ncpus=N -l nodes=value, if value is:

    1
        Then this is OK, and the nodes value is equivalent to a single node with ncpus=N.

    Xi+Xj
        Then let X = sum(Xi, Xj). The value of ncpus must be a multiple of the number of nodes: if N % X == 0, then ncpus=N and the nodes value becomes X:cpp=N/X. That is, N/X CPUs per node. If N % X != 0, then this is an error.

If value includes ppn and does not include cpp, then the results are the same as in the above case (cpp = N / number of tasks).

For any other case (X:ppn=Y:cpp=Z), a default N value is replaced by the sum of X*Y*Z across the node spec. E.g. -l nodes=2:blue:ppn=2:cpp=3+3 is 15 CPUs, so ncpus is reset to 15. A non-default ncpus value is an error when cpp is specified in the node specification.

Here are a few more examples:

    -l ncpus=6 -l nodes=3:blue+3:red     is correct
    -l ncpus=12 -l nodes=3:blue+3:red    is also correct, but
    -l ncpus=10 -l nodes=3:blue+3:red    is not, because 10 CPUs cannot be spread evenly across 6 nodes.

A job that does not have a no
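The divisibility rule above is easy to check by hand before submitting a job. The following sketch does the arithmetic for the three ncpus examples and for the X*Y*Z sum; cpus_per_node is a hypothetical helper name, and it takes the per-group node counts as plain numbers rather than parsing a real node specification.

```shell
#!/bin/sh
# Sketch only: cpus_per_node is a hypothetical helper, not a PBS command.

# Print the CPUs-per-node value for a request of N CPUs over the given
# per-group node counts, or fail if N is not a multiple of the node total.
cpus_per_node() {
    ncpus=$1
    shift
    nodes=0
    for count in "$@"; do
        nodes=$(( nodes + count ))
    done
    if [ $(( ncpus % nodes )) -ne 0 ]; then
        echo "error: $ncpus CPUs cannot be spread evenly across $nodes nodes" >&2
        return 1
    fi
    echo $(( ncpus / nodes ))
}

cpus_per_node 6 3 3      # ncpus=6 with nodes=3:blue+3:red
cpus_per_node 12 3 3     # ncpus=12 with nodes=3:blue+3:red
cpus_per_node 10 3 3 || echo "rejected"

# Sum of X*Y*Z for nodes=2:blue:ppn=2:cpp=3+3, taking ppn and cpp as 1
# for the second group.
echo $(( 2 * 2 * 3 + 3 * 1 * 1 ))
```

The first two calls give one and two CPUs per node respectively, the third is rejected, and the last line reproduces the 15 CPUs quoted in the text.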
85. hown in this bold fixed-width font. Following UNIX tradition, manual page references include the corresponding section number in parentheses appended to the man page name. Words or terms being defined, as well as variable names, are in italics.

Acknowledgements

PBS Pro is the enhanced commercial version of the PBS software originally developed for NASA. The NASA version had a number of corporate and individual contributors over the years, for which the PBS developers and PBS community are most grateful. Below we provide formal legal acknowledgements to corporate and government entities, then special thanks to individuals.

The NASA version of PBS contained software developed by NASA Ames Research Center, Lawrence Livermore National Laboratory, and MRJ Technology Solutions. In addition, it included software developed by the NetBSD Foundation, Inc., and its contributors, as well as software developed by the University of California, Berkeley and its contributors.

Other contributors to the NASA version of PBS include Bruce Kelly and Clark Streeter of NERSC; Kent Crispin and Terry Heidelberg of LLNL; John Kochmar and Rob Pennington of Pittsburgh Supercomputing Center; and Dirk Grunwald of University of Colorado, Boulder. The ports of PBS to the Cray T3e and the IBM SP SMP were funded by DoD USAERDC; the port of PBS to the Cray SV1 was funded by DoD MSIC.

No list of acknowledgements for PBS would possibly b
86. ick the hold or release button.

7.4 Sending Messages to Jobs

To send a message to a job is to write a message string into one or more output files of the job. Typically this is done to leave an informative message in the output of the job. Such messages can be written using the qmsg command.

Important: A message can only be sent to running jobs.

The usage syntax of the qmsg command is:

    qmsg [-E] [-O] message_string job_identifier ...

The -E option writes the message into the error file of the specified job(s). The -O option writes the message into the output file of the specified job(s). If neither option is specified, the message will be written to the error file of the job.

The first operand, message_string, is the message to be written. If the string contains blanks, the string must be quoted. If the final character of the string is not a newline, a newline character will be added when written to the job's file. All remaining operands are job_identifiers which specify the jobs to receive the message string. For example:

    % qmsg -E "hello to my error (.e) file" 55
    % qmsg -O "hello to my output (.o) file" 55
    % qmsg "this too will go to my error (.e) file" 55

To send a message to a job using xpbs, first select the job(s) of interest, then click the msg button. Doing so will launch the Send Message to Job dialog box, as shown below. From this window, you may enter the message you wish to send and indicate whether it should be written
87. iew as the mouse is moved.

    bottom arrow
        Causes the view in the associated window to shift down by one unit (i.e. the object appears to move up one unit in its window). If the button is held down, the action will auto-repeat.

    top gap
        The area between the top arrow and the slider. Causes the view in the associated window to shift up by one less than the number of units in the window (i.e. the portion of the object that used to appear at the very top of the window will now appear at the very bottom). If the button is held down, the action will auto-repeat.

    bottom gap
        The area between the bottom arrow and the slider. Causes the view in the associated window to shift down by one less than the number of units in the window (i.e. the portion of the object that used to appear at the very bottom of the window will now appear at the very top). If the button is held down, the action will auto-repeat.

An entry widget is brought into focus with a click of the left mouse button. To manipulate this widget, simply type in the text value. Use of arrow keys, mouse selection of text for deletion or overwrite, and copying and pasting with sole use of mouse buttons are permitted. This widget is usually accompanied by a scrollbar for horizontally scanning a long text entry string.

A matrix of entry boxes is usually shown as several rows of entry widgets, where a number of entries (called fields) can be found per row. The matrix is accompanied by up/down arrow buttons for
88. ined to have the following ordered relationship:

    n > s > c=minutes > c > u

If the optional op is not specified, jobs will be selected whose Checkpoint attribute is equal to the interval argument.

Restricts the selection of jobs to those with a specific set of hold types. Only those jobs will be selected whose Hold_Types attribute exactly matches the value of the hold_list argument. The hold_list argument is a string consisting of one or more occurrences of the single letter n, or one or more of the letters u, o, or s in any combination. If letters are duplicated, they are treated as if they occurred once. The letters represent the hold types:

    Letter   Meaning
    n        none
    u        user
    o        operator
    s        system

-l resource_list
    Restricts selection of jobs to those with specified resource amounts. Only those jobs will be selected whose Resource_List attribute matches the specified relation with each resource and value listed in the resource_list argument. The relation operator op must be present. The resource_list is in the following format:

        resource_name op value[,resource_name op val,...]

-N name
    Restricts selection of jobs to those with a specific name.

-p [op]priority
    Restricts selection of jobs to those with a priority that matches the specified relationship. If op is not specified, jobs are selected for which the job Priority attribute
89. isplays the status of all running jobs at the (optionally specified) PBS Server. Running jobs include those that are running and suspended. One line of output is generated for each job reported, and the information is presented in the alternative display.

6.1.8 List Non-Running Jobs

The -i option to qstat displays the status of all non-running jobs at the (optionally specified) PBS Server. Non-running jobs include those that are queued, held, and waiting. One line of output is generated for each job reported, and the information is presented in the alternative display (see description above).

6.1.9 Display Size in Gigabytes

The -G option to qstat displays all jobs at the requested (or default) Server using the alternative display, showing all size information in gigabytes (GB) rather than the default of smallest displayable units. Note that if the size specified is less than 1 GB, then the amount is rounded up to 1 GB.

6.1.10 Display Size in Megawords

The -M option to qstat displays all jobs at the requested (or default) Server using the alternative display, showing all size information in megawords (MW) rather than the default of smallest displayable units. A word is considered to be 8 bytes.

6.1.11 List Nodes Assigned to Jobs

The -n option to qstat displays the nodes allocated to any running job at the (optionally specified) PBS Server, in addition to the
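To get a feel for the units used by the gigabyte and megaword displays, the conversions can be sketched with shell arithmetic. The text only states that sizes below 1 GB are shown as 1 GB and that a word is 8 bytes; rounding every value up to the next whole unit, as done here, is an assumption, and to_gb and to_mw are hypothetical helper names.

```shell
#!/bin/sh
# Sketch only: to_gb and to_mw are hypothetical helpers. Rounding up to
# the next whole unit in all cases is an assumption, not documented
# qstat behaviour.

# Convert a byte count to whole gigabytes (1 GB = 1073741824 bytes).
to_gb() {
    echo $(( ($1 + 1073741823) / 1073741824 ))
}

# Convert a byte count to whole megawords (1 MW = 1048576 words of 8 bytes).
to_mw() {
    echo $(( ($1 + 8388607) / 8388608 ))
}

to_gb 314572800    # a 300 MB request, expressed in bytes
to_mw 314572800
```

A 300 MB request is therefore reported as 1 GB, and corresponds to 37.5 megawords before any rounding.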
90. job
3. Holding jobs until a particular job starts or completes execution

The -W depend=dependency_list option to qsub defines the dependency between multiple jobs. The dependency_list has the format:

    type[:argument[:argument...][,type:argument...]

The argument is either a numeric count or a PBS job identifier, according to type. If argument is a count, it must be greater than 0. If it is a job identifier and not fully specified in the form seq_number.server.name, it will be expanded according to the default Server rules which apply to job identifiers on most commands. If argument is null (the preceding colon need not be specified), the dependency of the corresponding type is cleared (unset).

    synccount:count
        This job is the first in a set of jobs to be executed at the same time. count is the number of additional jobs in the set.

    syncwith:jobid
        This job is an additional member of a set of jobs to be executed at the same time. In the above and following dependency types, jobid is the job identifier of the first job in the set.

    after:jobid[:jobid...]
        This job may be scheduled for execution at any point after jobs jobid have started execution.

    afterok:jobid[:jobid...]
        This job may be scheduled for execution only after jobs jobid have terminated with no errors. See the csh warning under "User's PBS Environment" on page 18.

    afternotok:jobid[:jobid...]
        This job may be scheduled for execution only
91. job is treated just like a regular batch job, in that it is queued up and has to wait for resources to become available before it can run. Once it is started, however, the user's terminal input and output are connected to the job in what appears to be an rlogin session. It appears that the user is logged into one of the available execution machines, and the resources requested by the job are reserved for that job. Many users find this useful for debugging their applications or for computational steering.

3.3 The Two Faces of PBS

PBS provides two user interfaces: a command line interface (CLI) and a graphical user interface (GUI). The CLI lets you type commands at the system prompt. The GUI is a graphical point-and-click interface. Table 1 lists all the PBS Pro user and administrator commands. The user commands are discussed in this book; the administrator commands are discussed in the PBS Pro Administrator Guide. The subsequent chapters of this book will explain how to use both the CLI and GUI versions of the user commands to create, submit, and manipulate PBS jobs.

Table 1: PBS Pro User and Manager Commands

    User Commands                          Administrator Commands
    Command     Purpose                    Command      Purpose
    nqs2pbs     Convert from NQS           pbs-report   Report job statistics
    pbs_rdel    Delete Adv. Reservation    pbs_hostid   Report host identifier
    pbs_rstat   Sta
92. jobs

4.7 Single-node vs. Multi-node Jobs

In PBS, jobs can be run either on a single system (single-node jobs) or on two or more systems (multi-node jobs). The basic usage of the two types of jobs is the same. The primary difference is in how you request resources for your job. Single-node jobs can simply state the number of CPUs and amount of individual resources needed. Multi-node jobs, however, given their nature, have a more complex method of requesting resources, which enables you to specify how many different systems (nodes) your job needs, as well as the number of CPUs on each, and even how many tasks per CPU you wish to use, etc.

The examples in this chapter apply to both types of jobs; however, the resource requests given are for single-node jobs. Additional details for multi-node jobs are given in Chapter 9, "Running Multi-node Jobs" on page 117.

4.8 PBS System Resources

You can request a variety of resources that can be allocated and used by your job, including CPUs, memory, time (walltime or cputime), and/or disk space. As we saw above, resources are specified using the -l resource_list option to qsub or in your job script. Doing so defines the resources that are required by the job and establishes a limit to the amount of resource that can be consumed. If not set for a generally available resource, the limit is infinite. The resource_list argument is of the form: resource_name=value
93. l messages to the specified list of users. This option takes a string consisting of any combination of a, b, c, or e. Default is "ac".

    a    notify if the reservation is terminated for any reason
    b    notify when the reservation period begins
    e    notify when the reservation period ends
    c    notify when the reservation is confirmed

Specifies the list of users to whom the Server will attempt to send a mail message whenever the reservation transitions to one of the mail states specified in the -m option. Default: reservation's owner.

Specifies a comma-separated list of entries of the form user@host. Entries on this list are used by the Server in conjunction with an ordered set of rules to associate a user name with the reservation.

Specifies a comma-separated list of entries of the form group@host names. Entries on this list are used by the Server in conjunction with an ordered set of rules to associate a group name with the reservation.

-U auth_user_list
    Specifies a comma-separated list of entries of the form user@host. These are the users who are allowed or denied permission to submit jobs to the queue associated with this reservation. This list becomes the acl_users attribute for the reservation's queue.

-G auth_group_list
    Specifies a comma-separ
94. le is passed to the job in the environment variable PBS_NODEFILE. (For IBM SP systems, it is also in the variable MP_HOSTFILE.) The order in which the nodes are listed in the PBS_NODEFILE depends on the node_spec request. This is further illustrated in the Examples section below.

9.3 Examples

This section contains several examples for using the various options of the node specification syntax.

9.3.1 Basic node_spec Usage

A simple node_spec requests a specific number of nodes. This is the most common case for single- and dual-processor Linux clusters where all the nodes are identical. It doesn't matter to your job which nodes are allocated to it. Thus, requesting simply the number of nodes needed is sufficient. The following examples each request eight nodes:

    qsub -l nodes=8 mysubrun

    #!/bin/sh
    #PBS -l nodes=8

9.3.2 Requesting Multiple Processes Per Node

A user may request to run multiple processes per node by adding the term ppn= (for "processes per node") to each node expression. As a simple case, consider:

    #!/bin/sh
    #PBS -l nodes=1:ppn=3

which requests to run three processes on a single node. One node would be allocated to the job, as would three virtual processors on that node. Furthermore, the allocated node will be listed in the PBS_NODEFILE three times, one per line.

The following example requests two virtual processors on each of three nodes, and the PBS
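A job script often needs to know how many processes and how many distinct hosts it was given, and both can be read off the node file by counting lines. The sketch below imitates a node file rather than reading a real one: sample_nodefile is a hypothetical helper, the host names node1, node2, and so on are made up, and the ordering it prints is an assumption that should not be relied upon.

```shell
#!/bin/sh
# Sketch only: sample_nodefile is a hypothetical helper that imitates the
# contents of a node file for a request of the form nodes=<count>:ppn=<ppn>.
# Host names and line ordering are assumptions.
sample_nodefile() {
    count=$1
    ppn=$2
    n=1
    while [ "$n" -le "$count" ]; do
        p=1
        while [ "$p" -le "$ppn" ]; do
            echo "node$n"
            p=$(( p + 1 ))
        done
        n=$(( n + 1 ))
    done
}

# nodes=1:ppn=3: the single node is listed three times, one per line.
sample_nodefile 1 3

# nodes=3:ppn=2: count the entries (processes) and the distinct hosts.
sample_nodefile 3 2 | grep -c .
sample_nodefile 3 2 | sort -u | grep -c .
```

Inside a real job the same two counts could be taken from the file named by PBS_NODEFILE instead of from the helper.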
95. le the Track Job feature. xpbs will monitor your jobs, looking for the output files that signal completion of the job. The Track Job button will flash red on the xpbs main display, and if you then click it, xpbs will display a list of all completed jobs that you were previously tracking. Selecting one of those jobs will launch a window containing the standard output and standard error files associated with the job.

To enable xpbs job tracking, click on the Track Job button at the top center of the main xpbs display. Doing so will bring up the Track Job dialog box shown below. From this window you can name the users whose jobs you wish to monitor. You also need to specify where you expect the output files to be: either local or remote (e.g. will the files be retained on the Server host, or did you request them to be delivered to another host?). Next, click the start/reset tracking button and then the close window button. Note that you can disable job tracking at any time by clicking the Track Job button on the main xpbs display and then clicking the stop tracking button.

Chapter 7 Working With PBS Jobs

This chapter introduces the reader to various commands useful in working with PBS jobs. Covered topics include: modifying job attributes, holding and releasing jobs, sending messages to jobs, changing order of jobs within a queue, sending signals to jobs, and deleting jobs. In each section below, the command
96. llows:

    % cat job.script
    #PBS -l site=globus:pluto.pbspro.com:763/jobmanager-fork:/C=US/O=Communications Package/OU=Stellar Division/CN=shirley.com.org
    #PBS -l ncpus=4
    #PBS -W stagein=mpitest@earth.pbspro.com:progs/mpitest
    mpirun -n 4 mpitest
    wait

Upon execution, this sample script would produce the following output:

    Process 2 of 4 on host pluto at time Mon Aug 29 17:39:01 2000
    Process 3 of 4 on host pluto at time Mon Aug 29 17:39:01 2000
    Process 1 of 4 on host pluto at time Mon Aug 29 17:39:01 2000
    Process 0 of 4 on host pluto at time Mon Aug 29 17:39:01 2000

Example 3: Here is a more complicated example. If you want to run an SGI-specific MPI job on a host (e.g. "sgi.galaxey.com") which is running a different batch system via the Globus gatekeeper, with a CPU count of 4, shipping the architecture-compatible executable to the Globus host and sending the output file back to the submitting host, then do:

    % cat job.script
    #PBS -l site=globus:sgi.galaxey.com/jobmanager-lsf,ncpus=4
    #PBS -W stagein=/u/jill/mpi_sgi@earth:progs/mpi_sgi
    #PBS -W stageout=mpi_sgi.out@earth:mpi_sgi.out
    mpirun -np 4 /u/jill/mpi_sgi >> mpi_sgi.out
    echo "Done it"

Upon execution, the sample output is:

    Done it

And the output of the run would have been written to the file mpi_sgi.out, and returned to the user
97. lowing sections. A listbox can be multi-selectable (a number of entries can be selected/highlighted using a mouse click) or single-selectable (one entry can be highlighted at a time).

For a multi-selectable listbox, the following operations are allowed:

    a. single click with mouse button 1 to select/highlight an entry.
    b. shift key + mouse button to contiguously select more than one entry.
    c. cntrl key + mouse button 1 to non-contiguously select more than one entry.
       NOTE: For systems running Tk versions prior to 4.0, the newly selected item is reshuffled to appear next to already selected items.
    d. click the Select All / Deselect All button to select all entries or deselect all entries at once.
    e. double clicking an entry usually activates some action that uses the selected entry as a parameter.

A scrollbar usually appears either vertically or horizontally and contains 5 distinct areas that are mouse clicked to achieve different effects: top arrow, top gap, slider, bottom gap, and bottom arrow.

    top arrow   Causes the view in the associated widget to shift up by one unit (i.e. the object appears to move down one unit in its window). If the button is held down, the action will auto-repeat.
    slider      Pressing button 1 in this area has no immediate effect except to cause the slider to appear sunken rather than raised. However, if the mouse is moved with the button down, then the slider will be dragged, adjusting the v
98. lt is defined by an administrator-established file (usually /etc/pbs.conf). The environment variable PBS_DPREFIX determines the prefix string which identifies directives in the job script. The default prefix string is "#PBS".

3.8 Temporary Scratch Space: TMPDIR

PBS creates an environment variable, TMPDIR, which contains the full path name to a temporary scratch directory created for each PBS job. The directory will be removed when the job terminates.

Chapter 4 Submitting a PBS Job

This chapter discusses the different parts of a PBS job and how to create and submit a PBS job. Topics such as requesting resources and specifying limits on jobs are also covered.

4.1 A Sample PBS Job

As we saw in the previous chapter, a PBS job is a shell script containing the resource requirements, job attributes, and the set of commands you wish to execute. Let's look at an example PBS job in detail:

    1  #!/bin/sh
    2  #PBS -l walltime=1:00:00
    3  #PBS -l mem=400mb
    4  #PBS -l ncpus=4
    5  #PBS -j oe
    6
    7  ./subrun

The first line is standard for any shell script: it specifies which shell to use to execute the script. The Bourne shell (sh) is the default, but you can change this to your favorite shell.

Lines 2-5 are PBS directives. PBS reads down the shell script until it finds the first line that is not a valid PBS directive, then stops. It assumes the rest of the script is the list of commands or tasks t
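The directive-scanning rule described in section 4.1 can be sketched in a few lines of shell. This is only an illustration of the simplified rule as stated in the text (stop at the first line that is not a directive), not how qsub itself is implemented. The function name list_directives is hypothetical, and the prefix falls back to "#PBS" when PBS_DPREFIX is unset.

```shell
#!/bin/sh
# Print the leading directive lines of a job script.
# list_directives is a hypothetical helper, not part of PBS.
list_directives() {
    prefix=${PBS_DPREFIX:-#PBS}
    awk -v p="$prefix" '
        NR == 1 && /^#!/  { next }          # skip the interpreter line
        index($0, p) == 1 { print; next }   # line starts with the prefix
                          { exit }          # first non-directive: stop
    ' "$1"
}

# Demonstration on a throwaway copy of the sample job from section 4.1.
demo=$(mktemp)
cat > "$demo" <<'EOF'
#!/bin/sh
#PBS -l walltime=1:00:00
#PBS -l mem=400mb
#PBS -l ncpus=4
#PBS -j oe

./subrun
EOF
list_directives "$demo"
rm -f "$demo"
```

Run on the sample job, the sketch would be expected to print the four directive lines and nothing from the command section. Any "#PBS" line placed after the first command is ignored, which matches the behaviour the text describes.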
99. ly works under the C shell. asusp is the auxiliary suspend character, usually CNTL-Y.

4.10 Single Node Conditional Requests

PBS Pro offers the ability to use boolean logic in the specification of certain resources (such as architecture, memory, wallclock time, and CPU count) within a single node. A new resource specification string (resc_spec) attribute has been added, called resc. Used with the resource list option (-l) to the qsub command, this feature provides more control over selecting nodes on which to run your job.

Important: At this time, this feature controls the selection of single nodes, with the meaning of "allocate my job a node with the following properties". This feature does not apply to multi-node jobs.

For example, say you wanted to submit a job that can run on either the Solaris or Irix operating system, and you want PBS to run the job on the first available node of either type. You could add the following resc specification to your qsub command line or your job:

    qsub -l resc="(arch=='solaris7')||(arch=='irix')" mysubrun

    #!/bin/sh
    #PBS -l resc="(arch=='solaris7')||(arch=='irix')"
    #PBS -l mem=100MB
    #PBS -l ncpus=4

You could, in fact, combine all three of the lines in the above example into a single resc_spec specification if you so desired:

    qsub -l resc="((arch=='solaris7')||(arch=='irix'))&&(mem=100MB)&&(ncpus=4)"
100. me
    51.south  barry  workq  airfoil   930  --   1  --  0:13  R  0:01
    54.south  barry  workq  airfoil

    % qalter -l walltime=20:00 -N engine 54
    % qstat -a 54
                                               Req'd        Elap
    Job ID    User   Queue  Jobname  Sess NDS TSK Mem  Time  S  Time
    54.south  barry  workq  engine

To alter a job attribute via xpbs, first select the job(s) of interest, and then click on the modify button. Doing so will bring up the Modify Job Attributes dialog box. From this window you may set the new values for any attribute you are permitted to change. Then click on the confirm modify button at the lower left of the window.

7.2 Deleting Jobs

PBS provides the qdel command for deleting jobs from the system. The qdel command deletes jobs in the order in which their job identifiers are presented to the command. A job that has been deleted is no longer subject to management by PBS. A batch job may be deleted by its owner, a PBS operator, or a PBS administrator.

    % qdel 17

To delete a job using xpbs, first select the job(s) of interest, then click the delete button.

7.3 Holding and Releasing Jobs

PBS provides a pair of commands to hold and release jobs. To hold a job is to mark it as ineligible to run until the hold on the job is released.

The qhold command requests that a Server place one or more holds on a job. A job that has a hold is not eligible for execution. There are three types of holds
101. mp Doing so will bring up the main xpbs window, as shown below.

5.2 Introducing the xpbs Main Display

The main window or display of xpbs is comprised of five collapsible subwindows or panels. Each panel contains specific information. Top to bottom, these panels are: the Menu Bar, Hosts panel, Queues panel, Jobs panel, and the Info panel.

5.2.1 xpbs Menu Bar

The Menu Bar is composed of a row of command buttons that signal some action with a click of the left mouse button. The buttons are:

    Manual Update   forces an update of the information on hosts, queues, and jobs.
    Auto Update     sets an automatic update of information every user-specified number of minutes.
    Track Job       for periodically checking for returned output files of jobs.
    Preferences     for setting parameters such as the list of server host(s) to query.
    Help            contains some help information.
    About           gives general information about the xpbs developer.
    Close           for exiting xpbs plus saving the current setup information.

[Figure: the xpbs Menu Bar, showing the Manual Update, Auto Update, Track Job, Preferences, Help, About, and Close buttons]

5.2.2 xpbs Hosts Panel

The Hosts panel is composed of a leading horizontal HOSTS bar, a listbox, and a set of command buttons. The HOSTS bar contains a minimize/maximize button, identified by a dot or a rectangular image, for displaying or iconizing the Hosts region. The listbox displays information about favorite Server host(s), and each
102. n the FILE entry box, and then click on the load button. Alternatively, you may click on the FILE button, which will display a File Selection browse window, from which you may point and click to select the file you wish to open. The File Selection Dialog window is shown below. Clicking on the Select File button will load the file into xpbs, just as does the load button described above.

The various fields in the Submit window will get loaded with values found in the script file. The script file text box will only be loaded with executable lines (non-PBS) found in the script. The job script header box has a Prefix entry box that can be modified to specify the PBS directive to look for when parsing a script file for PBS options.

If you don't have an existing script file to load into xpbs, you can start typing the executable lines of the job in the file text box.

Next, review the Destination listbox. This box lists all the queues found in the host that you selected. A special entry called "@host" refers to the default queue at the indicated host. Select appropriately the destination queue for the job.

Next, define any required resources in the Resource List subwindow.

Finally, review the optional settings to see if any should apply to this job. For example:

    o Use one of the buttons in the Output region to merge output and error files.
    o Use Stdout File Name to define standard output
103. ng the user name and by a unique number, the user id.

    Task                     Task is a POSIX session started by MOM on behalf of a job.
    User ID (UID)            Privilege to access system resources and services is typically established by the user id, which is a numeric identifier uniquely assigned to each user (see User).
    Virtual Processor (VP)   A node may be declared to consist of one or more virtual processors (VPs). The term virtual is used because the number of VPs declared does not have to equal the number of real processors (CPUs) on the physical node. The default number of virtual processors on a node is the number of currently functioning physical processors; the PBS Manager can change the number of VPs as required by local policy. See also cluster node and timeshared node.

Chapter 3 Getting Started With PBS

This chapter introduces the user to the Portable Batch System, PBS. It describes new user-level features in this release, explains the different user interfaces, introduces the concept of a PBS "job", and shows how to set up your environment for running batch jobs with PBS.

3.1 New Features in PBS Pro 5.4

For users already familiar with PBS, the following is a list of new features and changes in PBS Pro release 5.4 which affect users. More detail is given in the indicated sections.

    1. New blocking qsub option. (See "Requesting qsub Wait for Job Completion" on page 94.)
    2. New option to control output file permissions. (See "Changi
104. ng Job umask" on page 94.)
    3. Features for using PBS within a DCE environment. (See "Running PBS in a DCE Environment" on page 114.)

Important: The full list of new features in this release of PBS is given in the PBS Pro Administrator Guide.

3.2 Introducing PBS Pro

From the user's perspective, a workload management system allows you to make more efficient use of your time. You specify the tasks you need executed. The system takes care of running these tasks and returning the results back to you. If the available computers are full, then the workload management system holds your work and runs it when the resources are available.

With PBS you create a batch job which you then submit to PBS. A batch job is simply a shell script containing the set of commands you want run on some set of execution machines. It also contains directives which specify the characteristics (attributes) of the job, and resource requirements (e.g. memory or CPU time) that your job needs. Once you create your PBS job, you can reuse it if you wish. Or, you can modify it for subsequent runs. For example, here is a simple PBS batch job:

    #!/bin/sh
    #PBS -l walltime=1:00:00
    #PBS -l mem=400mb,ncpus=4
    ./subrun

Don't worry about the details just yet; the next chapter will explain how to create a batch job of your own.

PBS also provides a special kind of batch job called interactive-batch. An interactive-batch
105. ny files as you need to stage. When you are done selecting files, click the OK button.

8.7 The pbsdsh Command

The pbsdsh command allows you to distribute and execute a task to all the nodes assigned to your job by PBS. pbsdsh uses the PBS Task Manager API (see tm(3)) to distribute the program on the allocated nodes.

Usage of the pbsdsh command is:

    pbsdsh [-c N | -n N] [-o] [-s] [-v] command [args]

The available options are:

    -c N   The program is spawned on the first N nodes allocated. If the value of N is greater than the number of nodes, it will "wrap" around, running multiple copies on the nodes. This option is mutually exclusive with -n.
    -n N   The program is spawned on a single node, which is the N-th node allocated. This option is mutually exclusive with -c.
    -o     The program will not wait for the tasks to finish.
    -s     If this option is given, the program is run sequentially on each node, one after the other.
    -v     Verbose output about error messages and task exit status is produced.

When run without the -c or the -n option, pbsdsh will spawn the program on all nodes allocated to the PBS job. The execution takes place concurrently: all copies of the task execute at (about) the same time.

The following example shows the pbsdsh command inside of a PBS batch job. The options indicate that the user wants pbsdsh to run the myapp program with one argument
106. ob from a Server, MOM creates a new session that is as identical to a user login session as is possible. For example, if the user's login shell is csh, then MOM creates a session in which .login is run as well as .cshrc. MOM also has the responsibility for returning the job's output to the user when directed to do so by the Server. One MOM daemon runs on each computer which will execute PBS jobs.

A special version of MOM, called the Globus MOM, is available if it is enabled during the installation of PBS. It handles submission of jobs to the Globus environment. Globus is a software infrastructure that integrates geographically distributed computational and information resources. Globus is discussed in more detail in the PBS Pro Administrator Guide. To find out if Globus support is enabled at your site, contact your PBS system administrator.

The Job Scheduler daemon, pbs_sched, implements the site's policy controlling when each job is run and on which resources. The Scheduler communicates with the various MOMs to query the state of system resources, and with the Server for availability of jobs to execute. The interface to the Server is through the same API as used by the client commands. Note that the Scheduler interfaces with the Server with the same privilege as the PBS manager.

2.2 Defining PBS Terms

The following section defines important terms and concepts of PBS. The reader should review t
107. ob identifier assigned by PBS.
    The job name given by the submitter.
    The job owner.
    The CPU time used.
    The job state.
    The queue in which the job resides.

The job state is abbreviated to a single character:

    E   Job is exiting after having run.
    H   Job is held.
    Q   Job is queued, eligible to run or be routed.
    R   Job is running.
    T   Job is in transition (being moved to a new location).
    W   Job is waiting for its requested execution time to be reached, or the job specified a stage-in request which failed for some reason.
    S   Job is suspended.

The following example illustrates the default display of qstat.

    Name       Time Use
    aims14     0
    aims14     0
    airfoil    00:21:03
    airfoil    21:09:12
    subrun     0
    tns3d      0
    airfoil    0
    seq_35_3   0

An alternative display (accessed via the "-a" option) is also provided that includes extra information about jobs, including the following additional fields:

    Session ID
    Number of nodes requested
    Number of parallel tasks (or CPUs)
    Requested amount of memory
    Requested amount of wallclock time
    Elapsed time in the current job state

[Example output: qstat -a alternative display, with columns Jobname, Sess, NDS, TSK, Req'd Mem, Req'd Time, and Elap Time, listing the same jobs as above; job airfoil shows session ID 930]

Other options which utilize the alternative display are discussed in subsequent sections of this chapter.

6 1
108. ologies

PBS Pro 5.4 User Guide

Table of Contents

    List of Tables  vii
    Preface  ix
    Acknowledgements  xi
    1 Introduction  1
        Book organization  1
        What is PBS Pro?  2
        History of PBS  3
        About the PBS Team  4
        About Altair Engineering  4
        Why Use PBS?  4
    2 Concepts and Terms  7
        PBS Components  8
        Defining PBS Terms  10
    3 Getting Started With PBS  15
        New Features in PBS Pro 5.4  15
        Introducing PBS Pro  16
        The Two Faces of PBS  16
        User's PBS Environment  18
        Setting Up Your UNIX/Linux Environment  18
        Setting Up Your Windows Environment  19
        Environment Variables  21
        Temporary Scratch Space: TMPDIR  22
    4 Submitting a PBS Job  23
        A Sam
109. ontaining a list of nodes assigned to the job.

    PBS_NODENUM     Logical node number of this node allocated to the job.
    PBS_O_HOME      Value of HOME from submission environment.
    PBS_O_HOST      The host name on which the qsub command was executed.
    PBS_O_LANG      Value of LANG from submission environment.
    PBS_O_LOGNAME   Value of LOGNAME from submission environment.
    PBS_O_MAIL      Value of MAIL from submission environment.
    PBS_O_PATH      Value of PATH from submission environment.
    PBS_O_QUEUE     The original queue name to which the job was submitted.
    PBS_O_SHELL     Value of SHELL from submission environment.
    PBS_O_SYSTEM    The operating system name where qsub was executed.

Table 13: PBS Environment Variables

    Variable        Meaning
    PBS_O_TZ        Value of TZ from submission environment.
    PBS_O_WORKDIR   The absolute path of the directory where qsub was executed.
    PBS_QUEUE       The name of the queue from which the job is executed.
    PBS_TASKNUM     The task (process) number for the job on this node.
    TMPDIR          The job-specific temporary directory for this job.

Appendix B Converting From NQS to PBS

For those converting to PBS from NQS or NQE, PBS includes a utility called nqs2pbs which converts an existing NQS job script so that it will work with PBS. (In fact, the resulting script will be valid to bot
110. or files" on page 41

    -k keep              "Retaining output and error files on execution host" on page 41
    -l resource_list     "PBS System Resources" on page 28
    -l resc=resc_spec    "Single Node Conditional Requests" on page 43
    -l nodes=node_spec   "Running Multi-node Jobs" on page 117
    -M user_list         "Setting e-mail recipient list" on page 36
    -m MailOptions       "Specifying e-mail notification" on page 36
    -N name              "Specifying a job name" on page 36
    -o path              "Redirecting output and error files" on page 35
    -p priority          "Setting a job's priority" on page 38
    -q destination       "Specifying Queue and/or Server" on page 34
    -r value             "Marking a job as rerunnable or not" on page 37
    -S path_list         "Specifying which shell to use" on page 37
    -u user_list         "Specifying job userID" on page 39
    -V                   "Exporting environment variables" on page 35
    -v variable_list     "Expanding environment variables" on page 35

Table 4: Options to the qsub Command

    Option               Function and Page Reference
    -W depend=list       "Specifying Job Dependencies" on page 94
    -W group_list=list   "Specifying job groupID" on page 40
    -W stagein=list      "Input/Output File Staging" on page 98
    -W stageout=list     "Input/Output File Staging" on page 98
    -W cred=dce          "Running PBS in a DCE Environment" on page 114
111. other information presented in the alternative display. The node information is printed immediately below the job (see job 51 in the example below), and includes the node name and number of virtual processors assigned to the job (i.e. "south/0", where "south" is the node name, followed by the virtual processor(s) assigned). A text string of "--" is printed for non-running jobs. Notice the differences between the queued and running jobs in the example below.

[Example output: qstat alternative display with node information, with columns Jobname, Sess, NDS, TSK, Req'd Mem, Req'd Time, and Elap Time, listing jobs aims14, aims14, airfoil, and subrun]

6.1.12 Display Job Comments

The "-s" option to qstat displays the job comments, in addition to the other information presented in the alternative display. The job comment is printed immediately below the job. By default the job comment is updated by the Scheduler with the reason why a given job is not running, or when the job began executing. A text string of "--" is printed for jobs whose comment has not yet been set. The example below illustrates the different types of messages that may be displayed.

    % qstat -s
                                                Req'd        Elap
    Job ID    User   Queue  Jobname  Sess NDS TSK Mem  Time  S  Time
    16.south  james  workq  aims14
        Job held by james on Wed Aug 22 13:06:11
    20.south  james  workq  aims14
        Waiting on user requested start time
       .south  barry  workq  airfoil  930
        Job run on no
112. ping your stdout and stderr files, and any other files you direct PBS to transfer on your behalf.

The following example illustrates how to set your umask to 022 (i.e. to have files created with write permission for owner only: -rw-r--r--).

    qsub -W umask=022 mysubrun

    #!/bin/sh
    #PBS -W umask=022

8.3 Requesting qsub Wait for Job Completion

The "-W block=true" option to qsub allows you to specify that you want qsub to wait for the job to complete (i.e. "block") and report the exit value of the job. If job submission fails, no special processing will take place. If the job is successfully submitted, qsub will block until the job terminates or an error occurs.

If qsub receives one of the signals SIGHUP, SIGINT, SIGQUIT or SIGTERM, it will print a message and then exit with the exit status 2. If the job is deleted before running to completion, or an internal PBS error occurs, an error message describing the error will be printed and qsub will exit with an exit status of 3. If the job runs to completion, qsub will exit with the exit status of the job. (See also section 8.1 "Job Exit Status" on page 93 for further discussion of the job exit status.)

8.4 Specifying Job Dependencies

PBS allows you to specify dependencies between two or more jobs. Dependencies are useful for a variety of tasks, such as:

    1. Specifying the order in which jobs in a set should execute
    2. Requesting a job run only if an error occurs in another
113. ple PBS Job  23
        Creating a PBS Job  24
        Submitting a PBS Job  24
        How PBS Parses a Job Script  25
        User Authorization  26
        User Authorization: Windows Security Tokens  27
        Single-node vs. Multi-node Jobs  28
        PBS System Resources  28
        Job Submission Options  32
        Single Node Conditional Requests  43
        Job Attributes  45
    5 Using the xpbs GUI  49
        User's xpbs Environment  49
        Introducing the xpbs Main Display  50
        Setting xpbs Preferences  56
        Relationship Between PBS and xpbs  56
        How to Submit a Job Using xpbs  58
        Exiting xpbs  61
        The xpbs Configuration File  61
        Widgets Used in xpbs  61
        xpbs X-Windows Preferences  63
    6 Checking Job / System Status  67
        The qstat Command  67
        Viewing Job / System Status with xpbs  75
        The qselect Command  76
        Selecting Jobs Using xpbs
114. ppings of options from PBS to an RSL string may be of use to you:

Table 10: qsub Options vs. Globus RSL

    PBS Option                           Globus RSL Mapping
    -l site=globus:<globus_gatekeeper>   specifies the gatekeeper to contact
    -l ncpus=yy                          count=yy
    -A <account_name>                    project=<account_name>
    -l cput=yy                           maxcputime=yy (in minutes)
    -l pcput=yy                          maxtime=yy (in minutes)
    -l walltime=yy                       maxwalltime=yy (in minutes)
    -l mem=zz                            maxmemory=zz (in megabytes)
    -o <output_path>                     stdout=<local_output_path>
    -e <error_path>                      stderr=<local_error_path>
                                         (PBS will deliver from the local paths to the user-specified output_path and error_path)
    -v <variable_list>                   environment=<variable_list>, jobtype=single

When the job gets submitted to Globus, PBS qstat will report various state changes according to the following mapping:

Table 11: PBS Job States vs. Globus States

    PBS State     Globus State
    TRANSIT (T)   PENDING
    RUNNING (R)   ACTIVE
    EXITING (E)   FAILED
    EXITING (E)   DONE

8.8.3 PBS File Staging through GASS

The stagein/stageout feature of "Globus-aware" PBS works with Global Access to Secondary Storage (GASS) software. Given a stagein directive, localfile@host:inputfile, PBS will take care of copying input
115. put to qstat, can also be used to supply input to other PBS commands as well.

6.4 Selecting Jobs Using xpbs

The xpbs command provides a graphical means of specifying job selection criteria, offering the flexibility of the qselect command in a point-and-click interface. Above the JOBS panel in the main xpbs display is the Other Criteria button. Clicking it will bring up a menu that lets you choose and select any job selection criteria you wish.

The example below shows a user clicking on the Other Criteria button, then selecting Job States, to reveal that all job states are currently selected. Clicking on any of these job states would remove that state from the selection criteria.

[Figure: xpbs main display with the Other Criteria menu open and Job States selected]

You may specify as many or as few selection criteria as you wish. When you have completed your selection, click on the Select Jobs button above the HOSTS panel to have xpbs refresh the display with the jobs that match your selection criteria. The selected criteria will remain in effect until you change them again. If you exit xpbs, you will be prompted if you wish to save your configuration information; this includes the job selection criteria.

6.5 Using xpbs TrackJob Feature

The xpbs command includes a feature that allows you to track the progress of your jobs. When you enab
116. rch Center 3; and PBS xi, 2; Information Power Grid 4; Metacenter 4; NCPUS 125; Network Queueing System (NQS) 3, 127; nqs2pbs 127; Node: attribute 11, defined 10; node_spec 29, 30, 33; nodes 30; property 11; nqs2pbs 17

O
OM_NUM_THREADS 125; OMP_NUM_THREADS 124; OpenMP 124; Operator 13; Ordering Software and Publications x; output files 26; Owner 13

P
Parallel: job support 5, jobs 117, Virtual Machine (PVM) 124; PBS availability 6; PBS_DEFAULT 22; PBS_ENVIRONMENT 18, 125; pbs_hostid 17; pbs_hostn 17; PBS_JOBCOOKIE 125; PBS_JOBID 125; PBS_JOBNAME 125; pbs_mom_globus 103; PBS_MOMPORT 125; PBS_NODEFILE 119, 120, 121, 125; PBS_NODENUM 125; PBS_O_HOME 125; PBS_O_HOST 125; PBS_O_LANG 125; PBS_O_LOGNAME 125; PBS_O_MAIL 125; PBS_O_PATH 125; PBS_O_QUEUE 125; PBS_O_SHELL 125; PBS_O_SYSTEM 125; PBS_O_TZ 126; PBS_O_WORKDIR 126; pbs_probe 17; PBS_QUEUE 126; pbs_rcp 17, 97; pbs_rdel 17, 112; pbs_rstat 17, 111; pbs_rsub 17, 107; PBS_TASKNUM 126; pbs_tclsh 17; pbsdsh 17, 101; pbsfs 17; pbsnodes 17; pbspoe 17; pbs-report 17; Portable Batch System 11; POSIX, defined 13; printjob 17; Processes (Tasks) vs CPUs 120; procs 32; PVM 124

Q
qalter 17, 83; qdel 17, 84; qdisable 17; qenable 17; qhold 17, 85; qmgr 17; qmove 17, 90; qmsg 17, 87; qorder 17, 89; qrerun 17; qrls 17, 86; qrun 17; qselect 17, 64, 76; qsig 17, 88; qstart 17; qstat 17, 67; qstop 17; qsub 17, 24, 32, 94, 114; qterm 17; Queue, defined 11; Queuing ix, 7; Quick Start Guide ix

R
resc
117. ring" option defines the account string associated with the job. The account_string is an opaque string of characters and is not interpreted by the Server which executes the job. This value is often used by sites to track usage by locally defined account names.

Important: Under IRIX and Unicos, if the Account string is specified, it must be a valid account as defined in the system User Data Base (UDB).

    qsub -A Math312 mysubrun

    #!/bin/sh
    #PBS -A accountNumber

4.9.17 Merging output and error files

The "-j join" option declares if the standard error stream of the job will be merged with the standard output stream of the job. A join argument value of oe directs that the two streams will be merged, intermixed, as standard output. A join argument value of eo directs that the two streams will be merged, intermixed, as standard error. If the join argument is n, or the option is not specified, the two streams will be two separate files.

    % qsub -j oe mysubrun

    #!/bin/sh
    #PBS -j eo

4.9.18 Retaining output and error files on execution host

The "-k keep" option defines which (if either) of standard output (STDOUT) or standard error (STDERR) of the job will be retained on the execution host. If set, this option overrides the path name for the corresponding file. If not set, neither file is retained on the execution host. The argument is either the single letter "e" or "o", or the letters "e" and
118. ronment. By "optional", we mean that the customer may acquire a copy of PBS Pro with the standard security and authentication module replaced with the KRB5 module.

To use a forwardable/renewable Kerberos V5 TGT, specify the "-W cred=krb5" option to qsub. This will cause qsub to check the user's credential cache for a valid forwardable/renewable TGT, which it will send to the Server and then eventually to the execution MOM. While it is at the Server and the MOM, this TGT will be periodically refreshed until either the job finishes or the maximum refresh time on the TGT is exceeded, whichever comes first. If the maximum refresh time on the TGT is exceeded, no KRB5 services will be available to the job, even though it will continue to run.

Chapter 9 Running Multi-node Jobs

If PBS has been set up to manage a cluster of computers or a parallel system, it is likely with the intent of managing multi-node parallel jobs. PBS can allocate individual nodes to multiple jobs at the same time (called time-sharing), or to a single job at a time, providing exclusive access (also called space-sharing). When a job requests exclusive access, the entire node is allocated to the job regardless of the number of processors or the amount of memory in the node. Users should explicitly request such exclusive access if desired. To have PBS allocate nodes to a user
119. s (e.g. via rcp or scp).

3.5 Setting Up Your UNIX/Linux Environment

A user's job may not run if the user's start-up files (i.e. .cshrc, .login, or .profile) contain commands which attempt to set terminal characteristics. Any such command sequence within these files should be skipped by testing for the environment variable PBS_ENVIRONMENT. This can be done as shown in the following sample .login:

    setenv MANPATH /usr/man:/usr/local/man:$MANPATH
    if ( ! $?PBS_ENVIRONMENT ) then
        do terminal settings here
    endif

You should also be aware that commands in your startup files should not generate output when run under PBS. As in the previous example, commands that write to stdout should not be run for a PBS job. This can be done as shown in the following sample .login:

    setenv MANPATH /usr/man:/usr/local/man:$MANPATH
    if ( ! $?PBS_ENVIRONMENT ) then
        do terminal settings here
        run command with output here
    endif

When a PBS job runs, the "exit status" of the last command executed in the job is reported by the job's shell to PBS as the "exit status" of the job. (We will see later that this is important for job dependencies and job chaining.) However, the last command executed might not be the last command in your job. This can happen if your job's shell is csh on the execution host and you have a .logout there. In that case, the last command executed is from the .logout and
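The sample .login in section 3.5 is written for csh. For users whose login shell reads .profile, the same guard might look like the following minimal sketch. The function name in_pbs_job is hypothetical, and the echo line only stands in for whatever terminal settings or output-producing commands a real .profile would contain.

```shell
#!/bin/sh
# Bourne-shell counterpart of the csh guard in the sample .login.
# in_pbs_job is a hypothetical helper name, not something PBS provides.
in_pbs_job() {
    # Section 3.5 says start-up files should test for PBS_ENVIRONMENT,
    # so a non-empty value is treated as "running under PBS".
    [ -n "${PBS_ENVIRONMENT:-}" ]
}

if in_pbs_job; then
    :   # under PBS: no terminal settings, no output
else
    # Ordinary login: terminal settings and commands that write to
    # stdout would go here (placeholder line below).
    echo "interactive login: terminal setup would run here"
fi
```

Keeping the test in one small function makes it easy to reuse for several blocks in the same start-up file, so every terminal-related or output-producing section is skipped consistently for batch jobs.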
120. s home directory on host earth, as specified.

Note: Just like a regular PBS job, a Globus job can be deleted, signaled, held, released, rerun, have text appended to its output/error files, and be moved from one location to another.

8.9 Advance Reservation of Resources

An Advance Reservation is a set of resources with availability limited to a specific user (or group of users), a specific start time, and a specified duration. Advance Reservations are implemented in PBS by a user submitting a reservation with the pbs_rsub command. PBS will then confirm that the reservation can be met, or else reject the request. Once the scheduler has confirmed the reservation, the queue that was created to support this reservation will be enabled, allowing jobs to be submitted to it. The queue will have a user-level access control list set to the user who submitted the reservation and any other users the owner specified. The queue will accept jobs in the same manner as normal queues. When the reservation start time is reached, the queue will be started. Once the reservation is complete, any jobs remaining in the queue or still running will be deleted, and the reservation removed from the Server.

When a reservation is requested and confirmed, it means that a check was made to see if the reservation would conflict with currently running jobs, other confirmed reservations, and dedicated time. A reservation request that fails
121. s multiple entries.

Tip 4: To non-contiguously select more than one entry, hold the <cntrl key> while clicking the <left mouse button> on the desired entries.

5.3 Setting xpbs Preferences

In the Menu Bar at the top of the main xpbs window is the Preferences button. Clicking it will bring up a dialog box that allows you to customize the behavior of xpbs:

1. Define Server hosts to query
2. Select wait timeout in seconds
3. Specify which xterm command to use
4. Specify which rsh/ssh command to use

5.4 Relationship Between PBS and xpbs

xpbs is built on top of the PBS client commands, such that all the features of the command line interface are available thru the GUI. Each "task" that you perform using xpbs is converted into the necessary PBS command and then run on your behalf.

Table 8: xpbs Buttons and PBS Commands

Location      Command Button  PBS Command
Hosts Panel   detail          qstat -B -f selected server_host(s)
Hosts Panel   submit          qsub options selected server(s)
Hosts Panel   terminate       qterm selected server_host(s)
Queues Panel  detail          qstat -Q -f selected queue(s)
Queues Panel  stop            qstop selected queue(s)
Queues Panel  start           qstart selected queue(s)
Queues Panel  enable          qenable selected queue(s)
Queues Panel  disable         qdisable selected queue(s)
Jobs Panel    detail          qstat -f
122. s_rdel command deletes reservations in the order in which their reservation identifiers are presented to the command. A reservation may be deleted by its owner, a PBS operator, or the PBS Manager. Note that when a reservation is deleted, all jobs belonging to the reservation are deleted as well, regardless of whether or not they are currently running.

    pbs_rdel R304

8.9.5 Accounting

Accounting records for advance resource reservations are available in the Server's job accounting file. The format of such records closely follows the format that exists for job records. In addition, any job that belongs to an advance reservation will have the reservation ID recorded in the accounting records for the job.

8.9.6 Access Control

A site administrator can inform the Server as to those hosts, groups, and users whose advance resource reservation requests are (or are not) to be considered. The philosophy in this regard is the same as that which currently exists for jobs.

In a similar vein, the user who submits the advance resource reservation request can specify to the system those other parties (user(s) or group(s)) that are authorized to submit jobs to the reservation queue that's to be created. When this queue is instantiated, these specifications will supply the values for the queue's user/group access control lists. Likewise, the party who submits the reservation can, if desired, control the username and group name a
123. selected job(s)

Location      Command Button  PBS Command
Jobs Panel    modify          qalter selected job(s)
Jobs Panel    delete          qdel selected job(s)
Jobs Panel    hold            qhold selected job(s)
Jobs Panel    release         qrls selected job(s)
Jobs Panel    run             qrun selected job(s)
Jobs Panel    rerun           qrerun selected job(s)
Jobs Panel    signal          qsig selected job(s)
Jobs Panel    msg             qmsg selected job(s)
Jobs Panel    move            qmove selected job(s)
Jobs Panel    order           qorder selected job(s)

Indicates command button is visible only if xpbs is started with the "-admin" option.

5.5 How to Submit a Job Using xpbs

To submit a job using xpbs, perform the following steps:

First, select a host from the HOSTS listbox in the main xpbs display to which you wish to submit the job.

Next, click on the Submit button located next to the HOSTS panel. The Submit button brings up the Submit Job Dialog box (see below), which is composed of four distinct regions. The Job Script File region is at the upper left. The OPTIONS region, containing various widgets for setting job attributes, is scattered all over the dialog box. The OTHER OPTIONS region is located just below the Job Script File region, and the COMMAND BUTTONS region is at the bottom.

The job script region is composed of a header box, the text box, the FILE entry box, and two buttons labeled load and save. If you have a script file containing PBS options and executable lines, then type the name of the file o
124. sualization, optimization, and process automation.

1.6 Why Use PBS?

PBS Pro provides many features and benefits to both the computer system user and to companies as a whole. A few of the more important features are listed below to give the reader both an indication of the power of PBS, and an overview of the material that will be covered in later chapters in this book.

Enterprise-wide Resource Sharing provides transparent job scheduling on any PBS system by any authorized user. Jobs can be submitted from any client system, both local and remote, crossing domains where needed.

Multiple User Interfaces provides a graphical user interface for submitting batch and interactive jobs; querying job, queue, and system status; and monitoring job progress. Also provides a traditional command line interface.

Security and Access Control Lists permit the administrator to allow or deny access to PBS systems on the basis of username, group, host, and/or network domain.

Job Accounting offers detailed logs of system activities for charge-back or usage analysis per user, per group, per project, and per compute host.

Automatic File Staging provides users with the ability to specify any files that need to be copied onto the execution host before the job runs, and any that need to be copied off after the job completes. The job will be scheduled to run only after the required files have been successfully transferred.

Parallel Job
125. t        Count of jobs in the Exiting state
Status    Status of the corresponding Server
PEsInUse  Count of Processing Elements (CPUs, PEs, Nodes) in Use

5.2.3 xpbs Queues Panel

The Queues panel is composed of a leading horizontal QUEUES bar, a listbox, and a set of command buttons. The QUEUES bar lists the hosts that are consulted when listing queues; the bar also contains a minimize/maximize button for displaying or iconizing the Queues panel. The listbox displays information about queues managed by the Server host(s) selected from the Hosts panel; each listbox entry is meant to be selected (highlighted) via a single <left mouse button> click, <shift key> plus <left mouse button> click for contiguous selection, or <cntrl key> plus <left mouse button> click for non-contiguous selection.

To the right of the Queues Panel area are a series of buttons that represent actions that can be performed on selected queue(s):

detail   provides information about selected queue(s). This functionality can also be achieved by double-clicking on a Queue listbox entry.
stop     for stopping the selected queue(s). (-admin only)
start    for starting the selected queue(s). (-admin only)
disable  for disabling the selected queue(s). (-admin only)
enable   for enabling the selected queue(s). (-admin only)

The middle portion of the Queues Panel has abbreviated column names indicating the in
126. t for a MPI job bin sh PBS 1 nodes 32 mpirun np 32 machinefile SPBS_NOI Or when using a version of MPI that is integrated with PBS bin sh PBS 1 nodes 32 mpirun np 32 a out 9 6 PVM Jobs with PBS On a typical system to execute a Parallel Virtual Machine PVM program you would use the pvmexec command For example here is a sample PBS script for a PVM job bin sh PBS 1 nodes 32 pvmexec a out inputfile datain 9 7 OpenMP Jobs with PBS To provide support for OpenMP jobs the environment variable OMP_NUM_THREADS is created for the job with the value of the number of CPUs allocated to the job The variable NCPUS is also set to this value PBS Pro 5 4 User Guide 125 Appendix A PBS Environment Variables Table 13 PBS Environment Variables Variable Meaning ENVIRONMENT Indicates if job is a batch job or a PBS interactive job NCPUS Number of threads or cpus per process cpp on the node OM_NUM_THREADS Same as NCPUS PBS_ENVIRONMENT Same as ENVIRONMENT PBS_JOBCOOKIE Unique identifier for inter MOM job based communication PBS_JOBID The job identifier assigned to the job by the batch system PBS_JOBNAME The job name supplied by the user PBS_MOMPORT Port number on which this job s MOMs will communicate PBS_NODEFILE The filename c
127. t the Server that the Server associates with the reservation.

8.10 Checkpointing SGI MPI Jobs

Under Irix 6.5 and later, MPI parallel jobs as well as serial jobs can be checkpointed and restarted on SGI systems provided certain criteria are met. SGI's checkpoint system call cannot checkpoint processes that have open sockets. Therefore it is necessary to tell mpirun to not create or to close an open socket to the array services daemon used to start the parallel processes. One of two options to mpirun must be used:

-cpr    This option directs mpirun to close its connection to the array services daemon when a checkpoint is to occur.
-miser  This option directs mpirun to directly create the parallel process rather than use the array services. This avoids opening the socket connection at all.

The -miser option appears the better choice as it avoids the socket in the first place. If the -cpr option is used, the checkpoint will work, but will be slower because the socket connection must be closed first. Note that interactive jobs or MPMD jobs (more than one executable program) can not be checkpointed in any case. Both use sockets (and TCP/IP) to communicate: outside of the job for interactive jobs, and between programs in the MPMD case.

8.11 Running PBS in a DCE Environment

PBS Pro includes optional support for DCE. By optional, we mean that the customer may acquire a copy of PBS Pro with the standard securit
128. ted in the directory /u/james of the computer called server. The staged-in file is requested to be placed relative to the user's home directory under the name of dat1.

    #!/bin/sh
    #PBS -W stagein=dat1@server:/u/james/grid.dat

PBS uses rcp or scp (or cp if the remote host is the local host) to perform the transfer. Hence, stage-in and stage-out are just:

    rcp -r remote_host:remote_file local_file
    rcp -r local_file remote_host:remote_file

As with rcp, the remote_file may be a directory name. The local_file specified in the stage-in/out directive may name a directory. For stage-in, if remote_file is a directory, then local_file must also be a directory. Likewise, for stage-out, if local_file is a directory, then remote_file must be a directory.

If local_file on a stage-out directive is a directory, that directory on the execution host, including all files and subdirectories, will be copied. At the end of the job, the directory, including all files and subdirectories, will be deleted. Users should be aware that this may create a problem if multiple jobs are using the same directory.

The same requirements and hints discussed above in regard to delivery of output apply to staging files in and out. Wildcards should not be used in either the local_file or the remote_file name. PBS does not expand the wildcard character on the local system. If wildcards are used in the remote_file name, since rcp is launched by rsh to the remote system, the e
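The staging value described above has the shape local_file@hostname:remote_file. As a minimal sketch, the helper below assembles that value and refuses names containing wildcard characters, in line with the advice above. The function name staging_spec is hypothetical and is not part of PBS; the sample arguments are the ones used in the example directive.

```shell
# Build the value for a "-W stagein=" or "-W stageout=" directive.
# Arguments: local file name, remote host, remote file name.
# Names containing the wildcard characters *, ? or [ are rejected.
staging_spec() {
    local_file=$1
    host=$2
    remote_file=$3
    case "$local_file$remote_file" in
        *\**|*\?*|*\[*)
            echo "wildcards are not allowed in staging file names" >&2
            return 1
            ;;
    esac
    printf '%s@%s:%s\n' "$local_file" "$host" "$remote_file"
}

# Prints the value used in the stage-in example above.
staging_spec dat1 server /u/james/grid.dat
```

The printed string can then be placed after "#PBS -W stagein=" in a job script. Rejecting wildcards up front avoids the situation where staged-in files are left undeleted at job end.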
129. ted per node is the product of the number of processes per node (ppn) times the number of CPUs per process (cpp).

The node_spec may be followed by one or more global modifiers (see the suffix in the syntax line above). These are special flags which apply to every node_spec in the entire specification. Any site-defined property may be used as a global modifier, as well as the two pre-defined modifiers: shared (requesting shared access to a node) and excl (requesting exclusive access to the entire node, regardless of the number of CPUs requested).

The shared modifier is used when you are willing to share your nodes with other jobs. This typically results in less wait in the queue, but the individual runtime of the job may be longer.

The excl modifier is used to request exclusive access to the nodes, regardless of the number of CPUs on the nodes. No other job will be allocated those nodes while your job is running on them. This typically results in longer queue wait times, but faster runtime for the individual application. Exclusive access will not be granted to time-shared nodes.

Important: The keywords shared and excl only apply when used as global modifiers.

9.2 Job-specific Nodes File

PBS provides the names of all the nodes allocated to a particular job in a job-specific file in the directory /usr/spool/PBS/aux. The file is owned by root but is world readable. The full path and name of the fi
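The ppn and cpp arithmetic mentioned at the start of this passage can be checked with a one-line calculation. This is a small illustrative sketch only; the function name cpus_per_node and the sample values are hypothetical.

```shell
# CPUs allocated per node = processes per node (ppn) * CPUs per process (cpp).
cpus_per_node() {
    ppn=$1
    cpp=$2
    echo $(( ppn * cpp ))
}

# Two processes per node, each using four CPUs.
cpus_per_node 2 4    # prints 8
```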
130. th operands are job_identifiers which specify the jobs to be exchanged.

    qstat -u barry

    Job ID    User   Queue  Jobname  TSK  Req'd Time
    54.south  barry  workq  twinkie    1  0:20
    63.south  barry  workq  airfoil    1  0:13

    qorder 54 63
    qstat -u barry

    Job ID    User   Queue  Jobname  TSK  Req'd Time
    63.south  barry  workq  airfoil    1  0:13
    54.south  barry  workq  twinkie    1  0:20

To change the order of two jobs using xpbs, select the two jobs, and then click the order button.

7.7 Moving Jobs Between Queues

PBS provides the qmove command to move jobs between different queues (even queues on different Servers). To move a job is to remove the job from the queue in which it resides and instantiate the job in another queue.

Important: A job in the running state cannot be moved.

The usage syntax of the qmove command is:

    qmove destination job_identifier(s)

The first operand is the new destination, in one of these forms:

    queue
    @server
    queue@server

If the destination operand describes only a queue, then qmove will move jobs into the queue of the specified name at the job's current Server. If the destination operand describes only a Server, then qmove will move jobs into the default queue at that Server. If the destination operand describes both a queue and a Server, then qmove will move the jobs into the specified queue at the specified Server. All following operands are job_identi
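The three destination forms (queue, @server, queue@server) can be told apart with simple pattern matching. The sketch below is illustrative and does not call qmove; the function name describe_destination is hypothetical, and the queue and server names in the usage lines are sample values.

```shell
# Report how a qmove destination operand would be interpreted,
# following the three cases described above.
describe_destination() {
    case "$1" in
        @?*)    echo "default queue at server ${1#@}" ;;
        ?*@?*)  echo "queue ${1%%@*} at server ${1#*@}" ;;
        ""|*@*) echo "invalid destination" >&2
                return 1 ;;
        *)      echo "queue $1 at the job's current server" ;;
    esac
}

describe_destination workq
describe_destination @south
describe_destination workq@south
```

Checking the operand before calling qmove makes it obvious whether a job will stay on its current Server or be sent to another one.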
131. the requested hold attribute This will have no effect unless the job is requeued with the qrerun command Similarly the qrls command releases a hold on a job However the user executing the qrls command must have the necessary privilege to release a given hold The same rules apply for releasing a hold as exist for setting a hold The usage syntax of the qr 1s command is gris h hold_list job_identifier The following examples illustrate how to use both the qhold and qrls commands Notice that the state S column shows how the state of the job changes with the use of these two commands qstat a 54 Req d Job Il User Jobname Sess NDS TSK Mem Time 54 south barry ngin da gt CO 20 qhold 54 qstat a 54 Req d Job IIl User Jobname Sess NDS TSK Mem Time 54 south barry ngin 1 0 20 gris h u 54 qstat a 54 Req d Job ID User Jobname Sess NDS TSK Mem Time 54 south barry ngin 0 20 If you attempted to release a hold on a job which is not on hold the request will be ignored If you use the gr1s command to release a hold on a job that had been previously PBS Pro 5 4 87 User Guide running and subsequently checkpointed the hold will be released and the job will return to the queued Q state and be eligible to be scheduled to run when resources come avail able To hold or release a job using xpbs first select the job s of interest then cl
132. third applied by the system itself or the PBS Manager See also Operator and Manager in this glossary The basic execution object managed by the batch subsystem A job is a collection of related processes which is managed as a whole A job can often be thought of as a shell script running in a POSIX session A non singleton job consists of multiple tasks of which each is a POSIX session One task will run the job shell script The manager is the person authorized to use all restricted capabilities of PBS The Manager may act upon the Server queues or jobs The Manager is also called the administrator A person authorized to use some but not all of the restricted capabilities of PBS is an operator The owner is the user who submitted the job to PBS This acronym refers to the various standards developed by the Technical Committee on Operating Systems and Application Environments of the IEEE Computer Society under standard P1003 If a PBS job can be terminated and its execution restarted from the beginning without harmful side effects the job is rerunable This process refers to moving a file or files to the execution host prior to the PBS job beginning execution This process refers to moving a file or files off of the execution host after the PBS job completes execution 14 Chapter 2 Concepts and Terms User Task User ID UID Virtual Processor VP Each system user is identified by a unique character stri
133. ting and Exiting The last column gives the type of the queue routing or execution qstat Q Queue Max Tot Ena Str Que Run Hld Wat Trn Ext Type workgd O 10 yes yes 7 L 1 L 0 0 Execution The full display for a queue provides additional information qstat Qf Queue workg queue_type Execution total_jobs 10 state_count Transit 0 Queued 7 Held 1 Waiting 1 Running 1 Exiting 0 resources_assigned ncpus 1 hasnodes False enabled True started True 6 1 5 Viewing Job Information We saw above that the f option could be used to display full or long information for queues and Servers The same applies to jobs By specifying the f option and a job identifier PBS will print all information known about the job e g resources requested resource limits owner source destination queue etc as shown in the following exam ple See Job Attributes on page 45 for a description of attribute 72 Chapter 6 Checking Job System Status qstat f 89 Job Id 89 south Job_Name tns3d Job_Owner susan south pbspro com resources_used cput 00 00 00 resources_used mem 2700kb resources_used ncpus 1 resources_used vmem 5500kb resources_used walltime 00 00 00 job_state R queue workg server south Checkpoint u ctime Thu Aug 23 10 11 09 2003 Error_Path south u susan tns3d e89 exec_host south 0 Hold_Types n Join_Path oe Keep_Files n Mail_Points a
134. ting 5 112 Administrator 12 Administrator Guide ix 9 Aerospace computing 2 Altair Engineering 3 4 Altair Grid Technologies ii 3 Ames Research Center xi API ix 5 9 12 101 Attribute account_string 40 arch 30 cput 30 defined 12 mem 30 modifing 83 mppe 31 mppt 31 mta 32 ncpus 30 124 nice 30 pcput 30 pf 32 pmem 30 pmppt 32 PBS Pro 5 4 129 User Guide pnepus 32 ppf 32 priority 6 38 psds 32 pvmem 31 rerunable 13 37 resources_list 29 sds 32 software 31 vmem 31 walltime 31 authorization 26 B Batch job 16 processing 12 boolean 43 C Changing order of jobs 89 Checking status of jobs 67 of queues 71 of server 70 Checkpointing interval 39 130 Index SGI MPI 113 CLI 16 Cluster 10 Cluster Node 10 Command line interface 16 Commands 8 Common User Environment 5 Complex 12 Computational Grid Support 5 Cray 31 cred 114 credential 114 Cross System Scheduling 5 D DCE 114 Deleting Jobs 84 Destination defined 12 identifier 12 specifing 34 Display nodes assigned to job 74 non running jobs 73 queue limits 75 running jobs 73 size in gigabytes 74 size in megawords 74 user specific jobs 73 Distributed clustering 5 workload management 7 E Enterprise wide Resource Sharing 4 ENVIRONMENT 125 127 Environment Variables 125 excl 123 Exclusive VP 10 Executor 9 External Reference Specification ix 12 F File thosts 26 27 Shosts 27 attribute 30 output 97 output and error 41 spe
135. to even out the workload on each host Being a policy the dis tribution of jobs across execution hosts is solely a function of the Job Scheduler A queue is a named container for jobs within a Server There are two types of queues defined by PBS routing and execution A rout ing queue is a queue used to move jobs to other queues including those that exist on different PBS Servers Routing queues are simi lar to the NQS pipe queues A job must reside in an execution queue to be eligible to run and remains in an execution queue during the time it is running In spite of the name jobs in a queue need not be processed in queue order first come first served or FIFO Nodes have attributes associated with them that provide control information The attributes defined for nodes are state type ntype the list of jobs to which the node is allocated properties max_running max_user_run max_group_run and both assigned and available resources resources_assigned and resources _available A set of zero or more properties may be given to each node in order to have a means of grouping nodes for allocation The property is nothing more than a string of alphanumeric characters first charac ter must be alphabetic without meaning to PBS The PBS adminis trator may assign to nodes whatever property names desired Your PBS administrator will notify you of any locally defined properties PBS consists of one Job Server pbs_server
136. tus Adv. Reservation    pbs_hostn    Report host name(s)
pbs_rsub  Submit Adv. Reservation    pbs_probe    PBS diagnostic tool
pbsdsh    PBS distributed shell      pbs_rcp      File transfer tool
pbspoe    Job launcher (IBM POE)     pbs_tclsh    TCL shell with PBS API
qalter    Alter job                  pbsfs        Report fairshare usage
qdel      Delete job                 pbsnodes     Node manipulation tool
qhold     Hold a job                 printjob     Report job details
qmove     Move job                   qdisable     Disable a queue
qmsg      Send message to job        qenable      Enable a queue
qorder    Reorder jobs               qmgr         PBS manager interface
qrls      Release hold on job        qrerun       Requeue a running job
qselect   Select jobs by criteria    qrun         Manually start a job
qsig      Send signal to job         qstart       Start a queue
qstat     Status job, queue, Server  qstop        Stop a queue
qsub      Submit a job               qterm        Shutdown PBS Server
xpbs      Graphical User Interface   tracejob     Report job history
xpbsmon   GUI monitoring tool        xpbs -admin  Graphical User Interface

3.4 User's PBS Environment

In order to have your system environment interact seamlessly with PBS, there are several items that need to be checked. In many cases, your system administrator will have already set up your environment to work with PBS.

In order to use PBS to run your work, the following are needed:

User must have access to the resources/hosts that the site has configured for PBS.
User must have a valid group/account on the execution hosts.
User must be able to transfer files between host
137. uate to handle the complex scheduling requirements presented by such systems. In addition, computer system managers wanted greater control over their compute resources, and users wanted a single interface to the systems. In the early 1990's NASA needed a solution to this problem, but found nothing on the market that adequately addressed their needs. So NASA led an international effort to gather requirements for a next-generation resource management system. The requirements and functional specification were later adopted as an IEEE POSIX standard (1003.2d). Next, NASA funded the development of a new resource management system compliant with the standard. Thus the Portable Batch System (PBS) was born.

PBS was quickly adopted on distributed parallel systems and replaced NQS on traditional supercomputers and server systems. Eventually the entire industry evolved toward distributed parallel systems, taking the form of both special purpose and commodity clusters. Managers of such systems found that the capabilities of PBS mapped well onto cluster systems. (For information on converting from NQS to PBS, see Appendix B.)

The PBS story continued when Veridian (the R&D contractor that developed PBS for NASA) released the Portable Batch System Professional Edition (PBS Pro), a commercial, enterprise-ready, workload management solution. Three years later, the Veridian PBS Products business unit was acquired by Altair Engineering, Inc. Altair set up t
138. ubmit your job.

4.3 Submitting a PBS Job

Let's assume the above example script is in a file called "mysubrun". We submit this script using the qsub command:

    qsub mysubrun
    16387.cluster.pbspro.com

Notice that upon successful submission of a job, PBS returns a job identifier (e.g. "16387.cluster.pbspro.com" in the example above). This identifier is a "handle" to the job. Its format will always be:

    sequence-number.servername.domain

You'll need the job identifier for any actions involving the job, such as checking job status, modifying the job, tracking the job, or deleting the job.

In the previous example we submitted the job script to PBS, which in turn read the resource directive contained in the script. However, you can override resource attributes contained in the job script by specifying them on the command line. In fact, any job submission option or directive that you can specify inside the job script, you can also specify on the qsub command line. This is particularly useful if you just want to submit a single instance of your job, but you don't want to edit the script. For example:

    qsub -l ncpus=16 -l walltime=4:00:00 mysubrun
    16388.cluster.pbspro.com

In this example, the 16 CPUs and 4 hours of wallclock time will override the values specified in the job script.

Note that you are not required to use a separate "-l" for each resource you request
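Because the job identifier always has the form sequence-number.servername.domain, a script that captures the output of qsub can split it with ordinary shell parameter expansion. The sketch below is illustrative: the function names job_sequence and job_server are hypothetical, and the identifier is the sample value from the example above.

```shell
# Extract the sequence number: everything before the first dot.
job_sequence() {
    echo "${1%%.*}"
}

# Extract the server part: everything after the first dot,
# or an empty string if the identifier contains no dot.
job_server() {
    case "$1" in
        *.*) echo "${1#*.}" ;;
        *)   echo "" ;;
    esac
}

job_sequence 16387.cluster.pbspro.com
job_server 16387.cluster.pbspro.com
```

The two calls print the sequence number and the server name, each on its own line.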
139. user, operator, and system. A user may place a user hold upon any job the user owns. An operator, who is a user with operator privilege, may place either a user or an operator hold on any job. The PBS Manager may place any hold on any job. The usage syntax of the qhold command is:

    qhold [ -h hold_list ] job_identifier ...

The hold_list defines the type of holds to be placed on the job. The hold_list argument is a string consisting of one or more of the letters u, o, or s in any combination, or the letter n. The hold type associated with each letter is:

    Letter  Meaning
    n       none
    u       user
    o       operator
    s       system

If no -h option is given, the user hold will be applied to the jobs described by the job_identifier operand list. If the job identified by job_identifier is in the queued, held, or waiting states, then all that occurs is that the hold type is added to the job. The job is then placed into the held state if it resides in an execution queue.

If the job is in the running state, then the following additional action is taken to interrupt the execution of the job. If checkpoint/restart is supported by the host system, requesting a hold on a running job will cause (1) the job to be checkpointed, (2) the resources assigned to the job to be released, and (3) the job to be placed in the held state in the execution queue. If checkpoint/restart is not supported, qhold will only set
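The rule for a hold_list (one or more of u, o, s in any combination, or the single letter n) can be checked before calling qhold. This is a minimal sketch under that stated rule only; the function name valid_hold_list is hypothetical and not part of PBS.

```shell
# Succeeds (returns 0) if the argument is a valid hold_list:
# either exactly "n", or a non-empty string made up only of u, o and s.
valid_hold_list() {
    case "$1" in
        n)        return 0 ;;
        "")       return 1 ;;
        *[!uos]*) return 1 ;;
        *)        return 0 ;;
    esac
}

if valid_hold_list uo; then
    echo "uo is a valid hold_list"
fi
```

Note that "n" is accepted only on its own: a value such as "un" is rejected because "none" cannot be combined with another hold type.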
140. ut when executed, the standard input, output, and error streams of the job are connected through qsub to the terminal session in which qsub is running. If the -I option is specified on the command line or in a script directive, the job is an interactive job. If a script is given, it will be processed for directives, but any executable commands will be discarded. When the job begins execution, all input to the job is from the terminal session in which qsub is running. When an interactive job is submitted, the qsub command will not terminate when the job is submitted. qsub will remain running until the job terminates, is aborted, or the user interrupts qsub with a SIGINT (the control-C key). If qsub is interrupted prior to job start, it will query if the user wishes to exit. If the user responds "yes", qsub exits and the job is aborted.

Once the interactive job has started execution, input to and output from the job pass through qsub. Keyboard-generated interrupts are passed to the job. Lines entered that begin with the tilde ('~') character and contain special sequences are interpreted by qsub itself. The recognized special sequences are:

    ~.      qsub terminates execution. The batch job is also terminated.
    ~susp   Suspend the qsub program if running under the C shell. "susp" is the suspend character, usually CNTL-Z.
    ~asusp  Suspend the input half of qsub (terminal to job), but allow output to continue to be displayed. On
141. vironment variable names start with the characters "PBS_". Some are then followed by a capital O ("PBS_O_"), indicating that the variable is from the job's originating environment (i.e. the user's). Appendix A gives a full listing of all environment variables provided to PBS jobs and their meaning. The following short example lists some of the more useful variables, and typical values.

    PBS_O_HOME=/u/james
    PBS_O_LOGNAME=james
    PBS_O_PATH=/usr/new/bin:/usr/local/bin:/bin
    PBS_O_SHELL=/sbin/csh
    PBS_O_TZ=PST8PDT
    PBS_O_HOST=cray1.pbspro.com
    PBS_O_WORKDIR=/u/james
    PBS_O_QUEUE=submit
    PBS_JOBNAME=INTERACTIVE
    PBS_JOBID=16386.cray1.pbspro.com
    PBS_QUEUE=crayq
    PBS_ENVIRONMENT=PBS_INTERACTIVE

There are a number of ways that you can use these environment variables to make more efficient use of PBS. In the example above we see PBS_ENVIRONMENT, which we used earlier in this chapter to test if we were running under PBS. Another commonly used variable is PBS_O_WORKDIR, which contains the name of the directory from which the user submitted the PBS job.

There are also two environment variables that you can set to affect the behavior of PBS. The environment variable PBS_DEFAULT defines the name of the default PBS Server. Typically, it corresponds to the system name of the host on which the Server is running. If PBS_DEFAULT is not set, the defau
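A common use of PBS_O_WORKDIR is to change to the submission directory at the top of a job script, so that relative file names resolve as they did when the job was submitted. The following is a minimal sketch; the function name submit_dir is hypothetical, and falling back to the current directory when the variable is unset is an assumption made so the same script can also be run outside PBS.

```shell
# Print the directory the job was submitted from.
# Falls back to the current directory when not running under PBS.
submit_dir() {
    echo "${PBS_O_WORKDIR:-$PWD}"
}

# Typical first step in a job script: move to the submission directory.
cd "$(submit_dir)" || exit 1
```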
142. vity
W    Job is in the Waiting state

Jobs will be selected which are in any of the specified states.

-u user_list

Restricts selection to jobs owned by the specified user names. This provides a means of limiting the selection to jobs owned by one or more users. The syntax of the user_list is:

    user_name[@host][,user_name[@host],...]

Host names may be wild carded on the left end, e.g. "*.pbspro.com". User_name without a "@host" is equivalent to "user_name@*", i.e. at any host. Jobs will be selected which are owned by the listed users at the corresponding hosts.

For example, say you want to list all jobs owned by user "barry" that requested more than 16 CPUs. You could use the following qselect command syntax:

    qselect -u barry -l ncpus.gt.16
    121.south
    133.south
    154.south

Notice that what is returned is the job identifiers of jobs that match the selection criteria. This may or may not be enough information for your purposes. Many users will use UNIX shell syntax to pass the list of job identifiers directly into qstat for viewing purposes, as shown in the next example.

    qstat -a `qselect -u barry -l ncpus.gt.16`

    Job ID     User   Queue  Jobname  TSK
    121.south  barry  workq  airfoil   32
    133.south  barry  workq  trialx    20
    154.south  barry  workq  airfoil   32

Note: This technique of using the output of the qselect command as in
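When this qselect-into-qstat technique is used often, it can be wrapped in a small shell function. The sketch below is one possible way to do it, not a PBS feature: the function name show_selected is hypothetical, and it assumes only the qselect and qstat -a commands shown above. It also guards against an empty selection, since qstat given no job identifiers would report on all jobs rather than none.

```shell
# Run qselect with the given criteria and show the matching jobs with
# "qstat -a". Prints a notice instead when nothing matches.
show_selected() {
    ids=$(qselect "$@") || return 1
    if [ -z "$ids" ]; then
        echo "no matching jobs" >&2
        return 0
    fi
    # Unquoted on purpose: each job identifier becomes a separate argument.
    qstat -a $ids
}
```

With the function defined, the example above becomes "show_selected -u barry -l ncpus.gt.16".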
143. xpansion will occur. However, at job end, PBS will attempt to delete the file whose name actually contains the wildcard character and will fail to find it. This will leave all the staged-in files in place (undeleted).

Using xpbs to set up file staging directives may be easier than using the command line. On the Submit Job window, in the miscellany options section (far left, center of window), click on the file staging button. This will launch the File Staging dialog box (shown below) in which you will be able to set up the file staging you desire.

The File Selection Box will be initialized with your current working directory. If you wish to select a different directory, double-click on its name; xpbs will then list the contents of the new directory in the File Selection Box. When the correct directory is displayed, simply click on the name of the file you wish to stage (in or out). Its name will be written in the File Selected area.

Next, click either of the Add file selected buttons to add the named file to either the stage-in or stage-out list. Doing so will write the file name into the corresponding area on the lower half of the File Staging window. Now you need to provide location information. For stage-in, type in the path and filename where you want the named file placed. For stage-out, specify the hostname and pathname where you want the named file delivered. You may repeat this process for as ma
144. y and authentication module replaced with the DCE module. There are two -W options available with qsub which will enable a dcelogin context to be set up for the job when it eventually executes. The user may specify either an encrypted password or a forwardable, renewable Kerberos V5 TGT.

Specify the -W cred=dce option to qsub if a forwardable, renewable Kerberos V5 TGT (ticket granting ticket), with the user as the listed principal, is what is to be sent with the job. If the user has an established credentials cache and a non-expired, forwardable, renewable TGT is in the cache, that information is used.

The other choice, -W cred=dce:pass, causes the qsub command to interact with the user to generate a DES encryption of the user's password. This encrypted password is sent to the PBS Server and MOM processes, where it is placed in a job-specific file for later use by pbs_mom in acquiring a DCE login context for the job. The information is destroyed when the job terminates, is deleted, or aborts.

Important: The -W pwd option to qsub has been superseded by the above two options, and therefore should no longer be used.

Any acquired login contexts and accompanying DCE credential caches established for the job are removed on job termination or deletion.

    qsub -Wcred=dce <other qsub options> job-script

8.12 Running PBS in a Kerberos Environment

PBS Pro includes optional support for Kerberos only (i.e. no DCE) envi
145. y do not establish limits on the job. The assignment operator, however, is equivalent to separate specifications of -l mem=x and -l walltime=y in order to set the job limits.

You can do more than just use the equality and assignment operators. You can describe the characteristics of a node without requesting them. For example, if you were to request the following:

    qsub -l resc="ncpus>16&&mem>2GB" -lncpus=2 -lmem=100MB

you would be indicating that you want a node with more than 16 CPUs, but you only want two of them allocated to your job.

4.11 Job Attributes

A PBS job has the following attributes, which may be set by the various options to qsub (for details see section 4.9, "Job Submission Options", on page 32).

Account_Name
    Reserved for local site accounting. If specified (using the -A option to qsub), this value is carried within the job for its duration, and is included in the job accounting records.

Checkpoint
    If supported by the Server implementation and the host operating system, the checkpoint attribute determines when checkpointing will be performed by PBS on behalf of the job. The legal values for checkpoint are described under the qalter and qsub commands.

depend
    The type of inter-job dependencies specified by the job owner.

Error_Path
    The final path name for the file containing the job's standard error

Execution_Time

group_list

Hold_Types